Project 3062 (Run 2, Clone 105, Gen 4) - EUE@24%

Moderators: Site Moderators, FAHC Science Team

DocJonz
Posts: 243
Joined: Thu Dec 06, 2007 6:31 pm
Hardware configuration: Folding with: 4x RTX 4070Ti, 1x RTX 3070
Location: United Kingdom

Project 3062 (Run 2, Clone 105, Gen 4) - EUE@24%

Post by DocJonz »

Had an EUE (error 0x7B) with this WU on a long-running C2Q machine (no OC). :cry:


[04:56:57] - Starting from initial work packet
[04:56:57] 
[04:56:57] Project: 3062 (Run 2, Clone 105, Gen 4)
[04:56:57] 
[04:56:57] Assembly optimizations on if available.
[04:56:57] Entering M.D.
[04:56:57] itial work packet
[04:56:57] 
[04:56:57] Project: 3062 (Run 2, Clone 105, Gen 4)
[04:56:57] 
[04:56:57] Assembly optimizations on if available.
[04:56:57] Entering M.D.
[04:57:04] Protein: p3062_lambda5_99sb
[04:57:04] Writing local files
[04:57:10] Extra SSE boost OK.
[05:15:38] ed 50000 out of 5000000 steps  (1 percent)
[05:34:34] Writing local files
[05:34:34] Completed 100000 out of 5000000 steps  (2 percent)
[05:53:48] Writing local files
[05:53:48] Completed 150000 out of 5000000 steps  (3 percent)
[06:12:53] Writing local files
[06:12:53] Completed 200000 out of 5000000 steps  (4 percent)
[06:31:49] Writing local files
[06:31:49] Completed 250000 out of 5000000 steps  (5 percent)
[06:50:46] Writing local files
[06:50:46] Completed 300000 out of 5000000 steps  (6 percent)
[07:10:07] Writing local files
[07:10:08] Completed 350000 out of 5000000 steps  (7 percent)
[07:29:04] Writing local files
[07:29:04] Completed 400000 out of 5000000 steps  (8 percent)
[07:47:58] Writing local files
[07:47:58] Completed 450000 out of 5000000 steps  (9 percent)
[08:06:51] Writing local files
[08:06:51] Completed 500000 out of 5000000 steps  (10 percent)
[08:25:46] Writing local files
[08:25:46] Completed 550000 out of 5000000 steps  (11 percent)
[08:44:39] Writing local files
[08:44:39] Completed 600000 out of 5000000 steps  (12 percent)
[09:03:35] Writing local files
[09:03:35] Completed 650000 out of 5000000 steps  (13 percent)
[09:22:30] Writing local files
[09:22:30] Completed 700000 out of 5000000 steps  (14 percent)
[09:41:27] Writing local files
[09:41:27] Completed 750000 out of 5000000 steps  (15 percent)
[10:00:16] Writing local files
[10:00:16] Completed 800000 out of 5000000 steps  (16 percent)
[10:21:02] Writing local files
[10:21:03] Completed 850000 out of 5000000 steps  (17 percent)
[10:40:02] Writing local files
[10:40:02] Completed 900000 out of 5000000 steps  (18 percent)
[10:59:03] Writing local files
[10:59:03] Completed 950000 out of 5000000 steps  (19 percent)
[11:18:03] Writing local files
[11:18:03] Completed 1000000 out of 5000000 steps  (20 percent)
[11:37:02] Writing local files
[11:37:02] Completed 1050000 out of 5000000 steps  (21 percent)
[11:55:59] Writing local files
[11:55:59] Completed 1100000 out of 5000000 steps  (22 percent)
[12:14:57] Writing local files
[12:14:57] Completed 1150000 out of 5000000 steps  (23 percent)
[12:33:56] Writing local files
[12:33:56] Completed 1200000 out of 5000000 steps  (24 percent)
[12:47:05] Gromacs cannot continue further.
[12:47:05] Going to send back what have done.
[12:47:05] logfile size: 39226
[12:47:05] - Writing 39762 bytes of core data to disk...
[12:47:05]   ... Done.
[12:47:05] - Failed to delete work/wudata_08.arc
[12:47:05] No C.P. to delete.
[12:47:05] - Failed to delete work/wudata_08.sas
[12:47:05] - Failed to delete work/wudata_08.goe
[12:47:05] - Failed to delete work/wudata_08.pdo
[12:47:05] Warning:  check for stray files
[12:49:05] 
[12:49:05] Folding@home Core Shutdown: EARLY_UNIT_END
[12:49:05] 
[12:49:05] Folding@home Core Shutdown: EARLY_UNIT_END
[12:49:08] CoreStatus = 7B (123)
[12:49:08] Client-core communications error: ERROR 0x7b
[12:49:08] Deleting current work unit & continuing...
[12:51:49] - Preparing to get new work unit...
[12:51:49] + Attempting to get work packet
[12:51:49] - Connecting to assignment server
[12:51:51] - Successful: assigned to (171.64.65.63).
[12:51:51] + News From Folding@Home: Welcome to Folding@Home
[12:51:51] Loaded queue successfully.
[12:52:01] + Closed connections
[12:52:06] 
[12:52:06] + Processing work unit
[12:52:06] Core required: FahCore_a1.exe
[12:52:06] Core found.
[12:52:06] Working on Unit 09 [May 21 12:52:06]
[12:52:06] + Working ...
[12:52:07] 
[12:52:07] *------------------------------*
[12:52:07] Folding@Home Gromacs SMP Core
[12:52:07] Version 1.74 (March 10, 2007)
[12:52:07] 
[12:52:07] Preparing to commence simulation
[12:52:07] - Ensuring status. Please wait.
[12:52:24] - Assembly optimizations manually forced on.
[12:52:24] - Not checking prior termination.
[12:52:24] - Expanded 609531 -> 3263133 (decompressed 535.3 percent)
[12:52:24] No C.P. to delete.
[12:52:24] - Failed to delete - Failed to delete - Failed to delete - Assembly optimizations on if available.
[12:52:24] Entering M.D.
[12:52:24] rWarning:  check for stray files
[12:52:24] - Starting from i
[12:52:24] Project: 3062 (Run
[12:52:24] Project: 3062 (Run 2, Assembly optimizatiAssembly optimizations on if available.
[12:52:24] Entering M.D.
[12:52:24] ations on if available.
[12:52:24] Entering M.D.
[12:52:31]  boost OK.
[12:52:31] Writing local files
[12:52:31] Extra SSE boost OK.
[12:52:31] Writing local files
[12:52:31] Completed 0 out of 5000000 steps  (0 percent)
[13:11:26] Writing local files
[13:11:26] Completed 50000 out of 5000000 steps  (1 percent)
[13:30:27] Writing local files
[13:30:27] Completed 100000 out of 5000000 steps  (2 percent)
[13:49:28] Writing local files
[13:49:29] Completed 150000 out of 5000000 steps  (3 percent)
[14:08:30] Writing local files
[14:08:30] Completed 200000 out of 5000000 steps  (4 percent)
[14:27:32] Writing local files
[14:27:32] Completed 250000 out of 5000000 steps  (5 percent)
[14:46:33] Writing local files
[14:46:33] Completed 300000 out of 5000000 steps  (6 percent)
[15:05:34] Writing local files
[15:05:34] Completed 350000 out of 5000000 steps  (7 percent)
[15:24:35] Writing local files
[15:24:35] Completed 400000 out of 5000000 steps  (8 percent)
[15:43:36] Writing local files
[15:43:37] Completed 450000 out of 5000000 steps  (9 percent)
[16:02:35] Writing local files
[16:02:35] Completed 500000 out of 5000000 steps  (10 percent)
[16:22:08] Writing local files
[16:22:08] Completed 550000 out of 5000000 steps  (11 percent)
[16:41:09] Writing local files
[16:41:09] Completed 600000 out of 5000000 steps  (12 percent)
[17:00:10] Writing local files
[17:00:10] Completed 650000 out of 5000000 steps  (13 percent)
[17:19:11] Writing local files
[17:19:11] Completed 700000 out of 5000000 steps  (14 percent)
[17:38:30] Writing local files
[17:38:30] Completed 750000 out of 5000000 steps  (15 percent)
[17:57:39] Writing local files
[17:57:39] Completed 800000 out of 5000000 steps  (16 percent)
[18:17:00] Writing local files
[18:17:00] Completed 850000 out of 5000000 steps  (17 percent)
[18:36:02] Writing local files
[18:36:02] Completed 900000 out of 5000000 steps  (18 percent)
[18:55:02] Writing local files
[18:55:02] Completed 950000 out of 5000000 steps  (19 percent)
[19:14:18] Writing local files
[19:14:18] Completed 1000000 out of 5000000 steps  (20 percent)
[19:33:30] Writing local files
[19:33:30] Completed 1050000 out of 5000000 steps  (21 percent)
[19:53:31] Writing local files
[19:53:32] Completed 1100000 out of 5000000 steps  (22 percent)
[20:12:52] Writing local files
[20:12:52] Completed 1150000 out of 5000000 steps  (23 percent)
[20:31:52] Writing local files
[20:31:52] Completed 1200000 out of 5000000 steps  (24 percent)
[20:45:01] Gromacs cannot continue further.
[20:45:01] Going to send back what have done.
[20:45:01] logfile size: 39226
[20:45:01] - Writing 39762 bytes of core data to disk...
[20:45:01]   ... Done.
[20:45:01] - Failed to delete work/wudata_09.xtc
[20:45:01] No C.P. to delete.
[20:45:01] - Failed to delete work/wudata_09.goe
[20:45:01] - Failed to delete work/wudata_09.pdo
[20:45:01] - Failed to delete work/wudata_09.xvg
[20:45:01] Warning:  check for stray files
[20:47:01] 
[20:47:01] Folding@home Core Shutdown: EARLY_UNIT_END
[20:47:01] 
[20:47:01] Folding@home Core Shutdown: EARLY_UNIT_END
[20:47:05] CoreStatus = 7B (123)
[20:47:05] Client-core communications error: ERROR 0x7b
[20:47:05] Deleting current work unit & continuing...
[20:49:25] - Preparing to get new work unit...
[20:49:25] + Attempting to get work packet
[20:49:25] - Connecting to assignment server
[20:49:26] - Successful: assigned to (171.64.65.64).
[20:49:26] + News From Folding@Home: Welcome to Folding@Home
[20:49:26] Loaded queue successfully.
[20:51:16] + Closed connections
[20:51:21] 
[20:51:21] + Processing work unit
[20:51:21] Core required: FahCore_a1.exe
[20:51:21] Core found.
[20:51:21] Working on Unit 00 [May 21 20:51:21]
[20:51:21] + Working ...
[20:51:21] 
[20:51:21] *------------------------------*
[20:51:21] Folding@Home Gromacs SMP Core
[20:51:21] Version 1.74 (March 10, 2007)
[20:51:21] 
[20:51:21] Preparing to commence simulation
[20:51:21] - Ensuring status. Please wait.
[20:51:38] - Assembly optimizations manually forced on.
[20:51:38] - Not checking prior termination.
[20:51:43] - Expanded 2449201 -> 12890333 (decompressed 526.3 percent)
[20:51:43] - Starting from initial work packet
[20:51:43] 
[20:51:43] Project: 2653 (Run 13, Clone 28, Gen 66)
[20:51:43] 
[20:51:43] Assembly optimizations on if available.
[20:51:43] Entering M.D.
[20:51:50] Rejecting checkpoint
[20:51:51] Protein: Protein in POPCExtra SSE boost OK.
[20:51:51] 
[20:51:52] Extra SSE boost OK.
[20:51:53] Writing local files
[20:51:53] Completed 0 out of 500000 steps  (0 percent)
[21:07:46] Writing local files
[21:07:46] Completed 5000 out of 500000 steps  (1 percent)
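The log above shows Project 3062 (Run 2, Clone 105, Gen 4) failing twice at the same point, shortly after the 24% checkpoint, both times ending with "CoreStatus = 7B (123)" (0x7B is 123 in decimal). When comparing failures like this, a small script can pull out the last reported progress and the status code for each EARLY_UNIT_END. This is a minimal Python sketch, assuming the log has been saved as plain text; the function find_early_unit_ends is hypothetical, and its regular expressions only match the line formats visible in this log.

```python
import re

# Matches progress lines such as
# "[12:33:56] Completed 1200000 out of 5000000 steps  (24 percent)"
PROGRESS_RE = re.compile(r"Completed (\d+) out of (\d+) steps")
# Matches status lines such as "[12:49:08] CoreStatus = 7B (123)"
STATUS_RE = re.compile(r"CoreStatus = ([0-9A-Fa-f]+) \((\d+)\)")


def find_early_unit_ends(log_text):
    """Return a list of (last_completed_steps, total_steps, status_code)
    tuples, one per EARLY_UNIT_END in the log, in order of appearance."""
    results = []
    last_progress = None  # (completed, total) from the latest progress line
    saw_eue = False       # True once an EARLY_UNIT_END line has been seen
    for line in log_text.splitlines():
        m = PROGRESS_RE.search(line)
        if m:
            last_progress = (int(m.group(1)), int(m.group(2)))
            continue
        if "EARLY_UNIT_END" in line:
            saw_eue = True
            continue
        m = STATUS_RE.search(line)
        # Only record a status that follows an EARLY_UNIT_END, so a
        # normal finish (no EUE) is ignored.
        if m and saw_eue and last_progress is not None:
            code = int(m.group(1), 16)  # hex field: "7B" -> 123
            results.append((last_progress[0], last_progress[1], code))
            saw_eue = False
            last_progress = None  # the next WU starts counting again
    return results


sample = """\
[12:14:57] Completed 1150000 out of 5000000 steps  (23 percent)
[12:33:56] Completed 1200000 out of 5000000 steps  (24 percent)
[12:47:05] Gromacs cannot continue further.
[12:49:05] Folding@home Core Shutdown: EARLY_UNIT_END
[12:49:08] CoreStatus = 7B (123)
"""

for done, total, code in find_early_unit_ends(sample):
    print(f"EUE after {done}/{total} steps ({100 * done // total}%), "
          f"status 0x{code:X} ({code})")
# prints: EUE after 1200000/5000000 steps (24%), status 0x7B (123)
```

Running it over the full log should report the same step count for both attempts, which points to a problem with the WU itself rather than a random failure on this machine.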
Folding Stats (HFM.NET): DocJonz Folding Farm Stats
toTOW
Site Moderator
Posts: 6309
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: Project 3062 (Run 2, Clone 105, Gen 4) - EUE@24%

Post by toTOW »

You should try qfix and see if it helps ...

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
anandhanju
Posts: 526
Joined: Mon Dec 03, 2007 4:33 am
Location: Australia

Re: Project 3062 (Run 2, Clone 105, Gen 4) - EUE@24%

Post by anandhanju »

If you get it again, try stopping your client at around 20% and then restart it. Hopefully, it should go beyond the failure point safely.
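Timing the stop is the hard part if you are not at the machine. Frames on this machine arrive at a fairly steady rate (roughly 19 minutes per 1% in the log above), so the clock time for 20% can be estimated from the first few frames. This is a minimal Python sketch, assuming you have copied (timestamp, percent) pairs out of the log yourself and that they span at most one midnight; the function estimate_time_at_percent is hypothetical.

```python
from datetime import datetime, timedelta


def estimate_time_at_percent(frames, target_percent):
    """frames: list of ("HH:MM:SS", percent) pairs in log order.
    Returns the estimated datetime at which target_percent completes,
    using the average time per percent between the first and last frame.
    Only the time of day in the result is meaningful."""
    fmt = "%H:%M:%S"
    (t0, p0), (t1, p1) = frames[0], frames[-1]
    if p1 == p0:
        raise ValueError("need frames at two different percentages")
    start = datetime.strptime(t0, fmt)
    end = datetime.strptime(t1, fmt)
    if end < start:
        # Assumes the frames cross midnight at most once.
        end += timedelta(days=1)
    per_percent = (end - start) / (p1 - p0)  # timedelta / int -> timedelta
    return end + per_percent * (target_percent - p1)


# First three frames of the second attempt in the log above.
frames = [("12:52:31", 0), ("13:11:26", 1), ("13:30:27", 2)]
eta = estimate_time_at_percent(frames, 20)
print(eta.strftime("%H:%M:%S"))  # prints: 19:11:51
```

With these frames the estimate is 19:11:51; the log shows 20% actually completing at 19:14:18, so the estimate is close enough to plan a manual stop and restart.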
nwkelley
Pande Group Member
Posts: 57
Joined: Wed May 14, 2008 9:43 pm

Re: Project 3062 (Run 2, Clone 105, Gen 4) - EUE@24%

Post by nwkelley »

Client-core communication errors are "business as usual" on SMP projects. Hopefully one of these options will work (i.e., receiving partial credit via qfix). Thanks for your post, good luck, and let us know if you have further trouble with it.
nick
DocJonz
Posts: 243
Joined: Thu Dec 06, 2007 6:31 pm
Hardware configuration: Folding with: 4x RTX 4070Ti, 1x RTX 3070
Location: United Kingdom

Re: Project 3062 (Run 2, Clone 105, Gen 4) - EUE@24%

Post by DocJonz »

toTOW wrote:You should try qfix and see if it helps ...
At the end of the log I posted, it said that it deleted the WU and started a new one, so I'm assuming that qfix isn't going to help in this case :(
anandhanju wrote:If you get it again, try stopping your client at around 20% and then restart it. Hopefully, it should go beyond the failure point safely.
This would be a good thing to try - unfortunately I won't necessarily be around at the right time to give it a go ... maybe one day I'll catch one :wink:
nwkelley wrote:Client-core communication errors are "business as usual" on SMP projects. Hopefully one of these options will work (i.e., receiving partial credit via qfix). Thanks for your post, good luck, and let us know if you have further trouble with it.
nick
Are these 'client-core comms' errors always going to be there with the SMP client? Or can we expect the client-writing whizzos at Stanford to make them a thing of the past? I'm hoping the latter :D
toTOW
Site Moderator
Posts: 6309
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: Project 3062 (Run 2, Clone 105, Gen 4) - EUE@24%

Post by toTOW »

DocJonz wrote:
toTOW wrote:You should try qfix and see if it helps ...
At the end of the log I posted, it said that it deleted the WU and started a new one, so I'm assuming that qfix isn't going to help in this case :(
If you see these messages, something has been written to disk... look in your /work folder, and if you find a wuresult_xx.dat file, qfix has something to fix ;) :


[20:45:01] Going to send back what have done.
[20:45:01] logfile size: 39226
[20:45:01] - Writing 39762 bytes of core data to disk...
[20:45:01]   ... Done.
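The check described above can be scripted. This is a minimal Python sketch, assuming it is run from the client's folder so that the relative path "work" points at the /work folder (adjust the path for your install); the function find_wuresults is hypothetical, and it only looks for the result file; it does not do what qfix does.

```python
from pathlib import Path


def find_wuresults(work_dir):
    """Return (file name, size in bytes) for every wuresult_xx.dat in
    work_dir, sorted by name. A non-empty list suggests something was
    written for qfix to work with; an empty list means nothing was found
    (or the folder does not exist)."""
    work = Path(work_dir)
    if not work.is_dir():
        return []
    # "??" matches the two-digit queue slot, e.g. wuresult_09.dat
    return sorted((p.name, p.stat().st_size) for p in work.glob("wuresult_??.dat"))


for name, size in find_wuresults("work"):
    print(f"{name}: {size} bytes")
```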
