Project: 3050 (Run 7, Clone 32, Gen 62)

coo-coo
Posts: 8
Joined: Sun Apr 20, 2008 5:35 pm

Project: 3050 (Run 7, Clone 32, Gen 62)

Post by coo-coo »

I am consistently getting an EUE (EARLY_UNIT_END) at 50% on this WU.
I have seen it assigned several times, and it always fails at the same spot:

[11:26:48] - Preparing to get new work unit...
[11:26:48] + Attempting to get work packet
[11:26:48] - Connecting to assignment server
[11:26:49] - Successful: assigned to (171.64.65.63).
[11:26:49] + News From Folding@Home: Welcome to Folding@Home
[11:26:49] Loaded queue successfully.
[11:26:51] + Closed connections
[11:26:51] 
[11:26:51] + Processing work unit
[11:26:51] Core required: FahCore_a1.exe
[11:26:51] Core found.
[11:26:51] Working on Unit 05 [May 3 11:26:51]
[11:26:51] + Working ...
[11:26:51] 
[11:26:51] *------------------------------*
[11:26:51] Folding@Home Gromacs SMP Core
[11:26:51] Version 1.74 (March 10, 2007)
[11:26:51] 
[11:26:51] Preparing to commence simulation
[11:26:51] - Ensuring status. Please wait.
[11:27:08] - Assembly optimizations manually forced on.
[11:27:08] - Not checking prior termination.
[11:27:08] - Expanded 283952 -> 1506689 (decompressed 530.6 percent)
[11:27:08] - Starting from initial work packet
[11:27:08] 
[11:27:08] Project: 3050 (Run 7, Clone 32, Gen 62)
[11:27:08] 
[11:27:08] Assembly optimizations on if available.
[11:27:08] Entering M.D.
[11:27:15] Protein: 9676 p3050_SProtein: 96Writing local files
[11:27:15] Extra SSE boost OK.
[11:27:15] 
[11:27:15] Extra SSE boost OK.
[11:27:15] Writing local files
[11:27:15] Completed 0 out of 10000000 steps  (0 percent)
[11:39:58] Writing local files
[11:39:58] Completed 100000 out of 10000000 steps  (1 percent)
[11:52:53] Writing local files
[11:52:53] Completed 200000 out of 10000000 steps  (2 percent)
<snip>
[21:22:14] Completed 4600000 out of 10000000 steps  (46 percent)
[21:35:06] Writing local files
[21:35:06] Completed 4700000 out of 10000000 steps  (47 percent)
[21:47:57] Writing local files
[21:47:58] Completed 4800000 out of 10000000 steps  (48 percent)
[22:00:52] Writing local files
[22:00:52] Completed 4900000 out of 10000000 steps  (49 percent)
[22:13:46] Writing local files
[22:13:46] Completed 5000000 out of 10000000 steps  (50 percent)
[22:16:31] Warning:  long 1-4 interactions
[22:16:31] Gromacs cannot continue further.
[22:16:31] Going to send back what have done.
[22:16:31] logfile size: 133136
[22:16:31] - Writing 133672 bytes of core data to disk...
[22:16:31]   ... Done.
[22:16:31] - Failed to delete work/wudata_05.xtc
[22:16:31] - Failed to delete work/wudata_05.sas
[22:16:31] - Failed to delete work/wudata_05.goe
[22:16:31] Warning:  check for stray files
[22:16:31] 
[22:16:31] Folding@home Core Shutdown: EARLY_UNIT_END
[22:16:31] 
[22:16:31] Folding@home Core Shutdown: EARLY_UNIT_END
[22:16:36] CoreStatus = 7B (123)
[22:16:36] Client-core communications error: ERROR 0x7b
[22:16:36] Deleting current work unit & continuing...
Is anyone else seeing it? Do I need to send this to anyone? It seems like it may be a faulty WU.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project 3050 [R7, C32, G62]

Post by bruce »

Some WUs are faulty, but we have no way of checking whether this is one of them unless it is reassigned and then completed by someone else.

Some people have been able to stop the client just before the error and restart it, which seems to skip the error.

Others have been able to run qfix and report the partial result. The client then moves on to another WU.

Others have simply deleted the WU that they know is going to fail and repeated that process until they got something else. We don't recommend deleting a WU, but in this case it's justified.
coo-coo
Posts: 8
Joined: Sun Apr 20, 2008 5:35 pm

Re: Project: 3050 (Run 7, Clone 32, Gen 62)

Post by coo-coo »

Thank you for the reply, Bruce.
I just let it do what it normally does. This time it retried only once more after the initial EUE.
Then I was assigned another 2653 WU. :)
Perhaps the dreaded thing has died, nevermore to return! :P
coo-coo
Posts: 8
Joined: Sun Apr 20, 2008 5:35 pm

Re: Project: 3050 (Run 7, Clone 32, Gen 62)

Post by coo-coo »

Update:
I am continuing to see this same R:C:G assigned:

[08:09:47] - Preparing to get new work unit...
[08:09:47] + Attempting to get work packet
[08:09:47] - Connecting to assignment server
[08:09:48] - Successful: assigned to (171.64.65.63).
[08:09:48] + News From Folding@Home: Welcome to Folding@Home
[08:09:48] Loaded queue successfully.
[08:09:50] + Closed connections
[08:09:50] 
[08:09:50] + Processing work unit
[08:09:50] Core required: FahCore_a1.exe
[08:09:50] Core found.
[08:09:50] Working on Unit 01 [May 7 08:09:50]
[08:09:50] + Working ...
[08:09:50] 
[08:09:50] *------------------------------*
[08:09:50] Folding@Home Gromacs SMP Core
[08:09:50] Version 1.74 (March 10, 2007)
[08:09:50] 
[08:09:50] Preparing to commence simulation
[08:09:50] - Ensuring status. Please wait.
[08:10:07] - Assembly optimizations manually forced on.
[08:10:07] - Not checking prior termination.
[08:10:07] - Expanded 283952 -> 1506689 (decompressed 530.6 percent)
[08:10:07] - Starting from initial work packet
[08:10:07] 
[08:10:07] Project: 3050 (Run 7, Clone 32, Gen 62)
[08:10:07] 
[08:10:07] Assembly optimizations on if available.
[08:10:07] Entering M.D.
[08:10:13] Protein: 9676 p3050_SProtein: 96Writing local files
[08:10:13] Extra SSE boost OK.
[08:10:13] 
[08:10:13] Extra SSE boost OK.
[08:10:13] Writing local files
[08:10:13] Completed 0 out of 10000000 steps  (0 percent)
[08:22:19] Writing local files
[08:22:19] Completed 100000 out of 10000000 steps  (1 percent)
[08:35:01] Writing local files
[08:35:01] Completed 200000 out of 10000000 steps  (2 percent)
[08:47:43] Writing local files
[08:47:43] Completed 300000 out of 10000000 steps  (3 percent)
[09:00:23] Writing local files
[09:00:23] Completed 400000 out of 10000000 steps  (4 percent)
<snip>
[17:51:11] Completed 4700000 out of 10000000 steps  (47 percent)
[18:03:29] Writing local files
[18:03:29] Completed 4800000 out of 10000000 steps  (48 percent)
[18:15:47] Writing local files
[18:15:47] Completed 4900000 out of 10000000 steps  (49 percent)
[18:28:06] Writing local files
[18:28:06] Completed 5000000 out of 10000000 steps  (50 percent)
[18:30:43] Warning:  long 1-4 interactions
[18:30:44] Gromacs cannot continue further.
[18:30:44] Going to send back what have done.
[18:30:44] logfile size: 133136
[18:30:44] - Writing 133672 bytes of core data to disk...
[18:30:44]   ... Done.
[18:30:44] - Failed to delete work/wudata_01.sas
[18:30:44] - Failed to delete work/wudata_01.goe
[18:30:44] Warning:  check for stray files
[18:32:44] 
[18:32:44] Folding@home Core Shutdown: EARLY_UNIT_END
[18:32:44] 
[18:32:44] Folding@home Core Shutdown: EARLY_UNIT_END
[18:32:48] CoreStatus = 7B (123)
[18:32:48] Client-core communications error: ERROR 0x7b
[18:32:48] Deleting current work unit & continuing...
[18:35:08] - Preparing to get new work unit...
[18:35:08] + Attempting to get work packet
[18:35:08] - Connecting to assignment server
[18:35:09] - Successful: assigned to (171.64.65.64).
[18:35:09] + News From Folding@Home: Welcome to Folding@Home
[18:35:09] Loaded queue successfully.
[18:35:16] + Closed connections
This WU appears to be faulty. Each failed iteration is taking over 10 hours of wall clock time. It is a shame to waste so many crunching cycles on this. Please advise on an appropriate course of action.
Thanks!
7im
Posts: 10189
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengeance (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760W Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicone fan gaskets and silicone mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona

Re: Project: 3050 (Run 7, Clone 32, Gen 62)

Post by 7im »

If the client doesn't move on to a new WU after 3 failed attempts, delete the WU and move on. ;)
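
For anyone who would rather check the log than count by eye, the three-strikes rule above could be sketched roughly like this. It is only an illustration: the function names `count_early_unit_ends` and `units_to_delete` are made up for this example, the client offers nothing like them, and the patterns assume the log looks like the excerpts posted in this thread.

```python
import re
from collections import Counter

# Assumption: project lines look like the ones quoted in this thread, e.g.
# "[08:10:07] Project: 3050 (Run 7, Clone 32, Gen 62)"
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")

# The "Core Shutdown: EARLY_UNIT_END" line is printed twice per failure in the
# logs above, so this sketch keys on the CoreStatus line, which appears once.
EUE_MARKER = "CoreStatus = 7B"


def count_early_unit_ends(log_text):
    """Count EUE failures per (project, run, clone, gen) in a client log."""
    failures = Counter()
    current = None  # work unit most recently announced in the log
    for line in log_text.splitlines():
        match = PROJECT_RE.search(line)
        if match:
            current = tuple(int(group) for group in match.groups())
        elif EUE_MARKER in line and current is not None:
            failures[current] += 1
            current = None  # wait for the next project line before counting again
    return failures


def units_to_delete(log_text, max_attempts=3):
    """Return work units that failed at least max_attempts times (hypothetical rule)."""
    counts = count_early_unit_ends(log_text)
    return [unit for unit, n in counts.items() if n >= max_attempts]


if __name__ == "__main__":
    one_failure = (
        "[08:10:07] Project: 3050 (Run 7, Clone 32, Gen 62)\n"
        "[18:32:44] Folding@home Core Shutdown: EARLY_UNIT_END\n"
        "[18:32:48] CoreStatus = 7B (123)\n"
    )
    print(units_to_delete(one_failure * 3))  # [(3050, 7, 32, 62)]
```

Paste the text of the client log into `units_to_delete` and it lists any project/run/clone/gen that has hit the EUE three or more times. Deleting the WU is still a manual step.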