Project: 2665 (Run 3, Clone 300, Gen 36)

Moderators: Site Moderators, FAHC Science Team

Zagen30
Posts: 823
Joined: Tue Mar 25, 2008 12:45 am
Hardware configuration: Core i7 3770K @3.5 GHz (not folding), 8 GB DDR3 @2133 MHz, 2xGTX 780 @1215 MHz, Windows 7 Pro 64-bit running 7.3.6 w/ 1xSMP, 2xGPU

4P E5-4650 @3.1 GHz, 64 GB DDR3 @1333MHz, Ubuntu Desktop 13.10 64-bit

Post by Zagen30 »

EUE'd on me twice (comp specs in profile):

[09:06:21] Completed 242500 out of 250000 steps  (97 percent)
[09:14:18] Warning:  long 1-4 interactions
[09:14:21] Quit 101 - NaN detected: (ener[20])
[09:14:21] 
[09:14:21] Simulation instability has been encountered. The run has entered a
[09:14:21]   state from which no further progress can be made.
[09:14:21] This may be the correct result of the simulation, however if you
[09:14:21]   often see other project units terminating early like this
[09:14:21]   too, you may wish to check the stability of your computer (issues
[09:14:21]   such as high temperature, overclocking, etc.).
[09:14:21] Going to send back what have done.
[09:14:21] logfile size: 205208
[09:14:21] - Writing 205758 bytes of core data to disk...
[09:14:21]   ... Done.
[09:14:21] - Failed to delete work/wudata_04.arc
[09:14:21] No C.P. to delete.
[09:14:21] Warning:  check for stray files
[09:14:21] 
[09:14:21] Folding@home Core Shutdown: EARLY_UNIT_END
[09:14:21] 
[09:14:21] Folding@home Core Shutdown: EARLY_UNIT_END

Folding@Home Client Shutdown at user request.

Folding@Home Client Shutdown.

[14:46:41] - Ask before connecting: No
[14:46:41] - User name: Zagen30 (Team 0)
[14:46:41] - User ID: 4B0CBF697DF8B48F
[14:46:41] - Machine ID: 2
[14:46:41] 
[14:46:41] Loaded queue successfully.
[14:46:41] 
[14:46:41] + Processing work unit
[14:46:41] Core required: FahCore_a1.exe
[14:46:41] Core found.
[14:46:41] Working on Unit 04 [August 18 14:46:41]
[14:46:41] + Working ...
[14:46:42] 
[14:46:42] *------------------------------*
[14:46:42] Folding@Home Gromacs SMP Core
[14:46:42] Version 1.74 (March 10, 2007)
[14:46:42] 
[14:46:42] Preparing to commence simulation
[14:46:42] - Ensuring status. Please wait.
[14:46:59] - Looking at optimizations...
[14:46:59] - Working with standard loops on this execution.
[14:46:59] - Previous termination of core was improper.
[14:46:59] - Files status OK
[14:48:59] 
[14:48:59] Folding@home Core Shutdown: MISSING_WORK_FILES
[14:48:59] Finalizing output
[14:49:02] CoreStatus = 1 (1)
[14:49:02] Client-core communications error: ERROR 0x1
[14:49:02] Deleting current work unit & continuing...
[14:51:23] - Preparing to get new work unit...
[14:51:23] + Attempting to get work packet
[14:51:23] - Connecting to assignment server
[14:51:23] - Successful: assigned to (171.64.65.64).
[14:51:23] + News From Folding@Home: Welcome to Folding@Home
[14:51:24] Loaded queue successfully.
[14:51:44] + Closed connections
[14:51:49] 
[14:51:49] + Processing work unit
[14:51:49] Core required: FahCore_a1.exe
[14:51:49] Core found.
[14:51:49] Working on Unit 05 [August 18 14:51:49]
[14:51:49] + Working ...
[14:51:49] 
[14:51:49] *------------------------------*
[14:51:49] Folding@Home Gromacs SMP Core
[14:51:49] Version 1.74 (March 10, 2007)
[14:51:49] 
[14:51:49] Preparing to commence simulation
[14:51:49] - Ensuring status. Please wait.
[14:52:06] - Looking at optimizations...
[14:52:06] - Working with standard loops on this execution.
[14:52:06] - Previous termination of core was improper.
[14:52:06] - Going toatus OK
[14:52:06] ndard loops.
[14:52:06] - Files status OK
[14:52:29] (decompressed 513.4 percent)
[14:52:30] cket
[14:52:30] 
[14:52:30] Project: 2665 (Run 3, Clone 300, Gen 36)
[14:52:30] 
[14:52:30] 65 (Run 3, Clone 300, Gen 36)
[14:52:30] 
[14:52:33] 65 (Run 3, Clone 300, Gen 36)
[14:52:33] 
[14:52:36] Entering M.D.
[14:52:42] Rejecting checkpoint
[14:52:44] 
[14:52:44] Writing local files
[14:52:45] 
[14:52:45] Writing local files
[14:52:56] Extra SSE boost OK.
[14:52:57] Writing local files
[14:52:58] Completed 0 out of 250000 steps  (0 percent)
[15:18:45] Writing local files
[15:18:45] Completed 2500 out of 250000 steps  (1 percent)
[15:42:46] Writing local files
[15:42:46] Completed 5000 out of 250000 steps  (2 percent)
[16:07:45] Writing local files
[16:07:45] Completed 7500 out of 250000 steps  (3 percent)
[16:32:36] Writing local files
[16:32:36] Completed 10000 out of 250000 steps  (4 percent)
[16:58:17] Writing local files
[16:58:17] Completed 12500 out of 250000 steps  (5 percent)
[17:23:16] Writing local files
[17:23:17] Completed 15000 out of 250000 steps  (6 percent)
[17:48:14] Writing local files
[17:48:14] Completed 17500 out of 250000 steps  (7 percent)
[18:13:20] Writing local files
[18:13:21] Completed 20000 out of 250000 steps  (8 percent)
[18:38:26] Writing local files
[18:38:27] Completed 22500 out of 250000 steps  (9 percent)
[19:03:42] Writing local files
[19:03:42] Completed 25000 out of 250000 steps  (10 percent)
[19:31:27] Writing local files
[19:31:27] Completed 27500 out of 250000 steps  (11 percent)
[19:56:57] Writing local files
[19:56:58] Completed 30000 out of 250000 steps  (12 percent)
[20:22:01] Writing local files
[20:22:01] Completed 32500 out of 250000 steps  (13 percent)
[20:46:57] Writing local files
[20:46:57] Completed 35000 out of 250000 steps  (14 percent)
[21:11:49] Writing local files
[21:11:49] Completed 37500 out of 250000 steps  (15 percent)
[21:36:57] Writing local files
[21:36:58] Completed 40000 out of 250000 steps  (16 percent)
[22:01:33] Writing local files
[22:01:34] Completed 42500 out of 250000 steps  (17 percent)
[22:25:26] Writing local files
[22:25:27] Completed 45000 out of 250000 steps  (18 percent)
[22:49:18] Writing local files
[22:49:18] Completed 47500 out of 250000 steps  (19 percent)
[23:13:01] Writing local files
[23:13:01] Completed 50000 out of 250000 steps  (20 percent)
[23:36:45] Writing local files
[23:36:46] Completed 52500 out of 250000 steps  (21 percent)
[00:00:58] Writing local files
[00:00:59] Completed 55000 out of 250000 steps  (22 percent)
[00:35:03] Writing local files
[00:35:03] Completed 57500 out of 250000 steps  (23 percent)
[01:00:16] Writing local files
[01:00:16] Completed 60000 out of 250000 steps  (24 percent)
[01:24:21] Writing local files
[01:24:21] Completed 62500 out of 250000 steps  (25 percent)
[01:48:19] Writing local files
[01:48:19] Completed 65000 out of 250000 steps  (26 percent)
[02:12:13] Writing local files
[02:12:14] Completed 67500 out of 250000 steps  (27 percent)
[02:36:41] Writing local files
[02:36:42] Completed 70000 out of 250000 steps  (28 percent)
[02:38:55] Warning:  long 1-4 interactions
[02:38:56] Gromacs cannot continue further.
[02:38:56] Going to send back what have done.
[02:38:56] logfile size: 18841
[02:38:56] - Writing 19377 bytes of core data to disk...
[02:38:56]   ... Done.
[02:38:56] - Failed to delete work/wudata_05.sas
[02:38:56] - Failed to delete work/wudata_05.goe
[02:38:56] Warning:  check for stray files
[02:38:56] 
[02:38:56] Folding@home Core Shutdown: EARLY_UNIT_END
[02:38:56] 
[02:38:56] Folding@home Core Shutdown: EARLY_UNIT_END
I tried to qfix it after the first failure, but that didn't work. After the second failure, qfix fixed the first result, and I was able to send that one back in, apparently for full credit (the local tally increased by 1). I guess the original was close enough to completion to be fully valid.

Is it normal for qfix to only work if there's more than 1 WU in the queue?
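
For anyone comparing their own logs against the one above, here is a minimal sketch that lists which queue slot ended with which core shutdown code. It assumes the log line formats shown in this post ("Working on Unit NN" and "Folding@home Core Shutdown: CODE"); the function name and the sample text are only illustrative, and repeated identical shutdown lines are collapsed into one event.

```python
import re

# Line formats assumed from the log excerpt in this post.
UNIT_RE = re.compile(r"^\[\d{2}:\d{2}:\d{2}\] Working on Unit (\d+) ")
SHUTDOWN_RE = re.compile(r"^\[\d{2}:\d{2}:\d{2}\] Folding@home Core Shutdown: (\w+)\s*$")


def core_shutdowns(log_text):
    """Return a list of (queue_slot, shutdown_code) in log order.

    queue_slot is None when the shutdown appears before any
    "Working on Unit" line in the text that was passed in.
    Consecutive identical events are reported once.
    """
    slot = None
    events = []
    for line in log_text.splitlines():
        unit = UNIT_RE.match(line)
        if unit:
            slot = unit.group(1)
            continue
        shutdown = SHUTDOWN_RE.match(line)
        if shutdown:
            event = (slot, shutdown.group(1))
            if not events or events[-1] != event:
                events.append(event)
    return events


# A few lines copied from the log above.
SAMPLE = """\
[09:14:21] Folding@home Core Shutdown: EARLY_UNIT_END
[09:14:21] 
[09:14:21] Folding@home Core Shutdown: EARLY_UNIT_END
[14:46:41] Working on Unit 04 [August 18 14:46:41]
[14:48:59] Folding@home Core Shutdown: MISSING_WORK_FILES
[14:51:49] Working on Unit 05 [August 18 14:51:49]
[02:38:56] Folding@home Core Shutdown: EARLY_UNIT_END
[02:38:56] Folding@home Core Shutdown: EARLY_UNIT_END
"""

if __name__ == "__main__":
    for slot, code in core_shutdowns(SAMPLE):
        print(slot, code)
```

A MISSING_WORK_FILES entry right after an EARLY_UNIT_END for the same slot is the pattern seen in this thread.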
toTOW
Site Moderator
Posts: 6312
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Post by toTOW »

That's because your queue slot was not properly freed. The symptom is the MISSING_WORK_FILES error: the client deletes the work files but fails (I don't know why) to delete the queue information. If you run -delete XX (where XX is the queue number of the faulty WU) before running qfix, qfix will be able to recover it. If qfix finds a result file but the queue is not empty, it won't fix anything.

For example, on your first WU:
* fah.exe -delete 04
* qfix.exe
* fah.exe -send all (or simply restarting the client with your usual shortcut)
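
The three steps above can be sketched as a small helper that builds the commands in the right order. This is only a sketch: the executable names and the -delete / -send all flags are taken from the list above, while the function names, the two-digit slot formatting, and the assumption that both executables sit in the current directory are mine. By default it only prints the commands instead of running them.

```python
import subprocess


def recovery_commands(slot, client="fah.exe", qfix="qfix.exe"):
    """Return the commands from the post, in order, for one queue slot.

    `slot` is the queue number of the faulty WU (e.g. 4 for Unit 04).
    Two-digit zero padding is an assumption based on the "04" example.
    """
    slot_arg = f"{int(slot):02d}"
    return [
        [client, "-delete", slot_arg],  # free the stale queue slot first
        [qfix],                         # then let qfix recover the result
        [client, "-send", "all"],       # finally upload the recovered result
    ]


def recover(slot, dry_run=True):
    """Print the commands, and run them in order only if dry_run is False."""
    for cmd in recovery_commands(slot):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first command that reports failure.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    recover(4)  # dry run: only shows what would be executed
```

Stop the client before trying this for real, and keep dry_run=True until the printed commands look right for your install.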

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.

Post by Zagen30 »

Maybe someone should update the FAQ on how to use qfix, because it specifically puts the -delete XX step after running qfix and sending the results.

Post by toTOW »

In fact, the issue you're seeing is quite new... I can't remember whether we had already seen it with v5.9x clients... :(

Post by Zagen30 »

Well, you've seen it at least once: after having trouble with 6.22 MPICH, I rolled back to 5.91.