
Early unit ends on 4114 units

Posted: Tue Dec 04, 2007 4:31 pm
by Pette Broad
I only started getting these yesterday; in fact, I'm now only getting sent 41xx units regardless of the flags/settings on my Windows system. So far, 2 have already failed with LINCS errors. They're the first EUEs I've had for a while and the first that either of the 2 machines has had.

[08:32:51] Project: 4114 (Run 6, Clone 12, Gen 2)
[08:32:51]
[08:32:51] Assembly optimizations on if available.
[08:32:51] Entering M.D.
[08:33:12] (Starting from checkpoint)
[08:33:12] Protein: p4114_villinwt_folded_Amber03_Native
[08:33:12]
[08:33:12] Writing local files
[08:33:12] Completed 295729 out of 1500000 steps (20)
[08:33:12] Extra SSE boost OK.
[08:38:50] Writing local files
--------------------------------------------------------------------
[11:36:27] Writing local files
[11:36:27] Completed 435000 out of 1500000 steps (29)
[11:56:12] Writing local files
[11:56:12] Completed 450000 out of 1500000 steps (30)
[12:12:57] Quit 101 - Fatal error:
[12:12:57] Step 462742, time 925.484 (ps) LINCS WARNING
[12:12:57] relative constraint deviation after LINCS:
[12:12:57] max 0.001563 (between atoms 535 and 536) rms 0.000125

[11:42:20] Project: 4114 (Run 55, Clone 16, Gen 0)
[11:42:20]
[11:42:20] Assembly optimizations on if available.
[11:42:20] Entering M.D.
[11:42:26] Protein: p4114_villinwt_folded_Amber03_Native
[11:42:26]
[11:42:26] Writing local files
[11:42:26] Extra SSE boost OK.
[11:42:26] Writing local files
[11:42:26] Completed 0 out of 1500000 steps (0)
[11:51:18] Writing local files
[11:51:18] Completed 15000 out of 1500000 steps (1)
[12:00:20] Writing local files
[12:00:20] Completed 30000 out of 1500000 steps (2)
[12:09:24] Writing local file
---------------------------------------------------------------------
[13:40:08] Completed 195000 out of 1500000 steps (13)
[13:49:13] Writing local files
[13:49:13] Completed 210000 out of 1500000 steps (14)
[13:53:33] Quit 101 - Fatal error:
[13:53:33] Step 217186, time 434.372 (ps) LINCS WARNING
[13:53:33] relative constraint deviation after LINCS:
[13:53:33] max 0.001475 (between atoms 535 and 536) rms 0.000113

Pete
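Both failure messages above follow the same pattern: the step at which the run stopped, the simulated time in picoseconds, and the largest relative constraint deviation together with the atom pair involved. Here is a minimal Python sketch that pulls those fields out. It assumes the exact wording shown in these logs, and the function name `parse_lincs_failure` is hypothetical, not part of any Folding@home tool. Dividing the simulated time by the step count gives the integration timestep, which works out to 2 fs for both of these units.

```python
import re

# Patterns for the two log lines that report a LINCS failure, assuming the
# exact wording seen in the logs in this thread (hypothetical helper).
STEP_RE = re.compile(r"Step (\d+), time ([\d.]+) \(ps\) LINCS WARNING")
DEV_RE = re.compile(r"max ([\d.]+) \(between atoms (\d+) and (\d+)\) rms ([\d.]+)")


def parse_lincs_failure(log_text):
    """Return step, simulated time and worst constraint deviation, or None."""
    step_match = STEP_RE.search(log_text)
    dev_match = DEV_RE.search(log_text)
    if step_match is None or dev_match is None:
        return None
    step = int(step_match.group(1))
    time_ps = float(step_match.group(2))
    return {
        "step": step,
        "time_ps": time_ps,
        # Simulated time divided by step count gives the timestep (ps -> fs).
        "timestep_fs": time_ps / step * 1000.0,
        "max_dev": float(dev_match.group(1)),
        "atoms": (int(dev_match.group(2)), int(dev_match.group(3))),
        "rms_dev": float(dev_match.group(4)),
    }


log = """[12:12:57] Step 462742, time 925.484 (ps) LINCS WARNING
[12:12:57] relative constraint deviation after LINCS:
[12:12:57] max 0.001563 (between atoms 535 and 536) rms 0.000125"""

info = parse_lincs_failure(log)
print(info["step"], info["atoms"], round(info["timestep_fs"], 3))
# prints: 462742 (535, 536) 2.0
```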

Posted: Wed Dec 05, 2007 1:36 pm
by John_Weatherman
Hi!
A LINCS WARNING is nothing to worry about. It just means the WU has reached a point where it cannot continue, so it sends back the results.

Posted: Wed Dec 05, 2007 8:49 pm
by PeterA
Interesting. As soon as I read this thread, I got a 4115 WU. Wish me luck. :D

Posted: Wed Dec 05, 2007 11:03 pm
by Pette Broad
No other problems to report; I have completed a few 4114s and about 30 41xx units without any further EUEs.

Pete

Posted: Wed Dec 05, 2007 11:30 pm
by sortofageek
Thus far I have received two 4114s. One completed successfully and the other was an EUE like yours. Don't worry about EUEs if they are occasional.

[19:35:34] Working on Unit 04 [December 3 19:35:34]
[19:35:34] + Working ...
[19:35:34] - Calling 'FahCore_81.exe -dir work/ -suffix 04 -checkpoint 15 -forceasm -verbose -lifeline 2560 -version 600'


[19:35:34]
[19:35:34] *------------------------------*
[19:35:34] Folding@Home Gromacs Simulated Tempering Core
[19:35:34] Version 1.10 (Oct 4, 2007)
[19:35:34]
[19:35:34] Preparing to commence simulation
[19:35:34] - Assembly optimizations manually forced on.
[19:35:34] - Not checking prior termination.
[19:35:34] - Expanded 469347 -> 2292782 (decompressed 488.5 percent)
[19:35:35] - Starting from initial work packet
[19:35:35]
[19:35:35] Project: 4114 (Run 15, Clone 0, Gen 1)
[19:35:35]
[19:35:35] Assembly optimizations on if available.
[19:35:35] Entering M.D.
[19:35:41] Protein: p4114_villinwt_folded_Amber03_Native
[19:35:41]
[19:35:41] Writing local files
[19:35:41] Extra SSE boost OK.
[19:35:41] Writing local files
[19:35:41] Completed 0 out of 1500000 steps (0)
[19:44:07] Writing local files
[19:44:07] Completed 15000 out of 1500000 steps (1)
[19:52:36] Writing local files
[19:52:36] Completed 30000 out of 1500000 steps (2)

::SNIP::


[23:37:17] - Autosending finished units...
[23:37:17] Trying to send all finished work units
[23:37:17] + No unsent completed units remaining.
[23:37:17] - Autosend completed
[23:40:51] Writing local files
[23:40:51] Completed 435000 out of 1500000 steps (29)
[23:42:26] Quit 101 - Fatal error:
[23:42:26] Step 437900, time 875.8 (ps) LINCS WARNING
[23:42:26] relative constraint deviation after LINCS:
[23:42:26] max 0.002359 (between atoms 112 and 115) rms 0.000149
[23:42:26]
[23:42:26] Simulation instability has been encountered. The run has entered a
[23:42:26] state from which no further progress can be made.
[23:42:26] This may be the correct result of the simulation, however if you
[23:42:26] often see other project units terminating early like this
[23:42:27] too, you may wish to check the stability of your computer (issues
[23:42:27] such as high temperature, overclocking, etc.).
[23:42:27] Going to send back what have done.
[23:42:27] logfile size: 18195
[23:42:27] - Writing 18875 bytes of core data to disk...
[23:42:27] Done: 18363 -> 3432 (compressed to 18.6 percent)
[23:42:27] ... Done.
[23:42:27]
[23:42:27] Folding@home Core Shutdown: EARLY_UNIT_END
[23:42:29] CoreStatus = 72 (114)
[23:42:29] Sending work to server
[23:42:29] - Read packet limit of 540015616... Set to 524286976.


[23:42:29] + Attempting to send results
[23:42:29] - Reading file work/wuresults_04.dat from core
[23:42:29] (Read 3944 bytes from disk)
[23:42:29] Connecting to http://171.64.65.111:8080/
[23:42:29] Posted data.
[23:42:29] Initial: 0000; Conversation time very short, giving reduced weight in bandwidth avg
[23:42:29] - Uploaded at ~9 kB/s
[23:42:29] - Averaged speed for that direction ~151 kB/s
[23:42:29] + Results successfully sent

Posted: Wed Dec 05, 2007 11:32 pm
by sortofageek
Pette Broad wrote: No other problems to report; I have completed a few 4114s and about 30 41xx units without any further EUEs.

Pete

You also got partial credit for the partial one you did, and although several people have tried to finish it, nobody has been successful yet. :)

Hi Pette_Broad (team 33258),
Your WU (P4114 R55 C16 G0) was added to the stats database on 2007-12-04 06:30:11 for 15.49 points of credit.

Posted: Thu Dec 06, 2007 5:41 pm
by James121
Hmmmmmm
I have a p1429_POLYQBETA-STI GROMACS core running
(actually running under JaneH ID team 48053)
It has been running for weeks and is now at 374/500, with a prediction of 10 more days to finish (on a 1.7 GHz machine).
Will this job ever finish? Is this a realistic length of time for it to take?

Posted: Thu Dec 06, 2007 6:02 pm
by 7im
fahinfo.org is a good site for questions like that. Users submit their performance numbers, and then you can search the averages by things like project number, processor type, etc.

Posted: Fri Dec 07, 2007 1:29 am
by Pette Broad
Yeah, they take like forever; luckily there aren't too many of them about :(. Just over 200 hours on an AMD 64 3700+, if memory serves me correctly.


Pete

Posted: Fri Dec 07, 2007 1:34 am
by James121
Pette Broad wrote: Yeah, they take like forever; luckily there aren't too many of them about :(. Just over 200 hours on an AMD 64 3700+, if memory serves me correctly.

Pete

Thank you very much, Pete.

Posted: Fri Dec 07, 2007 3:15 pm
by efishy
Noticed that I have folded a bunch of 41XX WUs. I also notice that the points are below par (an average of 60-85 PPD, with the highest on a 3.2 GHz Intel Pentium 4 with 2 GB of memory).

I sure miss the old forum, where I could bring up issues easily. :)

Posted: Sat Dec 08, 2007 12:23 am
by Pette Broad
efishy wrote: Noticed that I have folded a bunch of 41XX WUs. I also notice that the points are below par (an average of 60-85 PPD, with the highest on a 3.2 GHz Intel Pentium 4 with 2 GB of memory).

I sure miss the old forum, where I could bring up issues easily. :)

I've also had a load of these. I've been too busy to check the PPD, but at first glance it seems to be on the low side.

Yes, 110 PPD on a 4103, and that's on a 4400+ with a mere 512 MB of RAM. Acceptable, but that machine gets 120-125 PPD on 304x units.

Pete

Posted: Thu Dec 13, 2007 1:33 am
by Pette Broad
Very high EUE rate on 4113: 70% of the ones I've had so far have failed, all with LINCS errors. Apart from 4114, which has a 50% failure rate, the other WUs in the 41xx series look to be completing O.K.

Pete
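To track per-project EUE rates like these from a client log rather than by hand, something like the following hypothetical Python helper could work. It assumes that each unit is announced by a `Project: NNNN (Run R, Clone C, Gen G)` line and that an early end is logged as `Folding@home Core Shutdown: EARLY_UNIT_END`, as in the logs quoted in this thread. Units are keyed on project, run, clone and gen, because a unit resumed from a checkpoint prints its Project line again. The sample log is made up for illustration only.

```python
import re
from collections import defaultdict

# Hypothetical helper (not part of the Folding@home client). Assumes the log
# wording seen in this thread:
#   "Project: 4114 (Run 15, Clone 0, Gen 1)"       starts or resumes a unit
#   "Folding@home Core Shutdown: EARLY_UNIT_END"   marks an EUE
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
EUE_MARKER = "Core Shutdown: EARLY_UNIT_END"


def eue_rates(log_text):
    """Return {project: fraction of its units that ended early}."""
    units = defaultdict(set)  # project -> (run, clone, gen) tuples seen
    eues = defaultdict(set)   # project -> units that ended early
    current = None            # (project, unit) most recently announced
    for line in log_text.splitlines():
        match = PROJECT_RE.search(line)
        if match:
            project = int(match.group(1))
            unit = tuple(int(g) for g in match.group(2, 3, 4))
            # A resumed unit prints its Project line again; storing units
            # in a set keeps it from being counted twice.
            units[project].add(unit)
            current = (project, unit)
        elif EUE_MARKER in line and current is not None:
            eues[current[0]].add(current[1])
    return {p: len(eues[p]) / len(units[p]) for p in units}


# Made-up sample log, for illustration only.
sample = "\n".join([
    "Project: 4114 (Run 1, Clone 0, Gen 0)",
    "Folding@home Core Shutdown: EARLY_UNIT_END",
    "Project: 4114 (Run 2, Clone 0, Gen 0)",
    "Project: 4113 (Run 3, Clone 0, Gen 0)",
    "Project: 4113 (Run 3, Clone 0, Gen 0)",  # same unit, resumed
    "Folding@home Core Shutdown: EARLY_UNIT_END",
])
print(eue_rates(sample))  # {4114: 0.5, 4113: 1.0}
```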

Posted: Sun Dec 16, 2007 4:10 am
by Biskquik
Was getting a little nervous because I haven't had an EUE in a while.

Dell Inspiron E1505
Intel Core 2 Duo T5500
2GB DDR2 667 (corsair)

Dunno if this'll help but here's the 4113 log:


[16:40:04] Working on Unit 01 [December 14 16:40:04]
[16:40:04] + Working ...
[16:40:04] - Calling 'FahCore_81.exe -dir work/ -suffix 01 -checkpoint 15 -forceasm -verbose -lifeline 2084 -version 600'

[16:40:05] *------------------------------*
[16:40:05] Folding@Home Gromacs Simulated Tempering Core
[16:40:05] Version 1.10 (Oct 4, 2007)
[16:40:05]
[16:40:05] Preparing to commence simulation
[16:40:05] - Assembly optimizations manually forced on.
[16:40:05] - Not checking prior termination.
[16:40:05] - Expanded 740593 -> 3703559 (decompressed 500.0 percent)
[16:40:05] - Starting from initial work packet
[16:40:05]
[16:40:05] Project: 4113 (Run 72, Clone 8, Gen 1)
[16:40:05]
[16:40:05] Assembly optimizations on if available.
[16:40:05] Entering M.D.
[16:40:11] Protein: p4113_villinwt_big_folded_Amber03_Native
[16:40:11]
[16:40:11] Writing local files
[16:40:11] Extra SSE boost OK.
[16:40:11] Writing local files
[16:40:12] Completed 0 out of 1500000 steps (0)
[16:56:05] Timered checkpoint triggered.
[17:00:42] Writing local files
[17:00:42] Completed 15000 out of 1500000 steps (1)
[17:16:42] Timered checkpoint triggered.
[17:21:11] Writing local files

..............................................................................

[13:42:21] (Starting from checkpoint)
[13:42:21] Protein: p4113_villinwt_big_folded_Amber03_Native
[13:42:21]
[13:42:21] Writing local files
[13:42:21] Completed 386128 out of 1500000 steps (26)
[13:42:21] Extra SSE boost OK.
[13:48:47] Writing local files
[13:48:47] Completed 390000 out of 1500000 steps (26)
[14:04:48] Timered checkpoint triggered.
[14:09:50] Writing local files
[14:09:50] Completed 405000 out of 1500000 steps (27)

..............................................................................

[16:16:59] Completed 480000 out of 1500000 steps (32)
[16:28:01] Quit 101 - Fatal error:
[16:28:01] Step 485551, time 971.102 (ps) LINCS WARNING
[16:28:01] relative constraint deviation after LINCS:
[16:28:01] max 0.001745 (between atoms 112 and 113) rms 0.000136
[16:28:01]
[16:28:01] Simulation instability has been encountered. The run has entered a
[16:28:01] state from which no further progress can be made.
[16:28:01] This may be the correct result of the simulation, however if you
[16:28:01] often see other project units terminating early like this
[16:28:01] too, you may wish to check the stability of your computer (issues
[16:28:01] such as high temperature, overclocking, etc.).
[16:28:01] Going to send back what have done.
[16:28:01] logfile size: 24762
[16:28:01] - Writing 25444 bytes of core data to disk...
[16:28:01] Done: 24932 -> 4065 (compressed to 16.3 percent)
[16:28:01] ... Done.
[16:28:01]
[16:28:01] Folding@home Core Shutdown: EARLY_UNIT_END

Thanks.

Posted: Mon Dec 17, 2007 1:36 am
by Pette Broad
Yeah, the failures I've had have all been around the 30% mark.

Pete
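The number in brackets on the client's "Completed ... out of 1500000 steps (NN)" lines is the percentage done, so the step reported in each LINCS WARNING can be put on the same scale. Here is a quick Python check using only the four failure steps whose logs were posted in this thread. It does not cover any failures that weren't posted.

```python
# Failure steps taken from the four "LINCS WARNING" lines posted in this thread.
# All four units were 1,500,000 steps long.
TOTAL_STEPS = 1_500_000
failures = {
    "4114 (Run 6, Clone 12, Gen 2)": 462742,
    "4114 (Run 55, Clone 16, Gen 0)": 217186,
    "4114 (Run 15, Clone 0, Gen 1)": 437900,
    "4113 (Run 72, Clone 8, Gen 1)": 485551,
}

# Same scale as the "(NN)" percentage in the client's progress lines.
percents = {unit: round(100.0 * step / TOTAL_STEPS, 1) for unit, step in failures.items()}

for unit, pct in percents.items():
    print(f"{unit}: failed at {pct}% of the unit")
```

Of these four, three stopped between roughly 29% and 32%, while 4114 (Run 55, Clone 16, Gen 0) stopped at about 14.5%.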