Early unit ends on 4114 units

Moderators: Site Moderators, FAHC Science Team


Postby Pette Broad » Tue Dec 04, 2007 5:31 pm

I only started getting these yesterday; in fact, I'm now being sent nothing but 41xx units, regardless of the flags/settings on my Windows system. So far two have already failed with LINCS errors. They are the first EUEs I've had for a while, and the first that either of my two machines has had.

[08:32:51] Project: 4114 (Run 6, Clone 12, Gen 2)
[08:32:51]
[08:32:51] Assembly optimizations on if available.
[08:32:51] Entering M.D.
[08:33:12] (Starting from checkpoint)
[08:33:12] Protein: p4114_villinwt_folded_Amber03_Native
[08:33:12]
[08:33:12] Writing local files
[08:33:12] Completed 295729 out of 1500000 steps (20)
[08:33:12] Extra SSE boost OK.
[08:38:50] Writing local files
--------------------------------------------------------------------
[11:36:27] Writing local files
[11:36:27] Completed 435000 out of 1500000 steps (29)
[11:56:12] Writing local files
[11:56:12] Completed 450000 out of 1500000 steps (30)
[12:12:57] Quit 101 - Fatal error:
[12:12:57] Step 462742, time 925.484 (ps) LINCS WARNING
[12:12:57] relative constraint deviation after LINCS:
[12:12:57] max 0.001563 (between atoms 535 and 536) rms 0.000125

[11:42:20] Project: 4114 (Run 55, Clone 16, Gen 0)
[11:42:20]
[11:42:20] Assembly optimizations on if available.
[11:42:20] Entering M.D.
[11:42:26] Protein: p4114_villinwt_folded_Amber03_Native
[11:42:26]
[11:42:26] Writing local files
[11:42:26] Extra SSE boost OK.
[11:42:26] Writing local files
[11:42:26] Completed 0 out of 1500000 steps (0)
[11:51:18] Writing local files
[11:51:18] Completed 15000 out of 1500000 steps (1)
[12:00:20] Writing local files
[12:00:20] Completed 30000 out of 1500000 steps (2)
[12:09:24] Writing local file
---------------------------------------------------------------------
[13:40:08] Completed 195000 out of 1500000 steps (13)
[13:49:13] Writing local files
[13:49:13] Completed 210000 out of 1500000 steps (14)
[13:53:33] Quit 101 - Fatal error:
[13:53:33] Step 217186, time 434.372 (ps) LINCS WARNING
[13:53:33] relative constraint deviation after LINCS:
[13:53:33] max 0.001475 (between atoms 535 and 536) rms 0.000113

Pete
Pette Broad
 
Posts: 128
Joined: Mon Dec 03, 2007 10:38 pm
Location: Chester U.K
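Both failures above name the same atom pair (535 and 536), so it can help to pull the project identifiers and LINCS details out of each EUE log and compare them side by side. A minimal Python sketch, assuming the log lines look exactly like the excerpts in this post (the regular expressions and the `summarize_failure` helper are hypothetical, not part of the FAH client):

```python
import re

# Hypothetical patterns matching the FahCore log lines quoted in this thread.
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
STEP_RE = re.compile(r"Step (\d+), time ([\d.]+) \(ps\) LINCS WARNING")
MAX_RE = re.compile(r"max ([\d.]+) \(between atoms (\d+) and (\d+)\) rms ([\d.]+)")


def summarize_failure(log_text):
    """Return a dict describing the first LINCS failure in a log, or None."""
    project = PROJECT_RE.search(log_text)
    step = STEP_RE.search(log_text)
    dev = MAX_RE.search(log_text)
    if not (project and step and dev):
        return None
    return {
        "prcg": tuple(int(g) for g in project.groups()),  # (project, run, clone, gen)
        "step": int(step.group(1)),
        "time_ps": float(step.group(2)),
        "atoms": (int(dev.group(2)), int(dev.group(3))),
        "max_dev": float(dev.group(1)),
        "rms_dev": float(dev.group(4)),
    }


# Excerpt of the second log above.
log = """[11:42:20] Project: 4114 (Run 55, Clone 16, Gen 0)
[13:53:33] Quit 101 - Fatal error:
[13:53:33] Step 217186, time 434.372 (ps) LINCS WARNING
[13:53:33] relative constraint deviation after LINCS:
[13:53:33] max 0.001475 (between atoms 535 and 536) rms 0.000113"""

print(summarize_failure(log))
```

Running the same function over the first log gives the same atom pair, which is the kind of pattern worth mentioning when reporting a problem WU.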

Postby John_Weatherman » Wed Dec 05, 2007 2:36 pm

Hi!
A LINCS WARNING is nothing to worry about. It just means the WU has reached a point where it cannot continue, so it sends back the results it has so far.
John_Weatherman
Posts: 289
Joined: Sun Dec 02, 2007 5:31 am
Location: Carrizo Plain National Monument, California
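For context, the numbers in a LINCS warning describe how far constrained bond lengths have drifted from their reference lengths. A minimal sketch, assuming each constraint's relative deviation is |d - d0| / d0 and that the warning reports the largest one (with its atom pair) plus the RMS over all constraints; the coordinates and reference lengths below are made up for illustration, and whether the FAH core uses exactly this formula is an assumption:

```python
import math


def constraint_deviations(positions, constraints):
    """positions: list of (x, y, z); constraints: list of (i, j, d0).

    Returns (max_dev, (i, j), rms_dev), where each deviation is |d - d0| / d0.
    """
    devs = []
    for i, j, d0 in constraints:
        d = math.dist(positions[i], positions[j])  # current bond length
        devs.append((abs(d - d0) / d0, (i, j)))
    max_dev, pair = max(devs)  # largest relative deviation and its atom pair
    rms = math.sqrt(sum(dev * dev for dev, _ in devs) / len(devs))
    return max_dev, pair, rms


# Hypothetical three-atom example: lengths in nm, reference length 0.1 nm.
positions = [(0.0, 0.0, 0.0), (0.1001, 0.0, 0.0), (0.1001, 0.1, 0.0)]
constraints = [(0, 1, 0.1), (1, 2, 0.1)]
print(constraint_deviations(positions, constraints))
```

Here the first bond is stretched by 0.1% (a relative deviation of about 0.001, the same order as the maxima in the logs in this thread) while the second is exact.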

Postby PeterA » Wed Dec 05, 2007 9:49 pm

Interesting. As soon as I read this thread, I get a 4115 WU. Wish me luck. :D
PeterA
Posts: 59
Joined: Mon Dec 03, 2007 3:28 am
Location: Blaine, MN

Postby Pette Broad » Thu Dec 06, 2007 12:03 am

No other problems to report; I've completed a few 4114s and about 30 41xx units without any further EUEs.

Pete

Postby sortofageek » Thu Dec 06, 2007 12:30 am

Thus far I have received two 4114s. One completed successfully and the other was an EUE like yours. Don't worry about EUEs if they are occasional.

[19:35:34] Working on Unit 04 [December 3 19:35:34]
[19:35:34] + Working ...
[19:35:34] - Calling 'FahCore_81.exe -dir work/ -suffix 04 -checkpoint 15 -forceasm -verbose -lifeline 2560 -version 600'


[19:35:34]
[19:35:34] *------------------------------*
[19:35:34] Folding@Home Gromacs Simulated Tempering Core
[19:35:34] Version 1.10 (Oct 4, 2007)
[19:35:34]
[19:35:34] Preparing to commence simulation
[19:35:34] - Assembly optimizations manually forced on.
[19:35:34] - Not checking prior termination.
[19:35:34] - Expanded 469347 -> 2292782 (decompressed 488.5 percent)
[19:35:35] - Starting from initial work packet
[19:35:35]
[19:35:35] Project: 4114 (Run 15, Clone 0, Gen 1)
[19:35:35]
[19:35:35] Assembly optimizations on if available.
[19:35:35] Entering M.D.
[19:35:41] Protein: p4114_villinwt_folded_Amber03_Native
[19:35:41]
[19:35:41] Writing local files
[19:35:41] Extra SSE boost OK.
[19:35:41] Writing local files
[19:35:41] Completed 0 out of 1500000 steps (0)
[19:44:07] Writing local files
[19:44:07] Completed 15000 out of 1500000 steps (1)
[19:52:36] Writing local files
[19:52:36] Completed 30000 out of 1500000 steps (2)

::SNIP::


[23:37:17] - Autosending finished units...
[23:37:17] Trying to send all finished work units
[23:37:17] + No unsent completed units remaining.
[23:37:17] - Autosend completed
[23:40:51] Writing local files
[23:40:51] Completed 435000 out of 1500000 steps (29)
[23:42:26] Quit 101 - Fatal error:
[23:42:26] Step 437900, time 875.8 (ps) LINCS WARNING
[23:42:26] relative constraint deviation after LINCS:
[23:42:26] max 0.002359 (between atoms 112 and 115) rms 0.000149
[23:42:26]
[23:42:26] Simulation instability has been encountered. The run has entered a
[23:42:26] state from which no further progress can be made.
[23:42:26] This may be the correct result of the simulation, however if you
[23:42:26] often see other project units terminating early like this
[23:42:27] too, you may wish to check the stability of your computer (issues
[23:42:27] such as high temperature, overclocking, etc.).
[23:42:27] Going to send back what have done.
[23:42:27] logfile size: 18195
[23:42:27] - Writing 18875 bytes of core data to disk...
[23:42:27] Done: 18363 -> 3432 (compressed to 18.6 percent)
[23:42:27] ... Done.
[23:42:27]
[23:42:27] Folding@home Core Shutdown: EARLY_UNIT_END
[23:42:29] CoreStatus = 72 (114)
[23:42:29] Sending work to server
[23:42:29] - Read packet limit of 540015616... Set to 524286976.


[23:42:29] + Attempting to send results
[23:42:29] - Reading file work/wuresults_04.dat from core
[23:42:29] (Read 3944 bytes from disk)
[23:42:29] Connecting to http://171.64.65.111:8080/
[23:42:29] Posted data.
[23:42:29] Initial: 0000; Conversation time very short, giving reduced weight in bandwidth avg
[23:42:29] - Uploaded at ~9 kB/s
[23:42:29] - Averaged speed for that direction ~151 kB/s
[23:42:29] + Results successfully sent
sortofageek
Site Admin
Posts: 3111
Joined: Fri Nov 30, 2007 9:06 pm
Location: Team Helix
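Two lines near the end of that log carry the exit information in different forms: the shutdown reason, and `CoreStatus = 72 (114)`, where 72 appears to be the hexadecimal form of the decimal 114 (0x72 = 114). A small sketch that extracts both, assuming those exact line formats (the `core_exit` helper is hypothetical):

```python
import re

SHUTDOWN_RE = re.compile(r"Core Shutdown: (\w+)")
STATUS_RE = re.compile(r"CoreStatus = ([0-9A-Fa-f]+) \((\d+)\)")


def core_exit(log_text):
    """Return (shutdown_reason, status_code) from a client log, or None if absent."""
    reason = SHUTDOWN_RE.search(log_text)
    status = STATUS_RE.search(log_text)
    if not (reason and status):
        return None
    hex_code = int(status.group(1), 16)
    dec_code = int(status.group(2))
    # If the first number really is the hex form of the second, they must agree.
    if hex_code != dec_code:
        raise ValueError("hex and decimal status codes disagree")
    return reason.group(1), dec_code


# Excerpt of the log above.
log = """[23:42:27] Folding@home Core Shutdown: EARLY_UNIT_END
[23:42:29] CoreStatus = 72 (114)"""
print(core_exit(log))
```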

Postby sortofageek » Thu Dec 06, 2007 12:32 am

Pette Broad wrote: No other problems to report, have completed a few 4114's and about 30 41xx's without any further EUE's.

Pete


You also got partial credit for the unit you partially completed, and although several others have tried to finish it, nobody has succeeded yet. :)

Hi Pette_Broad (team 33258),
Your WU (P4114 R55 C16 G0) was added to the stats database on 2007-12-04 06:30:11 for 15.49 points of credit.

Postby James121 » Thu Dec 06, 2007 6:41 pm

Hmmmmmm
I have a p1429_POLYQBETA-STI GROMACS job running
(actually running under the JaneH ID, team 48053).
It has been running for weeks and is now at 374/500,
with a prediction of 10 more days to finish (on a 1.7 GHz machine).
Will this job ever finish? Is this a realistic length of time for it to take?
James121
Top Cats (Team #48053)
http://janesflowers.topcities.com/topcats/
James121
 
Posts: 4
Joined: Sun Dec 02, 2007 7:30 pm
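A remaining-time figure like this is essentially a linear extrapolation from the progress so far. A rough sketch, assuming steady progress and reading 374/500 as completed units of progress out of the total (the helper names are hypothetical); inverting the estimate shows how long the job must already have been running for "10 more days" to come out:

```python
def remaining_days(done, total, elapsed_days):
    """Linear estimate: assume future progress takes as long as past progress on average."""
    if done <= 0:
        raise ValueError("need some progress to extrapolate from")
    return elapsed_days * (total - done) / done


def implied_elapsed_days(done, total, remaining):
    """Invert the estimate: time already spent if `remaining` days are predicted."""
    return remaining * done / (total - done)


elapsed = implied_elapsed_days(374, 500, 10)
print(round(elapsed, 1))  # 29.7
print(round(remaining_days(374, 500, elapsed), 1))  # 10.0
```

Roughly 30 days so far is consistent with "running for weeks", so the 10-day prediction looks plausible if the machine keeps folding at the same rate.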

Postby 7im » Thu Dec 06, 2007 7:02 pm

fahinfo.org is a good site for questions like that. Users submit their performance numbers, and you can then search the averages by various criteria such as project number, processor type, etc.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
7im
Posts: 10189
Joined: Thu Nov 29, 2007 5:30 pm
Location: Arizona
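The kind of lookup described above amounts to grouping user-submitted timings by fields such as project and processor, then averaging each group. A minimal sketch; the record fields and numbers are invented for illustration and do not reflect fahinfo.org's actual data or interface:

```python
from collections import defaultdict
from statistics import mean


def average_by(records, *keys):
    """Group records by the given fields and average their 'hours' value."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[k] for k in keys)].append(rec["hours"])
    return {group: mean(hours) for group, hours in groups.items()}


# Hypothetical submissions; field names and values are made up.
records = [
    {"project": 1429, "cpu": "CPU A", "hours": 100.0},
    {"project": 1429, "cpu": "CPU A", "hours": 110.0},
    {"project": 1429, "cpu": "CPU B", "hours": 150.0},
]
print(average_by(records, "project", "cpu"))
```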

Postby Pette Broad » Fri Dec 07, 2007 2:29 am

Yeah, they take forever; luckily there aren't too many of them about :(. Just over 200 hours on an AMD 64 3700+, if memory serves me correctly.


Pete

Postby James121 » Fri Dec 07, 2007 2:34 am

Pette Broad wrote: Yeah, they take like forever, luckily not too many of them about :(. Just over 200 hours on an AMD 64 3700+ if memory serves me correctly.


Pete


Thank you very much Pete

Postby efishy » Fri Dec 07, 2007 4:15 pm

I've noticed that I have folded a bunch of 41XX WUs. I also notice that the points are below par (an average of 60-85 PPD, the highest on a 3.2 GHz Intel Pentium 4 with 2 GB of memory).

I sure miss the old forum, where I could bring up issues easily. :)
efishy
 
Posts: 13
Joined: Fri Dec 07, 2007 8:47 am

Postby Pette Broad » Sat Dec 08, 2007 1:23 am

efishy wrote: Noticed that I have folded bunch of 41XX WUs. I do also notice that the points are below par (average of 60-85 PPD -- The highest on 3.2GHz Intel Pentium 4 with 2GB memory).

I sure miss the old forum where I can bring up issue easily. :)


I've also had a load of these. I've been too busy to check out the PPD but at first glance it seems to be on the low side.

Yes: 110 PPD on a 4103, and that's on a 4400+ with a mere 512 MB of RAM. Acceptable, but that machine gets 120-125 PPD on 304x units.

Pete
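PPD is simply a unit's credit scaled to a 24-hour day, which makes figures like the ones above easy to compare. A minimal sketch; the 100-point, 20-hour unit is hypothetical, while the 110 versus 120-125 PPD comparison uses Pete's numbers:

```python
def points_per_day(points_per_unit, hours_per_unit):
    """Scale one work unit's credit to a 24-hour day."""
    return points_per_unit * 24.0 / hours_per_unit


def shortfall(ppd, reference_ppd):
    """Fraction by which ppd falls below a reference PPD."""
    return (reference_ppd - ppd) / reference_ppd


# Hypothetical unit: 100 points earned in 20 hours of folding.
print(points_per_day(100, 20))  # 120.0
# 110 PPD on a 4103 versus 120-125 PPD on 304x units on the same machine.
print(round(shortfall(110, 120), 3), round(shortfall(110, 125), 3))  # 0.083 0.12
```

So on that machine the 4103 pays roughly 8-12% less per day than the 304x units.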

Postby Pette Broad » Thu Dec 13, 2007 2:33 am

Very high EUE rate on 4113: 70% of the ones I've had so far have failed, all with LINCS errors. Apart from 4114, which has a 50% failure rate, the other WUs in the 41xx series look to be completing OK.

Pete
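Keeping a per-project tally of outcomes is an easy way to back up failure rates like these when reporting a WU problem. A sketch with made-up counts chosen only to reproduce the quoted percentages (the post gives the rates, not the underlying numbers); the `eue_rates` helper is hypothetical:

```python
from collections import Counter


def eue_rates(outcomes):
    """outcomes: iterable of (project, ended_early) pairs -> {project: EUE fraction}."""
    totals, early = Counter(), Counter()
    for project, ended_early in outcomes:
        totals[project] += 1
        if ended_early:
            early[project] += 1
    return {p: early[p] / totals[p] for p in totals}


# Hypothetical history: 10 units of 4113 (7 EUEs) and 4 units of 4114 (2 EUEs).
outcomes = [(4113, i < 7) for i in range(10)] + [(4114, i < 2) for i in range(4)]
print(eue_rates(outcomes))  # {4113: 0.7, 4114: 0.5}
```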

Postby Biskquik » Sun Dec 16, 2007 5:10 am

Was getting a little nervous because I haven't had an EUE in a while.

Dell Inspiron E1505
Intel Core 2 Duo T5500
2GB DDR2 667 (corsair)

Dunno if this'll help but here's the 4113 log:


[16:40:04] Working on Unit 01 [December 14 16:40:04]
[16:40:04] + Working ...
[16:40:04] - Calling 'FahCore_81.exe -dir work/ -suffix 01 -checkpoint 15 -forceasm -verbose -lifeline 2084 -version 600'

[16:40:05] *------------------------------*
[16:40:05] Folding@Home Gromacs Simulated Tempering Core
[16:40:05] Version 1.10 (Oct 4, 2007)
[16:40:05]
[16:40:05] Preparing to commence simulation
[16:40:05] - Assembly optimizations manually forced on.
[16:40:05] - Not checking prior termination.
[16:40:05] - Expanded 740593 -> 3703559 (decompressed 500.0 percent)
[16:40:05] - Starting from initial work packet
[16:40:05]
[16:40:05] Project: 4113 (Run 72, Clone 8, Gen 1)
[16:40:05]
[16:40:05] Assembly optimizations on if available.
[16:40:05] Entering M.D.
[16:40:11] Protein: p4113_villinwt_big_folded_Amber03_Native
[16:40:11]
[16:40:11] Writing local files
[16:40:11] Extra SSE boost OK.
[16:40:11] Writing local files
[16:40:12] Completed 0 out of 1500000 steps (0)
[16:56:05] Timered checkpoint triggered.
[17:00:42] Writing local files
[17:00:42] Completed 15000 out of 1500000 steps (1)
[17:16:42] Timered checkpoint triggered.
[17:21:11] Writing local files

..............................................................................

[13:42:21] (Starting from checkpoint)
[13:42:21] Protein: p4113_villinwt_big_folded_Amber03_Native
[13:42:21]
[13:42:21] Writing local files
[13:42:21] Completed 386128 out of 1500000 steps (26)
[13:42:21] Extra SSE boost OK.
[13:48:47] Writing local files
[13:48:47] Completed 390000 out of 1500000 steps (26)
[14:04:48] Timered checkpoint triggered.
[14:09:50] Writing local files
[14:09:50] Completed 405000 out of 1500000 steps (27)

..............................................................................

[16:16:59] Completed 480000 out of 1500000 steps (32)
[16:28:01] Quit 101 - Fatal error:
[16:28:01] Step 485551, time 971.102 (ps) LINCS WARNING
[16:28:01] relative constraint deviation after LINCS:
[16:28:01] max 0.001745 (between atoms 112 and 113) rms 0.000136
[16:28:01]
[16:28:01] Simulation instability has been encountered. The run has entered a
[16:28:01] state from which no further progress can be made.
[16:28:01] This may be the correct result of the simulation, however if you
[16:28:01] often see other project units terminating early like this
[16:28:01] too, you may wish to check the stability of your computer (issues
[16:28:01] such as high temperature, overclocking, etc.).
[16:28:01] Going to send back what have done.
[16:28:01] logfile size: 24762
[16:28:01] - Writing 25444 bytes of core data to disk...
[16:28:01] Done: 24932 -> 4065 (compressed to 16.3 percent)
[16:28:01] ... Done.
[16:28:01]
[16:28:01] Folding@home Core Shutdown: EARLY_UNIT_END

Thanks.
Biskquik
 
Posts: 2
Joined: Sun Dec 16, 2007 5:01 am
Location: West Michigan

Postby Pette Broad » Mon Dec 17, 2007 2:36 am

Yeah, the failures I've had have all been around the 30% mark.

Pete
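The number in brackets on each "Completed ... out of 1500000 steps" line is the percentage done. The step/time pairs in these logs also imply a 2 fs time step (for example, step 462742 at 925.484 ps). A sketch converting each failing step quoted in this thread into percent complete and simulated time, assuming that step size and step count hold for these projects:

```python
TOTAL_STEPS = 1_500_000  # from the "out of 1500000 steps" lines
DT_PS = 0.002            # 2 fs per step, inferred from the step/time pairs in these logs


def describe_step(step):
    """Return (percent complete, simulated time in ps) for a given MD step."""
    return 100.0 * step / TOTAL_STEPS, step * DT_PS


# Failing steps quoted in this thread.
for step in (462742, 217186, 437900, 485551):
    percent, time_ps = describe_step(step)
    print(f"step {step}: {percent:.1f}% done, {time_ps:.3f} ps")
```

Three of the four land between about 29% and 32%; the 4114 (Run 55, Clone 16, Gen 0) failure came earlier, at about 14.5%.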

