Assignment server delivers "wrong" unit

Moderators: Site Moderators, FAHC Science Team

anko1
Posts: 438
Joined: Mon Dec 03, 2007 1:31 am
Hardware configuration: Old Faithful CPU: Windows Graphical 5.03; Intel Pentium 4 Processor 540 (3.2GHz) HT; Windows XP
Big Red: Windows SMP Console 6.29; Windows GPU console 6.20r1; Intel Q9450 2.66G; ASUS P5Q 775 P45; [BFG 9800GTX+ old graphics card] NVidia GeForce 8800 GTX [as of 5/9/09]; Windows XP Pro SP3
Lenovo Think Pad: Windows 6.29 w/ SMP; Windows GPU Console 6.20r1 systray; Intel QX9300; NVIDIA Quadro FX-3700M; Windows XP Professional
Location: SF Peninsula

Post by anko1 »

Not sure if this should be here or in the WU forum, but I have a dual core running SMP (this is the Sager laptop) and it got assigned a 3043, which there is no way I can finish within the 1.5-day preferred deadline. I thought someone might want to look into the assignment logic. Here's the log:

Code:

[13:51:24] Sending work to server
[13:51:24] Project: 2653 (Run 36, Clone 113, Gen 114)


[13:51:24] + Attempting to send results [February 23 13:51:24 UTC]
[13:51:24] - Reading file work/wuresults_04.dat from core
[13:51:24]   (Read 5521987 bytes from disk)
[13:51:24] Connecting to http://171.64.65.64:8080/
[13:51:43] Posted data.
[13:51:43] Initial: 0000; - Uploaded at ~283 kB/s
[13:51:43] - Averaged speed for that direction ~315 kB/s
[13:51:43] + Results successfully sent
[13:51:43] Thank you for your contribution to Folding@Home.
[13:51:43] + Number of Units Completed: 43

[13:51:47] - Warning: Could not delete all work unit files (4): Core returned invalid code
[13:51:47] Trying to send all finished work units
[13:51:47] + No unsent completed units remaining.
[13:51:47] - Preparing to get new work unit...
[13:51:47] + Attempting to get work packet
[13:51:47] - Will indicate memory of 2046 MB
[13:51:47] - Connecting to assignment server
[13:51:47] Connecting to http://assign.stanford.edu:8080/
[13:51:48] Posted data.
[13:51:48] Initial: 40AB; - Successful: assigned to (171.64.65.63).
[13:51:48] + News From Folding@Home: Welcome to Folding@Home
[13:51:48] Loaded queue successfully.
[13:51:48] Connecting to http://171.64.65.63:8080/
[13:51:48] Posted data.
[13:51:48] Initial: 0000; - Receiving payload (expected size: 283401)
[13:51:49] - Downloaded at ~276 kB/s
[13:51:49] - Averaged speed for that direction ~718 kB/s
[13:51:49] + Received work.
[13:51:49] Trying to send all finished work units
[13:51:49] + No unsent completed units remaining.
[13:51:49] + Closed connections
[13:51:49] 
[13:51:49] + Processing work unit
[13:51:49] Work type a1 not eligible for variable processors
[13:51:49] Core required: FahCore_a1.exe
[13:51:49] Core found.
[13:51:49] Using generic mpiexec calls
[13:51:49] Working on queue slot 05 [February 23 13:51:49 UTC]
[13:51:49] + Working ...
[13:51:49] - Calling 'mpiexec -np 4 -channel auto -host 127.0.0.1 FahCore_a1.exe -dir work/ -suffix 05 -checkpoint 15 -verbose -lifeline 3864 -version 623'

[13:51:49] 
[13:51:49] *------------------------------*
[13:51:49] Folding@Home Gromacs SMP Core
[13:51:49] Version 1.74 (March 10, 2007)
[13:51:49] 
[13:51:49] Preparing to commence simulation
[13:51:49] - Ensuring status. Please wait- Created dyn
[13:51:49] - Files status OK
[13:51:50] - Expanded 282889 -> 1508541 (decompressed 533.2 percent)
[13:51:50] - Starting from initial work packet
[13:51:50] 
[13:51:50] Project: 3043 (Run 2, Clone 54, Gen 40)
[13:51:50] 
[13:51:50] Assembly optimizations on if available.
[13:51:50] Entering M.D.
[13:52:07] 2 percent)
[13:52:07] - Starting from initial work packet
[13:52:07] 
[13:52:07] Project: 3Entering M.D.
[13:52:07] one 54, Gen 40)
[13:52:07] 
[13:52:07] Entering M.D.
[13:52:13] cal files
[13:52:13] ocal files
[13:52:13] Extra SSE boost OK.
[13:52:13] ocal files
[13:52:13] Extra SSE boost OK.
[14:07:14] int triggered.
[14:22:14] Timered checkpoint triggered.
[14:36:31] Writing local files
[14:36:31] Completed 100000 out of 10000000 steps  (1 percent)
[14:51:31] Timered checkpoint triggered.
[15:06:32] Timered checkpoint triggered.
[15:20:05] Writing local files
[15:20:05] Completed 200000 out of 10000000 steps  (2 percent)

Normally I get 2653s and 2665s, which process fine.
susato
Site Moderator
Posts: 511
Joined: Fri Nov 30, 2007 4:57 am
Location: Team MacOSX

Re: Assignment server delivers "wrong" unit

Post by susato »

You have the right forum because the source of this work unit is a server problem. When the duallie SMP server is down (171.67.65.56) - and only when it's down - dual-core machines are shifted to server 171.64.65.63 which usually provisions quad-core Folding machines. The PG knows that some duallies can't finish these units, but all in all it's better to provide oversize work units temporarily than to have no work available at all.

At the rate I see on your FAHlog.txt (just between frame 1 and frame 2 writes), you have about a 43.57 minute frame time and will finish the unit in 3.025 days. It's now nearly 11 hours later and you should have a better idea if it's going to finish before the final deadline.
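
The estimate above can be reproduced with a few lines of Python. This is only a sketch: the helper name `frame_minutes` is made up for illustration, and it assumes every one of the 100 frames takes as long as the gap between the two "Completed ... steps" lines in the log.

```python
from datetime import datetime, timedelta

def frame_minutes(t1: str, t2: str) -> float:
    """Minutes between two HH:MM:SS log timestamps (allows one midnight rollover)."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(t2, fmt) - datetime.strptime(t1, fmt)
    if delta < timedelta(0):
        delta += timedelta(days=1)
    return delta.total_seconds() / 60

# Timestamps of the frame 1 and frame 2 writes in the log above.
per_frame = frame_minutes("14:36:31", "15:20:05")
# Assumption: 100 frames of 1 percent each, all at the same pace.
total_days = per_frame * 100 / (60 * 24)
print(f"{per_frame:.2f} min/frame, {total_days:.3f} days per unit")
# prints: 43.57 min/frame, 3.025 days per unit
```

A single frame interval is a noisy sample, so the projection is worth repeating once more frames have been written.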

A unit that will surely fail to complete by the *final* deadline may legitimately be discarded. To go this route:
- stop Folding
- delete the work folder
- delete queue.dat
- run the client with the -configonly flag, say "yes" to accepting advanced configuration options, and change the machineID.
- while you're at it, check to see that your client is the most up-to-date, and delete your FahCore_xx.exe files to force download of the latest versions when the cores are next needed.

Then when you start Folding again, Stanford will assign you a different work unit instead of the same one over again.
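
The two file-deletion steps can be sketched in Python as below. The function name `reset_client_state` and the example path are hypothetical; the sketch assumes the client is already stopped and that `work` and `queue.dat` sit directly in the client directory, as in the log above. It defaults to a dry run, and it does not change the machineID, which still has to be done by hand with -configonly.

```python
import shutil
from pathlib import Path

def reset_client_state(client_dir: str, dry_run: bool = True) -> list[Path]:
    """Remove the work folder and queue.dat from a stopped client's directory.

    Returns the paths that were (or, in a dry run, would be) removed.
    """
    root = Path(client_dir)
    targets = [p for p in (root / "work", root / "queue.dat") if p.exists()]
    if not dry_run:
        for p in targets:
            if p.is_dir():
                shutil.rmtree(p)   # the work folder and everything in it
            else:
                p.unlink()         # queue.dat
    return targets
```

Calling `reset_client_state("path/to/client")` only lists what would be deleted; pass `dry_run=False` once the list looks right.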

Best wishes for success in getting a 2653 or 2665-series WU this time!
anko1

Re: Assignment server delivers "wrong" unit

Post by anko1 »

Thanks very much for your input and advice, esp. the "blessing" to dispose of the impossible unit. ;-)
susato

Re: Assignment server delivers "wrong" unit

Post by susato »

So, no chance of completing by the final deadline? That's a shame.

Keep in mind that even if you can't complete a unit by the preferred deadline (1.5 days for this unit) but can complete it by the final deadline (3 days for this unit), it may still be well worth completing because:

- the next person to get it may also fail to complete it by the preferred deadline, meaning your late-completed one would be turned in first and would spawn the next generation
- work units aren't always reassigned immediately upon expiry of the preferred deadline - if they have a backup it may take several more hours, so a WU returned late may arrive before it gets to the head of the reassignment queue.

Keep in mind that both of those conditions are more easily satisfied in the wake of a server problem.
Also, if you have already made a substantial time investment in a WU oversized for your machine, you may get just as good a PPD from completing and returning it as you would by deleting it and picking up a smaller WU.

I've edited "*final* " into the post above to clarify what I intended - really, it's only in truly hopeless cases that I recommended deleting a WU poorly sized for a machine.
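
The rule of thumb in this post can be written down as a small function. This is a sketch of the advice as stated here, not of anything the client or servers do; the name `advise` and the 2.5-day figure are made up for illustration, while 1.5 and 3 days are the deadlines quoted for this unit and 3.025 days is the early estimate from the log.

```python
def advise(projected_days: float, preferred_days: float, final_days: float) -> str:
    """Suggest what to do with a WU, given its projected total run time in days."""
    if projected_days <= preferred_days:
        return "finish: on track for the preferred deadline"
    if projected_days <= final_days:
        return "finish: late, but before the final deadline"
    return "hopeless: may be discarded"

print(advise(2.5, 1.5, 3.0))    # hypothetical pace between the two deadlines
print(advise(3.025, 1.5, 3.0))  # early estimate from the first two frames
```

Because an early projection can be off by a few percent, a result just over the final deadline is worth re-checking before deleting anything.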
anko1

Re: Assignment server delivers "wrong" unit

Post by anko1 »

I've got to go check that machine when I get a chance, to see if it completed on time (including by the final deadline). Since I discovered it after a substantial investment in time, I was going to allow it to complete to see if I hit the preferred deadline. Thanks for your perspective on whether to let a unit continue if it can hit the final deadline but not the preferred deadline. I'd always wondered what the preferred course of action is. Based on your comments, I'll let the ones that will complete by the final deadline continue.

"- work units aren't always reassigned immediately upon expiry of the preferred deadline - if they have a backup it may take several more hours, so a WU returned late may arrive before it gets to the head of the reassignment queue."

Would this be one of the reasons for multiple completions of the same WU? I've noticed that occasionally mods will report that a WU has been completed several times.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Assignment server delivers "wrong" unit

Post by bruce »

anko1 wrote: Would this be one of the reasons for multiple completions of the same WU? I've noticed that occasionally mods will report that a WU has been completed several times.
One significant reason for multiple completions is when a WU passes the Preferred Deadline, a second copy is issued, and then both are returned. I'm not sure if there are other situations leading to the same result. If there are, I don't understand them.
anko1

Re: Assignment server delivers "wrong" unit

Post by anko1 »

Finished the unit just in the nick of time:

Code:

[09:59:44] Completed 10000000 out of 10000000 steps  (100 percent)
[09:59:44] Writing final coordinates.
[09:59:44] Past main M.D. loop
[09:59:44] Will end MPI now
[10:00:44] 
[10:00:44] Finished Work Unit:
[10:00:44] - Reading up to 232560 from "work/wudata_05.arc": Read 232560
[10:00:44] - Reading up to 13725504 from "work/wudata_05.xtc": Read 13725504
[10:00:44] goefile size: 0
[10:00:44] logfile size: 274036
[10:00:44] Leaving Run
[10:00:47] - Writing 14632456 bytes of core data to disk...
[10:00:48]   ... Done.
[10:00:48] - Failed to delete work/wudata_05.sas
[10:00:48] - Failed to delete work/wudata_05.goe
[10:00:48] Warning:  check for stray files
[10:00:48] - Shutting down core
[10:02:48] 
[10:02:48] Folding@home Core Shutdown: FINISHED_UNIT
[10:02:48] 
[10:02:48] Folding@home Core Shutdown: FINISHED_UNIT
[10:02:50] CoreStatus = 64 (100)
[10:02:50] Unit 5 finished with 5 percent of time to deadline remaining.
[10:02:50] Updated performance fraction: 0.436823
[10:02:50] Sending work to server
[10:02:50] Project: 3043 (Run 2, Clone 54, Gen 40)


[10:02:50] + Attempting to send results [February 26 10:02:50 UTC]
[10:02:50] - Reading file work/wuresults_05.dat from core
[10:02:50]   (Read 14632456 bytes from disk)
[10:02:50] Connecting to http://171.64.65.63:8080/
[10:04:05] Posted data.
[10:04:05] Initial: 0000; - Uploaded at ~190 kB/s
[10:04:05] - Averaged speed for that direction ~290 kB/s
[10:04:05] + Results successfully sent
[10:04:05] Thank you for your contribution to Folding@Home.

Oddly enough (just a quirk of timing, I'm sure), this dual core pulled a 3065 following the 3043. OTOH, my quads have been getting 2653s and 2665s.
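
The "5 percent of time to deadline remaining" line can be cross-checked against the timestamps in the two logs. This is a sketch under two assumptions: the deadline clock starts when the unit was received (February 23 13:51:49 UTC) and the final deadline is the 3 days mentioned earlier in the thread. The logs do not record a year, so 2009 is only a placeholder.

```python
from datetime import datetime, timedelta

YEAR = 2009  # placeholder; the logs give month and day only

received = datetime(YEAR, 2, 23, 13, 51, 49)  # "+ Received work."
finished = datetime(YEAR, 2, 26, 10, 2, 50)   # "Unit 5 finished ..."
final_deadline = timedelta(days=3)            # assumption from the thread

elapsed = finished - received
remaining = 1 - elapsed / final_deadline
print(f"elapsed {elapsed.total_seconds() / 86400:.2f} days, {remaining:.1%} remaining")
# prints: elapsed 2.84 days, 5.3% remaining
```

Under those assumptions the numbers agree with the client's own report of roughly 5 percent to spare.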
toTOW
Site Moderator
Posts: 6309
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: Assignment server delivers "wrong" unit

Post by toTOW »

I have both dual and quad cores, and I haven't seen any p30xx for ages ... they're only folding p2653 and p2665 (mostly 2653 in the last couple of weeks).

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
anko1

Re: Assignment server delivers "wrong" unit

Post by anko1 »

I'm just lucky, I guess. ;-)