Can't stop client without losing WU

Post by Gemini Cricket » Sun Jul 05, 2009 5:11 am

Recently I have lost two WUs on my MacBook (2.4 GHz C2D, 4 GB 1067 MHz DDR3, OS X 10.5.7) after stopping the client and then restarting it. The first time, about a week and a half ago, I had to install an Apple update and restart the computer. The second time, two days ago, I wanted to take my laptop home, so I needed to stop the client before disconnecting the Ethernet cable. Both times, the WUs being processed were over 80% complete.

I press Command-period (cloverleaf, period) to stop the client, as usual, and wait a few seconds before clicking the close box on the Terminal window. As far as I can tell, nothing is out of the ordinary at that point. However, when I restart the client, I get a "CoreStatus = 1 (1)" line in my log; the client attempts to send the still-incomplete WU to the server and fails. Furthermore, even after re-downloading the same WU, it is unable to process it, and it enters a futile cycle of repeatedly downloading the work unit and failing to process it.

The first listing below shows the termination of the client, and the second shows the trouble it runs into when I try to start it back up. I didn't use to run into this problem, but it has now happened the last two times I stopped the client. Honestly, it's more than a bit discouraging after doing everything I am supposed to do to preserve a WU before restarting or unplugging the Ethernet cable. Aside from discarding the work folder and queue.dat file so that I can download a WU and process it, what do I need to do to be able to shut down the client without losing the WU?

[04:48:08] Completed 182500 out of 250000 steps  (73%)
[04:53:08] - Autosending finished units... [July 3 04:53:08 UTC]
[04:53:08] Trying to send all finished work units
[04:53:08] + No unsent completed units remaining.
[04:53:08] - Autosend completed
[05:03:33] Completed 185000 out of 250000 steps  (74%)
[05:19:43] Completed 187500 out of 250000 steps  (75%)
[05:36:18] Completed 190000 out of 250000 steps  (76%)
[05:53:21] Completed 192500 out of 250000 steps  (77%)
[06:09:27] Completed 195000 out of 250000 steps  (78%)
[06:25:29] Completed 197500 out of 250000 steps  (79%)
[06:41:11] Completed 200000 out of 250000 steps  (80%)
[06:55:53] Completed 202500 out of 250000 steps  (81%)
[07:10:26] Completed 205000 out of 250000 steps  (82%)
[07:25:00] Completed 207500 out of 250000 steps  (83%)
[07:39:37] Completed 210000 out of 250000 steps  (84%)
[07:54:46] Completed 212500 out of 250000 steps  (85%)
[08:09:41] Completed 215000 out of 250000 steps  (86%)
[08:09:49] ***** Got an Activate signal (2)
[08:09:49] Killing all core threads

Folding@Home Client Shutdown.

--- Opening Log file [July 3 12:50:08 UTC]


# Mac OS X SMP Console Edition ################################################
###############################################################################

                       Folding@Home Client Version 6.24beta

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /Users/name/FAH
Executable: ./fah6
Arguments: -local -smp -(selfcensored) -forceasm -verbosity 9

[12:50:08] - Ask before connecting: No
[12:50:08] - User name: Gemini_Cricket (Team 14)
[12:50:08] - User ID: 7D4C7D011A787F2C
[12:50:08] - Machine ID: 1
[12:50:08]
[12:50:08] Loaded queue successfully.
[12:50:08] - Preparing to get new work unit...
[12:50:08] + Attempting to get work packet
[12:50:08] - Will indicate memory of 4096 MB
[12:50:08] - Detect CPU. Vendor: GenuineIntel, Family: 6, Model: 7, Stepping: 6
[12:50:08] - Connecting to assignment server
[12:50:08] - Autosending finished units... [July 3 12:50:08 UTC]
[12:50:08] Connecting to http://assign.stanford.edu:8080/
[12:50:08] Trying to send all finished work units
[12:50:08] + No unsent completed units remaining.
[12:50:08] - Autosend completed
[12:50:09] Posted data.
[12:50:09] Initial: 40AB; - Successful: assigned to (171.64.65.56).
[12:50:09] + News From Folding@Home: Welcome to Folding@Home
[12:50:09] Loaded queue successfully.
[12:50:09] Connecting to http://171.64.65.56:8080/
[12:50:18] Posted data.
[12:50:18] Initial: 0000; - Receiving payload (expected size: 4843907)
[12:50:22] - Downloaded at ~1182 kB/s
[12:50:22] - Averaged speed for that direction ~646 kB/s
[12:50:22] + Received work.
[12:50:23] + Closed connections
[12:50:23]
[12:50:23] + Processing work unit
[12:50:23] At least 4 processors must be requested.Core required: FahCore_a2.exe
[12:50:23] Core found.
[12:50:23] - Using generic ./mpiexec
[12:50:23] Working on queue slot 04 [July 3 12:50:23 UTC]
[12:50:23] + Working ...
[12:50:23] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 04 -checkpoint 15 -forceasm -verbose -lifeline 44743 -version 624'

[12:50:23]
[12:50:23] *------------------------------*
[12:50:23] Folding@Home Gromacs SMP Core
[12:50:23] Version 2.07 (Sun Apr 19 14:29:51 PDT 2009)
[12:50:23]
[12:50:23] Preparing to commence simulation
[12:50:23] - Ensuring status. Please wait.
[12:50:32] - Assembly optimizations manually forced on.
[12:50:32] - Not checking prior termination.
[12:50:34] - Expanded 4843395 -> 24023789 (decompressed 496.0 percent)
[12:50:35] Called DecompressByteArray: compressed_data_size=4843395 data_size=24023789, decompressed_data_size=24023789 diff=0
[12:50:35] - Digital signature verified
[12:50:35]
[12:50:35] Project: 2677 (Run 9, Clone 50, Gen 20)
[12:50:35]
[12:50:35] Assembly optimizations on if available.
[12:50:35] Entering M.D.
[12:50:47] CoreStatus = 1 (1)
[12:50:47] Sending work to server
[12:50:47] Project: 2677 (Run 9, Clone 50, Gen 20)
[12:50:47] - Error: Could not get length of results file work/wuresults_04.dat
[12:50:47] - Error: Could not read unit 04 file. Removing from queue.
[12:50:47] Trying to send all finished work units
[12:50:47] + No unsent completed units remaining.
[12:50:47] - Preparing to get new work unit...
[12:50:47] + Attempting to get work packet
[12:50:47] - Will indicate memory of 4096 MB
[12:50:47] - Connecting to assignment server
[12:50:47] Connecting to http://assign.stanford.edu:8080/
[12:50:48] Posted data.
[12:50:48] Initial: 40AB; - Successful: assigned to (171.64.65.56).
[12:50:48] + News From Folding@Home: Welcome to Folding@Home
[12:50:48] Loaded queue successfully.
[12:50:48] Connecting to http://171.64.65.56:8080/
[12:50:54] Posted data.
[12:50:54] Initial: 0000; - Receiving payload (expected size: 4843907)
[12:50:59] - Downloaded at ~946 kB/s
[12:50:59] - Averaged speed for that direction ~706 kB/s
[12:50:59] + Received work.
[12:50:59] Trying to send all finished work units
[12:50:59] + No unsent completed units remaining.
[12:50:59] + Closed connections
[12:51:04]
[12:51:04] + Processing work unit
[12:51:04] At least 4 processors must be requested.Core required: FahCore_a2.exe
[12:51:04] Core found.
[12:51:04] - Using generic ./mpiexec
[12:51:04] Working on queue slot 05 [July 3 12:51:04 UTC]
[12:51:04] + Working ...
[12:51:04] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 05 -checkpoint 15 -forceasm -verbose -lifeline 44743 -version 624'

[12:51:04]
[12:51:04] *------------------------------*
[12:51:04] Folding@Home Gromacs SMP Core
[12:51:04] Version 2.07 (Sun Apr 19 14:29:51 PDT 2009)
[12:51:04]
[12:51:04] Preparing to commence simulation
[12:51:04] - Ensuring status. Please wait.
[12:51:13] - Assembly optimizations manually forced on.
[12:51:13] - Not checking prior termination.
[12:51:16] - Expanded 4843395 -> 24023789 (decompressed 496.0 percent)
[12:51:16] Called DecompressByteArray: compressed_data_size=4843395 data_size=24023789, decompressed_data_size=24023789 diff=0
[12:51:16] - Digital signature verified
[12:51:16]
[12:51:16] Project: 2677 (Run 9, Clone 50, Gen 20)
[12:51:16]
[12:51:16] Assembly optimizations on if available.
[12:51:16] Entering M.D.
[12:51:28] CoreStatus = 1 (1)
[12:51:28] Sending work to server
[12:51:28] Project: 2677 (Run 9, Clone 50, Gen 20)
[12:51:28] - Error: Could not get length of results file work/wuresults_05.dat
[12:51:28] - Error: Could not read unit 05 file. Removing from queue.
[12:51:28] Trying to send all finished work units
[12:51:28] + No unsent completed units remaining.
[12:51:28] - Preparing to get new work unit...
[12:51:28] + Attempting to get work packet
[12:51:28] - Will indicate memory of 4096 MB
[12:51:28] - Connecting to assignment server
[12:51:28] Connecting to http://assign.stanford.edu:8080/
[12:51:29] Posted data.
[12:51:29] Initial: 40AB; - Successful: assigned to (171.64.65.56).
[12:51:29] + News From Folding@Home: Welcome to Folding@Home
[12:51:29] Loaded queue successfully.
[12:51:29] Connecting to http://171.64.65.56:8080/
[12:51:34] Posted data.
[12:51:34] Initial: 0000; - Receiving payload (expected size: 4843907)
[12:51:39] - Downloaded at ~946 kB/s
[12:51:39] - Averaged speed for that direction ~754 kB/s
[12:51:39] + Received work.
[12:51:40] Trying to send all finished work units
[12:51:40] + No unsent completed units remaining.
[12:51:40] + Closed connections
[12:51:45]
[12:51:45] + Processing work unit
[12:51:45] At least 4 processors must be requested.Core required: FahCore_a2.exe
[12:51:45] Core found.
[12:51:45] - Using generic ./mpiexec
[12:51:45] Working on queue slot 06 [July 3 12:51:45 UTC]
[12:51:45] + Working ...
[12:51:45] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 06 -checkpoint 15 -forceasm -verbose -lifeline 44743 -version 624'


ad infinitum . . .
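
For anyone who wants to confirm the same loop in their own log, here is a minimal Python sketch that lists the queue slots where the core returned "CoreStatus = 1" and the client then reported that it could not read the unit file. The function name `failed_slots` and the `SAMPLE` excerpt are made up for illustration; the patterns are taken only from the lines quoted in the log above, so other client versions may word these messages differently.

```python
import re

# Patterns copied from the log lines quoted above; other client versions
# may phrase these messages differently (assumption).
SLOT_RE = re.compile(r"Working on queue slot (\d+)")
STATUS_RE = re.compile(r"CoreStatus = (\w+)")
READ_ERR_RE = re.compile(r"Could not read unit (\d+) file")


def failed_slots(log_text):
    """Return the queue slots that hit CoreStatus = 1 followed by an unreadable unit file."""
    failed = []
    current_slot = None
    bad_status = False
    for line in log_text.splitlines():
        match = SLOT_RE.search(line)
        if match:
            # A new work unit was started in this slot.
            current_slot = match.group(1)
            bad_status = False
            continue
        match = STATUS_RE.search(line)
        if match and match.group(1) == "1":
            bad_status = True
            continue
        match = READ_ERR_RE.search(line)
        if match and bad_status and match.group(1) == current_slot:
            failed.append(current_slot)
            bad_status = False
    return failed


# Shortened excerpt of the log above, used only as sample input.
SAMPLE = """\
[12:50:23] Working on queue slot 04 [July 3 12:50:23 UTC]
[12:50:47] CoreStatus = 1 (1)
[12:50:47] - Error: Could not read unit 04 file. Removing from queue.
[12:51:04] Working on queue slot 05 [July 3 12:51:04 UTC]
[12:51:28] CoreStatus = 1 (1)
[12:51:28] - Error: Could not read unit 05 file. Removing from queue.
[12:51:45] Working on queue slot 06 [July 3 12:51:45 UTC]
"""

if __name__ == "__main__":
    print(failed_slots(SAMPLE))
```

Slot 06 is not reported for the excerpt because the log was cut off before that unit failed.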
Last edited by toTOW on Sun Jul 05, 2009 10:32 am, edited 1 time in total.
Reason: Added code tags.
Gemini Cricket
 
Posts: 11
Joined: Sat Dec 08, 2007 5:41 am
Location: Tsuchiura, Japan

Re: Can't stop client without losing WU

Post by AZBrandon » Sun Jul 05, 2009 10:01 pm

At the risk of embarrassing myself, have you tried it with "-verbosity 9" removed? I ran into exactly the same problem on Ubuntu 9.04, and at the suggestion of other members in the Linux forum I removed -verbosity 9 entirely. Now I can stop and restart with no problem. I realize that was Linux and you're on Mac OS X, but the FAH software seems similar, and the error in your log looks VERY similar to what I was experiencing. I figure it's worth a try.
AZBrandon
 
Posts: 225
Joined: Sat Jan 17, 2009 1:43 am

Re: Can't stop client without losing WU

Post by Gemini Cricket » Mon Jul 06, 2009 2:26 am

Thanks, AZBrandon. I'll give it a whirl. If that fixes it, we'll have identified a widespread and serious bug in the client.

Re: Can't stop client without losing WU

Post by susato » Mon Jul 06, 2009 3:09 pm

Since you're already running from the Terminal, can you run the client with the -delete xx flag, where xx is the queue position of the troubled work unit? This should get rid of the unit at that queue position and possibly enable you to start fresh with a new WU at a new queue position.
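
As a rough illustration, the command could be assembled like this in Python. The helper name `delete_command` is made up, the executable name `./fah6` is taken from the log above, and the two-digit slot format is an assumption based on how the log prints slot numbers. Whether the client wants other flags alongside -delete is not something this thread settles.

```python
import shlex


def delete_command(slot, executable="./fah6"):
    """Build the argument list for clearing one queue slot with -delete.

    The two-digit formatting mirrors the slot numbers printed in the log
    (04, 05, ...); the client may also accept other forms (assumption).
    """
    slot_number = int(slot)
    if slot_number < 0:
        raise ValueError("queue slot must not be negative")
    return [executable, "-delete", f"{slot_number:02d}"]


if __name__ == "__main__":
    # Print the command for review rather than running it.
    print(shlex.join(delete_command("04")))
```

Printing the command instead of executing it leaves the decision to run it, from the client's launch directory, with the person at the Terminal.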
susato
Site Moderator
 
Posts: 950
Joined: Fri Nov 30, 2007 4:57 am
Location: Team MacOSX

Re: Can't stop client without losing WU

Post by Gemini Cricket » Sat Jul 11, 2009 6:35 am

Thanks, susato, and thanks again to AZBrandon. Yes, using the -delete xx flag was necessary to start folding again, because deleting the queue.dat file and work folder did not solve the problem. I am now folding on the MacBook again, I am able to stop the client without losing the WU, and I have turned in one WU so far, with a second nearing completion. In fairness, I can't conclude that omitting the "-verbosity 9" flag was what mattered, but I have omitted it all the same, so this is anecdotal support for AZBrandon's suggestion.

