Error: Could not get length of results file work

Postby hutleytj » Sun Feb 27, 2011 5:50 pm

My computer often shuts down - for whatever reason - and when I restart it and open FAH, I get a message that there was an "Error: Could not get length of results file work". The work done is then apparently discarded and removed from the queue, and new work starts. Since this is a frequent observation of mine, I am reporting it, so that - perhaps - the software can be made more robust, and this issue eliminated, and the work done is not wasted each time.

regards, Trevor

[17:01:47] Folding@Home Gromacs SMP Core
[17:01:47] Version 2.22 (May 7 2010)
[17:01:47]
[17:01:47] Preparing to commence simulation
[17:01:47] - Looking at optimizations...
[17:01:47] - Files status OK
[17:01:47] - Expanded 1764077 -> 2251569 (decompressed 127.6 percent)
[17:01:47] Called DecompressByteArray: compressed_data_size=1764077 data_size=2251569, decompressed_data_size=2251569 diff=0
[17:01:47] - Digital signature verified
[17:01:47]
[17:01:47] Project: 6057 (Run 0, Clone 32, Gen 308)
[17:01:47]
[17:01:47] Assembly optimizations on if available.
[17:01:47] Entering M.D.
[17:01:54] Using Gromacs checkpoints
[17:01:57] CoreStatus = 0 (0)
[17:01:57] Sending work to server
[17:01:57] Project: 6057 (Run 0, Clone 32, Gen 308)
[17:01:57] - Error: Could not get length of results file work/wuresults_03.dat
[17:01:57] - Error: Could not read unit 03 file. Removing from queue.
[17:01:57] - Preparing to get new work unit...
[17:01:57] Cleaning up work directory
[17:01:57] + Attempting to get work packet
[17:01:57] - Connecting to assignment server
[17:01:58] - Successful: assigned to (171.64.65.54).
[17:01:58] + News From Folding@Home: Welcome to Folding@Home
[17:01:58] Loaded queue successfully.
[17:02:13] + Closed connections
hutleytj
 
Posts: 13
Joined: Sun Feb 27, 2011 5:33 pm

Re: Error: Could not get length of results file work

Postby HendricksSA » Sun Feb 27, 2011 9:20 pm

hutleytj, welcome to the Fold. I know that losing an SMP work unit is frustrating. I don't know if your OS is Windows or Linux, but frequent shutdowns could indicate many types of problems. The SMP client is pretty good about surviving orderly shutdowns. However, an unexpected shutdown can mess up even the most robust of software.

The SMP client runs fairly stably on many (millions?) machines, so I feel pretty confident it is not contributing to your system shutdowns. That said, Folding can work your computer harder than almost any other code, so any system weakness will show up quickly. I would suggest you look at PantherX's guide at viewtopic.php?f=55&t=15088#p149620 . You can find useful troubleshooting tips there. Evaluate your hardware thoroughly and hopefully you can find some solution to your shutdown failures. Many smart people hang out here and we will help you with any Folding problems.

The mods may move this topic to a more appropriate location. Edit by Mod: Done.
HendricksSA
 
Posts: 557
Joined: Fri Jun 26, 2009 4:34 am

Re: Error: Could not get length of results file work

Postby bruce » Mon Feb 28, 2011 2:16 am

Welcome to foldingforum.org, hutleytj.

Please back up some and show us more information from FAHlog. Show the messages covering the last part of the WU before the shutdown and the first part of the startup before the part that you did include.

It also helps to debug a problem like this if you add -verbosity 9 to the client's parameters.

Is your computer being shut down intentionally, or is it crashing? (what do you mean by "for whatever reason"?)
bruce
 
Posts: 21420
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Error: Could not get length of results file work

Postby hutleytj » Thu Mar 17, 2011 11:09 am

Thanks for taking the time to reply. I realised from your responses that I should have been clearer.

First, I am definitely NOT thinking or suggesting that FAH is any cause of instability in my system.

Secondly (and I simply repeat what I wrote) "I am reporting it, so that - perhaps - the software can be made more robust, and this issue eliminated, and the work done is not wasted each time".

Thirdly, for clarification: I wrote shutting down "for whatever reason" to cover the (very infrequent) cases of my MacBook Pro shutting down due to OS problems, but mostly "normal" shutdowns of FAH, when I want to close FAH while I am working on something where I do not want the cooling fans running all the time. I go to System Preferences and select "disable folding at home".

I have another example from today. I was 58% through a folding job when I closed FAH. When I started it again, the LOG showed it at 1%, so I read the LOG and found that FAH had

detected an invalid checkpoint. Restarting...

I append a fuller LOG, in case that helps explain WHY this 58% was lost / cpu time wasted.

Perhaps I should mention that I am a 21y power-user of AppleMac, and I have been Folding at Home since 2001 I think, when I got my first Titanium laptop.
I am now running a NOV-08 MacBookPro 2.8 with a 128 SSD, 750/7200 HD, 8GB RAM, OS 10.6.6

regards, Trevor




[19:12:45] Folding@Home Gromacs SMP Core
[19:12:45] Version 2.22 (May 7 2010)
[19:12:45]
[19:12:45] Preparing to commence simulation
[19:12:45] - Looking at optimizations...
[19:12:45] - Files status OK
[19:12:46] - Expanded 1768017 -> 1971489 (decompressed 111.5 percent)
[19:12:46] Called DecompressByteArray: compressed_data_size=1768017 data_size=1971489, decompressed_data_size=1971489 diff=0
[19:12:46] - Digital signature verified
[19:12:46]
[19:12:46] Project: 6020 (Run 0, Clone 117, Gen 425)
[19:12:46]
[19:12:46] Assembly optimizations on if available.
[19:12:46] Entering M.D.
[19:12:52] Using Gromacs checkpoints
[19:12:52] Resuming from checkpoint
[19:12:52] Verified work/wudata_05.log
[19:12:52] Verified work/wudata_05.trr
[19:12:52] Verified work/wudata_05.edr
[19:12:53] Completed 49545 out of 500000 steps  (9%)
[19:14:04] Completed 50000 out of 500000 steps  (10%)
[19:27:02] Completed 55000 out of 500000 steps  (11%)
[19:45:42] Completed 60000 out of 500000 steps  (12%)
[20:08:25] Completed 65000 out of 500000 steps  (13%)
[20:30:35] Completed 70000 out of 500000 steps  (14%)
[20:56:24] Completed 75000 out of 500000 steps  (15%)
[21:20:05] Completed 80000 out of 500000 steps  (16%)
[21:40:55] Completed 85000 out of 500000 steps  (17%)
[22:00:17] Completed 90000 out of 500000 steps  (18%)
[22:14:57] Completed 95000 out of 500000 steps  (19%)
[22:30:18] Completed 100000 out of 500000 steps  (20%)
[22:45:21] Completed 105000 out of 500000 steps  (21%)
[23:00:52] Completed 110000 out of 500000 steps  (22%)
[23:18:21] Completed 115000 out of 500000 steps  (23%)
[23:35:36] Completed 120000 out of 500000 steps  (24%)
[23:54:03] Completed 125000 out of 500000 steps  (25%)
[00:12:35] Completed 130000 out of 500000 steps  (26%)
[00:29:33] Completed 135000 out of 500000 steps  (27%)
[00:44:12] Completed 140000 out of 500000 steps  (28%)
[00:59:00] Completed 145000 out of 500000 steps  (29%)
[01:13:40] Completed 150000 out of 500000 steps  (30%)
[01:29:36] Completed 155000 out of 500000 steps  (31%)
[01:47:22] Completed 160000 out of 500000 steps  (32%)
[02:07:29] Completed 165000 out of 500000 steps  (33%)
[02:25:29] Completed 170000 out of 500000 steps  (34%)
[02:42:57] Completed 175000 out of 500000 steps  (35%)
[02:57:21] Completed 180000 out of 500000 steps  (36%)
[03:11:50] Completed 185000 out of 500000 steps  (37%)
[03:26:17] Completed 190000 out of 500000 steps  (38%)
[03:42:21] Completed 195000 out of 500000 steps  (39%)
[03:56:47] Completed 200000 out of 500000 steps  (40%)
[04:10:39] Completed 205000 out of 500000 steps  (41%)
[04:24:01] Completed 210000 out of 500000 steps  (42%)
[04:38:08] Completed 215000 out of 500000 steps  (43%)
[04:51:57] Completed 220000 out of 500000 steps  (44%)
[05:24:54] Completed 225000 out of 500000 steps  (45%)
[05:39:55] Completed 230000 out of 500000 steps  (46%)
[05:54:39] Completed 235000 out of 500000 steps  (47%)
[06:09:24] Completed 240000 out of 500000 steps  (48%)
[06:23:29] Completed 245000 out of 500000 steps  (49%)
[06:37:44] Completed 250000 out of 500000 steps  (50%)
[06:52:20] Completed 255000 out of 500000 steps  (51%)
[07:11:04] Completed 260000 out of 500000 steps  (52%)
[07:28:05] Completed 265000 out of 500000 steps  (53%)
[07:42:39] Completed 270000 out of 500000 steps  (54%)
[07:59:18] Completed 275000 out of 500000 steps  (55%)
[08:16:40] Completed 280000 out of 500000 steps  (56%)
[08:31:38] Completed 285000 out of 500000 steps  (57%)
[08:48:41] Completed 290000 out of 500000 steps  (58%)

Folding@Home Client Shutdown.


--- Opening Log file [March 17 08:38:35 UTC]


# Mac OS X SMP Console Edition ################################################
###############################################################################

                       Folding@Home Client Version 6.29r3

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /Users/TJH/Library/Folding@home
Executable: /usr/local/fah/fah6


[08:38:35] - Ask before connecting: No
[08:38:35] - User name: Trevor_Hutley (Team 3072)
[08:38:35] - User ID: 35DA31201ABF20EF
[08:38:35] - Machine ID: 1
[08:38:35]
[08:38:35] Loaded queue successfully.
[08:38:35]
[08:38:35] + Processing work unit
[08:38:35] Core required: FahCore_a3.exe
[08:38:35] Core found.
[08:38:35] Working on queue slot 05 [March 17 08:38:35 UTC]
[08:38:35] + Working ...
[08:38:35]
[08:38:35] *------------------------------*
[08:38:35] Folding@Home Gromacs SMP Core
[08:38:35] Version 2.22 (May 7 2010)
[08:38:35]
[08:38:35] Preparing to commence simulation
[08:38:35] - Ensuring status. Please wait.
[08:38:45] - Looking at optimizations...
[08:38:45] - Working with standard loops on this execution.
[08:38:45] - Previous termination of core was improper.
[08:38:45] - Files status OK
[08:38:45] - Expanded 1768017 -> 1971489 (decompressed 111.5 percent)
[08:38:45] Called DecompressByteArray: compressed_data_size=1768017 data_size=1971489, decompressed_data_size=1971489 diff=0
[08:38:45] - Digital signature verified
[08:38:45]
[08:38:45] Project: 6020 (Run 0, Clone 117, Gen 425)
[08:38:45]
[08:38:45] Entering M.D.
[08:38:51] Using Gromacs checkpoints
[08:38:52] Resuming from checkpoint
[08:38:52] fcSaveRestoreState: I/O failed dir=0, var=0180DF08, varsize=21120
[08:38:52] Can't restore state.fcSaveRestoreState: I/O failed dir=0, var=0180DF08, varsize=21120
[08:38:52] Can't restore state.mdrun returned 3
[08:38:52] Gromacs detected an invalid checkpoint.  Restarting...
[08:38:53] Folding@home Core Shutdown: UNKNOWN_ERROR
[08:38:53] CoreStatus = 62 (98)
[08:38:53] + Restarting core (settings changed)


Mod Edit: Added Code Tags - PantherX
hutleytj
 
Posts: 13
Joined: Sun Feb 27, 2011 5:33 pm

Re: Error: Could not get length of results file work

Postby bruce » Thu Mar 17, 2011 11:47 am

How long was FAH shut down between those two log segments?

We need a Mac expert here, and I am not that person.

Does anybody know what kind of filesystem 10.6.6 uses? If it were Linux, there would be choices. Does OS X allow similar choices? Is there any kind of option which could ensure that, if a checkpoint is currently being written, the shutdown is delayed until that process can be completed? The error message indicates that there's a pretty significant piece of the checkpoint file missing.
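
As a general illustration of the idea (not a description of how the FAH core actually writes its checkpoints, which this thread does not show), software usually guards against a half-written checkpoint by writing to a temporary file, flushing it to disk, and only then renaming it over the old one. The function names and the file name in the usage note are hypothetical. The sketch assumes a POSIX system; the `F_FULLFSYNC` request is specific to macOS and is skipped where it is not defined.

```python
import os
import tempfile

try:
    import fcntl  # POSIX only; defines F_FULLFSYNC on macOS
except ImportError:  # e.g. Windows
    fcntl = None


def flush_to_disk(fd: int) -> None:
    """Ask the OS to push the file's data to stable storage."""
    os.fsync(fd)
    # On macOS, fsync() may leave data in the drive's cache; F_FULLFSYNC
    # requests a full flush. Other platforms do not define the constant.
    full_sync = getattr(fcntl, "F_FULLFSYNC", None)
    if full_sync is not None:
        try:
            fcntl.fcntl(fd, full_sync)
        except OSError:
            pass  # not every filesystem supports it; fsync() already ran


def write_checkpoint_atomically(path: str, data: bytes) -> None:
    """Replace `path` with `data` so a reader never sees a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory so the rename stays
    # on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".ckpt-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            flush_to_disk(tmp.fileno())
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Calling `write_checkpoint_atomically("state.cpt", payload)` leaves either the complete old checkpoint or the complete new one on disk if the process is killed mid-write. It does not delay the shutdown itself, and a fully robust version would probably also sync the containing directory after the rename.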
bruce
 
Posts: 21420
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Error: Could not get length of results file work

Postby mephistopheles » Thu Mar 17, 2011 1:38 pm

I'm definitely not a Mac expert, but the file system is HFS+.
mephistopheles
 
Posts: 195
Joined: Tue Apr 07, 2009 7:51 am

Re: Error: Could not get length of results file work

Postby Baowoulf » Fri Mar 18, 2011 1:50 am

Have you tried using "Ctrl+C" to close the SMP client instead of going to System Preferences? Or would that just lead to the same missing checkpoint problem?
Baowoulf
 
Posts: 532
Joined: Wed Dec 12, 2007 8:44 pm
Location: Jupiter 6

some answers to your questions

Postby hutleytj » Fri Mar 18, 2011 10:47 am

Bruce - you asked "How long was FAH shut down". Maximum 24 hours. Basically, I keep FAH running all the time. I doubt if there is ever more than 24h between activity.

Baowoulf - since the client only operates as a System Preference Pane, I have no other option for closing it down, except to go to System Preferences and click on "Disable Folding@Home".
In previous versions of FAH, that ran as an Application, there were other options.

FileSystem: hierarchical file system plus (HFS+)
hutleytj
 
Posts: 13
Joined: Sun Feb 27, 2011 5:33 pm

Re: Error: Could not get length of results file work

Postby HendricksSA » Sun Mar 20, 2011 7:06 pm

Hutleytj, I was wondering how this is going? Since this isn't getting the attention from the Mac experts here, I would suggest the mods pack this up and relocate it to the Mac forum if the issue isn't resolved. Please let us know. Thanks!
HendricksSA
 
Posts: 557
Joined: Fri Jun 26, 2009 4:34 am

Error: Could not get length of results file work/wuresults_0

Postby hutleytj » Tue Mar 22, 2011 5:16 pm

HendricksSA - the issue continues.

This seems to be the shutdown from when I simply closed my laptop, to come home from work.
The error means that all the previous work is lost, and the task starts again.

I do not think there is anything that I can do about it..... except report it, as something that should not be happening.
Who knows how many hours of cpu time are being wasted if this happens to other Folders !

regards, Trevor


T.

--- Opening Log file [March 22 15:38:02 UTC]


# Mac OS X SMP Console Edition ################################################
###############################################################################

                       Folding@Home Client Version 6.29r3

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /Users/TJH/Library/Folding@home
Executable: /usr/local/fah/fah6


[15:38:02] - Ask before connecting: No
[15:38:02] - User name: Trevor_Hutley (Team 3072)
[15:38:02] - User ID: 35DA31201ABF20EF
[15:38:02] - Machine ID: 1
[15:38:02]
[15:38:02] Loaded queue successfully.
[15:38:02]
[15:38:02] + Processing work unit
[15:38:02] Core required: FahCore_a3.exe
[15:38:02] Core found.
[15:38:02] Working on queue slot 09 [March 22 15:38:02 UTC]
[15:38:02] + Working ...
[15:38:03]
[15:38:03] *------------------------------*
[15:38:03] Folding@Home Gromacs SMP Core
[15:38:03] Version 2.22 (May 7 2010)
[15:38:03]
[15:38:03] Preparing to commence simulation
[15:38:03] - Ensuring status. Please wait.
[15:38:12] - Looking at optimizations...
[15:38:12] - Working with standard loops on this execution.
[15:38:12] - Previous termination of core was improper.
[15:38:12] - Files status OK
[15:38:13] - Expanded 1764516 -> 2252021 (decompressed 127.6 percent)
[15:38:13] Called DecompressByteArray: compressed_data_size=1764516 data_size=2252021, decompressed_data_size=2252021 diff=0
[15:38:13] - Digital signature verified
[15:38:13]
[15:38:13] Project: 6051 (Run 0, Clone 16, Gen 369)
[15:38:13]
[15:38:13] Entering M.D.
[15:38:19] Using Gromacs checkpoints
[15:38:22] CoreStatus = 0 (0)
[15:38:22] Sending work to server
[15:38:22] Project: 6051 (Run 0, Clone 16, Gen 369)
[15:38:22] - Error: Could not get length of results file work/wuresults_09.dat
[15:38:22] - Error: Could not read unit 09 file. Removing from queue.
[15:38:22] - Preparing to get new work unit...
[15:38:22] Cleaning up work directory
[15:38:22] + Attempting to get work packet
[15:38:22] - Connecting to assignment server
[15:38:24] - Successful: assigned to (171.64.65.54).
[15:38:24] + News From Folding@Home: Welcome to Folding@Home
[15:38:24] Loaded queue successfully.
[15:38:44] + Closed connections
[15:38:49]
[15:38:49] + Processing work unit
[15:38:49] Core required: FahCore_a3.exe
[15:38:49] Core found.
[15:38:49] Working on queue slot 00 [March 22 15:38:49 UTC]
[15:38:49] + Working ...
[15:38:49]
[15:38:49] *------------------------------*
[15:38:49] Folding@Home Gromacs SMP Core
[15:38:49] Version 2.22 (May 7 2010)
[15:38:49]
[15:38:49] Preparing to commence simulation
[15:38:49] - Ensuring status. Please wait.
[15:38:59] - Looking at optimizations...
[15:38:59] - Working with standard loops on this execution.
[15:38:59] - Created dyn
[15:38:59] - Files status OK
[15:39:00] - Expanded 1764516 -> 2252021 (decompressed 127.6 percent)
[15:39:00] Called DecompressByteArray: compressed_data_size=1764516 data_size=2252021, decompressed_data_size=2252021 diff=0
[15:39:00] - Digital signature verified
[15:39:00]
[15:39:00] Project: 6051 (Run 0, Clone 16, Gen 369)
[15:39:00]
[15:39:00] Entering M.D.
[15:39:07] CoreStatus = 0 (0)
[15:39:07] Sending work to server
[15:39:07] Project: 6051 (Run 0, Clone 16, Gen 369)
[15:39:07] - Error: Could not get length of results file work/wuresults_00.dat
[15:39:07] - Error: Could not read unit 00 file. Removing from queue.
[15:39:07] - Preparing to get new work unit...
[15:39:07] Cleaning up work directory
[15:39:07] + Attempting to get work packet
[15:39:07] - Connecting to assignment server
[15:39:08] - Successful: assigned to (171.64.65.54).
[15:39:08] + News From Folding@Home: Welcome to Folding@Home
[15:39:08] Loaded queue successfully.
[15:39:30] + Closed connections
[15:39:35]
[15:39:35] + Processing work unit
[15:39:35] Core required: FahCore_a3.exe
[15:39:35] Core found.
[15:39:35] Working on queue slot 01 [March 22 15:39:35 UTC]
[15:39:35] + Working ...
[15:39:36]
[15:39:36] *------------------------------*
[15:39:36] Folding@Home Gromacs SMP Core
[15:39:36] Version 2.22 (May 7 2010)
[15:39:36]
[15:39:36] Preparing to commence simulation
[15:39:36] - Ensuring status. Please wait.
[15:39:45] - Looking at optimizations...
[15:39:45] - Working with standard loops on this execution.
[15:39:45] - Created dyn
[15:39:45] - Files status OK
[15:39:45] - Expanded 1764516 -> 2252021 (decompressed 127.6 percent)
[15:39:45] Called DecompressByteArray: compressed_data_size=1764516 data_size=2252021, decompressed_data_size=2252021 diff=0
[15:39:45] - Digital signature verified
[15:39:45]
[15:39:45] Project: 6051 (Run 0, Clone 16, Gen 369)
[15:39:45]
[15:39:45] Entering M.D.
[15:39:54] Completed 0 out of 500000 steps  (0%)
[15:57:05] Completed 5000 out of 500000 steps  (1%)
[16:10:43] Completed 10000 out of 500000 steps  (2%)
[16:27:26] Completed 15000 out of 500000 steps  (3%)
[16:43:47] Completed 20000 out of 500000 steps  (4%)
[16:59:49] Completed 25000 out of 500000 steps  (5%)


Mod Edit: Added Code Tags - PantherX
hutleytj
 
Posts: 13
Joined: Sun Feb 27, 2011 5:33 pm

Re: Error: Could not get length of results file work

Postby HendricksSA » Tue Mar 22, 2011 7:44 pm

hutleytj, I don't know squat about Macs. However, I can read the logs. They tell me you are having two errors. Perhaps both are related to some sort of filesystem problem. Your logs about projects 6057 and 6051 show core status 0 errors. The 6020 shows error code 62. In your last log, the 6051 errors out, gets the same work unit again, then errors out. This error occurred without an intervening shutdown. Then it gets the same work unit for the third time and starts processing it. The 6020 errors out after a shutdown.
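
For anyone who wants to do the same log reading mechanically, here is a rough sketch. It assumes only the three line shapes visible in the logs posted in this thread (`Project: ...`, `CoreStatus = ...` and `- Error: ...`); the function name and the record layout are made up for illustration and are not part of any FAH tool.

```python
import re
from typing import List

# Patterns taken from the log lines quoted in this thread.
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
STATUS_RE = re.compile(r"CoreStatus = ([0-9A-Fa-f]+) \((\d+)\)")
ERROR_RE = re.compile(r"- Error: (.*)")


def summarize_fahlog(text: str) -> List[dict]:
    """Return one record per CoreStatus line.

    Each record carries the most recent (project, run, clone, gen) tuple,
    the status code as printed, and any '- Error:' lines that follow it
    before the next CoreStatus line.
    """
    records: List[dict] = []
    project = None
    for line in text.splitlines():
        m = PROJECT_RE.search(line)
        if m:
            project = tuple(int(g) for g in m.groups())
            continue
        m = STATUS_RE.search(line)
        if m:
            records.append({"project": project, "status": m.group(1), "errors": []})
            continue
        m = ERROR_RE.search(line)
        if m and records:
            records[-1]["errors"].append(m.group(1).strip())
    return records
```

Feeding it the contents of FAHlog.txt (for example `summarize_fahlog(open("FAHlog.txt").read())`) gives a list where the status 0 records with "Could not get length of results file" errors and the status 62 record can be told apart at a glance.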

Again, I'm not a Mac user, but this looks like filesystem corruption. At a minimum, I would recommend you remove everything folding and start over again. While you are at it, I would run a disk checking utility to see if it can fix what is wrong. Good luck, please let us know how it goes.
HendricksSA
 
Posts: 557
Joined: Fri Jun 26, 2009 4:34 am

Re: Error: Could not get length of results file work

Postby susato » Wed Mar 23, 2011 3:22 am

Trevor, you are seeing a well known bug in the OSX V6 client. It's very common for the client to download and then refuse a WU several times, then crunch it successfully. It wastes bandwidth but does not substantially degrade performance in points per day, because the failures occur so rapidly.

We'll see if the forthcoming V7 client will be able to defeat this bug.
susato
Site Moderator
 
Posts: 950
Joined: Fri Nov 30, 2007 4:57 am
Location: Team MacOSX

Re: Error: Could not get length of results file work

Postby Joe_H » Mon Mar 28, 2011 3:04 am

Is anyone looking into this bug, as far as you know? I had not run into it until recently, and I read this only after posting the unit as possibly bad. Since then I have run into it again; after a few tries the unit would start processing and then complete successfully. So far this has happened only on my Core 2 Duo Mac; I have not seen this behavior on my i7 iMac.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Joe_H
Site Admin
 
Posts: 3896
Joined: Tue Apr 21, 2009 4:41 pm
Location: W. MA

Re: Error: Could not get length of results file work

Postby susato » Mon Mar 28, 2011 8:56 pm

I see it all the time on my Core 2 Duo mini. Rarely if ever on my dual quad Mac Pro with Xeon CPUs.
It's hard to know how much work was done to address this bug (the Pande Group does not share details of software development) but at present, all development work is directed toward the new v7 client. It appears that this particular bug will be allowed to die a natural death.
susato
Site Moderator
 
Posts: 950
Joined: Fri Nov 30, 2007 4:57 am
Location: Team MacOSX

Re: Error: Could not get length of results file work

Postby hutleytj » Sat May 07, 2011 7:15 pm

When will we see (get a beta of?) this v7 client?

I am still often seeing this bug ...


Error: Could not get length of results file work/wuresults_04.dat
[15:05:19] - Error: Could not read unit 04 file. Removing from queue.
[15:05:19] - Preparing to get new work unit...

I can post some "before and after the error" log, if it helps, or do we just "live with it" until v7 comes along ??

regards, Trevor
hutleytj
 
Posts: 13
Joined: Sun Feb 27, 2011 5:33 pm
