Incomplete work unit?


Postby Penfold » Tue Apr 17, 2012 2:58 pm

First WU finished - fireworks, marching bands, air thick with champagne corks ... etc. :lol:

I now have a couple more questions:

1.) Not long after the next WU started, I had to switch my machine off and move it to another room. The FAHlog document shows that I shut the client down after 5000 of 500000 steps. The next entry in the log says 'Opening Log file', and the work subsequently resumed. It ran overnight and reached 330000 of 500000 steps before I had to move the machine back. At some point after that, the log shows:

[11:05:20] Going to send back what have done -- stepsTotalG=0
[11:05:20] Work fraction=0.0000 steps=0.
[11:05:24] logfile size=49359 infoLength=49359 edr=0 trr=25
[11:05:24] logfile size: 49359 info=49359 bed=0 hdr=25
[11:05:24] - Writing 49897 bytes of core data to disk...
[11:05:24] ... Done.
[11:05:24]
[11:05:24] Folding@home Core Shutdown: UNSTABLE_MACHINE
[11:05:24] CoreStatus = 7A (122)
[11:05:24] Sending work to server
[11:05:24] Project: 6063 (Run 1, Clone 180, Gen 440)

[11:05:24] + Attempting to send results [April 17 11:05:24 UTC]
[11:05:27] + Results successfully sent

without any further work being done.
Why has this happened? I notice the capitalised 'UNSTABLE_MACHINE' - never seen that before.

There is now another WU which is chugging away seemingly OK.

2.) A minor point, but in ~/Library/Folding@home the .dat files are displayed with the VLC icon (the traffic cone). Why might this have happened, and how can I correct it?

Ubuntu 16.04.1
F@H v 7.4.4
Penfold
 
Posts: 135
Joined: Tue Apr 10, 2012 11:09 pm
Location: Scotland

Re: Incomplete work unit?

Postby Joe_H » Tue Apr 17, 2012 9:13 pm

1 - The WU could have been bad, and that happened to coincide with the restart. But it is also likely that your shutdown happened while the folding client was writing a checkpoint file. That can corrupt the checkpoint, so the WU failed when it was restarted. The log messages that would help determine that come before the portion you included.

You can avoid corrupting the checkpoint by looking for it in the work folder and checking its modification time before you shut down. Just make sure it is a minute or so in the past. You can also check the checkpoint frequency; about every 15 minutes is a good value. It has been a while since I read a V6 log, so I don't recall whether it gets listed there.
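The modification-time check described above could be scripted. The following is only a minimal sketch: the helper names `newest_file_age` and `safe_to_stop` are made up for illustration, and because the checkpoint's file name is not given in this thread, the sketch simply looks at whichever file in the folder was written most recently. That is a conservative stand-in, since it may pick up a log or other file rather than the checkpoint itself.

```python
import os
import time


def newest_file_age(folder, now=None):
    """Return (path, age_in_seconds) for the most recently modified file.

    Hypothetical helper: it does not know which file is the checkpoint,
    so it reports the newest regular file in the folder instead.
    """
    now = time.time() if now is None else now
    newest_path, newest_mtime = None, None
    for entry in os.scandir(folder):
        if not entry.is_file():
            continue  # skip sub-folders and anything that is not a file
        mtime = entry.stat().st_mtime
        if newest_mtime is None or mtime > newest_mtime:
            newest_path, newest_mtime = entry.path, mtime
    if newest_path is None:
        return None, None  # empty folder: nothing to report
    return newest_path, now - newest_mtime


def safe_to_stop(age_seconds, min_age=60.0):
    """True when the last write is at least min_age seconds in the past.

    The 60 second default mirrors the "a minute or so" advice above.
    """
    return age_seconds is not None and age_seconds >= min_age
```

For example, `newest_file_age(os.path.expanduser("~/Library/Folding@home/work"))` would report on the work folder, assuming it sits at that path under the launch directory; adjust it to wherever your client keeps its work files.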

2 - It doesn't really cause a problem, as you can use "Open with", but as I recall you can change the default app for opening the .dat files using Get Info. Just don't do it for all .dat files on the system if that option is listed. It got that way because of an issue with the VLC installer; I don't know if that is still present in current versions of VLC.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Joe_H
Site Admin
 
Posts: 3899
Joined: Tue Apr 21, 2009 4:41 pm
Location: W. MA

Re: Incomplete work unit?

Postby Penfold » Tue Apr 17, 2012 9:56 pm

Joe_H wrote:1 - The WU could have been bad, and that happened to coincide with the restart. But it is also likely that your shutdown happened while the folding client was writing a checkpoint file. That can corrupt the checkpoint, so the WU failed when it was restarted. The log messages that would help determine that come before the portion you included.

Here is what went immediately before (having switched it all back on after the move):

Launch directory: /Users/penfold/Library/Folding@home
Executable: /usr/local/fah/fah6
Arguments: -advmethods

[11:05:13] - Ask before connecting: No
[11:05:13] - User name: Penfold (Team 1971)
[11:05:13] - User ID: 7A7ACD5E13C0DFDD
[11:05:13] - Machine ID: 1
[11:05:13]
[11:05:13] Loaded queue successfully.
[11:05:13]
[11:05:13] + Processing work unit
[11:05:13] Core required: FahCore_a3.exe
[11:05:13] Core found.
[11:05:13] Working on queue slot 02 [April 17 11:05:13 UTC]
[11:05:13] + Working ...
[11:05:13]
[11:05:13] *------------------------------*
[11:05:13] Folding@Home Gromacs SMP Core
[11:05:13] Version 2.22 (May 7 2010)
[11:05:13]
[11:05:13] Preparing to commence simulation
[11:05:13] - Looking at optimizations...
[11:05:13] - Files status OK
[11:05:13] - Expanded 1761711 -> 2249733 (decompressed 127.7 percent)
[11:05:13] Called DecompressByteArray: compressed_data_size=1761711 data_size=2249733, decompressed_data_size=2249733 diff=0
[11:05:13] - Digital signature verified
[11:05:13]
[11:05:13] Project: 6063 (Run 1, Clone 180, Gen 440)
[11:05:13]
[11:05:13] Assembly optimizations on if available.
[11:05:13] Entering M.D.
[11:05:19] Using Gromacs checkpoints
[11:05:20] mdrun returned 255


Joe_H wrote:You can avoid corrupting the checkpoint by looking for it in the work folder and checking its modification time before you shut down. Just make sure it is a minute or so in the past. You can also check the checkpoint frequency; about every 15 minutes is a good value. It has been a while since I read a V6 log, so I don't recall whether it gets listed there.

Yeah, there are about 16-17 minutes between each "Completed ..... out of ..... " statement in the log. I'll take heed of that next time I switch off.

Joe_H wrote:2 - It doesn't really cause a problem, as you can use "Open with", but as I recall you can change the default app for opening the .dat files using Get Info. Just don't do it for all .dat files on the system if that option is listed. It got that way because of an issue with the VLC installer; I don't know if that is still present in current versions of VLC.

Which apps would actually be appropriate should I ever feel like opening these .dat files (just to see what they contain)? I have the latest version of VLC for my OS.

Re: Incomplete work unit?

Postby Joe_H » Tue Apr 17, 2012 10:49 pm

Penfold wrote:Yeah, there are about 16-17 minutes between each "Completed ..... out of ..... " statement in the log. I'll take heed of that next time I switch off.


That records the completion of each frame (1%) of the simulation. Checkpoints can be written in between, on the 15 minute mark for instance. I would have to check, but the last time I installed V6 it had a default checkpoint every 5 minutes. So in the work folder itself you can watch the file modification time of the checkpoint, and use that to know when one was last written. It was easier in the PPC V5 client: if the verbosity level was set high enough, it listed in the terminal screen when it was writing a checkpoint.
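To keep frame times and checkpoint times apart, the spacing of the "Completed" lines can be measured straight from the log's `[HH:MM:SS]` timestamps. This is a rough sketch only: the function name `gaps_between` is invented here, and the exact wording of the "Completed" lines is an assumption, since the thread only paraphrases them. The result is the frame spacing, not the checkpoint interval.

```python
import re
from datetime import timedelta

# Matches the "[HH:MM:SS] message" layout seen in the log excerpts in this thread.
STAMP = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\]\s?(.*)$")


def gaps_between(lines, keyword="Completed"):
    """Return the time gaps between consecutive log lines containing keyword."""
    times = []
    for line in lines:
        match = STAMP.match(line.strip())
        if match and keyword in match.group(4):
            hours, minutes, seconds = (int(match.group(i)) for i in (1, 2, 3))
            times.append(timedelta(hours=hours, minutes=minutes, seconds=seconds))
    gaps = []
    for earlier, later in zip(times, times[1:]):
        gap = later - earlier
        if gap < timedelta(0):
            # The stamps carry no date, so a negative gap is assumed
            # to mean the clock passed midnight exactly once.
            gap += timedelta(days=1)
        gaps.append(gap)
    return gaps
```

Reading the file with `open("FAHlog.txt").readlines()` and passing the result in would give one gap per frame; the log file name here is assumed from the "FAHlog document" mentioned earlier in the thread.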

Penfold wrote:Which apps would actually be appropriate should I ever feel like opening these .dat files (just to see what they contain)? I have the latest version of VLC for my OS.


It has been a while, but I recall from opening some of these .dat files in TextEdit, just for reading, that they contain binary information. I don't remember anything readable in them.

Also, looking at the log fragment, it does look like a bad checkpoint leading to immediate failure.
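The pattern in that fragment can be picked out mechanically. Below is a small sketch, with the invented helper names `summarize` and `looks_like_failed_resume`, that pulls three values from the log lines quoted in this thread: the core shutdown status, the `mdrun` return code, and the work fraction. A non-zero return code together with a zero work fraction means the core quit before doing any steps. That is consistent with a bad checkpoint, but on its own it does not prove one.

```python
import re

# Patterns taken from the log lines quoted earlier in this thread.
SHUTDOWN = re.compile(r"Folding@home Core Shutdown:\s*(\w+)")
MDRUN = re.compile(r"mdrun returned (-?\d+)")
FRACTION = re.compile(r"Work fraction=([0-9.]+)")


def summarize(lines):
    """Collect the shutdown status, mdrun return code and work fraction."""
    info = {"shutdown": None, "mdrun_code": None, "work_fraction": None}
    for line in lines:
        match = SHUTDOWN.search(line)
        if match:
            info["shutdown"] = match.group(1)
        match = MDRUN.search(line)
        if match:
            info["mdrun_code"] = int(match.group(1))
        match = FRACTION.search(line)
        if match:
            info["work_fraction"] = float(match.group(1))
    return info


def looks_like_failed_resume(info):
    """True when mdrun failed and no work was done after the restart."""
    return info["mdrun_code"] not in (None, 0) and info["work_fraction"] == 0.0
```

Run over the two fragments posted above, this reports `UNSTABLE_MACHINE`, a return code of 255 and a work fraction of 0.0.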

Re: Incomplete work unit?

Postby Penfold » Wed Apr 18, 2012 10:05 am

Thanks again, Joe.

