Losing progress after pausing - is this normal?

If you're new to FAH and need help getting started or you have very basic questions, start here.

Moderators: Site Moderators, FAHC Science Team

flu2146
Posts: 3
Joined: Thu Sep 25, 2014 3:49 am

Losing progress after pausing - is this normal?

Post by flu2146 »

If I pause or switch the amount of power I want my computer to dedicate, the progress bar (on a pretty large WU) will decrease by a few percent. In the log, the number of steps will go down as well. Compare the beginning of the log to the end.

03:27:57:WU00:FS00:0xa4:Completed 45000 out of 1500000 steps  (3%)
03:38:34:WU00:FS00:0xa4:Completed 60000 out of 1500000 steps  (4%)
03:39:27:FS00:Paused
03:39:27:FS00:Shutting core down
03:39:32:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
03:39:38:Saving configuration to config.xml
03:39:38:<config>
03:39:38:  <!-- Slot Control -->
03:39:38:  <power v='medium'/>
03:39:38:
03:39:38:  <!-- User Information -->
03:39:38:  <passkey v='********************************'/>
03:39:38:  <user v='flu2146'/>
03:39:38:
03:39:38:  <!-- Folding Slots -->
03:39:38:  <slot id='0' type='CPU'>
03:39:38:    <paused v='true'/>
03:39:38:  </slot>
03:39:38:</config>
03:44:39:FS00:Unpaused
03:44:39:WU00:FS00:Starting
03:44:39:WU00:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/Fred/AppData/Roaming/FAHClient/cores/web.stanford.edu/~pande/Win32/AMD64/Core_a4.fah/FahCore_a4.exe -dir 00 -suffix 01 -version 704 -lifeline 3128 -checkpoint 15 -np 7
03:44:39:WU00:FS00:Started FahCore on PID 9520
03:44:39:WU00:FS00:Core PID:5904
03:44:39:WU00:FS00:FahCore 0xa4 started
03:44:40:WU00:FS00:0xa4:
03:44:40:WU00:FS00:0xa4:*------------------------------*
03:44:40:WU00:FS00:0xa4:Folding@Home Gromacs GB Core
03:44:40:WU00:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
03:44:40:WU00:FS00:0xa4:
03:44:40:WU00:FS00:0xa4:Preparing to commence simulation
03:44:40:WU00:FS00:0xa4:- Looking at optimizations...
03:44:40:WU00:FS00:0xa4:- Files status OK
03:44:40:WU00:FS00:0xa4:- Expanded 2053747 -> 5365960 (decompressed 261.2 percent)
03:44:40:WU00:FS00:0xa4:Called DecompressByteArray: compressed_data_size=2053747 data_size=5365960, decompressed_data_size=5365960 diff=0
03:44:40:WU00:FS00:0xa4:- Digital signature verified
03:44:40:WU00:FS00:0xa4:
03:44:40:WU00:FS00:0xa4:Project: 7808 (Run 4, Clone 342, Gen 66)
03:44:40:WU00:FS00:0xa4:
03:44:40:WU00:FS00:0xa4:Assembly optimizations on if available.
03:44:40:WU00:FS00:0xa4:Entering M.D.
03:44:43:Saving configuration to config.xml
03:44:43:<config>
03:44:43:  <!-- Slot Control -->
03:44:43:  <power v='medium'/>
03:44:43:
03:44:43:  <!-- User Information -->
03:44:43:  <passkey v='********************************'/>
03:44:43:  <user v='flu2146'/>
03:44:43:
03:44:43:  <!-- Folding Slots -->
03:44:43:  <slot id='0' type='CPU'/>
03:44:43:</config>
03:44:46:WU00:FS00:0xa4:Using Gromacs checkpoints
03:44:46:WU00:FS00:0xa4:Mapping NT from 7 to 7 
03:44:46:WU00:FS00:0xa4:Resuming from checkpoint
03:44:46:WU00:FS00:0xa4:Verified 00/wudata_01.log
03:44:46:WU00:FS00:0xa4:Verified 00/wudata_01.trr
03:44:46:WU00:FS00:0xa4:Verified 00/wudata_01.xtc
03:44:46:WU00:FS00:0xa4:Verified 00/wudata_01.edr
03:44:46:WU00:FS00:0xa4:Completed 42250 out of 1500000 steps  (2%)
03:46:44:WU00:FS00:0xa4:Completed 45000 out of 1500000 steps  (3%)
Joe_H
Site Admin
Posts: 7856
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Losing progress after pausing - is this normal?

Post by Joe_H »

Yes, this is normal. After pausing, the client restarts from the last checkpoint written by the folding core. For CPU WUs such as the Core_A4 one in your log, the default checkpoint frequency is every 15 minutes, so at most you will redo that much processing.

For current GPU folding cores, the checkpoint frequency is set by the researcher running the project. Typically it is set between 2 and 5% of progress.
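
If you want to put a number on how much work a pause cost, you can compare the step counts before and after the restart. This is only a rough sketch: it assumes the log lines keep the "Completed N out of M steps" wording seen in the log above, and `find_rollbacks` is a name made up for this example, not part of the client.

```python
import re

# Matches progress lines such as "Completed 60000 out of 1500000 steps  (4%)".
STEP_LINE = re.compile(r"Completed (\d+) out of (\d+) steps")


def find_rollbacks(log_text):
    """Return (steps_before, steps_after, total_steps) for every drop in the step count."""
    rollbacks = []
    previous = None
    for line in log_text.splitlines():
        match = STEP_LINE.search(line)
        if not match:
            continue
        done, total = int(match.group(1)), int(match.group(2))
        if previous is not None and done < previous:
            rollbacks.append((previous, done, total))
        previous = done
    return rollbacks


# Four lines copied from the log posted above.
sample = """\
03:38:34:WU00:FS00:0xa4:Completed 60000 out of 1500000 steps  (4%)
03:39:27:FS00:Paused
03:44:46:WU00:FS00:0xa4:Resuming from checkpoint
03:44:46:WU00:FS00:0xa4:Completed 42250 out of 1500000 steps  (2%)
"""

for before, after, total in find_rollbacks(sample):
    lost = before - after
    print(f"Rolled back {lost} steps ({100 * lost / total:.2f}% of the WU)")
```

The core only logs progress at whole-percent marks, so the real loss can be somewhat larger than the figure this reports.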

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
flu2146
Posts: 3
Joined: Thu Sep 25, 2014 3:49 am

Re: Losing progress after pausing - is this normal?

Post by flu2146 »

Well, that's frustrating. Is there any way to determine an optimal place/time to pause?
Joe_H
Site Admin
Posts: 7856
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Losing progress after pausing - is this normal?

Post by Joe_H »

If you examine the work folder inside the F@H data folder on your system, you can see the modification times of the current and previous checkpoint files. If you pause about a minute after the current checkpoint file is written, you can be sure its entire contents have been flushed from the write cache to your drive.
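
If you would rather not keep checking the folder by hand, a small script can list the most recently modified files for you. This is a minimal sketch, not an official tool: `newest_files` is a made-up helper, it does not know which file is the checkpoint (it just sorts everything by modification time), and you have to pass in the path to your own work folder.

```python
import time
from pathlib import Path


def newest_files(work_dir, limit=5):
    """Return (age_in_seconds, file_name) for the most recently modified files."""
    now = time.time()
    files = [p for p in Path(work_dir).iterdir() if p.is_file()]
    # Newest first, based on the modification time reported by the file system.
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return [(now - p.stat().st_mtime, p.name) for p in files[:limit]]


def print_report(work_dir):
    """Print how many minutes ago each of the newest files was written."""
    for age, name in newest_files(work_dir):
        print(f"{age / 60:6.1f} min ago  {name}")
```

Call `print_report` with the path to the slot's work folder. If the newest file is a minute or two old, that should be a reasonable moment to pause.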

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
flu2146
Posts: 3
Joined: Thu Sep 25, 2014 3:49 am

Re: Losing progress after pausing - is this normal?

Post by flu2146 »

Ok, thank you for your help!
czeski
Posts: 11
Joined: Sat Oct 27, 2012 4:59 pm

Re: Losing progress after pausing - is this normal?

Post by czeski »

It would be nice if the new cores being developed, like core17 and core18 (and preferably updates to the old ones: a3, a4, core15), would report in the logs when they have successfully written checkpoint files. This would make it possible in the future to implement a GUI option to pause a WU after the next checkpoint.

It doesn't seem like it would take much work for a developer to add unified lines to the core output, like:
Writing checkpoint at xx%
Checkpoint at xx% written and verified successfully

The GUI could show a timer for when the last checkpoint was written, and maybe even the average time between checkpoints, so you can see whether you should pause the WU immediately or wait for the next checkpoint.
This would really help users who are not 24h folders.
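
To illustrate the idea, here is a rough sketch of how a front end could use such lines. Everything in it is hypothetical: the cores do not print "Checkpoint at xx% written and verified successfully" today, and `checkpoint_times`, `advise` and the two-minute threshold are invented for this example.

```python
import re
from datetime import datetime, timedelta

# Hypothetical log line, taken from the suggestion above. No core prints this today.
CHECKPOINT_LINE = re.compile(
    r"^(\d\d:\d\d:\d\d):.*Checkpoint at (\d+)% written and verified successfully"
)


def checkpoint_times(log_lines):
    """Collect the timestamps of the (hypothetical) checkpoint lines."""
    times = []
    for line in log_lines:
        match = CHECKPOINT_LINE.match(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%H:%M:%S"))
    return times


def advise(times, now, max_loss=timedelta(minutes=2)):
    """Suggest pausing now or waiting, based on time since the last checkpoint."""
    if not times:
        return "no checkpoint seen yet"
    since_last = now - times[-1]
    if since_last <= max_loss:
        return "pause now"
    if len(times) < 2:
        return "wait for the next checkpoint (interval unknown)"
    # Average spacing between the checkpoints seen so far.
    average = (times[-1] - times[0]) / (len(times) - 1)
    remaining = max(average - since_last, timedelta(0))
    return f"wait about {remaining} for the next checkpoint"


log = [
    "03:15:00:WU00:FS00:0xa4:Checkpoint at 5% written and verified successfully",
    "03:30:00:WU00:FS00:0xa4:Checkpoint at 10% written and verified successfully",
]
print(advise(checkpoint_times(log), datetime.strptime("03:40:00", "%H:%M:%S")))
```

The sketch ignores logs that run past midnight, since the client timestamps carry no date.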
7im
Posts: 10189
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona

Re: Losing progress after pausing - is this normal?

Post by 7im »

Most core 17 and 18 projects write checkpoints every 5%: 5, 10, 15, etc. It's a known quantity.
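
As a quick worked example of that arithmetic, assuming a project really does checkpoint at exact multiples of 5% (the function name and the default interval are only for illustration):

```python
def progress_after_resume(percent_done, checkpoint_every=5):
    """Progress (in %) you fall back to if checkpoints land on multiples of checkpoint_every."""
    return (percent_done // checkpoint_every) * checkpoint_every


for paused_at in (7, 14, 15):
    resume = progress_after_resume(paused_at)
    print(f"paused at {paused_at}% -> resumes at {resume}% (redo {paused_at - resume}%)")
```

So pausing shortly after a multiple of 5% loses the least work.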

It's a good idea, but development on the cores you listed has all but stopped. Maybe their replacements will do better.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.