Why lose fractional progress after pausing or changing power?

PFM
Posts: 38
Joined: Sat Jan 03, 2009 3:14 am
Hardware configuration: MacBook Air 2020 i5 Quad Core
Lenovo SL410 Intel P9600 Core 2 Duo
Location: Bay Area, USA

Post by PFM »

I noticed that whenever I pause and restart the folding process, or change the power level slider, the fractional progress is lost and rounds down to the last whole percentage. For example, if progress is at 37.89% and I pause and resume folding, the laptop switches to battery power, or I move the slider up or down, the progress resets to 37%.
Why not just continue from 37.89%? After all, the client has not been shut down; it still has all the work live in its session. Even if it were shut down, it could save the work and continue from exactly where it left off without imposing a penalty.
It's a problem because a WU can run for several hours or days, with several pauses for a variety of reasons. With, say, 5-10 pauses, you have lost a good 5-10% of the work right there. Not too efficient either.
I recommend fixing this in the next update.
Thanks.
--
Joe_H
Site Admin
Posts: 7856
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Post by Joe_H »

When processing is stopped, the state of the WU is not saved; the restart is from the last checkpoint. A checkpoint is written at a point where the data is consistent across the whole system being simulated. State captured at an arbitrary stopping point is likely to be inconsistent, and the calculations can't resume from it.
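To make the checkpoint idea concrete, here is a minimal Python sketch. It is a toy model under stated assumptions, not how the folding cores are implemented: `Checkpoint`, `advance`, and `run` are hypothetical names, and the "simulation" is just an integer update. It shows that a run interrupted between checkpoints redoes the steps since the last checkpoint and still reaches the same final result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    """A consistent snapshot: the step number and the state at that step."""
    step: int
    value: int


def advance(value: int) -> int:
    """One toy 'simulation' step (hypothetical stand-in for real work)."""
    return (value * 31 + 7) % 1_000_003


def run(start: Checkpoint, stop_at: int, checkpoint_every: int) -> tuple[int, Checkpoint]:
    """Run from `start` up to step `stop_at`.

    Returns the live value at `stop_at` and the most recent checkpoint.
    Only the checkpoint survives a stop; the live value is discarded.
    """
    value, last = start.value, start
    for step in range(start.step + 1, stop_at + 1):
        value = advance(value)
        if step % checkpoint_every == 0:
            last = Checkpoint(step, value)
    return value, last


begin = Checkpoint(step=0, value=1)

# Uninterrupted run of 100 steps, checkpointing every 25 steps.
final_uninterrupted, _ = run(begin, 100, 25)

# Interrupted at step 60: the last checkpoint is at step 50,
# so steps 51-60 are lost and must be recomputed after the restart.
_, saved = run(begin, 60, 25)
final_resumed, _ = run(saved, 100, 25)

print(saved.step)
print(final_resumed == final_uninterrupted)
```

In this sketch the work between step 50 and step 60 is repeated, which is the same kind of loss seen as a drop in the displayed percentage.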

Part of what you are missing is that the calculations are not done by the client, but by a separate process started by the client. The client just handles downloading WUs, setting them up to be processed, and returning the completed data, in addition to starting and stopping the processing. If you check using Task Manager or similar tools on other OSs, you can see this. FAHClient runs in the background and uses very little CPU time. When a WU is downloaded you will see the network and disk I/O, and then a folding core will be launched with a wrapper process to pass parameters: FAHCore_nn and FAHCoreWrapper. They communicate with FAHClient over the local loopback network connection to pass the log information and errors, and to receive stop commands or report that the WU has completed processing. At the end the packaged WU is sent back to the servers by FAHClient.
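The client/core split described above can be sketched in Python. This is only an illustration under stated assumptions: the worker script, the `run_client` function, and the text commands (`progress N`, `continue`, `stop`, `stopped`) are all made up for the example and are not the real FAHClient or FAHCore protocol. A "client" starts a separate worker process, receives its log lines over the loopback interface, and sends it a stop command.

```python
import socket
import subprocess
import sys

# Hypothetical stand-in for a folding core: it connects back to the client,
# reports progress lines, and exits when the client tells it to stop.
WORKER_SOURCE = r"""
import socket, sys
port = int(sys.argv[1])
with socket.create_connection(("127.0.0.1", port)) as conn:
    stream = conn.makefile("rw", newline="\n")
    for step in range(1, 1000):
        stream.write(f"progress {step}\n")
        stream.flush()
        if stream.readline().strip() == "stop":
            stream.write("stopped\n")
            stream.flush()
            break
"""


def run_client(stop_after: int) -> list[str]:
    """Launch the worker, collect its log lines, and stop it after `stop_after` reports."""
    log: list[str] = []
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.listen(1)
        server.settimeout(10)  # do not wait forever if the worker fails to start
        port = server.getsockname()[1]
        worker = subprocess.Popen([sys.executable, "-c", WORKER_SOURCE, str(port)])
        conn, _ = server.accept()
        with conn, conn.makefile("rw", newline="\n") as stream:
            while True:
                line = stream.readline().strip()
                if not line:  # worker closed the connection
                    break
                log.append(line)
                if line == "stopped":
                    break
                command = "stop" if len(log) >= stop_after else "continue"
                stream.write(command + "\n")
                stream.flush()
        worker.wait(timeout=10)
    return log
```

Calling `run_client(3)` collects three progress lines followed by the worker's `stopped` acknowledgement. The heavy work lives in the child process, while the parent only relays commands and log lines.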

Post by PFM »

Joe_H wrote: Mon Jan 02, 2023 5:24 am
When processing is stopped the state of the WU is not saved, the restart is at the last checkpoint. The checkpoint is done in a manner that has consistent data across the whole system being simulated. Stopping at a random spot is likely to be inconsistent and the calculations can't start from that.
In that case, if the checkpoint interval is much shorter than the TPF (time per frame), I would expect it to restart from the last saved checkpoint, but it still goes back to the last full percentage point before starting.
The WU I am folding right now has a TPF of 27 minutes, and I have set the checkpoint frequency to 6 minutes. Yet when I paused at 18.76% and restarted, it went back to 18%. I would have expected it to restart at around 18.6%.
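The expectation above can be worked through as arithmetic. The Python sketch below assumes that one frame equals 1% of the WU and that checkpoints are written at fixed time intervals counted from the start of the current frame; both are assumptions made for illustration, and `expected_restart_percent` is a hypothetical helper. It only reproduces the expected figure and does not describe what the folding core does.

```python
import math


def expected_restart_percent(progress_percent: float,
                             tpf_minutes: float,
                             checkpoint_minutes: float) -> float:
    """Progress at the last time-based checkpoint before `progress_percent`.

    Assumes 1 frame = 1% and checkpoints aligned to the start of the frame.
    """
    whole = math.floor(progress_percent)
    minutes_into_frame = (progress_percent - whole) * tpf_minutes
    checkpoints_done = math.floor(minutes_into_frame / checkpoint_minutes)
    return whole + checkpoints_done * checkpoint_minutes / tpf_minutes


# 18.76% with a 27-minute TPF is about 20.5 minutes into the frame.
# With 6-minute checkpoints, the last one would be at 18 minutes,
# which is 18/27 of a frame, or roughly 18.67%.
print(round(expected_restart_percent(18.76, 27, 6), 2))
```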

Post by Joe_H »

Checkpoint frequency also depends on whether you are talking about a WU being processed on your CPU or your GPU. The code used in the CPU folding core has more opportunities to create a valid checkpoint, so the checkpoint interval is controlled by the setting in FAHControl. The default is every 15 minutes. A checkpoint is created at the cycle through the data closest to that setting.

The GPU folding core has its checkpoint frequency set by the researcher. At the same time as the checkpoint, a sanity check is run on the data using your CPU, and other data is stored for upload with the WU, to be used during analysis of the returned WUs. That frequency is generally every 2-5% of progress, and you cannot alter it using the setting in FAHControl. The creation of a checkpoint is noted in your log file.
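As a rough illustration of progress-based checkpoints, the Python sketch below computes where a restart would land and how much work is repeated. The function names and the pause points are hypothetical, and the 2% and 5% intervals are simply the two ends of the "generally every 2-5%" range; the actual interval for any given WU is chosen by the researcher.

```python
def restart_percent(progress_percent: float, interval_percent: float) -> float:
    """Last checkpoint at or below `progress_percent`, for checkpoints every `interval_percent`."""
    completed = int(progress_percent // interval_percent)
    return completed * interval_percent


def lost_work(pause_points: list[float], interval_percent: float) -> float:
    """Total progress (in percentage points) repeated across several pauses."""
    return sum(p - restart_percent(p, interval_percent) for p in pause_points)


# A pause at 18.76% restarts from 18% with 2% checkpoints, or from 15% with 5% checkpoints.
print(restart_percent(18.76, 2))
print(restart_percent(18.76, 5))

# Hypothetical pauses during one WU, with 5% checkpoints.
print(round(lost_work([18.76, 41.3, 77.9], 5), 2))
```

Pausing just after a checkpoint line appears in the log keeps the repeated work small.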