Page 3 of 3

Re: Feature Request: Pause at next checkpoint

Posted: Sun Apr 12, 2020 7:29 pm
by foldinghomealone2
iceman1992 wrote:
foldinghomealone2 wrote:Therefore my opinion is to fold with power efficient hardware only and to fold dedicated. (not to be understood as to fold with dedicated rigs).
With dedicated I mean that you take a WU and you fold it as fast as possible and return it.
Okay, but that's a bit of a paradox. By your definition, folding dedicated is not dedicated rigs, but if someone uses the machine while it's folding, it will slow the progress down, depending on what they're doing it can almost stop the progress completely. So nobody should use the machine while it's folding. That makes it a sort-of dedicated rig.
dedication
/dɛdɪˈkeɪʃ(ə)n/
noun
1.
the quality of being dedicated or committed to a task or purpose.

I have a GPU because I game.
But I won't game and fold at the same time (it slows down both processes) and I finish a WU first, then I start to game.
I don't start folding, then pause, then game, then start over folding.

Re: Feature Request: Pause at next checkpoint

Posted: Sun Apr 12, 2020 8:15 pm
by Knish
I'm guessing u haven't seen the movie Lucky Number Slevin

Re: Feature Request: Pause at next checkpoint

Posted: Sun Apr 12, 2020 8:31 pm
by foldinghomealone2
Kansas City Shuffle.
However I can't follow you.

Re: Feature Request: Pause at next checkpoint

Posted: Sun Apr 12, 2020 10:31 pm
by PantherX
The current limitation, AFAIK, is the lack of a way to tell fast GPUs from slow GPUs. Currently, the system only knows the GPU architecture and that's it. Maybe in the future, the F@H Servers can identify the exact GPU model and assign a WU that is best suited for its speed. That would be a win-win situation where the fast GPUs get huge proteins to fold while the slow GPUs get smaller proteins to fold.

Re: Feature Request: Pause at next checkpoint

Posted: Sun Jul 05, 2020 5:02 am
by bruce
uyaem wrote:
iceman1992 wrote:
foldinghomealone2 wrote:Hence my! general recommendation: never ever pause a GPU slot.
Yeah no that's not realistic
A log notification about hitting a savepoint would be cool. :)
A new feature in the most recent versions of FAHCore_22 is a notification at the beginning of the WU saying:

Code:

22:04:06:WU00:FS02:0x22:  Checkpoint write interval: xxx00 steps (5%) [20 total]
Although the reported values will change based on what the PI has set for the project, at least I know it's every 5% for this project, and I can estimate the time until the beginning of the next 5% interval.
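That estimate is simple arithmetic. Here is a minimal Python sketch, assuming the log line has the shape shown above and that you measure the time per percent yourself from the progress lines in the log. The function names and the sample numbers are illustrative and not part of FAHClient.

```python
import math
import re

# Matches the percentage in a line shaped like the FAHCore_22 message above.
# The exact log format is an assumption based on that single example.
INTERVAL_RE = re.compile(r"Checkpoint write interval: \S+ steps \((\d+(?:\.\d+)?)%\)")


def checkpoint_interval_percent(log_line):
    """Return the checkpoint interval in percent, or None if the line does not match."""
    match = INTERVAL_RE.search(log_line)
    return float(match.group(1)) if match else None


def minutes_until_next_checkpoint(progress_percent, interval_percent, minutes_per_percent):
    """Estimate the minutes left until the next checkpoint boundary is reached."""
    # Next multiple of the interval strictly above the current progress.
    next_boundary = (math.floor(progress_percent / interval_percent) + 1) * interval_percent
    next_boundary = min(next_boundary, 100.0)
    return (next_boundary - progress_percent) * minutes_per_percent


if __name__ == "__main__":
    line = "22:04:06:WU00:FS02:0x22:  Checkpoint write interval: xxx00 steps (5%) [20 total]"
    interval = checkpoint_interval_percent(line)
    # Hypothetical WU at 12% that takes about 2 minutes per percent.
    print(minutes_until_next_checkpoint(12, interval, 2.0))
```

With a 5% interval, a WU at 12% reaches its next checkpoint at 15%, so at 2 minutes per percent the wait is about 6 minutes.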

Re: Feature Request: Pause at next checkpoint

Posted: Sun Jul 05, 2020 5:38 am
by JohnChodera
Great idea! We'd need to add something to the client to instruct the cores when to stop.

Could someone submit this to https://github.com/foldingathome/fah-issues/issues so we can get that into the queue for client features?

> A log notification about hitting a savepoint would be cool. :)

That's easy enough to add to core22! I'll see if I can add that in 0.0.12.

~ John Chodera // MSKCC

Re: Feature Request: Pause at next checkpoint

Posted: Wed Jul 08, 2020 10:08 am
by Knish
already done back on page 2 i believe viewtopic.php?f=16&t=34239&start=15#p325121

Re: Feature Request: Pause at next checkpoint

Posted: Wed Jul 08, 2020 10:23 am
by ajm
Why not just write a checkpoint whenever a slot or a kit is paused?

Re: Feature Request: Pause at next checkpoint

Posted: Wed Jul 08, 2020 6:25 pm
by Frogging101
ajm wrote:Why not just write a checkpoint whenever a slot or a kit is paused?
The CPU cores do write when they are paused or are otherwise gracefully shut down.

The GPU cores, as I understand it, operate differently. The CPU sends work to the GPU in large "chunks" (there's probably a correct term for this, but I don't know it), and reads the output from each chunk when the GPU finishes processing it. And a "chunk" is either processed in full, or it isn't. When a GPU slot is paused and the core is shut down, the current "chunk" that the GPU is working on is abandoned. AFAIK, this is just how GPU compute works; it's most efficient when it can run an algorithm in parallel on a large amount of input at once. It doesn't run piecemeal bits of data back and forth with the CPU.

Essentially, with GPU processing, there's more work "in flight" at a given time, so if it gets cancelled, more work is lost. That's a tradeoff of GPU computing.

Note: This is just my understanding of how GPU computing works. I'm a software engineer, but I haven't done any in-depth work in this area. Please correct me if I got things wrong :)

Re: Feature Request: Pause at next checkpoint

Posted: Wed Jul 08, 2020 9:29 pm
by bruce
GPU "chunks" = kernels.

FAHCores cannot write a checkpoint at an arbitrary point in the procedure, so the moment you decide to pause may well be a bad time. Checkpoint frequencies can be set by the project designer.

Both GROMACS (Core_a7) and OpenMM (Core_2x) will back up to the previous suitable break-point.
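To put a number on the work lost when a pause lands between checkpoints, here is a toy Python model. It assumes a fixed checkpoint interval measured in steps and that the core resumes from the most recent checkpoint. The step counts are made up for illustration, and nothing here reflects how the cores are actually implemented.

```python
def resume_step(paused_at_step, checkpoint_interval_steps):
    """Step the core restarts from: the last checkpoint at or before the pause."""
    return (paused_at_step // checkpoint_interval_steps) * checkpoint_interval_steps


def lost_steps(paused_at_step, checkpoint_interval_steps):
    """Steps that have to be recomputed after resuming."""
    return paused_at_step - resume_step(paused_at_step, checkpoint_interval_steps)


if __name__ == "__main__":
    # Hypothetical WU: checkpoint every 50,000 steps, paused at step 137,500.
    print(resume_step(137_500, 50_000))  # 100000
    print(lost_steps(137_500, 50_000))   # 37500
```

Pausing right after a checkpoint loses almost nothing, while pausing just before the next one loses nearly a full interval.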

Re: Feature Request: Pause at next checkpoint

Posted: Sun Aug 23, 2020 7:55 pm
by calkapokole
On Windows as a workaround I use a simple batch script which searches for checkpoints and displays their modification dates:

Code:

@echo off
setlocal enableextensions

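:: Quoted assignment keeps special characters in the profile path (e.g. &) from breaking the line
set "fah_path=%HOMEDRIVE%%HOMEPATH%\AppData\Roaming\FAHClient"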

if "%~1" == "/?" goto help

if "%~1" == "" (
    call :searchForCheckpoints "%fah_path%\work"
) else (
    if exist "%fah_path%\work\%~1" (
        call :searchForCheckpoints "%fah_path%\work\%~1"
    ) else goto notexist
)
exit /b

:notexist
echo Work Queue with ID '%~1' does not exist.
echo(

:help
echo %0 [work_queue_id]
goto :eof

:searchForCheckpoints
:: %1 - path where search will be performed
setlocal
dir /o:-d /s "%~1\check*"
echo(
dir /o:-d /s "%~1\*.cpt"
echo(
dir /o:-d /s "%~1\*.ckp"
endlocal
goto :eof
If a checkpoint was saved recently, then I manually pause the slot, otherwise I wait and use the script every few minutes to check if a new checkpoint has been made.
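
For anyone not on Windows, the same idea can be sketched in Python. This is only an illustration: it reuses the three file patterns from the batch script above, the function names are made up, and you have to pass in the work directory of your own installation. Note that the pattern match is case-sensitive on Linux, unlike `dir` on Windows.

```python
import time
from pathlib import Path

# Same file patterns the batch script searches for.
CHECKPOINT_PATTERNS = ("check*", "*.cpt", "*.ckp")


def newest_checkpoint(work_dir):
    """Return (path, mtime) of the most recently modified checkpoint file, or None."""
    candidates = [
        path
        for pattern in CHECKPOINT_PATTERNS
        for path in Path(work_dir).rglob(pattern)
        if path.is_file()
    ]
    if not candidates:
        return None
    newest = max(candidates, key=lambda p: p.stat().st_mtime)
    return newest, newest.stat().st_mtime


def checkpoint_age_seconds(work_dir, now=None):
    """Seconds since the newest checkpoint was written, or None if there is none."""
    found = newest_checkpoint(work_dir)
    if found is None:
        return None
    now = time.time() if now is None else now
    return now - found[1]
```

Calling `checkpoint_age_seconds()` on the work directory every few minutes gives the same information as reading the modification dates by hand: a small value means a checkpoint was just written and it is a good moment to pause.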