Feature Request: Pause at next checkpoint

Moderators: Site Moderators, FAHC Science Team

foldinghomealone2
Posts: 148
Joined: Sun Jul 30, 2017 8:40 pm

Re: Feature Request: Pause at next checkpoint

Post by foldinghomealone2 »

iceman1992 wrote:
foldinghomealone2 wrote:Therefore my opinion is to fold with power efficient hardware only and to fold dedicated. (not to be understood as to fold with dedicated rigs).
With dedicated I mean that you take a WU and you fold it as fast as possible and return it.
Okay, but that's a bit of a paradox. By your definition, folding dedicated does not mean dedicated rigs, but if someone uses the machine while it's folding, it will slow the progress down; depending on what they're doing, it can almost stop the progress completely. So nobody should use the machine while it's folding. That makes it a sort-of dedicated rig.
dedication
/dɛdɪˈkeɪʃ(ə)n/
noun
1.
the quality of being dedicated or committed to a task or purpose.

I have a GPU because I game.
But I won't game and fold at the same time (it slows down both processes). I finish a WU first, then I start to game.
I don't start folding, then pause, then game, then start over folding.
Knish
Posts: 232
Joined: Tue Mar 17, 2020 5:20 am

Re: Feature Request: Pause at next checkpoint

Post by Knish »

I'm guessing you haven't seen the movie Lucky Number Slevin
foldinghomealone2
Posts: 148
Joined: Sun Jul 30, 2017 8:40 pm

Re: Feature Request: Pause at next checkpoint

Post by foldinghomealone2 »

Kansas City Shuffle.
However, I can't follow you.
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud

Re: Feature Request: Pause at next checkpoint

Post by PantherX »

The current limitation, AFAIK, is the inability to distinguish fast GPUs from slow GPUs. Currently, the system only knows the GPU architecture and that's it. Maybe in the future, the F@H Servers can identify the exact GPU model and assign a WU that is best suited for its speed. That would be a win-win situation where the fast GPUs get huge proteins to fold while the slow GPUs get smaller proteins to fold.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Feature Request: Pause at next checkpoint

Post by bruce »

uyaem wrote:
iceman1992 wrote:
foldinghomealone2 wrote:Hence my! general recommendation: never ever pause a GPU slot.
Yeah no that's not realistic
A log notification about hitting a savepoint would be cool. :)
A new feature in the most recent versions of FAHCore_22 is a notification at the beginning of the WU saying:

Code: Select all

22:04:06:WU00:FS02:0x22:  Checkpoint write interval: xxx00 steps (5%) [20 total]
Although the reported values will change based on what the PI has set for their project, at least I know it's every 5% for this project, and I can estimate the time until the beginning of the next 5% interval.
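For anyone who wants to automate that estimate, here is a minimal Python sketch. It parses the log line quoted above and works out how long until the next checkpoint boundary. The regular expression is based only on that one sample line (the step count is masked as "xxx00" there, so it is not parsed), and `progress_percent` and `minutes_per_percent` are hypothetical inputs you would read off the client yourself. It also assumes checkpoints land exactly on multiples of the interval.

```python
import math
import re

# Pattern derived from the single sample line in this thread; other core
# versions may format the line differently (assumption).
CHECKPOINT_RE = re.compile(
    r"Checkpoint write interval: \S+ steps "
    r"\((?P<percent>\d+(?:\.\d+)?)%\) \[(?P<total>\d+) total\]"
)


def parse_checkpoint_interval(log_line):
    """Return (interval_percent, total_checkpoints), or None if no match."""
    match = CHECKPOINT_RE.search(log_line)
    if match is None:
        return None
    return float(match.group("percent")), int(match.group("total"))


def minutes_to_next_checkpoint(progress_percent, interval_percent, minutes_per_percent):
    """Estimate minutes until the next checkpoint boundary.

    Assumes checkpoints fall exactly on multiples of interval_percent.
    """
    completed_intervals = math.floor(progress_percent / interval_percent)
    next_boundary = min((completed_intervals + 1) * interval_percent, 100.0)
    return (next_boundary - progress_percent) * minutes_per_percent


line = "22:04:06:WU00:FS02:0x22:  Checkpoint write interval: xxx00 steps (5%) [20 total]"
interval, total = parse_checkpoint_interval(line)
# Hypothetical WU at 37% that takes 2 minutes per percent.
print(minutes_to_next_checkpoint(37.0, interval, 2.0))
```

With a 5% interval, a WU at 37% has 3% to go until the 40% boundary, so at 2 minutes per percent the estimate is 6 minutes.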
JohnChodera
Pande Group Member
Posts: 470
Joined: Fri Feb 22, 2013 9:59 pm

Re: Feature Request: Pause at next checkpoint

Post by JohnChodera »

Great idea! We'd need to add something to the client to instruct the cores when to stop.

Could someone submit this to https://github.com/foldingathome/fah-issues/issues so we can get that into the queue for client features?

> A log notification about hitting a savepoint would be cool. :)

That's easy enough to add to core22! I'll see if I can add that in 0.0.12.

~ John Chodera // MSKCC
Knish
Posts: 232
Joined: Tue Mar 17, 2020 5:20 am

Re: Feature Request: Pause at next checkpoint

Post by Knish »

Already done back on page 2, I believe: viewtopic.php?f=16&t=34239&start=15#p325121
ajm
Posts: 754
Joined: Sat Mar 21, 2020 5:22 am
Location: Lucerne, Switzerland

Re: Feature Request: Pause at next checkpoint

Post by ajm »

Why not just write a checkpoint whenever a slot or a kit is paused?
Frogging101
Posts: 85
Joined: Wed Mar 25, 2020 2:39 am
Location: Canada

Re: Feature Request: Pause at next checkpoint

Post by Frogging101 »

ajm wrote:Why not just write a checkpoint whenever a slot or a kit is paused?
The CPU cores do write when they are paused or are otherwise gracefully shut down.

The GPU cores, as I understand it, operate differently. The CPU sends work to the GPU in large "chunks" (there's probably a correct term for this, but I don't know it), and reads the output from each chunk when the GPU finishes processing it. And a "chunk" is either processed in full, or it isn't. When a GPU slot is paused and the core is shut down, the current "chunk" that the GPU is working on is abandoned. AFAIK, this is just how GPU compute works; it's most efficient when it can run an algorithm in parallel on a large amount of input at once. It doesn't run piecemeal bits of data back and forth with the CPU.

Essentially, with GPU processing, there's more work "in flight" at a given time, so if it gets cancelled, more work is lost. That's a tradeoff of GPU computing.

Note: This is just my understanding of how GPU computing works. I'm a software engineer, but I haven't done any in-depth work in this area. Please correct me if I got things wrong :)
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Feature Request: Pause at next checkpoint

Post by bruce »

GPU "chunks" = kernels.

FAHCores cannot write a checkpoint at an arbitrary point in the procedure, so they cannot write one whenever you decide to pause; that could be a bad time. Checkpoint frequencies can be set by the project designer.

Both GROMACS (Core_a7) and OpenMM (Core_2x) will back up to the previous suitable break-point.
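To put numbers on what "backing up to the previous break-point" costs, here is a rough Python sketch. It is a simplified model, not how the cores are implemented: it assumes checkpoints fall exactly on multiples of a fixed step interval and that everything after the last checkpoint is redone on resume. The function names and the 50,000-step interval are made up for illustration.

```python
def resume_step(paused_at_step, checkpoint_interval_steps):
    """Step the run restarts from: the last checkpoint at or before the pause."""
    return (paused_at_step // checkpoint_interval_steps) * checkpoint_interval_steps


def steps_lost(paused_at_step, checkpoint_interval_steps):
    """Steps that have to be recomputed after resuming."""
    return paused_at_step - resume_step(paused_at_step, checkpoint_interval_steps)


# Hypothetical project that checkpoints every 50,000 steps.
INTERVAL = 50_000
for paused in (50_000, 74_999, 99_999):
    print(paused, resume_step(paused, INTERVAL), steps_lost(paused, INTERVAL))
```

Pausing right after a checkpoint loses nothing, while pausing just before the next one loses almost a full interval, which is why the timing of the pause matters.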
calkapokole
Posts: 80
Joined: Sun Nov 18, 2012 11:03 pm
Hardware configuration: Lenovo IdeaPad Y580: Chipset Intel HM76 | Socket G2 | BIOS 2.07
Display: 15.6" | 1920x1080 | LG LP156WF1-TLC1 | TN LED | glossy
CPU: Intel Core i7-3610QM | 2.3-3.3 GHz | 6 MB L3 | 22 nm | TDP 45 W
iGPU: Intel HD Graphics 4000 (GT2) | 22 nm | 16 Unified Shaders: 1100 Mhz
dGPU: NVIDIA GeForce GTX 660M (GK107) | 28 nm:
- 384 Unified Shaders: 835@1215 MHz (45.5% OC)
- 2 GB GDDR5 128 bit: 1000@1250 (5000 effective) MHz (25% OC)
RAM: Patriot | 16 GB | DDR3 1600 MHz | 11-11-11-28-1 | Dual Channel
SSD 1: Crucial MX200 | mSATA | 250 GB | Micron 16 nm 128 Gb MLC NAND
SSD 2: Crucial MX500 | 2.5" | 2 TB | Micron 256 Gb 64L 3D TLC NAND
HDD: WD Scorpio Black WD7500BPKT | 2.5" | 750 GB | 7200 RPM | 16 MB
ODD: Samsung SN-506BB | 4 MB | BD-RE XL
WiFi: Intel Centrino Wireless-N 2200
OS: Windows 10 Pro (x64) | ForceWare 425.31 WHQL
Cooler: Zalman ZM-NC2000
Location: Poland

Re: Feature Request: Pause at next checkpoint

Post by calkapokole »

On Windows, as a workaround, I use a simple batch script that searches for checkpoints and displays their modification dates:

Code: Select all

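@echo off
setlocal enableextensions

:: FAHClient data directory (adjust if yours is somewhere else).
set "fah_path=%HOMEDRIVE%%HOMEPATH%\AppData\Roaming\FAHClient"

:: Show usage when called with /?
if "%~1" == "/?" goto help

:: No argument: search all work queues. Otherwise search only the given queue ID.
if "%~1" == "" (
    call :searchForCheckpoints "%fah_path%\work"
) else (
    if exist "%fah_path%\work\%~1" (
        call :searchForCheckpoints "%fah_path%\work\%~1"
    ) else goto notexist
)
exit /b

:notexist
echo Work Queue with ID '%~1' does not exist.
echo(
:: Falls through to :help to print the usage line.

:help
echo %0 [work_queue_id]
goto :eof

:searchForCheckpoints
:: %1 - path where search will be performed
:: Lists matching checkpoint files recursively, newest first.
setlocal
dir /o:-d /s "%~1\check*"
echo(
dir /o:-d /s "%~1\*.cpt"
echo(
dir /o:-d /s "%~1\*.ckp"
endlocal
goto :eof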
If a checkpoint was saved recently, I pause the slot manually; otherwise I wait and rerun the script every few minutes to check whether a new checkpoint has been made.
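For non-Windows machines, the same idea can be sketched in Python. This is only an illustration that reuses the three file patterns from the batch script above; the function names are made up, and the work directory location is something you would have to supply yourself. Instead of listing every file, it reports the age of the newest checkpoint, which is the number that decides whether pausing now is cheap.

```python
import time
from pathlib import Path

# Same file patterns as the batch script above.
CHECKPOINT_PATTERNS = ("check*", "*.cpt", "*.ckp")


def newest_checkpoint(work_dir):
    """Return (path, mtime) of the most recently modified checkpoint file, or None."""
    newest = None
    for pattern in CHECKPOINT_PATTERNS:
        # rglob searches work_dir and all of its subdirectories.
        for path in Path(work_dir).rglob(pattern):
            if not path.is_file():
                continue
            mtime = path.stat().st_mtime
            if newest is None or mtime > newest[1]:
                newest = (path, mtime)
    return newest


def checkpoint_age_seconds(work_dir, now=None):
    """Seconds since the newest checkpoint was written, or None if none was found."""
    found = newest_checkpoint(work_dir)
    if found is None:
        return None
    current_time = time.time() if now is None else now
    return current_time - found[1]
```

Calling `checkpoint_age_seconds(...)` on your client's `work` folder every few minutes gives the same information as rerunning the batch script: a small value means a checkpoint was just written and a pause will lose little work.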