
Feature Request: Pause at next checkpoint

Posted: Sat Apr 11, 2020 2:24 am
by Crawdaddy79
Hello Folding Team,

I use the pause feature a lot. Sometimes because I want to do something else on my PC that requires cycles, sometimes because I just want this corner in my basement to cool down. Either way, I would like a better way to pause so that work isn't lost and doesn't have to be redone. Currently I have checkpoints (CPs) set to every five minutes, but even 4:59 worth of wasted work seems unnecessary to me.

I would like another button to be added: "Pause at next checkpoint". It would apply to every folding slot, pausing each one as soon as it reaches its own next checkpoint.

This way I could click the button and futz around until everything is paused, and then do my thing.
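To illustrate the requested behavior, here is a small Python simulation. It is only a sketch: `Slot`, `checkpoint_every`, `steps_per_tick` and `pause_at_next_checkpoint` are hypothetical names, not part of FAHClient or FAHControl, and every slot is modeled as checkpointing every N steps, which is a simplification.

```python
from dataclasses import dataclass

@dataclass
class Slot:
    # Hypothetical model of one folding slot; these fields are not FAHClient settings.
    name: str
    step: int              # steps completed so far
    checkpoint_every: int  # steps between checkpoints (assumed fixed)
    steps_per_tick: int    # simulated throughput per tick
    paused: bool = False

def pause_at_next_checkpoint(slots, max_ticks=100_000):
    """Run every unpaused slot until it reaches its next checkpoint boundary,
    then pause it there. Returns the number of ticks until all are paused."""
    for tick in range(max_ticks):
        if all(s.paused for s in slots):
            return tick
        for s in slots:
            if s.paused:
                continue
            boundary = (s.step // s.checkpoint_every + 1) * s.checkpoint_every
            # Never run past the boundary: stop exactly on the checkpoint.
            s.step = min(s.step + s.steps_per_tick, boundary)
            if s.step == boundary:
                s.paused = True  # checkpoint written, nothing to redo
    raise RuntimeError("some slots never reached a checkpoint")
```

Each slot pauses independently, so the button returns only once the slowest slot has reached its checkpoint. A real client would presumably react to whatever event the core raises when it writes a checkpoint rather than counting steps; the simulation only shows the ordering.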

Also there's a whole "my PC crashes when it runs at 100% for too long and it comes back reporting BAD WORK_UNIT" story that I'm not going to go into too much detail about.

Re: Feature Request: Pause at next checkpoint

Posted: Sat Apr 11, 2020 2:56 am
by iceman1992
Agreed, I would like this too.

Re: Feature Request: Pause at next checkpoint

Posted: Sat Apr 11, 2020 3:09 am
by Frogging101
I think it checkpoints when you pause it. The UI will go back to the last whole percentage point, but when you unpause it the "Completed steps" will show that you've kept your progress since then.

Re: Feature Request: Pause at next checkpoint

Posted: Sat Apr 11, 2020 3:23 am
by Crawdaddy79
I wish that were the case, Frogging101, but check out my log from when I arbitrarily paused, waited a minute, then unpaused.

The 0x22 core (GPU slot FS01) went from 68% to 65% after unpausing.

03:15:19:WU00:FS00:0xa7:Completed 15000 out of 125000 steps (12%)
03:15:43:WU01:FS01:0x22:Completed 670000 out of 1000000 steps (67%)
03:16:22:WU00:FS00:0xa7:Completed 16250 out of 125000 steps (13%)
03:16:39:WU01:FS01:0x22:Completed 680000 out of 1000000 steps (68%)
03:17:03:FS00:Paused
03:17:03:FS01:Paused
03:17:03:FS00:Shutting core down
03:17:03:FS01:Shutting core down
03:17:03:WU01:FS01:0x22:WARNING:Console control signal 1 on PID 3584
03:17:03:WU00:FS00:0xa7:WARNING:Console control signal 1 on PID 16184
03:17:03:WU01:FS01:0x22:Exiting, please wait. . .
03:17:03:WU00:FS00:0xa7:Exiting, please wait. . .
03:17:04:WU01:FS01:0x22:Folding@home Core Shutdown: INTERRUPTED
03:17:04:WU01:FS01:FahCore returned: INTERRUPTED (102 = 0x66)
03:17:05:Removing old file 'configs/config-20200410-193411.xml'
03:17:05:Saving configuration to config.xml
03:17:05:<config>
03:17:05:  <!-- Folding Core -->
03:17:05:  <checkpoint v='5'/>
03:17:05:
03:17:05:  <!-- Network -->
03:17:05:  <proxy v=':8080'/>
03:17:05:
03:17:05:  <!-- Slot Control -->
03:17:05:  <power v='MEDIUM'/>
03:17:05:
03:17:05:  <!-- User Information -->
03:17:05:  <passkey v='********************************'/>
03:17:05:  <team v='64'/>
03:17:05:  <user v='Crawdaddy79'/>
03:17:05:
03:17:05:  <!-- Folding Slots -->
03:17:05:  <slot id='0' type='CPU'>
03:17:05:    <paused v='true'/>
03:17:05:  </slot>
03:17:05:  <slot id='1' type='GPU'>
03:17:05:    <paused v='true'/>
03:17:05:  </slot>
03:17:05:</config>
03:17:05:WU00:FS00:0xa7:Folding@home Core Shutdown: INTERRUPTED
03:17:05:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
03:17:21:FS00:Unpaused
03:17:21:FS01:Unpaused
03:17:21:WU01:FS01:Starting
03:17:21:WU01:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\crawd\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe -dir 01 -suffix 01 -version 705 -lifeline 10740 -checkpoint 5 -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
03:17:21:WU01:FS01:Started FahCore on PID 1776
03:17:21:WU01:FS01:Core PID:14812
03:17:21:WU01:FS01:FahCore 0x22 started
03:17:21:WU01:FS01:0x22:*********************** Log Started 2020-04-11T03:17:21Z ***********************
03:17:21:WU01:FS01:0x22:*************************** Core22 Folding@home Core ***************************
03:17:21:WU01:FS01:0x22:       Type: 0x22
03:17:21:WU01:FS01:0x22:       Core: Core22
03:17:21:WU01:FS01:0x22:    Website: https://foldingathome.org/
03:17:21:WU01:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
03:17:21:WU01:FS01:0x22:     Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
03:17:21:WU01:FS01:0x22:             <rafal.wiewiora@choderalab.org>
03:17:21:WU01:FS01:0x22:       Args: -dir 01 -suffix 01 -version 705 -lifeline 1776 -checkpoint 5
03:17:21:WU01:FS01:0x22:             -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
03:17:21:WU01:FS01:0x22:     Config: <none>
03:17:21:WU01:FS01:0x22:************************************ Build *************************************
03:17:21:WU01:FS01:0x22:    Version: 0.0.2
03:17:21:WU01:FS01:0x22:       Date: Dec 6 2019
03:17:21:WU01:FS01:0x22:       Time: 21:30:31
03:17:21:WU01:FS01:0x22: Repository: Git
03:17:21:WU01:FS01:0x22:   Revision: abeb39247cc72df5af0f63723edafadb23d5dfbe
03:17:21:WU01:FS01:0x22:     Branch: HEAD
03:17:21:WU01:FS01:0x22:   Compiler: Visual C++ 2008
03:17:21:WU01:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
03:17:21:WU01:FS01:0x22:   Platform: win32 10
03:17:21:WU01:FS01:0x22:       Bits: 64
03:17:21:WU01:FS01:0x22:       Mode: Release
03:17:21:WU01:FS01:0x22:************************************ System ************************************
03:17:21:WU01:FS01:0x22:        CPU: AMD Ryzen 7 2700X Eight-Core Processor
03:17:21:WU01:FS01:0x22:     CPU ID: AuthenticAMD Family 23 Model 8 Stepping 2
03:17:21:WU01:FS01:0x22:       CPUs: 16
03:17:21:WU01:FS01:0x22:     Memory: 31.95GiB
03:17:21:WU01:FS01:0x22:Free Memory: 23.59GiB
03:17:21:WU01:FS01:0x22:    Threads: WINDOWS_THREADS
03:17:21:WU01:FS01:0x22: OS Version: 6.2
03:17:21:WU01:FS01:0x22:Has Battery: false
03:17:21:WU01:FS01:0x22: On Battery: false
03:17:21:WU01:FS01:0x22: UTC Offset: -4
03:17:21:WU01:FS01:0x22:        PID: 14812
03:17:21:WU01:FS01:0x22:        CWD: C:\Users\crawd\AppData\Roaming\FAHClient\work
03:17:21:WU01:FS01:0x22:         OS: Windows 10 Home
03:17:21:WU01:FS01:0x22:    OS Arch: AMD64
03:17:21:WU01:FS01:0x22:********************************************************************************
03:17:21:WU01:FS01:0x22:Project: 11745 (Run 0, Clone 2225, Gen 26)
03:17:21:WU01:FS01:0x22:Unit: 0x000000388ca304f15e67f104dec31f90
03:17:21:WU01:FS01:0x22:Digital signatures verified
03:17:21:WU01:FS01:0x22:Folding@home GPU Core22 Folding@home Core
03:17:21:WU01:FS01:0x22:Version 0.0.2
03:17:22:WU01:FS01:0x22:  Found a checkpoint file
03:17:22:WU00:FS00:Starting
03:17:22:WU00:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\crawd\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/avx/Core_a7.fah/FahCore_a7.exe -dir 00 -suffix 01 -version 705 -lifeline 10740 -checkpoint 5 -np 14
03:17:22:WU00:FS00:Started FahCore on PID 7020
03:17:22:WU00:FS00:Core PID:15444
03:17:22:WU00:FS00:FahCore 0xa7 started
03:17:22:WU00:FS00:0xa7:*********************** Log Started 2020-04-11T03:17:22Z ***********************
03:17:22:WU00:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
03:17:22:WU00:FS00:0xa7:       Type: 0xa7
03:17:22:WU00:FS00:0xa7:       Core: Gromacs
03:17:22:WU00:FS00:0xa7:       Args: -dir 00 -suffix 01 -version 705 -lifeline 7020 -checkpoint 5 -np 14
03:17:22:WU00:FS00:0xa7:************************************ CBang *************************************
03:17:22:WU00:FS00:0xa7:       Date: Oct 26 2019
03:17:22:WU00:FS00:0xa7:       Time: 01:38:25
03:17:22:WU00:FS00:0xa7:   Revision: c46a1a011a24143739ac7218c5a435f66777f62f
03:17:22:WU00:FS00:0xa7:     Branch: master
03:17:22:WU00:FS00:0xa7:   Compiler: Visual C++ 2008
03:17:22:WU00:FS00:0xa7:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
03:17:22:WU00:FS00:0xa7:   Platform: win32 10
03:17:22:WU00:FS00:0xa7:       Bits: 64
03:17:22:WU00:FS00:0xa7:       Mode: Release
03:17:22:WU00:FS00:0xa7:************************************ System ************************************
03:17:22:WU00:FS00:0xa7:        CPU: AMD Ryzen 7 2700X Eight-Core Processor
03:17:22:WU00:FS00:0xa7:     CPU ID: AuthenticAMD Family 23 Model 8 Stepping 2
03:17:22:WU00:FS00:0xa7:       CPUs: 16
03:17:22:WU00:FS00:0xa7:     Memory: 31.95GiB
03:17:22:WU00:FS00:0xa7:Free Memory: 23.53GiB
03:17:22:WU00:FS00:0xa7:    Threads: WINDOWS_THREADS
03:17:22:WU00:FS00:0xa7: OS Version: 6.2
03:17:22:WU00:FS00:0xa7:Has Battery: false
03:17:22:WU00:FS00:0xa7: On Battery: false
03:17:22:WU00:FS00:0xa7: UTC Offset: -4
03:17:22:WU00:FS00:0xa7:        PID: 15444
03:17:22:WU00:FS00:0xa7:        CWD: C:\Users\crawd\AppData\Roaming\FAHClient\work
03:17:22:WU00:FS00:0xa7:******************************** Build - libFAH ********************************
03:17:22:WU00:FS00:0xa7:    Version: 0.0.18
03:17:22:WU00:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
03:17:22:WU00:FS00:0xa7:  Copyright: 2019 foldingathome.org
03:17:22:WU00:FS00:0xa7:   Homepage: https://foldingathome.org/
03:17:22:WU00:FS00:0xa7:       Date: Oct 26 2019
03:17:22:WU00:FS00:0xa7:       Time: 01:52:30
03:17:22:WU00:FS00:0xa7:   Revision: c1e3513b1bc0c16013668f2173ee969e5995b38e
03:17:22:WU00:FS00:0xa7:     Branch: master
03:17:22:WU00:FS00:0xa7:   Compiler: Visual C++ 2008
03:17:22:WU00:FS00:0xa7:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
03:17:22:WU00:FS00:0xa7:   Platform: win32 10
03:17:22:WU00:FS00:0xa7:       Bits: 64
03:17:22:WU00:FS00:0xa7:       Mode: Release
03:17:22:WU00:FS00:0xa7:************************************ Build *************************************
03:17:22:WU00:FS00:0xa7:       SIMD: avx_256
03:17:22:WU00:FS00:0xa7:********************************************************************************
03:17:22:WU00:FS00:0xa7:Project: 13870 (Run 0, Clone 529, Gen 59)
03:17:22:WU00:FS00:0xa7:Unit: 0x000000440d5262775e764918e9059201
03:17:22:WU00:FS00:0xa7:Digital signatures verified
03:17:22:WU00:FS00:0xa7:Reducing thread count from 14 to 13 to avoid domain decomposition with large prime factor 7
03:17:22:WU00:FS00:0xa7:Reducing thread count from 13 to 12 to avoid domain decomposition by a prime number > 3
03:17:22:WU00:FS00:0xa7:Calling: mdrun -s frame59.tpr -o frame59.trr -x frame59.xtc -e frame59.edr -cpi state.cpt -cpt 5 -nt 12
03:17:22:WU00:FS00:0xa7:Steps: first=7375000 total=125000
03:17:24:WU00:FS00:0xa7:Completed 17072 out of 125000 steps (13%)
03:17:36:ERROR:Receive error: 10053: An established connection was aborted by the software in your host machine.
03:17:41:WU01:FS01:0x22:Completed 650000 out of 1000000 steps (65%)
03:17:41:WU01:FS01:0x22:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
03:17:46:WU00:FS00:0xa7:Completed 17500 out of 125000 steps (14%)
03:18:06:Removing old file 'configs/config-20200410-204117.xml'
03:18:06:Saving configuration to config.xml
03:18:06:<config>
03:18:06:  <!-- Folding Core -->
03:18:06:  <checkpoint v='5'/>
03:18:06:
03:18:06:  <!-- Network -->
03:18:06:  <proxy v=':8080'/>
03:18:06:
03:18:06:  <!-- Slot Control -->
03:18:06:  <power v='MEDIUM'/>
03:18:06:
03:18:06:  <!-- User Information -->
03:18:06:  <passkey v='********************************'/>
03:18:06:  <team v='64'/>
03:18:06:  <user v='Crawdaddy79'/>
03:18:06:
03:18:06:  <!-- Folding Slots -->
03:18:06:  <slot id='0' type='CPU'/>
03:18:06:  <slot id='1' type='GPU'/>
03:18:06:</config>
03:18:38:WU01:FS01:0x22:Completed 660000 out of 1000000 steps (66%)
03:18:49:WU00:FS00:0xa7:Completed 18750 out of 125000 steps (15%)
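To put a number on the rollback, the "Completed ... steps" lines before the pause can be compared with the first one after the unpause, slot by slot. A minimal Python sketch; the regexes are written against the line shapes in this particular log and are an assumption, not a documented FAHClient log format.

```python
import re

# Line shapes copied from the log above; an assumption, not a documented format.
PROGRESS = re.compile(
    r"^\d\d:\d\d:\d\d:WU\d+:(FS\d+):0x[0-9a-f]+:Completed (\d+) out of (\d+) steps"
)
UNPAUSED = re.compile(r"^\d\d:\d\d:\d\d:(FS\d+):Unpaused$")

def rollback_per_slot(log_lines):
    """Map each slot to the steps it had to redo after a pause/unpause."""
    last_done = {}   # slot -> last reported step count
    resumed = set()  # slots unpaused but not yet reporting progress
    lost = {}
    for raw in log_lines:
        line = raw.strip()
        if m := UNPAUSED.match(line):
            resumed.add(m[1])
        elif m := PROGRESS.match(line):
            slot, done = m[1], int(m[2])
            if slot in resumed:
                # First progress report after resuming: compare with the last one before.
                before = last_done.get(slot, done)
                lost[slot] = lost.get(slot, 0) + max(0, before - done)
                resumed.discard(slot)
            last_done[slot] = done
    return lost
```

Run over the log above, it reports 0 lost steps for FS00 (16250 before, 17072 after) and 30000 lost steps for FS01 (680000 before, 650000 after).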

Re: Feature Request: Pause at next checkpoint

Posted: Sat Apr 11, 2020 5:09 am
by Frogging101
Interesting. Your CPU slot did save, though:

03:16:22:WU00:FS00:0xa7:Completed 16250 out of 125000 steps (13%)
03:17:03:FS00:Paused
[...]
03:17:21:FS00:Unpaused
03:17:22:WU00:FS00:Starting
03:17:22:WU00:FS00:0xa7:Steps: first=7375000 total=125000
03:17:24:WU00:FS00:0xa7:Completed 17072 out of 125000 steps (13%)
Also, I've never seen a WU lose more than 1% before. I thought it checkpointed at least every 1%. How odd.

Re: Feature Request: Pause at next checkpoint

Posted: Sat Apr 11, 2020 5:16 am
by Joe_H
GPU WUs checkpoint at fixed percentages chosen by the researcher when the project is set up. Depending on the project, typical values are every 2-5%.

CPU WUs running on the A7 core write out a checkpoint at the time interval set in the client through FAHControl. Depending on how it is paused, the A7 core will attempt to write a checkpoint then as well.
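The two policies bound the loss differently: a percentage-based checkpoint loses whatever was done since the last multiple of the interval, while a time-based one loses at most one interval of wall-clock time (or nothing, if the core manages to checkpoint on pause). A minimal Python sketch, assuming percentage checkpoints fall on exact multiples of the interval; the function names are illustrative only.

```python
from fractions import Fraction

def steps_lost_percent_policy(step, total_steps, interval_pct):
    """GPU-style: steps redone after a pause if checkpoints sit on multiples of interval_pct."""
    # Exact arithmetic so 2.5% of 1,000,000 is exactly 25,000 steps.
    interval = Fraction(total_steps) * Fraction(str(interval_pct)) / 100
    return int(step % interval)  # steps since the last assumed checkpoint, rounded down

def minutes_lost_time_policy(minutes_running, checkpoint_minutes, checkpoints_on_pause=False):
    """A7-style: minutes redone with a timer checkpoint every checkpoint_minutes."""
    if checkpoints_on_pause:
        return 0.0  # the core wrote a checkpoint as it was paused
    return minutes_running % checkpoint_minutes
```

With a 5% interval, `steps_lost_percent_policy(680_000, 1_000_000, 5)` gives 30,000 steps, which matches the 68% to 65% rollback in Crawdaddy79's log; that is consistent with, not proof of, a 5% interval for that project. At the step rate in that log (10,000 steps in the 56 seconds between 03:15:43 and 03:16:39), 30,000 steps is under three minutes of GPU time.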

Re: Feature Request: Pause at next checkpoint

Posted: Sat Apr 11, 2020 5:19 am
by Frogging101
Joe_H wrote: GPU WUs checkpoint at fixed percentages chosen by the researcher when the project is set up. Depending on the project, typical values are every 2-5%.
That figures, actually, since GPU computing generally processes data in large chunks for efficiency.

Re: Feature Request: Pause at next checkpoint

Posted: Sat Apr 11, 2020 6:33 am
by PantherX
Crawdaddy79 wrote:...Sometimes because I want to do something else on my PC that requires cycles...
Generally speaking, folding will hardly affect whatever else you run on your CPU, because the folding process runs at a very low priority while most other tasks run at a higher one. I understand wanting to pause the GPU slot, though, since you may see screen lag. You can reduce or eliminate that lag by disabling hardware acceleration in the application (if it supports that) and/or disabling Windows animations.
Crawdaddy79 wrote:..."my PC crashes when it runs at 100% for too long and it comes back reporting BAD WORK_UNIT" story that I'm not going to go into too much detail about.
That's an indication of something else. I have folded on my CPU at 100% for months without issues. The only time I would restart was due to the monthly Windows updates, and apart from that, it would fold day and night without crashing. If you would like us to investigate, please do share your log and as much detail as possible about your system setup and usage.

Re: Feature Request: Pause at next checkpoint

Posted: Sat Apr 11, 2020 6:52 am
by iceman1992
The other day I needed to pause the GPU slot for a minute because I had to move the laptop's power plug (the old battery isn't strong enough to sustain a full load), and I lost around 45 minutes of work.

Re: Feature Request: Pause at next checkpoint

Posted: Sat Apr 11, 2020 11:09 am
by foldinghomealone2
Hence my! general recommendation: never ever pause a GPU slot.

Re: Feature Request: Pause at next checkpoint

Posted: Sat Apr 11, 2020 12:05 pm
by iceman1992
foldinghomealone2 wrote:Hence my! general recommendation: never ever pause a GPU slot.
Yeah no that's not realistic

Re: Feature Request: Pause at next checkpoint

Posted: Sat Apr 11, 2020 4:26 pm
by Crawdaddy79
PantherX wrote:
Crawdaddy79 wrote:..."my PC crashes when it runs at 100% for too long and it comes back reporting BAD WORK_UNIT" story that I'm not going to go into too much detail about.
That's an indication of something else. I have folded on my CPU at 100% for months without issues. The only time I would restart was due to the monthly Windows updates, and apart from that, it would fold day and night without crashing. If you would like us to investigate, please do share your log and as much detail as possible about your system setup and usage.
Agreed: the CPU impact is minimal, and the CPU is 100% reliable when pegged at 100%. It's the GPU I have issues with. It seems to become unstable if it runs at 100% for too long (60-90 minutes is OK, 120+ minutes is not), but I'm digressing.
Joe_H wrote:GPU WUs checkpoint at fixed percentages chosen by the researcher when the project is set up. Depending on the project, typical values are every 2-5%.

CPU WUs running on the A7 core write out a checkpoint at the time interval set in the client through FAHControl. Depending on how it is paused, the A7 core will attempt to write a checkpoint then as well.
This is good info. Thank you.

Re: Feature Request: Pause at next checkpoint

Posted: Sat Apr 11, 2020 5:52 pm
by uyaem
iceman1992 wrote:
foldinghomealone2 wrote:Hence my! general recommendation: never ever pause a GPU slot.
Yeah no that's not realistic
A log notification about hitting a savepoint would be cool. :)

Re: Feature Request: Pause at next checkpoint

Posted: Sat Apr 11, 2020 6:08 pm
by iceman1992
uyaem wrote:
iceman1992 wrote:
foldinghomealone2 wrote:Hence my! general recommendation: never ever pause a GPU slot.
Yeah no that's not realistic
A log notification about hitting a savepoint would be cool. :)
That would be (I would guess) the easiest update that could solve this problem.

Re: Feature Request: Pause at next checkpoint

Posted: Sat Apr 11, 2020 9:55 pm
by Rel25917
I haven't looked at any of the new COVID projects, but before those, when I checked, 2.5% was pretty much the norm. A safe bet would be to stop right after the progress passes a multiple of 5%.
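That rule of thumb can be automated by watching the progress lines and flagging the moment a slot crosses a multiple of the interval. A minimal Python sketch, assuming GPU checkpoints land exactly on multiples of `interval_pct` (5% by default, per the heuristic above); the regex follows the log lines earlier in the thread and is not a documented format. It does not apply to CPU slots on the A7 core, which checkpoint on a timer instead.

```python
import re
from fractions import Fraction

# Matches lines such as "...:WU01:FS01:0x22:Completed 650000 out of 1000000 steps (65%)".
PROGRESS = re.compile(r":(FS\d+):0x[0-9a-f]+:Completed (\d+) out of (\d+) steps")

def safe_to_pause(log_lines, interval_pct=5):
    """Yield (timestamp, slot) whenever a slot's progress crosses a multiple of interval_pct."""
    last = {}  # slot -> previously reported step count
    for line in log_lines:
        m = PROGRESS.search(line)
        if not m:
            continue
        slot, done, total = m[1], int(m[2]), int(m[3])
        interval = Fraction(total) * Fraction(str(interval_pct)) / 100
        prev = last.get(slot)
        # A boundary lies in (prev, done] if the interval index increased.
        if prev is not None and prev // interval < done // interval:
            yield line[:8], slot
        last[slot] = done
```

Pausing right after a yielded line should cost at most the work done since that report, provided the assumed interval matches the project's real one.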