
Re: How to get more points?

Posted: Thu Jan 05, 2017 7:30 pm
by SteveWillis
I followed the suggestion and checked my log before and after a reboot.

Code:

19:15:45:WU03:FS03:0x21:Completed 4250000 out of 5000000 steps (85%)
19:15:59:WU02:FS02:0x21:Completed 4700000 out of 5000000 steps (94%)
19:16:07:WU01:FS01:0x18:Completed 675000 out of 2500000 steps (27%)


19:18:15:WU01:FS01:0x18:Version 0.0.4
19:18:15:WU01:FS01:0x18:  Found a checkpoint file
19:18:28:WU02:FS02:0x21:Completed 4625000 out of 5000000 steps (92%)
19:18:28:WU02:FS02:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
19:18:29:WU03:FS03:0x21:Completed 4250000 out of 5000000 steps (85%)
19:18:29:WU03:FS03:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
19:18:36:WU01:FS01:0x18:Completed 700000 out of 2500000 steps (28%)
19:18:36:WU01:FS01:0x18:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
19:19:11:WU02:FS02:0x21:Completed 4650000 out of 5000000 steps (93%)
19:19:26:WU01:FS01:0x18:Completed 725000 out of 2500000 steps (29%)
19:19:49:WU03:FS03:0x21:Completed 4300000 out of 5000000 steps (86%)
19:20:14:WU01:FS01:0x18:Completed 750000 out of 2500000 steps (30%)
Since WU02 dropped from 94% to 92% across the reboot, I can confirm that no checkpoint is written at reboot. It also at least partly confirms that checkpoints are made very often on my system, which is consistent with what I saw in the file timestamps.
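
For anyone who wants to repeat this check without reading the log by eye, here is a minimal Python sketch. It assumes the progress-line format shown in the log above; the name find_rollbacks is my own invention and is not part of the client.

```python
import re

# Matches progress lines such as:
# 19:15:59:WU02:FS02:0x21:Completed 4700000 out of 5000000 steps (94%)
PROGRESS = re.compile(
    r"^\d{2}:\d{2}:\d{2}:(WU\d+):(FS\d+):0x[0-9a-fA-F]+:"
    r"Completed (\d+) out of (\d+) steps"
)

def find_rollbacks(lines):
    """Return (wu, steps_before, steps_after) for every drop in completed steps."""
    last_steps = {}   # (wu, slot) -> most recent completed step count
    rollbacks = []
    for line in lines:
        m = PROGRESS.match(line.strip())
        if not m:
            continue  # not a progress line (version banner, temperature notice, ...)
        wu, slot, done, _total = m.groups()
        key, done = (wu, slot), int(done)
        if key in last_steps and done < last_steps[key]:
            rollbacks.append((wu, last_steps[key], done))
        last_steps[key] = done
    return rollbacks

# Progress lines copied from the log above, before and after the reboot.
log = [
    "19:15:45:WU03:FS03:0x21:Completed 4250000 out of 5000000 steps (85%)",
    "19:15:59:WU02:FS02:0x21:Completed 4700000 out of 5000000 steps (94%)",
    "19:16:07:WU01:FS01:0x18:Completed 675000 out of 2500000 steps (27%)",
    "19:18:15:WU01:FS01:0x18:  Found a checkpoint file",
    "19:18:28:WU02:FS02:0x21:Completed 4625000 out of 5000000 steps (92%)",
    "19:18:29:WU03:FS03:0x21:Completed 4250000 out of 5000000 steps (85%)",
    "19:18:36:WU01:FS01:0x18:Completed 700000 out of 2500000 steps (28%)",
]

rollbacks = find_rollbacks(log)
for wu, before, after in rollbacks:
    print(f"{wu} went back from {before} to {after} steps ({before - after} lost)")
```

On these lines only WU02 is flagged: it went back from 4700000 to 4625000 steps, so 75000 of its 5000000 steps (1.5%) had to be redone.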

And by the way, all my folding is done on GPUs.

Re: How to get more points?

Posted: Thu Jan 05, 2017 7:53 pm
by Joe_H
Checkpoint files for WUs from which projects? The time you set in the client is currently used only by the CPU folding cores, A4 and A7. The value is passed to all folding cores, but the GPU ones do not make use of it.

Checkpoints for the GPU projects are set by the person running the project and are based on the number of steps done, or percentage of completion. Typically that has been every 2-5% of a GPU WU. In terms of time, that means more frequent checkpoints on a fast GPU and less frequent ones on a slower one.
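
To put rough numbers on that, here is a small Python sketch of the arithmetic. The two per-percent times are made-up examples, not measurements from any particular GPU or project, and the function name is purely illustrative.

```python
def checkpoint_interval_minutes(seconds_per_percent, checkpoint_percent):
    """Wall-clock minutes between checkpoints written every checkpoint_percent of a WU."""
    return seconds_per_percent * checkpoint_percent / 60.0

# Hypothetical speeds: one GPU finishes 1% in 45 s, another needs 30 minutes.
for label, secs in (("fast GPU", 45), ("slow GPU", 1800)):
    low = checkpoint_interval_minutes(secs, 2)   # checkpoint every 2%
    high = checkpoint_interval_minutes(secs, 5)  # checkpoint every 5%
    print(f"{label}: a checkpoint every {low:g} to {high:g} minutes")
```

With those assumed speeds, the fast GPU writes a checkpoint every 1.5 to 3.75 minutes and the slow one every 60 to 150 minutes, even though both use the same 2-5% spacing.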

Re: How to get more points?

Posted: Thu Jan 05, 2017 9:50 pm
by SteveWillis
Joe_H wrote:GPU ones do not make use of it
Thanks for clarifying that.

Re: How to get more points?

Posted: Thu Jan 05, 2017 10:16 pm
by Aurum
Aurum wrote: This morning I checked FAHControl and just happened to see a GPU about to hit Progress 99% on the Status/Work Queue list. I've had next-unit-percentage=99.4 set globally on the Expert tab for days and watched to see if it worked. But the GPU did not poll the AS until Progress crossed 99.8%. But in the Log tab work progress is reported as integers:

Code:

08:10:51:WU02:FS01:0x21:Completed 50000 out of 5000000 steps (1%)
08:42:23:WU02:FS01:0x21:Completed 100000 out of 5000000 steps (2%)
09:14:17:WU02:FS01:0x21:Completed 150000 out of 5000000 steps (3%)
09:45:52:WU02:FS01:0x21:Completed 200000 out of 5000000 steps (4%)
I bet the "next-unit-percentage" option uses a rounded up or off version of the "value" one assigns. If "value" can only be integers it'd be nice to know, e.g. reject a decimal entry when adding the option or a note in the Help file. It might be better to use ETA to trigger DLing the next WU.
The help text does say integers:
"next-unit-percentage <integer=99>
Pre-download the next work unit when the current one is this far along.
Values less than 90 are not allowed."
viewtopic.php?f=61&t=26036

If one enters 89.9, a popup explains that it is being rejected. If one enters 93.3, it is accepted but seems to be interpreted as 94%. I suggest a popup saying integers only.
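
The kind of check I have in mind could look like the Python sketch below. This is not the client's code, only an illustration of the suggestion: the function name is hypothetical, the lower bound of 90 comes from the help text quoted above, and the upper bound of 100 is my own assumption.

```python
def parse_next_unit_percentage(text):
    """Return the option value as an int, or raise ValueError with a message for a popup."""
    try:
        value = int(text.strip())  # int() rejects decimal strings such as "93.3"
    except ValueError:
        raise ValueError(
            f"next-unit-percentage must be an integer, got {text!r}"
        ) from None
    if value < 90:
        # Limit taken from the help text: "Values less than 90 are not allowed."
        raise ValueError("next-unit-percentage: values less than 90 are not allowed")
    if value > 100:
        # Assumption: a percentage above 100 is meaningless; the help text does not say.
        raise ValueError("next-unit-percentage: values above 100 are not allowed")
    return value

for entry in ("99", "93.3", "89.9", "89"):
    try:
        print(entry, "->", parse_next_unit_percentage(entry))
    except ValueError as err:
        print(entry, "-> rejected:", err)
```

With a check like this, 93.3 would be refused outright instead of being silently turned into 94.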
Aurum wrote:The 99.9% bug: When a GPU is infected by the 99.9% bug it never seems to recover. The only way I know to cure it and get the GPU back to folding is to Quit F@H and restart it which wastes a lot of work when the other GPUs in the folding rig get set back to their checkpoints. I bet you're already working on a fix for 7.4.16 but since I see it daily just thought I'd mention it.
Are any of these options useful in restarting a GPU afflicted with the 99.9% bug?

stall-detection-enabled <boolean=false>
Attempt to detect stalled work units and restart them.

stall-percent <integer=5>
Minimum estimated percent work unit completion since last frame before a WU
can be considered stalled, if zero the percentage is ignored.

stall-timeout <integer=1800>
Minimum time, in seconds, since last frame before a WU can be considered
stalled.

Re: How to get more points?

Posted: Thu Jan 05, 2017 11:34 pm
by Joe_H
Aurum wrote:Are any of these options useful in restarting a GPU afflicted with the 99.9% bug?

stall-detection-enabled <boolean=false>
Attempt to detect stalled work units and restart them.

stall-percent <integer=5>
Minimum estimated percent work unit completion since last frame before a WU
can be considered stalled, if zero the percentage is ignored.

stall-timeout <integer=1800>
Minimum time, in seconds, since last frame before a WU can be considered
stalled.
No. The "99.9% bug" is not related to the core being stalled; it is related to a driver and/or core crash. Until the driver and core processes are restarted for that specific GPU, the client is not able to restart the WU. The percentage going to 99.9% is caused by the client not detecting that the GPU core has crashed.

Work has been done on detecting and handling this condition within the folding core, but the last reports I saw on this were that it was only partially successful.

As a side note, the code to detect stalled processing was defaulted to off when testing during development of the client showed it to have a number of problems. It remains in the client as the developers hope to improve it and make it useful at some point in the future. As I understand it, that was not a priority item for the current public beta release - 7.4.15.