How to get more points?


SteveWillis
Posts: 409
Joined: Fri Apr 15, 2016 12:42 am
Hardware configuration: PC 1:
Linux Mint 17.3
three gtx 1080 GPUs One on a powered header
Motherboard = [MB-AM3-AS-SB-990FXR2] qty 1 Asus Sabertooth 990FX(+59.99)
CPU = [CPU-AM3-FX-8320BR] qty 1 AMD FX 8320 Eight Core 3.5GHz(+41.99)

PC2:
Linux Mint 18
Open air case
Motherboard: ASUS Crosshair V Formula-Z AM3+ AMD 990FX SATA 6Gb/s USB 3.0 ATX AMD
AMD FD6300WMHKBOX FX-6300 6-Core Processor Black Edition with Cooler Master Hyper 212 EVO - CPU Cooler with 120mm PWM Fan
three gtx 1080,
one gtx 1080 TI on a powered header

Re: How to get more points?

Post by SteveWillis »

I followed the suggestion and checked my log before and after a reboot.

19:15:45:WU03:FS03:0x21:Completed 4250000 out of 5000000 steps (85%)
19:15:59:WU02:FS02:0x21:Completed 4700000 out of 5000000 steps (94%)
19:16:07:WU01:FS01:0x18:Completed 675000 out of 2500000 steps (27%)


19:18:15:WU01:FS01:0x18:Version 0.0.4
19:18:15:WU01:FS01:0x18:  Found a checkpoint file
19:18:28:WU02:FS02:0x21:Completed 4625000 out of 5000000 steps (92%)
19:18:28:WU02:FS02:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
19:18:29:WU03:FS03:0x21:Completed 4250000 out of 5000000 steps (85%)
19:18:29:WU03:FS03:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
19:18:36:WU01:FS01:0x18:Completed 700000 out of 2500000 steps (28%)
19:18:36:WU01:FS01:0x18:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
19:19:11:WU02:FS02:0x21:Completed 4650000 out of 5000000 steps (93%)
19:19:26:WU01:FS01:0x18:Completed 725000 out of 2500000 steps (29%)
19:19:49:WU03:FS03:0x21:Completed 4300000 out of 5000000 steps (86%)
19:20:14:WU01:FS01:0x18:Completed 750000 out of 2500000 steps (30%)
So, since one of the WUs lost a couple of percent, I can confirm that no checkpoint is written on reboot. I can also at least partly confirm that checkpoints are written very often on my system, consistent with the file timestamps I saw.
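For anyone who wants to repeat this check without reading the log by eye, here is a minimal sketch in Python. It assumes progress lines have exactly the shape shown in the log above; the function name `find_rollbacks` is made up for this example and is not part of the client.

```python
import re

# Matches progress lines such as:
# 19:15:59:WU02:FS02:0x21:Completed 4700000 out of 5000000 steps (94%)
PROGRESS = re.compile(
    r"^(\d{2}:\d{2}:\d{2}):(WU\d+):(FS\d+):0x[0-9a-fA-F]+:"
    r"Completed (\d+) out of (\d+) steps"
)

def find_rollbacks(lines):
    """Return (slot, steps_before, steps_after) for every drop in step count."""
    last_steps = {}
    rollbacks = []
    for line in lines:
        m = PROGRESS.match(line.strip())
        if not m:
            continue  # ignore non-progress lines (version banners, warnings)
        slot = m.group(3)
        steps = int(m.group(4))
        prev = last_steps.get(slot)
        if prev is not None and steps < prev:
            rollbacks.append((slot, prev, steps))
        last_steps[slot] = steps
    return rollbacks

# Progress lines copied from the log above (before and after the reboot).
LOG = """\
19:15:45:WU03:FS03:0x21:Completed 4250000 out of 5000000 steps (85%)
19:15:59:WU02:FS02:0x21:Completed 4700000 out of 5000000 steps (94%)
19:16:07:WU01:FS01:0x18:Completed 675000 out of 2500000 steps (27%)
19:18:28:WU02:FS02:0x21:Completed 4625000 out of 5000000 steps (92%)
19:18:29:WU03:FS03:0x21:Completed 4250000 out of 5000000 steps (85%)
19:18:36:WU01:FS01:0x18:Completed 700000 out of 2500000 steps (28%)
""".splitlines()

for slot, before, after in find_rollbacks(LOG):
    print(f"{slot}: {before} -> {after} steps ({before - after} steps repeated)")
```

On these lines only FS02 is flagged: it drops from 4700000 to 4625000 steps, while FS03 resumes at the same step count and FS01 moves forward.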

And by the way, all my folding is done on GPUs.

1080 and 1080TI GPUs on Linux Mint
Joe_H
Site Admin
Posts: 7856
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: How to get more points?

Post by Joe_H »

Checkpoint files for WUs from which projects? The time you set in the client is currently used only by the CPU folding cores, A4 and A7. The value is passed to all folding cores, but the GPU ones do not make use of it.

Checkpoints for the GPU projects are set by the person running the project and are based on the number of steps done, or the percentage of completion. Typically that has been every 2-5% of the processing to be done on a GPU WU. In wall-clock terms, that is more often on a fast GPU and less often on a slower one.
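To put rough numbers on that, here is a small Python sketch. The two GPU speeds are hypothetical values picked for illustration, not measurements, and `checkpoint_interval_seconds` is a made-up helper name.

```python
def checkpoint_interval_seconds(seconds_per_percent, checkpoint_percent):
    """Wall-clock time between checkpoints when one is written every
    `checkpoint_percent` percent of the WU."""
    return seconds_per_percent * checkpoint_percent

# Hypothetical speeds, for illustration only (not measured values):
# seconds needed to complete 1% of a WU.
GPUS = {"fast GPU": 60, "slow GPU": 1900}

for name, spp in GPUS.items():
    for pct in (2, 5):
        minutes = checkpoint_interval_seconds(spp, pct) / 60
        print(f"{name}: a checkpoint every {pct}% is about every {minutes:.1f} min")
```

The same interval is roughly the most work a GPU WU can lose when the core is stopped without a fresh checkpoint, which is why a restart hurts more on a slow card.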

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
SteveWillis

Re: How to get more points?

Post by SteveWillis »

Joe_H wrote: GPU ones do not make use of it
Thanks for clarifying that.
Last edited by SteveWillis on Thu Jan 05, 2017 10:20 pm, edited 1 time in total.
Aurum
Posts: 296
Joined: Sat Oct 03, 2015 3:15 pm
Location: The Great Basin

Re: How to get more points?

Post by Aurum »

Aurum wrote: This morning I checked FAHControl and happened to see a GPU about to hit 99% progress in the Status/Work Queue list. I've had next-unit-percentage=99.4 set globally on the Expert tab for days, so I watched to see if it worked. The GPU did not poll the AS until progress crossed 99.8%. In the Log tab, however, work progress is reported as integers:

08:10:51:WU02:FS01:0x21:Completed 50000 out of 5000000 steps (1%)
08:42:23:WU02:FS01:0x21:Completed 100000 out of 5000000 steps (2%)
09:14:17:WU02:FS01:0x21:Completed 150000 out of 5000000 steps (3%)
09:45:52:WU02:FS01:0x21:Completed 200000 out of 5000000 steps (4%)
I bet the "next-unit-percentage" option uses a rounded version of the value one assigns, rounded either up or to the nearest integer. If the value can only be an integer, it would be nice to know, e.g. by rejecting a decimal entry when the option is added or by a note in the Help file. It might be better to use the ETA to trigger downloading the next WU.
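As a rough illustration of the ETA idea, here is a Python sketch that estimates the time per percent from the four log frames above and checks whether the remaining time has dropped below a lead time. This is not how the client works; `should_prefetch` and the 10-minute lead time are assumptions made up for the example.

```python
from datetime import datetime

FMT = "%H:%M:%S"

def seconds_per_percent(frames):
    """frames: chronological list of (HH:MM:SS, percent) from one day.
    Does not handle logs that cross midnight."""
    first_t, first_p = frames[0]
    last_t, last_p = frames[-1]
    elapsed = (datetime.strptime(last_t, FMT)
               - datetime.strptime(first_t, FMT)).total_seconds()
    return elapsed / (last_p - first_p)

def eta_seconds(frames):
    """Estimated seconds until 100%, assuming a constant rate."""
    return seconds_per_percent(frames) * (100 - frames[-1][1])

def should_prefetch(frames, lead_seconds):
    """Hypothetical trigger: fetch the next WU once the ETA is within
    `lead_seconds` of completion."""
    return eta_seconds(frames) <= lead_seconds

# Timestamps and percentages taken from the log excerpt above.
FRAMES = [("08:10:51", 1), ("08:42:23", 2), ("09:14:17", 3), ("09:45:52", 4)]

print(f"{seconds_per_percent(FRAMES):.0f} s per percent")
print(f"ETA: {eta_seconds(FRAMES) / 3600:.1f} h")
print("prefetch now?", should_prefetch(FRAMES, lead_seconds=600))
```

On a GPU this slow, a single percent takes over half an hour, so a time-based trigger would be much finer-grained than any integer percentage.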
The help text does say integers:
"next-unit-percentage <integer=99>
Pre-download the next work unit when the current one is this far along.
Values less than 90 are not allowed."
viewtopic.php?f=61&t=26036

If one enters 89.9, a popup explains that it is being rejected. If one enters 93.3, it is accepted but seems to be interpreted as 94%. I suggest a popup saying integers only.
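Here is a guess at that behavior as a Python sketch. It only models what I observed (reject below 90, round decimals up); it is not taken from the client's code, and `effective_next_unit_percentage` is a made-up name.

```python
import math

# From the help text: "Values less than 90 are not allowed."
MIN_PERCENT = 90

def effective_next_unit_percentage(value):
    """Model of the observed behavior, assuming decimals are rounded up.
    The rounding rule is an assumption, not confirmed client behavior."""
    if value < MIN_PERCENT:
        raise ValueError(f"{value} rejected: values less than {MIN_PERCENT} are not allowed")
    return math.ceil(value)

for entry in (93.3, 99.4, 89.9):
    try:
        print(entry, "->", effective_next_unit_percentage(entry))
    except ValueError as err:
        print(err)
```

If this guess is right, an entry of 99.4 would act like 100, which would at least fit with the next WU not being requested until the very end.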
Aurum wrote: The 99.9% bug: When a GPU is hit by the 99.9% bug, it never seems to recover. The only way I know to cure it and get the GPU folding again is to quit F@H and restart it, which wastes a lot of work because the other GPUs in the folding rig get set back to their checkpoints. I bet you're already working on a fix for 7.4.16, but since I see it daily I thought I'd mention it.
Are any of these options useful in restarting a GPU afflicted with the 99.9% bug?

stall-detection-enabled <boolean=false>
Attempt to detect stalled work units and restart them.

stall-percent <integer=5>
Minimum estimated percent work unit completion since last frame before a WU
can be considered stalled, if zero the percentage is ignored.

stall-timeout <integer=1800>
Minimum time, in seconds, since last frame before a WU can be considered
stalled.
In Science We Trust
Joe_H

Re: How to get more points?

Post by Joe_H »

Aurum wrote: Are any of these options useful in restarting a GPU afflicted with the 99.9% bug?

stall-detection-enabled <boolean=false>
Attempt to detect stalled work units and restart them.

stall-percent <integer=5>
Minimum estimated percent work unit completion since last frame before a WU
can be considered stalled, if zero the percentage is ignored.

stall-timeout <integer=1800>
Minimum time, in seconds, since last frame before a WU can be considered
stalled.
No. The "99.9% bug" is not related to the core being stalled; it is caused by a driver and/or core crash. Until the driver and core processes are restarted for that specific GPU, the client is not able to restart. The percentage going to 99.9% is caused by the client not detecting that the GPU core has crashed.

Work has been done on detecting and handling this condition within the folding core, but the last reports I saw said it was only partially successful.

As a side note, the code to detect stalled processing was defaulted to off when testing during development of the client showed it to have a number of problems. It remains in the client as the developers hope to improve it and make it useful at some point in the future. As I understand it, that was not a priority item for the current public beta release - 7.4.15.