Measuring Folding Efficiency

Post by **gordonbb** » Sat Mar 02, 2019 11:58 pm

While profiling Pascal-based NVidia Graphics Processing Units (GPUs) with FAHBench, it was observed that their yield increases at a decreasing rate as the Power Limit for the GPU is raised from its minimum to its maximum value. The curve also shows a well-defined “knee” as the maximum Power Limit of the GPU is approached.

These results suggest that running the GPU at or near its maximum draw costs a disproportionately large increase in power consumption for a small increase in yield. Empirical testing has shown that reducing the Power Limit to 90% results in only a slight (2-5%) decrease in yield for mid-range (GTX 1060, GTX 1070) GPUs.
FAHBench reports results in ns/day, a measure of how far the Folding simulation progresses per day of real time. Folders, however, are rewarded in Points, which FAHBench cannot easily report because of the Quick Return Bonus (QRB). The QRB applies an inverse square-root function of the completion time to reward Work Units (WUs) completed in advance of the Deadline set for a WU.
In theory these cards should be most efficient at their lowest Power Limit, but efficiency for Folding should be measured in Points-per-Day per Watt (PPD/W), not ns/day/W.
On one hand, the GPUs get less efficient as their Power Limit approaches the maximum; on the other hand, as their Power Limit is increased they run faster and get more value from the QRB.
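To make that trade-off concrete, here is a minimal Python sketch. It assumes the QRB form given in the Folding@home points FAQ, final points = base points × max(1, sqrt(k × deadline / elapsed)). The function names and every number in the example are hypothetical, chosen only for illustration, and are not measurements.

```python
import math

def qrb_points(base_points, k, deadline_days, elapsed_days):
    """Credit for one WU: base points times the bonus factor, never below base."""
    bonus = max(1.0, math.sqrt(k * deadline_days / elapsed_days))
    return base_points * bonus

def ppd(base_points, k, deadline_days, elapsed_days):
    """Points per day if WUs like this one are folded back to back."""
    return qrb_points(base_points, k, deadline_days, elapsed_days) / elapsed_days

def ppd_per_watt(base_points, k, deadline_days, elapsed_days, watts):
    """Folding efficiency in PPD/W for a given average power draw."""
    return ppd(base_points, k, deadline_days, elapsed_days) / watts

# Hypothetical WU (all values made up): same WU finished in 12 h versus 6 h.
slow = ppd(10_000, 0.75, 3.0, 0.50)
fast = ppd(10_000, 0.75, 3.0, 0.25)
print(round(fast / slow, 3))  # 2.828, i.e. 2 ** 1.5
```

While the bonus applies, PPD grows with the 1.5 power of speed, so a 5% drop in ns/day costs roughly 7% of PPD (0.95 ** 1.5 ≈ 0.926). That is why ns/day/W would overstate the efficiency gained by lowering the Power Limit.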
The GPUs in use are manufactured by an Add-In Board (AIB) partner and are thus factory overclocked, with Maximum Power Limits that exceed those found on NVidia reference designs (“Founders Edition” or FE). The AIB Maximums are often wildly optimistic, and as we are focusing our investigation on efficiency rather than yield, the Default Power Limit from the FE models will be used as the Maximum Power Limit for these tests.
It is proposed to find the maximal efficiency of GPUs by running the GPUs at fixed percentages of the Default FE Power Limit from the maximum towards the minimum values and recording the Yields for a period of time sufficient to provide a reasonable average for differing WUs.
Yield will be recorded from the GPU using a script on the local host that queries the Folding Client once a minute and forwards the PPD to a remote collection server. Power Consumption will be recorded using a script on the local host that queries the Graphics Driver once a minute and forwards the Wattage to the remote collection server.
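The power-collection side could look roughly like the sketch below. It is not the script actually used. It assumes `nvidia-smi` is on the PATH and uses its `--query-gpu=index,power.draw --format=csv,noheader,nounits` query. The function names are hypothetical, and forwarding to the collection server is left out.

```python
import subprocess

# Ask the driver for one "index, watts" line per GPU.
QUERY = ["nvidia-smi", "--query-gpu=index,power.draw",
         "--format=csv,noheader,nounits"]

def parse_power(csv_text):
    """Turn lines like '0, 118.42' into {0: 118.42}; skip unreadable rows."""
    readings = {}
    for line in csv_text.strip().splitlines():
        index, _, watts = line.partition(",")
        try:
            readings[int(index)] = float(watts)
        except ValueError:
            continue  # e.g. a GPU that reports '[N/A]' instead of a number
    return readings

def read_power():
    """Run nvidia-smi once and return {gpu_index: watts}."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_power(out)
```

Calling `read_power()` from cron or a simple sleep loop once a minute would give the Wattage samples described above. Keeping the parsing separate from the subprocess call makes it testable on a machine without a GPU.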
WU yields will also be recorded so that the Yields of individual WUs can be compared at the varying Power Limits.
Zabbix will be used for the remote collection server with Zabbix Agents installed on the Folding hosts collecting Power Consumption. A script will be used on the Folding hosts to collect Yields from the Folding Client as using the Zabbix Agent during initial testing resulted in occasional time-outs and loss of data.
HFM.net will be used to record WU statistics.
It is proposed to collect data for one week at each Power Limit percentage, and to measure Yields at 5% intervals from 100% of the Maximum Power Limit down to 50-60%, where the minimum Power Limit settings are found.
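As a sketch of that test plan, the snippet below lists the weekly Power Limit steps. The function name is hypothetical, and the 150 W FE default and 90 W driver minimum in the example are illustrative assumptions, not values measured on the cards in question.

```python
def power_limit_schedule(fe_default_watts, driver_min_watts,
                         step_pct=5, floor_pct=50):
    """List (percent, watts) steps from 100% down, stopping at the driver minimum."""
    schedule = []
    for pct in range(100, floor_pct - 1, -step_pct):
        watts = fe_default_watts * pct // 100  # whole watts, rounded down
        if watts < driver_min_watts:
            break  # the driver would refuse a limit this low
        schedule.append((pct, watts))
    return schedule

# Illustrative values only: 150 W FE default, 90 W driver minimum.
for week, (pct, watts) in enumerate(power_limit_schedule(150, 90), start=1):
    print(f"week {week}: {pct}% -> {watts} W")
```

Each step could then be applied with `nvidia-smi -i <gpu> -pl <watts>`, which normally requires administrator rights.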

Post by **bruce** » Sun Mar 03, 2019 1:06 am

Once a minute isn't a good query interval. The FAHCore observes the progress periodically based on the checkpoint frequency -- which for GPUs is defined by the Project Owner -- and not at intervals based on wall-clock. Between those project-based query intervals, the FAHClient extrapolates an expected progress which may then be adjusted at the next checkpoint.

Post by **gordonbb** » Sun Mar 03, 2019 2:24 am

bruce wrote: Once a minute isn't a good query interval. The FAHCore observes the progress periodically based on the checkpoint frequency -- which for GPUs is defined by the Project Owner -- and not at intervals based on wall-clock. Between those project-based query intervals, the FAHClient extrapolates an expected progress which may then be adjusted at the next checkpoint.

What might be a good polling interval?

Here is a comparison of the 24 hour average PPD from the aggregate of all GPUs from 01:00 EST versus the EoC daily values.

Day  24 h avg PPD  EoC daily
01   4.11M         4,110,931
28   4.16M         4,048,053
27   4.11M         4,102,506
26   4.05M         3,946,091
25   4.10M         4,298,684
24   4.07M         4,082,931

If we average these values we get:
Av   4.10M         4,098,199
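The averages can be checked with a few lines of Python. The numbers are the ones posted above. Reading the first column as the 24 hour average PPD and the second as the EoC daily value is an assumption based on the order in which the two are mentioned.

```python
# Values copied from the comparison above (Mar 01 back to Feb 24).
avg_ppd_millions = [4.11, 4.16, 4.11, 4.05, 4.10, 4.07]
eoc_daily = [4_110_931, 4_048_053, 4_102_506, 3_946_091, 4_298_684, 4_082_931]

mean_avg = sum(avg_ppd_millions) / len(avg_ppd_millions)
mean_eoc = sum(eoc_daily) / len(eoc_daily)
print(f"{mean_avg:.2f}M {mean_eoc:,.0f}")  # 4.10M 4,098,199

# Relative gap between the two sources, per day and for the six-day mean.
daily_gaps = [(a * 1e6 - e) / e for a, e in zip(avg_ppd_millions, eoc_daily)]
worst = max(abs(g) for g in daily_gaps)
overall = (mean_avg * 1e6 - mean_eoc) / mean_eoc
print(f"worst day {worst:.1%}, six-day mean {overall:.2%}")
```

Individual days differ by up to about 4.6%, while the six-day means agree to within about 0.05%.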

So, without digging into the MySQL data on the Zabbix server to get better precision on the aggregate PPD, it appears that if the sample size is large enough we get fairly good correspondence between the averaged and the actual values.

Post by **Theodore** » Sun Mar 03, 2019 5:10 pm

Limiting a graphics card to its minimum power setting gets you the highest efficiency.

In a high-performance system, like a mining rig, it is an excellent way to add more graphics cards to your setup (if you can) while still staying within the power limitations of the fuse (12A, 1500W),
or to lower your power bill while still getting the same amount of PPD (when combined with adding one or two extra cards).

Or one could lower their electric bill by almost a third, for a performance penalty of often just 10-25%.

As far as points go, I usually wait until about 5% of a WU is done. It seems that FAH adjusts the rating as it goes: at 0-1% it usually underrates the card, at 1-2% it overrates the card, and from 3% up it has a good average estimate of the card's performance.
From there on it just fine-tunes the rating, and it usually resets after 100% for the next work unit.
Sometimes I see batches that run much rated PPDs, usually WUs in project 14000, so I keep an eye out for those projects as well.
The 9000 projects (Alzheimer's, cancer) are usually very consistent in their PPD ratings, and I use them as a reference.

Post by **MeeLee** » Sun Mar 03, 2019 11:33 pm

When FAHControl starts out, it does start at zero, and quickly ramps up after the first percent.
When it finishes a WU, it keeps the score for the next WU.
If the card is running slower on the next WU, it'll throttle the speed down somewhat from that number, causing at least 3 to 4 percent of corrections, before getting an overall balanced value.

Once you have a balanced value for the WU, you can repeat the process over the next few WUs.
Most PPD scores are measured when the PC is doing nothing and nothing is displayed on the screen.
If you have an animated screen saver, or are doing things on the PC, it'll affect the overall score.

Post by **bruce** » Tue Mar 05, 2019 4:05 am

MeeLee wrote: When FAHControl starts out, it does start at zero, and quickly ramps up after the first percent.
When it finishes a WU, it keeps the score for the next WU.
If the card is running slower on the next WU, it'll throttle the speed down somewhat from that number, causing at least 3 to 4 percent of corrections, before getting an overall balanced value.

True. The first time a project is assigned to your machine, FAH has no reasonable way to guess how that project will fold. As has been suggested, you can pretty much ignore any reports until you've completed 5% at the expected performance level.
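Applied to logged data, that rule of thumb might look like the sketch below. It assumes each sample is a (percent complete, reported PPD) pair for a single WU. The function name, the sample layout and the numbers are hypothetical.

```python
from statistics import mean

def settled_ppd(samples, min_progress=5.0):
    """Average the PPD of samples taken at or beyond min_progress percent of a WU."""
    kept = [ppd for progress, ppd in samples if progress >= min_progress]
    return mean(kept) if kept else None  # None: the WU has not settled yet

# Made-up samples for one WU: the early estimates swing before settling.
samples = [(0.5, 0), (1.0, 520_000), (2.0, 910_000),
           (5.0, 800_000), (10.0, 810_000), (50.0, 790_000)]
print(settled_ppd(samples))  # averages only the last three samples
```

Raising `min_progress` trades fewer usable samples for less contamination from the early extrapolated estimates.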

This is complicated somewhat by there being a setup computation before the FAHCore reports folding starting at 0%. Often that setup is quite brief, but I have seen exceptions where it takes longer than anything we could call "brief." That, too, depends on the project.
