Plotting Overall Folding System Efficiency

Post by **gordonbb** » Fri Feb 01, 2019 6:38 pm

rwh202 wrote:
ProDigit wrote: This is only true when they're running from a PCIe x16 slot.
If they're running from a PCIe x1/x4 slot, the power reading is incorrect, as the card gets additional power from a riser.
In my case, both 1060s are using between 100-110 W, deduced from a Kill A Watt meter; yet GPU-Z reports 107 W for the one plugged into the PCIe x16 slot, and 63 W for the one plugged into the PCIe x1 slot.
I'd be surprised if risers affected anything (other than slowing the cards down and reducing power consumption).
The power sensors are on the card and don't care whether the power comes from the slot, the riser or the 6/8-pin power connectors. However, I'm not sure if they are pre- or post-VRM and whether the losses there are included.
The difference between the Kill A Watt and GPU-Z is likely down to PSU efficiency.
107 + 63 = 170 W total for the GPUs. Assuming 85% PSU efficiency, you then have 200 W from the wall, as shown by the Kill A Watt.

The power draw on the card includes the VRMs and the fans. I suspect that this is what the 0.005 Ohm shunt resistors that overclockers like to bridge are used for. Note that overclockers typically only modify the shunt resistors for the external power connections and NOT the PCIe power leads, as that would likely lead to melted traces and/or melted wires on the 24-pin motherboard connector and, in some cases, has been known to cause the main motherboard power connector to catch fire.

Shunt mods are not recommended in general and are certainly not worth it for folding, as the gains would likely be minimal, with the potential for killing the card or motherboard or greatly shortening its useful life.
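For anyone who wants to repeat rwh202's estimate quoted above with their own readings, here is a minimal Python sketch. The function name `wall_power` is made up for illustration, and the 85% figure is the assumption from the quote, not a measured value. It also counts only the two GPUs, as the quoted arithmetic does.

```python
def wall_power(dc_watts, psu_efficiency):
    """Estimate AC draw at the wall from a DC load and a PSU efficiency in (0, 1]."""
    if not 0 < psu_efficiency <= 1:
        raise ValueError("psu_efficiency must be in (0, 1]")
    return dc_watts / psu_efficiency


# GPU-Z readings quoted in the thread, in watts.
gpu_watts = [107, 63]
total_dc = sum(gpu_watts)

# 85% is the assumed PSU efficiency from the quote, not a measurement.
estimated_wall = wall_power(total_dc, 0.85)

print(total_dc)               # 170
print(round(estimated_wall))  # 200
```

Swapping in a different efficiency (for example one read off the PSU's 80 PLUS curve at that load) shows how sensitive the wall figure is to that single assumption.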

Post by **toTOW** » Sat Feb 02, 2019 12:22 pm

I wonder why there are spikes in your curves ... maybe the power measurement happened just when a sanity check was done, which results in no GPU load, but also no change in PPD ...

Post by **bruce** » Sat Feb 02, 2019 9:48 pm

toTOW wrote: I wonder why there are spikes in your curves ... maybe the power measurement happened just when a sanity check was done, which results in no GPU load, but also no change in PPD ...

In most cases, the sanity check uses both GPU resources and CPU resources, comparing the results. The power spike is probably when the CPU kicks into high gear.

Post by **toTOW** » Sat Feb 02, 2019 10:39 pm

Current sanity checks are not like they used to be: they no longer use all CPU threads for a few seconds while leaving the GPU asleep during that time. They just leave the GPU idle for a few seconds with no additional CPU load.

Here's an example I captured :

And I can clearly hear the laptop fans slow down and spin faster again ...

If gordonbb captured data at this time, he would see normal PPD but very low power draw ... hence a spike in efficiency.

Post by **bruce** » Sun Feb 03, 2019 1:05 am

Then how does the "reference platform" complete its calculations? Presumably it replaces spin-wait CPU cycles with CPU double-precision calculations, which might or might not be observable as a change in CPU heating while the GPU is idle. It also might look like the writing of a checkpoint.

Post by **gordonbb** » Sun Feb 03, 2019 1:59 am

toTOW wrote: I wonder why there are spikes in your curves ... maybe the power measurement happened just when a sanity check was done, which results in no GPU load, but also no change in PPD ...

I’m using SNMP to grab the output power from the UPS and a call to FAHClient to read the PPD. Though the interval is the same (1 minute), the two processes will invariably sample at different points within the window and, as Bruce noted, the GPU will drop in power if it’s being fed by the CPU.

I’ve tried to smooth things out by using a 3-minute sample average, but there are still spikes. For this metric, though, I think looking at the calculated average over at least a couple of days, to smooth out variations in WUs, will give the best indication of efficiency.
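To illustrate why a 3-minute average softens these spikes without removing them, here is a rough Python sketch. It assumes efficiency is taken as PPD divided by watts, and the sample values are made up for illustration (they are not my measurements); `rolling_mean` and `efficiency` are hypothetical helper names.

```python
from collections import deque


def rolling_mean(values, window):
    """Yield the mean of the last `window` values (fewer at the start)."""
    buf = deque(maxlen=window)
    for v in values:
        buf.append(v)
        yield sum(buf) / len(buf)


def efficiency(ppd_samples, watt_samples, window=3):
    """PPD per watt, computed from smoothed PPD and smoothed power."""
    ppd_avg = rolling_mean(ppd_samples, window)
    watt_avg = rolling_mean(watt_samples, window)
    return [p / w for p, w in zip(ppd_avg, watt_avg)]


# Hypothetical one-minute samples: the third power reading lands on a
# GPU pause, while the PPD estimate stays unchanged.
ppd = [1_000_000] * 5
watts = [400, 400, 200, 400, 400]

raw = [p / w for p, w in zip(ppd, watts)]
smooth = efficiency(ppd, watts, window=3)

print(max(raw))     # the unsmoothed spike
print(max(smooth))  # lower, but still above the 2500 PPD/W baseline
```

With a 3-sample window the single low power reading is spread over three points rather than removed, which matches what I see. A window of a day or more pushes the effect of any one bad sample toward zero.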

Once I have a better baseline I want to start lowering the power limits to see how the efficiency changes. Because of the Quick Return Bonus, though, I suspect that once you're under the top-end knee it may be fairly linear.

Post by **gordonbb** » Sun Feb 03, 2019 2:04 am

toTOW wrote: Current sanity checks are not like they used to be: they no longer use all CPU threads for a few seconds while leaving the GPU asleep during that time. They just leave the GPU idle for a few seconds with no additional CPU load.

Here's an example I captured :

And I can clearly hear the laptop fans slow down and spin faster again ...

If gordonbb captured data at this time, he would see normal PPD but very low power draw ... hence a spike in efficiency.

Interesting. I’m usually running nvidia-smi -q in a terminal window for each GPU, and I’ve noticed these pauses where the GPU clocks drop every few minutes, but I assumed they were the result of a new frame being transferred.

Post by **ProDigit** » Sun Feb 03, 2019 2:26 am

gordonbb wrote:
rwh202 wrote:
ProDigit wrote: This is only true when they're running from a PCIe x16 slot.
If they're running from a PCIe x1/x4 slot, the power reading is incorrect, as the card gets additional power from a riser.
In my case, both 1060s are using between 100-110 W, deduced from a Kill A Watt meter; yet GPU-Z reports 107 W for the one plugged into the PCIe x16 slot, and 63 W for the one plugged into the PCIe x1 slot.
I'd be surprised if risers affected anything (other than slowing the cards down and reducing power consumption).
The power sensors are on the card and don't care whether the power comes from the slot, the riser or the 6/8-pin power connectors. However, I'm not sure if they are pre- or post-VRM and whether the losses there are included.
The difference between the Kill A Watt and GPU-Z is likely down to PSU efficiency.
107 + 63 = 170 W total for the GPUs. Assuming 85% PSU efficiency, you then have 200 W from the wall, as shown by the Kill A Watt.
The power draw on the card includes the VRMs and the fans. I suspect that this is what the 0.005 Ohm shunt resistors that overclockers like to bridge are used for. Note that overclockers typically only modify the shunt resistors for the external power connections and NOT the PCIe power leads, as that would likely lead to melted traces and/or melted wires on the 24-pin motherboard connector and, in some cases, has been known to cause the main motherboard power connector to catch fire.

Shunt mods are not recommended in general and are certainly not worth it for folding, as the gains would likely be minimal, with the potential for killing the card or motherboard or greatly shortening its useful life.

I found out that the PCIe riser cards only provide 35 W to the GPU, while the remaining power comes from the connector on the top.
On one of the cards the readout was low because the riser was plugged into the PCIe voltage rail and the card was tapped off a SATA port.

When I switched the connectors, I noticed the power consumption went up to 80 W, and performance increased.

Post by **ProDigit** » Sun Feb 03, 2019 2:35 am

The dips in power draw last less than 1 second on my system, and usually get shorter with a lower RAM overclock.
I doubt they're responsible for the efficiency spikes.
I think PPD in FAHControl updates every so many fractions of a percent of a WU. The spikes might indicate the times when FAH updates its estimated PPD because it has just finished processing 1% of a WU, or something like that.
I mean, it's another possible explanation for the spikes.

Post by **toTOW** » Sun Feb 03, 2019 1:48 pm

bruce wrote: Then how does the "reference platform" complete its calculations? Presumably it replaces spin-wait CPU cycles with CPU double-precision calculations, which might or might not be observable as a change in CPU heating while the GPU is idle. It also might look like the writing of a checkpoint.

Good question ... I think that since we moved to mixed-precision calculations, the portion of the sanity check done on the CPU has been reduced: either it is partly done on the GPU using double precision, or the number of checks needed has been reduced.

I don't know enough about the OpenMM code to draw a conclusion.

Post by **foldy** » Mon Feb 04, 2019 9:02 am

@ProDigit: SATA only delivers 54 watts; PCIe or Molex can do over 100 watts. Some say it is riskier to power a riser from SATA than from Molex because the SATA connector burns more easily. So they use a Molex-to-6-pin adapter, where the 6-pin end goes into the riser and the Molex end connects to a Molex lead from the power supply.
https://i.imgur.com/Xg2wvF1.png
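
For reference, the 54 W figure can be reproduced with a short Python sketch. The pin count and per-pin current below are the commonly cited ratings for the 12 V pins of a SATA power connector, and the 75 W slot budget is the commonly cited maximum for an x16 graphics slot; treat both as assumptions and check them against your own hardware documentation. `connector_watts` is a hypothetical helper name.

```python
def connector_watts(volts, pins, amps_per_pin):
    """Maximum power a connector can carry on one rail: V * pins * A per pin."""
    return volts * pins * amps_per_pin


# Assumed ratings: three 12 V pins at 1.5 A each on a SATA power plug.
sata_watts = connector_watts(volts=12, pins=3, amps_per_pin=1.5)

# Assumed maximum a card may draw through an x16 slot (and so through a riser).
slot_budget_watts = 75

shortfall = slot_budget_watts - sata_watts

print(sata_watts)  # 54.0
print(shortfall)   # 21.0
```

A card that pulls its full slot budget through the riser would therefore exceed what a single SATA plug is rated for, which is one way to read the warnings about SATA-powered risers.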
