GPU utilization drops regularly

foldinghomealone
Posts: 130
Joined: Wed Feb 01, 2017 7:07 pm

GPU utilization drops regularly

Post by foldinghomealone »

I'm using 7.4.4 and a GTX 1070 on Win10.
There are utilization drops which occur very often, like every few minutes. Is there a reason behind this?

I worry a bit about steep temp gradients which appear every few minutes on the GPU.
Is there a way I can stop this or at least make it happen less frequently? Like a config setting?

If those drops are necessary, perhaps because only the CPU is used for some check and GPU calculation cannot continue during it, I would rather prefer that you calculate some garbage with the GPU meanwhile to keep GPU utilization up and therefore temps at the same level.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: GPU utilization drops regularly

Post by bruce »

There are two possible things that happen periodically:
* A FAHCore suspends processing long enough to write checkpoint data to disk (via whatever cache you have)
* A GPU FAHCore does a sanity check periodically to make sure the simulation hasn't encountered certain types of errors. It uses the CPU to check the results produced up to that point by the GPU.

For GPU cores (at least), the frequency of these events is set by the scientist in the configuration of the WU.
In my experience, they often seem to happen at the same time, but I'm not sure that's true in all cases.
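As a rough illustration of how two periodic events can line up, here is a minimal sketch. The function name and the interval values are hypothetical; the real intervals are set per project and are not stated in this thread.

```python
def pause_frames(total_frames, checkpoint_every, sanity_every):
    """Return {frame: [reasons]} for every frame at which the GPU would pause.

    All three arguments are illustrative assumptions, not real project settings.
    """
    pauses = {}
    for frame in range(1, total_frames + 1):
        reasons = []
        if frame % checkpoint_every == 0:
            reasons.append("checkpoint")
        if frame % sanity_every == 0:
            reasons.append("sanity check")
        if reasons:
            pauses[frame] = reasons
    return pauses


# With equal intervals, both events always fall on the same frames.
for frame, reasons in pause_frames(20, checkpoint_every=5, sanity_every=5).items():
    print(frame, reasons)
```

With different intervals (say 4 and 6) the two events only coincide at common multiples, which would fit the observation that they often, but not always, seem to happen together.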
foldinghomealone
Posts: 130
Joined: Wed Feb 01, 2017 7:07 pm

Re: GPU utilization drops regularly

Post by foldinghomealone »

Thanks again for your fast answer.

I doubt it is the checkpoint data which causes the GPU to stop for a 'long' time like this. I use an SSD and a few MB should be written very fast.

Why can't sanity checks be done in parallel with GPU folding? Why does GPU folding have to be stopped?
I would be happy if sanity checks could be done in parallel to GPU folding, then I wouldn't need to worry about my GPU and processing of WUs would be even faster.
I haven't measured but I guess that the processing time of a medium sized WU could be reduced by a few minutes which would result in higher overall GPU yields.
7im
Posts: 10189
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengeance (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona

Re: GPU utilization drops regularly

Post by 7im »

The GPU usage charts I have seen have a square, saw-toothed shape. Nature of the beast. How long is "long", and how often? Except for the checkpoints every 5 frames, the temp shouldn't fluctuate much. What degree of temp change are you seeing?
foldy
Posts: 2061
Joined: Sat Dec 01, 2012 3:43 pm
Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441

Re: GPU utilization drops regularly

Post by foldy »

foldinghomealone wrote:I would rather prefer that you calculate some garbage with the GPU meanwhile to keep GPU utilization up and therefore temps at the same level.
I don't get it. What is the problem if GPU temps do not stay at the full-load level during checkpoints?
ComputerGenie
Posts: 236
Joined: Mon Dec 12, 2016 4:06 am

Re: GPU utilization drops regularly

Post by ComputerGenie »

If you mean something like this:
Image
That's perfectly normal. :wink:
foldinghomealone
Posts: 130
Joined: Wed Feb 01, 2017 7:07 pm

Re: GPU utilization drops regularly

Post by foldinghomealone »

ComputerGenie wrote:That's perfectly normal. :wink:
That's a perfect waste of time

Currently I'm folding Project 9151 (7, 21, 607). Utilization drops every 4 frames for about 5 sec. The temperature drops by about 14-18 K each time.
https://ibb.co/d4Eu6F

On other WUs I can hear every time a checkpoint is reached because temps drop much further and therefore the fans almost stop.
I would prefer constant temps for durability of GPU.

And I would prefer that whatever causes the utilization drops be done in parallel with the GPU computing.
5 sec every 4 frames means that the WU takes about 2 min, or around 2%, longer than (in my opinion) necessary.
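The estimate can be checked with a few lines. This is only a sketch: the 100 frames per WU and the 100-minute WU duration are assumptions chosen to be consistent with the "2 min" and "2%" figures, not measured values.

```python
def pause_overhead(total_frames, frames_between_pauses, pause_seconds, wu_minutes):
    """Estimate the time lost to periodic GPU pauses over one WU."""
    pauses = total_frames // frames_between_pauses
    lost_seconds = pauses * pause_seconds
    fraction = lost_seconds / (wu_minutes * 60)
    return pauses, lost_seconds, fraction


# Assumed: 100 frames per WU, a pause every 4 frames, 5 s per pause, 100 min per WU.
pauses, lost, frac = pause_overhead(100, 4, 5, 100)
print(f"{pauses} pauses, {lost} s lost, {frac:.1%} of the WU")
# prints "25 pauses, 125 s lost, 2.1% of the WU"
```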

What is the reason to stop GPU computing to write checkpoints or make sanity checks?
Last edited by foldinghomealone on Mon Feb 20, 2017 9:51 pm, edited 1 time in total.
ComputerGenie
Posts: 236
Joined: Mon Dec 12, 2016 4:06 am

Re: GPU utilization drops regularly

Post by ComputerGenie »

foldinghomealone wrote:
ComputerGenie wrote:That's perfectly normal. :wink:
That's a perfect waste of time
...
As you can see, every project is going to be different. Just relax and let it do what it does. :wink:

P.S. - if you prefer the software to act differently than it's designed to act, then, perhaps, you should get on the team and get involved in a rewrite. :egeek:
ComputerGenie
Posts: 236
Joined: Mon Dec 12, 2016 4:06 am

Re: GPU utilization drops regularly

Post by ComputerGenie »

foldinghomealone wrote:...I would prefer constant temps for durability of GPU...
That statement doesn't match reality. Permanently sustained high temps lower durability and longevity.
foldinghomealone
Posts: 130
Joined: Wed Feb 01, 2017 7:07 pm

Re: GPU utilization drops regularly

Post by foldinghomealone »

ComputerGenie wrote:
foldinghomealone wrote:...I would prefer constant temps for durability of GPU...
That statement doesn't match reality. Permanently sustained high temps lower durability and longevity.

:roll:
foldinghomealone
Posts: 130
Joined: Wed Feb 01, 2017 7:07 pm

Re: GPU utilization drops regularly

Post by foldinghomealone »

ComputerGenie wrote:P.S. - if you prefer the software to act differently than it's designed to act, then, perhaps, you should get on the team and get involved in a rewrite. :egeek:
I don't demand things, but I see room for optimization.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: GPU utilization drops regularly

Post by bruce »

There's absolutely nothing unusual about the first and third GPU. It's perfectly reasonable to assume that the dips you see are repeated many times at equal intervals but outside of the field of view.

In the middle image, a periodic pattern is harder to discern. There is no reason to be bothered by a variation in the height/width of individual pulses. If the cache is mostly empty, it will look different than if the cache is mostly full of data that needs to sync to disk (e.g., the third pulse from the left compared to the first two).

What's important here is that the total time each GPU is waiting on the hard disk adds up to almost nothing. (That's why running FAH on an SSD is only a little bit faster than running it on a HD.)
foldinghomealone
Posts: 130
Joined: Wed Feb 01, 2017 7:07 pm

Re: GPU utilization drops regularly

Post by foldinghomealone »

bruce wrote: What's important here is that the total time each GPU is waiting on the hard disk adds up to almost nothing. (That's why running FAH on an SSD is only a little bit faster than running it on a HD.)
Still, each "almost nothing" adds up to a >2% longer processing time for each WU.
Admittedly, a 2% performance increase is nothing compared to waiting for the next GPU generation.
Joe_H
Site Admin
Posts: 7867
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: GPU utilization drops regularly

Post by Joe_H »

foldinghomealone wrote: What is the reason to stop GPU computing to write checkpoints or make sanity checks?
To write a checkpoint, the data structures describing the current state of the WU being processed need to be in a consistent and static state while they are written out. Continuing to compute would not allow that. My assumption is that the calculations for the sanity checks need that same static, consistent set of data structures.

As for the necessity of doing sanity checks, they were found to be needed for computational results from consumer-level GPU cards. Those cards are not optimized for stable numerical calculations like the "Pro" series of cards sold explicitly for that purpose.
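To make the "consistent and static state" point concrete, here is a toy sketch. It is not how the FAHCore is implemented; the state layout, function names, and file name are all hypothetical. The compute loop stops at the checkpoint interval, a snapshot is taken while nothing is changing, and only then is it written out.

```python
import copy
import json
import os
import tempfile


def step(state):
    # Advance the toy simulation; the step counter and positions must stay in sync.
    state["step"] += 1
    state["positions"] = [x + 0.1 for x in state["positions"]]


def write_checkpoint(state, path):
    # Called only between steps, so the snapshot is internally consistent.
    snapshot = copy.deepcopy(state)
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(snapshot, f)
    os.replace(tmp_path, path)  # swap the finished file into place
    return snapshot


def run(n_steps, checkpoint_every, path):
    state = {"step": 0, "positions": [0.0, 1.0, 2.0]}
    for _ in range(n_steps):
        step(state)
        if state["step"] % checkpoint_every == 0:
            write_checkpoint(state, path)  # the compute loop is paused here
    return state


with tempfile.TemporaryDirectory() as workdir:
    ckpt = os.path.join(workdir, "checkpoint.json")  # hypothetical file name
    final_state = run(n_steps=10, checkpoint_every=4, path=ckpt)
    with open(ckpt) as f:
        restored = json.load(f)
    print(final_state["step"], restored["step"])
```

If `step` were allowed to run while the snapshot was being taken, the saved step counter and positions could come from different steps, and a restart from that file would resume from a state the simulation never actually passed through.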