
core 0x17 GPU usage

Posted: Fri Sep 26, 2014 8:55 pm
by Breach
Hi,

I have a question about core 0x17. I'm playing with my new graphics card and noticed that GPU utilization never goes beyond 90% when folding, which is a bit strange; I'd expect it to make full use of the GPU. I have a full physical core (plus an HT one) dedicated to it at 4.6 GHz, so I don't think this has anything to do with being CPU starved (also confirmed by the roughly 10% going to the System Idle Process in Task Manager). So I'm wondering whether this [the GPU being loaded to only 90%] is a general thing or card specific, and why that is.

Thanks

Re: core 0x17 GPU usage

Posted: Sat Sep 27, 2014 3:01 am
by Zagen30
It isn't specific to your card. I just took a look at the usage of my 780s and they're also hovering in the 90% range. I don't fold on my CPU, so it's definitely not due to the GPUs being starved for CPU time.

I wouldn't worry about it. It's probably just the cores not being able to scale perfectly with shader count.

Re: core 0x17 GPU usage

Posted: Sat Sep 27, 2014 3:48 am
by Rel25917
The core can max out a Titan or 780 Ti; not sure what is holding yours back.

Re: core 0x17 GPU usage

Posted: Sat Sep 27, 2014 5:25 am
by Zagen30
What project? I admit I haven't checked GPU usage in a long time, so the only data I have is what's running now, which has been from project 9201.

Re: core 0x17 GPU usage

Posted: Sat Sep 27, 2014 5:49 am
by Rel25917
All projects that I've seen so far.

Re: core 0x17 GPU usage

Posted: Sat Sep 27, 2014 6:13 am
by bruce
It's false to assume that your CPU is only maxed out when it's reporting 100%. Your CPU has to spend some time reading and writing disk data, switching tasks, reading and writing memory, and so on. A good disk cache can hide most of the time the CPU has to wait for the disk, but there will still be times when the CPU is not at 100%. Sometimes it has to wait for the internet or for keyboard/mouse interactions, too.

Similarly, it's false to assume that a GPU is only maxed out when it reports 100% utilization. Some time has to be spent moving data across the PCIe bus, and the data being processed isn't always an exact multiple of the number of shaders, so when data is delivered to the GPU, some data blocks will be smaller than the number of free shaders. At certain points in the calculations, data must be synchronized before the next type of processing can begin. And so on.
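
To make that concrete, here's a minimal back-of-the-envelope sketch (Python, with made-up timings purely for illustration, not measurements from core 0x17):

Code:
# Toy model: the driver reports the GPU as "busy" only while the shaders
# are crunching; PCIe transfers, host-side prep and synchronization count
# as idle time, so the reported utilization sits below 100% even when the
# shaders are the real limit.  All numbers below are assumptions.

shader_time_ms   = 9.0   # assumed shader time per work segment
pcie_transfer_ms = 0.6   # assumed time moving data across the PCIe bus
sync_and_cpu_ms  = 0.4   # assumed host-side bookkeeping / synchronization

total_ms = shader_time_ms + pcie_transfer_ms + sync_and_cpu_ms
utilization = shader_time_ms / total_ms

print("Reported GPU utilization: {:.0%}".format(utilization))  # 90%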

Re: core 0x17 GPU usage

Posted: Sat Sep 27, 2014 11:16 am
by Breach
Thanks. Yes, it does look like there's a bottleneck somewhere. So it comes down to shader count? With core 0x15 my utilisation is 99%, so it does seem to also have to do with the way core 0x17 works.

Re: core 0x17 GPU usage

Posted: Sat Sep 27, 2014 5:42 pm
by bruce
If you want to reduce everything to a single number, then look only at the shader count. My point is that you can't reduce everything to a single number.

Of course there is a bottleneck, but you're assuming that somehow the shaders will always be your bottleneck and nothing else matters. In fact, the real bottleneck is always a mixture of all the delays from everything that might saturate for some fraction of the time you're processing a WU. Reducing any of those delays will speed up production in proportion to how those pieces, both big and small, add up to the total delay.

In your case, 90% of the delay is due to the limitations of your shaders and 10% of your delay is due to something else. Some people would attempt to reduce the shader delay by overclocking, but that won't help the other 10%. Assuming the overclock is stable, both throughput and the percentage of non-shader delays will increase. Replacing PCIe 2.0 with PCIe 3.0 will reduce the non-shader delays and increase the percentage of delays that are attributable to the shaders in proportion to what fraction of the time is spent waiting on PCIe transfers.
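
Rough arithmetic, as a sketch (Python; the 90/10 split and the 10% overclock are just assumed numbers):

Code:
# Amdahl-style sketch: overclocking the shaders only shrinks the share of
# the per-WU time that the shaders are responsible for; the other delays
# are untouched, so their percentage goes up.

shader_fraction = 0.90   # assumed share of time spent waiting on shaders
other_fraction  = 0.10   # PCIe transfers, CPU prep, synchronization, ...
overclock       = 1.10   # assumed stable 10% shader clock increase

new_total = shader_fraction / overclock + other_fraction
speedup = 1.0 / new_total
shader_share_after = (shader_fraction / overclock) / new_total

print("Throughput gain:        {:.1%}".format(speedup - 1.0))             # ~8.9%, not 10%
print("Non-shader delay share: {:.1%}".format(1.0 - shader_share_after))  # ~10.9%, up from 10%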

Re: core 0x17 GPU usage

Posted: Sat Sep 27, 2014 10:08 pm
by Breach
Thanks. I haven't assumed anything - my observation is that not 1% but 10% of my GPU's core is not being used, which is a shame. So if most of that 10% is indeed caused by a shader bottleneck, it should not be an issue (or at least not to such an extent) on 980 cards, right? Which would mean that 970 cards are at a disadvantage regardless of how high they can clock. That's a rather academic point of course, since in the grand scheme of things it's the bottom-line performance we get that matters...

By the way, I'm on PCIe 3.0 x16, so GPU-to-bus latencies should be minimal. And with the core and shader clocks now shared, I can't overclock the shaders alone.

Re: core 0x17 GPU usage

Posted: Mon Jan 26, 2015 2:13 pm
by CBT
When I moved my GTX 980 (EVGA SC ACX2) from my old Core2Quad Q6600 @ 2.4 GHz / PCIe 2.0 to my new i7-4790K @ 4.0 GHz / PCIe 3.0, the GPU usage (as shown by EVGA PrecisionX) went up from 82% to 88-89%.
To be honest, I had hoped for a better result. ;-) :-(

Re: core 0x17 GPU usage

Posted: Tue Jan 27, 2015 4:44 pm
by JimF
If you want a higher GPU usage, get a GTX 750 Ti. I have two on a Haswell MB, running at 98% for both Core_17 and Core_18 (each fed by a virtual core of an i7-4770).
On the other hand, if you want higher output, stick with your GTX 980.

The real issue insofar as I am concerned is PPD per dollar (and per watt, depending on your energy costs, green-ness, etc.). The GTX 980 is nothing to complain about, all things considered. And as Bruce points out, there will always be a bottleneck somewhere. You have just found the bottleneck du jour.
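
For what it's worth, a trivial sketch of that metric (Python; the PPD, price and wattage figures are placeholders purely for illustration, not benchmarks - substitute your own):

Code:
# Hypothetical numbers just to show the PPD-per-dollar / PPD-per-watt
# comparison; none of these figures are real measurements.
cards = {
    "GTX 750 Ti (example figures)": {"ppd": 60000,  "price_usd": 150, "watts": 60},
    "GTX 980 (example figures)":    {"ppd": 200000, "price_usd": 550, "watts": 165},
}

for name, c in cards.items():
    print("{}: {:.0f} PPD per dollar, {:.0f} PPD per watt".format(
        name, c["ppd"] / c["price_usd"], c["ppd"] / c["watts"]))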

Re: core 0x17 GPU usage

Posted: Tue Jan 27, 2015 5:49 pm
by bruce
Breach wrote: Thanks. I haven't assumed anything - my observation is that not 1% but 10% of my GPU's core is not being used, which is a shame. So if most of that 10% is indeed caused by a shader bottleneck, it should not be an issue (or at least not to such an extent) on 980 cards, right?
You missed my point. If your shaders are not busy 10% of the time, then 90% of the time the reason the result has not been completed yet is that you're waiting on the shaders -- which is what I called "shader delay". You're asking for 100% of the delays to be caused by there not being enough shaders, or by them not being fast enough (and never having to wait for ANYTHING ELSE). Your shaders can't process data that hasn't arrived yet, and whenever they complete a work segment, some time is spent transferring data (PCIe delay) and some time (CPU delay) is spent while the CPU processes the data it has received and prepares the next work segment. You're saying those pieces add up to 10%.

(I'm assuming there are only three categories, and that any reason the WU has not been completed falls into one of those three. When more than one thing is busy, it has to be counted under one of those three categories.) Also, during the processing of each WU there are several different types of work segments. Some will be delayed more by one thing and some by another, and by looking at only one number you're taking some kind of weighted average.

CBT reduced his PCIe delay and CPU delay, and a bigger percentage of the time was spent waiting for the GPU (82% up to 88-89%). JimF suggests that if you use a slower GPU without changing the other delays, a bigger percentage of the time will be spent waiting on the shaders.
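
A small sketch of that bookkeeping (Python; the per-segment numbers are invented just to show how a single figure ends up being a weighted average):

Code:
# Each type of work segment splits its wall time into the three buckets
# described above: shader delay, PCIe delay and CPU delay.  The single
# "% GPU usage" figure is the shader bucket averaged over all segments.

# (share of segments, shader ms, PCIe ms, CPU ms) -- assumed numbers
segments = [
    (0.70, 9.5, 0.3, 0.2),   # compute-heavy segments
    (0.20, 8.0, 1.2, 0.8),   # transfer-heavy segments
    (0.10, 6.0, 2.0, 2.0),   # synchronization-heavy segments
]

shader_time = sum(w * s for w, s, p, c in segments)
total_time  = sum(w * (s + p + c) for w, s, p, c in segments)

print("Weighted-average GPU usage: {:.0%}".format(shader_time / total_time))  # ~88-89% with these assumptions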

Re: core 0x17 GPU usage

Posted: Fri Jan 30, 2015 10:03 am
by CBT
FYI, by switching off the CPU client (even though it was already configured for 7 cores), I managed to get the '% GPU usage' up to 91%.
I'm not sure if the i7's turbo mode kicks in at this point. If not, there may be another improvement to be had there.
How can I tell whether turbo is active and at which frequency that core runs?

This now leads to a higher PPD from the GPU, but I miss the PPD from the CPU. I'm not sure yet which combination is best (for a Core_17). Since I seem to keep receiving Core_15 WUs, where the equation is quite different, I certainly don't want to miss out on the PPD from the CPU client for now.

Corné

Re: core 0x17 GPU usage

Posted: Fri Jan 30, 2015 11:08 am
by ChristianVirtual
@CBT: try CPU:6 instead ... that might keep the GPU usage up while still adding CPU points. Or even just CPU:4.
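
As for the turbo question: one rough way to watch the reported clocks from a script might look like the sketch below (Python with the third-party psutil package; how faithfully the "current" value tracks turbo depends on the OS, so a tool like CPU-Z may be more reliable on Windows):

Code:
# Rough sketch: poll the CPU frequency psutil reports while the GPU slot
# is folding.  Requires the third-party psutil package; on some platforms
# the "current" value is only the nominal clock, not the live turbo clock.
import time
import psutil

for _ in range(5):
    freqs = psutil.cpu_freq(percpu=True) or []
    readings = ["{:.0f} MHz".format(f.current) for f in freqs]
    print(readings if readings else "frequency not reported on this platform")
    time.sleep(1)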

Re: core 0x17 GPU usage

Posted: Fri Jan 30, 2015 1:40 pm
by CBT
I've been thinking along the same lines. I might even try CPU:3, to make sure a 'complete' core is available, not just a HT module. My GPU now has a Core_15, so I can't test for the moment.

Btw. why doesn't core_15 receive any bonus points?

Corné