Low GPU utilization

If you think it might be a driver problem, see viewforum.php?f=79

Moderators: Site Moderators, FAHC Science Team

scm2000
Posts: 26
Joined: Sun Mar 15, 2020 12:13 am

Low GPU utilization

Post by scm2000 »

I had 3 GPUs in a system with a 2 core Celeron CPU...
It had reasonable performance.

I just added a 4th GPU and now all the GPUs are lucky to get slightly above 50% utilization.

I thought the CPU thread for each GPU should not be doing much actual work, but is it the case that I need a full CPU core per GPU?

Or is there some current issue with FAH GPU work units? I see the latest software was supposed to address GPU utilization problems.
I installed that but it did not help GPU utilization.
Neil-B
Posts: 2027
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: Low GPU utilization

Post by Neil-B »

Could you post your log, including the top 100 lines or so with the configuration details? It may help identify what is happening.

A number of factors might be playing into this, including the OS, AMD or Nvidia, the age of the GPUs, and various configuration settings. Once there is a clearer picture of what your setup is (the log should provide that), any advice is likely to be more relevant :)
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070

(Green/Bold = Active)
scm2000
Posts: 26
Joined: Sun Mar 15, 2020 12:13 am

Re: Low GPU utilization

Post by scm2000 »

Code:

*********************** Log Started 2020-05-12T21:02:43Z ***********************
21:02:43:Trying to access database...
21:02:44:Successfully acquired database lock
21:02:44:Read GPUs.txt
21:03:16:Enabled folding slot 01: READY gpu:0:GP107 [GeForce GTX 1050 Ti]  2138
21:03:16:Enabled folding slot 02: READY gpu:1:GK110 [Tesla K40m]
21:03:16:Enabled folding slot 03: READY gpu:2:GK110 [Tesla K40m]
21:03:16:Enabled folding slot 00: READY gpu:3:GP107 [GeForce GTX 1050 Ti]  2138
21:03:16:****************************** FAHClient ******************************
21:03:16:        Version: 7.6.13
21:03:16:         Author: Joseph Coffland <joseph@cauldrondevelopment.com>
21:03:16:      Copyright: 2020 foldingathome.org
21:03:16:       Homepage: https://foldingathome.org/
21:03:16:           Date: Apr 27 2020
21:03:16:           Time: 21:21:01
21:03:16:       Revision: 5a652817f46116b6e135503af97f18e094414e3b
21:03:16:         Branch: master
21:03:16:       Compiler: Visual C++ 2008
21:03:16:        Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
21:03:16:       Platform: win32 10
21:03:16:           Bits: 32
21:03:16:           Mode: Release
21:03:16:         Config: C:\Users\steph\AppData\Roaming\FAHClient\config.xml
21:03:16:******************************** CBang ********************************
21:03:16:           Date: Apr 24 2020
21:03:16:           Time: 17:07:55
21:03:16:       Revision: ea081a3b3b0f4a37c4d0440b4f1bc184197c7797
21:03:16:         Branch: master
21:03:16:       Compiler: Visual C++ 2008
21:03:16:        Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
21:03:16:       Platform: win32 10
21:03:16:           Bits: 32
21:03:16:           Mode: Release
21:03:16:******************************* System ********************************
21:03:16:            CPU: Intel(R) Celeron(R) CPU G3930 @ 2.90GHz
21:03:16:         CPU ID: GenuineIntel Family 6 Model 158 Stepping 9
21:03:16:           CPUs: 2
21:03:16:         Memory: 7.70GiB
21:03:16:    Free Memory: 5.40GiB
21:03:16:        Threads: WINDOWS_THREADS
21:03:16:     OS Version: 6.2
21:03:16:    Has Battery: false
21:03:16:     On Battery: false
21:03:16:     UTC Offset: -4
21:03:16:            PID: 8284
21:03:16:            CWD: C:\Users\steph\AppData\Roaming\FAHClient
21:03:16:  Win32 Service: false
21:03:16:             OS: Windows 10 Enterprise
21:03:16:        OS Arch: AMD64
21:03:16:           GPUs: 4
21:03:16:          GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:7 GP107 [GeForce GTX 1050 Ti] 2138
21:03:16:          GPU 1: Bus:3 Slot:0 Func:0 NVIDIA:3 GK110 [Tesla K40m]
21:03:16:          GPU 2: Bus:2 Slot:0 Func:0 NVIDIA:3 GK110 [Tesla K40m]
21:03:16:          GPU 3: Bus:4 Slot:0 Func:0 NVIDIA:7 GP107 [GeForce GTX 1050 Ti] 2138
21:03:16:  CUDA Device 0: Platform:0 Device:0 Bus:2 Slot:0 Compute:3.5 Driver:11.0
21:03:16:  CUDA Device 1: Platform:0 Device:1 Bus:3 Slot:0 Compute:3.5 Driver:11.0
21:03:16:  CUDA Device 2: Platform:0 Device:2 Bus:1 Slot:0 Compute:6.1 Driver:11.0
21:03:16:  CUDA Device 3: Platform:0 Device:3 Bus:4 Slot:0 Compute:6.1 Driver:11.0
21:03:16:OpenCL Device 0: Platform:0 Device:0 Bus:NA Slot:NA Compute:2.1 Driver:26.20
21:03:16:OpenCL Device 2: Platform:1 Device:0 Bus:2 Slot:0 Compute:1.2 Driver:445.87
21:03:16:OpenCL Device 3: Platform:1 Device:1 Bus:3 Slot:0 Compute:1.2 Driver:445.87
21:03:16:OpenCL Device 4: Platform:1 Device:2 Bus:1 Slot:0 Compute:1.2 Driver:445.87
21:03:16:OpenCL Device 5: Platform:1 Device:3 Bus:4 Slot:0 Compute:1.2 Driver:445.87
21:03:16:******************************* libFAH ********************************
21:03:16:           Date: Apr 15 2020
21:03:16:           Time: 14:53:14
21:03:16:       Revision: 216968bc7025029c841ed6e36e81a03a316890d3
21:03:16:         Branch: master
21:03:16:       Compiler: Visual C++ 2008
21:03:16:        Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
21:03:16:       Platform: win32 10
21:03:16:           Bits: 32
21:03:16:           Mode: Release
21:03:16:***********************************************************************
21:03:16:<config>
21:03:16:  <!-- Folding Slot Configuration -->
21:03:16:  <cause v='COVID_19'/>
21:03:16:
21:03:16:  <!-- HTTP Server -->
21:03:16:  <allow v='127.0.0.1 192.168.1.0/24'/>
21:03:16:
21:03:16:  <!-- Network -->
21:03:16:  <proxy v=':8080'/>
21:03:16:
21:03:16:  <!-- Remote Command Server -->
21:03:16:  <password v='*****'/>
21:03:16:
21:03:16:  <!-- Slot Control -->
21:03:16:  <power v='full'/>
21:03:16:
21:03:16:  <!-- User Information -->
21:03:16:  <passkey v='*****'/>
21:03:16:  <team v='41355'/>
21:03:16:  <user v='scm2000'/>
21:03:16:
21:03:16:  <!-- Folding Slots -->
21:03:16:  <slot id='1' type='GPU'/>
21:03:16:  <slot id='2' type='GPU'/>
21:03:16:  <slot id='3' type='GPU'/>
21:03:16:  <slot id='0' type='GPU'/>
21:03:16:</config>
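
The System section of this log already shows the mismatch discussed in the replies: `CPUs: 2` against `GPUs: 4`. As a minimal sketch, assuming the log line layout shown above (a `HH:MM:SS:` prefix followed by `CPUs:` or `GPUs:` and a number), those two counts could be pulled out like this. The function name and the sample text are illustrative only and are not part of FAHClient.

```python
import re


def count_cpus_and_gpus(log_text):
    """Return (cpus, gpus) read from FAHClient-style log lines, or None if absent."""
    counts = {}
    for line in log_text.splitlines():
        # Matches lines such as "21:03:16:           CPUs: 2".
        # Per-device lines like "GPU 0: Bus:1 ..." are not matched.
        match = re.match(r"^\d{2}:\d{2}:\d{2}:\s*(CPUs|GPUs):\s*(\d+)\s*$", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts.get("CPUs"), counts.get("GPUs")


# A few lines copied from the log above.
sample = """21:03:16:           CPUs: 2
21:03:16:         Memory: 7.70GiB
21:03:16:           GPUs: 4
21:03:16:          GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:7 GP107 [GeForce GTX 1050 Ti] 2138
"""

cpus, gpus = count_cpus_and_gpus(sample)
print(f"CPU threads: {cpus}, GPUs: {gpus}")
if cpus is not None and gpus is not None and gpus > cpus:
    print("More GPUs than CPU threads")
```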
JimboPalmer
Posts: 2573
Joined: Mon Feb 16, 2009 4:12 am
Location: Greenwood MS USA

Re: Low GPU utilization

Post by JimboPalmer »

scm2000, it depends on the GPU vendor.

AMD uses interrupts and loads the CPUs fairly lightly, although I would still not recommend more GPUs than CPU threads.

Nvidia uses polled I/O, and this fully utilizes a CPU thread per GPU. You will always have difficulty running more GPUs than CPU threads. Some folders need more CPU threads than GPUs because they also use their PCs.
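
The difference between polled and interrupt-style waiting can be illustrated without any GPU at all. The sketch below is only an analogy in plain Python, not driver code: a timer stands in for the GPU finishing its work, `polled_wait` spins asking "done yet?", and `blocking_wait` sleeps until it is signalled. All names here are made up for the illustration.

```python
import threading
import time


def cpu_seconds_while_waiting(wait_fn, done, delay=0.2):
    """Return process CPU time consumed while wait_fn waits for `done`."""
    done.clear()
    # The timer plays the role of a device that finishes after `delay` seconds.
    timer = threading.Timer(delay, done.set)
    start = time.process_time()
    timer.start()
    wait_fn(done)
    used = time.process_time() - start
    timer.join()
    return used


def polled_wait(done):
    # Busy-wait: keep checking the flag, which keeps a CPU thread busy.
    while not done.is_set():
        pass


def blocking_wait(done):
    # Sleep until signalled, analogous to waiting for an interrupt.
    done.wait()


done = threading.Event()
polled = cpu_seconds_while_waiting(polled_wait, done)
blocked = cpu_seconds_while_waiting(blocking_wait, done)
print(f"polled wait:   {polled:.3f} s of CPU time")
print(f"blocking wait: {blocked:.3f} s of CPU time")
```

Both calls wait for about the same wall-clock time, but the polled version burns CPU for the whole wait, which is why one polled GPU tends to occupy one CPU thread.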
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
scm2000
Posts: 26
Joined: Sun Mar 15, 2020 12:13 am

Re: Low GPU utilization

Post by scm2000 »

Looks like I'm going to do a CPU upgrade then... because I have all NVIDIA GPUs
Neil-B
Posts: 2027
Joined: Sun Mar 22, 2020 5:52 pm
Location: UK

Re: Low GPU utilization

Post by Neil-B »

I'll be honest … not the group of cards I expected given the CPU … Interesting collection :) … I'm not a full-time GPU folder, but my gut instinct would be (until you upgrade the CPU) to pause the 1050s and let the K40ms have a full CPU core each, check what PPD they are pushing out over a few WUs, then un-pause one of the 1050s and test the impact.

Your Celeron, as JimboPalmer said, should only support two cards properly (as they are Nvidia), but "should" and the realities of what actually happens can sometimes differ, and you said you have had three running reasonably before. Baselining "reasonable" with just the two cards and then adding the third will soon tell you if it is worth it until such time as you upgrade the CPU … but yes, a CPU upgrade would make things easier/better.
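
The baselining suggested above comes down to comparing total PPD across configurations. A minimal sketch of that comparison follows; the function name is hypothetical and the PPD figures are placeholders chosen only to make the example run, not measurements from this thread or from any real card.

```python
def best_configuration(measured_ppd):
    """Return (name, totals) for the configuration with the highest total PPD."""
    totals = {name: sum(per_gpu) for name, per_gpu in measured_ppd.items()}
    best = max(totals, key=totals.get)
    return best, totals


# Placeholder numbers only. Replace them with PPD averaged over a few WUs
# for each configuration you actually test.
example = {
    "2x K40m": [100, 100],
    "2x K40m + 1x 1050 Ti": [80, 80, 70],
}

best, totals = best_configuration(example)
for name, total in totals.items():
    print(f"{name}: {total} PPD total")
print(f"Best so far: {best}")
```

With these placeholder figures, adding the third card lowers each card's PPD but still raises the total, which is the kind of outcome the test is meant to reveal either way.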
scm2000
Posts: 26
Joined: Sun Mar 15, 2020 12:13 am

Re: Low GPU utilization

Post by scm2000 »

I bought the motherboard used, and the 2-core Celeron came with it. I've been adding a hodgepodge of GPUs, all the while suspecting I should upgrade the CPU, so I guess now is the time.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Low GPU utilization

Post by bruce »

I would never expect a 2-core Celeron to be able to supply enough data to 3 GPUs, let alone 4, to keep those GPUs busy. Second, what are the speeds of the PCIe slots?

Third, your [GeForce GTX 1050 Ti]s are respectable GPUs and should be able to produce nicely. The [Tesla K40m]s are also pretty good. A lot is going to depend on the speed of the PCIe slot.
Last edited by bruce on Wed May 13, 2020 10:32 pm, edited 1 time in total.
Reason: Incorrect information has been corrected.
Neil-B
Posts: 2027
Joined: Sun Mar 22, 2020 5:52 pm
Location: UK

Re: Low GPU utilization

Post by Neil-B »

… and that (bruce's post) just goes to show I'm not a GPU folder :)

It does surprise me though that the Tesla K40ms are considered rather weak … I know their clocks are down in comparison to the 1050s but with the significantly larger shader count (x4 ish) and higher FLOPs performance (x2 ish) I'd have expected them to have been better - Techpowerup rate the 1050s 9% behind HD7970 relative performance and the K40ms 23% ahead of HD7970 but I suppose it comes down to what type of relative performance and how that equates to FAH loadings … as I said just shows how little I know about GPUs.
_r2w_ben
Posts: 285
Joined: Wed Apr 23, 2008 3:11 pm

Re: Low GPU utilization

Post by _r2w_ben »

Your assessment is probably closer to reality. K40m is in the same neighbourhood as a GTX 970. TechPowerUp reports peak FLOPS for the K40m as more than double a 1050 Ti. Combine that with QRB and it could be 3x the points per day!
Neil-B
Posts: 2027
Joined: Sun Mar 22, 2020 5:52 pm
Location: UK

Re: Low GPU utilization

Post by Neil-B »

My "assessment" was simply a paper one with no grounding in reality - hence why I happily defer to those who do GPU folding for real :)

The extent of my GPU folding is a Quadro K420 1GB and a Quadro M1000M 2GB that I run a WU through once in a blue moon just cause I can when I get bored and want to watch paint dry (they still make deadlines and occasionally Timeouts) and a GTX 750 Ti 2GB from my late father that I was going to toss in the recycling until in a moment of madness I ran a WU through it and found it actually gets pretty much the same ppd as my 24/56 core CPU slot - so I have left it running out of amusement and in his memory as I know he would have laughed about it :)
scm2000
Posts: 26
Joined: Sun Mar 15, 2020 12:13 am

Re: Low GPU utilization

Post by scm2000 »

The absolute power of the GPUs I have is actually not important to me... I chose them for various reasons.

2 of them perform at about 100 percent utilization each with a 2-core CPU... Running 2 of them (a 1050 Ti and a K40m) for a while made me think they are on par with each other. If the Teslas are under-powered it's not a problem anyway.

Simply upgrading to a 4-core CPU should get 4 GPUs back up to about 100 percent each, based on my experience with 2 alone and the information that the CPU threads use polling.
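
The reasoning above can be written as a rough rule of thumb. This is a deliberately crude sketch, assuming each polled GPU wants one fully occupied CPU thread and that the available threads are shared evenly; it ignores PCIe speed, the OS, and differences between work units, and the function name is made up for the illustration.

```python
def estimated_gpu_utilization(cpu_threads, polled_gpus):
    """Rough per-GPU utilization if each polled GPU needs a whole CPU thread."""
    if cpu_threads <= 0 or polled_gpus <= 0:
        raise ValueError("cpu_threads and polled_gpus must be positive")
    # Utilization cannot exceed 100% even with spare CPU threads.
    return min(1.0, cpu_threads / polled_gpus)


for threads, gpus in [(2, 2), (2, 3), (2, 4), (4, 4)]:
    share = estimated_gpu_utilization(threads, gpus)
    print(f"{threads} CPU threads, {gpus} GPUs: about {share:.0%} per GPU")
```

Under these assumptions, 2 threads feeding 4 GPUs lands at about 50% each, which is close to what was observed, and 4 threads feeding 4 GPUs returns to about 100%.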

I do have a question, though: why are NVIDIA GPUs polled and not interrupt driven? Am I to believe that NVIDIA does not know how to build GPUs with interrupt capability, or how to write their SDK to sleep until an interrupt? So what's the real story here?
Joe_H
Site Admin
Posts: 7868
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Low GPU utilization

Post by Joe_H »

scm2000 wrote: I do have a question, though: why are NVIDIA GPUs polled and not interrupt driven? Am I to believe that NVIDIA does not know how to build GPUs with interrupt capability, or how to write their SDK to sleep until an interrupt? So what's the real story here?
That is how nVidia wrote the driver code to handle OpenCL commands. From what I understand, CUDA commands are handled by interrupts.

There has been much conjecture on why they chose to do it that way. As far as I know, nVidia has not made any statement as to the reason.
iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
JimboPalmer
Posts: 2573
Joined: Mon Feb 16, 2009 4:12 am
Location: Greenwood MS USA

Re: Low GPU utilization

Post by JimboPalmer »

scm2000 wrote: I do have a question, though: why are NVIDIA GPUs polled and not interrupt driven? Am I to believe that NVIDIA does not know how to build GPUs with interrupt capability, or how to write their SDK to sleep until an interrupt? So what's the real story here?
I do not have the answer, but I do have an opinion.

CUDA is a proprietary interface that locks you into Nvidia cards forever.

OpenCL is an open standard interface that can run anywhere.

How can Nvidia make CUDA look wildly more attractive than OpenCL while still supporting open standards?
scm2000
Posts: 26
Joined: Sun Mar 15, 2020 12:13 am

Re: Low GPU utilization

Post by scm2000 »

I have the CUDA SDK and 4 NVIDIA GPUs, and I am happily writing code for them. I'm not locked in to anything... If I buy an AMD GPU, I'll use whatever SDK I need to program it.