Trouble With More Than 10 GPUs

Funkomancer
Posts: 4
Joined: Fri Jul 06, 2018 11:33 am

Trouble With More Than 10 GPUs

Post by Funkomancer »

I have a rig with 13 GPUs running.
However, it seems that work units from GPU slots 10, 11, and 12 are being run on GPU 1 (in addition to the work unit already attempting to run on slot 1).
This essentially means that four work units are attempting to run on GPU 1 and nothing is actually getting done on that GPU.

This is what my Folding Slots section of config.xml looked like initially:

Code:

<slot id='0' type='GPU'/>
<slot id='1' type='GPU'/>
<slot id='2' type='GPU'/>
<slot id='3' type='GPU'/>
<slot id='4' type='GPU'/>
<slot id='5' type='GPU'/>
<slot id='6' type='GPU'/>
<slot id='7' type='GPU'/>
<slot id='8' type='GPU'/>
<slot id='9' type='GPU'/>
<slot id='10' type='GPU'/>
<slot id='11' type='GPU'/>
<slot id='12' type='GPU'/>

I also tried specifying the gpu-index, cuda-index, and opencl-index manually for each slot, as in:

Code:

<slot id='0' type='GPU'>
  <gpu-index v='0'/>
  <opencl-index v='0'/>
  <cuda-index v='0'/>
</slot>
<slot id='1' type='GPU'>
  <gpu-index v='1'/>
  <opencl-index v='1'/>
  <cuda-index v='1'/>
</slot>
...
<slot id='12' type='GPU'>
  <gpu-index v='12'/>
  <opencl-index v='12'/>
  <cuda-index v='12'/>
</slot>

There was no change.
Right now I've removed slots 10-12 from config.xml in order to get slot 1 back in action, but I still have 3 GPUs just sitting around doing nothing other than looking pretty.
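
For anyone who wants to reproduce the two configurations above without typing 13 slots by hand, here is a minimal Python sketch. The helper name `slot_entries` is hypothetical; the element and attribute names are copied from the snippets above. It only builds text for the Folding Slots section and does not touch the FAH client or config.xml itself.

```python
def slot_entries(n_gpus, explicit_indices=False):
    """Return config.xml slot lines for n_gpus GPU slots (hypothetical helper)."""
    lines = []
    for i in range(n_gpus):
        if not explicit_indices:
            # Short form: let the client pick the indices automatically.
            lines.append(f"<slot id='{i}' type='GPU'/>")
            continue
        # Long form: pin every index to the slot id, as in the second snippet.
        lines.append(f"<slot id='{i}' type='GPU'>")
        for key in ("gpu-index", "opencl-index", "cuda-index"):
            lines.append(f"  <{key} v='{i}'/>")
        lines.append("</slot>")
    return lines


if __name__ == "__main__":
    print("\n".join(slot_entries(13, explicit_indices=True)))
```

Calling `slot_entries(13)` gives the short form instead. Pinning every index to the slot id is only what was tried here, not a recommended setting.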

Is this a bug?
Have I messed something up in config.xml?
Any advice would be greatly appreciated.

I've tested this on Linux (Ubuntu Server 18.04 LTS) primarily.
However, Windows 10 Pro (Build 17134) has similar issues.
Windows was far worse: the automatic gpu-index wasn't always in sequence with the slot id, so without specifying a gpu-index it was hard to tell which WU was running on which GPU.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Trouble With More Than 10 GPUs

Post by bruce »

1) I'll ask about the potential limitation of 10 GPUs. (Sorry about not knowing the answer. You're the first to ask the question.)
2) The GPUs are numbered in the order FAH detects them electrically, which is not necessarily the order you see them in visually.
3) What does FAHBench tell you? ... or GPU-Z or something similar for Linux?

If they're various part numbers, you can get some help from FAHBench and from the identities listed below the log segment you posted.

<sarcasm mode on>
It's called folding@HOME for a reason.

I don't have a system like that here@home so I can't test it for you. Neither do any of the beta testers or any of the FAH Consortium members, as far as I know.
Please ship your system to me and I'll see what we can do.
I'll probably have to upgrade my Air Conditioner. It's already struggling.
<sarcasm mode off>
Funkomancer

Re: Trouble With More Than 10 GPUs

Post by Funkomancer »

1) I thought I might be. I can usually find solutions to my issues with some Google-fu, but no luck in this case.

2) Yup, I think I was trying to make a point but failed at it. What I was trying to say (but didn't) is that if this is indeed a bug and not a configuration issue, it most likely has to do with the gpu-index. I think it may only be expecting a single digit and is taking the first one, i.e. theoretically, gpu-index 1, 10-19, 100-199, etc. would run on GPU 1; gpu-index 2, 20-29, 200-299, etc. would run on GPU 2; and so on. That's my initial thought because I have, in the past, accidentally written scripts that expect single digits and have broken in similar ways when presented with multiple digits.

3) I'm not entirely sure what FAHBench is (other than a benchmark for FAH) or how to use it, but I'll look into it and get back to you. Is there anything specific I should be reporting on from FAHBench or a GPU-Z like program that I have yet to find on Linux? Is this output from nvidia-smi sufficient?

Code:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.67                 Driver Version: 390.67                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    Off  | 00000000:06:00.0 Off |                  N/A |
| 80%   82C    P2   122W / 151W |    131MiB /  8119MiB |     97%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1070    Off  | 00000000:07:00.0 Off |                  N/A |
| 48%   74C    P2   146W / 151W |    442MiB /  8119MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 1070    Off  | 00000000:08:00.0 Off |                  N/A |
| 51%   73C    P2   140W / 151W |    113MiB /  8119MiB |     96%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 1070    Off  | 00000000:09:00.0 Off |                  N/A |
| 80%   82C    P2   120W / 151W |    123MiB /  8119MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   4  GeForce GTX 1070    Off  | 00000000:0A:00.0 Off |                  N/A |
| 79%   80C    P2   144W / 151W |    113MiB /  8119MiB |     97%      Default |
+-------------------------------+----------------------+----------------------+
|   5  GeForce GTX 1070    Off  | 00000000:0D:00.0 Off |                  N/A |
| 77%   81C    P2   153W / 151W |    123MiB /  8119MiB |     98%      Default |
+-------------------------------+----------------------+----------------------+
|   6  GeForce GTX 1070    Off  | 00000000:0E:00.0 Off |                  N/A |
| 80%   82C    P2   104W / 151W |    123MiB /  8119MiB |     98%      Default |
+-------------------------------+----------------------+----------------------+
|   7  GeForce GTX 1070    Off  | 00000000:0F:00.0 Off |                  N/A |
| 69%   78C    P2   142W / 151W |    123MiB /  8119MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   8  GeForce GTX 1070    Off  | 00000000:10:00.0 Off |                  N/A |
| 76%   82C    P2   151W / 151W |    123MiB /  8119MiB |     98%      Default |
+-------------------------------+----------------------+----------------------+
|   9  GeForce GTX 1070    Off  | 00000000:11:00.0 Off |                  N/A |
| 80%   82C    P2   113W / 151W |    113MiB /  8119MiB |     97%      Default |
+-------------------------------+----------------------+----------------------+
|  10  GeForce GTX 1070    Off  | 00000000:12:00.0 Off |                  N/A |
|  0%   40C    P8     9W / 151W |     10MiB /  8119MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|  11  GeForce GTX 1070    Off  | 00000000:13:00.0 Off |                  N/A |
|  0%   37C    P8     8W / 151W |     10MiB /  8119MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|  12  GeForce GTX 1070    Off  | 00000000:14:00.0 Off |                  N/A |
|  0%   37C    P8     8W / 151W |     10MiB /  8119MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2949      C   ...D64/NVIDIA/Fermi/Core_21.fah/FahCore_21   121MiB |
|    1      2905      C   ...D64/NVIDIA/Fermi/Core_21.fah/FahCore_21   103MiB |
|    1      3230      C   ...D64/NVIDIA/Fermi/Core_21.fah/FahCore_21   103MiB |
|    1      3262      C   ...D64/NVIDIA/Fermi/Core_21.fah/FahCore_21   113MiB |
|    1      3294      C   ...D64/NVIDIA/Fermi/Core_21.fah/FahCore_21   113MiB |
|    2      2956      C   ...D64/NVIDIA/Fermi/Core_21.fah/FahCore_21   103MiB |
|    3      2898      C   ...D64/NVIDIA/Fermi/Core_21.fah/FahCore_21   113MiB |
|    4      2912      C   ...D64/NVIDIA/Fermi/Core_21.fah/FahCore_21   103MiB |
|    5      2935      C   ...D64/NVIDIA/Fermi/Core_21.fah/FahCore_21   113MiB |
|    6      2928      C   ...D64/NVIDIA/Fermi/Core_21.fah/FahCore_21   113MiB |
|    7      2942      C   ...D64/NVIDIA/Fermi/Core_21.fah/FahCore_21   113MiB |
|    8      2963      C   ...D64/NVIDIA/Fermi/Core_21.fah/FahCore_21   113MiB |
|    9      2919      C   ...D64/NVIDIA/Fermi/Core_21.fah/FahCore_21   103MiB |
+-----------------------------------------------------------------------------+
4) She's a beast all right! I have to leave the garage door open during the day and it's still sweltering in there. It's basically a mining rig that's been zazzed up to be FAH capable. I can't send her over but I am willing to help beta test if that'll help.
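
To make the guess in 2) concrete, here is a small Python sketch of what a "first digit only" mix-up would do to 13 slots. This is purely a model of the hypothesis, not the FAH client's actual code; the function names are made up for illustration.

```python
from collections import Counter


def first_digit_device(gpu_index):
    """Hypothesized bug: only the first character of the index is used."""
    return int(str(gpu_index)[0])


def devices_for_slots(n_slots, pick=first_digit_device):
    """Map each slot id to the device it would land on under `pick`."""
    return {slot: pick(slot) for slot in range(n_slots)}


if __name__ == "__main__":
    mapping = devices_for_slots(13)
    load = Counter(mapping.values())
    for device, count in sorted(load.items()):
        print(f"GPU {device}: {count} work unit(s)")
```

Under this assumption, slots 1, 10, 11, and 12 all land on GPU 1 and nothing lands on GPUs 10-12, which lines up with the four FahCore_21 processes listed on GPU 1 in the nvidia-smi output above. That match is suggestive but doesn't prove this is what the client actually does.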
bruce

Re: Trouble With More Than 10 GPUs

Post by bruce »

Funkomancer wrote: Have I messed something up in config.xml?
Any advice would be greatly appreciated.

You might want to open an enhancement request here: https://github.com/FoldingAtHome/fah-issues/issues

As I suggested above, the development team (and the beta testers) haven't ever seriously thought about how to deal with anything that complex.
Yavanius
Posts: 121
Joined: Thu Nov 03, 2016 4:55 am
Location: 92408

Re: Trouble With More Than 10 GPUs

Post by Yavanius »

It might attract some BOINCers. The super power users are always complaining their systems are underutilized... *only* 20 cores being used and only *10* GPUs.... ;)

I think some of these folks live AT work, so @home is @work.

Funko, somebody will drive to your home and pick up the machine for me...err, I mean Bruce (grins at Bruce)
foldy
Posts: 2061
Joined: Sat Dec 01, 2012 3:43 pm
Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441

Re: Trouble With More Than 10 GPUs

Post by foldy »

Just another thought: With so many GPUs I guess you run them on x1 risers, so you must use Linux, as Windows bottlenecks too much. Each GPU needs a CPU thread to feed it, so depending on your CPU even 10 GPUs could be too much and your CPU could become the bottleneck.
Funkomancer

Re: Trouble With More Than 10 GPUs

Post by Funkomancer »

bruce: Thanks, I've just posted on there.

Yavanius: Oh that's great! As long as someone is coming to pick it up-- wait, I forgot to put the sarcasm filter on :P
Funkomancer

Re: Trouble With More Than 10 GPUs

Post by Funkomancer »

foldy: It's got an 8-core/16-thread CPU to work with. CPU utilization has some headroom (27%-ish) and GPU utilization is fairly high (high 90s on the GPUs that are fed), so I don't think that's necessarily the issue, but I appreciate the thought. Thanks.
bruce

Re: Trouble With More Than 10 GPUs

Post by bruce »

@foldy: You and I expect that the GPU will be the factor limiting throughput.

No matter how many GPUs you have, if the PCIe subsystem gets saturated, the GPUs may be starved for data at times and that may become a limiting factor. Similarly, if the CPUs have to manage data for more than one GPU, that can also reduce utilization.

Neither you nor I would be happy buying an expensive GPU that isn't ~99% busy, but the fact is, N underutilized GPUs still do more folding than (N-1) GPUs that are somewhat less underutilized.
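
A quick back-of-the-envelope version of that trade-off, as a Python sketch. The utilization figures are made-up illustrations, and treating utilization as a stand-in for folding throughput is a simplifying assumption, not a measurement.

```python
def effective_gpus(n_gpus, utilization):
    """Rough throughput proxy: number of GPUs times average utilization."""
    return n_gpus * utilization


def break_even_utilization(n_gpus, utilization):
    """Utilization that (n_gpus - 1) cards would need to match n_gpus cards."""
    return n_gpus * utilization / (n_gpus - 1)


if __name__ == "__main__":
    # Illustrative numbers only: 13 cards at 90% versus 12 cards at 95%.
    print(f"13 GPUs at 90%: {effective_gpus(13, 0.90):.2f} GPU-equivalents")
    print(f"12 GPUs at 95%: {effective_gpus(12, 0.95):.2f} GPU-equivalents")
    print(f"12 GPUs would need {break_even_utilization(13, 0.90):.1%} to match")
```

In this toy case the 13 cards come out ahead, and the 12-card setup would need to average about 97.5% to catch up. If dropping a card raised utilization past that break-even point, the smaller setup would win instead.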

Recommendations for complex systems don't match the recommendations for smaller @home systems.
foldy

Re: Trouble With More Than 10 GPUs

Post by foldy »

That CPU is fine, GPU utilization in the high 90s is perfect for folding, and with PCIe x1 risers you must use Linux.
SteveWillis
Posts: 409
Joined: Fri Apr 15, 2016 12:42 am
Hardware configuration: PC 1:
Linux Mint 17.3
three gtx 1080 GPUs One on a powered header
Motherboard = [MB-AM3-AS-SB-990FXR2] qty 1 Asus Sabertooth 990FX(+59.99)
CPU = [CPU-AM3-FX-8320BR] qty 1 AMD FX 8320 Eight Core 3.5GHz(+41.99)

PC2:
Linux Mint 18
Open air case
Motherboard: ASUS Crosshair V Formula-Z AM3+ AMD 990FX SATA 6Gb/s USB 3.0 ATX AMD
AMD FD6300WMHKBOX FX-6300 6-Core Processor Black Edition with Cooler Master Hyper 212 EVO - CPU Cooler with 120mm PWM Fan
three gtx 1080,
one gtx 1080 TI on a powered header

Re: Trouble With More Than 10 GPUs

Post by SteveWillis »

My GPUs constantly vary between 89-99%, mostly in low to mid 90s (Linux)

1080 and 1080TI GPUs on Linux Mint