Trouble With More Than 10 GPUs


Postby Funkomancer » Fri Jul 06, 2018 1:29 pm

I have a rig with 13 GPUs running.
However, it seems that work units from GPU slots 10, 11, and 12 are being run on GPU 1, in addition to the work unit already running in slot 1.
This essentially means that four work units are competing for GPU 1 and nothing is actually getting done on that GPU.

This is what my Folding Slots section of config.xml looked like initially:
<slot id='0' type='GPU'/>
<slot id='1' type='GPU'/>
<slot id='2' type='GPU'/>
<slot id='3' type='GPU'/>
<slot id='4' type='GPU'/>
<slot id='5' type='GPU'/>
<slot id='6' type='GPU'/>
<slot id='7' type='GPU'/>
<slot id='8' type='GPU'/>
<slot id='9' type='GPU'/>
<slot id='10' type='GPU'/>
<slot id='11' type='GPU'/>
<slot id='12' type='GPU'/>


I had attempted to specify the gpu-, cuda-, and opencl-indices manually for each slot, as in:
<slot id='0' type='GPU'>
  <gpu-index v='0'/>
  <opencl-index v='0'/>
  <cuda-index v='0'/>
</slot>
<slot id='1' type='GPU'>
  <gpu-index v='1'/>
  <opencl-index v='1'/>
  <cuda-index v='1'/>
</slot>
...
<slot id='12' type='GPU'>
  <gpu-index v='12'/>
  <opencl-index v='12'/>
  <cuda-index v='12'/>
</slot>

There was no change.
Right now I've removed slots 10-12 from config.xml in order to get slot 1 back in action, but I still have 3 GPUs just sitting around doing nothing other than looking pretty.

Is this a bug?
Have I messed something up in config.xml?
Any advice would be greatly appreciated.

I've tested this on Linux (Ubuntu Server 18.04 LTS) primarily.
However, Windows 10 Pro (Build 17134) has similar issues.
Windows was far worse: the automatically assigned gpu-index wasn't always in sequence with the slot id, so without specifying a gpu-index it was hard to tell which WU was running on which GPU.
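For reference, the explicit per-slot configuration described above (with slots 2 through 11 elided as `...`) follows a fixed pattern that can be generated rather than typed by hand. The Python below is a minimal sketch, not part of FAHClient: `gpu_slots` is a hypothetical helper, and it assumes the `<slot>` elements sit directly under a `<config>` root in config.xml and that index i should map to slot i. The poster reports that this explicit mapping made no difference, so this only saves typing; it is not a fix.

```python
import xml.etree.ElementTree as ET


def gpu_slots(num_gpus: int) -> ET.Element:
    """Build one GPU <slot> per card, with gpu-, opencl- and cuda-index all set to i.

    Hypothetical helper; assumes a <config> root element and that slot i
    should use device index i on every API.
    """
    config = ET.Element("config")
    for i in range(num_gpus):
        slot = ET.SubElement(config, "slot", id=str(i), type="GPU")
        for tag in ("gpu-index", "opencl-index", "cuda-index"):
            ET.SubElement(slot, tag, v=str(i))
    return config


if __name__ == "__main__":
    root = gpu_slots(13)
    ET.indent(root)  # pretty-printing; requires Python 3.9+
    print(ET.tostring(root, encoding="unicode"))
```

ElementTree writes attributes with double quotes (`id="0"`), which is equivalent XML to the single-quoted form above.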

Re: Trouble With More Than 10 GPUs

Postby bruce » Fri Jul 06, 2018 6:52 pm

1) I'll ask about the potential limitation of 10 GPUs. (Sorry about not knowing the answer. You're the first to ask the question.)
2) The GPUs are detected in the order FAH finds them electrically, which is not necessarily the order in which you see them physically.
3) What does FAHBench tell you? ... or GPU-Z or something similar for Linux?

If they're various part numbers, you can get some help from FAHBench and from the identities listed below the log segment you posted.

<sarcasm mode on>
It's called folding@HOME for a reason.

I don't have a system like that here@home, so I can't test it for you. Neither do any of the beta testers or FAH Consortium members, as far as I know.
Please ship your system to me and I'll see what we can do.
I'll probably have to upgrade my air conditioner. It's already struggling.
<sarcasm mode off>

Re: Trouble With More Than 10 GPUs

Postby Funkomancer » Sat Jul 07, 2018 12:04 am

1) I thought I might be. I can usually find solutions to my issues with some Google-fu, but no luck in this case.

2) Yup, I think I was trying to make a point but failed at it. What I meant to say is that if this is indeed a bug and not a configuration issue, it most likely has to do with the gpu-index. I suspect the client expects a single digit and only reads the first one, i.e. theoretically gpu-index 1, 10-19, 100-199, etc. would run on GPU 1; gpu-index 2, 20-29, 200-299, etc. would run on GPU 2; and so on. That's my initial thought, since I have accidentally written scripts in the past that expected single digits and broke in similar ways when given multiple digits.

3) I'm not entirely sure what FAHBench is (other than a benchmark for FAH) or how to use it, but I'll look into it and get back to you. Is there anything specific I should report from FAHBench, or from a GPU-Z-like program (which I have yet to find for Linux)? Is this output from nvidia-smi sufficient?
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.67                 Driver Version: 390.67                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    Off  | 00000000:06:00.0 Off |                  N/A |
| 80%   82C    P2   122W / 151W |    131MiB /  8119MiB |     97%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1070    Off  | 00000000:07:00.0 Off |                  N/A |
| 48%   74C    P2   146W / 151W |    442MiB /  8119MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 1070    Off  | 00000000:08:00.0 Off |                  N/A |
| 51%   73C    P2   140W / 151W |    113MiB /  8119MiB |     96%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 1070    Off  | 00000000:09:00.0 Off |                  N/A |
| 80%   82C    P2   120W / 151W |    123MiB /  8119MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   4  GeForce GTX 1070    Off  | 00000000:0A:00.0 Off |                  N/A |
| 79%   80C    P2   144W / 151W |    113MiB /  8119MiB |     97%      Default |
+-------------------------------+----------------------+----------------------+
|   5  GeForce GTX 1070    Off  | 00000000:0D:00.0 Off |                  N/A |
| 77%   81C    P2   153W / 151W |    123MiB /  8119MiB |     98%      Default |
+-------------------------------+----------------------+----------------------+
|   6  GeForce GTX 1070    Off  | 00000000:0E:00.0 Off |                  N/A |
| 80%   82C    P2   104W / 151W |    123MiB /  8119MiB |     98%      Default |
+-------------------------------+----------------------+----------------------+
|   7  GeForce GTX 1070    Off  | 00000000:0F:00.0 Off |                  N/A |
| 69%   78C    P2   142W / 151W |    123MiB /  8119MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   8  GeForce GTX 1070    Off  | 00000000:10:00.0 Off |                  N/A |
| 76%   82C    P2   151W / 151W |    123MiB /  8119MiB |     98%      Default |
+-------------------------------+----------------------+----------------------+
|   9  GeForce GTX 1070    Off  | 00000000:11:00.0 Off |                  N/A |
| 80%   82C    P2   113W / 151W |    113MiB /  8119MiB |     97%      Default |
+-------------------------------+----------------------+----------------------+
|  10  GeForce GTX 1070    Off  | 00000000:12:00.0 Off |                  N/A |
|  0%   40C    P8     9W / 151W |     10MiB /  8119MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|  11  GeForce GTX 1070    Off  | 00000000:13:00.0 Off |                  N/A |
|  0%   37C    P8     8W / 151W |     10MiB /  8119MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|  12  GeForce GTX 1070    Off  | 00000000:14:00.0 Off |                  N/A |
|  0%   37C    P8     8W / 151W |     10MiB /  8119MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2949      C   ...D64/NVIDIA/Fermi/Core_21.fah/FahCore_21   121MiB |
|    1      2905      C   ...D64/NVIDIA/Fermi/Core_21.fah/FahCore_21   103MiB |
|    1      3230      C   ...D64/NVIDIA/Fermi/Core_21.fah/FahCore_21   103MiB |
|    1      3262      C   ...D64/NVIDIA/Fermi/Core_21.fah/FahCore_21   113MiB |
|    1      3294      C   ...D64/NVIDIA/Fermi/Core_21.fah/FahCore_21   113MiB |
|    2      2956      C   ...D64/NVIDIA/Fermi/Core_21.fah/FahCore_21   103MiB |
|    3      2898      C   ...D64/NVIDIA/Fermi/Core_21.fah/FahCore_21   113MiB |
|    4      2912      C   ...D64/NVIDIA/Fermi/Core_21.fah/FahCore_21   103MiB |
|    5      2935      C   ...D64/NVIDIA/Fermi/Core_21.fah/FahCore_21   113MiB |
|    6      2928      C   ...D64/NVIDIA/Fermi/Core_21.fah/FahCore_21   113MiB |
|    7      2942      C   ...D64/NVIDIA/Fermi/Core_21.fah/FahCore_21   113MiB |
|    8      2963      C   ...D64/NVIDIA/Fermi/Core_21.fah/FahCore_21   113MiB |
|    9      2919      C   ...D64/NVIDIA/Fermi/Core_21.fah/FahCore_21   103MiB |
+-----------------------------------------------------------------------------+


4) She's a beast all right! I have to leave the garage door open during the day and it's still sweltering in there. It's basically a mining rig that's been zazzed up to be FAH capable. I can't send her over, but I'm willing to help beta test if that would help.
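The single-digit theory in point 2 can be checked against the nvidia-smi "Processes" table above. The Python below is a minimal sketch under stated assumptions: the regex targets nvidia-smi's default text layout as shown in this post (it assumes the truncated process name has no spaces), and `first_digit_index` is a hypothetical stand-in for the suspected bug, not FAHClient's actual code. Paste the full nvidia-smi output into `fahcores_per_gpu` to get the observed load per GPU.

```python
import re
from collections import Counter

# Rows of the "Processes:" table in nvidia-smi's default text output, e.g.
# "|    1      3230      C   ...D64/NVIDIA/Fermi/Core_21.fah/FahCore_21   103MiB |"
# Assumes the (truncated) process name contains no spaces.
PROCESS_ROW = re.compile(r"^\|\s+(\d+)\s+(\d+)\s+\S+\s+(\S+)\s+\d+MiB\s*\|$")


def fahcores_per_gpu(smi_text: str) -> Counter:
    """Count FahCore processes per GPU index in nvidia-smi text output."""
    counts: Counter = Counter()
    for line in smi_text.splitlines():
        match = PROCESS_ROW.match(line.strip())
        if match and "FahCore" in match.group(3):
            counts[int(match.group(1))] += 1
    return counts


def first_digit_index(value: str) -> int:
    """HYPOTHETICAL bug from point 2: only the first character is parsed."""
    return int(value[0])


def predicted_load(num_slots: int) -> Counter:
    """Work units per GPU if slot i's gpu-index str(i) went through the buggy parser."""
    return Counter(first_digit_index(str(i)) for i in range(num_slots))


if __name__ == "__main__":
    sample = """\
|    0      2949      C   ...D64/NVIDIA/Fermi/Core_21.fah/FahCore_21   121MiB |
|    1      2905      C   ...D64/NVIDIA/Fermi/Core_21.fah/FahCore_21   103MiB |
|    1      3230      C   ...D64/NVIDIA/Fermi/Core_21.fah/FahCore_21   103MiB |
|    1      3262      C   ...D64/NVIDIA/Fermi/Core_21.fah/FahCore_21   113MiB |
|    1      3294      C   ...D64/NVIDIA/Fermi/Core_21.fah/FahCore_21   113MiB |
|    2      2956      C   ...D64/NVIDIA/Fermi/Core_21.fah/FahCore_21   103MiB |
"""
    observed = fahcores_per_gpu(sample)
    print("observed:", dict(observed))
    print("predicted by first-digit bug:", dict(predicted_load(13)))
```

Under the hypothetical bug, slots 1, 10, 11, and 12 all land on GPU 1 (four work units) while GPUs 10 through 12 get none, which is the same pattern the nvidia-smi output shows.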

Re: Trouble With More Than 10 GPUs

Postby bruce » Sat Jul 07, 2018 1:54 am

Funkomancer wrote: Have I messed something up in config.xml?
Any advice would be greatly appreciated.


You might want to open an enhancement request here: https://github.com/FoldingAtHome/fah-issues/issues

As I suggested above, the development team (and the beta testers) have never seriously thought about how to deal with anything that complex.

Re: Trouble With More Than 10 GPUs

Postby Yavanius » Sat Jul 07, 2018 3:36 am

It might attract some BOINCers. The super power users are always complaining their systems are underutilized... *only* 20 cores being used and only *10* GPUs.... ;)

I think some of these folks live AT work, so @home is @work.

Funko, somebody will drive to your home and pick up the machine for me...err, I mean Bruce (grins at Bruce)

Re: Trouble With More Than 10 GPUs

Postby foldy » Sat Jul 07, 2018 9:29 am

Just another thought: with so many GPUs, I guess you run them on x1 risers, so you must use Linux, as Windows bottlenecks too much. Each GPU needs a CPU thread to feed it, so depending on your CPU, even 10 GPUs could be too much and your CPU could become the bottleneck.

Re: Trouble With More Than 10 GPUs

Postby Funkomancer » Sat Jul 07, 2018 9:39 am

bruce: Thanks, I've just posted on there.

Yavanius: Oh, that's great! As long as someone is coming to pick it up-- wait, I forgot to put the sarcasm filter on :P

Re: Trouble With More Than 10 GPUs

Postby Funkomancer » Sat Jul 07, 2018 9:51 am

foldy: It's got an 8-core/16-thread CPU to work with. CPU utilization has some headroom (27%-ish) and GPU utilization is fairly high (high 90s on the GPUs that are being fed), so I don't think that's necessarily the issue, but I appreciate the thought. Thanks.
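foldy's rule of thumb (one CPU thread to feed each GPU) can be turned into a quick budget check. This is a minimal sketch: `spare_cpu_threads` and its `reserved` allowance for the OS are hypothetical, and the one-thread-per-GPU figure is the rule of thumb from this thread, not an official FAH requirement.

```python
import os
from typing import Optional


def spare_cpu_threads(num_gpus: int, hw_threads: Optional[int] = None, reserved: int = 0) -> int:
    """Hardware threads left after giving each GPU one feeder thread.

    Rule of thumb from this thread: each GPU slot needs one CPU thread.
    `reserved` is a hypothetical allowance for the OS and other work.
    A negative result means the CPU is oversubscribed.
    """
    if hw_threads is None:
        hw_threads = os.cpu_count() or 1  # logical CPUs visible to the OS
    return hw_threads - num_gpus - reserved


if __name__ == "__main__":
    # 13 GPUs on the 8-core/16-thread CPU described above.
    print(spare_cpu_threads(13, hw_threads=16))
```

With 13 GPUs on 16 hardware threads this leaves 3 threads spare, so by this rule alone the CPU should not be what keeps slots 10 through 12 idle.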

Re: Trouble With More Than 10 GPUs

Postby bruce » Sun Jul 08, 2018 4:26 am

@foldy: You and I expect that the GPU will be the factor limiting throughput.

No matter how many GPUs you have, if the PCIe subsystem gets saturated, the GPUs may be starved for data at times and that may become a limiting factor. Similarly, if the CPUs have to manage data for more than one GPU, that can also reduce utilization.

Neither you nor I would be happy buying an expensive GPU that isn't ~99% busy, but the fact is, N underutilized GPUs still do more folding than (N-1) GPUs that are somewhat less underutilized.
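That trade-off can be made precise with a little arithmetic, assuming identical cards whose folding throughput is proportional to utilization. Here $u_N$ denotes the per-GPU utilization with $N$ cards installed (a symbol introduced for this sketch, not from FAH):

```latex
N\,u_N > (N-1)\,u_{N-1}
\quad\Longleftrightarrow\quad
\frac{u_N}{u_{N-1}} > \frac{N-1}{N}
\quad\Longleftrightarrow\quad
1 - \frac{u_N}{u_{N-1}} < \frac{1}{N}
```

With the thirteen GTX 1070s in this thread, $(N-1)/N = 12/13 \approx 0.923$, so the thirteenth card comes out ahead as long as adding it lowers each card's utilization by less than about 7.7% in relative terms.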

Recommendations for complex systems don't match the recommendations for smaller @home systems.

Re: Trouble With More Than 10 GPUs

Postby foldy » Sun Jul 08, 2018 9:14 am

That CPU is fine. GPU utilization in the high 90s is perfect for folding, and with PCIe x1 risers you must use Linux.

Re: Trouble With More Than 10 GPUs

Postby SteveWillis » Sun Jul 08, 2018 4:02 pm

My GPUs constantly vary between 89% and 99%, mostly in the low to mid 90s (Linux).

1080 and 1080TI GPUs on Linux Mint

