Tiny projects killing PPD on big GPUs


Shirty
Posts: 49
Joined: Thu Jul 11, 2019 10:19 pm

Tiny projects killing PPD on big GPUs

Post by Shirty »

I'm not sure this is the right subforum for this complaint/rant, or whether it's even valid, but here we go...

I keep getting tiny protein WUs assigned to my cards, and they are absolutely destroying my PPD. I'm perfectly aware that different projects have different bonuses etc., but less than 700K PPD on my best cards (2080Ti) is just silly when I can hit nearly 3 million on a more suitable WU.

Case in point: I seem to have gotten a handful of P14184 WUs today. I'm currently crunching one on a 2080Ti and looking at under 890K. The official release post even suggests that these will not be released to high-end GPUs because of the inefficiency.

Is there a mistake here or am I doing something wrong?

I love folding and am willing to take some hits, but the machines I have running 24/7 are easily capable of producing over 10M PPD, and that figure keeps dropping by up to 40% because of these inappropriate WU assignments.
MeeLee
Posts: 1375
Joined: Tue Feb 19, 2019 10:16 pm

Re: Tiny projects killing PPD on big GPUs

Post by MeeLee »

Currently the FAH servers are being upgraded, which is causing FAHControl not to show the bonus points.
Your bonus points are still awarded automatically after each WU.
You can check your awarded team or personal points on the donor stats page:
https://stats.foldingathome.org/donor/
Or externally, via Extreme Overclocking:
https://folding.extremeoverclocking.com/
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Tiny projects killing PPD on big GPUs

Post by bruce »

You won't see a correction to FAHClient until a new version is released. You'll just have to live with the way it reports PPD until then.
Shirty
Posts: 49
Joined: Thu Jul 11, 2019 10:19 pm

Re: Tiny projects killing PPD on big GPUs

Post by Shirty »

I am definitely getting the bonus points for these WUs; they just take an inordinately long time to complete for a very small bonus.

I checked the calculator over on LinuxForge for 14184, and with the TPF of 43 seconds I was getting on the 2080Ti, I definitely received maximum points. 5000-atom WUs just aren't suitable for very powerful hardware, as the benefit of extra shaders doesn't scale across such a small set of calculations.

I assume there was a reason for the note at the bottom of the announcement post, yet the assignment server still seems to be dishing these out to overpowered hardware.
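
For anyone who wants to sanity-check those numbers themselves, here is a minimal sketch of the published quick-return-bonus arithmetic (credit = base × max(1, sqrt(k × deadline / WU time))). The base credit, k-factor and deadline below are placeholders, not the real P14184 values; substitute the figures from psummary or the LinuxForge calculator.

Code: Select all

# Minimal sketch of the documented quick-return-bonus estimate:
#   credit = base_credit * max(1, sqrt(k * deadline_days / wu_days))
# The project constants below are placeholders, NOT the real P14184 numbers.
from math import sqrt

def estimate_ppd(base_credit, k_factor, deadline_days, tpf_seconds, frames=100):
    wu_days = tpf_seconds * frames / 86400.0             # time to finish one WU, in days
    bonus = max(1.0, sqrt(k_factor * deadline_days / wu_days))
    credit = base_credit * bonus                          # points awarded for this WU
    return credit / wu_days                               # points per day

# Example with the 43 s TPF reported above and made-up project constants
print(round(estimate_ppd(base_credit=9405, k_factor=0.75, deadline_days=3, tpf_seconds=43)))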
HayesK
Posts: 342
Joined: Sun Feb 22, 2009 4:23 pm
Hardware configuration: hardware folding: 24 GPUs (8-GTX980Ti, 2-GTX1060, 2-GTX1070, 6-1080Ti, 2-1660Ti, 4-2070 Super)
hardware idle: 7-2600K, 1-i7-950, 2-i7-930, 2-i7-920, 3-i7-860, 2-Q9450, 4-L5640, 1-Q9550, 2-GTS450, 2-GTX550Ti, 3-GTX560Ti, 3-GTX650Ti, 11-GTX650Ti-Boost, 4-GTX660Ti, 2-GTX670, 6-GTX750Ti, 7-GTX970
Location: La Porte, Texas

Re: Tiny projects killing PPD on big GPUs

Post by HayesK »

Make sure you have the most recent version of the GPUs.txt file. Delete the existing copy and the client will download a new copy within a few minutes. In Windows, GPUs.txt is located under the user profile in AppData\Roaming\FAHClient.
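
For the lazy, a minimal sketch of that refresh step, assuming the per-user AppData\Roaming\FAHClient location mentioned above (adjust the path if your install lives elsewhere):

Code: Select all

# Delete the local GPUs.txt so the running client fetches a fresh copy.
# Path assumes a per-user Windows install (AppData\Roaming\FAHClient).
import os

gpus_txt = os.path.expandvars(r"%APPDATA%\FAHClient\GPUs.txt")
if os.path.exists(gpus_txt):
    os.remove(gpus_txt)
    print("Deleted", gpus_txt, "- the client should download a new copy within a few minutes.")
else:
    print("No GPUs.txt found at", gpus_txt)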
folding for OCF T32
<= 10-GPU ( 8-GTX980Ti, 2-RTX2070Super ) as HayesK =>
<= 24-GPU ( 3-650TiBoost, 1-660Ti, 3-750Ti, 1-960m, 4-970, 2-1060, 2-1070, 6-1080Ti, 2-1660Ti, 2-2070Super )
as HayesK_ALL_18SjyNbF8VdXaNAFCVfG4rAHUyvtdmoFvX =>
Shirty
Posts: 49
Joined: Thu Jul 11, 2019 10:19 pm

Re: Tiny projects killing PPD on big GPUs

Post by Shirty »

Definitely up to date. I've currently got WUs from the same project (14184) folding on an RTX 2070 (582,632 PPD) and an RTX 2070 Super (614,699 PPD).

Something's not right; those are at best half of what I'd normally see. I didn't spend in excess of £3000 for 980Ti performance levels!
HayesK
Posts: 342
Joined: Sun Feb 22, 2009 4:23 pm
Hardware configuration: hardware folding: 24 GPUs (8-GTX980Ti, 2-GTX1060, 2-GTX1070, 6-1080Ti, 2-1660Ti, 4-2070 Super)
hardware idle: 7-2600K, 1-i7-950, 2-i7-930, 2-i7-920, 3-i7-860, 2-Q9450, 4-L5640, 1-Q9550, 2-GTS450, 2-GTX550Ti, 3-GTX560Ti, 3-GTX650Ti, 11-GTX650Ti-Boost, 4-GTX660Ti, 2-GTX670, 6-GTX750Ti, 7-GTX970
Location: La Porte, Texas

Re: Tiny projects killing PPD on big GPUs

Post by HayesK »

The RTX2080Ti and RTX2080 were recently reclassified to species 8. I don't know about the 2070. If your log is showing species 7 for those cards and you have the newest GPUs.txt file (today's version), you may have to restart the client (or reboot) for the change to apply to the next assignment.
folding for OCF T32
<= 10-GPU ( 8-GTX980Ti, 2-RTX2070Super ) as HayesK =>
<= 24-GPU ( 3-650TiBoost, 1-660Ti, 3-750Ti, 1-960m, 4-970, 2-1060, 2-1070, 6-1080Ti, 2-1660Ti, 2-2070Super )
as HayesK_ALL_18SjyNbF8VdXaNAFCVfG4rAHUyvtdmoFvX =>
Shirty
Posts: 49
Joined: Thu Jul 11, 2019 10:19 pm

Re: Tiny projects killing PPD on big GPUs

Post by Shirty »

Uninstalled the client, rebooted, and reinstalled it on my single 2080Ti rig. First project it downloaded? 14184. Same issue: 700K PPD.

It's finished that now and is sitting at 2.7M PPD crunching a P14226 unit.

I'll probably just have to accept it, seems a shame though.
HayesK
Posts: 342
Joined: Sun Feb 22, 2009 4:23 pm
Hardware configuration: hardware folding: 24 GPUs (8-GTX980Ti, 2-GTX1060, 2-GTX1070, 6-1080Ti, 2-1660Ti, 4-2070 Super)
hardware idle: 7-2600K, 1-i7-950, 2-i7-930, 2-i7-920, 3-i7-860, 2-Q9450, 4-L5640, 1-Q9550, 2-GTS450, 2-GTX550Ti, 3-GTX560Ti, 3-GTX650Ti, 11-GTX650Ti-Boost, 4-GTX660Ti, 2-GTX670, 6-GTX750Ti, 7-GTX970
Location: La Porte, Texas

Re: Tiny projects killing PPD on big GPUs

Post by HayesK »

Have you confirmed what species the "system" section at the beginning of the logs shows for your 2070/2080Ti cards? Re-installing the client over existing files may not have replaced the GPUs.txt file. A few minutes ago I deleted one of my GPUs.txt files and it took about 5 minutes for a new file to download. The client still needs to be restarted for the new file to be read.

Below is the portion of the GPUs.txt file for the TU102 and TU104 cards (2070/2080/2080Ti), all of which are identified as species 8 except for the 2080Ti Rev. A. I'm not sure why the Rev. A card is still listed as species 7; it's possible that the file needs an update. You could have a 2080Ti Rev. A identified as species 7, but the 2070 should identify as species 8. It's also possible that the project constraint is not set up correctly.

Code: Select all

0x10de:0x1e02:2:8:TU102 [TITAN RTX]
0x10de:0x1e04:2:8:TU102 [GeForce RTX 2080 Ti] M 13448
0x10de:0x1e07:2:7:TU102 [GeForce RTX 2080 Ti Rev. A] M 13448
0x10de:0x1e2d:2:7:TU102B
0x10de:0x1e2e:2:7:TU102B
0x10de:0x1e30:2:7:TU102GL [Quadro RTX 6000/8000]
0x10de:0x1e38:2:7:TU102GL
0x10de:0x1e3c:2:7:TU102GL
0x10de:0x1e3d:2:7:TU102GL
0x10de:0x1e3e:2:7:TU102GL
0x10de:0x1e81:2:8:TU104 [GeForce RTX 2080 Super]
0x10de:0x1e82:2:8:TU104 [GeForce RTX 2080]
0x10de:0x1e84:2:8:TU104 [GeForce RTX 2070 Super] 8218
0x10de:0x1e87:2:8:TU104 [GeForce RTX 2080 Rev. A] 10068
0x10de:0x1e90:2:7:TU104M [GeForce RTX 2080 Mobile]
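If you want to check what your own copy says without scrolling through the whole file, here is a small sketch that prints the species for the RTX 20-series entries (the first two colon-separated fields are the PCI vendor and device IDs, the fourth is the species number being discussed here). The path assumes the AppData location mentioned earlier.

Code: Select all

# Print the species field for RTX 20-series entries in the local GPUs.txt.
# Each line is colon-separated; the fourth field is the species number.
import os

gpus_txt = os.path.expandvars(r"%APPDATA%\FAHClient\GPUs.txt")
with open(gpus_txt) as f:
    for line in f:
        parts = line.strip().split(":", 4)
        if len(parts) == 5 and "RTX 20" in parts[4]:
            vendor, device, _, species, name = parts
            print(f"{vendor}:{device}  species {species}  {name}")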
Below is the system section of the log from one of my Linux hosts, showing the species as NVIDIA:8 for my 1080Ti cards.

Code: Select all

23:20:40:******************************* System ********************************
23:20:40:            CPU: Intel(R) Core(TM) i5-3470T CPU @ 2.90GHz
23:20:40:         CPU ID: GenuineIntel Family 6 Model 58 Stepping 9
23:20:40:           CPUs: 4
23:20:40:         Memory: 15.61GiB
23:20:40:    Free Memory: 14.74GiB
23:20:40:        Threads: POSIX_THREADS
23:20:40:     OS Version: 4.15
23:20:40:    Has Battery: false
23:20:40:     On Battery: false
23:20:40:     UTC Offset: -5
23:20:40:            PID: 1338
23:20:40:            CWD: /var/lib/fahclient
23:20:40:             OS: Linux 4.15.0-60-generic x86_64
23:20:40:        OS Arch: AMD64
23:20:40:           GPUs: 2
23:20:40:          GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:8 GP102 [GeForce GTX 1080 Ti] 11380
23:20:40:          GPU 1: Bus:2 Slot:0 Func:0 NVIDIA:8 GP102 [GeForce GTX 1080 Ti] 11380
23:20:40:  CUDA Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:6.1 Driver:10.1
23:20:40:  CUDA Device 1: Platform:0 Device:1 Bus:2 Slot:0 Compute:6.1 Driver:10.1
23:20:40:OpenCL Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:1.2 Driver:435.21
23:20:40:OpenCL Device 1: Platform:0 Device:1 Bus:2 Slot:0 Compute:1.2 Driver:435.21
23:20:40:***********************************************************************
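To pull just those GPU/species lines out of a long log, something like the following works. The log path is an assumption for a per-user Windows install; point it at your own log.txt.

Code: Select all

# Print the "GPU n: ... NVIDIA:<species> ..." lines from the client log.
# The path is an assumption for a per-user Windows install; adjust as needed.
import os, re

log_path = os.path.expandvars(r"%APPDATA%\FAHClient\log.txt")
gpu_line = re.compile(r"GPU \d+:.*?(?:NVIDIA|AMD|ATI):(\d+)")

with open(log_path, errors="ignore") as log:
    for line in log:
        m = gpu_line.search(line)
        if m:
            print("species", m.group(1), "->", line.strip())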
folding for OCF T32
<= 10-GPU ( 8-GTX980Ti, 2-RTX2070Super ) as HayesK =>
<= 24-GPU ( 3-650TiBoost, 1-660Ti, 3-750Ti, 1-960m, 4-970, 2-1060, 2-1070, 6-1080Ti, 2-1660Ti, 2-2070Super )
as HayesK_ALL_18SjyNbF8VdXaNAFCVfG4rAHUyvtdmoFvX =>
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Tiny projects killing PPD on big GPUs

Post by bruce »

P14184 has 5000 atoms and your RTX 2070 has 2304 (or 2560) shaders, so each shader only gets to work on about 2 atoms ... which obviously won't keep it busy for very long before every atom gets a chance to update its position relative to the other atoms nearby. I don't know for certain, but I'm guessing that the entire WU has to be moved across the PCIe bus and back before each shader gets another chance to work on its two atoms. It is impossible to keep that GPU busy with this project.

What is the PCIe speed driving your GPU?

One fundamental question: the list of projects that have WUs to assign varies much more dynamically than could ever be shown on psummary. I don't know a good way to answer a simple question: at the moment your FAHClient decided it needed to download a new WU, what other projects could have been assigned, and how many atoms do they have? It sounds reasonable that a GPU with lots of shaders should be able to choose a WU with the largest number of atoms, but I'm not aware that the assignment logic takes that into consideration.

Atoms per shader is an important performance factor, and maybe it needs to become part of the assignment logic.
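
To put rough numbers on that atoms-per-shader point, here is a back-of-the-envelope sketch. The shader counts are the published CUDA core counts for these cards; the 1.1M-atom "big" WU is an invented comparison value, not a real project.

Code: Select all

# Back-of-the-envelope atoms-per-shader comparison.
# Shader counts are published CUDA core counts; the 1.1M-atom "big" WU is an
# invented comparison value, not a real project.
shader_counts = {"RTX 2070": 2304, "RTX 2070 Super": 2560, "RTX 2080 Ti": 4352}

for name, shaders in shader_counts.items():
    small = 5000 / shaders            # P14184: ~5000 atoms
    big = 1_100_000 / shaders         # hypothetical large WU
    print(f"{name}: {small:.1f} atoms/shader on P14184 vs {big:.0f} on a 1.1M-atom WU")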
Shirty
Posts: 49
Joined: Thu Jul 11, 2019 10:19 pm

Re: Tiny projects killing PPD on big GPUs

Post by Shirty »

I re-did my GPUs.txt files yesterday to be certain, and rebooted the machines. I have snipped the GPU lines as reported by the logs on my 4 rigs, and all bar the "old" 2070 are showing as species 8:

Code: Select all

GPU 0: Bus:2 Slot:0 Func:0 NVIDIA:8 TU104 [GeForce RTX 2070 Super] 8218
GPU 1: Bus:1 Slot:0 Func:0 NVIDIA:7 TU106 [GeForce RTX 2070] M 6497
GPU 0: Bus:10 Slot:0 Func:0 NVIDIA:8 TU104 [GeForce RTX 2070 Super] 8218
GPU 1: Bus:9 Slot:0 Func:0 NVIDIA:8 TU102 [GeForce RTX 2080 Ti Rev. A] M 13448
GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:8 GP102 [GeForce GTX 1080 Ti] 11380
GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:8 TU102 [GeForce RTX 2080 Ti] M
Yet I've still crunched a few more P14184 WUs since then on the most powerful cards.

Bruce, the dual-card rigs are running both cards at PCIe 3.0 x8 and the singles are running at 3.0 x16. They don't have any issues with more complex WUs, achieving exactly the sort of output I'd expect to see.

My money is on the project criteria not being set up correctly. It was obviously a consideration as it was mentioned in the announcement of the project, but it just doesn't seem to be filtering as it should.
yunhui
Scientist
Posts: 147
Joined: Mon May 16, 2016 3:24 pm

Re: Tiny projects killing PPD on big GPUs

Post by yunhui »

Hi all,

p14184 is my project, and I just found that the original constraints I added for it are gone. I don't know whether they were removed by mistake, but I have added them back and they should now work as expected. I also checked p14185-14187 and they seem to be fine. Please let me know if you find the constraints failing again. Sorry for any inconvenience.

Best,
Yunhui
Shirty
Posts: 49
Joined: Thu Jul 11, 2019 10:19 pm

Re: Tiny projects killing PPD on big GPUs

Post by Shirty »

Many thanks for personally looking into this so quickly.

I'll let you know if I get any more WUs on RTX hardware.
Shirty
Posts: 49
Joined: Thu Jul 11, 2019 10:19 pm

Re: Tiny projects killing PPD on big GPUs

Post by Shirty »

Just to follow this up (as is traditional on forums), I never got any more 14184s.

It is interesting, though, to note the huge variance in PPD on 2080Ti cards. I'm running a couple of them full time, and they love getting stuck into big-atom-count work units, often reaching nearly 3M PPD on the most complex jobs. Yet they still regularly get smaller WUs that literally halve their output.

I know the latest Titans, Radeon VII and 2080Ti chips are outliers: huge-shader-count models that are often prohibitively expensive and probably make up a small minority of GPUs working on F@H. But for those of us who have invested to help the cause, it would be wonderful if there were some way of feeding these beasts the projects they work best on.

Of course, I don't claim to have the faintest idea of the logic used by the assignment/work servers, nor the availability of work units. Yet I can't help but feel that cards such as these would help speed up some of the more complex work if such work were biased toward them.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Tiny projects killing PPD on big GPUs

Post by bruce »

GPUs are classified by GPUSpecies. The AS/WS uses that information to make assignments. The Project Owner can include/exclude a species from his project.

The species is shown in GPUs.txt but there's no way to tell which project(s) assign to which species. You stopped getting 14184 because somebody determined that the atom-count was too small for it to be efficient on your GPU so your species was excluded.

It's a cumbersome process, but we try to make it work as efficiently as we can.

I have not seen tabulated data saying that WUs with atom-counts in a given range should be assigned to GPUs with shader-counts in a given range. The boundaries are solid, not flexible: if there are no WUs permitted for your GPU's species, the AS/WS won't give you one that's just a little smaller or just a little bigger.
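
This is not the real assignment-server logic, but a toy sketch of that hard include/exclude behaviour: a project either allows your species or it never appears as a candidate, with no "next closest atom count" fallback. The allowed-species sets and the second project's atom count are invented for illustration.

Code: Select all

# Toy illustration of species-based assignment constraints (NOT real AS/WS code).
# The allowed-species sets and the 400k atom count are invented for illustration.
projects = {
    14184: {"atoms": 5_000,   "allowed_species": {5, 6, 7}},
    14226: {"atoms": 400_000, "allowed_species": {7, 8}},
}

def eligible(projects, gpu_species):
    """Return project IDs whose constraints permit this GPU species."""
    return [pid for pid, p in projects.items() if gpu_species in p["allowed_species"]]

print(eligible(projects, gpu_species=8))   # -> [14226]; 14184 is simply never offered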