Nvidia vs AMD CPU usage


arisu
Posts: 92
Joined: Mon Feb 24, 2025 11:11 pm

Re: Nvidia vs AMD CPU usage

Post by arisu »

Joe_H wrote: Tue Mar 04, 2025 4:40 am From what I understand, the difference is in how Nvidia and AMD wrote their drivers. Nvidia's driver is doing a spin-wait, looking for instructions to be processed and sent to the GPU. AMD, from the explanations I have seen, implemented this as an interrupt instead. As soon as something is handed off to the driver to process, it wakes up and takes CPU cycles to handle the request, then goes inactive until the next request. So the Nvidia driver process is always active, but the actual amount of work done by the CPU may be a fraction of the cycles available.
I believe I have the definitive answer about why we see this.

It turns out it's not the driver, just a choice that FAH made in their configuration of OpenMM. They overrode the default for UseBlockingSync and set it to false. This increases performance slightly but causes the CPU usage people report.

http://docs.openmm.org/latest/userguide ... a-platform
UseBlockingSync: This is used to control how the CUDA runtime synchronizes between the CPU and GPU. If this is set to “true” (the default), CUDA will allow the calling thread to sleep while the GPU is performing a computation, allowing the CPU to do other work. If it is set to “false”, CUDA will spin-lock while the GPU is working. Setting it to “false” can improve performance slightly, but also prevents the CPU from doing anything else while the GPU is working.
When the CPU sends data to the GPU, it calls cudaDeviceSynchronize(), which waits until the GPU has finished before it returns. The majority of the CPU's time is spent in that function. It will either use a spin-wait loop or yield the CPU and wait for an interrupt before returning, depending on whether cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync) has been called. The UseBlockingSync setting in OpenMM simply controls whether that flag gets set.
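For anyone who wants to see the difference directly, here is a minimal stand-alone sketch (my own illustration, not FAH or OpenMM code; busyKernel is just a made-up workload) that sets the blocking-sync flag before the device is initialized and then waits on a long-running kernel:

// Minimal sketch: how the blocking-sync device flag changes what
// cudaDeviceSynchronize() does on the host side while a kernel runs.
#include <cuda_runtime.h>

__global__ void busyKernel(float *data, int iters)
{
    float x = data[threadIdx.x];
    for (int i = 0; i < iters; ++i)
        x = __sinf(x) + 1.0f;       // dummy long-running work
    data[threadIdx.x] = x;
}

int main()
{
    // Must be set before the first call that initializes the device.
    // With the flag, cudaDeviceSynchronize() sleeps and waits for an
    // interrupt; without it, the runtime's default heuristic usually
    // ends up spin-waiting on the host thread.
    cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync);

    float *d;
    cudaMalloc(&d, 256 * sizeof(float));
    cudaMemset(d, 0, 256 * sizeof(float));

    busyKernel<<<1, 256>>>(d, 1 << 24);

    // Watch this host thread in top/htop while it waits here:
    // roughly a full core without the flag, near idle with it.
    cudaDeviceSynchronize();

    cudaFree(d);
    return 0;
}

Comment out the cudaSetDeviceFlags() call, rebuild, and the same wait should show up as a fully busy core, which is exactly the behaviour people are reporting with the FAH CUDA core.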

So why did FAH do this? I don't know. From reports, the performance improvement is very, very slight. Maybe it was just set-and-forget? Maybe there are some systems out there where the performance improvement is big enough to make it worthwhile? I'll test this out when I get a new Nvidia card and see if I can write a simple program that injects the blocking-sync flag for a test WU.
calxalot
Site Moderator
Posts: 1375
Joined: Sat Dec 08, 2007 1:33 am
Location: San Francisco, CA
Contact:

Re: Nvidia vs AMD CPU usage

Post by calxalot »

I remember reading about attempts to reduce CPU usage. Usage could be reduced, but doing so also greatly reduced folding speed. I think I read 30% slower.

This was somewhere on the openmm.org site or the github repos for openmm.
arisu
Posts: 92
Joined: Mon Feb 24, 2025 11:11 pm

Re: Nvidia vs AMD CPU usage

Post by arisu »

The site only says "slightly", so I'm guessing it's on the repo. I'll look.

Edit: Yep it's in the repo, probably this one https://github.com/openmm/openmm/issues/2955

The perf decrease is negligible... except on Linux where it is 35% (and Windows if kept in balanced power mode). That explains why it was done. Linux runs on a lot of the most important folding boxes!
Joe_H
Site Admin
Posts: 8050
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Studio M1 Max 32 GB smp6
Mac Hack i7-7700K 48 GB smp4
Location: W. MA

Re: Nvidia vs AMD CPU usage

Post by Joe_H »

The last note on the issue, that it was not possible to do this and still continue supporting OpenCL 1.2, may be a big part of it. Nvidia was slow to support any version of OpenCL beyond 1.2, and drivers released in the last few years that do include support for higher versions were not back-ported to older generation GPUs.
muziqaz
Posts: 1324
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, RX 550 640SP
Location: London
Contact:

Re: Nvidia vs AMD CPU usage

Post by muziqaz »

arisu wrote: Thu Mar 13, 2025 3:31 am The site only says "slightly", so I'm guessing it's on the repo. I'll look.

Edit: Yep it's in the repo, probably this one https://github.com/openmm/openmm/issues/2955

The perf decrease is negligible... except on Linux where it is 35% (and Windows if kept in balanced power mode). That explains why it was done. Linux runs on a lot of the most important folding boxes!
The person who did these configs was an nVidia employee at the time and one of the FAH founders. Their knowledge of nVidia hardware preferences beats anyone here or anywhere else.
The choices made could be considered the best choices for performance. Since that person has now moved on from nVidia, and the CUDA fahcore is running great as is, I don't see anything changing too much in the near future regarding this.
FAH Omega tester
arisu
Posts: 92
Joined: Mon Feb 24, 2025 11:11 pm

Re: Nvidia vs AMD CPU usage

Post by arisu »

Yep, it looks like it. I've been reading up on potential improvements (not because I think I know better than the devs, but because I like to learn). The approaches that reduce CPU use without reducing GPU folding performance, like estimating how long a given kernel will take to complete and sleeping for about 90% of that duration so the CPU only spins at the end, come with considerable engineering and testing challenges that wouldn't be worth it just to free up a single CPU core.
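For the curious, here is a rough, hypothetical sketch of that last idea (the names stepKernel and waitAdaptive are mine, and nothing like this exists in OpenMM or the fahcore): record CUDA events around each kernel launch, keep a moving estimate of how long a step takes, sleep through most of that estimate, and only poll at the very end.

// Hypothetical "sleep most of the way, spin at the end" wait strategy.
#include <chrono>
#include <thread>
#include <cuda_runtime.h>

__global__ void stepKernel(float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        x[i] = x[i] * 0.999f + 0.001f;   // stand-in for one simulation step
}

// Wait for `stop`, using a moving estimate (in ms) of the kernel duration.
static float waitAdaptive(cudaEvent_t start, cudaEvent_t stop, float estimateMs)
{
    // Sleep through ~90% of the expected run time; the host core idles here.
    std::this_thread::sleep_for(
        std::chrono::microseconds((long long)(estimateMs * 900.0f)));

    // Spin only for the remaining tail so completion latency stays low.
    while (cudaEventQuery(stop) == cudaErrorNotReady)
        std::this_thread::yield();

    // Fold the measured time into the estimate (exponential moving average).
    float elapsedMs = estimateMs;
    cudaEventElapsedTime(&elapsedMs, start, stop);
    return 0.9f * estimateMs + 0.1f * elapsedMs;
}

int main()
{
    const int n = 1 << 20;
    float *d;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    float estimateMs = 1.0f;             // initial guess, refined every step
    for (int step = 0; step < 100; ++step) {
        cudaEventRecord(start);
        stepKernel<<<(n + 255) / 256, 256>>>(d, n);
        cudaEventRecord(stop);
        estimateMs = waitAdaptive(start, stop, estimateMs);
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    return 0;
}

The sleep fraction and the averaging weight are the hard part: too aggressive and every step picks up extra latency, too conservative and you are back to spinning, and kernel times vary between projects and GPUs. That is the kind of tuning and testing burden I meant.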
muziqaz
Posts: 1324
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, RX 550 640SP
Location: London
Contact:

Re: Nvidia vs AMD CPU usage

Post by muziqaz »

arisu wrote: Thu Mar 13, 2025 6:21 am Yep, it looks like it. I've been reading up on potential improvements (not because I think I know better than the devs, but because I like to learn). The approaches that reduce CPU use without reducing GPU folding performance, like estimating how long a given kernel will take to complete and sleeping for about 90% of that duration so the CPU only spins at the end, come with considerable engineering and testing challenges that wouldn't be worth it just to free up a single CPU core.
You can try pushing a PR to the OpenMM GitHub, but remember: the people who brought CUDA to the table did not just run a single test and say, yeah, looks good enough. There was a lot of testing, and nVidia's own labs were involved in developing this stuff.
There are always compromises available in development (software or hardware).
FAH Omega tester