
Re: Folding@Home RAM usage grows until exceeded, then PC crashes

Posted: Thu Nov 14, 2019 11:04 pm
by Dark_Vera
MeeLee wrote:
Dark_Vera wrote:
bruce wrote:A lot depends on whether you run Linux or Windows. My main machine can boot either Windows or Linux (I'm not at home today, so I can't verify every detail). It has a GTX 960 and a GTX 750 Ti plus an 8-way CPU. I commonly run 3 slots (including 6 virtual CPU cores dedicated to FAH) plus often a browser. Those 3 slots generally push me into the paging range. If I stop one slot, there's no paging.

The monitor runs off the GTX 750 Ti. In Windows, the screen lags appreciably, but works fine if I pause the WU that's running on the same GPU as the desktop. If I pause the CPU slot or the other GPU slot, there's still a lag, so it's not paging that's causing the screen lag; it's the limitation of sharing the GTX 750 Ti. If I pause that slot, the browser works fine.

If I switch Windows to CPU rendering, it doesn't help, which surprises me. I guess that's a paging issue. If I could add a third (slow) GPU and dedicate it to the Windows desktop, it might work fine. (The M/B has no more slots except the x1 PCIe slot, and I haven't figured out how to use that yet.)

On Linux, I don't notice the same limitations. Then, too, Linux gets better PPD.
I've learned from researching other forums and threads that F@H (and certain BOINC apps) handle CPU resource allocation differently under Linux than under Windows. The main observation is that under Windows F@H requires a full CPU core per GPU folding slot, yet under Linux a GPU folding slot needs only a fraction of a core (borne out by my rig discussed earlier in this thread, at least until paging begins).

Additionally, GPU BOINC apps and Folding perform far better under Linux than under Windows, which is even more apparent when the GPUs are bottlenecked by PCIe x1 slots (versus the recommended x4, x8, or x16 widths). My GPUs would choke even when running one slot at a time under Windows, since I'm running a K37 mining board with every slot restricted to PCIe x1 speeds. Under Linux, I get roughly 2.5 times the PPD per GPU.

Overall, Linux somehow squeezes superior performance out of GPUs throttled at the PCIe lane level while also using significantly less CPU when running multiple GPUs - a pattern seen throughout this thread and in dozens of other discussions online.
I'd like to correct that.
For full-speed results, both Windows and Linux require one core per GPU if the GPU is Nvidia.
The difference between Nvidia and AMD GPUs is that both (should) place about the same real load on the CPU when the GPUs are similar in performance.
However, with AMD GPUs the CPU shows that real load, while Nvidia drivers fill the CPU's idle time with "idle data".
That being said, if you have a 4 GHz CPU, you could easily share one CPU core between 2x RTX 2060 or 2070 GPUs, since the CPU time spent on idle data can just as easily be allocated to the second GPU, much like on AMD.
The difference is that each GPU then runs a bit slower, just like AMD cards do with AMD drivers. The idle data Nvidia drivers push through the CPU actually helps the GPU reach higher performance.
If, however, you have more GPUs than CPU cores, and your CPU is fast enough, you can split one CPU core between 2 GPUs (or 3 GPUs per core on a CPU that supports hyperthreading).
The '1 CPU core per GPU' phenomenon comes from the Nvidia drivers, and it holds on both Windows and Linux.
You may be correct regarding "1 CPU core per 1 GPU slot" for Nvidia - unfortunately I've never really folded with Nvidia cards, so I can't contribute much there. All of my folding history has been with AMD cards paired with Intel processors, running Windows or Linux.

Using Windows 10, the rig I'm using now (a Celeron plus 8x RX 460s) was getting crushed and throttled to the bone, presumably by CPU contention (100% usage all day long when running 8 GPU slots at once). I did observe the "1 core per 1 GPU" rule in effect on Windows 10, with deleterious effects.

I'm running the same PC now with Arch Linux, and from the beginning (before it started paging) CPU usage peaked at 60% with 8 GPUs folding on the "Full" preset. Now that I've seemingly resolved my RAM leak issues and paging is no longer happening, CPU usage tops out at 32%.

So we might be able to conclude that Windows "forces" 1 CPU core per 1 GPU slot whereas Arch Linux does not, and is therefore far more efficient with CPU management where folding is concerned.
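If anyone wants to reproduce CPU numbers like these, here's a minimal sketch (Linux-only, with deliberately simplified /proc parsing; it's illustrative, not production code) of how top/htop-style per-process CPU usage can be sampled from /proc/<pid>/stat:

Code: Select all
// Sample a process's CPU usage the way top/htop does: read the process's
// accumulated utime+stime twice, one second apart, and compare.
#include <chrono>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <thread>
#include <unistd.h>

// utime+stime (in clock ticks) for a pid: fields 14 and 15 of /proc/<pid>/stat.
static long cpuTicks(int pid) {
    std::ifstream f("/proc/" + std::to_string(pid) + "/stat");
    std::string line;
    std::getline(f, line);
    // Field 2 (the command name) can contain spaces, so parse after the ')'.
    std::istringstream ss(line.substr(line.rfind(')') + 2));
    std::string token;
    long utime = 0, stime = 0;
    for (int field = 3; field <= 15 && ss >> token; ++field) {
        if (field == 14) utime = std::stol(token);
        if (field == 15) stime = std::stol(token);
    }
    return utime + stime;
}

int main(int argc, char **argv) {
    int pid = (argc > 1) ? std::stoi(argv[1]) : getpid();
    long hz = sysconf(_SC_CLK_TCK);               // clock ticks per second
    long before = cpuTicks(pid);
    std::this_thread::sleep_for(std::chrono::seconds(1));
    long after = cpuTicks(pid);
    // 100.0 means one fully busy core over the one-second window.
    std::cout << "CPU use: " << 100.0 * (after - before) / hz
              << "% of one core\n";
    return 0;
}

Run it with a folding process's pid as the argument; a spin-waiting GPU feeder thread shows up near 100 (one full core), while a thread blocked on an interrupt-driven wait shows up near 0.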

Re: Folding@Home RAM usage grows until exceeded, then PC crashes

Posted: Fri Nov 15, 2019 5:24 pm
by bruce
The difference is probably simply one associated with the sales goals of the GPU manufacturers.

If you're a Windows gamer, interested only in the frame rate your GPU can maintain, the Nvidia philosophy works to your benefit. When the GPU isn't needed to calculate a screen update, it's also reasonable to assume the CPUs have nothing to do except wait for the next screen update. By using the spin-wait concept, the CPU is able to start processing the next screen update within a couple of instructions. The AMD philosophy requires an interrupt to be processed and the state of the OS to be evaluated before the work can be assigned to a free CPU (or, if they're all busy doing something else, before the lowest-priority task can be suspended so that processing of the next screen update can begin).
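To make the two waiting strategies concrete, here's a minimal CUDA sketch (an illustration only, not FAH code - FAH's GPU cores use OpenCL, and the kernel here is just a placeholder workload). CUDA exposes the choice directly through cudaSetDeviceFlags:

Code: Select all
// Sketch: the two ways a host thread can wait for a GPU kernel to finish.
#include <cuda_runtime.h>
#include <cstdio>

// Placeholder kernel standing in for a folding workload.
__global__ void busywork(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        for (int k = 0; k < 100000; ++k)
            x[i] = x[i] * 1.000001f + 0.5f;
}

int main() {
    // cudaDeviceScheduleBlockingSync: the host thread sleeps until the GPU
    // signals completion (interrupt-style; near-zero CPU while waiting).
    // Swap in cudaDeviceScheduleSpin and the same wait busy-polls instead,
    // pinning one CPU core at 100% in top/htop for the kernel's duration.
    cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync);

    const int n = 1 << 20;
    float *x = nullptr;
    cudaMalloc(&x, n * sizeof(float));

    busywork<<<(n + 255) / 256, 256>>>(x, n);
    cudaDeviceSynchronize();   // the chosen wait strategy applies here

    cudaFree(x);
    std::printf("kernel finished\n");
    return 0;
}

With the blocking flag, the waiting thread shows near-zero CPU; with the spin flag, the same wait pins a core at 100%, which is exactly the '1 core per GPU' pattern discussed above.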

Sure, this is contrary to the interests of folks like you who are primarily interested in FAH -- but Nvidia knows that interest in FAH doesn't sell as many GPUs as interest in games does.

Re: Folding@Home RAM usage grows until exceeded, then PC crashes

Posted: Thu Nov 28, 2019 1:11 am
by MeeLee
So we might be able to conclude that Windows "forces" 1 CPU core per 1 GPU slot whereas Arch Linux does not, and is therefore far more efficient with CPU management where folding is concerned.
Not really.
Nvidia drivers are the cause of the full CPU thread load, even on Linux.
It has nothing to do with Windows.
Windows overall is just less efficient over the PCIe bus than Linux.
But the 100% CPU core utilization comes from the Nvidia drivers.

Like I mentioned before, the '1 GPU per core' rule can easily be bypassed.
Forcing 3 or 4 Nvidia GPUs to work on 1 or 2 cores 'can' be done.
If the GPUs aren't too fast and the CPU can keep up, you'll only notice a mild performance penalty; and in terms of performance, the Nvidia drivers should then operate exactly the same as AMD's.

A 4 GHz CPU can easily run 2 RTX 2060s per core, as an RTX 2060 really only needs a single CPU core running at about 1.8 GHz.
Set up this way, Nvidia's core-filling is effectively disabled, and a task manager (or variants like htop) will show you how much CPU activity those 2 GPUs truly cause.
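Here's one sketch of how that sharing can be forced on Linux (illustrative only; pid 0 means "this process", and the commented-out exec of FAHClient is just an example - running 'taskset -c 0 <command>' from a shell does the same thing):

Code: Select all
// Hypothetical Linux-only sketch: pin the current process to CPU core 0 so
// that anything it launches (e.g. two GPU folding clients) shares that core.
#include <sched.h>      // sched_setaffinity, cpu_set_t
#include <unistd.h>     // getpid, execlp
#include <cstdio>

int main() {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(0, &mask);                                     // allow core 0 only
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {  // 0 = this process
        std::perror("sched_setaffinity");
        return 1;
    }
    std::printf("pid %d is now pinned to core 0\n", (int)getpid());
    // The affinity mask is inherited, so exec'ing the client here keeps it:
    // execlp("FAHClient", "FAHClient", (char *)NULL);
    return 0;
}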

If you force 4 faster GPUs (e.g. 2080s) onto a slower CPU (e.g. a 3 GHz dual-core), the CPU becomes the bottleneck and performance plummets.