GPU folding, CPU physical cores vs threads?

Skufer
Posts: 39
Joined: Fri Feb 14, 2014 10:45 am
Location: UK/London

GPU folding, CPU physical cores vs threads?

Post by Skufer »

Hi all,

In a couple of months I will be building a new PC, and I will be buying at least one GPU to use for folding. My question is about CPU threads: if I were folding only on the CPU, I might configure the number of available cores manually, and if a CPU had four cores with two threads per core, that number would be 8.

I understand that you should set aside a core per GPU when folding with both the CPU and GPU(s), but would this be a thread (leaving 7 available cores as per the example above) or an actual physical core, i.e. configuring just six cores out of eight?

Your advice is much appreciated!
PS3EdOlkkola
Posts: 184
Joined: Tue Aug 26, 2014 9:48 pm
Hardware configuration: 10 SMP folding slots on Intel Phi "Knights Landing" system, configured as 24 CPUs/slot
9 AMD GPU folding slots
31 Nvidia GPU folding slots
50 total folding slots
Average PPD/slot = 459,500
Location: Dallas, TX

Re: GPU folding, CPU physical cores vs threads?

Post by PS3EdOlkkola »

Here are some conclusions I've reached based on experience running 27 GPU slots composed of both AMD (7970 and R9 290s) and Nvidia (GTX 680, 690, 780, 780ti) GPUs:

* Choose a motherboard that supports PCIe 3.0 (i.e. don't choose an AMD motherboard): the best performance on these GPUs is achieved using a PCIe 3.0 bus at either 8x or 16x. PCIe 2.0, even at 16x, will negatively impact PPD, from approx. 4K PPD on the 7970s to 40K PPD on the 780ti. In general, the faster the GPU, the more it relies on the faster PCIe 3.0 bus interface to reach maximum PPD.

* For one Nvidia or one AMD GPU, you'll need to assign both a "real" core and a virtual (hyper-threaded) core per GPU. In your configuration with an 8-thread CPU (say, a 4-core/8-thread i7-4790K) and one Nvidia GPU (GTX 780 and above), you'd assign 6 threads (three real and three virtual) to SMP folding and two threads to GPU folding.

* If you plan on having two Nvidia GPUs, then you'll need to back off further on SMP folding and split the assignments evenly (4 and 4) between GPU and SMP. However, if you have two AMD GPUs, you can still split the assignments 6 and 2 between SMP and GPU, respectively.

* It gets a little more complicated when you get to three GPUs or more (I have 6 AMD GPUs in two different systems, each pushing over 1M PPD), but if you plan to go that far there are other variables to consider, and that deserves its own separate discussion.

* The easiest way to perform CPU thread assignments is to use Process Lasso. It makes it very easy to assign specific processes to specific CPU threads. The key to getting maximum performance out of your GPU is to keep the real and virtual threads feeding the GPU as quiet as possible by assigning virtually every other process to the CPUs dedicated to the SMP assignment, except for the default processes marked as high priority (like the window manager, etc.). You set the number of CPUs assigned to SMP in the FAHControl application, then tune that assignment to specific CPU cores using Process Lasso (there's a rough script after this list showing the same idea). The reason to keep as many applications as possible off those GPU threads is that even a small amount of latency has a meaningful impact on GPU PPD. I set the GPU process (the Core 17 process) to "normal" priority and "high" I/O priority, exclude it from "ProBalance restraint", and classify the process as a "Game". Those settings will improve the PPD performance of the GPU. There is a free version of Process Lasso, but it's worth buying the software; it's very well supported and a nice application, in my opinion.
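
For anyone who would rather script the split than click through a GUI, here is a rough sketch of the same idea in Python using psutil instead of Process Lasso. The 2-threads-per-Nvidia-GPU rule is just the one from this thread, the FahCore process-name matching is an assumption about your setup, and which logical CPUs are hyper-threaded siblings differs between Windows and Linux, so treat the ranges as an example rather than a recipe.

Code:

import psutil

TOTAL_THREADS = psutil.cpu_count(logical=True)   # e.g. 8 on a 4c/8t i7
THREADS_PER_GPU = 2                               # one real + one virtual core, per this thread
NUM_GPUS = 1

smp_threads = list(range(TOTAL_THREADS - THREADS_PER_GPU * NUM_GPUS))
gpu_threads = list(range(TOTAL_THREADS - THREADS_PER_GPU * NUM_GPUS, TOTAL_THREADS))
print("SMP slot CPUs:", smp_threads, " GPU feeder CPUs:", gpu_threads)

for proc in psutil.process_iter(["name"]):
    name = (proc.info["name"] or "").lower()
    if name.startswith("fahcore_17"):        # GPU core -> only the reserved CPUs
        proc.cpu_affinity(gpu_threads)
    elif name.startswith("fahcore"):         # SMP core -> keep it off the GPU CPUs
        proc.cpu_affinity(smp_threads)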

I hope that's helpful. Good luck!
Hardware config viewtopic.php?f=66&t=17997&p=277235#p277235
Skufer
Posts: 39
Joined: Fri Feb 14, 2014 10:45 am
Location: UK/London

Re: GPU folding, CPU physical cores vs threads?

Post by Skufer »

What a great answer! Thank you for taking the time to explain all that PS3EdOlkkola, I definitely understand the requirements of GPU folding much better now.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: GPU folding, CPU physical cores vs threads?

Post by bruce »

The number of CPU threads (or cores) is mostly dependent on the way AMD or NV wrote the drivers -- so changing the driver version might alter the recommendations, but it's not likely.

Personally, I have not found any difference between having one free thread per GPU and having a free core per GPU (the FahCores for GPUs don't seem to use the FPU), but PS3EdOlkkola has probably done more testing than I have, so he's likely to be right. The latency (timing) before a request from the PCIe interface is serviced is becoming more and more important as new, faster GPUs are released, and interrupting the processing of a WU running on the CPU does slow things down.
jrweiss
Posts: 707
Joined: Tue Dec 04, 2007 6:56 am
Hardware configuration: Ryzen 7 5700G, 22.40.46 VGA driver; 32GB G-Skill Trident DDR4-3200; Samsung 860EVO 1TB Boot SSD; VelociRaptor 1TB; MSI GTX 1050ti, 551.23 studio driver; BeQuiet FM 550 PSU; Lian Li PC-9F; Win11Pro-64, F@H 8.3.5.

[Suspended] Ryzen 7 3700X, MSI X570MPG, 32GB G-Skill Trident Z DDR4-3600; Corsair MP600 M.2 PCIe Gen4 Boot, Samsung 840EVO-250 SSDs; VelociRaptor 1TB, Raptor 150; MSI GTX 1050ti, 526.98 driver; Kingwin Stryker 500 PSU; Lian Li PC-K7B. Win10Pro-64, F@H 8.3.5.
Location: @Home

Re: GPU folding, CPU physical cores vs threads?

Post by jrweiss »

My AMD 7750s (Cat 14.4) fold just fine with a single thread available (CPU 7) on the 3770S or 4770K. I don't know if the higher-powered cards need more CPU time, though...
Ryzen 7 5700G, 22.40.46 VGA driver; MSI GTX 1050ti, 551.23 studio driver
Ryzen 7 3700X; MSI GTX 1050ti, 551.23 studio driver [Suspended]
PS3EdOlkkola
Posts: 184
Joined: Tue Aug 26, 2014 9:48 pm
Hardware configuration: 10 SMP folding slots on Intel Phi "Knights Landing" system, configured as 24 CPUs/slot
9 AMD GPU folding slots
31 Nvidia GPU folding slots
50 total folding slots
Average PPD/slot = 459,500
Location: Dallas, TX

Re: GPU folding, CPU physical cores vs threads?

Post by PS3EdOlkkola »

@jrweiss: From my testing, it is generally true that performance (measured using PPD) is less affected when using low-to-mid range GPUs with a single thread. Higher-end AMD GPUs like the R9 290/295x perform better with both a real and a virtual thread allocated to a Core17 process, based on the measurements I've done using total PPD as the yardstick.

The first difference between using one and two threads is the amount of time it takes for a GPU work unit to be fully engaged by the GPU. Two threads get the work unit running faster, which directly impacts PPD. Using Process Lasso, you can see directly how much CPU time is being used to get the GPU work unit running at full speed - it typically pegs the assigned CPU(s) for a good bit of time, and more CPU horsepower shortens the time it takes to get the GPU work unit fully operational.

The second difference, when allocating only a virtual or a real core (assuming the other "half" of the CPU core is running an SMP work unit within the same CPU module), is the slight amount of context-switching time it takes the CPU to service the GPU interrupt. As bruce said, the greater the latency, the more it will impact GPU performance, and hence PPD. You can see the impact of this when you run Afterburner, set the sample time to a second or less, and watch the "GPU Usage %" line in the monitor. With one core assigned (and the other side of the core busy with an SMP work unit), you'll see a more dramatic fluctuation in GPU Usage % than if you dedicated both a real and a virtual CPU to the Core17 process.

The difference isn't dramatic, but between getting the GPU work unit operational faster and the reduced context switching, the impact on PPD is roughly 8K to 10K. Admittedly, that's not huge, but if you want to get the most out of your system, it does make a difference. Nvidia GPUs are even more sensitive to the CPUs allocated to Core17, with PPD affected by as much as 20K. It really comes down to how much tuning and fiddling you want to do in order to extract the maximum amount of efficiency out of your system(s). I figure I pick up a total of about 300K PPD by optimizing my rigs this way, but I fully recognize most people don't want to be bothered with managing this level of detail.
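
If you don't want to sit and watch Afterburner, a quick sketch like the one below can put a number on that fluctuation. It assumes an Nvidia card with nvidia-smi on the PATH (AMD cards would need a different tool); it simply samples utilization once a second for a minute and prints the spread.

Code:

import statistics
import subprocess
import time

samples = []
for _ in range(60):                                   # one minute of 1-second samples
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True)
    samples.append(int(out.stdout.strip().splitlines()[0]))   # first GPU only
    time.sleep(1)

print(f"mean GPU usage: {statistics.mean(samples):.1f}%")
print(f"std deviation:  {statistics.stdev(samples):.1f}%  (bigger = more starvation)")
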
Hardware config viewtopic.php?f=66&t=17997&p=277235#p277235
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: GPU folding, CPU physical cores vs threads?

Post by bruce »

Let's pretend that a real CPU can be considered as one FPU and two ALUs. [Those are old terms that are no longer precise but I'm going to use them anyway.] I think of a single SMP thread generally saturating the FPU and generally NOT saturating the ALU. A single thread driving data to/from a GPU will often saturate an ALU for a while but not all of the time.

A lot will depend on how pipelining handles a pair of unsaturated ALUs. Nevertheless, it seems like an idle ALU would be able to handle the context switching at nearly the same speed whether the FPU is busy or not, especially since we're assuming the other ALU is not saturated.

Have you compared the latency of two threads on the same CPU which is not processing SMP with two threads on different CPUs, half of which are processing SMP? (Certainly the latter would be better for SMP by allocating an extra FPU, which is a good thing unless there's a measurable difference for the GPUs.) If there's a significant difference, how should I alter this oversimplified explanation to better reflect reality? Would the explanation still apply to the various CPU architectures associated with either Intel or AMD?
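
Something along these lines would be one way to run that comparison: a ping-pong latency test between two pinned processes, once on hyper-threaded siblings of one physical core and once on two separate physical cores, repeated with and without an SMP WU loaded. This is only a sketch, not a calibrated benchmark; it uses os.sched_setaffinity, so it's Linux-only, and the assumption that CPUs 0/4 are siblings while 0/1 are separate cores only matches a typical 4-core/8-thread Intel layout (check /proc/cpuinfo).

Code:

import os
import time
from multiprocessing import Pipe, Process

def pong(conn, cpu):
    # Worker: pin to one logical CPU and echo until told to stop.
    os.sched_setaffinity(0, {cpu})
    while conn.recv() is not None:
        conn.send("pong")

def round_trip_us(cpu_a, cpu_b, rounds=100_000):
    parent, child = Pipe()
    worker = Process(target=pong, args=(child, cpu_b))
    worker.start()
    os.sched_setaffinity(0, {cpu_a})
    start = time.perf_counter()
    for _ in range(rounds):
        parent.send("ping")
        parent.recv()
    elapsed = time.perf_counter() - start
    parent.send(None)                     # tell the worker to exit
    worker.join()
    return elapsed / rounds * 1e6         # microseconds per round trip

if __name__ == "__main__":
    print(f"same physical core (siblings 0,4): {round_trip_us(0, 4):.2f} us")
    print(f"different physical cores (0,1):    {round_trip_us(0, 1):.2f} us")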