Threadripper Performance

Gleep
Posts: 13
Joined: Tue Dec 17, 2019 4:12 pm

Threadripper Performance

Post by Gleep »

Hi,

I recently upgraded to an AMD Threadripper 3960X. It has 24 physical cores and, with SMT enabled, 48 logical cores. Since the FAH A7 client is limited to 32 threads, I configured 2 CPU slots, one with 32 threads and the other with 14 (I left 2 logical cores free for GPU folding and miscellaneous system tasks). Digging around the forum, it's unclear whether this is the best configuration, as I've seen multiple comments that 32-thread WUs are in short supply. Should I continue to use 32/14, switch to a more balanced configuration of 24/22, or use 3 CPU slots of 16/16/14?
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: AMD 3960X

Post by bruce »

It's not possible to make a universal recommendation on how to configure CPUs with large numbers of threads. The number of CPU threads that can be processed by active WUs will vary over time, just as the number of atoms in active proteins varies. In general, keeping the numbers assigned to various slots of similar magnitude is a good idea. Currently there probably aren't any proteins that can use all 46.

Also, it's best to avoid values that have large prime factors. For example, I'd avoid 14 because GROMACS is more likely to have trouble with the prime factor 7. Numbers like 8, 9, 12, 16, ... tend to work well because all of their prime factors are 2 or 3.
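A minimal sketch of that rule of thumb, for anyone who wants to list the "friendly" thread counts up to 48. The function names and the `allowed` set are made up for illustration; this is not how the client or GROMACS actually decides, only a way to enumerate numbers whose prime factors are all 2 or 3.

```python
def prime_factors(n):
    """Return the prime factors of n (with repetition) by trial division."""
    factors = []
    divisor = 2
    while divisor * divisor <= n:
        while n % divisor == 0:
            factors.append(divisor)
            n //= divisor
        divisor += 1
    if n > 1:
        factors.append(n)  # whatever is left over is itself prime
    return factors


def only_small_factors(n, allowed=(2, 3)):
    """True if every prime factor of n is in `allowed` (hypothetical rule of thumb)."""
    return all(p in allowed for p in prime_factors(n))


# Candidate thread counts for a 48-logical-core CPU
friendly = [n for n in range(2, 49) if only_small_factors(n)]
print(friendly)
```

Passing `allowed=(2, 3, 5)` would also admit counts such as 20 or 25, which may or may not work depending on the project.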
foldy
Posts: 2061
Joined: Sat Dec 01, 2012 3:43 pm
Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441

Re: AMD 3960X

Post by foldy »

With a 24-physical-core CPU, one CPU slot configured for 23 threads (leaving one physical core for the GPU/system) may be the optimum for your CPU.
Joe_H
Site Admin
Posts: 7868
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: AMD 3960X

Post by Joe_H »

foldy wrote: With a 24-physical-core CPU, one CPU slot configured for 23 threads (leaving one physical core for the GPU/system) may be the optimum for your CPU.
23 won't work; it is a large prime number. 22 and 21 are also out. 20 is about the highest integer below 24 that is usable, since some projects support multiples of 5. A client set to a number other than a multiple of small primes will, though, pick up WUs that use as many of the configured threads as possible.

Personally I would set the client to use 24 threads, and let the HT threads be available to support everything else. My experience with the CPU A7 core was that I did not see much improvement from adding HT threads to the processing of WUs.

P.S. mentioning that the A7 core is limited to 32 threads implies you are running Windows. You don't run into that limit using Linux.
Gleep
Posts: 13
Joined: Tue Dec 17, 2019 4:12 pm

Re: AMD 3960X

Post by Gleep »

Thanks for the information everyone.

I'll test whether AMD's SMT is of value; it's a different implementation from Intel's HT and may help. If I do see value in running more than 24 threads, I'll experiment with running 3 or 4 CPU slots to find the sweet spot. I was unaware of GROMACS' problems with some thread counts and will avoid them.

Correct, I am running Windows. The primary software I use is available for Windows only; folding is what the hardware will do while idle. I would run Linux if I could.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: AMD 3960X

Post by bruce »

Joe_H wrote:Personally I would set the client to use 24 threads, and let the HT threads be available to support everything else. My experience with the CPU A7 core was that I did not see much improvement from adding HT threads to the processing of WU's.
This is not new information. On the early Intel Core CPUs, enabling HT on the same system doubled the number of logical threads but only increased throughput by 15-20%. It hasn't really changed much since then.
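To put that arithmetic in per-thread terms, here is a small sketch. The function name is made up for illustration, and the 15% and 20% figures are simply the ones quoted above, not new measurements.

```python
def per_thread_throughput(thread_multiplier, throughput_gain):
    """Relative throughput of each logical thread after enabling HT/SMT.

    thread_multiplier: how many times more logical threads (2 for HT on).
    throughput_gain:   fractional gain in total throughput (0.15 means +15%).
    """
    return (1.0 + throughput_gain) / thread_multiplier


for gain in (0.15, 0.20):
    share = per_thread_throughput(2, gain)
    print(f"+{gain:.0%} total throughput: each logical thread does "
          f"{share:.3f}x the work of a core running a single thread")
```

In other words, each of the two hyperthreads does a bit more than half the work of a core running one thread, which is why adding HT threads to a slot raises the thread count much faster than it raises throughput.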

By using half of your threads and "leaving HT threads to do everything else" you have to be sure that the half you enable are on independent cores rather than possibly on both halves of a single pair, sharing a single FPU. Some OSs know how to do that; some do not. It's a PITA if you have to set affinity manually.
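A rough sketch of the selection step, assuming you already know which logical CPUs share a physical core. The `siblings` layout below is hypothetical (a 4-core/8-thread CPU where logical CPUs 2k and 2k+1 are paired); how you discover the real pairing depends on the OS and is not shown.

```python
def one_thread_per_core(sibling_groups):
    """Pick the lowest-numbered logical CPU from each physical core's sibling group."""
    return sorted(min(group) for group in sibling_groups)


# Hypothetical layout: each tuple lists the logical CPUs sharing one physical core
siblings = [(0, 1), (2, 3), (4, 5), (6, 7)]
chosen = one_thread_per_core(siblings)
print(chosen)
```

On Linux the resulting list could be handed to `os.sched_setaffinity(pid, chosen)`; that call is not available on Windows, where affinity has to be set some other way.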
MeeLee
Posts: 1375
Joined: Tue Feb 19, 2019 10:16 pm

Re: AMD 3960X

Post by MeeLee »

When folding on my Xeon, I noticed very little difference between folding at 20 threads and at 17. 17 or 18 threads fully saturated the CPU load on my 20-thread Xeon.
Perhaps you'll see something similar on the Threadrippers.

I would also check whether you can get a higher turbo boost frequency by lightening the load to less than a full 100%.
In some cases, running fewer threads at a higher core frequency might outperform running more threads at a lower CPU frequency.

Personally, I would use as many cores as possible, and divide the CPU into as few slots as possible.
It would also be interesting to see whether disabling HT on these PCs makes them use less power per PPD...
Gleep
Posts: 13
Joined: Tue Dec 17, 2019 4:12 pm

Re: AMD 3960X

Post by Gleep »

I configured 4 slots, 16/16/8/6 and after 20 hours or so I checked the PPD and it was down quite a bit compared to 2 slots (32/14). The drop was 80k+ PPD.

Today I'm giving 25/21 a shot, following bruce's suggestion of keeping the slots more balanced in thread count while avoiding large primes. Setting a slot to 22 threads produces a log entry saying it's dropping to 21 threads to avoid a large prime, so I figured 25/21 is probably better than 24/22.
MeeLee wrote: When folding on my Xeon, I noticed very little difference between folding at 20 threads and at 17. 17 or 18 threads fully saturated the CPU load on my 20-thread Xeon.
Perhaps you'll see something similar on the Threadrippers.

I would also check whether you can get a higher turbo boost frequency by lightening the load to less than a full 100%.
In some cases, running fewer threads at a higher core frequency might outperform running more threads at a lower CPU frequency.
When all cores are loaded I see 4 GHz to 4.15 GHz. The frequency did go up with only 6 cores in use, but the increase was minimal. I think the frequency boost of Ryzen 3000 CPUs is very different from that of current Intel CPUs. Intel CPUs seem to consult a table and scale up or down based on the number of cores in use (with high temperatures causing downscaling), and package power is calculated, not measured. AMD boosting is more like GPU dynamic boosting, where it monitors temperature, voltage, and package power and adjusts for maximum performance. Along those lines, I've tried overclocking and haven't found the right balance to justify giving up the dynamic boost (which any manual setting disables).
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: AMD 3960X

Post by bruce »

22 = 2*11, and the prime factor of 11 rules out 22. 25 = 5*5 may or may not work. Sometimes FAH has trouble with 5 but certainly not always. 21 = 3*7, and 7 is slightly worse than 5.
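For illustration only, here is a sketch of a "step down until the largest prime factor is small enough" rule. The cutoff of 7 is an assumption chosen because it reproduces the reductions reported in this thread (22 to 21, 46 to 45, and 11 to 10); the client's real rule isn't documented here and may differ. Both function names are hypothetical.

```python
def largest_prime_factor(n):
    """Largest prime factor of n by trial division (returns 1 for n = 1)."""
    largest = 1
    divisor = 2
    while divisor * divisor <= n:
        while n % divisor == 0:
            largest = divisor
            n //= divisor
        divisor += 1
    # Anything left above 1 is a prime larger than every divisor tried
    return max(largest, n) if n > 1 else largest


def reduce_thread_count(requested, max_prime=7):
    """Lower the thread count until no prime factor exceeds max_prime (assumed cutoff)."""
    threads = requested
    while largest_prime_factor(threads) > max_prime:
        threads -= 1
    return threads


for requested in (22, 25, 21, 46, 11):
    print(requested, "->", reduce_thread_count(requested),
          "(largest prime factor of request:", largest_prime_factor(requested), ")")
```

With this cutoff 25 and 21 are left alone, which matches the gray area described above: they are accepted, but whether a given project runs well on them is a separate question.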

Unfortunately nobody has found an absolute definition of "large prime" so you're in a gray area.
Metallus_teamLTT
Posts: 3
Joined: Mon Oct 07, 2019 1:09 pm

Re: AMD 3960X

Post by Metallus_teamLTT »

About HT:
I have been running a 3900X for a few days now and still have to experiment with how many cores I assign. At the moment I have one 20-thread CPU slot, and the rest is for GPU/OS. Disabling HT and running a7 on 10 actual cores (leaving one for the GPU and one for the system) results in a PPD drop of about 25-30%, depending on the project.

I know that's fewer cores than on your TR, but this should be comparable because the same architecture is being used.
Gleep
Posts: 13
Joined: Tue Dec 17, 2019 4:12 pm

Re: AMD 3960X

Post by Gleep »

Today I tried 2 CPU slots with 24 threads each and 1 GPU slot. This seems to be working well; the GPU slot is not suffering at all. I think with 48 logical processors the effect of not having a dedicated core for the GPU is negligible.

So far 32/14 seems best, but it risks running dry on 32-thread-compatible WUs. Slightly behind are 24/24 and 25/21, with only a small difference from 32/14. As mentioned before, 16/16/8/6 was slower by quite a bit, probably 100k+ PPD.

For anyone wondering: with two CPU slots (24/24) at ~4 GHz all-core and stock settings, I get 580-600k PPD for the CPU slots.
petnek
Posts: 30
Joined: Wed Sep 02, 2009 9:30 pm
Hardware configuration: Asus 990FX Sabertooth, CPU Vishera FX-8350 not OC (Gelid Solution Tranquillo ), 2x4 GB DDR3 RAM 1600MHz (Kingston Blu), GPU Gigabyte HD6770, HDD Seagate Barracuda 7200.14 - 500GB + Samsung Spinpoint F1 320GB, FSP Raider 550W
Location: Czech Republic, Prague

Re: AMD 3960X

Post by petnek »

Hello, why is a prime number of cores a problem?

Currently I'm running on Ryzen 5 2600, computing on 11 cores and 1 left for GPU, 61K PPD on CPU. Will try a different config.

Edit: 6+4 cores on 2 slots gives 10k less PPD. Switching back to one slot with 11 cores.
Edit2: After switching back to 11 cores I see that the WU started with parameter -nt 10. So it's really terrified of primes :lol:
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: AMD 3960X

Post by bruce »

petnek wrote: Hello, why is a prime number of cores a problem?

Currently I'm running on Ryzen 5 2600, computing on 11 cores and 1 left for GPU, 61K PPD on CPU. Will try a different config.
You might want to search for an answer on gromacs.org. I've never found a good answer to the prime number question. Perhaps it's because GROMACS is designed and tested on dedicated hardware and nobody manufactures a system with only 11 cores.

You might try a single CPU slot with 12 cores and let the GPU compete for resources. Per the previous post by Gleep, the GPU doesn't seem to suffer with 24 core + 24 core slots. YMMV.
MeeLee
Posts: 1375
Joined: Tue Feb 19, 2019 10:16 pm

Re: AMD 3960X

Post by MeeLee »

I'm only assuming that very few WUs are available for 11-thread slots, while 8 threads can be fully utilized by 8-, 4-, or 2-thread WUs, by doubling or quadrupling up on smaller WUs.
I'm only speculating...
Gleep
Posts: 13
Joined: Tue Dec 17, 2019 4:12 pm

Re: AMD 3960X

Post by Gleep »

I installed Linux and let it auto-configure and run over the weekend. From a PPD perspective the difference is massive. I have 2 GPUs now, so it set the CPU slot to use 46 threads (which the logs show it dropping to 45, I assume because of the prime thread count weirdness). The PPD of the CPU slot is 1-1.1 million. Under Windows I was getting 600-650k, with some slot configs dropping to 500k (the four-slot config).

Of course I have no idea if 45 threads is better overall for the science or if it risks the slot stalling due to WU shortage or whatever. I'm surprised to see a CPU above 1M PPD, although most of the points are from the bonus of returning WUs so quickly.
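As a rough illustration of why the quick-return bonus dominates, here is a sketch that assumes the bonus formula as I understand it from the FAH points documentation: final points = base points * max(1, sqrt(k * deadline / elapsed)). The base points, `k`, and deadline values below are invented for the example and do not come from any real project.

```python
import math


def final_points(base_points, k, deadline_days, elapsed_days):
    """Points for one WU under the assumed quick-return bonus formula."""
    bonus = max(1.0, math.sqrt(k * deadline_days / elapsed_days))
    return base_points * bonus


def points_per_day(base_points, k, deadline_days, elapsed_days):
    """PPD if identical WUs are completed back to back, each taking elapsed_days."""
    return final_points(base_points, k, deadline_days, elapsed_days) / elapsed_days


# Hypothetical WU: 1000 base points, k = 0.75, 3-day deadline
slow = points_per_day(1000, 0.75, 3.0, 0.50)  # WU finished in 12 hours
fast = points_per_day(1000, 0.75, 3.0, 0.25)  # same WU finished in 6 hours
print(f"slow: {slow:.0f} PPD, fast: {fast:.0f} PPD, ratio: {fast / slow:.2f}")
```

Under that assumption, halving the time per WU multiplies PPD by 2^1.5 (about 2.8) rather than 2, so a moderate speedup in finishing each WU can show up as a much larger jump in PPD.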

Unrelated to FAH, the current high core count offerings of AMD and Intel make it clear that Windows has some significant issues handling 30+ logical processors.