Increasing no. of cores and diminishing returns

A forum for discussing FAH-related hardware choices and info on actual products (not speculation).

user123
Posts: 23
Joined: Wed Jan 11, 2012 1:20 pm

Increasing no. of cores and diminishing returns

Post by user123 »

I found this online article. It may explain why, on GPUs, doubling the Streaming Multiprocessor (SM) count does not always double performance.

http://www.extremetech.com/computing/11 ... ll-stuck/2
Note the interesting illustration in the above article showing that, as the no. of cores increases, performance improves less than proportionally before eventually leveling off.
7im
Posts: 10189
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona

Re: Increasing no. of cores and diminishing returns

Post by 7im »

The only thing you can be referring to is the doubling of shaders going from the Fermi architecture to the Kepler architecture. And why that did not double performance has been explained many times over. Google it so I don't have to repeat it again here.

And this article misses the point completely. Everyday software has yet to catch up with today's multicore processors. That's not the fault of the CPU makers.

And this doesn't apply to fah in the least. The SMP client is well known to scale in performance in a linear manner up to 64 cores and beyond. That puts fah on that green line in that chart in the article, which means that until consumer desktop CPUs start shipping with 256 cores (not any time soon), the article is very premature in its predictions with regard to fah.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
JimboPalmer
Posts: 2573
Joined: Mon Feb 16, 2009 4:12 am
Location: Greenwood MS USA

Re: Increasing no. of cores and diminishing returns

Post by JimboPalmer »

The article talks about 'software', but there are only individual programs. Some scale very well, some do not. I wrote a payroll program that scaled to over 16 CPUs (in theory it could scale to as many CPUs as there are employees), as no part of your check interacts with anyone else's check. Most lack of scalability has to do with tight interaction. Multi-core programming is going to be about breaking interaction.
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud

Re: Increasing no. of cores and diminishing returns

Post by PantherX »

Do note that F@H uses the FPU (Floating Point Unit) heavily. Thus, depending on the actual specifications of the processor, not the marketing adverts, your performance could vary significantly. For example, consider two processors that each show up as 4 CPUs:
1) 2 Cores with 4 Threads
2) 4 Cores with 4 Threads

In this case, processor 2 would be significantly faster than processor 1 since it has twice the number of FPUs (4 vs. 2). The virtual cores may increase the performance of F@H by between 0% and 25%, depending on the type of WU assigned. Thus, when it comes to F@H, you should always focus on the actual number of FPUs (real cores) present and not the total number of CPUs/threads, since that total may mislead you. For example, a processor with 6 cores and 6 threads is faster than a processor with 4 cores and 8 threads.
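A rough sketch of that comparison, assuming every physical core is equally fast and treating the 0% to 25% gain from virtual cores as a range. `estimated_throughput` is a hypothetical helper for illustration, not part of the F@H client, and real results depend on the WU.

```python
def estimated_throughput(cores, threads, smt_gain):
    """Relative throughput in units of 'one physical core'.

    smt_gain is the fractional boost from virtual cores (0.0 to 0.25 in the
    post above); it only applies when there are more threads than cores.
    Assumption: all physical cores are equally fast.
    """
    if threads > cores:
        return cores * (1.0 + smt_gain)
    return float(cores)


# The four configurations mentioned in the post.
for cores, threads in [(2, 4), (4, 4), (6, 6), (4, 8)]:
    low = estimated_throughput(cores, threads, 0.0)
    high = estimated_throughput(cores, threads, 0.25)
    print(f"{cores} cores / {threads} threads: {low:.2f} to {high:.2f}")
```

Even at the optimistic 25% end, the 4-core/8-thread part comes out at about 5 core-equivalents, still below the 6 of the 6-core/6-thread part.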
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Increasing no. of cores and diminishing returns

Post by bruce »

7im wrote: And this doesn't apply to fah in the least. The SMP client is well known to scale in performance in a linear manner up to 64 cores and beyond. That puts fah on that green line in that chart in the article, which means that until consumer desktop CPUs start shipping with 256 cores (not any time soon), the article is very premature in its predictions with regard to fah.
This is accurate ... almost. The programmers at Gromacs and at OpenMM have gone to great lengths to create code that is highly parallelizable, sufficient to keep all your cores busy, but there's still going to be some serial code. The speed at which a WU progresses from 0% to 100% will essentially (i.e., almost) double if you can give it twice as many FPUs, but the time between reaching 100% of one WU and seeing the next 0% message, while the client packs up the data for upload and initializes the next WU, is still serial and can't be sped up by adding cores.

The other serial segment is internal to Gromacs, namely synchronizing threads from each CPU. When there are a large number of atoms, that's insignificant, but a very small protein (measured by atom count) can become less efficient with a really large number of cores. If JimboPalmer's company has 256 employees and they happen to have a 256-core CPU, there is zero advantage to processing all the checks concurrently since it still takes time to initialize the program and they can't all be transmitted to the bank concurrently, nor can the printer process all the checks concurrently. Admittedly, that's an unrealistically extreme case, but it does explain why those efficiency curves tend to flatten out at the top. Fortunately, the proteins that FAH analyzes tend to have a lot more atoms than your hardware has CPU-cores, so FAH is decidedly in the best part of the curve.
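The flattening described above is what Amdahl's law predicts. Here is a minimal sketch; the parallel fractions are illustrative guesses, not measured values for Gromacs, OpenMM, or any FAH core.

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Amdahl's law: speedup when only part of the work can be parallelized."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


# Illustrative parallel fractions only (assumptions, not FAH measurements).
for p in (0.90, 0.99, 0.999):
    row = ", ".join(
        f"{n} cores: {amdahl_speedup(p, n):.1f}x" for n in (4, 16, 64, 256)
    )
    print(f"parallel fraction {p}: {row}")
```

The closer the parallel fraction is to 1, the longer the curve stays near the straight line before it bends over.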
JimboPalmer
Posts: 2573
Joined: Mon Feb 16, 2009 4:12 am
Location: Greenwood MS USA

Re: Increasing no. of cores and diminishing returns

Post by JimboPalmer »

bruce wrote: If JimboPalmer's company has 256 employees and they happen to have a 256-core CPU, there is zero advantage to processing all the checks concurrently since it still takes time to initialize the program and they can't all be transmitted to the bank concurrently, nor can the printer process all the checks concurrently. Admittedly, that's an unrealistically extreme case, but it does explain why those efficiency curves tend to flatten out at the top. Fortunately, the proteins that FAH analyzes tend to have a lot more atoms than your hardware has CPU-cores, so FAH is decidedly in the best part of the curve.
The sole internal shared logic is in reducing the balance of the company's payroll account, a single subtraction that only one employee's calculation can be allowed to do at a time. And the program must wait for ALL employees to finish before terminating, so if some employees have a very complex job history, they might delay completion out of proportion to their percentage of the workforce. In serial mode the program took 22 hours; with 16 threads handling 1600 employees, it finished in 90 minutes.
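Those timings allow a quick back-of-the-envelope check. The sketch below computes the speedup, the parallel efficiency, and the experimentally determined serial fraction (the Karp-Flatt metric). Applying that metric here is only an illustration; it lumps together locking, waiting for the slowest employee, and any truly serial code.

```python
def karp_flatt(speedup, n_threads):
    """Experimentally determined serial fraction (Karp-Flatt metric)."""
    return (1.0 / speedup - 1.0 / n_threads) / (1.0 - 1.0 / n_threads)


serial_minutes = 22 * 60      # 22 hours in serial mode
parallel_minutes = 90         # measured with 16 threads
threads = 16

speedup = serial_minutes / parallel_minutes
efficiency = speedup / threads
serial_fraction = karp_flatt(speedup, threads)

print(f"speedup: {speedup:.2f}x")
print(f"efficiency: {efficiency:.1%}")
print(f"estimated serial fraction: {serial_fraction:.2%}")
```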
7im
Posts: 10189
Joined: Thu Nov 29, 2007 4:30 pm

Re: Increasing no. of cores and diminishing returns

Post by 7im »

Complex tasks with larger data sets are where multiple processors can really shine, just like fah.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Increasing no. of cores and diminishing returns

Post by bruce »

Right. You brought up a different example than I had in mind but it's a good one. Working backward, the best possible scenario is that 22 hrs of serial work divided equally across 16 threads might take as little as 82.5 minutes. With it actually taking 90 minutes, we can say it's really good code because there's only 7.5 minutes of extra overhead plus the remaining serial operations. With half the number of employees, the 82.5 minutes might become 41.25 plus almost all of the 7.5, for a total of ~49 minutes, which is not quite double the speed.
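The arithmetic above, written out as a small sketch. It assumes, as the paragraph does, that the overhead stays roughly fixed when the workload is halved; the variable names are just for illustration.

```python
serial_minutes = 22 * 60          # 22 hours of serial work
threads = 16
measured_minutes = 90.0           # actual run time with 16 threads

# Best case: the serial work divides perfectly across the threads.
ideal_minutes = serial_minutes / threads
overhead_minutes = measured_minutes - ideal_minutes

# Assumption: with half the employees, the overhead is (almost) unchanged.
half_ideal_minutes = ideal_minutes / 2
half_total_minutes = half_ideal_minutes + overhead_minutes
speed_ratio = measured_minutes / half_total_minutes

print(f"ideal: {ideal_minutes} min, overhead: {overhead_minutes} min")
print(f"half workload: {half_total_minutes} min ({speed_ratio:.2f}x faster)")
```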

I also like your example for another reason. 1600 employees distributed over 16 threads means an average of 100 employees per thread. FAH's proteins currently have between 250 and 22000 atoms. The number I like to use to estimate an ideal number of processors is 100 or more serial operations per thread, meaning FAHCore_a5 at 22000 atoms could still be improved almost linearly by adding more CPUs. It also means that the proteins that are being tested on FAHCore_17 at 16000-17000 atoms are very happy with today's high-end GPUs. Conversely, the proteins with only a few hundred atoms are too small to use either Core_a5 or Core_17 efficiently. I'd say that 100 atoms per thread is near the point where the curve flattens out pretty significantly. (It's just a rule-of-thumb anyway.)
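As a sketch of that rule of thumb, the snippet below divides the atom count by 100 atoms per thread. `max_useful_threads` is a hypothetical helper, not anything in the FAH client, and the threshold is only the rough figure quoted above.

```python
def max_useful_threads(atom_count, atoms_per_thread=100):
    """Rule-of-thumb thread count before the scaling curve flattens out.

    Assumption: roughly 100 atoms per thread is the break-even point.
    """
    return max(1, atom_count // atoms_per_thread)


# Atom counts mentioned in the post.
for atoms in (250, 16000, 22000):
    print(f"{atoms} atoms: up to about {max_useful_threads(atoms)} threads")
```

By this estimate a 250-atom protein stops benefiting after a couple of threads, while a 22000-atom one could keep a couple of hundred busy.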

Anyway, the whole point is that for the right kind of operations more cores can be beneficial, but for other types, less will be gained. FAH is the right kind of operation for the range of hardware that we have today.
user123
Posts: 23
Joined: Wed Jan 11, 2012 1:20 pm

Re: Increasing no. of cores and diminishing returns

Post by user123 »

Sounds like there's no need to worry about diminishing returns with increasing CPU thread count in FAH in the near future. :)

Generally, the atom count in SMP work units is increasing, right?
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Increasing no. of cores and diminishing returns

Post by bruce »

user123 wrote: Generally, the atom count in SMP work units is increasing, right?
In general terms, probably. In more accurate terms, not rapidly and not if the science can be evaluated with fewer atoms.

For implicit solvent solutions, there has been a tendency to identify simple proteins whenever possible. For explicit solvent models, the protein is enclosed in a box of water molecules and while a larger box would contain more atoms and those atoms add up, there's no need to wonder what a larger box of water will do.
user123
Posts: 23
Joined: Wed Jan 11, 2012 1:20 pm

Re: Increasing no. of cores and diminishing returns

Post by user123 »

(Sorry for a late detailed reply)
There appears to be a misunderstanding.
7im wrote: The only thing you can be referring to is the doubling of shaders going from the Fermi architecture to the Kepler architecture. And why that did not double performance has been explained many times over. Google it so I don't have to repeat it again here.

And this article misses the point completely. Everyday software has yet to catch up with today's multicore processors. That's not the fault of the CPU makers.

And this doesn't apply to fah in the least. The SMP client is well known to scale in performance in a linear manner up to 64 cores and beyond. That puts fah on that green line in that chart in the article, which means that until consumer desktop CPUs start shipping with 256 cores (not any time soon), the article is very premature in its predictions with regard to fah.
I wasn't referring to the doubling of shaders going from Fermi to Kepler architecture.
I was referring to doubling of shaders within the same architecture.

From personal experience: back in 2012, I ran FAH on a Palit GTX460 (336 shaders, overclocked to 810 MHz) and compared its performance to a Gigabyte GT430 (96 shaders, also overclocked to 810 MHz).
The Palit GTX460 had 3.5 times the no. of shaders of the GT430 but was less than 3.5 times as fast.

GPUs have a lot more shaders (analogous to cores in a CPU) than CPUs have cores, and are much more parallel. With a large no. of shaders, travelling along the green line in the chart in the article, it can be seen why there are diminishing returns.
7im
Posts: 10189
Joined: Thu Nov 29, 2007 4:30 pm

Re: Increasing no. of cores and diminishing returns

Post by 7im »

CPUs are not GPUs. The article is about CPUs.

And you singled out only one data set for comparison, which unfortunately is old and flawed. Your 460 GPU is similar to the benchmark GPU at the time. And the 430 is way below it and very bottlenecked.

And you used a very vague "as fast" measurement for your comparison. Was that PPD? Which fahcore? With bonus? Or only the more accurate base points? Same project?

I'm sure there is an old GPU performance chart around here somewhere. It will show a more complete picture.