Video: diminished return on parallel processing....

Post by **MeeLee** » Sun Mar 10, 2019 4:40 pm

Interesting video on parallel processing.

It covers parallel programming, the diminishing returns of parallel processing, and its increased cost at each doubling of the core count.
What it could mean for folding is that we won't see any significant improvements in GPU speed from newer designs.
My take is that future cards that fold much better than the current generation of graphics cards might well be more specialized (industrial) cards rather than graphics cards for gamers.
Kind of like how CAD designers back in the day didn't use regular graphics cards, but specialized cards for 3D object design.

The future of folding might lie in specialized hardware for folding, mining, or other massively parallel applications, since soon even the best graphics cards will cap off at certain performance levels.
That goes not only for GPU/CPU parallel threads or frequency: even RAM and VRAM running at GDDR7 levels will be only marginally faster than current GDDR6 (V)RAM.

https://www.youtube.com/watch?v=eJBOU23L720

Post by **bruce** » Sun Mar 10, 2019 10:01 pm

Yes, adding more parallel processors will have diminishing returns. At some point, there may be a GPU that has as many shaders as a protein has atoms, and adding more shaders cannot speed up the calculations for that protein.

Actually, the return from adding more shaders decreases considerably earlier than that. Some of today's highly parallel GPUs have enough shaders that they work less efficiently on smaller proteins than GPUs with, say, half as many shaders. Note that efficiency in that sentence isn't the same as speed. A GPU with twice as many shaders as, say, a GTX 750 Ti will run faster, but not twice as fast on a small protein, though it's still close to twice as fast on a protein with lots of atoms. FAH has never suggested that speed was linearly related to GFLOPS. (Note that GFLOPS is an artificial benchmark that cannot be achieved for real-life calculations.)
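For anyone who wants to play with the numbers: the textbook model for this kind of diminishing return is Amdahl's law, which is not named in this thread and is only a rough stand-in for what a real GPU core does. A minimal sketch follows, assuming a fixed fraction of the work can be spread across shaders while the rest stays serial. The function name, the shader counts, and the 0.999 parallel fraction are illustrative assumptions, not measurements from FAH.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Speedup over one worker under Amdahl's law.

    parallel_fraction: share of the work that can be split across workers (0..1).
    workers: number of parallel units (think "shaders" in this thread).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Hypothetical shader counts and parallel fraction, chosen only for illustration.
previous = None
for shaders in (640, 1280, 2560, 5120):
    speedup = amdahl_speedup(0.999, shaders)
    gain = "" if previous is None else f" ({speedup / previous:.2f}x over the previous step)"
    print(f"{shaders:5d} shaders: {speedup:7.1f}x{gain}")
    previous = speedup
```

Each doubling of the shader count buys less than the one before it, and the gain per doubling shrinks faster the larger the serial share is.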

If you have recently purchased a top-of-the-line GPU, you may notice that small proteins don't achieve as high a GPU utilization percentage as large proteins. There's nothing anybody can do about that.

Note: The official GFLOPS rating is actually proportional to both the number of shaders and the speed of those shaders, but a GPU running at 99% utilization will produce more than the same GPU running at 85% utilization, which is what happens when a particular protein can only keep the shaders busy an average of 85% of the time.
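As a rough toy model of that note, the arithmetic can be sketched like this. The shader count and clock are made-up numbers for a hypothetical GPU, and the factor of 2 assumes the common convention of counting a fused multiply-add as two operations per shader per cycle. Treating utilization as a simple multiplier on the rated figure is also an assumption; real throughput sits below even that.

```python
def rated_gflops(shaders: int, clock_ghz: float, flops_per_cycle: int = 2) -> float:
    """Paper GFLOPS rating: shaders x clock (GHz) x operations per cycle."""
    return shaders * clock_ghz * flops_per_cycle


def busy_gflops(rating: float, utilization: float) -> float:
    """Upper bound on useful work when the shaders are busy only part of the time."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return rating * utilization


# Hypothetical GPU: 2000 shaders at 1.5 GHz (illustrative numbers only).
rating = rated_gflops(2000, 1.5)
for utilization in (0.99, 0.85):
    print(f"{utilization:.0%} busy: at most {busy_gflops(rating, utilization):.0f} of {rating:.0f} GFLOPS")
```

The same card on the same rating does less work on the protein that can only keep it 85% busy, which is the whole point of the note.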

Post by **Theodore** » Mon Mar 11, 2019 6:51 am

Interesting!
I can already see certain WUs using only 145 watts of the 180 watts available to my RTX 2060, and it's not even a top-of-the-line graphics card.

Bruce, you mentioned somewhere on the forum that larger (molecule) chains will be implemented in the future (for faster cards).
Are you now saying that there will come a time when a graphics card has more cores/shaders than science will be able to make WUs for?
Or do you think that FAH will find ways to utilize that many cores (e.g., by processing more than one WU at a time)?

Post by **bruce** » Mon Mar 11, 2019 5:58 pm

Science moves forward. In the early days of FAH, studies were limited to smaller proteins. There are many small proteins that still need study, but there are also many larger proteins that could not be studied with a lifetime of CPU resources (and I mean that literally). Then GPU folding became feasible, expanding what could be accomplished in more reasonable periods of time. Every generation of GPU comes out with increased numbers of shaders, which continues to expand what can be studied ... but there still are many smaller proteins that need to be studied.

When a GPU requests a new assignment, there's no way to predict whether the projects that have work to do happen to be distributing small or large proteins, so people with "large" GPUs might get a small protein or a large protein. The servers do not have a message that says (in effect) "we can't give you a WU that requires all of your shaders right now, so we're giving you a WU that will use all of a smaller GPU" -- but it's going to run faster on your GPU than it would if we gave that same WU to a smaller GPU.

The same sort of thing happens for CPU projects, and some useful assignment techniques have been developed there which may be ported to GPU assignments in the future. Suppose you're running a (very) high-end CPU with, say, 32 threads. Suppose there are no WUs that will use all 32 of your CPU threads. The servers CAN assign you a WU that uses, say, 16 threads or 24 threads (or whatever). That's better than saying we can't give you an assignment for that CPU right now.
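To make the idea concrete, here is a minimal sketch of that kind of fallback. It is not the actual FAH assignment server logic; the function name, the rule of picking the largest size that fits, and the list of available WU sizes are all assumptions for illustration.

```python
from typing import Iterable, Optional


def pick_wu_threads(client_threads: int, available_sizes: Iterable[int]) -> Optional[int]:
    """Pick the largest WU thread count that fits the client, or None if nothing fits.

    client_threads: threads the client offers (e.g. 32).
    available_sizes: thread counts of WUs currently on offer (hypothetical input).
    """
    fitting = [size for size in available_sizes if 0 < size <= client_threads]
    return max(fitting) if fitting else None


# A 32-thread client when only 16- and 24-thread WUs are available.
print(pick_wu_threads(32, [16, 24]))
# An 8-thread client gets nothing from the same pool.
print(pick_wu_threads(8, [16, 24]))
```

The 32-thread machine ends up with a 24-thread WU and some idle threads, which still beats sending it away empty-handed.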

It should also be noted that if your GPU is running a particular WU at 85% or 95%, your GPU is still needed -- and the next assignment is likely to be different.

And, yes, the OpenMM developers are working on making the serial segments of a calculation more efficient. Some of those improvements have been incorporated into Core_22. The downsides of increasing utilization are:
(A) It can interfere with your OS being able to update the screen promptly ("screen lag").
(B) It eats into the stability margin on overclocked machines.
(C) The GPU's power and heat limitations may step in, limiting your performance.
(D) etc.
