CPU Cores/Threads vs GPU

Post by **Sparkly** » Sun Jun 21, 2020 1:31 pm

bruce wrote:True, and that makes sense to me. In fact, I suspect that's the way it normally works.

Well, I don’t know what is normal, or whether the atom count is a deciding factor in distribution, since I see everything from 4K to 500K atoms in my GPU logs. But as far as I can see in the logs, the ability to assign projects to only CPU or GPU already exists: a typical example is the 148xx projects, which are CPU only and 55K+ atoms.

https://stats.foldingathome.org/project?p=14800

How often the atom count is actually taken into consideration when assigning projects to CPU, GPU or both would need a deeper look into the backend statistics. But as a rule of thumb, the benchmark statistics from my own systems all say the same thing: higher atom counts should go to GPU, while lower atom counts should go to CPU.

There might of course be occasional variations to this, based on what type of calculation each project uses the most of, where some types of calculations could still be decent for GPU, even at lower atom counts, but as far as I can see in the benchmark statistics I have available for the different projects at this time, that variation seems to be non-existent.

The amount of CPU versus GPU resources available in the network at any given time will fluctuate based on completion rate, and thus would affect how projects can be distributed to get them done. But looking at my own statistics, I would assume that overall network utilization would benefit from a general split based on atom count as a start, and if my logs are at all representative, then I would say that this initial split should be at 50K atoms or higher.
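The rule of thumb proposed above can be written down as a minimal sketch. The function name and the default cut-off are hypothetical: the 50K figure is only the suggestion from this post, not an actual Folding@home setting, and the real assignment logic is not shown anywhere in this thread.

```python
# Hypothetical cut-off taken from the post's suggestion of "at least 50K atoms";
# this is NOT an actual Folding@home setting.
DEFAULT_SPLIT_ATOMS = 50_000


def suggested_device(atom_count: int, split_atoms: int = DEFAULT_SPLIT_ATOMS) -> str:
    """Return "GPU" at or above the cut-off, otherwise "CPU"."""
    if atom_count < 0:
        raise ValueError("atom_count must be non-negative")
    return "GPU" if atom_count >= split_atoms else "CPU"


# Illustrative atom counts spanning the 4K to 500K range mentioned in the post.
for atoms in (4_000, 49_999, 50_000, 500_000):
    print(f"{atoms:>7} atoms -> {suggested_device(atoms)}")
```

Passing a different `split_atoms` value makes it easy to compare other cut-offs against one's own logs.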

Post by **Joe_H** » Sun Jun 21, 2020 2:26 pm

What I am getting from some of the discussion related to the COVID Moonshot work is that efficiency is taking second place. Instead, the time to complete enough WUs and the associated trajectories is the primary driving force in deciding whether to process on CPU or GPU. Even running inefficiently, GPU folding will complete these projects much faster than CPU folding.

There will be waves of new work looking at how various compounds would potentially bind to various sites on the virus. They are looking to get sufficient results back in a period of a week or two to determine which are the best candidates for being synthesized and going to the next stage of testing in vitro.

Some prioritization to get these small atom count WUs to less powerful GPUs with fewer shaders is planned, but without a complete rewriting of the assignment code, these will still be seen on higher end GPUs.

Post by **BobWilliams757** » Sun Jun 21, 2020 3:01 pm

I think it's all fairly complex, much more than most of us are assuming personally.

My little Vega 11 seems to thrive on the low atom count WU's, where many of the powerful cards show slow (to them) progress. But the superior hardware running at 30% will still outperform my modest APU even if it's returning 200% of average PPD. Then we have the question of availability of various hardware, even if the time difference didn't matter. The same applies to CPU vs GPU folding, even if the choice was available at the time to use either.

My APU might favor certain WU's, when another dedicated card of roughly equal architecture might run them slowly, or favor another type of WU that my system struggles with.

With my system I have a variation of over 200% when I compare my slowest WU to my quickest. Some faster hardware seems to have the same or greater swings. No matter how they set them up, there will be only a small window of systems that run them in the most efficient way, and even then efficiency might be less important than overall throughput speed.

I'm going to just keep folding, and providing input where it's possibly helpful.

Post by **HugoNotte** » Sun Jun 21, 2020 3:03 pm

Sparkly, you only take into consideration the atom count in your reasoning. What about the different processing abilities, not just performance, between GPU & CPU?
There are a lot of low end GPUs around for which these smaller atom count projects are probably better suited.
However, FAH development doesn't have unlimited resources and must prioritize: churn out new FAH Cores, improve server infrastructure, or make sure your GPU is fed the proper size WUs?
In the end I agree, it would be ideal if work servers had the ability to send WUs to the best matched hardware available: large atom count WUs to the most capable hardware and smaller ones to the lower end devices. But the reality is that projects like these always lack capacity in the development department, since the main goal is to keep things going, not to make sure every volunteer gets the optimum PPD.

Post by **ajm** » Sun Jun 21, 2020 3:08 pm

I don't know. I often observed that small jobs leading to lower PPD on higher end GPUs get dumped by many users before they are successfully crunched, probably because of that meager output. If so, to optimize that aspect would also benefit science, not only volunteers' ego.

Post by **BobWilliams757** » Sun Jun 21, 2020 4:02 pm

ajm wrote:I don't know. I often observed that small jobs leading to lower PPD on higher end GPUs get dumped by many users before they are successfully crunched, probably because of that meager output. If so, to optimize that aspect would also benefit science, not only volunteers' ego.

That's a shame. I don't remember points when I folded in the past, but even then I knew that some had more hardware than others. But it didn't matter to me.

Now that I'm folding again, I only set up a passkey to compare notes, since most people used the passkey/QRB to compare performance and points. But I run a meager onboard GPU, so I still know there are people with a lot more hardware resources than I have. And here and there I have WU's that take a full day or so to complete, so that PPD return will in fact represent the full day. But I would hope that people with fast GPU's wouldn't dump them due to PPD return, since if they picked up the same WU it would likely be done in hours and not really impact their points that much. And if everyone just took the attitude of completing the WU regardless of points, the science would advance quicker.

But I run a WU regardless of the PPD return anyway. The only time I would dump a WU is if I could confirm that it was already returned correctly by someone else. I've had several that passed the timeout, and returned next to nothing in points. But I keep folding them in hopes that my finished WU will still be the first to return completed, and allow the next WU to start.

Post by **ajm** » Sun Jun 21, 2020 4:26 pm

Well, the point system fosters this kind of competitive state of mind, which is part of FAH's success, I gather. And with higher end hardware, those special small yield WUs can make a serious difference. I have monitored them on one GPU since last night, as there is a big series of them right now, and in this particular case, for this one card, accepting those WUs will bring me a loss of around 1M in 24 hours. For people competing within a team, it is a real bummer.

Post by **bruce** » Sun Jun 21, 2020 4:30 pm

BobWilliams757 wrote:I think it's all fairly complex, much more than most of us are assuming personally.

My little Vega 11 seems to thrive on the low atom count WU's, ....

My APU might favor certain WU's, when another dedicated card of roughly equal architecture might run them slowly, or favor another type of WU that my system struggles with.

I'm going to just keep folding, and providing input where it's possibly helpful.

That's the reason the benchmarking study was started. In an ideal world, there would be projects that your APU favors that would maximize the science it accomplishes ... and that would also maximize the points.

It might turn out that the atom count isn't the only factor that should be driving the assignment process. Saying my dGPU from nVidia is "similar" to your APU doesn't make them the same. They might have different preferences even if the projects we are comparing happen to have similar atom counts. No one person has enough information to draw firm conclusions.

See viewtopic.php?f=66&t=35599#p338125
and viewtopic.php?f=61&t=35498#p338133

The point system needs to be "fair" for both of us and the servers should be passing out whichever project allows both of us to produce the most science. ... and that would be a HUGE restructuring.

Post by **Neil-B** » Sun Jun 21, 2020 4:43 pm

Since the people competing within teams will all have comparatively high end kit, everyone will be taking a hit on these - barring those who choose to subvert the work of FAH and dump WUs simply to game the system for points.

Post by **Sparkly** » Sun Jun 21, 2020 7:41 pm

HugoNotte wrote:Sparkly, you only take into consideration the atom count in your reasoning. What about the different processing abilities, not just performance, between GPU & CPU?

As far as I can see from all my logs, running GPU projects in several different systems, the ONLY factor that so far has had a significant impact on the processing of the GPU WUs is the atom count. Someone else needs to post a list of dedicated CPU performance on different projects for comparison.

Not saying that other factors can’t impact this in other ways in other projects at some future time, but from running these dedicated test systems 24/7 for over 2 months now, the atom count is the only factor that shows a significant impact difference.

And it is not like I am running any state of the art high-speed fancy GPU stuff either, since all the GPUs are old RX 580s, which are slow compared to the newer 1070, 1080, 2060 or whatnot.

And if we look at one of the links bruce added

bruce wrote:See viewtopic.php?f=66&t=35599#p338125
and viewtopic.php?f=61&t=35498#p338133

we can see that a fancier GPU like the RTX 2060 Super shows performance drops similar to those of my old cards.

Project - PPD - Atom Count
P14448 - 1481K - 216 769
P16905 - 844K - 28 000
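To put a number on that drop, here is a small worked calculation using only the two PPD figures listed above. The variable and function names are illustrative, and the two projects may differ in more than atom count, so this is a rough comparison rather than a controlled measurement.

```python
# PPD figures (in thousands) and atom counts as listed in the post
# for the RTX 2060 Super.
ppd_k = {"P14448": 1481, "P16905": 844}
atoms = {"P14448": 216_769, "P16905": 28_000}


def relative_drop(high: float, low: float) -> float:
    """Fractional drop when going from `high` to `low`."""
    return (high - low) / high


drop = relative_drop(ppd_k["P14448"], ppd_k["P16905"])
atom_ratio = atoms["P14448"] / atoms["P16905"]

print(f"PPD drop from P14448 to P16905: {drop:.1%}")
print(f"P14448 has about {atom_ratio:.1f}x the atoms of P16905")
```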

HugoNotte wrote:However, FAH development doesn't have unlimited resources and must prioritize: churn out new FAH Cores, improve server infrastructure, or make sure your GPU is fed the proper size WUs?

When it comes to atom count, nothing new needs to be developed or upgraded, since everything regarding that functionality can already be handled with the system the way it is. The only thing needed would be to assign a project to CPU or GPU based on the project's atom count, which is something that is already being done in some cases, but hardly as a “standard”.

The main problem with all the small atom count projects going to GPU is that it actually delays the overall throughput of completed science and WUs, sometimes very significantly, and especially in systems running multiple GPUs. In addition to under-utilizing the GPU in the first place, there is the added "bonus" of delaying the other GPU projects running at the same time in the same system: the low atom count project uses a lot more CPU resources to feed the GPU, so the feeding of all the other simultaneously running GPUs is slowed down too.

The impact of this is very easy to see in the recorded completion times. You can look at a fairly normal multi GPU run with average atom count projects, where the WUs listed show completion times like 6h, but suddenly a low atom count project is picked up and the completion time on those same projects increases to 12h, just because that newly picked up low atom count project grabs significantly more of the CPU resources that are used to feed all the GPUs.

So, instead of someone completing 2 x Average size WUs in x amount of time, they are now only completing 1 x Average size and 1/2 x Small size in twice the amount of time, which isn’t a very efficient use of available resources in the network, if maximum completion is the goal.
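The arithmetic behind that comparison can be worked through as a short sketch. It takes the 6h and 12h completion times from the example above at face value and counts the half-finished small WU as 0.5 of a WU, which is the post's own accounting; the variable names are illustrative.

```python
from fractions import Fraction

# Figures from the post's example: 6h per WU normally, doubled to 12h
# once a low atom count project is picked up.
baseline_hours = 6
degraded_hours = 2 * baseline_hours

baseline_wus = Fraction(2)                    # 2 x average size WUs
degraded_wus = Fraction(1) + Fraction(1, 2)   # 1 x average + 1/2 x small

baseline_rate = baseline_wus / baseline_hours   # WUs per hour, normal case
degraded_rate = degraded_wus / degraded_hours   # WUs per hour, degraded case
relative = degraded_rate / baseline_rate        # share of normal throughput kept

print(f"baseline: {baseline_rate} WU/h, degraded: {degraded_rate} WU/h")
print(f"throughput kept: {float(relative):.1%}")
```

Under these assumptions the system completes WUs at 3/8 of its normal rate, and the small WU in that count carries less work than an average one.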

Post by **Neil-B** » Sun Jun 21, 2020 8:20 pm

I am sure the FAH team are aware of the overall impact .. but actually from what I can tell these are important test projects that they need doing and which are more important overall than pure throughput that you are considering .. in the scale of things the delay impacts for the short period of time that they will happen are seemingly less than the impacts of not running these projects ... the team do appear to be tuning this as fast as they can but it is work that needs to be done or they wouldn't be doing it

Post by **Joe_H** » Sun Jun 21, 2020 9:10 pm

I will also point out, while your card is older, it is a high end card from the series it was part of. As such for performance on low atom count WUs it has more in common with that RTX 2060 than older, low to mid range cards. Specifically your card has a larger number of shaders, and low atom count WUs do not use all of them. Something like an RX 560 or RX 550 with fewer shaders is going to see much less difference in utilization by these WUs.

There are a number of other factors you are ignoring. To start with, this COVID Moonshot is on very short deadlines to get information out of these runs to the other researchers who will actually be making selected compounds to test further. It would be really nice to have the luxury of months of time to develop all of the necessary assignment tools to go for optimum usage of all resources. That would also involve the researchers needing to manage that development, even if done by volunteers, instead of spending time getting projects set up to be processed.

And the preceding is directly in response to this:

Sparkly wrote:When it comes to atom count, nothing new needs to be developed or upgraded, since everything regarding that functionality can already be handled with the system the way it is. The only thing needed would be to assign a project to CPU or GPU based on the project's atom count, which is something that is already being done in some cases, but hardly as a “standard”.

The assignment system does not work the way, nor have the features you appear to think it does. It can assign by class of equipment. That has been modified over the years, but would need many more modifications to break things down the way you are advocating for. Some work in that direction has been done; they could wait for that to be finished, or they may end up figuring out a different way of doing this. But it will not be ready soon.

Post by **Sparkly** » Sun Jun 21, 2020 9:53 pm

Joe_H wrote:The assignment system does not work the way, nor have the features you appear to think it does.

I don’t get which features you think I am talking about that are missing, since the thing I describe is already happening and being used: projects are assigned/made to run as CPU, GPU or both (via cloning or whatever), which is decided by the person/persons setting up the projects for distribution. The same is being done for the P148xx projects, which are CPU only. So I am not talking about subdividing GPUs into different GPU classes, I am talking about running on CPU or GPU.

Post by **bruce** » Sun Jun 21, 2020 10:33 pm

Take 4 WUs with 10k, 30k, 100k and 300k atoms, and assign them to 4 GPUs with 192, 640, 2000 and 4500 shaders. Gather data for all 16 trials.

All GPUs will report 98% utilization with the large proteins. The smallest protein may report high utilization on the smallest GPU, but certainly not on the widest GPU which will show a low utilization factor. The intermediate results will be the most interesting data gathered and that's where the most interesting assignment choices need to be made.
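The 16-trial grid described above can be laid out as a small sketch. The atom and shader counts are the ones from this post; the `results` dictionary and the `cell` helper are hypothetical, and every utilization value is left empty because no measurements are given here.

```python
from itertools import product

# WU sizes and GPU shader counts named in the post.
ATOM_COUNTS = [10_000, 30_000, 100_000, 300_000]
SHADER_COUNTS = [192, 640, 2000, 4500]

# One slot per (atoms, shaders) trial; None means "not measured yet".
results = {trial: None for trial in product(ATOM_COUNTS, SHADER_COUNTS)}


def cell(value):
    """Format one utilization entry, or a placeholder when unmeasured."""
    return "n/a" if value is None else f"{value:.0f}%"


# Print an empty results table: rows are atom counts, columns are shader counts.
print("atoms".rjust(8) + "".join(f"{s:>8}" for s in SHADER_COUNTS))
for a in ATOM_COUNTS:
    print(f"{a:>8}" + "".join(f"{cell(results[(a, s)]):>8}" for s in SHADER_COUNTS))
```

Filling `results[(atoms, shaders)]` with a measured GPU utilization percentage for each trial would show where the intermediate cases fall.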

Post by **Sparkly** » Mon Jun 22, 2020 12:49 am

bruce wrote:Take 4 WUs: 10k, 30k, 100k, 300K atoms. and assign them to 4 GPUs with 192, 640, 2000, 4500 shaders. Gather data for all 16 trials.

Well, the numbers for 2k shaders are already available here, since the RX 580 has 2304 and the RTX 2060 Super has 2176, but the shader count doesn’t really give a very good indication of anything other than available shaders, since the RTX 2060 Super is 3-4 times faster than the RX 580 on the same projects, even though they have a similar number of shaders.

Quick fix is to send the 10k/lower atom count projects to CPU only and the rest to GPU, since the main issue isn’t the maximum utilization of the fastest cards and most shaders in any given situation, which would require a rewrite somewhere, maybe starting with something like what PantherX describes

PantherX wrote:https://github.com/FoldingAtHome/fah-issues/issues/1504

but rather to prevent the CPU from being nearly constantly at a 100% peak from communicating with the GPU doing the low to very low atom count projects.

This is of course more of an issue in multi GPU systems, which this thread is about, compared to single GPU systems.
