GPU assignment of big WUs to slow GPUs

Moderators: Site Moderators, FAHC Science Team

GPU assignment of big WUs to slow GPUs

Postby foldinghomealone » Mon Apr 27, 2020 8:00 pm

Some of our team members face the problem that big WUs are assigned to slow GPUs.
Like P13877 to a GTX 980 with a preferred deadline of 1 day.
It's basically not possible for part-time folders to return the WU in time.

FAH always promotes the idea that part-time folding is possible, but then counteracts it with preferred deadlines of 1 day.

I don't understand how the assignment process works, but I have heard many reports that it's not possible for the assignment servers to assign a WU by hardware configuration.

If that is true, FAH could use 'cause preference' to introduce a new category, like 'part-time folding'.
Anyone who chooses this 'cause' would get small WUs or WUs with longer preferred deadlines.
foldinghomealone
 
Posts: 130
Joined: Wed Feb 01, 2017 8:07 pm

Re: GPU assignment of big WUs to slow GPUs

Postby HaloJones » Mon Apr 27, 2020 11:10 pm

It's an interesting idea. In a perfect world, FAH-central would be told by the client what the GPU or CPU is, and FAH-central would look in its list of work to allocate the most suitable. Sounds great, but there are so many possible GPUs and CPUs (with more constantly being released) that it would be a nightmare to maintain.

But the 'part-time folding' thing is an interesting concept. Unfortunately, for the sake of the project and the way it is iterative, projects really want a quick result to determine what to do next. Each unit's result informs what the next unit should ask. That's why there are sometimes very short deadlines.

In more normal times, when FAH isn't trying to solve a very specific and time-sensitive problem, your suggestion might be adopted. Maybe.
1x Titan X, 5x 1070, 1x 970, 1 x Ryzen 3600

HaloJones
 
Posts: 857
Joined: Thu Jul 24, 2008 11:16 am

Re: GPU assignment of big WUs to slow GPUs

Postby JimboPalmer » Mon Apr 27, 2020 11:30 pm

F@H has what it calls Species numbers, but they are based on the capability of a card, not its speed. A card may support OpenCL 1.2 and double-precision floating-point math and thus be in a given Species, but that does not tell the server its speed. So far as I know, the last AMD Species was for the RDNA/Navi cards, as they support Core_22 but not Core_21. Again, not a measure of speed but of ability.
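That capability-based grouping could be sketched roughly like this. All the species numbers, flags, and core requirements below are illustrative inventions, not F@H's actual tables:

```python
# Illustrative sketch of capability-based "species" matching: a card is
# grouped by what it supports (OpenCL level, FP64, which folding cores it
# can run), not by how fast it is. All numbers and flags here are made up.

from dataclasses import dataclass

@dataclass(frozen=True)
class Gpu:
    name: str
    opencl: tuple        # e.g. (1, 2) for OpenCL 1.2
    fp64: bool           # double-precision support
    cores: frozenset     # folding cores the card can run

def species_of(gpu: Gpu) -> int:
    """Assign a coarse species bucket from capabilities alone."""
    if "core_22" in gpu.cores and "core_21" not in gpu.cores:
        return 8   # e.g. RDNA/Navi: Core_22 but not Core_21
    if gpu.fp64 and gpu.opencl >= (1, 2):
        return 7
    return 0       # unsupported / unknown

navi = Gpu("RX 5700", (2, 0), True, frozenset({"core_22"}))
maxwell = Gpu("GTX 980", (1, 2), True, frozenset({"core_21", "core_22"}))

print(species_of(navi))     # 8
print(species_of(maxwell))  # 7
```

Note that two cards landing in the same species can still differ enormously in throughput, which is exactly the gap being discussed.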
Last edited by JimboPalmer on Tue Apr 28, 2020 8:28 am, edited 1 time in total.
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
JimboPalmer
 
Posts: 2018
Joined: Mon Feb 16, 2009 5:12 am
Location: Greenwood MS USA

Re: GPU assignment of big WUs to slow GPUs

Postby Rel25917 » Tue Apr 28, 2020 12:52 am

Before the whole COVID thing, the timeouts were much longer: usually 5 days for GPU and possibly more for CPU. For the new COVID projects they want results really fast, so we got 1-day timeouts.
Rel25917
 
Posts: 299
Joined: Wed Aug 15, 2012 3:31 am

Re: GPU assignment of big WUs to slow GPUs

Postby MeeLee » Tue Apr 28, 2020 2:27 am

There might be a way to set which projects you support. Some projects might have longer deadlines on their WUs.
MeeLee
 
Posts: 1021
Joined: Tue Feb 19, 2019 11:16 pm

Re: GPU assignment of big WUs to slow GPUs

Postby foldinghomealone » Tue Apr 28, 2020 5:53 am

HaloJones wrote:But the 'part-time folding' thing is an interesting concept. Unfortunately, for the sake of the project and the way it is iterative, projects really want a quick result to determine what to do next. Each unit's result informs what the next unit should ask. That's why there are sometimes very short deadlines.


But FAH still gives the possibility of using idle processing power and of pausing a project.

It wastes energy and time to assign big WUs, or WUs with short deadlines, to slow hardware, only for them to have to be reassigned.

If FAH dislikes the term 'part time folding' then call the 'cause' any other name FAH prefers. Maybe 'long deadlines' or whatever.
foldinghomealone
 
Posts: 130
Joined: Wed Feb 01, 2017 8:07 pm

Re: GPU assignment of big WUs to slow GPUs

Postby foldy » Tue Apr 28, 2020 7:23 am

You are right. Project 13877 has a 150k atom count and should be restricted to fast GPUs with many shaders only, so it should not be assigned to NVIDIA Maxwell GPUs or AMD Polaris GPUs.
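Such a restriction could be expressed as a very simple assignment filter. The atom threshold and shader cutoff below are illustrative assumptions, not F@H's real rules:

```python
# Hypothetical assignment filter: only hand large projects (by atom count)
# to GPUs with enough shaders. Both thresholds are illustrative.

MIN_SHADERS_FOR_LARGE = 2000   # assumed shader-count cutoff
LARGE_WU_ATOMS = 100_000       # assumed "big WU" threshold

def eligible(project_atoms: int, gpu_shaders: int) -> bool:
    """Return True if this GPU should be offered this project."""
    if project_atoms >= LARGE_WU_ATOMS:
        return gpu_shaders >= MIN_SHADERS_FOR_LARGE
    return True  # small WUs can go anywhere

# P13877 has ~150k atoms; a GTX 1050 has 640 shaders, an RTX 2070 has 2304.
print(eligible(150_000, 640))    # False: too few shaders for a big WU
print(eligible(150_000, 2304))   # True
```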
foldy
 
Posts: 1966
Joined: Sat Dec 01, 2012 4:43 pm

Re: GPU assignment of big WUs to slow GPUs

Postby rwh202 » Tue Apr 28, 2020 7:32 am

The old v6 client allowed you to 'request work units without deadlines', and there was greater use of the 'packet-size' flags to get bigger or smaller WUs (that was mostly concerned with data bandwidth, but had the same effect).

That's progress for you...
rwh202
 
Posts: 421
Joined: Mon Nov 15, 2010 9:51 pm
Location: South Coast, UK

Re: GPU assignment of big WUs to slow GPUs

Postby foldinghomealone » Tue Apr 28, 2020 9:56 am

foldy wrote:You are right. Project 13877 has a 150k atom count and should be restricted to fast GPUs with many shaders only, so it should not be assigned to NVIDIA Maxwell GPUs or AMD Polaris GPUs.

That would be a good start.
However, there are slow GPUs like the 1030 or 1050 that, when not folding 24/7, are not able to return the WU before the timeout.

Maybe such projects should only be assigned to Turing and Navi GPUs.
foldinghomealone
 
Posts: 130
Joined: Wed Feb 01, 2017 8:07 pm

Re: GPU assignment of big WUs to slow GPUs

Postby foldy » Sat May 16, 2020 9:31 am

foldy
 
Posts: 1966
Joined: Sat Dec 01, 2012 4:43 pm

Re: GPU assignment of big WUs to slow GPUs

Postby BobWilliams757 » Sun May 17, 2020 1:40 pm

Having some slower hardware myself, I do see the point. I have now done 10 runs of various 16435 WUs, and they barely meet the timeout folding 24/7. In this case it takes them about 24 hours, some slightly more and missing the timeout slightly. I don't have a problem letting the machine run overnight, so I let them run in the hope that I'm still providing the first result. But others just don't want to let the system run overnight.


But in all fairness, really breaking them down might be much harder than it seems. Atom count alone does not dictate the architecture needed in all cases. Nor does the step count, or the CPU use. The more I dig, the more I think the WU variances are a lot more complex than most of us realize. Having been watching the Beta forum out of curiosity and a possible desire to volunteer, it seems some WUs just work much better (or worse) with specific hardware architectures, and/or particular driver sets, OS, etc. So short of having a lot more Beta testers with various older, unique, or rare hardware/OS combinations, it would be almost impossible to really dial them in to guarantee any solid turnaround time.


So I'd have to agree that WUs with lengthy timeout periods are really the only quick solution. And I also agree that it makes things more efficient, since the fewer WUs that time out, the less overhead on both the F@H and donor ends.

As an interesting note on how complex it gets with WUs, I had searched and found a thread on a specific WU that gave me a crazy high PPD return. During Beta testing, several testers reported the PPD return was low. I assume they adjusted the WU specifics to give it a more "fair" point return. But for whatever reason, given the specifics of my onboard graphics, small atom count, number of steps, complexity, etc., I got a PPD nearing double my average. Most of the Beta testers have much more powerful GPUs, and they reported lower PPD returns when testing. And it was also a Core 21 project, and most people state 20% or so lower PPD expectations for those. So every WU probably has a range of hardware that runs it great, some just OK, some that struggle, etc.
BobWilliams757
 
Posts: 113
Joined: Fri Apr 03, 2020 3:22 pm

Re: GPU assignment of big WUs to slow GPUs

Postby bruce » Sun Jun 28, 2020 1:39 am

BobWilliams757 wrote:But in all fairness, really breaking them down might be much harder than it seems.


That's certainly true. Nevertheless, FAH can improve the assignment logic in incremental stages. It starts by assigning a variety of projects to a variety of GPUs. Most likely it also starts with the assumption that atom count is at least one of the factors influencing final performance, and then gathers additional data from whatever gets returned from those assignments.

Timeouts should also be based on expected performance.

BobWilliams757 wrote:As an interesting note on how complex it gets with WUs, I had searched and found a thread on a specific WU that gave me a crazy high PPD return. During Beta testing, several testers reported the PPD return was low. I assume they adjusted the WU specifics to give it a more "fair" point return. But for whatever reason, given the specifics of my onboard graphics, small atom count, number of steps, complexity, etc., I got a PPD nearing double my average. Most of the Beta testers have much more powerful GPUs, and they reported lower PPD returns when testing. And it was also a Core 21 project, and most people state 20% or so lower PPD expectations for those. So every WU probably has a range of hardware that runs it great, some just OK, some that struggle, etc.


Also true.
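A first cut at "timeouts based on expected performance" might look like the sketch below, using atom count as a rough proxy for work per WU. The throughput figures and safety factor are assumptions for illustration, not anything F@H publishes:

```python
# Sketch: derive a per-assignment timeout from an estimated runtime.
# "Atom-units per day" is an invented throughput measure for illustration;
# the safety factor gives slower or part-time hardware headroom.

def estimated_days(atom_count: int, gpu_atoms_per_day: float) -> float:
    """Rough completion estimate: work divided by measured throughput."""
    return atom_count / gpu_atoms_per_day

def timeout_days(atom_count: int, gpu_atoms_per_day: float,
                 safety_factor: float = 3.0) -> float:
    """Timeout scaled to the hardware actually receiving the WU."""
    return estimated_days(atom_count, gpu_atoms_per_day) * safety_factor

# A 150k-atom WU on a fast card vs. a card ten times slower:
print(round(timeout_days(150_000, 1_500_000), 2))  # 0.3
print(round(timeout_days(150_000, 150_000), 2))    # 3.0
```

The point of the safety factor is exactly the part-time case: the same WU gets a 1-day-scale timeout on fast hardware but a multi-day one on slow hardware, instead of one fixed deadline for everyone.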
bruce
 
Posts: 19830
Joined: Thu Nov 29, 2007 11:13 pm
Location: So. Cal.

Re: GPU assignment of big WUs to slow GPUs

Postby SetiCrew486 » Tue Jun 30, 2020 9:27 pm

I also deactivated my GPU slot a couple of weeks ago, for the same reasons.
Yes, I know my GPU is at the lower end of the list and will probably be dropped with the next update, but currently it's still supported.

But as I'm also folding part-time, I personally won't need precise WU control like that mentioned above. I don't have to tweak the last bit out of my GPU, because it will be shut down sooner or later anyway.
Instead, I would be fine with limiting the size roughly, e.g. by atom count, by base credit, or even by reactivating the max-packet-size flag.
I'm pretty sure I'll be able to find a setting that lets my GPU finish its WUs safely within time.
“Knowledge is knowing that a tomato is a fruit; wisdom is not putting it in a fruit salad.” ― Miles Kington
SetiCrew486
 
Posts: 5
Joined: Fri Apr 24, 2020 4:28 pm

Re: GPU assignment of big WUs to slow GPUs

Postby bruce » Wed Jul 01, 2020 4:56 am

foldinghomealone wrote:Some of our team members face the problem that big WUs are assigned to slow GPUs.
Like P13877 to a GTX 980 with a preferred deadline of 1 day.
It's basically not possible for part-time folders to return the WU in time.


Hmmm. I am seeing conflicting information:
> p13877. HFM Benchmark Data. These WUs actually ran well on Linux 970s (hosts F67/68) at a TPF of 4.5 minutes (0.3 days to complete). The 980 should be slightly faster than the 970 and easily able to meet the 1-day timeout.


Are you trying to run your 980 on a 1x riser? What TPF are you seeing? Please post the applicable segments of your log.
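The 0.3-day figure follows directly from the TPF. GPU WUs report progress in frames, conventionally 100 per WU (assumed here for p13877), so total runtime is roughly TPF times 100:

```python
# Back-of-envelope check of the quoted benchmark numbers:
# TPF (time per frame) of 4.5 minutes over 100 frames per WU.
TPF_MINUTES = 4.5
FRAMES_PER_WU = 100   # the usual convention for GPU WUs; assumed here

total_minutes = TPF_MINUTES * FRAMES_PER_WU   # 450 minutes
total_days = total_minutes / (60 * 24)        # fraction of a day

print(total_minutes)          # 450.0
print(round(total_days, 2))   # 0.31
```

So a 980 would need a TPF above roughly 14 minutes, or very limited folding hours per day, before a 1-day timeout becomes unreachable.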
bruce
 
Posts: 19830
Joined: Thu Nov 29, 2007 11:13 pm
Location: So. Cal.

Re: GPU assignment of big WUs to slow GPUs

Postby Ichbin3 » Wed Jul 01, 2020 5:27 am

bruce wrote:
foldinghomealone wrote:
Some of our team members face the problem that big WUs are assigned to slow GPUs.
Like P13877 to a GTX 980 with a preferred deadline of 1 day.
It's basically not possible for part-time folders to return the WU in time.

I guess that's what he is pointing to ...
Asus Z87-A, i5-4570, RTX 2080Ti@180W, RTX 2080Ti@200W, Mint 19.3
Ichbin3
 
Posts: 74
Joined: Thu May 28, 2020 9:06 am
Location: Germany
