GPU3 QRB?



Postby sbohdan » Tue Aug 02, 2011 7:59 pm

About a month ago I read an announcement about points changes to the GPU3 client. It said that the quick return bonus (QRB) will be applied to the GPU3 client as well. My question is: when?

The other change that we have in mind to do is to bring all classic and GPU WUs into the Quick Return Bonus (QRB) system. This would help further bring all FAH projects into balance. There may be some issues with GPUs and QRB, so we are looking to see what we can do to minimize problems with that before making a change in the points for GPU WUs.
Gigabyte GA P67-UD4 B3, 2500K@4.69Ghz@ 1.428V, Corsair A70, 4GB GSkill Ripjaws 2133Mhz 9-11-9-28 @ 1.64V,
EVGA GTX470@811/1622/1674@1.05V, Acer 24" 2ms, Crucial Real SSD C300 64GB, Seagate 2TB, Samsung 2TB, PC P&C 750W, Win7 Ultimate64

Re: GPU3 QRB?

Postby bruce » Tue Aug 02, 2011 8:29 pm

Welcome to foldingforum.org, sbohdan.

Stanford never gives predictions about when future changes may happen, so there will be no real answer to your question. Their resources are devoted mostly to scientific research, followed by a number of fairly important projects sorted by some kind of priority. Adding new features to a segment of FAH that is basically working as previously advertised probably falls at a lower priority than fixing things that are actually broken. I do know that there are some changes that have to be integrated into the server code first, and updates to server code on active servers generally take a long time, too.

The changes to the classic client are being phased in rather slowly. Existing projects continue under the old points methodology, with new projects being brought on-line under the QRB system. A couple of hiccups have been discovered during that process and adjustments are being made. Adjustments to SMP have also been made, and it's likely that we'll see more changes to both. Changes to the point system are always done gradually (and undoubtedly accompanied by a lot of behind-the-scenes testing and evaluation).

The announcement made it clear that such changes are in the planning stage and described some long-term goals, but I'd summarize by saying my guess is your answer is somewhere between "not soon" and "soon." (Sorry that I can't be more specific.)

Re: GPU3 QRB?

Postby sbohdan » Tue Aug 02, 2011 8:57 pm

Thanks for the quick answer. I just assumed that since the bigadv bonus went down immediately, the same would apply to the GPU3 QRB. I guess we'll see :)

Re: GPU3 QRB?

Postby bruce » Tue Aug 02, 2011 11:19 pm

Assuming I'm right about "... a lot of behind-the-scenes testing and evaluation," I think it's fair to say that bigadv had already been under close scrutiny for a while and that the steps to make the changes were relatively simple.

No changes needed to be made to servers. In fact, the formula for the bonus did not change. Every project is assigned a baseline value shown in psummary.html, and the bonus factor is computed based on that number. It was the baseline points that changed, not the bonus, and changing the baseline points (once you have a plan) is extremely simple for them to do. It's the planning of what to change and by how much that takes all the thought.
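bruce's distinction — that the bonus multiplier is a fixed function of k and return speed, and a re-benchmark only rescales the baseline — can be sketched with the published points formula. All the numbers below are made up purely for illustration:

```python
import math

def final_points(base_points, k, deadline_days, elapsed_days):
    """Published QRB formula: base points times a speed-dependent bonus."""
    bonus = max(1.0, math.sqrt(k * deadline_days / elapsed_days))
    return base_points * bonus

# Hypothetical re-benchmark: only the baseline changes; k and deadlines don't.
before = final_points(base_points=25000, k=26.4, deadline_days=6, elapsed_days=2)
after = final_points(base_points=20000, k=26.4, deadline_days=6, elapsed_days=2)

# The bonus multiplier is identical in both cases; awards simply scale
# with the new baseline, which is why no server formula had to change.
assert math.isclose(after / before, 20000 / 25000)
```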

Re: GPU3 QRB?

Postby Jesse_V » Wed Feb 15, 2012 7:45 am

To avoid opening up a new thread, I'll revive this one.
I would support a Quick Return Bonus on the GPU. I think the first logical step is to try to establish why the QRB exists in the first place. From the SMP FAQ, I found this:
High-performance clients often require more computing resources. SMP clients typically run on dedicated systems, 24 hours a day, and use more processing power, more disk space, more network resources, more system memory, etc. Also, a major part of the scientific benefit is dependent on rapid turnaround of work units; hence we assign short deadlines for SMP work units. To reward those contributors for donating resources beyond the typical CPU client, for completing these work units very quickly within the short deadlines, and for contributing to the development of our next-generation capabilities, we currently set a benchmark value (with included bonus*) proportional to these larger more demanding SMP work units. Without the SMP clients and your additional contributions, we would not be able to complete many important projects. *Please note the bonus value is subject to change.

The GPU clients perform different types of calculations than the SMP clients, but they do those calculations exceedingly quickly due to optimized hardware. As one of Dr. Pande's blog posts notes, F@h was the first to use GPUs for distributed computing and the first to run major molecular dynamics simulations on them. folding.stanford.edu also describes how much speedup we gained from using GPUs, and they are a major component of our FLOP count (which may not accurately measure scientific production, but it makes F@h look good nonetheless).

Despite all these benefits, GPUs use a lot of resources. Based on the forum reports that I've seen, there's a much greater chance of GPU folding causing lag and slowing things down. I've experienced this myself on my 240M and even on my 560 Ti, and there have been a few rare driver crashes/recoveries and whatnot. Bottom line: although it generates less heat on my machines than CPU folding, it also consumes more of my computer. Including a GPU slot by default in v7 was the PG's choice to make. I've also seen numerous posts from the mods/admins (such as Bruce) that GPUs aren't as good at priority processing, so unlike SMP/bigadv they aren't as good at backing down for other applications; this explains the lag and whatnot. Thus, GPU folding satisfies the first two sentences of the above quote. It goes without saying that it satisfies the third as well, and in general GPU WUs do have shorter deadlines than uniprocessor WUs. My point is that it logically follows the "To reward those contributors for donating resources beyond the typical CPU client, for completing these work units very quickly within the short deadlines, and for contributing to the development of our next-generation capabilities, we currently set a benchmark value (with included bonus*) ..." line as well. I've also seen the results of GPU folding in several publications, so I know they are valuable.

Obviously uniprocessors don't need a QRB; I won't argue that point at all. They use one core (leaving plenty of processing power on any modern processor), back off extremely well for other applications, generate the least heat of all the clients, and have really long deadlines to cater to those who don't fold 24/7. This is not the case for SMP and GPUs. Like SMP, the GPU should have the QRB as well. Luckily I see from the quote in the first post that there are plans to do this. I am in full support of this occurring rapidly. There have been numerous discussions about the QRB on SMP and bigadv, involving terms such as "inflation" and "arms race". This worries me. I picture SMP, GPUs, and PS3s working side-by-side, each performing their own special type of science that the others would have difficulty doing. I'm concerned that in terms of PPD and $/PPD, SMP is going to come out on top. We should get SMP and GPU on the same level. I have no idea what the current balance is; the High Performance Clients being a bit hidden skews the reporting on Stats by OS. Bigadv caters to server-class hardware; SMP, GPUs, and PS3s sit in the middle; and uniprocessors take up the rear. But as I've outlined above, the GPUs definitely fall within the QRB criteria. So all I'm hoping for is revived interest. This is a reminder.
Pen tester at Cigital/Synopsys

Re: GPU3 QRB?

Postby bruce » Wed Feb 15, 2012 9:24 am

Jesse_V: Please read the most recent entry in the news blog.

Two facts to consider.
1) If there's a conflict for PG resources (and there always is), developing new methodology, such as the LTMD method, is many, many times more productive (measured in scientific work completed) than updating the capabilities of the servers to manage QRB data.

We normally only talk about the version of the FAH client, because that's the only part of the total system that we see. The code on the servers also has various versions and goes through a similar upgrading process. New server-based features appear in newer server code, but older server code is still capable of supporting projects that started further in the past.

2) I consider it likely that both may happen at the same time. New servers with fresh server code are normally outfitted for new projects, especially new projects that also involve new technology. Projects running on older server code are currently working within the constraints of older server code.

We don't really see what version of code a server is running, but FahCore_11 was released before FahCore_15, which was before FahCore_16, so we can guess that projects running effectively with FahCore_11 (which will also be finished using FahCore_11) will probably never make use of server code that supports QRB.

Witness the same sort of migration of uniprocessor changes as FahCore_78 migrates to FahCore_a4.
bruce
 
Posts: 21697
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: GPU3 QRB?

Postby Jesse_V » Wed Feb 15, 2012 4:31 pm

That blog post was pretty impressive and I look forward to seeing what kind of things it can do. I'm all for its development as long as it doesn't stress everyone's machines too much, but we'll see how it goes. If you look at psummary, all the K values of non-QRB projects are 0. From FAQ-PointsNew, the formula is
final_points = base_points * max(1,sqrt(k*deadline_length/elapsed_time))
Thus if k is 0 there's no QRB, and changing it to a non-zero number gives the QRB. I would speculate that the servers make that QRB calculation regardless, but non-QRB WUs would always get base points. That way the server code is the most flexible, and the code is also simpler and uniform instead of making QRB vs. non-QRB choices on a case-by-case basis, so maybe it's actually coded that way. I'm not sure what role the cores play, because I thought that cores simply did the calculations on particular types of hardware, and it was the client (v6, or v7's FAHClient) that did the uploading. From there, again, I would guess that the servers get the deadline and elapsed-time information and go from there. They have that information on hand anyway, since they do reassignments after a timeout. Maybe the servers don't always make QRB calculations, but if they do, shouldn't it be a simple matter of adding K-values to new GPU projects? If that were the case, there would be some discussion about what exactly the K value should be to bring things in line with other projects, but when such a project launched, those who got it would also get the QRB. Over time, as the older projects finish up and QRB-enabled GPU WUs keep getting introduced, eventually all GPU projects would have the QRB. Or is the process a lot more technical and complicated than that?
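Jesse_V's reading of the formula checks out numerically: k = 0 collapses the bonus to 1, so one uniform code path can serve both QRB and non-QRB projects. A minimal sketch (the point values here are illustrative only):

```python
import math

def final_points(base_points, k, deadline_length, elapsed_time):
    # FAQ-PointsNew formula quoted above; times in consistent units (e.g. days)
    return base_points * max(1.0, math.sqrt(k * deadline_length / elapsed_time))

# k = 0: the sqrt term is 0, max() clamps the bonus to 1, and the WU
# earns exactly its base points -- no special non-QRB code path needed.
assert final_points(500, k=0, deadline_length=10, elapsed_time=1) == 500

# Any non-zero k enables the bonus for returns faster than the deadline.
assert final_points(500, k=2, deadline_length=10, elapsed_time=1) > 500
```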

Re: GPU3 QRB?

Postby bruce » Wed Feb 15, 2012 5:06 pm

Pretty much.

The old adage "If it ain't broke, don't fix it" applies here. The safest way to keep old server code running is to avoid changing it. The safest way to keep projects running that are successfully running on an old version of server code is to upgrade the server to new code AFTER those projects are finished. Murphy's Law applies to "simple" changes, too.

Re: GPU3 QRB?

Postby 7im » Wed Feb 15, 2012 5:45 pm

In real life, PG releases new projects from newer servers with the updated server code. The old and new run in parallel until the old WUs are completed. Then the older server is either retired, repurposed, or upgraded. That's why people occasionally see clients or cores acting differently: the collection server working for some work units and not others, for example.

Another old adage that Bruce is fond of... "FAH is not unlike a large ship, and course corrections take time." (even the small simple ones...) You don't want to try turning an oceanliner on a dime, or the boat could roll over, taking down the guests and all. ;)


And I'm glad Jesse won't argue the point about QRB for uni-clients, because the decision was already made. 8-) QRB is coming to all client types, eventually, and has come to Uni clients already in the a4 core. The goal of the QRB is to add points as a function of time, because the faster a work unit is returned, the more scientific value that work unit has. And the primary function of the points system is to equate scientific value of the work to the number of points you received (despite all the varying opinions on the topic).
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.

Re: GPU3 QRB?

Postby Jesse_V » Wed Feb 15, 2012 6:33 pm

That's right, there were those few uniprocessor projects that got QRB. But doesn't a4 support both uniprocessors and SMP? It's not a uniprocessor-only core. I'm just saying it would be nice to see QRB on GPUs, and before any more uniprocessor clients. GPUs qualify more for QRB than uniprocessors do, IMO.

Re: GPU3 QRB?

Postby 7im » Wed Feb 15, 2012 6:53 pm

Correct. FahCore_a4 supports both -smp x, where x is more than 1, and no smp switch at all, which is effectively -smp 1.

Re: GPU3 QRB?

Postby Napoleon » Wed Feb 15, 2012 9:35 pm

Jesse_V wrote:Obviously uniprocessors don't need a QRB, I won't argue that point at all.

Wrong. Quick(er) return is quick(er) return, and PG has stated over and over that they really prefer quicker returns. I haven't read any posts from PG encouraging the return of uniprocessor WUs as close to the preferred deadline as possible. :twisted:

However, if you're talking about letting core 78 WUs go gracefully and eventually replacing them with A4 WUs with a QRB, we are in agreement. The situation may be more or less self-correcting in that case; we just need to be patient. Keep in mind that SMP may not be an ideal choice for many non-dedicated folding setups, even very powerful ones. SMP is very sensitive to load imbalance, making multiple uni slots a better choice in many non-dedicated environments. Servers and the like leap to mind: very powerful setups which often have lots of CPU cycles to spare, but whose typical loads are a nightmare for SMP folding.


7im wrote:And the primary function of the points system is to equate scientific value of the work to the number of points you received (despite all the varying opinions on the topic).

I don't think the disagreement is about what the primary function should be; it's whether the points system is doing its job or not. A difference... I happen to think it is seriously flawed in its current form, and trust me, I'm going to elaborate on that bold statement.

Let's take the v7 client, a hybrid A4 project, and a powerful 16-core server capable of BA16, for example. The admin of that server may choose to pass up SMP and go for 16 uni slots simply because there's then no need to worry about load imbalance and strict deadlines. (S)he has bigger fish to fry. When otherwise idle, the server could still provide amazing throughput for that A4 project, even though latency is far from great compared to a dedicated smp:16 folding setup.

Wringing the best possible scientific value out of the donor base is a compromise between the two. With the current lopsided points system, PG has managed to convey the message that raw throughput isn't the be-all and end-all. The downside is that many donors appear to be rushing towards the other extreme, sacrificing a more or less significant amount of throughput to capitalize on low-latency 4P systems.

If the amount of parallelized work in a project (unique Run & Clone pairs, aka trajectories) significantly exceeds the number of CPU cores crunching the project, throughput is vital. Suppose there are 2^14 trajectories (2^7 Runs and 2^7 Clones, for example) in a hypothetical A4 project, and suppose there are 2^10 dedicated 16-core setups having a go at it. That works out to 2^14 trajectories / 2^10 computers == 2^4 trajectories / computer == 16 trajectories / computer. As far as scientific value (read: wall-clock time to complete the project) is concerned, there's no difference whether the donors run 16x uniproc or smp:16 (assuming perfect SMP scaling), but obviously the difference in awarded points would be huge.

Then again, suppose the project had only 2^10 trajectories. In that case latency would become vital. With the same hypothetical 2^10 donor base as in my previous example, 15 out of 16 uniprocessor slots would either be starved for work or have to be assigned to other projects. The example project would take 16x longer to complete. Scientific progress would have suffered greatly if the project results turned out to have a pivotal role in the research.
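Napoleon's two scenarios reduce to simple arithmetic; a quick check (all figures are his hypotheticals from the paragraphs above):

```python
computers, cores_per_machine = 2**10, 16
total_cores = computers * cores_per_machine  # 2^14 cores in the donor pool

# Throughput-gated case: 2^14 trajectories -- one per core either way,
# so 16x uniproc and smp:16 finish the project in the same wall-clock time.
trajectories = 2**14
assert trajectories // computers == 16  # 16 trajectories per machine

# Latency-gated case: only 2^10 trajectories to go around.
trajectories = 2**10
idle_uni_fraction = 1 - trajectories / total_cores
assert idle_uni_fraction == 15 / 16  # 15 of 16 uni slots starved; smp:16 wins
```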

IMHO, the scientific value of quicker returns boils down to the ratio of trajectories to the cores available to crunch them. Currently the FAH stats list about 2^18 active CPUs (slots?) running Windows. How many trajectories does your average hybrid/SMP/BA project have? For example, could all Windows CPUs be put to work on a single project if it turned out to have a pivotal role? I very much doubt that, so it does become important that donors aren't allowed to hog trajectories at will. Hence the push towards bigger SMP via the QRB system.

To stick to the topic somewhat, I'd say introducing a QRB for (NVidia) GPUs wouldn't reflect scientific value at all. The stats page lists about 2^13 active NVidia GPUs. Any single P680x has more trajectories than that, so all of the active Fermis could be put to work on a single project. They're all throughput-gated, so a nonlinear points curve would not reflect scientific value at all. :wink:

Then again, adding a hefty QRB would probably lure a lot of enthusiasts back to GPU folding, and that would be good for science, so hopefully introducing a QRB for GPUs might justify itself in the long run. So PG may just fiddle with the GPU points curve anyway. Fine by me, I'd keep on folding on my GT430 regardless, but please don't start lecturing about points reflecting scientific value if that happens, at least not until there's even one Fermi project gated by latency instead of throughput. :twisted:

Chicken-and-egg problem, really. If points were to reflect scientific value, the QRB curve should be adjusted periodically throughout the lifespan of the project, based on the trajectory / donor base ratio along with some other performance metrics. Perhaps on a weekly or monthly basis, and that would cause awarded points to fluctuate wildly. Hey, points are fun, but I don't really care, so I say go for that, or some other way to keep points inflation at bay.

Gee, I'm itching to make enemies in all the FAH enthusiast camps with that statement. Hey, it's bash-the-troll-day at the FAH fair, grab your bat! :P
Win7 64bit, FAH v7, OC'd
2C/4T Atom330 3x667MHz - GT430 2x832.5MHz - ION iGPU 3x466.7MHz
NaCl - Core_15 - display

Re: GPU3 QRB?

Postby bruce » Wed Feb 15, 2012 11:03 pm

That's an excellent analysis. The PG is well aware that the number of CPUs/slots compared to the number of trajectories is an important metric that varies over time, and that the strategy for optimum throughput varies with it. I have proposed similar changes either to the points strategy or to the Assignment Server strategy which include some dynamic components. Currently there are lots of discussions about the best points settings for BigAdv-16. Lots of turmoil, strong feelings, etc. Can you imagine the donor responses if the Pande Group announced that (for example) the K-factor was no longer going to be predetermined but was going to be adjusted hourly, based on supply and demand, within some range between X and Y?

As far as I know, such a system is NOT planned, but if a system such as that could meet with donor acceptance, FAH would achieve more scientific work than they presently do.
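A dynamic K-factor of the kind bruce describes (explicitly NOT planned; this is purely a what-if) might amount to scaling a nominal k by demand/supply and clamping it to the announced [X, Y] band. Every name and number below is invented for illustration:

```python
def hourly_k(nominal_k, wus_requested, wus_available, k_min, k_max):
    """Hypothetical supply-and-demand adjustment, clamped to [k_min, k_max].
    A sketch of the idea only; no such server mechanism exists."""
    scaled = nominal_k * (wus_requested / wus_available)
    return min(k_max, max(k_min, scaled))

# Scarce WUs push k toward the ceiling; surplus pushes it toward the floor.
assert hourly_k(2.0, wus_requested=300, wus_available=100, k_min=1.0, k_max=4.0) == 4.0
assert hourly_k(2.0, wus_requested=50, wus_available=100, k_min=1.0, k_max=4.0) == 1.0
```

The clamp is what makes the scheme even conceivably palatable to donors: PPD could drift within a published band, but never spike or crater arbitrarily.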

Re: GPU3 QRB?

Postby 7im » Thu Feb 16, 2012 12:34 am

People expect consistent pay. Their paycheck should be the same dollar amount every week. The same is expected of FAH, despite donors' lack of understanding about the differences in hardware, smp vs. bigadv, etc.

Every WU below the average of X to Y would generate a complaint, and every WU above would pass quietly by without a thank-you. Human nature. :lol: "The more I know about people, the more I like my dog." -- Mark Twain

When PG can make a local client benchmark, then they can use a dynamic K factor, because the client will display the current K factor for the WU, and the resulting PPD estimate will be very accurate based on the very effective local benchmark. It will be much easier to communicate that complex concept, and therefore gain acceptance much more easily. ;)

Re: GPU3 QRB?

Postby Biffa » Wed Feb 22, 2012 9:53 am

How about get rid of the QRB altogether?

Just base it on how many atoms the client can process in one day.

GPU Rig: GTX470 can process 5 x 1832 = 7,501 Atoms/day (P8032)

CPU Rig: Quad Opteron (48 cores) can process 1.017 x 2533797 = 2,576,871 Atoms/day (P6903)

CPU Rig = 343.5 x as many atoms/day than GPU Rig

GPU Rig = 15737 points/day

If 1 atom = 1 point then the CPU rig should get 15737 x 343.5 = 5,405,659.5 points/day

Under the current points system the CPU rig gets 495,749 points/day

Sounds like the CPU rig is losing out to me.
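Biffa's ratio arithmetic can be rechecked directly (all figures are from his post; whether atoms/day is the right yardstick is of course the open question):

```python
# Atoms/day figures quoted in the post above
gpu_atoms_per_day = 7_501       # GTX470 on P8032
cpu_atoms_per_day = 2_576_871   # 48-core quad Opteron on P6903

ratio = round(cpu_atoms_per_day / gpu_atoms_per_day, 1)
assert ratio == 343.5  # CPU rig processes ~343.5x the atoms per day

gpu_ppd = 15_737
atom_parity_ppd = gpu_ppd * ratio  # what strict "1 atom = 1 point" parity implies
assert atom_parity_ppd == 5_405_659.5

current_cpu_ppd = 495_749  # actual PPD under the current points system
assert atom_parity_ppd / current_cpu_ppd > 10  # the ~11x gap Biffa highlights
```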
