Possible to 'rotate' WUs to different GPU?


ChristianVirtual
Posts: 1596
Joined: Tue May 28, 2013 12:14 pm
Location: Tokyo

Re: Possible to 'rotate' WUs to different GPU?

Post by ChristianVirtual »

I have not tried that method myself, but isn't it a lot of babysitting effort? I would need to quit my day job if I wanted to fine-tune things this way.
A swap only seems meaningful when both WUs have similar start times, but soon into the day that phasing will drift, making further swaps less effective, won't it?
That said, I would write a script to do the babysitting anyway. But wouldn't the phasing issue, with significantly different assignment timestamps, still remain?
I still wonder what the real practical gain is in the long run. I can see use cases during testing etc., no doubt. But day to day it seems like quite a PITA.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Possible to 'rotate' WUs to different GPU?

Post by bruce »

foldinghomealone2 wrote: When you have a rig with same GPUs then there won't be any benefit. Same when all GPUs process the 'same' WU.
Normal users with one GPU won't profit at all.

In fact, the benefit will be negative. This loss will also reduce the benefit of switching when the GPUs are different.

Remember, whenever a WU is paused, it resumes from the previous checkpoint, not from the point at which it has progressed. The number of checkpoints per WU varies depending on the project. Let's suppose we have two projects that happen to have 16 checkpoints per WU. Each WU will restart but there will be a loss of between 0.01% and 6.24% that will need to be repeated (for an average of 3.12%). For WUs with more frequent checkpoints the loss will be less; with fewer checkpoints, the loss will be more.
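To make the arithmetic explicit, here is a minimal sketch. It assumes checkpoints are evenly spaced and that a pause is equally likely at any point between two checkpoints; the function name is made up for illustration.

```python
def checkpoint_loss_percent(checkpoints_per_wu):
    """Return (worst_case, average) percent of a WU repeated after one pause.

    Assumes evenly spaced checkpoints and a pause that falls at a
    uniformly random point between two of them.
    """
    interval = 100.0 / checkpoints_per_wu  # percent of the WU between checkpoints
    return interval, interval / 2.0


worst_case, average = checkpoint_loss_percent(16)
print(f"worst case just under {worst_case}%, average about {average}%")
```

A swap pauses both WUs, so each one pays this cost every time the pair is rotated.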
foldinghomealone2
Posts: 148
Joined: Sun Jul 30, 2017 8:40 pm

Re: Possible to 'rotate' WUs to different GPU?

Post by foldinghomealone2 »

bruce wrote:In fact, the benefit will be negative. This loss will also reduce the benefit of switching when the GPUs are different.

Remember, whenever a WU is paused, it resumes from the previous checkpoint, not from the point at which it has progressed. The number of checkpoints per WU varies depending on the project. Let's suppose we have two projects that happen to have 16 checkpoints per WU. Each WU will restart but there will be a loss of between 0.01% and 6.24% that will need to be repeated (for an average of 3.12%). For WUs with more frequent checkpoints the loss will be less; with fewer checkpoints, the loss will be more.

In a scenario like the one you describe, it would be negative, right.

But:
The rotation function should run a few checks before rotating WUs:
- Are at least two different GPUs installed? If not, the rotation function is disabled.
- Are two different WUs assigned (on a two-GPU rig)? If not, no rotation.
- Could the current assignment of WUs A and B to GPUs M and N be improved?
From what I have observed, the bigger WU should go to the faster GPU, but as I said, that needs to be evaluated.
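The checks above could be sketched roughly like this. Everything here is hypothetical: the `Gpu` and `WorkUnit` types, the `relative_speed` and `size` fields, and the "bigger WU on the faster GPU" rule, which is only my observation and still untested. The sketch also ignores the work lost by going back to the last checkpoint.

```python
from dataclasses import dataclass


@dataclass
class Gpu:
    model: str
    relative_speed: float  # hypothetical speed score, higher is faster


@dataclass
class WorkUnit:
    project: int
    size: float  # hypothetical size measure, e.g. atom count


def should_rotate(gpu_m, gpu_n, wu_a, wu_b):
    """Return True if swapping is suggested; wu_a runs on gpu_m, wu_b on gpu_n."""
    # Check 1: identical GPUs gain nothing from a swap.
    if gpu_m.model == gpu_n.model or gpu_m.relative_speed == gpu_n.relative_speed:
        return False
    # Check 2: the 'same' WU on both GPUs gains nothing either.
    if wu_a.project == wu_b.project or wu_a.size == wu_b.size:
        return False
    # Check 3 (unverified rule): the bigger WU should sit on the faster GPU.
    m_is_faster = gpu_m.relative_speed > gpu_n.relative_speed
    a_is_bigger = wu_a.size > wu_b.size
    return m_is_faster != a_is_bigger
```

A real version would also have to weigh the expected gain against the checkpoint loss on both WUs before it swaps anything.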

And:
Are you sure that this is always true?
bruce wrote:Remember, whenever a WU is paused, it resumes from the previous checkpoint, not from the point at which it has progressed. ...
Because during my tests with p9xxx and p11xxx WUs it always worked like this: paused at 80.5% --> resumed at 80%; paused at 71.6% --> resumed at 71%.
I think it resumes at the last checkpoint only when you restart the client. But a restart wouldn't be necessary for WU rotation.

I think there is a chance for improvement. A first evaluation would need only a few willing testers, and I assume there are enough of them in this forum.

Their results may turn out positive or not.

But basically it should be a minor modification of FAHControl.
It's essentially a pause command and a restart of the FAHCoreWrappers with the OpenCL and CUDA parameters interchanged.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Possible to 'rotate' WUs to different GPU?

Post by bruce »

When a WU is restarted after a pause, it ALWAYS starts from the last checkpoint, whether or not FAHClient is restarted.

I have no quarrel with your examples ... but we're missing a significant amount of information.
How often are the checkpoints for p9xxx and p11xxx? Also note that projects are assigned in blocks of 100s, so you need to fill in M and N in p9Mxx and p11Nxx before you draw any conclusions.

The fact that a WU resumed at 80% suggests that there may be a checkpoint every 10% so restarting after a pause at 80.5% is probably a very favorable example. You should try pausing at 80.9% (or 10.95% or ...) and demonstrate what may be an unfavorable example.

I normally look at the files for a particular WU and note the times at which a viewerFramex was written. It seems that those viewerFrames are written at the same interval that checkpoints are written ... though I'm not sure that's universally true.
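If you'd rather not read the timestamps by hand, a small script can do it. This is only a sketch: it assumes the frame files sit directly in the WU's work directory and that their names start with "viewerFrame", and the path in the usage comment is a placeholder, not a real location.

```python
from pathlib import Path


def frame_intervals_seconds(work_dir):
    """Return the gaps in seconds between successive viewerFrame* files.

    Assumes modification times reflect when each frame was written and
    that file names start with "viewerFrame" (an assumption).
    """
    times = sorted(p.stat().st_mtime for p in Path(work_dir).glob("viewerFrame*"))
    return [later - earlier for earlier, later in zip(times, times[1:])]


# Usage (placeholder path, adjust to your own work directory):
# print(frame_intervals_seconds("/path/to/work/00"))
```

Roughly constant gaps would support the idea that frames and checkpoints share an interval, but they don't prove it.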

Don't get your hopes up for this being incorporated into FAHClient/FAHControl. There is a long list of bugs that need to be fixed, followed by a long list of enhancement suggestions and yours would be at the end of those lists -- so it probably won't happen. Then, too, it's questionable whether the science would see any actual net gain. As has already been stated, the science accomplished on small proteins is just as important to researchers as the science accomplished on big ones, and you're intentionally slowing down the smaller proteins, which won't meet with anybody's approval.

Also note that this technique is not risk-free. FAH has designed in a method to adjust these indices but you're doing so at your own risk. If you end up messing up a few WUs in the process, any complaints will be ignored.
foldinghomealone2
Posts: 148
Joined: Sun Jul 30, 2017 8:40 pm

Re: Possible to 'rotate' WUs to different GPU?

Post by foldinghomealone2 »

I didn't mean every 10%, I meant every 1%.

But I can understand that slowing down particular projects wouldn't be approved by anybody.

Thanks for your help and the great discussion
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Possible to 'rotate' WUs to different GPU?

Post by bruce »

I've never seen a case of a WU that had a checkpoint every 1% -- or even close to that.
foldinghomealone2
Posts: 148
Joined: Sun Jul 30, 2017 8:40 pm

Re: Possible to 'rotate' WUs to different GPU?

Post by foldinghomealone2 »

Oh, ok. I just tried again.

I paused a p9176 WU at 47.8%. Then it showed 47%. I resumed and it progressed to ~48%, but a few seconds after that it started over from 44%.
I didn't realize the jump back to 44%.

Sorry for the confusion.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Possible to 'rotate' WUs to different GPU?

Post by bruce »

foldinghomealone2 wrote:I didn't realize the jump back to 44%.
That's what I was trying to explain. It will jump back to the previous checkpoint, whenever that occurred ... and without knowing the construction of that family of WUs, you won't know when that checkpoint occurred. Please re-read what I've said earlier.
Joe_H
Site Admin
Posts: 7856
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Possible to 'rotate' WUs to different GPU?

Post by Joe_H »

Exactly. Both FAHControl and the Web Control will show the previous percent completed until they receive an update from the folding core after a restart. You have to watch the log as it is updated to get close to "live" information. The control displays only get the completion percentage updated after the core has started, found and verified the checkpoint, and then started folding on the GPU. This can take more than just a few seconds.

Typically for GPU projects the checkpoint is done every 2-5% as set by the project manager on the WS. A few projects have used higher settings than 5%. This is unlike the CPU cores which write checkpoints after a time interval that can be set in the client. All folding cores resume at the last checkpoint on a restart if the checkpoint is verified. If it is not, sometimes the client will start over at the very beginning and other times it will fail the WU and dump it.
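As a rough illustration of where a GPU WU resumes, here is a small sketch. It assumes checkpoints fall at exact multiples of a fixed percentage counted from 0%, which is a simplification; the function name is made up for this example.

```python
import math


def resume_percent(paused_at, checkpoint_interval):
    """Return the percent a WU resumes from after a pause.

    Assumes checkpoints at exact multiples of checkpoint_interval
    (in percent), starting from 0%.
    """
    return math.floor(paused_at / checkpoint_interval) * checkpoint_interval


# Compare the typical 2-5% settings against the pause at 47.8% reported above.
for interval in (2, 3, 4, 5):
    print(f"{interval}% checkpoints: resumes at {resume_percent(47.8, interval)}%")
```

Under that assumption, only a 4% interval gives the 44% reported for the p9176 WU above. That is an inference from a single observation, not a confirmed project setting.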