Proj 13420 same variability as 13418


Postby Ichbin3 » Tue Jul 28, 2020 12:20 pm

It looks like project 13420 is showing the same variability as 13418, meaning some work units fold fast and some fold more slowly.
@JohnChodera - would you consider increasing the base credit as you did for 13418?
For example, 13420 (3082, 36, 0), which I am folding right now, is a slow one.
Asus Z87-A, i5-4570, RTX 2080Ti_Rev.A@200W, Mint 19.3
Ichbin3
 
Posts: 80
Joined: Thu May 28, 2020 9:06 am
Location: Germany

Re: Proj 13420 same variability as 13418

Postby HaloJones » Tue Jul 28, 2020 12:56 pm

TBH, I've not seen much variability in 13420. I've done 16 of them so far and they've all been around the same PPD.
1x Titan X, 5x 1070, 1x 970, 1 x Ryzen 3600

HaloJones
 
Posts: 869
Joined: Thu Jul 24, 2008 11:16 am

Re: Proj 13420 same variability as 13418

Postby Ichbin3 » Tue Jul 28, 2020 3:54 pm

13420 (3082, 36, 0)
The normal time for a 13420 is indeed 1:02 TPF.
This one ran at 1:21, and I was not using the computer at the time.
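
For a rough sense of scale, the gap between those two TPF values can be worked out with a minimal Python sketch. The helper names `tpf_to_seconds` and `slowdown_percent` are made up for illustration and are not part of any FAH tool; the sketch only compares frame times and makes no assumption about how points are calculated.

```python
def tpf_to_seconds(tpf: str) -> int:
    """Convert a time-per-frame string such as '1:02' (m:ss) to seconds."""
    minutes, seconds = tpf.split(":")
    return int(minutes) * 60 + int(seconds)


def slowdown_percent(normal_tpf: str, slow_tpf: str) -> float:
    """Return how much longer a frame takes, as a percentage of the normal TPF."""
    normal = tpf_to_seconds(normal_tpf)
    slow = tpf_to_seconds(slow_tpf)
    return (slow - normal) / normal * 100.0


# TPF values reported in this post: 1:02 normal, 1:21 for the slow WU.
print(f"{slowdown_percent('1:02', '1:21'):.1f}% longer per frame")
```

With the reported values this is 62 s against 81 s per frame, so the slow WU needs roughly 31% more time per frame.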
Ichbin3
 
Posts: 80
Joined: Thu May 28, 2020 9:06 am
Location: Germany

Re: Proj 13420 same variability as 13418

Postby JohnChodera » Tue Jul 28, 2020 7:36 pm

Thanks for the heads-up. I suspect some GPUs see much more variability than others.

I've incremented the base credit for 13420-1 by 10% to help compensate for this variability.

We're still investigating how we can further minimize this in our setup or through changes to OpenMM.

Thanks for bearing with us!

~ John Chodera // MSKCC
JohnChodera
Pande Group Member
 
Posts: 406
Joined: Fri Feb 22, 2013 10:59 pm

Re: Proj 13420 same variability as 13418

Postby Ichbin3 » Tue Jul 28, 2020 7:46 pm

Thanks for listening ;-)
Ichbin3
 
Posts: 80
Joined: Thu May 28, 2020 9:06 am
Location: Germany

Re: Proj 13420 same variability as 13418

Postby gunnarre » Tue Jul 28, 2020 7:54 pm

Thank you. I noticed that 13421 was projected to make just 52k PPD on a GPU which usually does between 70k and 95k PPD.
gunnarre
 
Posts: 181
Joined: Sun May 24, 2020 8:23 pm
Location: Norway

Re: Proj 13420 same variability as 13418

Postby Neil-B » Tue Jul 28, 2020 9:15 pm

hmmm ... I start to wonder ... bumping ppds up because of variability on some cards (and then only variability of some WUs) ... tbh begins to feel less and less point in even keeping track of points ... cpu projects delivering >20% less than was normal across the board ... gpu projects being bumped up by 10% on a single request ... perhaps the "cpu is irrelevant message" has some grounds ... probably close down the team tbh and just fold for anonymous :)
1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent, Quadro K420 1GB, FAH 7.6.13
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro, Quadro M1000M 2GB, FAH 7.6.13
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro, GTX 750Ti 2GB, FAH 7.6.13
Neil-B
 
Posts: 1409
Joined: Sun Mar 22, 2020 6:52 pm
Location: UK

Re: Proj 13420 same variability as 13418

Postby gunnarre » Tue Jul 28, 2020 9:40 pm

My CPU normally makes twice as many points as that GPU, so CPUs don't feel irrelevant for folding for me. In fact, if the PPD drops much lower, it would be better to shut the GPU down and let the CPU run more threads. As long as the points are roughly equivalent to the science benefit of running the work units, they're doing their job of rewarding the most effective folding configurations.
gunnarre
 
Posts: 181
Joined: Sun May 24, 2020 8:23 pm
Location: Norway

Re: Proj 13420 same variability as 13418

Postby Neil-B » Tue Jul 28, 2020 10:14 pm

Sorry, but imho boosting base points to "make up for" a few variable WUs on some GPUs makes a mockery of the equivalent science benefit argument - actually rewarding lack of performance !!

I use rolling PPD averages to monitor my kit (and to spot issues in beta testing). An overnight drop of 10-15 percent (275k per day to 250k per day) helped me identify the performance impact of certain Intel firmware patches. Since then (the last few months), a variety of projects have degraded "normal" for CPU points so that the 250k PPD is now under 200k most days. So over the last few months my server is obviously delivering over 20% less scientific benefit. It feels like the time will come sooner rather than later when it will not be considered to be delivering any scientific value, at which point I'll retire it ... maybe the new ARM/Android folders can take up the slack ;)
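
The rolling-average monitoring described above could look something like the following minimal Python sketch. It assumes one PPD reading per day; the function names, the 3-day window, and the sample readings are hypothetical and chosen only for illustration, not taken from any actual log.

```python
from collections import deque


def rolling_mean(values, window):
    """Yield the mean of the last `window` values once the window is full."""
    buf = deque(maxlen=window)
    total = 0.0
    for v in values:
        if len(buf) == window:
            total -= buf[0]  # this value is evicted by the append below
        buf.append(v)
        total += v
        if len(buf) == window:
            yield total / window


def percent_drop(baseline, current):
    """Return the drop from `baseline` to `current` as a percentage of baseline."""
    return (baseline - current) / baseline * 100.0


# Hypothetical daily PPD readings: steady at 275k, then stepping down to 250k.
daily_ppd = [275_000, 275_000, 275_000, 250_000, 250_000, 250_000]
averages = list(rolling_mean(daily_ppd, window=3))

# Compare the newest rolling average with the oldest one.
print(f"drop vs. baseline: {percent_drop(averages[0], averages[-1]):.1f}%")
```

Smoothing over a few days keeps a single slow WU from looking like a real regression, while a sustained step down still shows up within one window length.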
Neil-B
 
Posts: 1409
Joined: Sun Mar 22, 2020 6:52 pm
Location: UK

Re: Proj 13420 same variability as 13418

Postby JohnChodera » Tue Jul 28, 2020 10:44 pm

> hmmm ... I start to wonder ... bumping ppds up because of variability on some cards (and then only variability of some WUs) ... tbh begins to feel less and less point in even keeping track of points ... cpu projects delivering >20% less than was normal across the board ... gpu projects being bumped up by 10% on a single request ... perhaps the "cpu is irrelevant message" has some grounds ... probably close down the team tbh and just fold for anonymous :)


We had adjusted the previous projects in the series (which are almost identical) upwards after lots of reports, and the internal testers saw less variation on a small number of test projects before we had to go live. I'm comfortable bringing the base credit back up since we had many reports of this before and no good data that things had improved _except_ for no reports of variation during testing.

We have some ideas for how to reduce variability going forward, but we've been focusing on the science until we can get the infrastructure fully automated and can turn our attention back to these issues.

~ John Chodera // MSKCC
JohnChodera
Pande Group Member
 
Posts: 406
Joined: Fri Feb 22, 2013 10:59 pm

Re: Proj 13420 same variability as 13418

Postby aetch » Wed Jul 29, 2020 2:06 am

I'm assuming that when a work unit is completed and the results are uploaded back to F@H, somewhere in there is a log of the actual hardware the work unit ran on. Hopefully you'll have a big enough sample to look at individual GPUs, separate out the fast and slow units, and figure out what makes them different.
1). Ryzen 9 3900x, RTX 2070 Super, 16GB, Win 10, F@H 7.6.13
2). i7-4770K, GTX 1080 Ti, 16GB, Win 7, F@H 7.6.13
aetch
 
Posts: 50
Joined: Thu Jun 25, 2020 4:04 pm

Re: Proj 13420 same variability as 13418

Postby gunnarre » Wed Jul 29, 2020 12:53 pm

These types of GPU work units (low atom count?) seem to benefit from being fed by a CPU with high single-core clocks. Typical gaming/graphics-oriented systems with a fast GPU and a stock-cooled CPU might actually make more PPD by stopping CPU folding while these WUs are running on the GPU, so the CPU can clock up to max stock "boost" frequencies on the single core polling the GPU. PPD/watt would also be better.

Production-oriented systems with a modest GPU and many CPU cores likely won't benefit from stopping CPU folding, especially if the CPU is well cooled and "boost" is switched off (so it runs all cores at the same frequency). In those systems, the CPU can be faster than the GPU, and in some cases adding more threads to the CPU gives more PPD than configuring the GPU for folding - at least until CUDA support hopefully reduces CPU usage while GPU folding.
gunnarre
 
Posts: 181
Joined: Sun May 24, 2020 8:23 pm
Location: Norway

Re: Proj 13420 same variability as 13418

Postby bruce » Wed Jul 29, 2020 6:05 pm

The variability HAS been reduced. Taken as a group, projects 13420 and 13421 are less variable. The really short WUs are now being assigned to slower GPUs and the fatter ones are retained in 13420. That allows the average points for each group to be consistent with the GPU performance of half of the spectrum of FAH GPUs. It does not remove all variability when you consider the overall variability of a spectrum of P134xx assignments.

In this case, the union of projects 13420 and 13421 represents a wide variety of projects, just as Project MoonShot represents a wide variety of suggested protein fragments.
bruce
 
Posts: 20009
Joined: Thu Nov 29, 2007 11:13 pm
Location: So. Cal.

Re: Proj 13420 same variability as 13418

Postby Ichbin3 » Thu Jul 30, 2020 2:04 pm

Got another slow one right now:
13420 (6171, 23, 0)
For all the people who say there aren't any ;-)
Ichbin3
 
Posts: 80
Joined: Thu May 28, 2020 9:06 am
Location: Germany

Re: Proj 13420 same variability as 13418

Postby JohnChodera » Thu Jul 30, 2020 4:26 pm

Thanks, Ichbin3! We're still working on this.

~ John Chodera // MSKCC
JohnChodera
Pande Group Member
 
Posts: 406
Joined: Fri Feb 22, 2013 10:59 pm
