
14564 GPU Spikes / Low Utilization / 5 second Cycles

Posted: Sat Apr 25, 2020 9:19 am
by schapman1978
I'm folding the fourth, fifth, and sixth chunks of 14564 on two machines and across 3 GPUs. I'm logging GPU behavior on both machines (one is a 2-GPU 2080 Ti rig, the other a single-card 2080 rig), and this WU runs at about 30-40% GPU utilization; every 5 seconds or so the activity drops to about 5-10%, then spikes back up. This causes a power drop-off for each card each time, followed by a spike in power as it begins working again - a 150-300W swing every 5 seconds or so on this system (AX1500i power supply on a line-conditioning 1500VA UPS). This goes on for the whole fold on both machines. Both are Win10 Pro machines, and it seems like it *could* be this work unit, as I've not observed this behavior on other WUs. Efficiency-wise, it cuts my estimated PPD output from 7MM+ to under 4MM on one machine.
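For anyone who wants to log the same thing from a console, a quick polling loop like the one below would show the dips. This is just a rough sketch, not the tool behind my screenshots, and it assumes nvidia-smi is on the PATH (it ships with the NVIDIA driver on Windows as well).

Code:
# Poll nvidia-smi once per second and log GPU load + power draw per card.
import subprocess, time, datetime

while True:
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,utilization.gpu,power.draw",
         "--format=csv,noheader"],
        capture_output=True, text=True).stdout.strip()
    stamp = datetime.datetime.now().strftime("%H:%M:%S")
    print(stamp, out.replace("\n", " | "))
    time.sleep(1)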

It looks something like this - I've posted more screenshots of my 2-card rig in another thread here, where I was seeking advice prior to posting, but now I have 6 instances of this with this work unit across separate machines, so I thought I'd put it here. (Other thread with more pics and troubleshooting: viewtopic.php?f=101&t=34791 )

So far the units exhibiting this behavior are
(1440, 0, 1)
(1251, 0, 2)
(341, 0, 2)
(1318, 0, 1)
(745, 0, 3)
(225, 0, 4)

[screenshot: GPU utilization and power cycling every ~5 seconds]

Re: 14564 GPU Spikes / Low Utilization / 5 second Cycles

Posted: Sat Apr 25, 2020 10:51 am
by Trotador
Same here, these units mostly put my Radeon VII to sleep, with only occasional "processing" spikes. Ubuntu 18.04.

So it does seem to be WU-related.


[screenshot: Radeon VII utilization graph]

Re: 14564 GPU Spikes / Low Utilization / 5 second Cycles

Posted: Sat Apr 25, 2020 11:02 am
by foldy
Project 14564 has only a 25k atom count, which is too low to fully utilize an RTX 2080 (Ti) or Radeon VII. This project should be sent preferentially to slower GPUs with fewer shaders; GPUs with that many shaders should only get 100k+ atom count projects. But I guess there are still server overload issues, so you get these or nothing. Another possibility is that the project's setup for run steps and checkpointing is wrong.

Re: 14564 GPU Spikes / Low Utilization / 5 second Cycles

Posted: Sat Apr 25, 2020 1:52 pm
by schapman1978
Gotcha - thanks for the heads up!

Re: 14564 GPU Spikes / Low Utilization / 5 second Cycles

Posted: Sat Apr 25, 2020 3:55 pm
by PantherX
foldy wrote:Project 14564 has only a 25k atom count, which is too low to fully utilize an RTX 2080 (Ti) or Radeon VII. This project should be sent preferentially to slower GPUs with fewer shaders; GPUs with that many shaders should only get 100k+ atom count projects. But I guess there are still server overload issues, so you get these or nothing. Another possibility is that the project's setup for run steps and checkpointing is wrong.
Unfortunately, with the current system, only the GPU architecture is identified, not the GPU model. Thus, there's no way to differentiate a high-end Pascal from a low-end Pascal. Detecting both the GPU architecture and model would require extensive changes on the server and client side... with all the attention that F@H got, let's see what happens later this year :)

Re: 14564 GPU Spikes / Low Utilization / 5 second Cycles

Posted: Sat Apr 25, 2020 5:31 pm
by _r2w_ben
schapman1978 wrote:I'm folding the fourth, fifth, and sixth chunks of 14564 on two machines and across 3 GPUs. I'm logging GPU behavior on both machines (one is a 2-GPU 2080 Ti rig, the other a single-card 2080 rig), and this WU runs at about 30-40% GPU utilization; every 5 seconds or so the activity drops to about 5-10%, then spikes back up. This causes a power drop-off for each card each time, followed by a spike in power as it begins working again - a 150-300W swing every 5 seconds or so on this system (AX1500i power supply on a line-conditioning 1500VA UPS). This goes on for the whole fold on both machines. Both are Win10 Pro machines, and it seems like it *could* be this work unit, as I've not observed this behavior on other WUs. Efficiency-wise, it cuts my estimated PPD output from 7MM+ to under 4MM on one machine.
Does the CPU usage for FahCore_22.exe spike opposite to the GPU, i.e. high CPU when the GPU is low and vice versa? Try setting the priority of FahCore_22.exe to "Below Normal" and see if that has any effect. It would help Windows give priority to the GPU unit over FahCore_a7.exe.
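If you'd rather script it than dig through Task Manager, something like this would do it. A sketch only: it assumes the third-party psutil package is installed, and I haven't run it against FAH myself.

Code:
# Set every running FahCore_22.exe to Below Normal priority on Windows.
# Assumes: pip install psutil
import psutil

for p in psutil.process_iter(["name"]):
    if (p.info["name"] or "").lower() == "fahcore_22.exe":
        p.nice(psutil.BELOW_NORMAL_PRIORITY_CLASS)  # Windows-only priority class
        print("Set PID", p.pid, "to Below Normal")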

Re: 14564 GPU Spikes / Low Utilization / 5 second Cycles

Posted: Sat Apr 25, 2020 5:49 pm
by PantherX
If tweaking the priorities works for you, then do note that the priorities aren't sticky, so you may need to use a third-party application to "lock" the priority. I have previously used Process Lasso: https://bitsum.com/ and it does the job well.
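If you'd rather not install a separate tool, a small watcher loop along these lines would keep re-applying the priority whenever a fresh FahCore_22.exe starts. Again just a sketch built on psutil, not something I've tested against FAH; I use Process Lasso myself.

Code:
# Re-check every 30 seconds so a FahCore_22.exe spawned for the next WU
# doesn't fall back to its default priority. Assumes psutil is installed.
import time
import psutil

while True:
    for p in psutil.process_iter(["name"]):
        if (p.info["name"] or "").lower() == "fahcore_22.exe":
            try:
                if p.nice() != psutil.BELOW_NORMAL_PRIORITY_CLASS:
                    p.nice(psutil.BELOW_NORMAL_PRIORITY_CLASS)
            except psutil.NoSuchProcess:
                pass  # the core exited between listing and the priority call
    time.sleep(30)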

Re: 14564 GPU Spikes / Low Utilization / 5 second Cycles

Posted: Sat Apr 25, 2020 6:06 pm
by foldy
@PantherX: Then it would be reasonable to have all NVIDIA RTX GPUs and all AMD Vega/RX 5x00 GPUs preferentially get high atom count work units, and leave the low atom count work units for NVIDIA GTX GPUs and AMD RX 4x0/RX 5x0.

Re: 14564 GPU Spikes / Low Utilization / 5 second Cycles

Posted: Sat Apr 25, 2020 6:52 pm
by schapman1978
foldy wrote:@PantherX: Then it would be reasonable to have all NVIDIA RTX GPUs and all AMD Vega/RX 5x00 GPUs preferentially get high atom count work units, and leave the low atom count work units for NVIDIA GTX GPUs and AMD RX 4x0/RX 5x0.
I completely agree.

I've been in the basement plumbing in new sump pumps before the next storm, so I've been absent this morning. Looking back, though, I was excited when I finally finished the dozen or so 14564 WUs and got anything else - those run like normal - only to come back and see my PPD is back to half or so, with more of these units apparently queued up.

I'll take a look at priority, but it does this even if I pause CPU folding, or finish CPU folding and let it sit idle.

**EDIT** I just checked: FahCore_a7 is using about 90-92% of my CPU, and FahCore_22 has 2 instances running at about 3.3% each.

Re: 14564 GPU Spikes / Low Utilization / 5 second Cycles

Posted: Sat Apr 25, 2020 7:07 pm
by PantherX
While that seems to be a plausible idea, it is up to the researchers to make the final decision. Considering that it does impact Donors, it might take a while.

Re: 14564 GPU Spikes / Low Utilization / 5 second Cycles

Posted: Sat Apr 25, 2020 7:20 pm
by schapman1978
Yup. I just adjusted priorities and it exhibits the same behavior on both GPUs. Worth a shot though!

Maybe I'll give my machines a couple of days off and see if these get folded by better-suited machines. I'm burning about $60/month in extra electricity, so I hate to see only half the work get done for the same resource consumption. Or I might just let it eat. I dunno. More worried about the rubber sealing grommets they didn't pack in my sump pit lid kit... closed 'til Monday lol... ah well.

I've just picked up 2 more of these 14564 units - I wonder, if I drop the advanced flag, whether they'll stop landing in my lap and eating up my light bill for half the productivity?

Re: 14564 GPU Spikes / Low Utilization / 5 second Cycles

Posted: Sat Apr 25, 2020 7:29 pm
by _r2w_ben
schapman1978 wrote:I've just picked up 2 more of these 14564 units - I wonder, if I drop the advanced flag, whether they'll stop landing in my lap and eating up my light bill for half the productivity?
These are on advanced at the moment, so dropping the flag would help until they're released to the full FAH pool. When that occurs, you could hop back to advanced to avoid them.
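If I remember the v7 client right, that's the client-type setting - either via the client's extra options or directly in config.xml, roughly like this (double-check against your own config before editing):

Code:
<config>
  <!-- delete this line (or change 'advanced' to 'normal') to leave the advanced pool -->
  <client-type v='advanced'/>
</config>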

Re: 14564 GPU Spikes / Low Utilization / 5 second Cycles

Posted: Sat Apr 25, 2020 7:58 pm
by schapman1978
Good to know. I got a juicy 500k+ WU in advanced the other day - I'd hate to miss those tho lol

Re: 14564 GPU Spikes / Low Utilization / 5 second Cycles

Posted: Sun Apr 26, 2020 3:01 pm
by DarkFoss
I've had 3 so far: one completed without error, the second bombed out at 94% with some NaN error, and the third had a different error but managed to complete. All on a Fury X (Fiji XT); projects 14561-3 all fold fine using the latest 20.4.2 driver. I can dig through the logs and post the errors if you'd like.