14564 GPU Spikes / Low Utilization / 5 second Cycles


Postby schapman1978 » Sat Apr 25, 2020 10:19 am

I'm folding the fourth, fifth and sixth chunks of 14564 on two machines across 3 GPUs. I'm logging GPU behavior on both machines (one is a 2-GPU 2080 Ti rig, the other a single-card 2080 rig), where this WU runs at about 30-40% GPU utilization, and every 5 seconds or so the activity drops to about 5-10%, then spikes back up. Each time, this causes a power drop-off for each card and then a power spike as it begins working again. That is a 150-300 W swing every 5 seconds or so on this system (AX1500i power supply on a line-conditioning 1500 VA UPS). This continues for the whole fold on both machines. Both are Win10 Pro machines, and it seems like it *could* be this work unit, as I've not observed this behavior on other WUs. On one machine, it cuts my estimated PPD from 7MM+ to under 4MM, which is a big efficiency hit.
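
To check whether the dips really come every ~5 seconds, the logged utilization can be scanned for dip onsets. This is a minimal sketch, assuming one sample per second (for example from `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits -l 1`) and an assumed 15% threshold that sits between the 5-10% dips and the 30-40% plateau; the function names are illustrative only.

```python
def dip_onsets(samples, threshold=15):
    """Return indices where utilization falls below threshold after being at or above it."""
    onsets = []
    for i in range(1, len(samples)):
        # A dip starts when the previous sample was "busy" and this one is "idle".
        if samples[i] < threshold <= samples[i - 1]:
            onsets.append(i)
    return onsets


def mean_dip_period(samples, interval_s=1.0, threshold=15):
    """Average time between dip onsets, or None if fewer than two dips were seen."""
    onsets = dip_onsets(samples, threshold)
    if len(onsets) < 2:
        return None
    gaps = [b - a for a, b in zip(onsets, onsets[1:])]
    return interval_s * sum(gaps) / len(gaps)


# Made-up utilization log (%), one sample per second, shaped like the behavior above.
log = [35, 38, 36, 40, 8, 37, 39, 35, 36, 7, 38, 36, 40, 37, 6]
print(dip_onsets(log), mean_dip_period(log))
```

With real logs, a period that stays near 5 s across several WUs would point at something regular in the run (such as the checkpointing foldy mentions below) rather than random stalls.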

It looks something like this. I've posted more screenshots of my 2-card rig in another thread here while seeking advice, but now I have 6 instances of this with this work unit across separate machines, so I thought I'd post it here. (Other thread with more pics and troubleshooting: https://foldingforum.org/viewtopic.php?f=101&t=34791 )

So far the units exhibiting this behavior are (Run, Clone, Gen):
(1440, 0, 1)
(1251, 0, 2)
(341, 0, 2)
(1318, 0, 1)
(745, 0, 3)
(225, 0, 4)

schapman1978
 
Posts: 35
Joined: Tue Nov 20, 2012 12:12 am

Re: 14564 GPU Spikes / Low Utilization / 5 second Cycles

Postby Trotador » Sat Apr 25, 2020 11:51 am

Same here, these units are making my Radeon VII fall asleep on average, with some "processing" spikes. Ubuntu 18.04.

So it seems more WU-related.


Trotador
 
Posts: 32
Joined: Sun Feb 17, 2008 7:41 pm

Re: 14564 GPU Spikes / Low Utilization / 5 second Cycles

Postby foldy » Sat Apr 25, 2020 12:02 pm

Project 14564 has an atom count of only 25k, which is too low to fully utilize an RTX 2080 (Ti) or Radeon VII. So this project should preferably be sent to slower GPUs with fewer shaders. Such big-shader GPUs should only get projects with 100k+ atoms. But I guess there are still server overload issues, so you get these or nothing. Another possibility is that the project setup for steps and checkpointing is wrong.
foldy
 
Posts: 2041
Joined: Sat Dec 01, 2012 4:43 pm

Re: 14564 GPU Spikes / Low Utilization / 5 second Cycles

Postby schapman1978 » Sat Apr 25, 2020 2:52 pm

Gotcha - thanks for the heads up!
schapman1978
 
Posts: 35
Joined: Tue Nov 20, 2012 12:12 am

Re: 14564 GPU Spikes / Low Utilization / 5 second Cycles

Postby PantherX » Sat Apr 25, 2020 4:55 pm

foldy wrote:Project 14564 has an atom count of only 25k, which is too low to fully utilize an RTX 2080 (Ti) or Radeon VII. So this project should preferably be sent to slower GPUs with fewer shaders. Such big-shader GPUs should only get projects with 100k+ atoms. But I guess there are still server overload issues, so you get these or nothing. Another possibility is that the project setup for steps and checkpointing is wrong.

Unfortunately, the current system only identifies the GPU architecture, not the GPU model, so it can't differentiate a high-end Pascal from a low-end Pascal. Detecting both the architecture and the model would require extensive changes on the server and client side... with all the attention that F@H has received, let's see what happens later this year :)
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
PantherX
Site Moderator
 
Posts: 6850
Joined: Wed Dec 23, 2009 10:33 am
Location: Land Of The Long White Cloud

Re: 14564 GPU Spikes / Low Utilization / 5 second Cycles

Postby _r2w_ben » Sat Apr 25, 2020 6:31 pm

schapman1978 wrote:I'm folding the fourth, fifth and sixth chunks of 14564 on two machines across 3 GPUs. I'm logging GPU behavior on both machines (one is a 2-GPU 2080 Ti rig, the other a single-card 2080 rig), where this WU runs at about 30-40% GPU utilization, and every 5 seconds or so the activity drops to about 5-10%, then spikes back up. Each time, this causes a power drop-off for each card and then a power spike as it begins working again. That is a 150-300 W swing every 5 seconds or so on this system (AX1500i power supply on a line-conditioning 1500 VA UPS). This continues for the whole fold on both machines. Both are Win10 Pro machines, and it seems like it *could* be this work unit, as I've not observed this behavior on other WUs. On one machine, it cuts my estimated PPD from 7MM+ to under 4MM, which is a big efficiency hit.

Does the CPU usage of FahCore_22.exe spike opposite to the GPU, i.e. high CPU when the GPU is low and vice versa? Try setting the priority of FahCore_22.exe to "Below Normal" and see if that has any effect. It would help Windows give priority to the GPU work unit instead of FahCore_a7.exe.
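
One way to answer that from logged data: sample FahCore_22.exe CPU usage and GPU utilization at the same instants and compute their Pearson correlation. A value near -1 would mean the two move opposite each other. This is a minimal sketch, assuming you already have two time-aligned lists of percentages from whatever monitoring tool you use; the sample values below are made up. On Python 3.10+, `statistics.correlation` computes the same coefficient.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (-1 = perfectly opposite)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 samples")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a flat series has no variation to correlate
    return cov / (sx * sy)


# Made-up, time-aligned samples (%): GPU utilization vs. FahCore_22.exe CPU usage.
gpu = [40, 38, 8, 39, 7]
cpu = [3, 4, 30, 3, 32]
print(round(pearson(gpu, cpu), 3))
```

A strongly negative value would support the priority theory; a value near 0 would suggest the CPU isn't what's starving the GPU.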
_r2w_ben
 
Posts: 281
Joined: Wed Apr 23, 2008 4:11 pm

Re: 14564 GPU Spikes / Low Utilization / 5 second Cycles

Postby PantherX » Sat Apr 25, 2020 6:49 pm

If tweaking the priorities works for you, note that the priorities aren't sticky, so you may need a third-party application to "lock" them. I have previously used Process Lasso (https://bitsum.com/) and it does the job well.
PantherX
Site Moderator
 
Posts: 6850
Joined: Wed Dec 23, 2009 10:33 am
Location: Land Of The Long White Cloud

Re: 14564 GPU Spikes / Low Utilization / 5 second Cycles

Postby foldy » Sat Apr 25, 2020 7:06 pm

@PantherX: Then it would be reasonable to set all NVIDIA RTX GPUs and all AMD Vega/RX 5x00 GPUs to preferably get high-atom-count work units, and leave the low-atom-count work units preferred for NVIDIA GTX GPUs and AMD RX 4x0/RX 5x0.
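
The proposal can be sketched as a simple preference rule. This is purely hypothetical: the family labels, the 100k-atom cut-off (taken from the earlier post in this thread) and the function name are illustrative, and this is not how the F@H assignment servers actually work. Since the servers only see the GPU architecture, a rule like this would have to key on something coarse, such as the family label used here.

```python
# Hypothetical GPU family labels from the proposal; not real F@H identifiers.
HIGH_END_FAMILIES = {"NVIDIA RTX", "AMD Vega", "AMD RX 5x00"}
LARGE_SYSTEM_ATOMS = 100_000  # suggested cut-off for big-shader GPUs


def preferred_order(gpu_family, projects):
    """Sort (project_id, atom_count) pairs so the preferred size class comes first.

    High-end families prefer large systems; every other family prefers small ones.
    Nothing is excluded, matching the "preferred" wording in the proposal.
    """
    wants_large = gpu_family in HIGH_END_FAMILIES

    def mismatch(project):
        _, atoms = project
        is_large = atoms >= LARGE_SYSTEM_ATOMS
        # False (a match) sorts before True (a mismatch); sorted() is stable,
        # so projects keep their original order within each group.
        return is_large != wants_large

    return sorted(projects, key=mismatch)


queue = [("14564", 25_000), ("hypothetical-large", 250_000)]
print(preferred_order("NVIDIA RTX", queue))  # large system first
print(preferred_order("NVIDIA GTX", queue))  # 14564 first
```

Ordering rather than filtering keeps big GPUs busy when only small projects are available, which is the "these or nothing" situation described earlier.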
foldy
 
Posts: 2041
Joined: Sat Dec 01, 2012 4:43 pm

Re: 14564 GPU Spikes / Low Utilization / 5 second Cycles

Postby schapman1978 » Sat Apr 25, 2020 7:52 pm

foldy wrote:@PantherX: Then it would be reasonable to set all NVIDIA RTX GPUs and all AMD Vega/RX 5x00 GPUs to preferably get high-atom-count work units, and leave the low-atom-count work units preferred for NVIDIA GTX GPUs and AMD RX 4x0/RX 5x0.


I completely agree.

I've been in the basement plumbing in new sump pumps before the next storm, so I've been absent this morning. Looking back, though, I was excited when I finally finished the dozen or so 14564 WUs and got something else, which ran normally. Only to come back and see my PPD productivity back to about half, with these units apparently being queued up.

I'll take a look at priority, but it does this even if I pause CPU folding, or let CPU folding finish and leave the CPU idle.

**EDIT** I just checked and _a7 is running about 90-92% of my CPU and _22 has 2 instances running about 3.3% each.
schapman1978
 
Posts: 35
Joined: Tue Nov 20, 2012 12:12 am

Re: 14564 GPU Spikes / Low Utilization / 5 second Cycles

Postby PantherX » Sat Apr 25, 2020 8:07 pm

While that seems to be a plausible idea, it is up to the researchers to make the final decision. Considering that it impacts donors, it might take a while.
PantherX
Site Moderator
 
Posts: 6850
Joined: Wed Dec 23, 2009 10:33 am
Location: Land Of The Long White Cloud

Re: 14564 GPU Spikes / Low Utilization / 5 second Cycles

Postby schapman1978 » Sat Apr 25, 2020 8:20 pm

Yup. I just adjusted the priorities and it shows the same behavior on both GPUs. Worth a shot, though!

Maybe I'll give my machines a couple of days off and see if these get folded by better-suited machines. I'm burning about $60/month in extra electricity, so I hate getting only half the work done for the same resource consumption. Or I might just let it eat. I dunno. I'm more worried about the rubber sealing grommets they didn't pack in my sump pit lid kit... closed till Monday lol... ah well.

I've just picked up 2 more of these 14564 units. I wonder: if I drop the advanced flag, will they stop landing in my lap and eating up my light bill for half the productivity?
schapman1978
 
Posts: 35
Joined: Tue Nov 20, 2012 12:12 am

Re: 14564 GPU Spikes / Low Utilization / 5 second Cycles

Postby _r2w_ben » Sat Apr 25, 2020 8:29 pm

schapman1978 wrote:I've just picked up 2 more of these 14564 units. I wonder: if I drop the advanced flag, will they stop landing in my lap and eating up my light bill for half the productivity?

These are in advanced at the moment so that would help until they're released to full FAH. When that occurs, you could hop back to advanced to avoid them.
_r2w_ben
 
Posts: 281
Joined: Wed Apr 23, 2008 4:11 pm

Re: 14564 GPU Spikes / Low Utilization / 5 second Cycles

Postby schapman1978 » Sat Apr 25, 2020 8:58 pm

Good to know. I got a juicy 500k+ WU in advanced the other day; I’d hate to miss those tho lol
schapman1978
 
Posts: 35
Joined: Tue Nov 20, 2012 12:12 am

Re: 14564 GPU Spikes / Low Utilization / 5 second Cycles

Postby DarkFoss » Sun Apr 26, 2020 4:01 pm

I've had 3 so far. The first completed without error, the second bombed out at 94% with some NaN error, and the third had a different error but managed to complete. All on a Fury X (Fiji XT); projects 14561-14563 all fold fine using the latest 20.4.2 driver. I can dig through the logs and post the errors if you'd like.
DarkFoss
 
Posts: 84
Joined: Sat Apr 17, 2010 12:43 am
Location: DG,IL

