14564 GPU Spikes / Low Utilization / 5 second Cycles

Moderators: Site Moderators, FAHC Science Team

schapman1978
Posts: 35
Joined: Mon Nov 19, 2012 11:12 pm

14564 GPU Spikes / Low Utilization / 5 second Cycles

Post by schapman1978 »

I'm folding the fourth, fifth and sixth chunks of 14564 on two machines and across 3 GPUs. I'm logging GPU behavior on both machines (one is a dual 2080 Ti rig, the other a single 2080 rig). This WU runs at about 30-40% GPU utilization, and every 5 seconds or so the activity drops to about 5-10% and then spikes back up. Each card's power draw drops off and then spikes as it begins working again, which makes for a 150-300 W swing every 5 seconds or so on this system (AX1500i power supply on a line-conditioning 1500 VA UPS). This is ongoing for the whole fold on both machines. Both are Win10 Pro machines, and it seems like it *could* be this work unit, as I've not observed this behavior on other WUs. It cuts my estimated PPD output from 7MM+ to under 4MM on one machine, which hurts efficiency.
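For anyone who wants to check their own logs for the same pattern, here is a minimal sketch of how the dip cycle could be measured. It assumes the utilization log has already been read into a plain list of percentages sampled once per second; the function names, the 15% threshold and the synthetic trace are illustrative assumptions, not output from any real monitoring tool.

```python
from statistics import mean


def find_dips(samples, threshold=15.0):
    """Return the start index of every run where utilization is below threshold."""
    dips = []
    in_dip = False
    for i, value in enumerate(samples):
        if value < threshold and not in_dip:
            dips.append(i)      # first sample of a new dip
            in_dip = True
        elif value >= threshold:
            in_dip = False      # back to normal load
    return dips


def mean_dip_interval(samples, sample_period_s=1.0, threshold=15.0):
    """Average time in seconds between the starts of consecutive dips, or None."""
    dips = find_dips(samples, threshold)
    if len(dips) < 2:
        return None
    gaps = [b - a for a, b in zip(dips, dips[1:])]
    return mean(gaps) * sample_period_s


# Synthetic 1 Hz trace shaped like the description above (assumed values):
# roughly 35% busy, dropping to about 8% on every fifth sample.
trace = [35, 38, 33, 36, 8] * 4
print(find_dips(trace))          # [4, 9, 14, 19]
print(mean_dip_interval(trace))  # 5.0
```

A steady interval close to 5 seconds across different cards and machines would point at the work unit rather than at one particular system.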

It looks something like this. I posted more screenshots of my 2-card rig in another thread here while seeking advice before posting, but now I have 6 instances of this with this work unit across separate machines, so I thought I'd put it here. (Other thread with more pics and troubleshooting: viewtopic.php?f=101&t=34791 )

So far the units exhibiting this behavior are:
(1440, 0, 1)
(1251, 0, 2)
(341, 0, 2)
(1318, 0, 1)
(745, 0, 3)
(225, 0, 4)

Image
Trotador
Posts: 32
Joined: Sun Feb 17, 2008 6:41 pm

Re: 14564 GPU Spikes / Low Utilization / 5 second Cycles

Post by Trotador »

Same here. These units leave my Radeon VII mostly asleep on average, with some "processing" spikes. This is on Ubuntu 18.04.

So it seems to be WU-related.


Image
foldy
Posts: 2061
Joined: Sat Dec 01, 2012 3:43 pm
Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441

Re: 14564 GPU Spikes / Low Utilization / 5 second Cycles

Post by foldy »

Project 14564 has an atom count of only 25k, which is too low to fully utilize an RTX 2080 (Ti) or Radeon VII. So this project should preferably be sent to slower GPUs with fewer shaders. GPUs with that many shaders should only get projects with 100k+ atoms. But I guess there are still server overload issues, so you get these or nothing. Another possibility is that the project's setup for steps per run and checkpointing is wrong.
schapman1978
Posts: 35
Joined: Mon Nov 19, 2012 11:12 pm

Re: 14564 GPU Spikes / Low Utilization / 5 second Cycles

Post by schapman1978 »

Gotcha - thanks for the heads up!
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud

Re: 14564 GPU Spikes / Low Utilization / 5 second Cycles

Post by PantherX »

foldy wrote: Project 14564 has an atom count of only 25k, which is too low to fully utilize an RTX 2080 (Ti) or Radeon VII. So this project should preferably be sent to slower GPUs with fewer shaders. GPUs with that many shaders should only get projects with 100k+ atoms. But I guess there are still server overload issues, so you get these or nothing. Another possibility is that the project's setup for steps per run and checkpointing is wrong.
Unfortunately, the current system identifies only the GPU architecture, not the GPU model. Thus, it cannot differentiate a high-end Pascal card from a low-end Pascal card. Detecting both the GPU architecture and the model would require extensive changes on the server and client side... with all the attention that F@H has received, let's see what happens later this year :)
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
_r2w_ben
Posts: 285
Joined: Wed Apr 23, 2008 3:11 pm

Re: 14564 GPU Spikes / Low Utilization / 5 second Cycles

Post by _r2w_ben »

schapman1978 wrote: I'm folding the fourth, fifth and sixth chunks of 14564 on two machines and across 3 GPUs. I'm logging GPU behavior on both machines (one is a dual 2080 Ti rig, the other a single 2080 rig). This WU runs at about 30-40% GPU utilization, and every 5 seconds or so the activity drops to about 5-10% and then spikes back up. Each card's power draw drops off and then spikes as it begins working again, which makes for a 150-300 W swing every 5 seconds or so on this system (AX1500i power supply on a line-conditioning 1500 VA UPS). This is ongoing for the whole fold on both machines. Both are Win10 Pro machines, and it seems like it *could* be this work unit, as I've not observed this behavior on other WUs. It cuts my estimated PPD output from 7MM+ to under 4MM on one machine, which hurts efficiency.
Does the CPU usage for FahCore_22.exe spike opposite to the GPU, i.e. high CPU when the GPU is low and vice versa? Try setting the priority of FahCore_22.exe to "Below Normal" and see if that has any effect. It would help Windows give priority to the GPU unit instead of FahCore_a7.exe.
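One rough way to answer the "opposite spikes" question from logged data is to correlate the two series. The sketch below is only an illustration: it assumes CPU and GPU usage were sampled at the same instants into two equal-length lists, and the sample values are made up to show the shape of an anti-correlated trace.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 samples")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Made-up samples: GPU % dips exactly where FahCore_22.exe CPU % jumps.
gpu_pct = [35, 38, 33, 36, 8, 35, 37, 34, 36, 7]
cpu_pct = [3, 3, 4, 3, 20, 3, 3, 4, 3, 22]

r = pearson(cpu_pct, gpu_pct)
print(f"correlation: {r:.2f}")
```

A value close to -1 would mean the CPU side is busy exactly while the GPU waits; a value near 0 would suggest the dips are not tied to CPU load.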
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am

Re: 14564 GPU Spikes / Low Utilization / 5 second Cycles

Post by PantherX »

If tweaking the priorities works for you, note that the priorities aren't sticky, so you may need to use a third-party application to "lock" the priority. I have previously used Process Lasso (https://bitsum.com/) and it does the job well.
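As a do-it-yourself alternative, a small polling script can re-apply the priority whenever it has been reset. This is only a sketch under assumptions: it relies on the third-party psutil package (not something mentioned in this thread), the `watch` loop is Windows-only because it uses psutil's Windows priority-class constant, and the 30-second interval is arbitrary.

```python
import time


def reapply_priority(processes, target_name, priority):
    """Set `priority` on each process named `target_name`; return how many changed."""
    changed = 0
    for proc in processes:
        if proc.name().lower() == target_name.lower():
            if proc.nice() != priority:   # nice() with no argument reads the value
                proc.nice(priority)       # nice(value) sets it
                changed += 1
    return changed


def watch(target_name="FahCore_22.exe", interval_s=30):
    """Poll forever and keep the target at Below Normal (Windows, needs psutil)."""
    import psutil  # third-party package, assumed to be installed

    while True:
        try:
            reapply_priority(
                psutil.process_iter(),
                target_name,
                psutil.BELOW_NORMAL_PRIORITY_CLASS,
            )
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            pass  # a process exited or is protected; try again on the next pass
        time.sleep(interval_s)
```

Calling `watch()` from a console would keep running until interrupted. `reapply_priority` accepts any objects that offer `name()` and `nice()`, so it can be tried out without touching real processes.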
foldy
Posts: 2061
Joined: Sat Dec 01, 2012 3:43 pm

Re: 14564 GPU Spikes / Low Utilization / 5 second Cycles

Post by foldy »

@PantherX: Then it would be reasonable to have all Nvidia RTX GPUs and all AMD Vega/RX5x00 GPUs preferentially receive high-atom-count work units, and to leave the low-atom-count work units preferentially for Nvidia GTX GPUs and AMD RX4x0/RX5x0.
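To make the proposal concrete, here is a hypothetical sketch of such a preference rule. It is not how the F@H assignment servers work; the family labels, the function name and the second project entry are invented for illustration. Only the 25k atom count of 14564 and the 100k cutoff come from this thread.

```python
# Families proposed above for large work units (labels are made up for this sketch).
HIGH_END_FAMILIES = {"nvidia-rtx", "amd-vega", "amd-rx5x00"}
LARGE_ATOM_THRESHOLD = 100_000  # the "100k+" figure suggested earlier in the thread


def rank_projects(gpu_family, projects):
    """Order projects so the preferred size class comes first.

    Non-preferred projects stay in the list as a fallback, matching the
    "preferred" (not exclusive) wording of the proposal.
    """
    wants_large = gpu_family in HIGH_END_FAMILIES

    def mismatch(project):
        is_large = project["atoms"] >= LARGE_ATOM_THRESHOLD
        return is_large != wants_large  # False sorts before True

    return sorted(projects, key=mismatch)


projects = [
    {"project": 14564, "atoms": 25_000},             # atom count quoted in this thread
    {"project": "example-large", "atoms": 150_000},  # hypothetical large project
]

print([p["project"] for p in rank_projects("nvidia-rtx", projects)])
print([p["project"] for p in rank_projects("nvidia-gtx", projects)])
```

Ranking instead of filtering means a fast card would still receive a small project when nothing larger is available, which fits the "you get these or nothing" situation described earlier.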
schapman1978
Posts: 35
Joined: Mon Nov 19, 2012 11:12 pm

Re: 14564 GPU Spikes / Low Utilization / 5 second Cycles

Post by schapman1978 »

foldy wrote: @PantherX: Then it would be reasonable to have all Nvidia RTX GPUs and all AMD Vega/RX5x00 GPUs preferentially receive high-atom-count work units, and to leave the low-atom-count work units preferentially for Nvidia GTX GPUs and AMD RX4x0/RX5x0.
I completely agree.

I've been in the basement plumbing in new sump pumps before the next storm, so I've been absent this morning. Looking back though, I was excited when I finally finished the dozen or so 14564 WUs and got anything else, all of which ran like normal. Only to come back and see my PPD is back to half or so, with these units apparently being queued up again.

I'll take a look at priority, but it does this even if I pause the CPU folding, or finish CPU folding and let the CPU sit idle.

**EDIT** I just checked: _a7 is using about 90-92% of my CPU, and _22 has 2 instances using about 3.3% each.
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am

Re: 14564 GPU Spikes / Low Utilization / 5 second Cycles

Post by PantherX »

While that seems to be a plausible idea, it is up to the researchers to make the final decision. Considering that it does impact Donors, it might take a while.
schapman1978
Posts: 35
Joined: Mon Nov 19, 2012 11:12 pm

Re: 14564 GPU Spikes / Low Utilization / 5 second Cycles

Post by schapman1978 »

Yup. I just adjusted priorities and it exhibits the same behavior on both GPUs. Worth a shot though!

Maybe I'll give my machines a couple of days off and see if these get folded by better-situated machines. I'm burning about $60/month in extra electricity so I hate for only half the work to get done for the same resource consumption. Or I might just let it eat. I dunno. More worried about the rubber sealing grommets they didn't pack in my sump pit lid kit... closed til Monday lol... ah well.
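To put a rough number on that, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that the whole $60/month applies to a machine whose output drops from 7MM to 4MM PPD and that a month has 30 days; the real split across my machines is different.

```python
def cost_per_million_points(monthly_cost_usd, ppd, days=30):
    """Electricity cost in USD per one million points at a steady PPD."""
    points_per_month = ppd * days
    return monthly_cost_usd / (points_per_month / 1_000_000)


normal = cost_per_million_points(60, 7_000_000)   # typical WUs
slow = cost_per_million_points(60, 4_000_000)     # while folding 14564

print(f"normal: ${normal:.3f} per million points")
print(f"14564:  ${slow:.3f} per million points")
print(f"ratio:  {slow / normal:.2f}x")
```

Under those assumptions the same electricity buys a bit over half the points, so each point costs about 1.75 times as much.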

I've just picked up 2 more of these 14564 units. I wonder whether dropping the advanced flags would stop them landing in my lap and eating up my light bill for half the productivity?
_r2w_ben
Posts: 285
Joined: Wed Apr 23, 2008 3:11 pm

Re: 14564 GPU Spikes / Low Utilization / 5 second Cycles

Post by _r2w_ben »

schapman1978 wrote: I've just picked up 2 more of these 14564 units. I wonder whether dropping the advanced flags would stop them landing in my lap and eating up my light bill for half the productivity?
These are in advanced at the moment so that would help until they're released to full FAH. When that occurs, you could hop back to advanced to avoid them.
schapman1978
Posts: 35
Joined: Mon Nov 19, 2012 11:12 pm

Re: 14564 GPU Spikes / Low Utilization / 5 second Cycles

Post by schapman1978 »

Good to know. I got a juicy 500k+ WU in advanced the other day, and I'd hate to miss those tho lol
DarkFoss
Posts: 103
Joined: Fri Apr 16, 2010 11:43 pm
Hardware configuration: AMD 5800X3D Asus ROG Strix X570-E Gaming WiFi II bios 5003 G-Skill TridentZ Neo 3600mhz Asrock Tachi RX 7900XTX Corsair rm850x psu Asus PG32UQXR EK Elite 360 D-rgb aio Win 11pro/Kubuntu 22.04.4 LTS UPS BX1500G
Location: Galifrey

Re: 14564 GPU Spikes / Low Utilization / 5 second Cycles

Post by DarkFoss »

I've had 3 so far. The first completed without error, the second bombed out at 94% with some NaN error, and the third had a different error but managed to complete. All on a Fury X (Fiji XT). The 14561-3 projects all fold fine using the latest 20.4.2 driver. I can dig through the logs and post the errors if you'd like.