2nd GPU Spiking Up and Down

Driver issues associated with the Windows 10 roll-out



Postby schapman1978 » Sat Apr 25, 2020 4:02 am

I'm noticing some unusual activity on my second GPU slot tonight. It was working on a WU from project 14564 when I noticed this; not sure if it's WU dependent or what. I have a pair of 2080 Tis folding, and the first one's copy activity and CUDA usage are pretty flat and level, as you can see in the images. The second one loads up for a few seconds, drops off, then repeats over and over. I can hear the light coil whine, then it stops, then it restarts. It's not unusual to hear this, but when I started looking at the card activity more closely, I saw these patterns. There is no NVLink bridge between them and they are not set up in SLI. They are both holding steady clocks, and the second card does this whether it's stock, underclocked, or overclocked. It will grind on like this indefinitely, but I'm trying to understand why it keeps loading up and dropping off. It causes a spike of about 125-150W each time. It's strange to me. Any thoughts? I've reinstalled the latest drivers, restarted the client, and left the OpenCL options at default.

Sorry for the zoomed-in size - Imgur did something weird or I got the link code wrong.

Image
Image
Last edited by schapman1978 on Sat Apr 25, 2020 9:28 am, edited 1 time in total.
schapman1978
 
Posts: 35
Joined: Tue Nov 20, 2012 12:12 am

Re: 2nd GPU Spiking Up and Down

Postby PantherX » Sat Apr 25, 2020 5:00 am

It seems that your GPU is being starved of CPU cycles. I would suggest that you pause whatever is causing your CPU to hit 100% usage and see whether the issue goes away.
PantherX
Site Moderator
 
Posts: 6765
Joined: Wed Dec 23, 2009 10:33 am
Location: Land Of The Long White Cloud

Re: 2nd GPU Spiking Up and Down

Postby MeeLee » Sat Apr 25, 2020 5:24 am

Interesting.
There's always some up and down motion, as the GPU writes to VRAM, and waits for PCIE data.
Your first GPU shows this.
Are you running PCIe Gen 2.0 on your motherboard (typically one running DDR3 RAM)? GPU-Z can tell you.
If so, you'll need to configure your system to run at x8 on both GPUs. Running below this might starve the GPU of PCIe bandwidth.
If you're running PCIe 3.0, x8 is preferable, but x4 should work as well.
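
As a rough illustration of why Gen 2.0 at x8 and Gen 3.0 at x4 land in the same ballpark, here is a minimal sketch that computes theoretical one-way link bandwidth from the per-lane transfer rate and line encoding. The function and table names are made up for this example, and the result ignores protocol overhead, so real throughput will be lower.

```python
# Theoretical PCIe bandwidth per direction, ignoring protocol overhead
# beyond the line encoding. Names here are illustrative only.
GENERATIONS = {
    # generation: (transfer rate in GT/s per lane, encoding efficiency)
    2: (5.0, 8 / 10),      # 8b/10b encoding
    3: (8.0, 128 / 130),   # 128b/130b encoding
    4: (16.0, 128 / 130),  # 128b/130b encoding
}


def link_bandwidth_gb_s(generation, lanes):
    """Return theoretical one-way bandwidth in GB/s for a PCIe link."""
    rate_gt_s, efficiency = GENERATIONS[generation]
    # Each transfer carries one bit per lane; divide by 8 to get bytes.
    return rate_gt_s * efficiency / 8 * lanes
```

Calling `link_bandwidth_gb_s(2, 8)` and `link_bandwidth_gb_s(3, 4)` gives two figures close to 4 GB/s, which is why x4 on Gen 3.0 is roughly the floor that x8 is on Gen 2.0.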
MeeLee
 
Posts: 1103
Joined: Tue Feb 19, 2019 11:16 pm

Re: 2nd GPU Spiking Up and Down

Postby Rel25917 » Sat Apr 25, 2020 7:10 am

Do the dips coincide with every 1% or so of the workunit? Could just be a dip while it does its checkpoints.
Rel25917
 
Posts: 302
Joined: Wed Aug 15, 2012 3:31 am

Re: 2nd GPU Spiking Up and Down

Postby schapman1978 » Sat Apr 25, 2020 8:32 am

I paused the CPU slot entirely in FAH, and also tried changing it from 28 threads to 26, 16, etc., and it still occurs. If the CPU is fully paused in FAH, the valleys are much shallower. It's an AMD 3950X, fwiw. With CPU folding paused, it usually runs at about 1-3%, just handling tasks.

I'm also running 32GB of DDR4-3600 at stock CL16 timings, with dual 2080 Tis that run natively at x8/x8 on this X570 board. I have a PCIe 4.0 M.2 drive in slot one, but I'm wondering if the second M.2 drive in slot two might be shorting bandwidth to card 2 for some reason. It's a Gen 3 PCIe M.2 run by the X570 chipset, so it shouldn't, since the CPU dedicates 4 lanes to the chipset, but I'm willing to pop it out and see if you think that makes sense. I keep wondering if the second M.2 is forcing the second GPU slot down to x4 or something.

I'll check in the BIOS whether it's posting x8/x8 when I get up.

And unfortunately the 1% checkpoints aren't coinciding with the constant dips. I wish I could crunch stuff that fast lol... this is pretty rhythmic, every 5-7 seconds or so, I'd guess from memory.

Good ideas - we’re thinking similarly. Open to anything.
schapman1978
 
Posts: 35
Joined: Tue Nov 20, 2012 12:12 am

Re: 2nd GPU Spiking Up and Down

Postby bruce » Sat Apr 25, 2020 8:41 am

The frequency of the checkpoints is defined by the Project Owner.

FAH runs what is called a "sanity check" which gives the analysis a chance to be aborted if the WU is, in fact, unstable. The actual GPU process is suspended briefly when this is processed. It is probably synchronized with the checkpoints, but I'm not sure if that's always true.
bruce
 
Posts: 20019
Joined: Thu Nov 29, 2007 11:13 pm
Location: So. Cal.

Re: 2nd GPU Spiking Up and Down

Postby PantherX » Sat Apr 25, 2020 8:45 am

Get GPU-Z and see what the PCIe utilization is and also the speed that it is operating at from within the OS.
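
If you'd rather read the link state from a script than from GPU-Z, here is a minimal sketch that asks `nvidia-smi` for the current PCIe generation and lane width of each card. It assumes the `nvidia-smi` tool that ships with the NVIDIA driver is on your PATH; the function names are just for this example. Take the reading while the cards are folding, since an idle card may report a lower link generation.

```python
import subprocess

QUERY_FIELDS = "index,name,pcie.link.gen.current,pcie.link.width.current"


def parse_link_rows(csv_text):
    """Parse `nvidia-smi --format=csv,noheader` output into one dict per GPU."""
    rows = []
    for line in csv_text.strip().splitlines():
        # Fields arrive comma-separated in the order given by QUERY_FIELDS.
        index, name, gen, width = [field.strip() for field in line.split(",")]
        rows.append(
            {"index": int(index), "name": name, "gen": int(gen), "width": int(width)}
        )
    return rows


def query_links():
    """Run nvidia-smi and return the current PCIe link state of each GPU."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_link_rows(result.stdout)
```

Looping over `query_links()` and printing `gen` and `width` for each entry should show whether both slots are really running at x8.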
PantherX
Site Moderator
 
Posts: 6765
Joined: Wed Dec 23, 2009 10:33 am
Location: Land Of The Long White Cloud

Re: 2nd GPU Spiking Up and Down

Postby schapman1978 » Sat Apr 25, 2020 8:54 am

Well, I'm up anyway for other reasons - it's still doing it - but in a twist - it's now doing it on the physical first slot card - not the second one. I also screenshotted my GPUz screens showing them at 8x/8x - I guess that defeats my possible 2nd pcie m.2 bandwidth theft theory. Hmm...

Sorry for the delay. The pic sizes are terrible; I'm still working through the coding.
Image
Image
schapman1978
 
Posts: 35
Joined: Tue Nov 20, 2012 12:12 am

Re: 2nd GPU Spiking Up and Down

Postby uyaem » Sat Apr 25, 2020 9:01 am

Are the GPUs working on different projects now? Could it be a project-specific "glitch" (intentional or not)?
Image
CPU: Ryzen 9 3900X (1x21 CPUs) ~ GPU: nVidia GeForce GTX 1660 Super (Asus)
uyaem
 
Posts: 222
Joined: Sat Mar 21, 2020 8:35 pm
Location: Esslingen, Germany

Re: 2nd GPU Spiking Up and Down

Postby schapman1978 » Sat Apr 25, 2020 9:11 am

It also appears that both GPUs can do it at the same time. I wonder if it's just the sanity check happening every 5 or so seconds and might be normal? I've always heard the intermittent noise breaks from the cards, in both single and dual configurations, but never investigated, assuming it was normal behavior. My checkpoints are set at 5m manually, but this is like clockwork every 5 seconds. Looks like I finished a unit and picked up another - they're both now folding a piece of 14564 and both are dipping like that. I wonder if it's just the project?

Image
schapman1978
 
Posts: 35
Joined: Tue Nov 20, 2012 12:12 am

Re: 2nd GPU Spiking Up and Down

Postby PantherX » Sat Apr 25, 2020 9:15 am

For GPUs, the value you configure for checkpoints is ignored. However, for the CPU, the checkpoint value applies. In the case of GPU checkpoints, the researcher sets the checkpoint interval which can vary from 2% to 5% IIRC.
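
A quick back-of-the-envelope check shows why dips every 5 seconds are unlikely to be checkpoints. This is a minimal sketch with a made-up function name: it computes how long a whole WU would take if every dip marked one checkpoint at the given percentage interval.

```python
def implied_wu_duration_s(dip_interval_s, checkpoint_percent):
    """Seconds a whole WU would take if each dip were one checkpoint."""
    # A WU covers 100%, so there are 100 / checkpoint_percent checkpoints.
    checkpoints_per_wu = 100.0 / checkpoint_percent
    return dip_interval_s * checkpoints_per_wu
```

With a 5-second dip interval, a 2% checkpoint spacing would imply a WU finishing in 250 seconds, and a 5% spacing in 100 seconds.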

The drops every 5 seconds are weird. Can you try pausing GPU01 and observing GPU02? Then pause GPU02, unpause GPU01, and observe what happens. I would observe each attempt for 5 minutes.
PantherX
Site Moderator
 
Posts: 6765
Joined: Wed Dec 23, 2009 10:33 am
Location: Land Of The Long White Cloud

Re: 2nd GPU Spiking Up and Down

Postby schapman1978 » Sat Apr 25, 2020 9:25 am

Good idea - I did a short test, pausing one GPU and the CPU along with it so that only one GPU was folding at a time (each scenario for about 30 seconds). Then I repeated the test with the other GPU, first with the CPU running, then with the CPU paused. It exhibits the same spiking behavior. The only difference, regardless of which GPU is running this unit, is that if the CPU is totally paused (not reduced core usage but fully paused), either or both cards ramp up a few percentage points in usage and the dips become significantly less severe. But they always happen on time like a metronome - I think it might be programmed to run this way. Not sure. I'll try a longer sample test but I expect this behavior to persist.
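
One way to put a number on the "metronome" timing is to log GPU load at a fixed rate (for example with GPU-Z's log-to-file option) and measure the spacing between dips. Below is a minimal sketch of that idea, not anything FAH itself provides: the function names and the 80% threshold are assumptions, and the samples are expected as (seconds, load percent) pairs in time order.

```python
from statistics import mean


def dip_start_times(samples, threshold):
    """Return timestamps where load falls from >= threshold to below it."""
    starts = []
    previous_load = None
    for seconds, load in samples:
        # Count only the falling edge so a long dip is recorded once.
        if previous_load is not None and previous_load >= threshold and load < threshold:
            starts.append(seconds)
        previous_load = load
    return starts


def mean_dip_interval(samples, threshold=80.0):
    """Average seconds between consecutive dips, or None with fewer than two."""
    starts = dip_start_times(samples, threshold)
    if len(starts) < 2:
        return None
    gaps = [later - earlier for earlier, later in zip(starts, starts[1:])]
    return mean(gaps)
```

If the result stays near 5 seconds with little spread on both cards and across WUs of the same project, that points toward the work itself rather than one slot or one card.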

I'm also going to reboot everything and see if it replicates. I'm only paying attention because I just jumped to Win10 Pro from Win10 Home tonight before I went to bed. OS swaps always make me paranoid at first. Too bad I can't fire up a VM and run it in Ubuntu or something to see if it's the same.
schapman1978
 
Posts: 35
Joined: Tue Nov 20, 2012 12:12 am

Re: 2nd GPU Spiking Up and Down

Postby PantherX » Sat Apr 25, 2020 9:35 am

I am running Windows 10 Pro 1909 64-bit and this is what it looks like:
Image
PantherX
Site Moderator
 
Posts: 6765
Joined: Wed Dec 23, 2009 10:33 am
Location: Land Of The Long White Cloud

Re: 2nd GPU Spiking Up and Down

Postby schapman1978 » Sat Apr 25, 2020 9:41 am

Right on - I'm going to let these two chunks finish, wait until I get a new project, and see if it's just this particular project. Both my GPUs are folding 14564 and still exhibiting this behavior after a reboot. On other projects I've had a nice level line (with minor variations like yours). If a different project doesn't replicate this issue, maybe I should post something in that "problems with a particular WU" thread? I don't know if it's a problem or not by normal standards. It appears it's going to finish them, but maybe this behavior is causing it to take a lot longer than if it wasn't faceplanting both GPUs every 5 seconds. I'm not a programmer - I'm just thinking out loud.

Here's a GPU-Z sensors shot showing info for both cards, with Task Manager in the same shot - it does this with the GPUs clocked up some or at default settings. They just fold slower and a little quieter at default clocks, but the spikes are still there.
Image
schapman1978
 
Posts: 35
Joined: Tue Nov 20, 2012 12:12 am

Re: 2nd GPU Spiking Up and Down

Postby schapman1978 » Sat Apr 25, 2020 10:26 am

Yeah, I realized it's doing the exact same thing on another workstation I'm folding on here, which is a single 2080 setup. Identical behavior with or without the CPU running, like this machine. I went ahead and put a thread up in the WU section for the owner to take a peek at. I just got my 6th chunk of 14564 and it's doing the same thing. So far, it's been on:
(1440, 0, 1)
(1251, 0, 2)
(341, 0, 2)
(1318, 0, 1)
(745, 0, 3)
(225, 0, 4)

Link to that thread here https://foldingforum.org/viewtopic.php?f=19&t=34797
schapman1978
 
Posts: 35
Joined: Tue Nov 20, 2012 12:12 am
