980ti Issue?

It seems that a lot of GPU problems revolve around specific versions of drivers. Though NVidia has their own support structure, you can often learn from information reported by others who fold.

Moderators: Site Moderators, FAHC Science Team

abravo
Posts: 16
Joined: Tue Aug 04, 2015 1:55 pm

980ti Issue?

Post by abravo »

I am finally going to post regarding my issue, because I can't seem to solve it myself.

I am experiencing a "crash" (not sure what else to call it) with my EVGA 980ti Classified. After I start Folding@home, the computer (a dedicated folding rig) runs fine for a while. Then, at varying times after Windows 10 turns the display off automatically, the computer stops folding on the CPU and the display will not wake from its turned-off state. If I disable the "turn off display" setting in Windows 10, the computer does not "crash," which has led me to believe that the problem is related to the GPU. The only way to get the display back is to hit the reset button and reboot the computer.

I have tried the following Nvidia drivers: 362.00, 378.49, and 372.90. I am currently running 372.90 because it looked like that driver had the highest PPD for my 980ti.

Just for background, I have seven other computers that are Folding, and none of them experience this type of "crash" after their displays turn off. I do not have another 980ti, but I have three 960s, three 980s, and one 1080.

Does anyone have a theory as to why the 980ti rig is crashing?
PS3EdOlkkola
Posts: 184
Joined: Tue Aug 26, 2014 9:48 pm
Hardware configuration: 10 SMP folding slots on Intel Phi "Knights Landing" system, configured as 24 CPUs/slot
9 AMD GPU folding slots
31 Nvidia GPU folding slots
50 total folding slots
Average PPD/slot = 459,500
Location: Dallas, TX

Re: 980ti Issue?

Post by PS3EdOlkkola »

Sounds like your system is going into a power-saving mode after a period of time. You may want to set your power profile to "performance," which prevents the system from dropping into a lower power state and, as a result, stopping folding. You should still be able to let the system turn your monitor off and continue to fold normally.
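One way to confirm which power plan is actually active is to read the output of `powercfg /getactivescheme`. The sketch below is only an illustration, assuming the stock Windows "High performance" plan (GUID `8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c`); custom or OEM plans use other GUIDs and would not match. The function names are made up for this example.

```python
import re
import subprocess

# GUID of the stock Windows "High performance" plan. Custom or OEM plans
# have different GUIDs, so a mismatch does not prove power saving is on.
HIGH_PERFORMANCE_GUID = "8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c"

_GUID = re.compile(r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")


def active_scheme_guid(powercfg_output):
    """Return the first GUID found in the powercfg output, or None."""
    match = _GUID.search(powercfg_output)
    return match.group(0).lower() if match else None


def is_high_performance(powercfg_output):
    """True if the reported plan is the stock High performance plan."""
    return active_scheme_guid(powercfg_output) == HIGH_PERFORMANCE_GUID


def query_active_scheme():
    """Run powercfg (Windows only) and return its raw text output."""
    result = subprocess.run(
        ["powercfg", "/getactivescheme"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

# Typical use on the folding rig itself (Windows only):
#   print(is_high_performance(query_active_scheme()))
```

Matching on the GUID rather than the plan name avoids depending on the display language of Windows.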
Hardware config viewtopic.php?f=66&t=17997&p=277235#p277235
abravo
Posts: 16
Joined: Tue Aug 04, 2015 1:55 pm

Re: 980ti Issue?

Post by abravo »

Thanks for the fast reply.

Well, I guess what I said above is no longer current. I just went down to check on my computer, which had been folding in Windows "High Performance" mode with the display set to never turn off, and it had "crashed" just like before. The odd thing is that I have a Corsair H110i GT on my 4930K with the RGB block color set to change based on CPU temperature, and the color was somewhere between its idle color (blue) and its folding color (white). So it appears that the CPU is still doing something in the "crashed" state. The other strange thing is that the fans on the 980ti are still spinning and the card is generating a little heat, but nowhere near what it would at full folding load. So it seems the GPU is also still doing something in the "crashed" state and is not completely idle.

I am going to pause the 4930K from folding, just let the 980ti fold, and allow the display to turn off. I want to see what happens if I take the CPU out of the scenario.

I will report back.
PS3EdOlkkola
Posts: 184
Joined: Tue Aug 26, 2014 9:48 pm
Hardware configuration: 10 SMP folding slots on Intel Phi "Knights Landing" system, configured as 24 CPUs/slot
9 AMD GPU folding slots
31 Nvidia GPU folding slots
50 total folding slots
Average PPD/slot = 459,500
Location: Dallas, TX

Re: 980ti Issue?

Post by PS3EdOlkkola »

Are you using a GPU utility such as Afterburner or GPU-Z to check the temperature of the 980ti? It could be that the 980ti is overheating and crashing the display driver. Sometimes the driver will recover on its own, but not always. Also, if the 980ti is overclocked, you might want to lower the GPU clock to stock speeds to see if that has an impact. Another consideration: if the GPU is not getting enough power, it can drop from 3D mode to 2D mode, which lowers the GPU clock rate to just above idle (405MHz, if I recall correctly), so it will still fold, just a lot more slowly. The scenario could be this: the GPU overheats and/or lacks sufficient power to fold at 3D performance levels, the display driver crashes and recovers but can't resync the monitor, and the GPU continues to fold in 2D mode. That would explain the moderate amount of heat generated by the GPU and why your CPU still shows it's folding.
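If a GUI tool is not convenient on a headless folding rig, the same readings can be logged from the command line with `nvidia-smi`, which ships with the NVIDIA driver. The following is a rough sketch, not a tested monitoring tool: it assumes the query fields `temperature.gpu`, `clocks.sm`, and `power.draw` are supported on the card, reads only the first GPU, and uses the 405MHz figure recalled above as the idle clock. The function names and the 50MHz margin are invented for illustration.

```python
import subprocess

QUERY_FIELDS = "temperature.gpu,clocks.sm,power.draw"
IDLE_CLOCK_MHZ = 405  # figure quoted from memory in the post; not verified


def _to_float(text):
    """Convert a CSV field to float; unsupported fields become None."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_sample(line):
    """Parse one 'temp, clock, power' CSV line from nvidia-smi."""
    temp, clock, power = [field.strip() for field in line.split(",")]
    return {
        "temp_c": _to_float(temp),
        "sm_clock_mhz": _to_float(clock),
        "power_w": _to_float(power),
    }


def looks_like_2d_mode(sample, margin_mhz=50):
    """Guess whether the GPU has dropped to near-idle clocks."""
    clock = sample["sm_clock_mhz"]
    return clock is not None and clock <= IDLE_CLOCK_MHZ + margin_mhz


def read_sample():
    """Query the first GPU once; requires nvidia-smi on the PATH."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_sample(result.stdout.splitlines()[0])

# Typical use: call read_sample() every few seconds and append the result,
# with a timestamp, to a file that survives the hard reset.
```

Logging to disk matters here because the display is unavailable once the "crash" happens; the last few samples before the hang show whether temperature or clock speed changed first.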
artoar_11
Posts: 657
Joined: Sun Nov 22, 2009 8:42 pm
Hardware configuration: AMD R7 3700X @ 4.0 GHz; ASUS ROG STRIX X470-F GAMING; DDR4 2x8GB @ 3.0 GHz; GByte RTX 3060 Ti @ 1890 MHz; Fortron-550W 80+ bronze; Win10 Pro/64
Location: Bulgaria/Team #224497/artoar11_ALL_....

Re: 980ti Issue?

Post by artoar_11 »

Try this:
NVIDIA Control Panel -> Manage 3D settings -> Power management mode -> Prefer maximum performance
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 980ti Issue?

Post by bruce »

Check the Windows Event Log. Is a GPU error noted at whatever time the monitor seems to hang?

Are all your systems using the same version of Windows and the same driver version (if applicable)?
abravo
Posts: 16
Joined: Tue Aug 04, 2015 1:55 pm

Re: 980ti Issue?

Post by abravo »

I am getting the following error every four seconds (for several minutes) until the crash:

System
  Provider
    [ Name] Display
  EventID 4101
    [ Qualifiers] 0
  Level 3
  Task 0
  Keywords 0x80000000000000
  TimeCreated
    [ SystemTime] 2017-02-06T22:42:06.505990300Z
  EventRecordID 63486
  Channel System
  Computer DESKTOP-**********
  Security
EventData
  nvlddmkm
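The "every four seconds" cadence can be checked from the log itself by comparing the SystemTime values of consecutive 4101 events. Below is a minimal Python sketch, assuming the timestamps have been copied out of Event Viewer as strings in the format shown above. The function names are invented for this example, and the second timestamp in the usage lines is hypothetical, not taken from the actual log.

```python
import re
from datetime import datetime, timezone

# Event Viewer shows more fractional digits than datetime can hold,
# so the fraction is captured separately and cut to microseconds.
_TS = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z$")


def parse_system_time(text):
    """Parse a SystemTime string such as 2017-02-06T22:42:06.505990300Z."""
    match = _TS.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized SystemTime: {text!r}")
    base, fraction = match.group(1), match.group(2) or "0"
    micro = int(fraction[:6].ljust(6, "0"))
    stamp = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    return stamp.replace(microsecond=micro, tzinfo=timezone.utc)


def gaps_in_seconds(timestamps):
    """Return the gaps between consecutive events, oldest first."""
    times = sorted(parse_system_time(t) for t in timestamps)
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


events = [
    "2017-02-06T22:42:06.505990300Z",  # from the log above
    "2017-02-06T22:42:10.505990300Z",  # hypothetical next event
]
print(gaps_in_seconds(events))  # [4.0]
```

A steady gap would suggest the driver is being reset on a fixed timeout over and over, while irregular gaps would point more toward an intermittent hardware problem.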
abravo
Posts: 16
Joined: Tue Aug 04, 2015 1:55 pm

Re: 980ti Issue?

Post by abravo »

artoar_11 wrote: Try this:
NVIDIA Control Panel -> Manage 3D settings -> Power management mode -> Prefer maximum performance
I tried this, but it did not solve the problem.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 980ti Issue?

Post by bruce »

Microsoft says this about EventID 4101:
Windows Vista and Windows Server 2008 can detect when the graphics hardware or device driver take longer than expected to complete an operation. When this happens, Windows attempts to preempt the operation, and restore the display system to a usable state by resetting the graphics adapter. Typically, the only noticeable effect from this is a flicker of the display due to the reset and subsequent screen redraw. For more information, see "Timeout Detection and Recovery of GPUs through WDDM" at http://go.microsoft.com/fwlink/?linkid=77531 on the Microsoft Web site.
In the past, there have been instances where increasing the TDR timeout in the Registry was recommended as a short-term solution. FAH took steps to avoid this timeout event in the cases reported.

In fact, this situation depends on the WU, the FAHCore, the characteristics of the GPU that is driving the Windows monitor, and any other software that might be using the GPU (i.e., it might be caused by non-FAH activities). With an issue this complex, they might have missed some combinations that apply to your system.

Gathering all the necessary information and submitting it to Development is one route we can take. I'm going to guess that setting that FAH slot to run with "idle" = "true" may work. Permanently increasing the TDR timeout is a possible workaround, too.
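For anyone trying the TDR workaround, the timeout is controlled by the `TdrDelay` value (a REG_DWORD, in seconds) under `HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers`; when the value is absent, Windows uses a default of 2 seconds. The sketch below only builds the text of a `.reg` file so the change can be reviewed before it is imported. The function name is made up, the 10-second figure is an arbitrary illustration rather than a recommendation, and a reboot is normally needed before a new value takes effect.

```python
TDR_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"


def tdr_reg_file(delay_seconds):
    """Return .reg file text that sets TdrDelay to delay_seconds."""
    if isinstance(delay_seconds, bool) or not isinstance(delay_seconds, int):
        raise ValueError("delay_seconds must be an integer")
    if not 0 < delay_seconds <= 0xFFFFFFFF:
        raise ValueError("delay_seconds must fit in a positive DWORD")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{TDR_KEY}]",
        # DWORD values in .reg files are written as 8 hex digits.
        f'"TdrDelay"=dword:{delay_seconds:08x}',
        "",
    ]
    return "\n".join(lines)


# Illustrative value only: 10 seconds instead of the 2-second default.
print(tdr_reg_file(10))
```

Generating the file instead of writing to the Registry directly keeps the change visible and easy to undo: deleting the `TdrDelay` value restores the default behavior.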
foldy
Posts: 2061
Joined: Sat Dec 01, 2012 3:43 pm
Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441

Re: 980ti Issue?

Post by foldy »

I have a GPU which crashed the driver during folding and during games and left the PC with a black screen. Even though the GPU did not run too hot, the solution was to renew the thermal paste. Since then it never crashed again.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 980ti Issue?

Post by bruce »

foldy wrote: Even though the GPU did not run too hot, the solution was to renew the thermal paste.
Aha. It was running too hot, but the thermal sensors didn't detect it.

It's time to design GPUs with thermal sensors on the chip itself.