random system hang

djgibbons
Posts: 38
Joined: Tue Sep 24, 2013 11:02 pm

random system hang

Post by djgibbons »

I have started getting random system hangs, or freezes. They happen after the monitors turn off due to inactivity and the GPUs start folding (on the LIGHT setting). Everything is fine for 5-8 hours, and then the system becomes unresponsive. I get the same problem if I run folding on the MEDIUM setting for 1-6 hours. For now I have stopped the monitors from turning off (screen saver only), and the system is very stable. But I don't think the GPUs do much folding with the screen saver on.

My computer is about 2 years old and otherwise robust. I have the latest update for the WIN10 OS. I have an Nvidia Quadro RTX4000 driving 2 monitors, and a Quadro K4200 I keep just for folding. I have contacted Nvidia about this behaviour to make sure the cards are not too hot and are still running within their design parameters.

I have updated all the drivers and BIOS in the past month. I also use Restoro to do some clean-up, but this only delays the hang for a little while.

Are there any known issues with 2 cards? Should I have each card driving one monitor?
JimboPalmer
Posts: 2573
Joined: Mon Feb 16, 2009 4:12 am
Location: Greenwood MS USA

Re: random system hang

Post by JimboPalmer »

If you go to Control Panel and search for power, you will find the power and screen settings. Reduce the time before the screen powers down and make sure the PC never sleeps.
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
djgibbons
Posts: 38
Joined: Tue Sep 24, 2013 11:02 pm

Re: random system hang

Post by djgibbons »

Done. I will post the results when I have some.
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud

Re: random system hang

Post by PantherX »

Just wondering if you have installed Nvidia drivers from the official site? Reason is that if you have done the feature update to 20H2, then you might be using Microsoft Drivers which are known to be troublesome when it comes to folding. Using the drivers from Nvidia would resolve that issue.
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
djgibbons
Posts: 38
Joined: Tue Sep 24, 2013 11:02 pm

Re: random system hang

Post by djgibbons »

I did get a hang about 3 hours after changing the screen settings with reduced power down time. I already had the PC at not sleeping. When I checked the driver version it matched Nvidia's latest, but I reinstalled it as a clean installation to be sure. What I did notice is that the file path given as the first step of the process was historically correct, but I can't find the actual directory based on the driver version name (460.89). So, something has changed with this RTX change-over.

Could there be a problem if the 2 graphics cards don't know that the other one exists?
JimboPalmer
Posts: 2573
Joined: Mon Feb 16, 2009 4:12 am
Location: Greenwood MS USA

Re: random system hang

Post by JimboPalmer »

https://www.techpowerup.com/gpu-specs/q ... 4000.c3336
is going to pull 165 watts and hopes for a 450 watt power supply.

https://www.techpowerup.com/gpu-specs/q ... 4200.c2602
pulls another 108 Watts, so a 600 Watt Power supply may be needed. (there is a real chance 550 watts is enough)

How many watts is your power supply rated for?
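
As a rough check of the arithmetic above, here is a minimal Python sketch. The two GPU figures are the ones quoted in this post; the 250 W allowance for the CPU, drives and fans is an assumption for illustration, not a measurement, and the function names are made up for the example.

```python
# Rough power-budget check. GPU wattages are the figures quoted in this post.
GPU_WATTS = {"Quadro RTX 4000": 165, "Quadro K4200": 108}

# ASSUMPTION: allowance for CPU, motherboard, drives and fans (not measured).
REST_OF_SYSTEM_WATTS = 250


def gpu_load_watts(gpus):
    """Sum the rated board power of every GPU in the mapping."""
    return sum(gpus.values())


def headroom_watts(psu_watts, gpus, rest_of_system_watts):
    """Watts left over after the GPUs and the rest of the system are counted."""
    return psu_watts - gpu_load_watts(gpus) - rest_of_system_watts


if __name__ == "__main__":
    print("GPUs alone:", gpu_load_watts(GPU_WATTS), "W")
    for psu in (450, 550, 600):
        spare = headroom_watts(psu, GPU_WATTS, REST_OF_SYSTEM_WATTS)
        print(f"{psu} W supply: {spare:+d} W headroom")
```

With that assumed allowance, a 550 W supply leaves only a slim margin, which is why 600 W is the safer figure.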

And in a similar vein, is your PC overheating?

Both Speccy and GPU-Z can measure Temps.

https://www.ccleaner.com/speccy/download/standard

https://www.techpowerup.com/gpuz/
djgibbons
Posts: 38
Joined: Tue Sep 24, 2013 11:02 pm

Re: random system hang

Post by djgibbons »

I have a 1000W power supply, so there should be plenty of juice. I have GPU-Z installed per Nvidia's request and have sent them data. I also use CPUID HWMonitor to get an idea of where things are at. My case has 3 intake fans and 3 exhaust fans, with another on the CPU cooler (heat pipe). I see no signs of overheating, unless I have exceeded the temperature limit on one or both of the cards. I have a JPG image of steady state conditions while running FAH at MEDIUM. How do I attach it here?
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am

Re: random system hang

Post by PantherX »

djgibbons wrote:...When I checked the driver version it matched Nvidia's latest, but I reinstalled it as a clean installation to be sure. What I did notice is that the file path given as the first step of the process was historically correct, but I can't find the actual directory based on the driver version name (460.89). So, something has changed with this RTX change-over...
Apologies for not providing a bit more context... the driver that Microsoft would install would show the "right" value. The issue is that it would only do the Driver install and not the additional OpenCL stuff that is normally packaged in Nvidia's Driver. However, I don't know of a way to identify if the driver was installed by Microsoft or Nvidia... it's only when F@H encounters issues and a re-install magically fixes it.
djgibbons wrote:...Could there be a problem if the 2 graphics cards don't know that the other one exists?
For F@H, as long as it sees two supported GPUs, it will continue to work without issues.
For the drivers, as long as both GPUs are supported by the same driver package, it would work fine without issues... theoretically speaking.

Have you tried to fold only on 1 GPU at a time to test it out?
djgibbons wrote:...I have a JPG image of steady state conditions while running FAH at MEDIUM. How do I attach it here?
The forum doesn't provide an image hosting feature. Instead, you can upload the image to a third-party site and then share the link here. A commonly used one is: https://imgur.com/
djgibbons
Posts: 38
Joined: Tue Sep 24, 2013 11:02 pm

Re: random system hang

Post by djgibbons »

I had installed the latest driver straight from Nokia. I heard back from them, and all the data I sent them showed normal operation of both cards. My next step is to disable one of them to see if that helps.
JimboPalmer
Posts: 2573
Joined: Mon Feb 16, 2009 4:12 am
Location: Greenwood MS USA

Re: random system hang

Post by JimboPalmer »

Is Nokia a typo? I would not use a driver from Nokia.
djgibbons
Posts: 38
Joined: Tue Sep 24, 2013 11:02 pm

Re: random system hang

Post by djgibbons »

Sorry, Nvidia. One of my company's customers is Nokia, and they are very demanding. So, I disabled the K4200 as it is the oldest card and has been repaired once before under warranty. I then set FAH to Medium and it has run without a hang now for almost 24 hours. If I can't use the K4200 for FAH, I might as well remove it from my computer.
JimboPalmer
Posts: 2573
Joined: Mon Feb 16, 2009 4:12 am
Location: Greenwood MS USA

Re: random system hang

Post by JimboPalmer »

It may be possible the K4200 will work in another PC. (this assumes you have more than one PC)
djgibbons
Posts: 38
Joined: Tue Sep 24, 2013 11:02 pm

Re: random system hang

Post by djgibbons »

I do have another desktop, as well as 2 older laptops and an all-in-one. These four machines are running FAH on High setting without issues, so I might let sleeping dogs lie. I am also running FAH on my work computer at Medium, so this gives me 12 clients on average. It would be nice to have the K4200 in the mix, but I am still talking to Nvidia to see if there is any way to test it for internal defects that can cause a random hang. What would help me the most, I think, is a tool that can log all system component behaviours every few seconds to see if we can capture the moment a hang hits.
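
A minimal sketch of the kind of logger described above, covering the GPUs only. It assumes nvidia-smi (which ships with the Nvidia driver) is on the PATH and polls it every few seconds. The field list, file name and function names are choices made for this example, and an older card may report some fields as unsupported. Each sample is forced to disk immediately so that the last line written before a hang survives the reset.

```python
import csv
import os
import subprocess
import time

# Fields requested from nvidia-smi; this selection is an assumption.
FIELDS = ["timestamp", "index", "name", "temperature.gpu",
          "power.draw", "utilization.gpu", "clocks.sm"]


def build_command(fields=FIELDS):
    """Build the nvidia-smi call that prints one CSV line per GPU."""
    return ["nvidia-smi",
            "--query-gpu=" + ",".join(fields),
            "--format=csv,noheader,nounits"]


def parse_rows(output, fields=FIELDS):
    """Turn nvidia-smi CSV output into one dict per GPU, values kept as text."""
    rows = []
    for line in output.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        if len(values) == len(fields):  # skip anything malformed
            rows.append(dict(zip(fields, values)))
    return rows


def log_forever(path="gpu_log.csv", interval_s=5):
    """Append a sample every interval_s seconds until interrupted."""
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields_or_default())
        if new_file:
            writer.writeheader()
        while True:
            result = subprocess.run(build_command(),
                                    capture_output=True, text=True)
            writer.writerows(parse_rows(result.stdout))
            f.flush()
            os.fsync(f.fileno())  # push to disk before a possible hang
            time.sleep(interval_s)


def fields_or_default():
    """Column order for the CSV file (same as the query order)."""
    return list(FIELDS)


if __name__ == "__main__":
    log_forever()
```

After a hang, the last rows in gpu_log.csv show what each card reported in the final seconds. This does not cover the CPU, motherboard or disks, so it only narrows things down on the GPU side.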
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: random system hang

Post by bruce »

With the right drivers, the K4200 is supported. Professional graphics boards are especially designed to be powerful on FP64 computations. FAH uses mainly FP32 calculations plus a small percentage of FP64. It should work, though.
djgibbons
Posts: 38
Joined: Tue Sep 24, 2013 11:02 pm

Re: random system hang

Post by djgibbons »

The drivers are the latest that Nvidia offers. And both cards use the same one, so there shouldn't be a conflict. I know that Nvidia offers an interface to connect 2 cards to make them work together, but have not tried it. Would FAH see them as 2 slots if tied together?

I am holding onto the idea that there is an interface problem where the traffic on the motherboard hits a jam because the 2 graphics cards are processing so much more data than the CPU, but the K4200 is lagging both the RTX4000 and the CPU.

I am also considering that having 2 monitors on the RTX4000 and none on the K4200 could present an imbalance of sorts. It might be worth a test to put my 1K monitor on the K4200 and keep the 4K monitor on the RTX4000, just for shiggles.