GPU or WU issue?

Posted: Wed Jun 07, 2017 12:12 pm
by hiigaran
A few minutes ago, I noticed my folding rig started making more noise every few seconds. A quick investigation showed that the GPU0 fan, which is set to run at 100%, had apparently been running much faster for brief periods, going from 4200 RPM at what is supposed to be 100% to 6000 RPM. Checking the temperatures revealed that the card was hitting 95°C. Another card with exactly the same make, model, airflow setup and fan speed settings showed 70°C under load. Naturally, I have paused the GPU until the cause is found.

The WU's PRCG at the time was 9415 (192, 0, 303). Has anyone else been having temperature issues with similar WUs?

Also, does this mean I'm not truly running my fans at 100%, or is it actually possible to push the fans harder? If the latter, anyone know how on Linux (naturally, it would reduce the lifespan, but hey, folding cards don't last long anyway)?

Re: GPU or WU issue?

Posted: Wed Jun 07, 2017 5:03 pm
by bruce
It's difficult to diagnose without access to the actual hardware.

In fact, you may be seeing the early signs of a fan failure. There have been a number of fan bearing (lubrication) failures that plague some of the least expensive fans on the market. Whether you actually have a fan with a long or short life depends on which components your GPU's manufacturer used, BUT the first sign of this type of failure is an intermittent "strange" noise. Personally, I'd get an RMA for the GPU and let them figure it out ... or if that's not possible, get the fan replaced with a better quality/newer fan.

Re: GPU or WU issue?

Posted: Wed Jun 07, 2017 7:02 pm
by hiigaran
The noise isn't strange in the sense that it sounds like it's dying. It was the kind of noise you would hear from faster fans and increased airflow. No scraping, no clinking, nothing of that kind.

Oddly enough, the temperature issue went away after I paused the slot, waited for the card to cool down, then resumed the same WU.

No idea what happened, but I can say with confidence that the fan does not show any symptoms of dying or underperforming. The other 7 cards in the cluster sound exactly the same.

Re: GPU or WU issue?

Posted: Wed Jun 07, 2017 9:05 pm
by SteveWillis
I have a GPU (out of 11) that intermittently spikes into the mid-90s for a few minutes, then drops back down to the low 80s. It may or may not be significant that it is GPU 0. I don't worry about it, since most of the time it is within an acceptable range, and I hope that if it does fail, it's within the warranty period. You might try taking the side panel off your case and directing a strong fan into it. Even with good case fans, ventilation is kind of iffy. My best temperatures are with open-air cases with fans blowing across them.
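With 8 or 11 identical cards, the most useful signal is one card running well above its siblings, as in the first post (95°C vs 70°C). Below is a minimal sketch that flags such a card. It assumes NVIDIA cards with `nvidia-smi` installed (the thread doesn't say which vendor). The 15°C margin and the function names `read_gpus`, `parse_readings` and `find_outliers` are hypothetical choices, not part of any FAH tool.

```python
import statistics
import subprocess

# Assumes NVIDIA cards and the nvidia-smi tool; other vendors need a different query.
QUERY = ["nvidia-smi",
         "--query-gpu=index,temperature.gpu,fan.speed",
         "--format=csv,noheader,nounits"]


def parse_readings(csv_text):
    """Turn 'index, temperature, fan' lines into (index, temp_c, fan) tuples."""
    readings = []
    for line in csv_text.strip().splitlines():
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != 3 or not fields[1].isdigit():
            continue  # skip malformed lines or cards that report no temperature
        # The fan value is kept as text because some cards report "[N/A]".
        readings.append((int(fields[0]), int(fields[1]), fields[2]))
    return readings


def find_outliers(readings, margin_c=15):
    """Return indices of cards running at least margin_c degrees hotter than the median card."""
    if not readings:
        return []
    median = statistics.median(temp for _, temp, _ in readings)
    return [idx for idx, temp, _ in readings if temp - median >= margin_c]


def read_gpus():
    """Run nvidia-smi once and return the parsed readings (requires NVIDIA drivers)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_readings(result.stdout)
```

Calling `find_outliers(read_gpus())` every minute or so from a loop or cron job would catch a spike like this while it is happening. The median is used rather than the mean so that a single hot card doesn't pull the baseline up and hide itself.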

Re: GPU or WU issue?

Posted: Wed Jun 14, 2017 9:54 am
by hiigaran
I, uhh...don't exactly have a side panel...

In any case, I haven't noticed any other issues since then from any of my cards. Still curious if setting fans past 100% is actually possible from the OS though.

Re: GPU or WU issue?

Posted: Wed Jun 14, 2017 11:39 am
by foldy
I guess it is not possible, because the GPU driver only exposes fan values of 0-100%, and the GPU BIOS maps each of those to a fan RPM. You could edit the GPU BIOS to set a higher RPM for 100% if the fan supports it, but by default the BIOS maps 100% to the highest RPM the fan can do. The fan spike to 6000 RPM you were hearing may have been initiated directly by the GPU BIOS because the temperature reached 95°C, which may be the hard limit set in the BIOS for your GPU.
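To make the hypothesis above concrete, here is a toy model of a fan table with an emergency override. The linear mapping, the clamping and all three constants are assumptions: the RPM values are the ones reported in the first post, and real BIOS fan tables are vendor-specific and usually non-linear.

```python
# Hypothetical values from the first post: ~4200 RPM at a requested 100%,
# ~6000 RPM observed when the card hit 95 °C. Not read from any real BIOS.
MAX_RPM_AT_100 = 4200
EMERGENCY_RPM = 6000
TEMP_LIMIT_C = 95


def fan_rpm(requested_pct, temp_c):
    """Model the RPM a card would run for a requested 0-100% value at a given temperature."""
    pct = max(0, min(100, requested_pct))  # the OS-visible setting cannot exceed 100%
    if temp_c >= TEMP_LIMIT_C:
        # Assumed BIOS override: ignore the requested value at the hard limit.
        return EMERGENCY_RPM
    return round(MAX_RPM_AT_100 * pct / 100)
```

Under this model, requesting more than 100% from the OS changes nothing (`fan_rpm(150, 70)` still gives 4200), and the only path to 6000 RPM is the temperature override, which would match the brief bursts described in the first post.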

Re: GPU or WU issue?

Posted: Wed Jun 14, 2017 11:54 am
by hiigaran
Hmm, so that would mean that even at the default 100%, the fan speed is still significantly limited by PWM. I wonder if it would be easier to plug the fans directly into a Molex adapter/mod for a constant 12 V. I would assume these fans are tested for continuous 12 V operation, which would mean there wouldn't be as significant a drop in lifespan as I had originally thought.

Of course, that's a lot of assumptions...

Re: GPU or WU issue?

Posted: Wed Jun 14, 2017 6:37 pm
by jrweiss
Since this is only one of several identical units, you may have a failure of the thermal paste seal between your GPU and its heat sink. If you're adept, you could disassemble it, clean it, and renew the thermal paste.

If all your BIOS settings (computer and GPU, as applicable) are set for continuous 100%, there should be no PWM limiting. However, to test it, you could plug the GPU fan into a 3-pin (non-PWM) fan connector.