Overclocked GPU Projects dropping with error. -- YES.

If you think it might be a driver problem, see viewforum.php?f=79

Moderators: Site Moderators, FAHC Science Team

HaloJones
Posts: 920
Joined: Thu Jul 24, 2008 10:16 am

Re: Overclocked GPU Projects dropping with error. -- YES.

Post by HaloJones »

The latest 1xxx and 2xxx Nvidia cards boost automatically depending on their BIOS settings for power limit and cooling. The better the cooling, the higher the card will boost. That means there is headroom in every Nvidia card so long as you're prepared to improve the cooling or accept louder fans. All my cards are water-cooled - most with custom loops - and they typically run 30C below Nvidia's max temp, so all boost far higher than their manufacturer-rated numbers.

They still need some TLC and tweaking sometimes, but saying a GPU can't be overclocked safely is nonsense.
single 1070

clapanse
Posts: 12
Joined: Thu Apr 02, 2020 11:08 pm

Re: Overclocked GPU Projects dropping with error. -- YES.

Post by clapanse »

Yeah, cooling is definitely a huge part of it. Part of the reason I suspect my 1080 Ti runs so well at 2050MHz is that it's water cooled: it stays under 62-63C even at 300W, and normally stays at 47-53C when folding.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: GPU Projects dropping with error

Post by bruce »

clapanse wrote:However, where Folding is more demanding is in tolerance of small errors - Furmark will still run with an occasional bad bit or small error. Folding will not tolerate any errors at all, so what seems to be stable in furmark may end up not being stable after all. In no way is folding as demanding of a workload for the GPU though.
Furmark is great if you plan to run games. You'll probably never notice if one light-blue pixel happens to be the wrong shade of light blue. When you're running science, that might be a critical bit, so yes, FAH is intolerant of errors -- whether you happen to mentally classify them as scientifically small or large.
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud
Contact:

Re: GPU Projects dropping with error

Post by PantherX »

clapanse wrote:...In addition, many people here have stated that folding is a harder workload than furmark. This is plainly and obviously false. Most folding WUs run between 65 and 100% TDP on my 1080ti, with some occasionally hitting 115% or so (I have the power limit set to 120%). If I remember right, the high load ones are 14415 and similar (the myosin projects) - I've seen those consistently hitting my GPU harder than most other projects. Even so, my GPU is able to maintain 1950MHz or above on basically any folding workload. Furmark on the other hand is able to push my GPU all the way to its power cap at 120% TDP with the card clocked all the way down at 1750MHz. No folding workload is this demanding. Not even close...
I don't accept the notion that higher TDP means more stressful. Rather, generating higher TDP artificially simply tests the cooling system of the GPU/system. You can use 100 Watts and do something extremely inefficient or use 75 Watts and do something extremely efficient. F@H is optimized for maximum efficiency, which does push the GPU to its limits. If the GPU is unstable, you get errors; if the GPU is stable, you don't.

Also, regarding Furmark, here's a quote to sum it up nicely:
...use FurMark for identifying cooling issues or a graphics card’s potentially unstable power supply...but the more significant errors and crashes show up earlier in games like The Witcher 3. This means that FurMark is largely useless for realistic tests, especially since some drivers recognize it and automatically limit power...
https://www.tomshardware.com/reviews/ho ... 449-5.html
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

vtankovich
Posts: 8
Joined: Thu Apr 30, 2020 4:39 pm

Re: Overclocked GPU Projects dropping with error. -- YES.

Post by vtankovich »

I would still avoid overclocking, even if you don't encounter errors. Not every error can be detected, and a few occasional wrong bits may leave atom locations only slightly off, but they will still reduce the validity of the simulation, as they essentially add more "temperature" to the protein.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Overclocked GPU Projects dropping with error. -- YES.

Post by bruce »

A single bit change in a FP number may result in a small shift in an atom position or a very large shift. It depends on which bit.
e.g., 1.23456E-64 and 1.23456E+64
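
To make the "it depends on which bit" point concrete, here is a minimal Python sketch. It assumes the value is stored as an IEEE-754 double (FAH cores may well use single or mixed precision, so treat this purely as an illustration), and `flip_bit` is a hypothetical helper written for this post, not anything from the FAH software.

```python
import struct

def flip_bit(value: float, bit: int) -> float:
    """Return `value` with one bit of its IEEE-754 double encoding flipped.

    Bit 0 is the lowest mantissa bit, bits 52-62 are the exponent,
    and bit 63 is the sign.
    """
    if not 0 <= bit <= 63:
        raise ValueError("bit must be in the range 0..63")
    # Reinterpret the 8 bytes of the double as an unsigned 64-bit integer.
    (as_int,) = struct.unpack("<Q", struct.pack("<d", value))
    # Flip the requested bit and reinterpret the bytes as a double again.
    (flipped,) = struct.unpack("<d", struct.pack("<Q", as_int ^ (1 << bit)))
    return flipped

x = 1.23456
print(flip_bit(x, 0))   # lowest mantissa bit: differs from x by 2**-52
print(flip_bit(x, 61))  # a high exponent bit: x scaled by 2**-512
print(flip_bit(x, 63))  # sign bit: -1.23456
```

The same one-bit upset is invisible in the first case and catastrophic in the second, which is why a card that looks stable in a graphics test can still return bad work units.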
MeeLee
Posts: 1375
Joined: Tue Feb 19, 2019 10:16 pm

Re: Overclocked GPU Projects dropping with error. -- YES.

Post by MeeLee »

vtankovich wrote:I would still avoid overclocking, even if you don't encounter errors. Not every error can be detected, and a few occasional wrong bits may leave atom locations only slightly off, but they will still reduce the validity of the simulation, as they essentially add more "temperature" to the protein.
Wouldn't this then play into real-world scenarios?
You hardly ever have a perfect temperature scenario.
Joxster
Posts: 12
Joined: Thu Mar 19, 2020 9:57 am
Hardware configuration: Intel i7 8700K | NVIDIA GTX 1080 Ti

Re: Overclocked GPU Projects dropping with error. -- YES.

Post by Joxster »

So, I just wanted to drop an update over here about the issue I raised, which I see has turned into a wider discussion around overclocking in the meantime. :lol:

I took the advice of PantherX and turned the memory clock down by a good 200MHz as well, and bingo, the GPU projects stopped failing. But I couldn't believe that the projects were failing because of my core or memory clock speeds, since my card OCs on its own due to the lower temperatures, as already confirmed by another member in this thread, and I had not faced the issue even once in the 1.5 months of folding at full clock speeds on my GPU before reporting it on this thread.

A bit of monitoring in HWiNFO gave me the answer to my question of what was actually causing the issue - while my card was folding at its normal clock speeds, I watched the temperature on the GPU VRM slowly increase until it touched around 106C :eo:, and the GPU project failed right at that moment. I tried another GPU project, and this time, the moment the VRM crossed 95C, the GPU project failed. After I reduced the core and memory clocks as far as I could, the VRM temperature hovered around the 90C mark, but the GPU projects were not failing anymore.
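
If you want to check a temperature log for the same pattern, something like the sketch below would do. It is only an illustration: `first_crossing` is a hypothetical helper, the readings are made-up numbers rather than my actual HWiNFO data, and the 95C limit is just the point where my failures seemed to start.

```python
from typing import Iterable, Optional, Tuple

Reading = Tuple[float, float]  # (seconds since start, VRM temperature in C)

def first_crossing(readings: Iterable[Reading], limit_c: float) -> Optional[Reading]:
    """Return the first reading whose temperature exceeds limit_c, or None."""
    for time_s, temp_c in readings:
        if temp_c > limit_c:
            return (time_s, temp_c)
    return None

# Made-up readings for illustration only, not real sensor data.
sample = [(0, 88.0), (60, 93.5), (120, 96.0), (180, 106.0)]
print(first_crossing(sample, limit_c=95.0))
```

Comparing the timestamp it returns against the time the work unit failed in the FAH log is what would point at the VRM rather than the clocks.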

Things took a turn for the worse when Red Dead Redemption 2 also started crashing due to the VRMs crossing 95C while playing the game. I have contacted the seller about this issue, and need to hand over my GPU to them for the next couple of weeks, for them to either repair or replace. Hopefully they will give me a newer replacement card.

In summary - I agree with the other guys on this thread who say that overclocking should not be a problem with F@H if it's done correctly, and in my case, the card was doing it on its own, so there was definitely nothing wrong with that. I hope to get back to folding very soon.

Thanks to everyone for their responses and support.

Cheers
Intel i7 8700K | NVIDIA GTX 1080 Ti
HaloJones
Posts: 920
Joined: Thu Jul 24, 2008 10:16 am

Re: Overclocked GPU Projects dropping with error. -- YES.

Post by HaloJones »

I take it the card was bought refurbished or used?
single 1070

Joxster
Posts: 12
Joined: Thu Mar 19, 2020 9:57 am
Hardware configuration: Intel i7 8700K | NVIDIA GTX 1080 Ti

Re: Overclocked GPU Projects dropping with error. -- YES.

Post by Joxster »

HaloJones wrote:I take it the card was bought refurbished or used?
Not at all. It was brand spanking new when I bought it in October 2017 - this is a top of the line ASUS ROG-STRIX-GTX1080TI-O11G-GAMING card btw. By seller, I meant the store where I bought the card from originally. Apparently ASUS doesn't allow customers to directly RMA graphics cards here in Europe, so the store is going to send the card to their distributor to take a look at :)

Cheers
Intel i7 8700K | NVIDIA GTX 1080 Ti
MeeLee
Posts: 1375
Joined: Tue Feb 19, 2019 10:16 pm

Re: Overclocked GPU Projects dropping with error. -- YES.

Post by MeeLee »

Asus does allow RMAs, but they're @ssholes at it! They won't fix it under warranty, even if it falls within warranty.
More than likely, they will clean up the heat sink, and renew the thermal paste.
If I were you, I'd place an extra case fan to feed the GPU some cool air...
Joxster
Posts: 12
Joined: Thu Mar 19, 2020 9:57 am
Hardware configuration: Intel i7 8700K | NVIDIA GTX 1080 Ti

Re: Overclocked GPU Projects dropping with error. -- YES.

Post by Joxster »

MeeLee wrote:Asus does allow RMAs, but they're @ssholes at it! They won't fix it under warranty, even if it falls within warranty.
More than likely, they will clean up the heat sink, and renew the thermal paste.
If I were you, I'd place an extra case fan to feed the GPU some cool air...
The VRMs are cooled by the heatsink through thermal pads, and I believe that the way ASUS installed those thermal pads is sub-optimal, so I won't be surprised if they replace the thermal pads with better ones that transfer the heat more efficiently, and hand the card back over to me. If that resolves the issue, I won't complain about it either.

The airflow in my Corsair 570X Crystal case is fine. I am running 3x Corsair 120mm RGB PRO fans in the front + 1x Corsair 120mm RGB PRO fan in the back. Plenty of air for all the components.

Cheers
Intel i7 8700K | NVIDIA GTX 1080 Ti
toTOW
Site Moderator
Posts: 6309
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France
Contact:

Re: Overclocked GPU Projects dropping with error. -- YES.

Post by toTOW »

Joxster wrote:So, I just wanted to drop an update over here about the issue I raised, which I see has turned into a wider discussion around overclocking in the meantime. :lol:

I took the advice of PantherX and turned the memory clock down by a good 200MHz as well, and bingo, the GPU projects stopped failing. But I couldn't believe that the projects were failing because of my core or memory clock speeds, since my card OCs on its own due to the lower temperatures, as already confirmed by another member in this thread, and I had not faced the issue even once in the 1.5 months of folding at full clock speeds on my GPU before reporting it on this thread.

A bit of monitoring in HWiNFO gave me the answer to my question of what was actually causing the issue - while my card was folding at its normal clock speeds, I watched the temperature on the GPU VRM slowly increase until it touched around 106C :eo:, and the GPU project failed right at that moment. I tried another GPU project, and this time, the moment the VRM crossed 95C, the GPU project failed. After I reduced the core and memory clocks as far as I could, the VRM temperature hovered around the 90C mark, but the GPU projects were not failing anymore.

Things took a turn for the worse when Red Dead Redemption 2 also started crashing due to the VRMs crossing 95C while playing the game. I have contacted the seller about this issue, and need to hand over my GPU to them for the next couple of weeks, for them to either repair or replace. Hopefully they will give me a newer replacement card.

In summary - I agree with the other guys on this thread who say that overclocking should not be a problem with F@H if it's done correctly, and in my case, the card was doing it on its own, so there was definitely nothing wrong with that. I hope to get back to folding very soon.

Thanks to everyone for their responses and support.

Cheers
It reminds me of the fate of my 980 Ti ... RIP

It started with occasional Bad States detected on the GPU ... then, I started to get GPU (and driver) resets ... after a while, the system started to turn off by itself ... and one day, while booting Windows 10, I saw a nice flash and flame in the GPU VRMs ... and the PC would never turn on again until I swapped the card.

Luckily it was still under warranty, and I got it replaced for free ... The 1070 I got as replacement is still folding fine after 3 years ... :D

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.