WARNING Do not upgrade to 375/376.xx drivers (for xx<48)

It seems that a lot of GPU problems revolve around specific versions of drivers. Though NVidia has their own support structure, you can often learn from information reported by others who fold.

Moderators: Site Moderators, FAHC Science Team

Betroz
Posts: 7
Joined: Tue Oct 25, 2016 2:56 pm

Re: WARNING Do not upgrade to 375/376.xx drivers (for xx<48)

Post by Betroz »

I got a bad work unit with the 378.72 HF driver on my 980 Ti card. I can't remember getting a bad WU with the 378.49 drivers, so maybe nVidia broke them again. Not sure, just a small input here.
SombraGuerrero
Posts: 118
Joined: Mon Mar 16, 2009 3:06 am

Re: WARNING Do not upgrade to 375/376.xx drivers (for xx<48)

Post by SombraGuerrero »

It would be more useful at this point if we could start posting logs again when bad WUs are reported. Other than Nvidia taking out the app profile, we should be past the point of their involvement. If we're still getting the enqueue error, I would find it more likely that version 18 of the core isn't fully covering the projects in the field.
JonasTheMovie
Posts: 88
Joined: Wed Jan 06, 2016 4:16 am
Location: Northern Sweden

Re: WARNING Do not upgrade to 375/376.xx drivers (for xx<48)

Post by JonasTheMovie »

So, where are we with this? Should we still stick with 376.49?
SombraGuerrero
Posts: 118
Joined: Mon Mar 16, 2009 3:06 am

Re: WARNING Do not upgrade to 375/376.xx drivers (for xx<48)

Post by SombraGuerrero »

JonasTheMovie wrote: So, where are we with this? Should we still stick with 376.49?
This question is precisely the reason why I want to see logs from people still reporting bad WUs. I am going to assume, unless shown otherwise, that the failures on WUs still being reported are unrelated to the original bug. Where I believe we are with this is that version .18 of Core_0x21 has been released, which should fix the underlying bug. So assuming people's clients have successfully downloaded the new version of the core, any version of the driver past 376.48 should now work just fine. The thing I'm questioning is whether or not Nvidia has removed the hotfix from the latest release, because theoretically, now that the core has been updated, they can do so and everything should work. Supposedly, if the hotfix is still in there, it may or may not have a negative impact on PPD. I have not noticed a negative impact of any significance on my PPD running the latest Game Ready drivers (378.66), but granted, I am not a user who cares about that aspect as much.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: WARNING Do not upgrade to 375/376.xx drivers (for xx<48)

Post by bruce »

Why should we assume that nV intends to "remove" the hot-fix from future versions?

While the FAH developers seem to have been in close communication with the nV development folks, I have not found a detailed description of the problem itself or of the nature of the changes made by either of those groups. (If somebody else has, point me to that information.)

Why should we assume that all of the people who have encountered this issue are part of the FAH community? I heard an (unconfirmed) rumor that BOINC has encountered the same problem. Even if we assume that the changes that FAH has made to Core_21 will make it possible to run pre-376.48 drivers without error AND we assume that Core_21 .18 provides better performance than Core_21 .17 (both nice assumptions, but as yet unproven), nVidia still has a responsibility to the rest of the world.
gregaber
Posts: 7
Joined: Sat Nov 22, 2014 2:46 pm

Re: WARNING Do not upgrade to 375/376.xx drivers (for xx<48)

Post by gregaber »

I've successfully gone back to 372.70 on 4 of my machines with Win10 Pro 1607 and Intel CPUs. The 5th machine is still at Win10 Pro 1511 (which does not have the ability to modify group policy to prevent driver updates) and won't update to 1607 (don't know why), so it's forced to update to the hotfix driver from Msoft.

However..

All 5 machines are currently running within 95% of their previous PPD averages from before this all began. I have occasional failed WUs, but when I stop overclocking they behave. I don't have the time to dig into the logs of 5 machines to try and figure this out, but I at least wanted to share my experience.

If you look at my historical chart, my issues began on 2/3 and did not recover until 2/21, when I was finally able to implement the configurations described in the first paragraph.

http://folding.extremeoverclocking.com/ ... =&u=673044

I don't know why the one AMD CPU machine is fine with the hotfix driver; is it because of the AMD CPU (I only do GPU folding, so it's just running the OS) or is it because it's on Win10 Pro 1511?

I don't know why some of my GPUs (960s, 970s, 980s, and a 750 Ti) will have intermittent crashes when overclocked. It wasn't a problem before this mess. It's only 2 970s and the 980 Classified that have the issues. I never used to have issues while overclocked prior to 2/3.

All I know is I've returned to stable results that are nearly what they were before this all started. I'm not touching anything and will see how stable they'll remain for the longer term.

Oh yeah; I'm running the 64 bit version 7.4.15 FAH client on all machines.
Last edited by gregaber on Sun Feb 26, 2017 3:03 am, edited 1 time in total.
SombraGuerrero
Posts: 118
Joined: Mon Mar 16, 2009 3:06 am

Re: WARNING Do not upgrade to 375/376.xx drivers (for xx<48)

Post by SombraGuerrero »

This thread has been carrying on for so long, I admit, I forgot about some of the earlier discussions. It would be nice if we could ground some of what has actually been going on, but as that may very well be an unrealistic expectation, I think the only safe assumption then is that we have what we have at this point in time.
foldy
Posts: 2061
Joined: Sat Dec 01, 2012 3:43 pm
Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441

Re: WARNING Do not upgrade to 375/376.xx drivers (for xx<48)

Post by foldy »

bruce wrote: Why should we assume that nV intends to "remove" the hot-fix from future versions?

While the FAH developers seem to have been in close communication with the nV development folks, I have not found a detailed description of the problem itself or of the nature of the changes made by either of those groups. (If somebody else has, point me to that information.)

Why should we assume that all of the people who have encountered this issue are part of the FAH community? I heard an (unconfirmed) rumor that BOINC has encountered the same problem. Even if we assume that the changes that FAH has made to Core_21 will make it possible to run pre-376.48 drivers without error AND we assume that Core_21 .18 provides better performance than Core_21 .17 (both nice assumptions, but as yet unproven), nVidia still has a responsibility to the rest of the world.
I disagree a little and my understanding is:
Except for Folding@home, the OpenCL world had no problem with 375 and later Nvidia drivers. (BOINC GPUGRID had a totally different bug with driver 378.49/66, on the GTX 980 Ti only.)

1) Core_21 v17 had a bug which only showed up when Nvidia made driver changes in 375.
2) Nvidia made a workaround in driver 376.48 and later which also decreases PPD.
3) Core_21 v18 fixed the bug so any driver works again but 376.48 and later still decrease PPD
4) Future: Nvidia should remove the workaround to increase PPD again

=> For best PPD currently stay on nvidia driver 372.xx but any driver will work
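
As a rough illustration only, the four points above can be written down as a small lookup. This is a sketch of the summary as stated in this post, not anything taken from the FAH client or the Nvidia driver; the function names and the result strings are made up for the example, and the version thresholds are just the ones mentioned in this thread.

```python
from typing import Tuple


def parse_driver(version: str) -> Tuple[int, int]:
    """Split a driver string such as '376.48' into (376, 48).

    A bare major version such as '441' is treated as minor 0.
    """
    parts = version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def assess(driver: str, core_minor: int) -> str:
    """Hypothetical helper: apply the thread's summary to one setup.

    core_minor is the Core_21 minor version (17 or 18 in this thread).
    """
    drv = parse_driver(driver)
    has_workaround = drv >= (376, 48)           # point 2: workaround present
    exposes_bug = (375, 0) <= drv < (376, 48)   # point 1: driver change, no workaround

    if core_minor <= 17 and exposes_bug:
        return "bad WUs likely"
    if has_workaround:
        return "works, PPD reduced by workaround"
    return "works (no driver workaround)"


if __name__ == "__main__":
    for drv, core in [("372.70", 18), ("376.33", 17), ("378.66", 18)]:
        print(drv, core, "->", assess(drv, core))
```

The only point of the sketch is that two separate things matter: the Core_21 version decides whether WUs fail, and the driver version decides whether the PPD-reducing workaround is active.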
SombraGuerrero
Posts: 118
Joined: Mon Mar 16, 2009 3:06 am

Re: WARNING Do not upgrade to 375/376.xx drivers (for xx<48)

Post by SombraGuerrero »

Foldy has nicely summarized what I also understood to be the sequence of events and facts in this whole ordeal. I will also add that I recall an earlier point in this thread that gave me the impression that the bug we've been dealing with was a known bug in OpenMM that was actually fixed some time ago, hence Nvidia's argument that updating Core_21 is the correct long-term course of action.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: WARNING Do not upgrade to 375/376.xx drivers (for xx<48)

Post by bruce »

From what I've heard, the "bug", if you want to call it that, is due to an interpretation issue in the OpenCL specification. That spec is vague enough that it allows for two different interpretations of the same issue ... NVidia and OpenMM interpreted that spec two different ways. :( Saying it was a bug in OpenMM is the correct way to describe it, provided you're nVidia. Saying it's a bug in the drivers is the correct way to describe it, provided you're looking at it from the OpenMM perspective. Either the hot-fix or the recent change to OpenMM inside of Core_21 fixes the issue.

There is an upcoming version of OpenMM that should be used in future FAHCores and that is consistent with NVidia's interpretation. It should provide better performance than the hot-fix does, but less than the OpenMM interpretation would have (if it had worked). I have not searched for a comment from khronos.org.
_r2w_ben
Posts: 285
Joined: Wed Apr 23, 2008 3:11 pm

Re: WARNING Do not upgrade to 375/376.xx drivers (for xx<48)

Post by _r2w_ben »

bruce wrote: While the FAH developers seem to have been in close communication with the nV development folks, I have not found a detailed description of the problem itself or of the nature of the changes made by either of those groups. (If somebody else has, point me to that information.)
I think this commit is as much detail as we'll get.
SombraGuerrero
Posts: 118
Joined: Mon Mar 16, 2009 3:06 am

Re: WARNING Do not upgrade to 375/376.xx drivers (for xx<48)

Post by SombraGuerrero »

Interesting, I hadn't heard that interpretation of events before. It would explain the confusing spectrum of opinions and conflicting information that has been floating around through this whole ordeal. It seems to me, based on everything I've now read and seen people say, that ultimately, however this problem ends up permanently going away, we're going to be left with a question of whether or not all users are getting optimal PPD. As I've said, I myself don't worry about this aspect as much as other users. I've just found tracking this problem to be a fascinating example of how much care and commitment this user base and the projects' stakeholders give to it.
squall_leonhart
Posts: 21
Joined: Wed Apr 20, 2011 5:31 pm

Re: WARNING Do not upgrade to 375/376.xx drivers (for xx<48)

Post by squall_leonhart »

bruce wrote: Why should we assume that nV intends to "remove" the hot-fix from future versions?
Because they made clear their intent to remove the workaround when the OpenMM bug was fixed.

bruce wrote: While the FAH developers seem to have been in close communication with the nV development folks, I have not found a detailed description of the problem itself or of the nature of the changes made by either of those groups. (If somebody else has, point me to that information.)
nVidia implemented an optimisation in their JIT compiler that exposed a flaw in the OpenMM library.

bruce wrote: Why should we assume that all of the people who have encountered this issue are part of the FAH community? I heard an (unconfirmed) rumor that BOINC has encountered the same problem. Even if we assume that the changes that FAH has made to Core_21 will make it possible to run pre-376.48 drivers without error AND we assume that Core_21 .18 provides better performance than Core_21 .17 (both nice assumptions, but as yet unproven), nVidia still has a responsibility to the rest of the world.
BOINC also uses the affected OpenMM library.
Aurum
Posts: 296
Joined: Sat Oct 03, 2015 3:15 pm
Location: The Great Basin

Re: WARNING Do not upgrade to 375/376.xx drivers (for xx<48)

Post by Aurum »

Is it safe to upgrade from 369.30 to 378.92 yet?
In Science We Trust
SombraGuerrero
Posts: 118
Joined: Mon Mar 16, 2009 3:06 am

Re: WARNING Do not upgrade to 375/376.xx drivers (for xx<48)

Post by SombraGuerrero »

Everything's been "safe" for a while now. The main question (that I defer to others to answer) is whether or not PPD output is going to be back to "normal" for everyone.