
Re: WARNING Do not upgrade to 375/376.xx drivers (for xx<48)

Posted: Mon Feb 20, 2017 8:43 am
by Betroz
I got a bad work unit with the 378.72 HF driver on my 980 Ti card. I can't remember getting a bad WU with the 378.49 drivers, so maybe nVidia broke them again. Not sure, just a small input here.

Posted: Mon Feb 20, 2017 7:26 pm
by SombraGuerrero
It would be more useful at this point if we could start posting logs again when bad WUs are reported. Other than Nvidia taking out the app profile, we should be past the point of their involvement. If we're still getting the enqueue error, I would find it more likely that version 18 of the core isn't fully covering the projects in the field.

Posted: Sat Feb 25, 2017 6:01 pm
by JonasTheMovie
So, where are we with this? Should we still stick with 376.49?

Posted: Sat Feb 25, 2017 6:14 pm
by SombraGuerrero
JonasTheMovie wrote: So, where are we with this? Should we still stick with 376.49?

This question is precisely the reason why I want to see logs from people still reporting bad WUs. I am going to assume, unless shown otherwise, that the failures on WUs still being reported are unrelated to the original bug. Where I believe we are with this is that version .18 of Core_0x21 has been released, which should fix the underlying bug. So assuming people's clients have successfully downloaded the new version of the core, any version of the driver past 376.48 should now work just fine. The thing I'm questioning is whether or not Nvidia has removed the hotfix from the latest release, because theoretically, now that the core has been updated, they can do so and everything should work. Supposedly, if the hotfix is still in there, it may or may not have a negative impact on PPD. I have not noticed a negative impact of any significance on my PPD running the latest Game Ready drivers (378.66), but granted, I am not a user who cares about that aspect as much.

Posted: Sun Feb 26, 2017 3:04 am
by bruce
Why should we assume that nV intends to "remove" the hot-fix from future versions?

While the FAH developers seem to have been in close communication with the nV development folks, I have not found a detailed description of the problem itself or of the nature of the changes made by either of those groups. (If somebody else has, point me to that information.)

Why should we assume that all of the people who have encountered this issue are part of the FAH community? I heard an (unconfirmed) rumor that BOINC has encountered the same problem. Even if we assume that the changes that FAH has made to Core_21 will make it possible to run pre-376.48 drivers without error AND we assume that Core_21 .18 provides better performance than Core_21 .17 (both nice assumptions, but as yet unproven), nVidia still has a responsibility to the rest of the world.

Posted: Sun Feb 26, 2017 3:53 am
by gregaber
I've successfully gone back to 372.70 on 4 of my machines with Win10 Pro 1607 and Intel CPUs. The 5th machine is still at Win10 Pro 1511 (which does not have the ability to modify group policy to prevent driver updates) and won't update to 1607 (don't know why), so it's forced to update to the hotfix driver from Msoft.

However..

All 5 machines are currently running within 95% of their PPD averages from before this all began. I have occasional failed WUs, but when I stop overclocking they behave. I don't have the time to dig into the logs of 5 machines to try to figure this out, but at least I wanted to share my experience.

If you look at my historical chart, my issues began on 2/3 and did not recover until 2/21, when I was finally able to implement the configurations described in the first paragraph.

http://folding.extremeoverclocking.com/user_summary.php?s=&u=673044

I don't know why the one AMD CPU machine is fine with the hotfix driver; is it because of the AMD CPU (I only do GPU folding, so it's just running the OS) or is it because it's on Win10Pro 1577?

I don't know why some of my GPUs (960s, 970s, 980s, and a 750 Ti) have intermittent crashes when overclocked. It's only 2 of the 970s and the 980 Classified that have the issues, and they never had issues while overclocked prior to 2/3.

All I know is I've returned to stable results that are nearly what they were before this all started. I'm not touching anything and will see how stable they'll remain for the longer term.

Oh yeah; I'm running the 64 bit version 7.4.15 FAH client on all machines.

Posted: Sun Feb 26, 2017 4:02 am
by SombraGuerrero
This thread has been carrying on for so long, I admit, I forgot about some of the earlier discussions. It would be nice if we could ground some of what has actually been going on, but as that may very well be an unrealistic expectation, I think the only safe assumption then is that we have what we have at this point in time.

Posted: Sun Feb 26, 2017 9:51 am
by foldy
bruce wrote: Why should we assume that nV intends to "remove" the hot-fix from future versions?

While the FAH developers seem to have been in close communication with the nV development folks, I have not found a detailed description of the problem itself or of the nature of the changes made by either of those groups. (If somebody else has, point me to that information.)

Why should we assume that all of the people who have encountered this issue are part of the FAH community? I heard an (unconfirmed) rumor that BOINC has encountered the same problem. Even if we assume that the changes that FAH has made to Core_21 will make it possible to run pre-376.48 drivers without error AND we assume that Core_21 .18 provides better performance than Core_21 .17 (both nice assumptions, but as yet unproven), nVidia still has a responsibility to the rest of the world.

I disagree a little. My understanding is:
Apart from Folding@home, the OpenCL world had no problem with 375 and later Nvidia drivers. (BOINC GPUGRID had a totally different bug with driver 378.49/66, on the GTX 980 Ti only.)

1) Core_21 v17 had a bug which only showed up when Nvidia made driver changes in 375.
2) Nvidia made a workaround in driver 376.48 and later which also decreases PPD.
3) Core_21 v18 fixed the bug so any driver works again but 376.48 and later still decrease PPD
4) Future: Nvidia should remove the workaround to increase PPD again

=> For best PPD currently stay on nvidia driver 372.xx but any driver will work
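The four points above can be read as a small decision rule. The following Python sketch is only an illustration of that reading, not anything from the FAH client or the Nvidia driver: the function names are hypothetical, the 375 and 376.48 thresholds are taken from the list and the thread title, and the "reduced PPD" flag simply repeats the claim in points 2 and 3, which has not been measured here.

```python
def parse_version(text):
    """Turn a driver string such as '376.48' into a comparable tuple (376, 48)."""
    major, minor = text.split(".")
    return int(major), int(minor)


def assess(driver, core_minor):
    """Apply the summary's points 1-3 to one driver / Core_21 combination.

    driver     -- Nvidia driver version string, e.g. '376.33'
    core_minor -- Core_21 minor version, e.g. 17 or 18
    Returns (works, reduced_ppd). Both values are assumptions drawn from
    the forum summary, not measurements.
    """
    version = parse_version(driver)
    has_workaround = version >= (376, 48)                      # point 2
    exposes_bug = version >= (375, 0) and not has_workaround   # point 1
    core_fixed = core_minor >= 18                              # point 3
    works = core_fixed or not exposes_bug
    reduced_ppd = has_workaround                               # claimed in points 2 and 3
    return works, reduced_ppd


for drv, core in [("372.70", 17), ("376.33", 17), ("376.33", 18), ("378.66", 18)]:
    print(drv, core, assess(drv, core))
```

Under these assumptions, only a 375.xx or pre-376.48 driver combined with Core_21 v17 is expected to fail, and only 376.48 or later is flagged for reduced PPD.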

Posted: Sun Feb 26, 2017 10:20 am
by SombraGuerrero
Foldy has nicely summarized what I also understood to be the sequence of events and facts in this whole ordeal. I will also add that I recall an earlier point in this thread that gave me the impression that the bug we've been dealing with was a known bug in OpenMM that was actually fixed some time ago, hence Nvidia's argument that updating Core_21 is the correct long-term course of action.

Posted: Tue Feb 28, 2017 12:40 am
by bruce
From what I've heard, the "bug", if you want to call it that, is due to an interpretation issue in the OpenCL specification. That spec is vague enough that it allows for two different interpretations of the same issue ... NVidia and OpenMM interpreted that spec in two different ways. :( Saying it was a bug in OpenMM is the correct way to describe it provided you're nVidia. Saying it's a bug in the drivers is the correct way to describe it provided you're looking at it from the OpenMM perspective. Either the hot-fix or the recent change to OpenMM inside of Core_21 fixes the issue.

There is an upcoming version of OpenMM that should be used in future FAHCores and that is consistent with NVidia's interpretation. It should provide better performance than the hot-fix does, but less than the OpenMM interpretation would have (if it had worked). I have not searched for a comment from khronos.org.

Posted: Tue Feb 28, 2017 2:22 am
by _r2w_ben
bruce wrote: While the FAH developers seem to have been in close communication with the nV development folks, I have not found a detailed description of the problem itself or of the nature of the changes made by either of those groups. (If somebody else has, point me to that information.)

I think this commit is as much detail as we'll get.

Posted: Tue Feb 28, 2017 2:45 am
by SombraGuerrero
Interesting, I hadn't heard that interpretation of events before. It would explain the confusing spectrum of opinions and conflicting information that has been floating around through this whole ordeal. It seems to me, based on everything I've now read and seen people say, that ultimately, however this problem ends up permanently going away, we're going to be left with a question of whether or not all users are getting optimal PPD. As I've said, I myself don't worry about this aspect as much as other users. I've just found tracking this problem to be a fascinating example of how much care and commitment this user base and the projects' stakeholders give to it.

Posted: Thu Mar 02, 2017 6:33 am
by squall_leonhart
bruce wrote: Why should we assume that nV intends to "remove" the hot-fix from future versions?

Because they made clear their intent to remove the workaround when the OpenMM bug was fixed.

bruce wrote: While the FAH developers seem to have been in close communication with the nV development folks, I have not found a detailed description of the problem itself or of the nature of the changes made by either of those groups. (If somebody else has, point me to that information.)

nVidia implemented an optimisation in their JIT compiler that exposed a flaw in the OpenMM library.

bruce wrote: Why should we assume that all of the people who have encountered this issue are part of the FAH community? I heard an (unconfirmed) rumor that BOINC has encountered the same problem. Even if we assume that the changes that FAH has made to Core_21 will make it possible to run pre-376.48 drivers without error AND we assume that Core_21 .18 provides better performance than Core_21 .17 (both nice assumptions, but as yet unproven), nVidia still has a responsibility to the rest of the world.

BOINC also uses the affected OpenMM library.

Posted: Tue Mar 28, 2017 10:08 pm
by Aurum
Is it safe to upgrade from 369.30 to 378.92 yet?

Posted: Tue Mar 28, 2017 10:47 pm
by SombraGuerrero
Everything's been "safe" for a while now. The main question (that I defer to others to answer) is whether or not PPD output is going to be back to "normal" for everyone.