Pausing mitigates TDR bug?

It seems that a lot of GPU problems revolve around specific versions of drivers. Though NVidia has their own support structure, you can often learn from information reported by others who fold.


Postby csvanefalk » Wed Jul 16, 2014 12:44 pm

I fold on a GTX770, using the 319.76 Linux drivers on a Fedora 20 box.

I was previously rebooting every 36 hours to avoid the TDR bug, but I have noticed that pausing the folding seems to have the same effect. By letting the card rest for 5-10 minutes between WUs, I am now approaching 72 hours of folding without rebooting.

Can anyone confirm if this is expected behavior?
csvanefalk
 
Posts: 172
Joined: Mon May 21, 2012 10:28 am

Re: Pausing mitigates TDR bug?

Postby 7im » Wed Jul 16, 2014 1:14 pm

The TDR bug was simply time related. It didn't matter whether you were folding, gaming, or doing neither. So pausing would have no effect.
7im
 
Posts: 14648
Joined: Thu Nov 29, 2007 4:30 pm
Location: Arizona

Re: Pausing mitigates TDR bug?

Postby csvanefalk » Wed Jul 16, 2014 1:41 pm

Understood. Could it have something to do with my OS then? I am far past the 36-hour cutoff, and there have been no broken WUs, no crashes, or any other symptoms of the bug at all.
csvanefalk
 
Posts: 172
Joined: Mon May 21, 2012 10:28 am

Re: Pausing mitigates TDR bug?

Postby ChristianVirtual » Wed Jul 16, 2014 2:37 pm

I experienced the TDR bug mainly on the GTX 780 at that time, which has the GK110 chip (as do the Titan and 780 Ti). The 770 has the GK104.

With a newer driver the TDR bug was fixed, but GK104-based cards got slower (like my 660 Ti). I split my GPUs across different systems and gave each a matching driver. I have not seen the TDR bug for 9 months or so.
ChristianVirtual
 
Posts: 1540
Joined: Tue May 28, 2013 12:14 pm
Location: 日本 東京

Re: Pausing mitigates TDR bug?

Postby 7im » Wed Jul 16, 2014 5:16 pm

csvanefalk wrote: Understood. Could it have something to do with my OS then? I am far past the 36-hour cutoff, and there have been no broken WUs, no crashes, or any other symptoms of the bug at all.


Two options: either you are on a driver version that predates the TDR bug, or the GPU did a reset. Check the FAH logs to see if there were any folding interruptions in the last 2 days other than your pausing the client.

Optionally, there is a v55 fahcore that has no folding slowdown, so you could upgrade past the TDR bug driver version and just use the latest NV driver.
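
One rough way to check a log for interruptions like this is to look for long gaps between consecutive timestamps. The following is only a sketch: it assumes each log line begins with an `HH:MM:SS:` timestamp, and the function name `find_gaps`, the 10-minute threshold, and the sample lines are made up for illustration and are not taken from any real log.

```python
import re
from datetime import timedelta

# Assumption: each log line starts with an HH:MM:SS: timestamp.
TIMESTAMP = re.compile(r"^(\d{2}):(\d{2}):(\d{2}):")

def find_gaps(lines, threshold=timedelta(minutes=10)):
    """Return (line_before, line_after, gap) for every gap longer than threshold."""
    gaps = []
    prev_time = prev_line = None
    for line in lines:
        m = TIMESTAMP.match(line)
        if not m:
            continue  # skip lines without a timestamp (headers, blank lines)
        hours, minutes, seconds = map(int, m.groups())
        now = timedelta(hours=hours, minutes=minutes, seconds=seconds)
        if prev_time is not None:
            gap = now - prev_time
            if gap < timedelta(0):
                gap += timedelta(days=1)  # timestamps wrapped past midnight
            if gap > threshold:
                gaps.append((prev_line.rstrip(), line.rstrip(), gap))
        prev_time, prev_line = now, line
    return gaps

if __name__ == "__main__":
    # Hypothetical sample lines, not real client output.
    sample = [
        "23:50:00:first message",
        "23:55:00:second message",
        "00:20:00:third message",
    ]
    for before, after, gap in find_gaps(sample):
        print(f"{gap} gap between {before!r} and {after!r}")
```

Deliberate pauses will show up as gaps too, so each reported gap still has to be compared against the times the client was paused by hand.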
7im
 
Posts: 14648
Joined: Thu Nov 29, 2007 4:30 pm
Location: Arizona

Re: Pausing mitigates TDR bug?

Postby bollix47 » Wed Jul 16, 2014 5:26 pm

7im wrote: Optionally, there is a v55 fahcore that has no folding slowdown, so you could upgrade past the TDR bug driver version and just use the latest NV driver.


AFAIK that version of the core is Windows only at this time.
bollix47
 
Posts: 3493
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Re: Pausing mitigates TDR bug?

Postby 7im » Wed Jul 16, 2014 7:13 pm

bollix47 wrote:
7im wrote: Optionally, there is a v55 fahcore that has no folding slowdown, so you could upgrade past the TDR bug driver version and just use the latest NV driver.


AFAIK that version of the core is Windows only at this time.


Yep. Time to poke Prot again.
7im
 
Posts: 14648
Joined: Thu Nov 29, 2007 4:30 pm
Location: Arizona

Re: Pausing mitigates TDR bug?

Postby heikosch » Wed Jul 16, 2014 8:48 pm

7im wrote:
bollix47 wrote:
7im wrote: Optionally, there is a v55 fahcore that has no folding slowdown, so you could upgrade past the TDR bug driver version and just use the latest NV driver.


AFAIK that version of the core is Windows only at this time.


Yep. Time to poke Prot again.

In my opinion v55 is still beta.

Heiko
heikosch
 
Posts: 139
Joined: Thu Apr 30, 2009 7:31 pm
Location: Essen, Germany

Re: Pausing mitigates TDR bug?

Postby 7im » Wed Jul 16, 2014 9:47 pm

Operationally, yes (simply because no one has moved it to public yet).

Is there some functional reason you think they should not release it as public?
7im
 
Posts: 14648
Joined: Thu Nov 29, 2007 4:30 pm
Location: Arizona

Re: Pausing mitigates TDR bug?

Postby heikosch » Thu Jul 17, 2014 7:54 pm

7im wrote:Operationally, yes (simply because no one has moved it to public yet).

Is there some functional reason you think they should not release it as public?

No, but I've no idea who decides on the public release of a fahcore, or why it takes so long to release an obviously working fahcore version.

Heiko
heikosch
 
Posts: 139
Joined: Thu Apr 30, 2009 7:31 pm
Location: Essen, Germany

Re: Pausing mitigates TDR bug?

Postby csvanefalk » Fri Jul 18, 2014 5:46 am

7im - neither of the cases you mentioned seems to apply. The driver version is 319.76, and I have had the TDR issue with it before:

Code:
[christopher@chrisdesktop ~]$ nvidia-smi
Fri Jul 18 07:44:01 2014       
+------------------------------------------------------+                       
| NVIDIA-SMI 5.319.76   Driver Version: 319.76         |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 770     Off  | 0000:03:00.0     N/A |                  N/A |
| 50%   66C  N/A     N/A /  N/A |      688MB /  2047MB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0            Not Supported                                               |
+-----------------------------------------------------------------------------+
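
When comparing driver versions across several machines, it can be handy to pull the version out of the `nvidia-smi` header programmatically. A small sketch, assuming the header contains a `Driver Version: X.Y` field as in the output above; `parse_driver_version` is a hypothetical helper name, not part of any NVIDIA tool.

```python
import re

def parse_driver_version(smi_output):
    """Return the driver version as a tuple of ints, or None if not found."""
    m = re.search(r"Driver Version:\s*(\d+(?:\.\d+)*)", smi_output)
    if m is None:
        return None
    return tuple(int(part) for part in m.group(1).split("."))

if __name__ == "__main__":
    # Header line copied from the nvidia-smi output above.
    header = "| NVIDIA-SMI 5.319.76   Driver Version: 319.76         |"
    print(parse_driver_version(header))  # prints (319, 76)
```

Returning a tuple of integers means versions compare numerically, so (319, 76) sorts below (331, 20) without any string tricks.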


I also cannot find any evidence in the log of the GPU resetting; the only interruptions are my own pauses (the log is too large to post here):

http://hastebin.com/zomevafedu.coffee
csvanefalk
 
Posts: 172
Joined: Mon May 21, 2012 10:28 am

Re: Pausing mitigates TDR bug?

Postby 7im » Fri Jul 18, 2014 2:07 pm

The bug, as reported in the NV forum, was time based. You are welcome to look it up.

Also, keep trying your pause trick. Does it work consistently, or just once in a while? Let us know.
7im
 
Posts: 14648
Joined: Thu Nov 29, 2007 4:30 pm
Location: Arizona

Re: Pausing mitigates TDR bug?

Postby csvanefalk » Sun Jul 20, 2014 9:17 am

I have not used the pause trick for at least 48 hours, and the folding process continues without error. There appear to be no traces of the bug at all. I wish I could determine exactly how I got to this stage for the benefit of other Linux GPU folders, but the only major change I can recall making was recompiling the driver after updating to kernel 3.15.

Complete log is here: http://hastebin.com/bahegomewu.coffee
csvanefalk
 
Posts: 172
Joined: Mon May 21, 2012 10:28 am

