7644 (510, 0, 5) NANs detected on GPU UNSTABLE_MACHINE

Moderators: Site Moderators, FAHC Science Team

metalmayhem
Posts: 4
Joined: Tue Oct 04, 2011 7:50 pm

7644 (510, 0, 5) NANs detected on GPU UNSTABLE_MACHINE

Post by metalmayhem »

First of all, no, I'm not overclocked. Project: 7644 (Run 510, Clone 0, Gen 5)

My GPU has been trying to finish this WU for 4 days! I think it has failed more than 10 times.

The GTX 580 folded fine with driver 296.10 for a week at 888/2055/1100mV until this specific WU came up. After the second failure I dropped back to stock.

That didn't help, so I tried 301.24 Beta with stock clocks and voltage. It still fails.

Then I tried driver 285.62, since posts in this forum describe it as very reliable. And yet I keep getting the same error. I am now thinking of dropping this WU.

The most frustrating thing is that it fails around 75-85% of WU completion which takes about 8-9 hours to get to.

Any suggestions?

I should also mention that I've been getting about 28K less PPD combined between my CPU & GPU since I started using V7 client. I'm gonna drop back to GPU3 v6.41. I checked there's only one instance of SMP client running.

Latest fail:

12:47:53:WU00:FS00:0x15:Completed   1825000 out of 2500000 steps (73%).
12:55:27:WU00:FS00:0x15:Completed   1850000 out of 2500000 steps (74%).
13:03:03:WU00:FS00:0x15:Completed   1875000 out of 2500000 steps (75%).
13:10:35:WU00:FS00:0x15:Completed   1900000 out of 2500000 steps (76%).
13:18:10:WU00:FS00:0x15:Completed   1925000 out of 2500000 steps (77%).
13:34:57:WU00:FS00:0x15:Completed   1950000 out of 2500000 steps (78%).
13:34:57:WU00:FS00:0x15:mdrun_gpu returned 52
13:34:57:WU00:FS00:0x15:NANs detected on GPU
13:34:57:WU00:FS00:0x15:
13:34:57:WU00:FS00:0x15:Folding@home Core Shutdown: UNSTABLE_MACHINE
13:34:57:WU00:FS00:FahCore returned: UNSTABLE_MACHINE (122 = 0x7a)
13:34:57:WU00:FS00:Starting
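
For anyone reading the excerpt above, one quick way to check the timing between progress lines is a small script like this. It is only a sketch: the regular expression is inferred from the progress lines shown in this post, and `LOG_EXCERPT` and `frame_intervals` are made-up names, not anything provided by the client.

```python
import re
from datetime import datetime

# The six progress lines from the excerpt above, copied as posted.
LOG_EXCERPT = """\
12:47:53:WU00:FS00:0x15:Completed   1825000 out of 2500000 steps (73%).
12:55:27:WU00:FS00:0x15:Completed   1850000 out of 2500000 steps (74%).
13:03:03:WU00:FS00:0x15:Completed   1875000 out of 2500000 steps (75%).
13:10:35:WU00:FS00:0x15:Completed   1900000 out of 2500000 steps (76%).
13:18:10:WU00:FS00:0x15:Completed   1925000 out of 2500000 steps (77%).
13:34:57:WU00:FS00:0x15:Completed   1950000 out of 2500000 steps (78%).
"""

# Pattern inferred from the lines above (an assumption, not a documented format).
STEP_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}):WU\d+:FS\d+:0x[0-9a-fA-F]+:"
    r"Completed\s+(\d+) out of (\d+) steps"
)


def frame_intervals(text):
    """Return (steps_done, seconds_since_previous_progress_line) pairs."""
    previous = None
    intervals = []
    for line in text.splitlines():
        match = STEP_RE.match(line)
        if not match:
            continue  # skip anything that is not a progress line
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        if previous is not None:
            seconds = int((stamp - previous).total_seconds())
            if seconds < 0:
                seconds += 24 * 3600  # log timestamps carry no date, so handle midnight
            intervals.append((int(match.group(2)), seconds))
        previous = stamp
    return intervals


if __name__ == "__main__":
    for steps, seconds in frame_intervals(LOG_EXCERPT):
        print(f"{steps:>8} steps: {seconds} s since previous line")
```

On this excerpt the first four intervals are about seven and a half minutes each, while the last one before the NaN error is 1007 seconds, more than double. Nothing in this thread establishes whether that slowdown is related to the failure.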
sortofageek
Site Admin
Posts: 3111
Joined: Fri Nov 30, 2007 8:06 pm
Location: Team Helix

Re: 7644 (510, 0, 5) NANs detected on GPU UNSTABLE_MACHINE

Post by sortofageek »

Thanks for the report. As there is no info in the database thus far, I will mark Project: 7644 (Run 510, Clone 0, Gen 5) for follow-up.

It might help to know your folding name and team number when reviewing database results on this WU.
metalmayhem
Posts: 4
Joined: Tue Oct 04, 2011 7:50 pm

Re: 7644 (510, 0, 5) NANs detected on GPU UNSTABLE_MACHINE

Post by metalmayhem »

Thanks for replying. Here's the requested info:

Folding name: metalmayhem1
Team no.: 37726

As for this WU, what do I do? I sure am not making any progress for either Science or my points. Do I drop it?
sortofageek
Site Admin
Posts: 3111
Joined: Fri Nov 30, 2007 8:06 pm
Location: Team Helix

Re: 7644 (510, 0, 5) NANs detected on GPU UNSTABLE_MACHINE

Post by sortofageek »

Thanks for the info. I can't see enough of your log to tell where you are and where you've been with this WU. I'm not one to consider dumping a WU as long as I believe there might be a chance for success, though.
metalmayhem
Posts: 4
Joined: Tue Oct 04, 2011 7:50 pm

Re: 7644 (510, 0, 5) NANs detected on GPU UNSTABLE_MACHINE

Post by metalmayhem »

sortofageek wrote:I can't see enough of your log to tell where you are and where you've been with this WU.
I posted only the last few steps before the unit failed. I don't think I can provide logs of the previous failures, since I restarted several times while troubleshooting.
sortofageek wrote:I'm not one to consider dumping a WU as long as I believe there might be a chance for success, though.
This I understand considering your position on the forum.

I'll give this WU maybe a day or a bit more to finish, with some overvolting and underclocking (I really don't know why). I can't sit on this one forever, given that the expiration date is 05/15, there is no explanation as to why this WU is failing repeatedly, and the "chance for success" is unknown. Also, I need to prime my hardware, as the Chimp Challenge is coming up.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 7644 (510, 0, 5) NANs detected on GPU UNSTABLE_MACHINE

Post by bruce »

metalmayhem wrote:I posted only the last few steps before the unit failed. I don't think I can provide logs of the previous failures, since I restarted several times while troubleshooting.
For future reference, "several times" won't overwrite your log. By default, you should find logs from the past 16 restarts all neatly arranged by date and time in the "logs" directory.
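
If it helps, a rough way to tally the failures across those archived logs is a short script like this. It is a sketch under assumptions: `log_dir` is whatever path your "logs" directory has on your machine, the archived logs are assumed to be plain-text files ending in `.txt`, and the marker string is copied from the failure line in the first post. `count_failures` is a made-up helper, not part of the client.

```python
from pathlib import Path

# Marker copied from the failure line in the posted log excerpt.
FAILURE_MARKER = "FahCore returned: UNSTABLE_MACHINE"


def count_failures(log_dir):
    """Map each log file name to the number of failure lines it contains."""
    counts = {}
    # Assumption: archived logs are *.txt files directly inside log_dir.
    for path in sorted(Path(log_dir).glob("*.txt")):
        text = path.read_text(errors="replace")  # tolerate odd bytes in logs
        hits = sum(1 for line in text.splitlines() if FAILURE_MARKER in line)
        if hits:
            counts[path.name] = hits
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_failures.py /path/to/your/logs
    for name, hits in count_failures(sys.argv[1]).items():
        print(f"{name}: {hits} failure(s)")
```

Files with no failure lines are left out of the result, so an empty result means the marker was not found in any log.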
metalmayhem
Posts: 4
Joined: Tue Oct 04, 2011 7:50 pm

Re: 7644 (510, 0, 5) NANs detected on GPU UNSTABLE_MACHINE

Post by metalmayhem »

bruce wrote:For future reference, "several times" won't overwrite your log. By default, you should find logs from the past 16 restarts all neatly arranged by date and time in the "logs" directory.
Thanks for the tip. Yes, I found the logs in the designated folder. Do you guys want me to post all of them?

Even with mild underclock & overvolts, the WU failed 3 more times in the last 17 hours.
sortofageek
Site Admin
Posts: 3111
Joined: Fri Nov 30, 2007 8:06 pm
Location: Team Helix

Re: 7644 (510, 0, 5) NANs detected on GPU UNSTABLE_MACHINE

Post by sortofageek »

Still nothing in the database. It must be a bad WU. :(