[Please read] NaNs detected on GPU - UNSTABLE_MACHINE error

Moderators: slegrand, Site Moderators, PandeGroup

Re: [Please read] NaNs detected on GPU - UNSTABLE_MACHINE error

Postby bruce » Wed Jan 27, 2010 10:59 pm

I'd guess that your GPU is on the verge of instability. FahCore_11 typically uses more power and generates more heat than FahCore_14, and that tends to push things closer to being unstable. Also, if your power supply is marginal, the voltage may drop just enough to become a problem.

Which drivers are you using?
bruce
 
Posts: 22616
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: [Please read] NaNs detected on GPU - UNSTABLE_MACHINE error

Postby shdbcamping » Sun Feb 07, 2010 3:31 am

bruce wrote:I'd guess that your GPU is on the verge of instability. FahCore_11 typically uses more power and generates more heat than FahCore_14, and that tends to push things closer to being unstable. Also, if your power supply is marginal, the voltage may drop just enough to become a problem.

Which drivers are you using?

Bruce, have you read all the posts of the FAHcore11 of late? This is not a card problem.

Please kick this up to the FAHcore 11 engineers and take it away from moderator excuses.
shdbcamping
 
Posts: 519
Joined: Mon Nov 10, 2008 7:57 am

Re: [Please read] NaNs detected on GPU - UNSTABLE_MACHINE error

Postby PaJaSoft » Tue Feb 09, 2010 8:10 am

I bet my 8600GT card will keep serving me for a long time yet... If you guess that my card is close to damaged, that's your problem, and it's also your problem if you don't get the resources I offer freely to this project...
PaJaSoft
 
Posts: 3
Joined: Thu Oct 08, 2009 1:30 pm

Re: [Please read] NaNs detected on GPU - UNSTABLE_MACHINE error

Postby Nathan V » Wed Feb 17, 2010 12:26 pm

I was getting the NaNs solid across all types today (took some time away from Folding) after FaH grabbed the latest cores. Last I checked, the current NVIDIA drivers were supposed to allow SLI and PhysX to stay enabled... seems that's not the case. I disabled both, and both cards come up fine now. If FaH or the drivers have regressed, I'm sure disappointed.

Unfortunately, my other system with the 8800GTS, which has never had NaN issues, is failing 100% right now on the core 11 projects. Not overclocked, has plenty of power, isn't overheating, Vista x64 Enterprise RTM... :(
Nathan V
 
Posts: 11
Joined: Thu Jul 09, 2009 2:07 am

Re: [Please read] NaNs detected on GPU - UNSTABLE_MACHINE error

Postby Darth_Fester » Sun Feb 28, 2010 2:55 pm

Windows 7 64bit, i7 CPU 920 @2.67GHz, 12GB RAM
GeForce GTX 260 4095MB Driver 196.21

Log:
[05:07:04] - Preparing to get new work unit...
[05:07:04] + Attempting to get work packet
[05:07:04] - Connecting to assignment server
[05:07:05] - Successful: assigned to (171.67.108.11).
[05:07:05] + News From Folding@Home: Welcome to Folding@Home
[05:07:05] Loaded queue successfully.
[05:07:06] + Closed connections
[05:07:06]
[05:07:06] + Processing work unit
[05:07:06] Core required: FahCore_11.exe
[05:07:06] Core found.
[05:07:06] Working on queue slot 06 [February 28 05:07:06 UTC]
[05:07:06] + Working ...
[05:07:06]
[05:07:06] *------------------------------*
[05:07:06] Folding@Home GPU Core
[05:07:06] Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
[05:07:06]
[05:07:06] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[05:07:06] Build host: amoeba
[05:07:06] Board Type: Nvidia
[05:07:06] Core      :
[05:07:06] Preparing to commence simulation
[05:07:06] - Looking at optimizations...
[05:07:06] DeleteFrameFiles: successfully deleted file=work/wudata_06.ckp
[05:07:06] - Created dyn
[05:07:06] - Files status OK
[05:07:06] - Expanded 46673 -> 252912 (decompressed 541.8 percent)
[05:07:06] Called DecompressByteArray: compressed_data_size=46673 data_size=252912, decompressed_data_size=252912 diff=0
[05:07:06] - Digital signature verified
[05:07:06]
[05:07:06] Project: 5765 (Run 3, Clone 172, Gen 1989)
[05:07:06]
[05:07:06] Assembly optimizations on if available.
[05:07:06] Entering M.D.
[05:07:12] Tpr hash work/wudata_06.tpr:  3755783369 2118649657 2852838677 114027585 868645771
[05:07:12]
[05:07:12] Calling fah_main args: 14 usage=100
[05:07:12]
[05:07:12] Working on Protein
[05:07:12] Client config found, loading data.
[05:07:13] mdrun_gpu returned
[05:07:13] Going to send back what have done -- stepsTotalG=15000000
[05:07:13] Work fraction=0.0000 steps=15000000.
[05:07:13] Starting GUI Server
[05:07:17] logfile size=0 infoLength=0 edr=0 trr=25
[05:07:17] + Opened results file
[05:07:17] - Writing 644 bytes of core data to disk...
[05:07:17] Done: 132 -> 129 (compressed to 97.7 percent)
[05:07:17]   ... Done.
[05:07:17] DeleteFrameFiles: successfully deleted file=work/wudata_06.ckp
[05:07:17]
[05:07:17] Folding@home Core Shutdown: EARLY_UNIT_END
[05:07:42] CoreStatus = C0000005 (-1073741819)
[05:07:42] Client-core communications error: ERROR 0xc0000005
[05:07:42] This is a sign of more serious problems, shutting down.
[08:42:10] + Working...
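
Side note on the two status numbers in that log: C0000005 and -1073741819 are the same 32-bit value, printed once as hex and once as a signed integer (on Windows, 0xC0000005 is the access-violation status code). Below is a minimal Python sketch of the conversion. The helper name `to_signed32` is made up for illustration and is not part of the FAH client.

```python
def to_signed32(hex_text: str) -> int:
    """Interpret a hex string as a signed 32-bit (two's complement) integer."""
    value = int(hex_text, 16) & 0xFFFFFFFF  # keep only the low 32 bits
    # If the top bit is set, the value is negative in two's complement.
    return value - 0x100000000 if value & 0x80000000 else value

print(to_signed32("C0000005"))  # -1073741819, matches the CoreStatus line
print(to_signed32("7A"))        # 122
```

The same conversion explains why a status such as 7A shows up as plain 122: its top bit is clear, so the signed and unsigned readings agree.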
Darth_Fester
 
Posts: 1
Joined: Sun Feb 28, 2010 2:49 pm

Re: [Please read] NaNs detected on GPU - UNSTABLE_MACHINE error

Postby JimF » Sun Feb 28, 2010 4:38 pm

Darth_Fester wrote:Windows 7 64bit, i7 CPU 920 @2.67GHz, 12GB RAM
GeForce GTX 260 4095MB Driver 196.21

Log:
[05:07:17] Folding@home Core Shutdown: EARLY_UNIT_END
[05:07:42] CoreStatus = C0000005 (-1073741819)
[05:07:42] Client-core communications error: ERROR 0xc0000005
[05:07:42] This is a sign of more serious problems, shutting down.
[08:42:10] + Working...

How long have you been using this machine? Any success before now? I have seen problems with Win7 64-bit on at least one motherboard/video card combination, though it works OK on others. But in general, WinXP seems to be more stable, and Vista probably is too. I think they changed the driver model going to Win7.
GTX 970 (i5-3550), GTX 980 (i7-3770); Win10 64-bit; FAH 7.4.4
JimF
 
Posts: 486
Joined: Thu Jan 21, 2010 2:03 pm

Re: [Please read] NaNs detected on GPU - UNSTABLE_MACHINE error

Postby bruce » Fri Mar 19, 2010 11:03 pm

BuddhaChu wrote:Failed today:

Project: 5772 (Run 6, Clone 182, Gen 671)
OS: Windows 7 Pro
Card: 8800 GT
Drivers: 195.55


It really helps when you report WUs individually if you suspect it's a bad WU. See viewtopic.php?f=19&t=13867
bruce
 
Posts: 22616
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: [Please read] NaNs detected on GPU - UNSTABLE_MACHINE error

Postby noorman » Wed May 19, 2010 9:51 pm

shdbcamping wrote:
bruce wrote:I'd guess that your GPU is on the verge of instability. FahCore_11 typically uses more power and generates more heat than FahCore_14, and that tends to push things closer to being unstable. Also, if your power supply is marginal, the voltage may drop just enough to become a problem.

Which drivers are you using?

Bruce, have you read all the posts of the FAHcore11 of late? This is not a card problem.

Please kick this up to the FAHcore 11 engineers and take it away from moderator excuses.
.

I've run lots and lots of FahCore_11 work on my 2 GPU rigs and they ran without problems!

In all those hundreds, I can find only 1 that ended in an EUE!

.
- stopped Linux SMP w. HT on i7-860@3.5 GHz
....................................
Folded since 10-06-04 till 09-2010
noorman
 
Posts: 548
Joined: Sun Dec 02, 2007 2:26 pm
Location: Belgium, near the International Sea-Port of Antwerp

Re: [Please read] NaNs detected on GPU - UNSTABLE_MACHINE er

Postby eastrider » Tue Nov 09, 2010 10:44 pm

Hello people.

I'm new to F@H GPU3. It's been a long time since I last folded, and it's the first time I've done it on this PC (W7x64, 8800GT SLI, latest drivers, GPU3).

The thing is, one GPU folds perfectly (the one with no screen attached to it) but my main GPU (GPU1) can't fold.

[22:36:48] mdrun_gpu returned 52
[22:36:48] NANs detected on GPU
[22:36:48]
[22:36:48] Folding@home Core Shutdown: UNSTABLE_MACHINE
[22:36:52] CoreStatus = 7A (122)
[22:36:52] Sending work to server
[22:36:52] Project: 11179 (Run 11, Clone 98, Gen 8)
[22:36:52] - Read packet limit of 540015616... Set to 524286976.
[22:36:52] - Error: Could not get length of results file work/wuresults_01.dat
[22:36:52] - Error: Could not read unit 01 file. Removing from queue.
[22:36:52] + -oneunit flag given and have now finished a unit. Exiting.
Folding@Home Client Shutdown.
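
For anyone who wants to tally how often this happens, here is a rough Python sketch that counts core shutdown reasons in log text. It assumes the shutdown lines look like the ones above. The function name `count_shutdowns` and the sample string (including its last, invented line) are made up for illustration, and a real client log may contain other line formats.

```python
import re
from collections import Counter

# Matches lines such as "Folding@home Core Shutdown: UNSTABLE_MACHINE".
SHUTDOWN = re.compile(r"Folding@home Core Shutdown: (\w+)")

def count_shutdowns(log_text: str) -> Counter:
    """Count how many times each core shutdown reason appears in the log."""
    return Counter(SHUTDOWN.findall(log_text))

# Illustrative sample only; the third entry is invented for the example.
sample = """[22:36:48] NANs detected on GPU
[22:36:48] Folding@home Core Shutdown: UNSTABLE_MACHINE
[05:07:17] Folding@home Core Shutdown: EARLY_UNIT_END
[23:01:10] Folding@home Core Shutdown: UNSTABLE_MACHINE
"""

for reason, count in count_shutdowns(sample).items():
    print(reason, count)
```

To use it on a real log, read the file contents into a string and pass that to `count_shutdowns`.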



I noticed this on project 11179. It has happened to me over 50 times in one day. I can't finish a WU. It stops after barely 5 minutes.


I have all the correct flags and tried all SLI and PhysX configurations, and it just doesn't work. Even the card itself, alone in the system with no other GPU in it, can't fold.

Do I have a serious problem? How do I fix this?
eastrider
 
Posts: 7
Joined: Tue Nov 17, 2009 8:31 pm

Re: [Please read] NaNs detected on GPU - UNSTABLE_MACHINE er

Postby toTOW » Tue Nov 09, 2010 10:45 pm

Did you extend your desktop to a monitor (or a dummy plug) on the second card to enable it?

edit :
Even the card itself, alone in the system with no other GPU on it, can't fold.


That's not a good sign ... try running MemtestG80/MemtestCL on it and report the number of errors it finds.
Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.

FAH-Addict : latest news, tests and reviews about Folding@Home project.

toTOW
Site Moderator
 
Posts: 8766
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: [Please read] NaNs detected on GPU - UNSTABLE_MACHINE er

Postby noorman » Tue Nov 09, 2010 10:49 pm

.

Are you sure that card is cooled well enough?
Have you got a monitor running that tells you the card's reported temps?
Which driver (version) are you running?

.
noorman
 
Posts: 548
Joined: Sun Dec 02, 2007 2:26 pm
Location: Belgium, near the International Sea-Port of Antwerp

Re: [Please read] NaNs detected on GPU - UNSTABLE_MACHINE er

Postby eastrider » Tue Nov 09, 2010 10:59 pm

noorman wrote:.

Are you sure that card is cooled well enough?
Have you got a monitor running that tells you the card's reported temps?
Which driver (version) are you running?

.


The last one. Both cards are internally watercooled. The highest temp this GPU has EVER reached is 46°C.

I'm running the latest version, 2xx.xx. I don't know the exact number, but it's the latest available.


Gonna try some Memtest right now...
eastrider
 
Posts: 7
Joined: Tue Nov 17, 2009 8:31 pm

Re: [Please read] NaNs detected on GPU - UNSTABLE_MACHINE er

Postby eastrider » Tue Nov 09, 2010 11:15 pm

Ran 50 iterations of MemtestG80 at stock speeds.

I got a single error on iteration no. 49.

...What can I do about that?
eastrider
 
Posts: 7
Joined: Tue Nov 17, 2009 8:31 pm

Postby eastrider » Wed Nov 10, 2010 12:15 am

409 errors on a 400MB / 100 iterations run.

Is there ANY way to fix these errors? Faulty VRAM?
eastrider
 
Posts: 7
Joined: Tue Nov 17, 2009 8:31 pm

Re: [Please read] NaNs detected on GPU - UNSTABLE_MACHINE er

Postby noorman » Wed Nov 10, 2010 12:30 am

.

If the RAM on the GPU card isn't 100%, only an RMA will solve that ...

That program is made specifically for testing GPU card RAM.

.
noorman
 
Posts: 548
Joined: Sun Dec 02, 2007 2:26 pm
Location: Belgium, near the International Sea-Port of Antwerp
