[Please read] NaNs detected on GPU - UNSTABLE_MACHINE error

Moderators: slegrand, Site Moderators, PandeGroup

Re: [Please read] NaNs detected on GPU - UNSTABLE_MACHINE error

Postby AJL » Sat Jan 31, 2009 12:00 am

Windows Vista Ult. 64b
2xEVGA 9600GT
Drivers - 180.48
Project: 5766 (Run 6, Clone 378, Gen 15)

Tried restarting my system, restarting gpu clients, etc.
This setup was working fine until the last week or so...
Sometimes one core will keep on folding while the other will keep on failing - sometimes both are DOA.

I keep getting this until the client eventually reports 'too many EUEs, pausing for 24 hours' :(


Code:
[23:49:14] *------------------------------*
[23:49:14] Folding@Home GPU Core - Beta
[23:49:14] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[23:49:14]
[23:49:14] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[23:49:14] Build host: amoeba
[23:49:14] Board Type: Nvidia
[23:49:14] Core      :
[23:49:14] Preparing to commence simulation
[23:49:14] - Looking at optimizations...
[23:49:14] - Created dyn
[23:49:14] - Files status OK
[23:49:14] - Expanded 46745 -> 252912 (decompressed 541.0 percent)
[23:49:14] Called DecompressByteArray: compressed_data_size=46745 data_size=252912, decompressed_data_size=252912 diff=0
[23:49:14] - Digital signature verified
[23:49:14]
[23:49:14] Project: 5766 (Run 6, Clone 378, Gen 15)
[23:49:14]
[23:49:14] Assembly optimizations on if available.
[23:49:14] Entering M.D.
[23:49:21] Working on Protein
[23:49:21] Client config found, loading data.
[23:49:22] mdrun_gpu returned
[23:49:22] NANs detected on GPU
[23:49:22]
[23:49:22] Folding@home Core Shutdown: UNSTABLE_MACHINE
[23:49:24] CoreStatus = 7A (122)
[23:49:24] Sending work to server
[23:49:24] Project: 5766 (Run 6, Clone 378, Gen 15)
[23:49:24] - Read packet limit of 540015616... Set to 524286976.
[23:49:24] - Error: Could not get length of results file work/wuresults_00.dat
[23:49:24] - Error: Could not read unit 00 file. Removing from queue.
[23:49:24] Trying to send all finished work units
[23:49:24] + No unsent completed units remaining.
[23:49:24] - Preparing to get new work unit...
[23:49:24] + Attempting to get work packet
[23:49:24] - Will indicate memory of 8189 MB
[23:49:24] - Detect CPU. Vendor: GenuineIntel, Family: 6, Model: 15, Stepping: 11
[23:49:24] - Connecting to assignment server
[23:49:24] Connecting to http://assign-GPU.stanford.edu:8080/
[23:49:24] Posted data.
[23:49:24] Initial: 43AB; - Successful: assigned to (171.67.108.11).
[23:49:24] + News From Folding@Home: GPU folding beta
[23:49:24] Loaded queue successfully.
[23:49:24] Connecting to http://171.67.108.11:8080/
[23:49:25] Posted data.
[23:49:25] Initial: 0000; - Error: Bad packet type from server, expected work assignment
[23:49:25] - Attempt #1  to get work failed, and no other work to do.
Waiting before retry.
[23:49:36] + Attempting to get work packet
[23:49:36] - Will indicate memory of 8189 MB
[23:49:36] - Connecting to assignment server
[23:49:36] Connecting to http://assign-GPU.stanford.edu:8080/
[23:49:37] Posted data.
[23:49:37] Initial: 43AB; - Successful: assigned to (171.67.108.11).
[23:49:37] + News From Folding@Home: GPU folding beta
[23:49:37] Loaded queue successfully.
[23:49:37] Connecting to http://171.67.108.11:8080/
[23:49:38] Posted data.
[23:49:38] Initial: 0000; - Receiving payload (expected size: 47241)
[23:49:38] Conversation time very short, giving reduced weight in bandwidth avg
[23:49:38] - Downloaded at ~92 kB/s
[23:49:38] - Averaged speed for that direction ~92 kB/s
[23:49:38] + Received work.
[23:49:38] Trying to send all finished work units
[23:49:38] + No unsent completed units remaining.
[23:49:38] + Closed connections
AJL
 
Posts: 68
Joined: Tue Jun 24, 2008 1:20 pm

Re: [Please read] NaNs detected on GPU - UNSTABLE_MACHINE error

Postby AZBrandon » Sat Jan 31, 2009 6:37 pm

OK, so I've posted in this thread a couple of times before, when I was using the 180.60 drivers. I've now gone a week or so on the 178.24 drivers with no more errors, not counting the times I accidentally logged off and switched users on the PC. Aside from those, I went from getting failures every 12-24 hours to no failures at all. Either the WUs have been cleaned up a lot or my own setup simply runs better on the 178.24 drivers.
AZBrandon
 
Posts: 178
Joined: Sat Jan 17, 2009 1:43 am

Re: [Please read] NaNs detected on GPU - UNSTABLE_MACHINE error

Postby AJL » Sat Jan 31, 2009 6:56 pm

I just switched to the 181.22 drivers and now I'm folding fine - whatever :roll:
AJL
 
Posts: 68
Joined: Tue Jun 24, 2008 1:20 pm

Re: [Please read] NaNs detected on GPU - UNSTABLE_MACHINE error

Postby heimie » Sat Jan 31, 2009 8:05 pm

AJL wrote:I just switched to the 181.22 drivers and now I'm folding fine - whatever :roll:


Have you completed any of the series of WUs that most are listing with the new drivers? For example, the latest one that just failed on me: Project: 5767 (Run 13, Clone 248, Gen 49)

Edit: I switched to these drivers and still can't fold the problematic WU's I have listed in my previous post http://foldingforum.org/viewtopic.php?f=52&t=7965&start=45#p80271 in this thread. I'm updating that post daily with the failed WU's as they EUE to Pause.
heimie
 
Posts: 79
Joined: Sat Jun 14, 2008 10:17 am
Location: Lockport, Louisiana

Re: [Please read] NaNs detected on GPU - UNSTABLE_MACHINE error

Postby slugbug » Mon Feb 02, 2009 6:27 pm

I'm getting extremely frustrated. My EVGA 9800GTX+ SC simply refuses to work with these work units. I tried upgrading the drivers to version 181.22, but that had no effect whatsoever.

Code:
[22:02:38] Loaded queue successfully.
[22:02:38] Initialization complete
[22:02:38] - Preparing to get new work unit...
[22:02:38] + Attempting to get work packet
[22:02:38] - Connecting to assignment server
[22:02:38] - Successful: assigned to (171.67.108.11).
[22:02:38] + News From Folding@Home: GPU folding beta
[22:02:39] Loaded queue successfully.
[22:02:40] + Closed connections
[22:02:40]
[22:02:40] + Processing work unit
[22:02:40] Core required: FahCore_11.exe
[22:02:40] Core found.
[22:02:40] Working on queue slot 06 [January 30 22:02:40 UTC]
[22:02:40] + Working ...
[22:02:45]
[22:02:45] *------------------------------*
[22:02:45] Folding@Home GPU Core - Beta
[22:02:45] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[22:02:45]
[22:02:45] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[22:02:45] Build host: amoeba
[22:02:45] Board Type: Nvidia
[22:02:45] Core      :
[22:02:45] Preparing to commence simulation
[22:02:45] - Looking at optimizations...
[22:02:45] - Created dyn
[22:02:45] - Files status OK
[22:02:45] - Expanded 45438 -> 251112 (decompressed 552.6 percent)
[22:02:45] Called DecompressByteArray: compressed_data_size=45438 data_size=251112, decompressed_data_size=251112 diff=0
[22:02:45] - Digital signature verified
[22:02:45]
[22:02:45] Project: 5770 (Run 8, Clone 23, Gen 17)
[22:02:45]
[22:02:45] Assembly optimizations on if available.
[22:02:45] Entering M.D.
[22:02:52] Working on Protein
[22:02:52] Client config found, loading data.
[22:02:52] Starting GUI Server
[22:03:28] Completed 1%
[22:03:28] mdrun_gpu returned
[22:03:28] NANs detected on GPU
[22:03:28]
[22:03:28] Folding@home Core Shutdown: UNSTABLE_MACHINE
[22:03:31] CoreStatus = 7A (122)
[22:03:31] Sending work to server
[22:03:31] Project: 5770 (Run 8, Clone 23, Gen 17)
[22:03:31] - Read packet limit of 540015616... Set to 524286976.
[22:03:31] - Error: Could not get length of results file work/wuresults_06.dat
[22:03:31] - Error: Could not read unit 06 file. Removing from queue.
[22:03:31] - Preparing to get new work unit...
[22:03:31] + Attempting to get work packet
[22:03:31] - Connecting to assignment server
[22:03:32] - Successful: assigned to (171.67.108.11).
[22:03:32] + News From Folding@Home: GPU folding beta
[22:03:32] Loaded queue successfully.
[22:03:33] + Closed connections
[22:03:38]
[22:03:38] + Processing work unit
[22:03:38] Core required: FahCore_11.exe
[22:03:38] Core found.
[22:03:38] Working on queue slot 07 [January 30 22:03:38 UTC]
[22:03:38] + Working ...
[22:03:38]
[22:03:38] *------------------------------*
[22:03:38] Folding@Home GPU Core - Beta
[22:03:38] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[22:03:38]
[22:03:38] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[22:03:38] Build host: amoeba
[22:03:38] Board Type: Nvidia
[22:03:38] Core      :
[22:03:38] Preparing to commence simulation
[22:03:38] - Looking at optimizations...
[22:03:38] - Created dyn
[22:03:38] - Files status OK
[22:03:38] - Expanded 45438 -> 251112 (decompressed 552.6 percent)
[22:03:38] Called DecompressByteArray: compressed_data_size=45438 data_size=251112, decompressed_data_size=251112 diff=0
[22:03:38] - Digital signature verified
[22:03:38]
[22:03:38] Project: 5770 (Run 8, Clone 23, Gen 17)
[22:03:38]
[22:03:38] Assembly optimizations on if available.
[22:03:38] Entering M.D.
[22:03:44] Working on Protein
[22:03:45] Client config found, loading data.
[22:03:45] Starting GUI Server
[22:04:39] Completed 1%
[22:04:39] mdrun_gpu returned
[22:04:39] NANs detected on GPU
[22:04:39]
[22:04:39] Folding@home Core Shutdown: UNSTABLE_MACHINE
[22:04:42] CoreStatus = 7A (122)
[22:04:42] Sending work to server
[22:04:42] Project: 5770 (Run 8, Clone 23, Gen 17)
[22:04:42] - Read packet limit of 540015616... Set to 524286976.
[22:04:42] - Error: Could not get length of results file work/wuresults_07.dat
[22:04:42] - Error: Could not read unit 07 file. Removing from queue.
[22:04:42] - Preparing to get new work unit...
[22:04:42] + Attempting to get work packet
[22:04:42] - Connecting to assignment server
[22:04:43] - Successful: assigned to (171.67.108.11).
[22:04:43] + News From Folding@Home: GPU folding beta
[22:04:43] Loaded queue successfully.
[22:04:44] + Closed connections
[22:04:49]
[22:04:49] + Processing work unit
[22:04:49] Core required: FahCore_11.exe
[22:04:49] Core found.
[22:04:49] Working on queue slot 08 [January 30 22:04:49 UTC]
[22:04:49] + Working ...
[22:04:49]
[22:04:49] *------------------------------*
[22:04:49] Folding@Home GPU Core - Beta
[22:04:49] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[22:04:49]
[22:04:49] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[22:04:49] Build host: amoeba
[22:04:49] Board Type: Nvidia
[22:04:49] Core      :
[22:04:49] Preparing to commence simulation
[22:04:49] - Looking at optimizations...
[22:04:49] - Created dyn
[22:04:49] - Files status OK
[22:04:49] - Expanded 45438 -> 251112 (decompressed 552.6 percent)
[22:04:49] Called DecompressByteArray: compressed_data_size=45438 data_size=251112, decompressed_data_size=251112 diff=0
[22:04:49] - Digital signature verified
[22:04:49]
[22:04:49] Project: 5770 (Run 8, Clone 23, Gen 17)
[22:04:49]
[22:04:49] Assembly optimizations on if available.
[22:04:49] Entering M.D.
[22:04:56] Working on Protein
[22:04:56] Client config found, loading data.
[22:04:57] Starting GUI Server
[22:05:39] Completed 1%
[22:05:39] mdrun_gpu returned
[22:05:39] NANs detected on GPU
[22:05:39]
[22:05:39] Folding@home Core Shutdown: UNSTABLE_MACHINE
[22:05:41] CoreStatus = 7A (122)
[22:05:41] Sending work to server
[22:05:41] Project: 5770 (Run 8, Clone 23, Gen 17)
[22:05:41] - Read packet limit of 540015616... Set to 524286976.
[22:05:41] - Error: Could not get length of results file work/wuresults_08.dat
[22:05:41] - Error: Could not read unit 08 file. Removing from queue.
[22:05:41] - Preparing to get new work unit...
[22:05:41] + Attempting to get work packet
[22:05:41] - Connecting to assignment server
[22:05:42] - Successful: assigned to (171.67.108.11).
[22:05:42] + News From Folding@Home: GPU folding beta
[22:05:42] Loaded queue successfully.
[22:05:43] + Closed connections
[22:05:48]
[22:05:48] + Processing work unit
[22:05:48] Core required: FahCore_11.exe
[22:05:48] Core found.
[22:05:48] Working on queue slot 09 [January 30 22:05:48 UTC]
[22:05:48] + Working ...
[22:05:48]
[22:05:48] *------------------------------*
[22:05:48] Folding@Home GPU Core - Beta
[22:05:48] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[22:05:48]
[22:05:48] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[22:05:48] Build host: amoeba
[22:05:48] Board Type: Nvidia
[22:05:48] Core      :
[22:05:48] Preparing to commence simulation
[22:05:48] - Looking at optimizations...
[22:05:48] - Created dyn
[22:05:48] - Files status OK
[22:05:48] - Expanded 45438 -> 251112 (decompressed 552.6 percent)
[22:05:48] Called DecompressByteArray: compressed_data_size=45438 data_size=251112, decompressed_data_size=251112 diff=0
[22:05:48] - Digital signature verified
[22:05:48]
[22:05:48] Project: 5770 (Run 8, Clone 23, Gen 17)
[22:05:48]
[22:05:48] Assembly optimizations on if available.
[22:05:48] Entering M.D.
[22:05:55] Working on Protein
[22:05:55] Client config found, loading data.
[22:05:55] Starting GUI Server
[22:06:39] Completed 1%
[22:06:39] mdrun_gpu returned
[22:06:39] NANs detected on GPU
[22:06:39]
[22:06:39] Folding@home Core Shutdown: UNSTABLE_MACHINE
[22:06:42] CoreStatus = 7A (122)
[22:06:42] Sending work to server
[22:06:42] Project: 5770 (Run 8, Clone 23, Gen 17)
[22:06:42] - Read packet limit of 540015616... Set to 524286976.
[22:06:42] - Error: Could not get length of results file work/wuresults_09.dat
[22:06:42] - Error: Could not read unit 09 file. Removing from queue.
[22:06:42] - Preparing to get new work unit...
[22:06:42] + Attempting to get work packet
[22:06:42] - Connecting to assignment server
[22:06:43] - Successful: assigned to (171.67.108.11).
[22:06:43] + News From Folding@Home: GPU folding beta
[22:06:43] Loaded queue successfully.
[22:06:44] + Closed connections
[22:06:49]
[22:06:49] + Processing work unit
[22:06:49] Core required: FahCore_11.exe
[22:06:49] Core found.
[22:06:49] Working on queue slot 00 [January 30 22:06:49 UTC]
[22:06:49] + Working ...
[22:06:49]
[22:06:49] *------------------------------*
[22:06:49] Folding@Home GPU Core - Beta
[22:06:49] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[22:06:49]
[22:06:49] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[22:06:49] Build host: amoeba
[22:06:49] Board Type: Nvidia
[22:06:49] Core      :
[22:06:49] Preparing to commence simulation
[22:06:49] - Looking at optimizations...
[22:06:49] - Created dyn
[22:06:49] - Files status OK
[22:06:49] - Expanded 45438 -> 251112 (decompressed 552.6 percent)
[22:06:49] Called DecompressByteArray: compressed_data_size=45438 data_size=251112, decompressed_data_size=251112 diff=0
[22:06:49] - Digital signature verified
[22:06:49]
[22:06:49] Project: 5770 (Run 8, Clone 23, Gen 17)
[22:06:49]
[22:06:49] Assembly optimizations on if available.
[22:06:49] Entering M.D.
[22:06:55] Working on Protein
[22:06:56] Client config found, loading data.
[22:06:56] Starting GUI Server
[22:07:48] Completed 1%
[22:07:48] mdrun_gpu returned
[22:07:48] NANs detected on GPU
[22:07:48]
[22:07:48] Folding@home Core Shutdown: UNSTABLE_MACHINE
[22:07:52] CoreStatus = 7A (122)
[22:07:52] Sending work to server
[22:07:52] Project: 5770 (Run 8, Clone 23, Gen 17)
[22:07:52] - Read packet limit of 540015616... Set to 524286976.
[22:07:52] - Error: Could not get length of results file work/wuresults_00.dat
[22:07:52] - Error: Could not read unit 00 file. Removing from queue.
[22:07:52] EUE limit exceeded. Pausing 24 hours.

Folding@Home Client Shutdown.


--- Opening Log file [February 2 18:06:11 UTC]


# Windows GPU Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.20

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\Documents and Settings\Administrator\Application Data\Folding@home-gpu
Arguments: -gpu 0

[18:06:11] - Ask before connecting: No
[18:06:11] - User name: slugbug (Team 41608)
[18:06:11] - User ID: 78D5BCCE5BDA5DE9
[18:06:11] - Machine ID: 2
[18:06:11]
[18:06:11] Could not open work queue, generating new queue...
[18:06:12] Initialization complete
[18:06:12] - Preparing to get new work unit...
[18:06:12] + Attempting to get work packet
[18:06:12] - Connecting to assignment server
[18:06:12] - Successful: assigned to (171.67.108.11).
[18:06:12] + News From Folding@Home: GPU folding beta
[18:06:12] Loaded queue successfully.
[18:06:13] - Attempt #1  to get work failed, and no other work to do.
Waiting before retry.
[18:06:20] + Attempting to get work packet
[18:06:20] - Connecting to assignment server
[18:06:21] - Successful: assigned to (171.67.108.11).
[18:06:21] + News From Folding@Home: GPU folding beta
[18:06:21] Loaded queue successfully.
[18:06:22] + Closed connections
[18:06:22]
[18:06:22] + Processing work unit
[18:06:22] Core required: FahCore_11.exe
[18:06:22] Core found.
[18:06:22] Working on queue slot 01 [February 2 18:06:22 UTC]
[18:06:22] + Working ...
[18:06:22]
[18:06:22] *------------------------------*
[18:06:22] Folding@Home GPU Core - Beta
[18:06:22] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[18:06:22]
[18:06:22] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[18:06:22] Build host: amoeba
[18:06:22] Board Type: Nvidia
[18:06:22] Core      :
[18:06:22] Preparing to commence simulation
[18:06:22] - Looking at optimizations...
[18:06:22] - Created dyn
[18:06:22] - Files status OK
[18:06:22] - Expanded 45377 -> 251112 (decompressed 553.3 percent)
[18:06:22] Called DecompressByteArray: compressed_data_size=45377 data_size=251112, decompressed_data_size=251112 diff=0
[18:06:22] - Digital signature verified
[18:06:22]
[18:06:22] Project: 5771 (Run 2, Clone 27, Gen 101)
[18:06:22]
[18:06:22] Assembly optimizations on if available.
[18:06:22] Entering M.D.
[18:06:29] Working on Protein
[18:06:29] mdrun_gpu returned
[18:06:29] Self-test failure
[18:06:29]
[18:06:29] Folding@home Core Shutdown: UNSTABLE_MACHINE
[18:06:32] CoreStatus = 7A (122)
[18:06:32] Sending work to server
[18:06:32] Project: 5771 (Run 2, Clone 27, Gen 101)
[18:06:32] - Read packet limit of 540015616... Set to 524286976.
[18:06:32] - Error: Could not get length of results file work/wuresults_01.dat
[18:06:32] - Error: Could not read unit 01 file. Removing from queue.
[18:06:32] - Preparing to get new work unit...
[18:06:32] + Attempting to get work packet
[18:06:32] - Connecting to assignment server
[18:06:33] - Successful: assigned to (171.67.108.11).
[18:06:33] + News From Folding@Home: GPU folding beta
[18:06:33] Loaded queue successfully.
[18:06:34] + Closed connections
[18:06:39]
[18:06:39] + Processing work unit
[18:06:39] Core required: FahCore_11.exe
[18:06:39] Core found.
[18:06:39] Working on queue slot 02 [February 2 18:06:39 UTC]
[18:06:39] + Working ...
[18:06:39]
[18:06:39] *------------------------------*
[18:06:39] Folding@Home GPU Core - Beta
[18:06:39] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[18:06:39]
[18:06:39] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[18:06:39] Build host: amoeba
[18:06:39] Board Type: Nvidia
[18:06:39] Core      :
[18:06:39] Preparing to commence simulation
[18:06:39] - Looking at optimizations...
[18:06:39] - Created dyn
[18:06:39] - Files status OK
[18:06:39] - Expanded 45377 -> 251112 (decompressed 553.3 percent)
[18:06:39] Called DecompressByteArray: compressed_data_size=45377 data_size=251112, decompressed_data_size=251112 diff=0
[18:06:39] - Digital signature verified
[18:06:39]
[18:06:39] Project: 5771 (Run 2, Clone 27, Gen 101)
[18:06:39]
[18:06:39] Assembly optimizations on if available.
[18:06:39] Entering M.D.
[18:06:45] Working on Protein
[18:06:46] Client config found, loading data.
[18:06:46] Starting GUI Server
[18:07:32] Completed 1%
[18:07:32] mdrun_gpu returned
[18:07:32] NANs detected on GPU
[18:07:32]
[18:07:32] Folding@home Core Shutdown: UNSTABLE_MACHINE
[18:07:35] CoreStatus = 7A (122)
[18:07:35] Sending work to server
[18:07:35] Project: 5771 (Run 2, Clone 27, Gen 101)
[18:07:35] - Read packet limit of 540015616... Set to 524286976.
[18:07:35] - Error: Could not get length of results file work/wuresults_02.dat
[18:07:35] - Error: Could not read unit 02 file. Removing from queue.
[18:07:35] - Preparing to get new work unit...
[18:07:35] + Attempting to get work packet
[18:07:35] - Connecting to assignment server
[18:07:35] - Successful: assigned to (171.67.108.11).
[18:07:35] + News From Folding@Home: GPU folding beta
[18:07:36] Loaded queue successfully.
[18:07:37] + Closed connections
[18:07:42]
[18:07:42] + Processing work unit
[18:07:42] Core required: FahCore_11.exe
[18:07:42] Core found.
[18:07:42] Working on queue slot 03 [February 2 18:07:42 UTC]
[18:07:42] + Working ...
[18:07:42]
[18:07:42] *------------------------------*
[18:07:42] Folding@Home GPU Core - Beta
[18:07:42] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[18:07:42]
[18:07:42] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[18:07:42] Build host: amoeba
[18:07:42] Board Type: Nvidia
[18:07:42] Core      :
[18:07:42] Preparing to commence simulation
[18:07:42] - Looking at optimizations...
[18:07:42] - Created dyn
[18:07:42] - Files status OK
[18:07:42] - Expanded 45377 -> 251112 (decompressed 553.3 percent)
[18:07:42] Called DecompressByteArray: compressed_data_size=45377 data_size=251112, decompressed_data_size=251112 diff=0
[18:07:42] - Digital signature verified
[18:07:42]
[18:07:42] Project: 5771 (Run 2, Clone 27, Gen 101)
[18:07:42]
[18:07:42] Assembly optimizations on if available.
[18:07:42] Entering M.D.
[18:07:48] Working on Protein
[18:07:49] Client config found, loading data.
[18:07:49] Starting GUI Server
[18:08:43] Completed 1%
[18:08:43] mdrun_gpu returned
[18:08:43] NANs detected on GPU
[18:08:43]
[18:08:43] Folding@home Core Shutdown: UNSTABLE_MACHINE
[18:08:46] CoreStatus = 7A (122)
[18:08:46] Sending work to server
[18:08:46] Project: 5771 (Run 2, Clone 27, Gen 101)
[18:08:46] - Read packet limit of 540015616... Set to 524286976.
[18:08:46] - Error: Could not get length of results file work/wuresults_03.dat
[18:08:46] - Error: Could not read unit 03 file. Removing from queue.
[18:08:46] - Preparing to get new work unit...
[18:08:46] + Attempting to get work packet
[18:08:46] - Connecting to assignment server
[18:08:46] - Successful: assigned to (171.67.108.11).
[18:08:46] + News From Folding@Home: GPU folding beta
[18:08:46] Loaded queue successfully.
[18:08:47] + Closed connections
[18:08:52]
[18:08:52] + Processing work unit
[18:08:52] Core required: FahCore_11.exe
[18:08:52] Core found.
[18:08:52] Working on queue slot 04 [February 2 18:08:52 UTC]
[18:08:52] + Working ...
[18:08:52]
[18:08:52] *------------------------------*
[18:08:52] Folding@Home GPU Core - Beta
[18:08:52] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[18:08:52]
[18:08:52] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[18:08:52] Build host: amoeba
[18:08:52] Board Type: Nvidia
[18:08:52] Core      :
[18:08:52] Preparing to commence simulation
[18:08:52] - Looking at optimizations...
[18:08:52] - Created dyn
[18:08:52] - Files status OK
[18:08:52] - Expanded 45377 -> 251112 (decompressed 553.3 percent)
[18:08:52] Called DecompressByteArray: compressed_data_size=45377 data_size=251112, decompressed_data_size=251112 diff=0
[18:08:52] - Digital signature verified
[18:08:52]
[18:08:52] Project: 5771 (Run 2, Clone 27, Gen 101)
[18:08:52]
[18:08:52] Assembly optimizations on if available.
[18:08:52] Entering M.D.
[18:08:59] Working on Protein
[18:08:59] Client config found, loading data.
[18:08:59] Starting GUI Server
[18:09:55] Completed 1%
[18:09:55] mdrun_gpu returned
[18:09:55] NANs detected on GPU
[18:09:55]
[18:09:55] Folding@home Core Shutdown: UNSTABLE_MACHINE
[18:09:59] CoreStatus = 7A (122)
[18:09:59] Sending work to server
[18:09:59] Project: 5771 (Run 2, Clone 27, Gen 101)
[18:09:59] - Read packet limit of 540015616... Set to 524286976.
[18:09:59] - Error: Could not get length of results file work/wuresults_04.dat
[18:09:59] - Error: Could not read unit 04 file. Removing from queue.
[18:09:59] - Preparing to get new work unit...
[18:09:59] + Attempting to get work packet
[18:09:59] - Connecting to assignment server
[18:09:59] - Successful: assigned to (171.67.108.11).
[18:09:59] + News From Folding@Home: GPU folding beta
[18:09:59] Loaded queue successfully.
[18:10:00] + Closed connections
[18:10:05]
[18:10:05] + Processing work unit
[18:10:05] Core required: FahCore_11.exe
[18:10:05] Core found.
[18:10:05] Working on queue slot 05 [February 2 18:10:05 UTC]
[18:10:05] + Working ...
[18:10:05]
[18:10:05] *------------------------------*
[18:10:05] Folding@Home GPU Core - Beta
[18:10:05] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[18:10:05]
[18:10:05] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[18:10:05] Build host: amoeba
[18:10:05] Board Type: Nvidia
[18:10:05] Core      :
[18:10:05] Preparing to commence simulation
[18:10:05] - Looking at optimizations...
[18:10:05] - Created dyn
[18:10:05] - Files status OK
[18:10:05] - Expanded 45377 -> 251112 (decompressed 553.3 percent)
[18:10:05] Called DecompressByteArray: compressed_data_size=45377 data_size=251112, decompressed_data_size=251112 diff=0
[18:10:05] - Digital signature verified
[18:10:05]
[18:10:05] Project: 5771 (Run 2, Clone 27, Gen 101)
[18:10:05]
[18:10:05] Assembly optimizations on if available.
[18:10:05] Entering M.D.
[18:10:12] Working on Protein
[18:10:12] Client config found, loading data.
[18:10:12] Starting GUI Server
[18:11:07] Completed 1%
[18:11:43] Completed 2%
[18:11:43] mdrun_gpu returned
[18:11:43] NANs detected on GPU
[18:11:43]
[18:11:43] Folding@home Core Shutdown: UNSTABLE_MACHINE
[18:11:45] CoreStatus = 7A (122)
[18:11:45] Sending work to server
[18:11:45] Project: 5771 (Run 2, Clone 27, Gen 101)
[18:11:45] - Read packet limit of 540015616... Set to 524286976.
[18:11:45] - Error: Could not get length of results file work/wuresults_05.dat
[18:11:45] - Error: Could not read unit 05 file. Removing from queue.
[18:11:45] EUE limit exceeded. Pausing 24 hours.


Windows XP x64 with Nvidia driver 181.22
It was folding fine until these new work units appeared a month or so ago. I had the same problem when the FAH_11 core v1.15 appeared; then when v1.19 appeared it was back to business as usual. Now nada :(
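
For anyone who wants to pull the failing work units out of a long log like the one above instead of scrolling through it, here is a minimal Python sketch. It assumes the log lines look exactly like those quoted in this thread (`Project: N (Run R, Clone C, Gen G)`, `NANs detected on GPU`, `Self-test failure`, `Folding@home Core Shutdown: UNSTABLE_MACHINE`). The function name `failed_units` and the `SAMPLE` text are illustrative only and are not part of the client.

```python
import re

# Patterns copied from the log lines quoted in this thread (assumed stable).
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
REASONS = ("NANs detected on GPU", "Self-test failure")
SHUTDOWN = "Folding@home Core Shutdown: UNSTABLE_MACHINE"


def failed_units(log_text):
    """Return one (project, run, clone, gen, reason) tuple per UNSTABLE_MACHINE shutdown."""
    failures = []
    current = None  # most recent Project line seen
    reason = None   # failure message seen since the last shutdown
    for line in log_text.splitlines():
        match = PROJECT_RE.search(line)
        if match:
            current = tuple(int(x) for x in match.groups())
            continue
        for text in REASONS:
            if text in line:
                reason = text
        # Shutdowns with no preceding Project line (truncated excerpts) are skipped.
        if SHUTDOWN in line and current is not None:
            failures.append(current + (reason or "unknown",))
            reason = None
    return failures


# Hypothetical excerpt, shortened from the log format shown in this thread.
SAMPLE = """\
[18:06:22] Project: 5771 (Run 2, Clone 27, Gen 101)
[18:06:29] mdrun_gpu returned
[18:06:29] Self-test failure
[18:06:29] Folding@home Core Shutdown: UNSTABLE_MACHINE
[18:06:32] Project: 5771 (Run 2, Clone 27, Gen 101)
[18:06:39] Project: 5771 (Run 2, Clone 27, Gen 101)
[18:07:32] Completed 1%
[18:07:32] NANs detected on GPU
[18:07:32] Folding@home Core Shutdown: UNSTABLE_MACHINE
"""

if __name__ == "__main__":
    for failure in failed_units(SAMPLE):
        print(failure)
```

Tracking the most recent `Project:` line is enough here because the core prints it before starting each attempt; the repeated `Project:` line after the shutdown simply re-sets the same value.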
slugbug
 
Posts: 133
Joined: Wed Apr 16, 2008 5:43 pm
Location: Canada

Re: [Please read] NaNs detected on GPU - UNSTABLE_MACHINE error

Postby CPUacreage » Tue Feb 03, 2009 2:50 am

I upgraded to the latest 181.22 drivers on both my nVidia machines.
I've had one additional NAN on Machine 1; none on machine 2
Project: 5762 (Run 10, Clone 111, Gen 207)

Machine 1
Failing projects (please add a list of exact project numbers if you have them)
Failing hardware (please add the exact GPU designation if you know it. ie 9800GTX+)
9800gtx+
Failing OS
Windows Vista 32 bits
Failing drivers (enter here the version number of the driver you use)
180.48
Comments (add below any detail you might find useful to the report)
otherwise very stable machine

Machine 1
Failing projects (please add a list of exact project numbers if you have them)
Failing hardware (please add the exact GPU designation if you know it. ie 9800GTX+)
9800gtx+
Failing OS
Windows Vista 32 bits
Failing drivers (enter here the version number of the driver you use)
181.22
Comments (add below any detail you might find useful to the report)
otherwise very stable machine
CPUacreage
 
Posts: 64
Joined: Sun Dec 02, 2007 7:58 pm

Re: [Please read] NaNs detected on GPU - UNSTABLE_MACHINE error

Postby twistedspark » Wed Feb 04, 2009 2:40 am

toTOW wrote:Before posting any report, if you're only seeing the issue on an individual WU (it can fail up to 6 times in a row before moving to another one), please check whether it has already been reported as a bad WU in this forum: viewforum.php?f=19 ... If you're having issues with multiple WUs (different Project/Run/Clone/Gen numbers), please do what is described below.

I've recently seen many reports of errors on NV GPU, so I decided to create this thread. It is intended to centralize reports, and to split them from discussions around the issue. The goal is to help find a pattern that could trigger the issue. To make your report, please quote my post, and remove any line in each section that doesn't apply to your case.

  • Failing projects
    • Project: 5764 (Run 2, Clone 28, Gen 235)
  • Failing hardware
    • 9800GX2
  • Failing OS
    • Windows XP 32 bits
  • Failing drivers
    • 6.14.11.7828
  • Comments
    • Completed 90% before failing on this one machine. This happened many other times, but the records were lost after numerous reinstall attempts.
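
toTOW's rule of thumb quoted above (one WU failing repeatedly versus several different WUs failing) can be sketched in a few lines of Python. This is only an illustration: the function name `triage`, the tuple layout, and the returned strings are assumptions, and treating "more than one distinct Project/Run/Clone/Gen" as the cut-off is one reading of the quoted post.

```python
def triage(failed_wus):
    """Classify failures given (project, run, clone, gen) tuples, one per failed attempt."""
    distinct = set(failed_wus)  # repeated failures of the same WU collapse to one entry
    if not distinct:
        return "no failures"
    if len(distinct) == 1:
        return "single WU: check the bad-WU forum before reporting"
    return "multiple WUs: post a report in this thread"


if __name__ == "__main__":
    # Same WU failing three times in a row.
    print(triage([(5764, 2, 28, 235)] * 3))
    # Two different WUs failing.
    print(triage([(5766, 14, 238, 164), (5767, 10, 131, 94)]))
```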

twistedspark
 
Posts: 9
Joined: Thu Jan 08, 2009 4:02 am

Re: [Please read] NaNs detected on GPU - UNSTABLE_MACHINE error

Postby pokey » Wed Feb 04, 2009 4:02 am

Failing GPU projects:

353-point WUs (project 5765)
353-point WUs (project 5768)

Failing hardware: 9800GT

Failing OS: Windows Vista 32 bits

Failing drivers: Nvidia 9xxxx series v181.22

Comments: Changed PSU to 600W and still fails.
pokey
 
Posts: 11
Joined: Wed Feb 04, 2009 3:53 am

Re: [Please read] NaNs detected on GPU - UNSTABLE_MACHINE error

Postby subwoofer_fuhr » Thu Feb 05, 2009 4:26 pm

Failing projects
(has failed 100% of the work it's been assigned so far @ anywhere from 4% to 96%)
--other projects not noted--
----
5755 R12 c207 Gen67
5758 R12 c35 gen216
5764 R7 C70 Gen266
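
Reports in this thread write work-unit identifiers in several shorthand forms, which makes them awkward to compare. A small Python sketch that rewrites them into the form the client log uses might look like this. The function name `normalize_wu` and the regular expression are assumptions for illustration, and reading `R`/`c` as Run/Clone is an interpretation of the shorthand.

```python
import re

# Accepts "Run 12", "Run12" or "R12" (and likewise for Clone/C and Gen/G), in any case.
WU_RE = re.compile(
    r"(\d+)\W*(?:Run|R)\s*(\d+)\W*(?:Clone|C)\s*(\d+)\W*(?:Gen|G)\s*(\d+)",
    re.IGNORECASE,
)


def normalize_wu(text):
    """Return 'Project: P (Run R, Clone C, Gen G)' or None if no WU is recognised."""
    match = WU_RE.search(text)
    if match is None:
        return None
    project, run, clone, gen = (int(g) for g in match.groups())
    return f"Project: {project} (Run {run}, Clone {clone}, Gen {gen})"


if __name__ == "__main__":
    for raw in ("5755 R12 c207 Gen67", "5766 (Run2 Clone1 Gen115)"):
        print(normalize_wu(raw))
```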

Card: 8600GTS 512MB
Stock clocks, 40-45°C
Driver: 181.22
Windows XP Pro 32-bit
subwoofer_fuhr
 
Posts: 2
Joined: Thu Feb 05, 2009 4:15 pm

Re: [Please read] NaNs detected on GPU - UNSTABLE_MACHINE error

Postby skinnykid63 » Thu Feb 05, 2009 9:19 pm

The problem has actually gone away for me in the last few days. I haven't scrolled through my logs, but I haven't had a five-EUE timeout, and I've been seeing WUs that EUE'd before now work fine. I can't say they are the exact same WUs (i.e. same gen or run #), but they have the same point value (353) and are within the 5765-5772 range. I'm not sure if this was happening before, but these WUs seem to net me about 9000 ppd, whereas the 511-point ones seem to get 6000-7000 ppd.
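
As a rough check on those numbers: if PPD is taken to be points per WU times WUs finished per day (an assumption about how the figures were obtained), the implied time per WU follows directly. The function name `minutes_per_wu` is made up for this sketch, and 6500 is simply the midpoint of the quoted 6-7000 range.

```python
def minutes_per_wu(points_per_wu, ppd):
    """Minutes needed per WU, assuming ppd = points_per_wu * WUs completed per day."""
    return points_per_wu / ppd * 24 * 60


if __name__ == "__main__":
    print(round(minutes_per_wu(353, 9000), 1))  # 353-point WUs at about 9000 ppd
    print(round(minutes_per_wu(511, 6500), 1))  # 511-point WUs at about 6500 ppd
```

Under that assumption the 353-point WUs take a little under an hour each on this card and the 511-point WUs a little under two hours.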
skinnykid63
 
Posts: 23
Joined: Mon Jun 23, 2008 2:11 pm

Re: [Please read] NaNs detected on GPU - UNSTABLE_MACHINE error

Postby KroontjesPen » Fri Feb 06, 2009 9:06 am

Failing GPU project: 5755 (Run 14, Clone 138, Gen 65)
Failing hardware: 9600 GT
Failing OS: Windows Vista 64 bits
Failing drivers: 7.15.11.8120


Code:
[02:25:13] Completed 83%
[02:26:33] Run: exception thrown during GuardedRun
[02:26:33] Run: exception thrown in GuardedRun -- Gromacs cannot continue further.
[02:26:33] Going to send back what have done -- stepsTotalG=10000000
[02:26:33] Work fraction=0.8337 steps=10000000.
[02:26:38] logfile size=10421 infoLength=10421 edr=0 trr=23
[02:26:38] - Writing 10957 bytes of core data to disk...
[02:26:38] Done: 10445 -> 3870 (compressed to 37.0 percent)
[02:26:38]   ... Done.
[02:26:38]
[02:26:38] Folding@home Core Shutdown: UNSTABLE_MACHINE
[02:26:40] CoreStatus = 7A (122)
[02:26:40] Sending work to server
[02:26:40] Project: 5755 (Run 14, Clone 138, Gen 65)
[02:26:40] - Read packet limit of 540015616... Set to 524286976.
KroontjesPen
 
Posts: 51
Joined: Thu Oct 30, 2008 2:29 pm
Location: Spijkenisse, The Netherlands

Re: [Please read] NaNs detected on GPU - UNSTABLE_MACHINE error

Postby SonicWRX » Sat Feb 07, 2009 4:20 am

# Failing projects (please add a list of exact project numbers if you have them)

5766 (Run2 Clone1 Gen115)

# Failing hardware (please add the exact GPU designation if you know it. ie 9800GTX+)

* 8800GT, stock clock of 680; temperature never exceeds 52C

# Failing OS

* Windows Vista 32 bits


# Failing drivers

181.20
...
# Comments

This is the only project that fails for me. I get one unit to run, then it tries to do a 5766.
SonicWRX
 
Posts: 8
Joined: Fri Feb 22, 2008 9:02 pm

Re: [Please read] NaNs detected on GPU - UNSTABLE_MACHINE error

Postby T_Flight » Sat Feb 07, 2009 8:56 am

Since my last post I have made some pretty drastic changes to my system. I have added watercooling which has not only dropped my GPU Core Temps, but the Unisink that is optional with this DTek GFX2 block also dropped my VRM temps. I now have watercooling and a heavily OC'd GTX280 SSC @720/1512/1200 (Core, Shader, Memory). I also have changed drivers to 181.22 WHQL's. I have not had an EUE since my last post at all.

My temps range from 36-40C GPU and my VRM's are running in the 54-64 range at present.

I do know this. These cards LOVE to run cool. I have achieved stable clocks that were not even possible on stock air cooling. I will continue to monitor my client and if I see anything out of the ordinary will report.
T_Flight
 
Posts: 18
Joined: Tue Apr 29, 2008 4:34 am

Re: [Please read] NaNs detected on GPU - UNSTABLE_MACHINE error

Postby khurios2000 » Wed Feb 11, 2009 1:57 am

no more EUE reports?
khurios2000
 
Posts: 5
Joined: Sun Jun 22, 2008 5:18 pm

Re: [Please read] NaNs detected on GPU - UNSTABLE_MACHINE error

Postby Shebnay » Wed Feb 11, 2009 3:10 am

If these have been reported, forgive me :P

Failing projects - 5766 (Run 14, Clone 238, Gen 164),
5767 (Run 10, Clone 131, Gen 94)

Failing hardware - 8800 GTS (stock speeds)

Failing OS - Vista 32 bit

Failing driver - 181.20

GPU client chugs along great on many other WU's. No overclocking

Log file for 5766...
Code:
[00:42:16] + Processing work unit
[00:42:16] Core required: FahCore_11.exe
[00:42:16] Core found.
[00:42:16] Working on queue slot 09 [February 11 00:42:16 UTC]
[00:42:16] + Working ...
[00:42:16]
[00:42:16] *------------------------------*
[00:42:16] Folding@Home GPU Core - Beta
[00:42:16] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[00:42:16]
[00:42:16] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[00:42:16] Build host: amoeba
[00:42:16] Board Type: Nvidia
[00:42:16] Core      :
[00:42:16] Preparing to commence simulation
[00:42:16] - Looking at optimizations...
[00:42:16] - Created dyn
[00:42:16] - Files status OK
[00:42:16] - Expanded 46572 -> 252912 (decompressed 543.0 percent)
[00:42:16] Called DecompressByteArray: compressed_data_size=46572 data_size=252912, decompressed_data_size=252912 diff=0
[00:42:16] - Digital signature verified
[00:42:16]
[00:42:16] Project: 5766 (Run 14, Clone 238, Gen 164)
[00:42:16]
[00:42:16] Assembly optimizations on if available.
[00:42:16] Entering M.D.
[00:42:22] Working on Protein
[00:42:23] Client config found, loading data.
[00:42:23] mdrun_gpu returned
[00:42:23] NANs detected on GPU
[00:42:23]
[00:42:23] Folding@home Core Shutdown: UNSTABLE_MACHINE
[00:42:26] CoreStatus = 7A (122)
[00:42:26] Sending work to server
[00:42:26] Project: 5766 (Run 14, Clone 238, Gen 164)
[00:42:26] - Read packet limit of 540015616... Set to 524286976.
[00:42:26] - Error: Could not get length of results file work/wuresults_09.dat
[00:42:26] - Error: Could not read unit 09 file. Removing from queue.
[00:42:26] EUE limit exceeded. Pausing 24 hours.
Shebnay
 
Posts: 1
Joined: Wed Feb 11, 2009 2:23 am
