Project: 5765 (Run 14, Clone 287, Gen 5947) NANs detected

Moderators: Site Moderators, FAHC Science Team

art_l_j_PlanetAMD64
Posts: 472
Joined: Sun May 30, 2010 2:28 pm

Project: 5765 (Run 14, Clone 287, Gen 5947) NANs detected

Post by art_l_j_PlanetAMD64 »

Code:

01:34:24:WU00:FS00:0x11:Project: 5765 (Run 14, Clone 287, Gen 5947)
01:34:24:WU00:FS00:0x11:
01:34:24:WU00:FS00:0x11:Assembly optimizations on if available.
01:34:24:WU00:FS00:0x11:Entering M.D.
01:34:24:WU02:FS00:Upload complete
01:34:24:WU02:FS00:Server responded WORK_ACK (400)
01:34:24:WU02:FS00:Cleaning up
01:34:30:WU00:FS00:0x11:Tpr hash 00/wudata_01.tpr:  835009443 2768827429 709192641 3143635559 3988931968
01:34:30:WU00:FS00:0x11:
01:34:30:WU00:FS00:0x11:Calling fah_main args: 14 usage=100
01:34:30:WU00:FS00:0x11:
01:34:30:WU00:FS00:0x11:Working on Protein
01:34:31:WU00:FS00:0x11:Client config unavailable.
01:34:31:WU00:FS00:0x11:Starting GUI Server
01:35:03:WU00:FS00:0x11:Completed 1%
01:35:35:WU00:FS00:0x11:Completed 2%
01:36:07:WU00:FS00:0x11:Completed 3%
01:36:38:WU00:FS00:0x11:Completed 4%
01:37:10:WU00:FS00:0x11:Completed 5%
01:37:42:WU00:FS00:0x11:Completed 6%
01:38:14:WU00:FS00:0x11:Completed 7%
01:38:46:WU00:FS00:0x11:Completed 8%
01:39:18:WU00:FS00:0x11:Completed 9%
01:39:50:WU00:FS00:0x11:Completed 10%
01:40:22:WU00:FS00:0x11:Completed 11%
01:40:54:WU00:FS00:0x11:Completed 12%
01:41:26:WU00:FS00:0x11:Completed 13%
01:41:57:WU00:FS00:0x11:Completed 14%
01:42:29:WU00:FS00:0x11:Completed 15%
01:43:01:WU00:FS00:0x11:Completed 16%
01:43:33:WU00:FS00:0x11:Completed 17%
01:44:05:WU00:FS00:0x11:Completed 18%
01:44:37:WU00:FS00:0x11:Completed 19%
01:45:09:WU00:FS00:0x11:Completed 20%
01:45:41:WU00:FS00:0x11:Completed 21%
01:46:13:WU00:FS00:0x11:Completed 22%
01:46:44:WU00:FS00:0x11:Completed 23%
01:47:16:WU00:FS00:0x11:Completed 24%
01:47:48:WU00:FS00:0x11:Completed 25%
01:48:20:WU00:FS00:0x11:Completed 26%
01:48:52:WU00:FS00:0x11:Completed 27%
01:49:24:WU00:FS00:0x11:Completed 28%
01:49:56:WU00:FS00:0x11:Completed 29%
01:50:28:WU00:FS00:0x11:Completed 30%
01:51:00:WU00:FS00:0x11:Completed 31%
01:51:32:WU00:FS00:0x11:Completed 32%
01:52:03:WU00:FS00:0x11:Completed 33%
01:52:35:WU00:FS00:0x11:Completed 34%
01:53:07:WU00:FS00:0x11:Completed 35%
01:53:39:WU00:FS00:0x11:Completed 36%
01:54:11:WU00:FS00:0x11:Completed 37%
01:54:43:WU00:FS00:0x11:Completed 38%
01:55:15:WU00:FS00:0x11:Completed 39%
01:55:47:WU00:FS00:0x11:Completed 40%
01:56:19:WU00:FS00:0x11:Completed 41%
01:56:47:WU00:FS00:0x11:Completed 42%
01:56:47:WU00:FS00:0x11:mdrun_gpu returned 
01:56:47:WU00:FS00:0x11:NANs detected on GPU
01:56:47:WU00:FS00:0x11:
01:56:47:WU00:FS00:0x11:Folding@home Core Shutdown: UNSTABLE_MACHINE
The client startup log, with the system information and config:

Code:

*********************** Log Started 2013-01-16T15:18:02Z ***********************
15:18:02:************************* Folding@home Client *************************
15:18:02:      Website: http://folding.stanford.edu/
15:18:02:    Copyright: (c) 2009-2012 Stanford University
15:18:02:       Author: Joseph Coffland <joseph@cauldrondevelopment.com>
15:18:02:         Args: --lifeline 3220 --command-port=36330
15:18:02:       Config: C:/Documents and Settings/Art Johnson/Application
15:18:02:               Data/FAHClient/config.xml
15:18:02:******************************** Build ********************************
15:18:02:      Version: 7.2.9
15:18:02:         Date: Oct 3 2012
15:18:02:         Time: 18:05:48
15:18:02:      SVN Rev: 3578
15:18:02:       Branch: fah/trunk/client
15:18:02:     Compiler: Intel(R) C++ MSVC 1500 mode 1200
15:18:02:      Options: /TP /nologo /EHa /Qdiag-disable:4297,4103,1786,279 /Ox -arch:SSE
15:18:02:               /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qopenmp /Qrestrict /MT /Qmkl
15:18:02:     Platform: win32 XP
15:18:02:         Bits: 32
15:18:02:         Mode: Release
15:18:02:******************************* System ********************************
15:18:02:          CPU: AMD Athlon(tm) 64 X2 Dual Core Processor 4200+
15:18:02:       CPU ID: AuthenticAMD Family 15 Model 43 Stepping 1
15:18:02:         CPUs: 2
15:18:02:       Memory: 2.00GiB
15:18:02:  Free Memory: 1.49GiB
15:18:02:      Threads: WINDOWS_THREADS
15:18:02:   On Battery: false
15:18:02:   UTC offset: -8
15:18:02:          PID: 3812
15:18:02:          CWD: C:/Documents and Settings/Art Johnson/Application Data/FAHClient
15:18:02:           OS: Microsoft Windows XP Service Pack 3
15:18:02:      OS Arch: X86
15:18:02:         GPUs: 1
15:18:02:        GPU 0: NVIDIA:1 GT200b [GeForce GTX 275]
15:18:02:         CUDA: 1.3
15:18:02:  CUDA Driver: 5000
15:18:02:Win32 Service: false
15:18:02:***********************************************************************
15:18:02:<config>
15:18:02:  <!-- Folding Slot Configuration -->
15:18:02:  <gpu v='true'/>
15:18:02:
15:18:02:  <!-- Network -->
15:18:02:  <proxy v=':8080'/>
15:18:02:
15:18:02:  <!-- User Information -->
15:18:02:  <passkey v='********************************'/>
15:18:02:  <team v='45862'/>
15:18:02:  <user v='art_l_j_PlanetAMD64'/>
15:18:02:
15:18:02:  <!-- Work Unit Control -->
15:18:02:  <next-unit-percentage v='100'/>
15:18:02:
15:18:02:  <!-- Folding Slots -->
15:18:02:  <slot id='0' type='GPU'/>
15:18:02:  <slot id='1' type='SMP'/>
15:18:02:</config>
15:18:02:Trying to access database...
15:18:02:Successfully acquired database lock
15:18:02:Enabled folding slot 00: READY gpu:0:"GT200b [GeForce GTX 275]"
15:18:02:Enabled folding slot 01: READY smp:2
15:18:02:WU01:FS01:Starting
The GPU is a BFG GTX 275 OCX, factory-overclocked to 709 MHz (vs. 633 MHz standard). EVGA Precision X shows the core clock at 710 MHz, the GPU temperature at 60 C, and the fan speed at 86%. This system is on a UPS, so power bumps or line noise should not be a factor.

After a restart, this GPU successfully completed these WUs, including the one that had failed (Project 5765, Run 14, Clone 287, Gen 5947):

Code:

03:11:13:WU00:FS00:0x11:Project: 5765 (Run 14, Clone 287, Gen 5947)
03:11:13:WU00:FS00:0x11:
03:11:13:WU00:FS00:0x11:Assembly optimizations on if available.
03:11:14:WU00:FS00:0x11:Entering M.D.
03:11:54:WU00:FS00:0x11:Completed 1%
03:12:27:WU00:FS00:0x11:Completed 2%
03:13:00:WU00:FS00:0x11:Completed 3%
...
04:03:41:WU00:FS00:0x11:Completed 98%
04:04:13:WU00:FS00:0x11:Completed 99%
04:04:45:WU00:FS00:0x11:Completed 100%
04:04:45:WU00:FS00:0x11:Successful run
04:04:45:WU00:FS00:0x11:DynamicWrapper: Finished Work Unit: sleep=10000
04:04:59:WU00:FS00:0x11:Folding@home Core Shutdown: FINISHED_UNIT
04:04:59:WU00:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
04:05:00:WU00:FS00:Upload complete
04:05:00:WU00:FS00:Server responded WORK_ACK (400)
...
04:05:00:WU02:FS00:0x11:Project: 5769 (Run 2, Clone 263, Gen 2979)
04:05:00:WU02:FS00:0x11:
04:05:00:WU02:FS00:0x11:Assembly optimizations on if available.
04:05:00:WU02:FS00:0x11:Entering M.D.
04:05:07:WU02:FS00:0x11:Starting GUI Server
04:05:39:WU02:FS00:0x11:Completed 1%
04:06:11:WU02:FS00:0x11:Completed 2%
04:06:43:WU02:FS00:0x11:Completed 3%
...
04:56:45:WU02:FS00:0x11:Completed 97%
04:57:17:WU02:FS00:0x11:Completed 98%
04:57:49:WU02:FS00:0x11:Completed 99%
04:58:21:WU02:FS00:0x11:Completed 100%
04:58:21:WU02:FS00:0x11:Successful run
04:58:21:WU02:FS00:0x11:DynamicWrapper: Finished Work Unit: sleep=10000
04:58:35:WU02:FS00:0x11:Folding@home Core Shutdown: FINISHED_UNIT
04:58:35:WU02:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
04:58:36:WU02:FS00:Upload complete
04:58:36:WU02:FS00:Server responded WORK_ACK (400)
I will keep a close watch on this GPU to see whether any more events like this happen.
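One way to keep that watch is to scan the client logs for GPU-slot shutdowns that did not finish cleanly. The following is only a minimal sketch: the function name `gpu_failures` is hypothetical, and it assumes the GPU slot is FS00 (slot id 0 in the config above) and that the log lines have the same format as the ones posted here.

```shell
# Hypothetical helper: print every GPU-slot (FS00) core shutdown line
# that is NOT a clean FINISHED_UNIT, from the given FAHClient log files.
gpu_failures() {
    # -H always prefixes the file name, even when only one file is given.
    grep -H 'Core Shutdown:' "$@" \
        | grep ':FS00:' \
        | grep -v 'FINISHED_UNIT'
}
```

For example, `gpu_failures log.txt logs/*.txt` would list any UNSTABLE_MACHINE or CLIENT_DIED shutdowns on the GPU slot; the paths here are placeholders for wherever the logs are kept.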
art_l_j_PlanetAMD64
Over 1.04 Billion Total Points
Over 185,000 Work Units
Over 3,800,000 PPD
Overall rank (if points are combined) 20 of 1721690
In memory of my Mother May 12th 1923 - February 10th 2012
art_l_j_PlanetAMD64

Re: Project: 5765 (14, 287, 5947) NANs detected

Post by art_l_j_PlanetAMD64 »

Here is some more information from that system ('office2'). I copied all of the log files (including the current one, 'log.txt', on the WinXP 'office2' system) to one of my Debian Linux systems and ran these commands on them:

Code:

art@art-Debian-02:~/Documents/logs-office2$ fc -l
117	 grep 'Shutdown:' *.txt | grep -v FINISHED_UNIT >office2_err.txt
118	 grep 'Shutdown:' *.txt >office2_log.txt
119	 ls -altr
120	 wc -l office2*.txt
art@art-Debian-02:~/Documents/logs-office2$ wc -l office2*.txt
   27 office2_err.txt
  908 office2_log.txt
  935 total
art@art-Debian-02:~/Documents/logs-office2$
The contents of the file 'office2_err.txt':

Code:

log-20121214-185651.txt:04:06:09:WU01:FS01:0xa4:Folding@home Core Shutdown: CLIENT_DIED
log-20121214-185651.txt:04:06:10:WU00:FS00:0x11:Folding@home Core Shutdown: CLIENT_DIED
log-20121214-185651.txt:18:43:45:WU01:FS00:0x11:Folding@home Core Shutdown: CLIENT_DIED
log-20121214-185651.txt:18:43:46:WU00:FS01:0xa4:Folding@home Core Shutdown: CLIENT_DIED
log-20121215-183853.txt:18:35:35:WU00:FS00:0x11:Folding@home Core Shutdown: CLIENT_DIED
log-20121215-183853.txt:18:36:30:WU01:FS01:0xa4:Folding@home Core Shutdown: CLIENT_DIED
log-20121218-033913.txt:03:26:35:WU00:FS01:0xa4:Folding@home Core Shutdown: CLIENT_DIED
log-20121218-033913.txt:03:26:38:WU02:FS00:0x11:Folding@home Core Shutdown: CLIENT_DIED
log-20121219-031139.txt:03:05:30:WU00:FS01:0xa4:Folding@home Core Shutdown: CLIENT_DIED
log-20121221-025433.txt:02:50:23:WU00:FS01:0xa4:Folding@home Core Shutdown: CLIENT_DIED
log-20121222-034533.txt:11:26:50:WU00:FS01:0xa4:Folding@home Core Shutdown: CLIENT_DIED
log-20121222-034533.txt:03:44:35:WU02:FS01:0xa4:Folding@home Core Shutdown: CLIENT_DIED
log-20130108-234638.txt:21:38:40:WU01:FS00:0x11:Folding@home Core Shutdown: UNSTABLE_MACHINE
log-20130108-234638.txt:23:27:40:WU02:FS00:0x11:Folding@home Core Shutdown: CLIENT_DIED
log-20130108-234638.txt:23:27:48:WU00:FS01:0xa3:Folding@home Core Shutdown: CLIENT_DIED
log-20130114-234500.txt:23:37:07:WU00:FS01:0xa4:Folding@home Core Shutdown: CLIENT_DIED
log-20130114-235016.txt:23:47:36:WU00:FS01:0xa4:Folding@home Core Shutdown: CLIENT_DIED
log-20130115-025356.txt:02:50:45:WU00:FS01:0xa4:Folding@home Core Shutdown: CLIENT_DIED
log-20130115-025356.txt:02:50:52:WU02:FS00:0x11:Folding@home Core Shutdown: CLIENT_DIED
log-20130115-043555.txt:04:01:33:WU00:FS01:0xa4:Folding@home Core Shutdown: CLIENT_DIED
log-20130116-140019.txt:13:23:18:WU01:FS01:0xa4:Folding@home Core Shutdown: CLIENT_DIED
log-20130116-143035.txt:14:19:57:WU00:FS00:0x11:Folding@home Core Shutdown: CLIENT_DIED
log-20130116-151802.txt:15:04:59:WU00:FS00:0x11:Folding@home Core Shutdown: CLIENT_DIED
log-20130116-153744.txt:15:23:46:WU00:FS00:0x11:Folding@home Core Shutdown: CLIENT_DIED
log-20130117-031111.txt:01:56:47:WU00:FS00:0x11:Folding@home Core Shutdown: UNSTABLE_MACHINE
log-20130117-230000.txt:05:08:55:WU01:FS01:0xa4:Folding@home Core Shutdown: CLIENT_DIED
log-20130117-230000.txt:05:08:56:WU00:FS00:0x11:Folding@home Core Shutdown: CLIENT_DIED
So out of all 908 'Shutdown:' lines, 881 (908 - 27) are 'FINISHED_UNIT'. Of the remaining 27, 2 are 'UNSTABLE_MACHINE' and 25 are 'CLIENT_DIED'. Most of the completed WUs are from the GPU slot, since its WUs take about 1 hour to complete, while the smp:2 WUs take several days.
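The same breakdown can be produced in one pass instead of counting by hand. This is a rough sketch under a few assumptions: the function name `tally_shutdowns` is hypothetical, the logs use the 'Shutdown: REASON' line format shown above, and the `tr` step is there only in case the files copied from Windows carry CRLF line endings.

```shell
# Hypothetical helper: count 'Shutdown:' lines by reason across the given logs.
tally_shutdowns() {
    # -h suppresses file names so identical reasons group together.
    grep -h 'Shutdown:' "$@" \
        | tr -d '\r' \
        | sed 's/.*Shutdown: *//' \
        | sort \
        | uniq -c
}
```

It would be run as `tally_shutdowns log*.txt`, pointing it only at the original log files; the derived files 'office2_err.txt' and 'office2_log.txt' also match '*.txt' and would inflate the counts if included.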