Bad work for Maxwell cards? (solved)

If you think it might be a driver problem, see viewforum.php?f=79

Gary480six
Posts: 91
Joined: Mon Jan 21, 2008 6:42 pm

Bad work for Maxwell cards? (solved)

Post by Gary480six »

I have four different systems using GTX 750 family video cards for GPU Folding. All run the version 7 client on Windows 7.
ALL have failed in the last 24 hours. In my mind there is a 0% chance that four different video cards could exhibit hardware failures at the same time, especially while GTX 560 and GTX 650 cards in other systems are not having issues.

After perhaps a dozen BAD WORK units, the version 7 client just marks that slot as FAILED and shuts off Folding.

I suspect that Stanford has once again tweaked the assignment or work server - with BAD results.


*********************** Log Started 2015-02-14T21:10:39Z ***********************
21:10:39:************************* Folding@home Client *************************
21:10:39:      Website: http://folding.stanford.edu/
21:10:39:    Copyright: (c) 2009-2014 Stanford University
21:10:39:       Author: Joseph Coffland <joseph@cauldrondevelopment.com>
21:10:39:         Args: 
21:10:39:       Config: C:/Users/Dells Bells/AppData/Roaming/FAHClient/config.xml
21:10:39:******************************** Build ********************************
21:10:39:      Version: 7.4.4
21:10:39:         Date: Mar 4 2014
21:10:39:         Time: 20:26:54
21:10:39:      SVN Rev: 4130
21:10:39:       Branch: fah/trunk/client
21:10:39:     Compiler: Intel(R) C++ MSVC 1500 mode 1200
21:10:39:      Options: /TP /nologo /EHa /Qdiag-disable:4297,4103,1786,279 /Ox -arch:SSE
21:10:39:               /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qopenmp /Qrestrict /MT /Qmkl
21:10:39:     Platform: win32 XP
21:10:39:         Bits: 32
21:10:39:         Mode: Release
21:10:39:******************************* System ********************************
21:10:39:          CPU: Intel(R) Core(TM) i7 CPU 860 @ 2.80GHz
21:10:39:       CPU ID: GenuineIntel Family 6 Model 30 Stepping 5
21:10:39:         CPUs: 8
21:10:39:       Memory: 3.99GiB
21:10:39:  Free Memory: 2.82GiB
21:10:39:      Threads: WINDOWS_THREADS
21:10:39:   OS Version: 6.1
21:10:39:  Has Battery: false
21:10:39:   On Battery: false
21:10:39:   UTC Offset: -5
21:10:39:          PID: 2752
21:10:39:          CWD: C:/Users/Dells Bells/AppData/Roaming/FAHClient
21:10:39:           OS: Windows 7 Professional
21:10:39:      OS Arch: AMD64
21:10:39:         GPUs: 1
21:10:39:        GPU 0: NVIDIA:4 GM107 [GeForce GTX 750 Ti]
21:10:39:         CUDA: 5.0
21:10:39:  CUDA Driver: 6050
21:10:39:Win32 Service: false
21:10:39:***********************************************************************
21:10:39:<config>
21:10:39:  <!-- Network -->
21:10:39:  <proxy v=':8080'/>
21:10:39:
21:10:39:  <!-- Slot Control -->
21:10:39:  <power v='full'/>
21:10:39:
21:10:39:  <!-- User Information -->
21:10:39:  <passkey v='********************************'/>
21:10:39:  <team v='11108'/>
21:10:39:  <user v='Gary480six'/>
21:10:39:
21:10:39:  <!-- Folding Slots -->
21:10:39:  <slot id='1' type='GPU'>
21:10:39:    <paused v='true'/>
21:10:39:  </slot>
21:10:39:</config>
21:10:39:Trying to access database...

Above is the machine data. Below is the log from just before the version 7 client marked the GPU slot as FAILED.


12:21:17:WU00:FS01:Assigned to work server 140.163.4.231
12:21:17:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:GM107 [GeForce GTX 750 Ti] from 140.163.4.231
12:21:17:WU00:FS01:Connecting to 140.163.4.231:8080
12:21:17:WU00:FS01:Downloading 4.84MiB
12:21:20:WU00:FS01:Download complete
12:21:20:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:13000 run:466 clone:0 gen:152 core:0x17 unit:0x00000104538b3db753101f5e2f35e90c
12:21:20:WU00:FS01:Starting
12:21:20:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" "C:/Users/Dells Bells/AppData/Roaming/FAHClient/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17.exe" -dir 00 -suffix 01 -version 704 -lifeline 2752 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
12:21:20:WU00:FS01:Started FahCore on PID 912
12:21:20:WU00:FS01:Core PID:3064
12:21:20:WU00:FS01:FahCore 0x17 started
12:21:20:WU00:FS01:0x17:*********************** Log Started 2015-02-26T12:21:20Z ***********************
12:21:20:WU00:FS01:0x17:Project: 13000 (Run 466, Clone 0, Gen 152)
12:21:20:WU00:FS01:0x17:Unit: 0x00000104538b3db753101f5e2f35e90c
12:21:20:WU00:FS01:0x17:CPU: 0x00000000000000000000000000000000
12:21:20:WU00:FS01:0x17:Machine: 1
12:21:20:WU00:FS01:0x17:Reading tar file state.xml
12:21:21:WU00:FS01:0x17:Reading tar file system.xml
12:21:22:WU00:FS01:0x17:Reading tar file integrator.xml
12:21:22:WU00:FS01:0x17:Reading tar file core.xml
12:21:22:WU00:FS01:0x17:Digital signatures verified
12:21:22:WU00:FS01:0x17:Folding@home GPU core17
12:21:22:WU00:FS01:0x17:Version 0.0.52
12:24:46:WU00:FS01:0x17:ERROR:exception: Force RMSE error of 449.706 with threshold of 5
12:24:46:WU00:FS01:0x17:Saving result file logfile_01.txt
12:24:46:WU00:FS01:0x17:Saving result file log.txt
12:24:46:WU00:FS01:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
12:24:46:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
12:24:46:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:13000 run:466 clone:0 gen:152 core:0x17 unit:0x00000104538b3db753101f5e2f35e90c
12:24:46:WU00:FS01:Uploading 2.26KiB to 140.163.4.231
12:24:46:WU00:FS01:Connecting to 140.163.4.231:8080
12:24:47:WU01:FS01:Connecting to 171.67.108.200:80
12:24:47:WU00:FS01:Upload complete
12:24:47:WU00:FS01:Server responded WORK_ACK (400)
12:24:47:WU00:FS01:Cleaning up
12:24:48:WU01:FS01:Assigned to work server 140.163.4.231
12:24:48:WU01:FS01:Requesting new work unit for slot 01: READY gpu:0:GM107 [GeForce GTX 750 Ti] from 140.163.4.231
12:24:48:WU01:FS01:Connecting to 140.163.4.231:8080
12:24:48:WU01:FS01:Downloading 4.83MiB
12:24:50:WU01:FS01:Download complete
12:24:50:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:13000 run:463 clone:2 gen:45 core:0x17 unit:0x0000004f538b3db753101e953ce58f70
12:24:51:WU01:FS01:Starting
12:24:51:WU01:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" "C:/Users/Dells Bells/AppData/Roaming/FAHClient/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17.exe" -dir 01 -suffix 01 -version 704 -lifeline 2752 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
12:24:51:WU01:FS01:Started FahCore on PID 2128
12:24:51:WU01:FS01:Core PID:3056
12:24:51:WU01:FS01:FahCore 0x17 started
12:24:51:WU01:FS01:0x17:*********************** Log Started 2015-02-26T12:24:51Z ***********************
12:24:51:WU01:FS01:0x17:Project: 13000 (Run 463, Clone 2, Gen 45)
12:24:51:WU01:FS01:0x17:Unit: 0x0000004f538b3db753101e953ce58f70
12:24:51:WU01:FS01:0x17:CPU: 0x00000000000000000000000000000000
12:24:51:WU01:FS01:0x17:Machine: 1
12:24:51:WU01:FS01:0x17:Reading tar file state.xml
12:24:52:WU01:FS01:0x17:Reading tar file system.xml
12:24:53:WU01:FS01:0x17:Reading tar file integrator.xml
12:24:53:WU01:FS01:0x17:Reading tar file core.xml
12:24:53:WU01:FS01:0x17:Digital signatures verified
12:24:53:WU01:FS01:0x17:Folding@home GPU core17
12:24:53:WU01:FS01:0x17:Version 0.0.52
12:28:16:WU01:FS01:0x17:ERROR:exception: Force RMSE error of 454.726 with threshold of 5
12:28:16:WU01:FS01:0x17:Saving result file logfile_01.txt
12:28:16:WU01:FS01:0x17:Saving result file log.txt
12:28:16:WU01:FS01:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
12:28:17:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
12:28:17:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13000 run:463 clone:2 gen:45 core:0x17 unit:0x0000004f538b3db753101e953ce58f70
12:28:17:WU01:FS01:Uploading 2.24KiB to 140.163.4.231
12:28:17:WU01:FS01:Connecting to 140.163.4.231:8080
12:28:17:WU01:FS01:Upload complete
12:28:17:WU01:FS01:Server responded WORK_ACK (400)
12:28:17:WU01:FS01:Cleaning up
12:28:17:WU00:FS01:Connecting to 171.67.108.200:80
12:28:18:WU00:FS01:Assigned to work server 140.163.4.231
12:28:18:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:GM107 [GeForce GTX 750 Ti] from 140.163.4.231
12:28:18:WU00:FS01:Connecting to 140.163.4.231:8080
12:28:18:WU00:FS01:Downloading 4.84MiB
12:28:20:WU00:FS01:Download complete
12:28:20:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:13001 run:289 clone:0 gen:93 core:0x17 unit:0x000000aa538b3db75328a1989a078b20
12:28:20:WU00:FS01:Starting
12:28:20:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" "C:/Users/Dells Bells/AppData/Roaming/FAHClient/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17.exe" -dir 00 -suffix 01 -version 704 -lifeline 2752 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
12:28:20:WU00:FS01:Started FahCore on PID 2836
12:28:20:WU00:FS01:Core PID:1844
12:28:20:WU00:FS01:FahCore 0x17 started
12:28:21:WU00:FS01:0x17:*********************** Log Started 2015-02-26T12:28:21Z ***********************
12:28:21:WU00:FS01:0x17:Project: 13001 (Run 289, Clone 0, Gen 93)
12:28:21:WU00:FS01:0x17:Unit: 0x000000aa538b3db75328a1989a078b20
12:28:21:WU00:FS01:0x17:CPU: 0x00000000000000000000000000000000
12:28:21:WU00:FS01:0x17:Machine: 1
12:28:21:WU00:FS01:0x17:Reading tar file state.xml
12:28:22:WU00:FS01:0x17:Reading tar file system.xml
12:28:23:WU00:FS01:0x17:Reading tar file integrator.xml
12:28:23:WU00:FS01:0x17:Reading tar file core.xml
12:28:23:WU00:FS01:0x17:Digital signatures verified
12:28:23:WU00:FS01:0x17:Folding@home GPU core17
12:28:23:WU00:FS01:0x17:Version 0.0.52
12:31:46:WU00:FS01:0x17:ERROR:exception: Force RMSE error of 450.572 with threshold of 5
12:31:46:WU00:FS01:0x17:Saving result file logfile_01.txt
12:31:46:WU00:FS01:0x17:Saving result file log.txt
12:31:46:WU00:FS01:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
12:31:47:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
12:31:47:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:13001 run:289 clone:0 gen:93 core:0x17 unit:0x000000aa538b3db75328a1989a078b20
12:31:47:WU00:FS01:Uploading 2.25KiB to 140.163.4.231
12:31:47:WU00:FS01:Connecting to 140.163.4.231:8080
12:31:47:WU01:FS01:Connecting to 171.67.108.200:80
12:31:48:WU01:FS01:Assigned to work server 140.163.4.231
12:31:48:WU01:FS01:Requesting new work unit for slot 01: READY gpu:0:GM107 [GeForce GTX 750 Ti] from 140.163.4.231
12:31:48:WU01:FS01:Connecting to 140.163.4.231:8080
12:31:49:WU00:FS01:Upload complete
12:31:49:WU00:FS01:Server responded WORK_ACK (400)
12:31:49:WU00:FS01:Cleaning up
12:31:50:WU01:FS01:Downloading 4.83MiB
12:31:52:WU01:FS01:Download complete
12:31:52:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:13001 run:77 clone:1 gen:123 core:0x17 unit:0x000000e0538b3db75328658362add919
12:31:52:WU01:FS01:Starting
12:31:52:WU01:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" "C:/Users/Dells Bells/AppData/Roaming/FAHClient/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17.exe" -dir 01 -suffix 01 -version 704 -lifeline 2752 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
12:31:52:WU01:FS01:Started FahCore on PID 2700
12:31:52:WU01:FS01:Core PID:1076
12:31:52:WU01:FS01:FahCore 0x17 started
12:31:53:WU01:FS01:0x17:*********************** Log Started 2015-02-26T12:31:52Z ***********************
12:31:53:WU01:FS01:0x17:Project: 13001 (Run 77, Clone 1, Gen 123)
12:31:53:WU01:FS01:0x17:Unit: 0x000000e0538b3db75328658362add919
12:31:53:WU01:FS01:0x17:CPU: 0x00000000000000000000000000000000
12:31:53:WU01:FS01:0x17:Machine: 1
12:31:53:WU01:FS01:0x17:Reading tar file state.xml
12:31:54:WU01:FS01:0x17:Reading tar file system.xml
12:31:55:WU01:FS01:0x17:Reading tar file integrator.xml
12:31:55:WU01:FS01:0x17:Reading tar file core.xml
12:31:55:WU01:FS01:0x17:Digital signatures verified
12:31:55:WU01:FS01:0x17:Folding@home GPU core17
12:31:55:WU01:FS01:0x17:Version 0.0.52
12:35:18:WU01:FS01:0x17:ERROR:exception: Force RMSE error of 451.976 with threshold of 5
12:35:18:WU01:FS01:0x17:Saving result file logfile_01.txt
12:35:18:WU01:FS01:0x17:Saving result file log.txt
12:35:18:WU01:FS01:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
12:35:18:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
12:35:18:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13001 run:77 clone:1 gen:123 core:0x17 unit:0x000000e0538b3db75328658362add919
12:35:18:WU01:FS01:Uploading 2.25KiB to 140.163.4.231
12:35:18:WU01:FS01:Connecting to 140.163.4.231:8080
12:35:18:WU00:FS01:Connecting to 171.67.108.200:80
12:35:19:WU01:FS01:Upload complete
12:35:19:WU01:FS01:Server responded WORK_ACK (400)
12:35:19:WU01:FS01:Cleaning up
12:35:19:WU00:FS01:Assigned to work server 140.163.4.231
12:35:19:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:GM107 [GeForce GTX 750 Ti] from 140.163.4.231
12:35:19:WU00:FS01:Connecting to 140.163.4.231:8080
12:35:20:WU00:FS01:Downloading 4.84MiB
12:35:22:WU00:FS01:Download complete
12:35:22:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:13001 run:102 clone:4 gen:129 core:0x17 unit:0x000000d6538b3db753286ca35fa9c662
12:35:22:WU00:FS01:Starting
12:35:22:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" "C:/Users/Dells Bells/AppData/Roaming/FAHClient/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17.exe" -dir 00 -suffix 01 -version 704 -lifeline 2752 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
12:35:22:WU00:FS01:Started FahCore on PID 2548
12:35:22:WU00:FS01:Core PID:2204
12:35:22:WU00:FS01:FahCore 0x17 started
12:35:23:WU00:FS01:0x17:*********************** Log Started 2015-02-26T12:35:22Z ***********************
12:35:23:WU00:FS01:0x17:Project: 13001 (Run 102, Clone 4, Gen 129)
12:35:23:WU00:FS01:0x17:Unit: 0x000000d6538b3db753286ca35fa9c662
12:35:23:WU00:FS01:0x17:CPU: 0x00000000000000000000000000000000
12:35:23:WU00:FS01:0x17:Machine: 1
12:35:23:WU00:FS01:0x17:Reading tar file state.xml
12:35:24:WU00:FS01:0x17:Reading tar file system.xml
12:35:25:WU00:FS01:0x17:Reading tar file integrator.xml
12:35:25:WU00:FS01:0x17:Reading tar file core.xml
12:35:25:WU00:FS01:0x17:Digital signatures verified
12:35:25:WU00:FS01:0x17:Folding@home GPU core17
12:35:25:WU00:FS01:0x17:Version 0.0.52
12:38:48:WU00:FS01:0x17:ERROR:exception: Force RMSE error of 449.688 with threshold of 5
12:38:48:WU00:FS01:0x17:Saving result file logfile_01.txt
12:38:48:WU00:FS01:0x17:Saving result file log.txt
12:38:48:WU00:FS01:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
12:38:48:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
12:38:48:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:13001 run:102 clone:4 gen:129 core:0x17 unit:0x000000d6538b3db753286ca35fa9c662
12:38:48:WU00:FS01:Uploading 2.26KiB to 140.163.4.231
12:38:48:WU00:FS01:Connecting to 140.163.4.231:8080
12:38:49:WU01:FS01:Connecting to 171.67.108.200:80
12:38:49:WU01:FS01:Assigned to work server 140.163.4.231
12:38:49:WU01:FS01:Requesting new work unit for slot 01: READY gpu:0:GM107 [GeForce GTX 750 Ti] from 140.163.4.231
12:38:49:WU01:FS01:Connecting to 140.163.4.231:8080
12:38:50:WU00:FS01:Upload complete
12:38:50:WU00:FS01:Server responded WORK_ACK (400)
12:38:50:WU00:FS01:Cleaning up
12:38:51:WU01:FS01:Downloading 4.84MiB
12:38:53:WU01:FS01:Download complete
12:38:53:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:13001 run:330 clone:2 gen:44 core:0x17 unit:0x00000055538b3db75328ad44db025d48
12:38:53:WU01:FS01:Starting
12:38:53:WU01:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" "C:/Users/Dells Bells/AppData/Roaming/FAHClient/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17.exe" -dir 01 -suffix 01 -version 704 -lifeline 2752 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
12:38:53:WU01:FS01:Started FahCore on PID 708
12:38:53:WU01:FS01:Core PID:1872
12:38:53:WU01:FS01:FahCore 0x17 started
12:38:53:WU01:FS01:0x17:*********************** Log Started 2015-02-26T12:38:53Z ***********************
12:38:53:WU01:FS01:0x17:Project: 13001 (Run 330, Clone 2, Gen 44)
12:38:53:WU01:FS01:0x17:Unit: 0x00000055538b3db75328ad44db025d48
12:38:53:WU01:FS01:0x17:CPU: 0x00000000000000000000000000000000
12:38:53:WU01:FS01:0x17:Machine: 1
12:38:53:WU01:FS01:0x17:Reading tar file state.xml
12:38:54:WU01:FS01:0x17:Reading tar file system.xml
12:38:55:WU01:FS01:0x17:Reading tar file integrator.xml
12:38:55:WU01:FS01:0x17:Reading tar file core.xml
12:38:55:WU01:FS01:0x17:Digital signatures verified
12:38:55:WU01:FS01:0x17:Folding@home GPU core17
12:38:55:WU01:FS01:0x17:Version 0.0.52
12:42:19:WU01:FS01:0x17:ERROR:exception: Force RMSE error of 456.48 with threshold of 5
12:42:19:WU01:FS01:0x17:Saving result file logfile_01.txt
12:42:19:WU01:FS01:0x17:Saving result file log.txt
12:42:19:WU01:FS01:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
12:42:19:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
12:42:19:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13001 run:330 clone:2 gen:44 core:0x17 unit:0x00000055538b3db75328ad44db025d48
12:42:19:WU01:FS01:Uploading 2.24KiB to 140.163.4.231
12:42:19:WU01:FS01:Connecting to 140.163.4.231:8080
12:42:20:WU01:FS01:Upload complete
12:42:20:WU01:FS01:Server responded WORK_ACK (400)
12:42:20:WU01:FS01:Cleaning up
12:42:20:WU00:FS01:Connecting to 171.67.108.200:80
12:42:20:WU00:FS01:Assigned to work server 140.163.4.231
12:42:20:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:GM107 [GeForce GTX 750 Ti] from 140.163.4.231
12:42:20:WU00:FS01:Connecting to 140.163.4.231:8080
12:42:20:WU00:FS01:Downloading 4.83MiB
12:42:23:WU00:FS01:Download complete
12:42:23:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:13000 run:233 clone:0 gen:126 core:0x17 unit:0x000000d1538b3db7530fdda459e147a3
12:42:23:WU00:FS01:Starting
12:42:23:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" "C:/Users/Dells Bells/AppData/Roaming/FAHClient/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17.exe" -dir 00 -suffix 01 -version 704 -lifeline 2752 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
12:42:23:WU00:FS01:Started FahCore on PID 956
12:42:23:WU00:FS01:Core PID:1116
12:42:23:WU00:FS01:FahCore 0x17 started
12:42:23:WU00:FS01:0x17:*********************** Log Started 2015-02-26T12:42:23Z ***********************
12:42:23:WU00:FS01:0x17:Project: 13000 (Run 233, Clone 0, Gen 126)
12:42:23:WU00:FS01:0x17:Unit: 0x000000d1538b3db7530fdda459e147a3
12:42:23:WU00:FS01:0x17:CPU: 0x00000000000000000000000000000000
12:42:23:WU00:FS01:0x17:Machine: 1
12:42:23:WU00:FS01:0x17:Reading tar file state.xml
12:42:24:WU00:FS01:0x17:Reading tar file system.xml
12:42:25:WU00:FS01:0x17:Reading tar file integrator.xml
12:42:25:WU00:FS01:0x17:Reading tar file core.xml
12:42:25:WU00:FS01:0x17:Digital signatures verified
12:42:25:WU00:FS01:0x17:Folding@home GPU core17
12:42:25:WU00:FS01:0x17:Version 0.0.52
12:45:49:WU00:FS01:0x17:ERROR:exception: Force RMSE error of 451.446 with threshold of 5
12:45:49:WU00:FS01:0x17:Saving result file logfile_01.txt
12:45:49:WU00:FS01:0x17:Saving result file log.txt
12:45:49:WU00:FS01:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
12:45:49:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
12:45:49:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:13000 run:233 clone:0 gen:126 core:0x17 unit:0x000000d1538b3db7530fdda459e147a3
12:45:49:WU00:FS01:Uploading 2.25KiB to 140.163.4.231
12:45:49:WU00:FS01:Connecting to 140.163.4.231:8080
12:45:49:WU00:FS01:Upload complete
12:45:49:WU00:FS01:Server responded WORK_ACK (400)
12:45:49:WU00:FS01:Cleaning up
12:45:50:WU01:FS01:Connecting to 171.67.108.200:80
12:45:50:WU01:FS01:Assigned to work server 140.163.4.231
12:45:50:WU01:FS01:Requesting new work unit for slot 01: READY gpu:0:GM107 [GeForce GTX 750 Ti] from 140.163.4.231
12:45:50:WU01:FS01:Connecting to 140.163.4.231:8080
12:45:50:WU01:FS01:Downloading 4.83MiB
12:45:53:WU01:FS01:Download complete
12:45:53:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:13001 run:480 clone:1 gen:130 core:0x17 unit:0x000000d3538b3db7532c8426bfe0e044
12:45:53:WU01:FS01:Starting
12:45:53:WU01:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" "C:/Users/Dells Bells/AppData/Roaming/FAHClient/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17.exe" -dir 01 -suffix 01 -version 704 -lifeline 2752 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
12:45:53:WU01:FS01:Started FahCore on PID 2420
12:45:53:WU01:FS01:Core PID:636
12:45:53:WU01:FS01:FahCore 0x17 started
12:45:53:WU01:FS01:0x17:*********************** Log Started 2015-02-26T12:45:53Z ***********************
12:45:53:WU01:FS01:0x17:Project: 13001 (Run 480, Clone 1, Gen 130)
12:45:53:WU01:FS01:0x17:Unit: 0x000000d3538b3db7532c8426bfe0e044
12:45:53:WU01:FS01:0x17:CPU: 0x00000000000000000000000000000000
12:45:53:WU01:FS01:0x17:Machine: 1
12:45:53:WU01:FS01:0x17:Reading tar file state.xml
12:45:54:WU01:FS01:0x17:Reading tar file system.xml
12:45:55:WU01:FS01:0x17:Reading tar file integrator.xml
12:45:55:WU01:FS01:0x17:Reading tar file core.xml
12:45:55:WU01:FS01:0x17:Digital signatures verified
12:45:55:WU01:FS01:0x17:Folding@home GPU core17
12:45:55:WU01:FS01:0x17:Version 0.0.52
12:49:19:WU01:FS01:0x17:ERROR:exception: Force RMSE error of 450.859 with threshold of 5
12:49:19:WU01:FS01:0x17:Saving result file logfile_01.txt
12:49:19:WU01:FS01:0x17:Saving result file log.txt
12:49:19:WU01:FS01:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
12:49:19:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
12:49:19:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13001 run:480 clone:1 gen:130 core:0x17 unit:0x000000d3538b3db7532c8426bfe0e044
12:49:19:WU01:FS01:Uploading 2.24KiB to 140.163.4.231
12:49:19:WU01:FS01:Connecting to 140.163.4.231:8080
12:49:19:WU01:FS01:Upload complete
12:49:19:WU01:FS01:Server responded WORK_ACK (400)
12:49:19:WU01:FS01:Cleaning up
Assuming this is not just happening to me, how is it that the assignment or work servers do not notice the massive numbers of failed work units and alert someone? With all the Maxwell cards Folding, they should have seen about 1000 failed work units in the last 24 hours.


I plan to just shut off these systems until I see some recognition from Stanford that they have FIXED this issue.
Last edited by Gary480six on Fri Feb 27, 2015 9:27 pm, edited 1 time in total.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Bad work for Maxwell cards?

Post by bruce »

The last couple of WUs that were assigned to my GTX750ti were project 9411 and they've been working fine. I notice that the projects in the log are all project 13000 or 13001. Is that also true for your other machines? [At least that would give me a clue to explore which server(s) might have been reconfigured.]

I also notice you're not running the latest drivers. It would be interesting to know if updating them would make a difference.
Mstenholm
Posts: 84
Joined: Fri Oct 22, 2010 10:17 pm
Hardware configuration: 4 x GTX 970. Win 7.

Re: Bad work for Maxwell cards?

Post by Mstenholm »

Do you normally get 13000? I don't have a 750 but some 970s, and I never saw a 1300x on these cards. That said, I think that they have some serious server problems atm, but as you know you will never get a straight answer to any problem that cannot be traced to your end.
JimF
Posts: 652
Joined: Thu Jan 21, 2010 2:03 pm

Re: Bad work for Maxwell cards?

Post by JimF »

I don't know if it is relevant, but my GTX 750 Ti's show CUDA Driver: 7000 in the logs. I am using the latest released driver, 347.52 (Win7 64-bit).

I did have a problem early this morning, but it was not marked "BAD_WORK_UNIT". It was a failure to contact the folding servers due to a network problem, probably due to a BOINC CPU project I was running at the time which apparently crashed. At the moment, my Folding cards are running two Core_17 work units (P13001, P10468) OK, but if they crash I will post back.
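For anyone comparing the "CUDA Driver" values in their own logs (6050 in the first post, 7000 here), below is a minimal Python sketch for pulling that number out of a log header and decoding it. It assumes the client reports the value in NVIDIA's usual 1000*major + 10*minor encoding, so it is the CUDA version the installed driver supports, not the driver release number (such as 347.52). The function names are made up for illustration and are not part of FAHClient.

```python
import re

# Matches header lines such as "21:10:39:  CUDA Driver: 6050".
# It does not match the separate "CUDA: 5.0" line.
CUDA_DRIVER = re.compile(r"CUDA Driver:\s*(\d+)")


def decode_cuda_driver(value):
    """Split an integer like 6050 into (major, minor).

    Assumes the 1000*major + 10*minor encoding used by the CUDA API.
    """
    major, rest = divmod(value, 1000)
    return major, rest // 10


def cuda_driver_from_log(lines):
    """Return (major, minor) from the first CUDA Driver line, or None."""
    for line in lines:
        match = CUDA_DRIVER.search(line)
        if match:
            return decode_cuda_driver(int(match.group(1)))
    return None


header = [
    "21:10:39:        GPU 0: NVIDIA:4 GM107 [GeForce GTX 750 Ti]",
    "21:10:39:         CUDA: 5.0",
    "21:10:39:  CUDA Driver: 6050",
]
print(cuda_driver_from_log(header))  # (6, 5)
print(decode_cuda_driver(7000))      # (7, 0)
```

Read this way, the failing machine's driver supports CUDA 6.5, while a machine on the newer driver reports CUDA 7.0.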
Gary480six
Posts: 91
Joined: Mon Jan 21, 2008 6:42 pm

Re: Bad work for Maxwell cards?

Post by Gary480six »

Bruce, M and Jim,

Thanks for your reply.

I took a look at the log files for three of the four GTX 750 based Folding systems - going back more than a month.

No problems with Any Core 15 or Core 18 work units. I folded dozens and dozens without issue.

I never had Any success with the P10468, P10469, P13000, P13001 Core 17 work units.

The Only Core 17 work units that did succeed were the P9201 (I did at least 10)

I can see that this is not a new problem. But what was happening in each case was that a Core 18 work unit would finish, the system would pick up a Core 17 work unit, it would crash at or before step zero, and download a different new Core 18 work unit.

I never spotted the problems because, as you can see from my earlier log file, a single work unit crashing only takes about 4 minutes.
And up until yesterday, my systems were getting single releases of Core 17 work, crashing and picking up new Core 18 work.

Only starting yesterday was I getting core 17 work from the assignment server over and over again - triggering the Failed message.
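For anyone who wants to do the same review without reading a month of logs by eye, here is a rough Python sketch. It pairs each "Received Unit" line with the next "FahCore returned" line for the same WU and slot, then counts the return status per project and core. It only assumes the line formats visible in the log I posted earlier; the function name and the sample list are mine, not anything the client provides.

```python
import re
from collections import Counter, defaultdict

# "WU00:FS01:Received Unit: ... project:13000 ... core:0x17 ..."
RECEIVED = re.compile(
    r"(WU\d+):(FS\d+):Received Unit:.*?project:(\d+).*?core:(0x[0-9a-fA-F]+)"
)
# "WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)"
RETURNED = re.compile(r"(WU\d+):(FS\d+):FahCore returned: (\w+)")


def tally_results(lines):
    """Count FahCore return statuses per (project, core)."""
    active = {}                    # (wu, slot) -> (project, core)
    counts = defaultdict(Counter)  # (project, core) -> Counter of statuses
    for line in lines:
        match = RECEIVED.search(line)
        if match:
            wu, slot, project, core = match.groups()
            active[(wu, slot)] = (int(project), core)
            continue
        match = RETURNED.search(line)
        if match:
            wu, slot, status = match.groups()
            key = active.pop((wu, slot), None)
            if key is not None:  # skip results whose download is not in the log
                counts[key][status] += 1
    return counts


# A few lines copied from the log in the first post.
SAMPLE = [
    "12:21:20:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:13000 run:466 clone:0 gen:152 core:0x17 unit:0x00000104538b3db753101f5e2f35e90c",
    "12:24:46:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)",
    "12:24:50:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:13000 run:463 clone:2 gen:45 core:0x17 unit:0x0000004f538b3db753101e953ce58f70",
    "12:28:17:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)",
    "12:28:20:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:13001 run:289 clone:0 gen:93 core:0x17 unit:0x000000aa538b3db75328a1989a078b20",
    "12:31:47:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)",
]

for (project, core), statuses in sorted(tally_results(SAMPLE).items()):
    print(project, core, dict(statuses))
```

To run it over a real file, pass the open file object instead of SAMPLE, for example `tally_results(open("log.txt"))`, where the file name is whatever log you want to check.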


But correct me if I'm wrong, I thought that some months ago Stanford said that the P13000 and P13001 work would be blocked from Maxwell-based cards because of this problem.
Why have I been getting them for over a month?


Bruce - as far as the newer drivers..... The drivers I'm using were the ones recommended by Folding users when I installed the cards. From time to time I check the Nvidia topic here at the forums to see if newer drivers are available.
But as each new driver comes out, the forum comments say "no improvement for the Folding" so I have not upgraded my drivers.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Bad work for Maxwell cards?

Post by bruce »

Gary480six wrote: Bruce - as far as the newer drivers..... The drivers I'm using were the ones recommended by Folding users when I installed the cards. From time to time I check the Nvidia topic here at the forums to see if newer drivers are available.
But as each new driver comes out, the forum comments say "no improvement for the Folding" so I have not upgraded my drivers.

The forum comments to which you're referring say "no change in PPD" but I don't recall them saying anything about changes to the success or failure of particular projects. As I said, I'm using later drivers and I'm not seeing any problems like the ones you're describing.

If you're not willing to at least try my suggestion, there's nothing more I can offer.
rwh202
Posts: 425
Joined: Mon Nov 15, 2010 8:51 pm
Hardware configuration: 8x GTX 1080
3x GTX 1080 Ti
3x GTX 1060
Various other bits and pieces
Location: South Coast, UK

Re: Bad work for Maxwell cards?

Post by rwh202 »

I reported something similar a few months back - the assignment server restrictions were relaxed again to allow a wider range of projects on Maxwell.
At the time I was running almost year-old drivers and saw the same failures. As Bruce suggests, updating to later drivers meant that my 750 Tis could fold everything assigned to them.
davidcoton
Posts: 1102
Joined: Wed Nov 05, 2008 3:19 pm
Location: Cambridge, UK

Re: Bad work for Maxwell cards?

Post by davidcoton »

From comments elsewhere on the forum, it's possible that the newest nVidia drivers have fixed something for Maxwell, which may have triggered a change in assignment strategy. So this time it is worth trying a driver update.
Gary480six
Posts: 91
Joined: Mon Jan 21, 2008 6:42 pm

Re: Bad work for Maxwell cards?

Post by Gary480six »

Bruce, rwh and David,

Thanks for the suggestion - updating my Maxwell based GPU Folding boxes to the latest Nvidia drivers worked Perfectly!

All of those systems are now happily crunching away on P13000 and P13001 work units.

And yeah - I can be an ass sometimes... sorry