WU_STALLED then a series of BAD_WORK_UNIT errors

Postby SteveWillis » Wed May 17, 2017 10:29 pm

This is what I woke up to this morning:


Code: Select all
FS00 is a GTX 1080 Ti that is NOT overclocked; Nvidia driver 381.09.
Notice it's always WU02.

egrep  "Date|over|STALLED|BAD|wuresults|boost|ENUM|error"  log.txt |grep -v NO_ERROR

******************************* Date: 2017-05-16 *******************************
******************************* Date: 2017-05-16 *******************************
******************************* Date: 2017-05-16 *******************************
******************************* Date: 2017-05-17 *******************************
01:35:48:WARNING:WU02:FS00:FahCore returned: WU_STALLED (127 = 0x7f)
01:45:21:WU02:FS00:0x21:ERROR:exception: boost::filesystem::current_path: No such file or directory
01:45:22:WU02:FS00:0x21:ERROR:Exception: 117: Failed to open '../wuresults_01.dat': No such file or directory
01:45:23:WARNING:WU02:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
01:45:23:WU02:FS00:Sending unit results: id:02 state:SEND error:FAULTY project:11431 run:3 clone:13 gen:22 core:0x21 unit:0x0000001c8ca304e858e137b895baf162
03:00:10:WU02:FS01:0x21:ERROR:Exception: 117: Failed to open '../wuresults_01.dat': No such file or directory
03:00:12:WARNING:WU02:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
03:00:12:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:13204 run:23 clone:7 gen:130 core:0x21 unit:0x00000051ab436c6657894f0c932af21f
03:30:23:WU02:FS00:0x21:ERROR:exception: boost::filesystem::current_path: No such file or directory
03:30:23:WU02:FS00:0x21:ERROR:Exception: 117: Failed to open '../wuresults_01.dat': No such file or directory
03:30:24:WARNING:WU02:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
03:30:24:WU02:FS00:Sending unit results: id:02 state:SEND error:FAULTY project:9175 run:25 clone:0 gen:354 core:0x21 unit:0x00000211ab436c6957b24c2805bf34bf
04:30:15:WU02:FS00:0x21:ERROR:exception: boost::filesystem::current_path: No such file or directory
04:30:16:WU02:FS00:0x21:ERROR:Exception: 117: Failed to open '../wuresults_01.dat': No such file or directory
04:30:17:WARNING:WU02:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
04:30:17:WU02:FS00:Sending unit results: id:02 state:SEND error:FAULTY project:13204 run:32 clone:12 gen:58 core:0x21 unit:0x00000022ab436c6657894f0ca38c7feb
04:45:10:WU02:FS01:0x21:ERROR:exception: boost::filesystem::current_path: No such file or directory
04:45:11:WARNING:WU02:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
04:45:11:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:9431 run:649 clone:0 gen:300 core:0x21 unit:0x00000161ab436c9d586fdd394791436c
05:15:51:WU02:FS02:0x21:ERROR:exception: boost::filesystem::current_path: No such file or directory
05:15:52:WU02:FS02:0x21:ERROR:Exception: 117: Failed to open '../wuresults_01.dat': No such file or directory
05:15:53:WARNING:WU02:FS02:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
05:15:53:WU02:FS02:Sending unit results: id:02 state:SEND error:FAULTY project:13204 run:26 clone:6 gen:124 core:0x21 unit:0x0000004aab436c6657894f0ce75f7302
05:30:19:WU02:FS03:0x21:ERROR:exception: boost::filesystem::current_path: No such file or directory
05:30:19:WU02:FS03:0x21:ERROR:Exception: 117: Failed to open '../wuresults_01.dat': No such file or directory
05:30:21:WARNING:WU02:FS03:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
05:30:21:WU02:FS03:Sending unit results: id:02 state:SEND error:FAULTY project:9414 run:658 clone:0 gen:166 core:0x21 unit:0x000000c1ab436c9d585e0694a6110625
06:15:09:WU02:FS01:0x21:ERROR:exception: boost::filesystem::current_path: No such file or directory
06:15:10:WU02:FS01:0x21:ERROR:Exception: 117: Failed to open '../wuresults_01.dat': No such file or directory
06:15:11:WARNING:WU02:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
06:15:11:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:9178 run:11 clone:10 gen:104 core:0x21 unit:0x00000084ab436c6957b24c2ad178d59e
******************************* Date: 2017-05-17 *******************************
07:30:14:WU02:FS00:0x21:ERROR:exception: boost::filesystem::current_path: No such file or directory
07:30:15:WU02:FS00:0x21:ERROR:Exception: 117: Failed to open '../wuresults_01.dat': No such file or directory
07:30:16:WARNING:WU02:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
07:30:16:WU02:FS00:Sending unit results: id:02 state:SEND error:FAULTY project:13204 run:23 clone:10 gen:178 core:0x21 unit:0x0000006dab436c6657894f0c76a540dc
07:45:07:WU02:FS01:0x21:ERROR:exception: boost::filesystem::current_path: No such file or directory
07:45:07:WU02:FS01:0x21:ERROR:Exception: 117: Failed to open '../wuresults_01.dat': No such file or directory
07:45:08:WARNING:WU02:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
07:45:08:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:9180 run:10 clone:1 gen:279 core:0x21 unit:0x000001a5ab436c9f57bdce054127eea6
08:00:29:WU02:FS03:0x21:ERROR:exception: boost::filesystem::current_path: No such file or directory
08:00:30:WARNING:WU02:FS03:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
08:00:30:WU02:FS03:Sending unit results: id:02 state:SEND error:FAULTY project:9189 run:2 clone:80 gen:193 core:0x21 unit:0x000000eaab40415457cb2bfd7cc9b05d
08:48:12:WU02:FS02:0x21:ERROR:exception: boost::filesystem::current_path: No such file or directory
08:48:13:WU02:FS02:0x21:ERROR:Exception: 117: Failed to open '../wuresults_01.dat': No such file or directory
08:48:14:WARNING:WU02:FS02:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
08:48:14:WU02:FS02:Sending unit results: id:02 state:SEND error:FAULTY project:10494 run:0 clone:22 gen:421 core:0x21 unit:0x0000024a8ca304f555de927fff54bc08
09:30:11:WU02:FS03:0x21:ERROR:exception: boost::filesystem::current_path: No such file or directory
09:30:12:WU02:FS03:0x21:ERROR:Exception: 117: Failed to open '../wuresults_01.dat': No such file or directory
09:30:13:WARNING:WU02:FS03:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
09:30:13:WU02:FS03:Sending unit results: id:02 state:SEND error:FAULTY project:11431 run:4 clone:48 gen:40 core:0x21 unit:0x0000002f8ca304e858e137b8569e35a3
09:45:32:WU02:FS00:0x21:ERROR:exception: boost::filesystem::current_path: No such file or directory
09:45:33:WU02:FS00:0x21:ERROR:Exception: 117: Failed to open '../wuresults_01.dat': No such file or directory
09:45:34:WARNING:WU02:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
09:45:34:WU02:FS00:Sending unit results: id:02 state:SEND error:FAULTY project:9191 run:2 clone:47 gen:233 core:0x21 unit:0x0000013cab40415457cb2cead383db8a
09:51:08:WU03:FS02:0x21:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
Wed May 17 05:11:43 CDT 2017
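
The "always WU02" observation can be checked without reading the grep output by eye, by tallying the FAULTY uploads per WU id and folding slot. This is only a rough Python sketch and not part of FAHClient: the regular expression is an assumption based solely on the line format visible in the excerpt above, and `count_faulty` is a made-up helper name.

```python
import re
from collections import Counter

# Assumed line format, copied from the excerpt above, e.g.
# 01:45:23:WU02:FS00:Sending unit results: id:02 state:SEND error:FAULTY project:11431 run:3 ...
FAULTY_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):WU(?P<wu>\d+):FS(?P<slot>\d+):"
    r"Sending unit results: .*error:FAULTY "
    r"project:(?P<project>\d+) run:(?P<run>\d+) clone:(?P<clone>\d+) gen:(?P<gen>\d+)"
)

def count_faulty(lines):
    """Count FAULTY uploads per (WU id, folding slot). Hypothetical helper."""
    counts = Counter()
    for line in lines:
        m = FAULTY_RE.match(line)
        if m:
            counts[(m.group("wu"), m.group("slot"))] += 1
    return counts

# Three lines taken from the log excerpt; the WARNING line is ignored by the regex.
sample = [
    "01:45:23:WU02:FS00:Sending unit results: id:02 state:SEND error:FAULTY project:11431 run:3 clone:13 gen:22 core:0x21 unit:0x0000001c8ca304e858e137b895baf162",
    "03:00:12:WARNING:WU02:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)",
    "03:00:12:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:13204 run:23 clone:7 gen:130 core:0x21 unit:0x00000051ab436c6657894f0c932af21f",
]
counts = count_faulty(sample)
wu_ids = {wu for wu, _slot in counts}
print(sorted(counts.items()), wu_ids)
```

For the real log, pass an open file instead of the sample list, for example `count_faulty(open("log.txt"))`. A single WU id paired with several different slots would match what the grep output suggests.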




WU directory for 02 doesn't exist!
Code: Select all
steve@fah01 /var/lib/fahclient/work $ ls -ltr
total 84
drwxrwxrwx 3 fahclient root  4096 May 17 05:04 01
drwxrwxrwx 3 fahclient root  4096 May 17 05:06 03
drwxrwxrwx 3 fahclient root  4096 May 17 05:15 00
-rw-rw-r-- 1 fahclient root 40960 May 17 05:18 client.db
-rw-rw-r-- 1 fahclient root 25136 May 17 05:18 client.db-journal
drwxrwxrwx 3 fahclient root  4096 May 17 05:18 04
steve@fah01 /var/lib/fahclient/work $




First 200 lines of log.txt:
Code: Select all
steve@fah01 /var/lib/fahclient $ cat log.txt |grep -v topology |head -200
*********************** Log Started 2017-05-11T12:48:36Z ***********************
12:48:36:************************* Folding@home Client *************************
12:48:36:        Website: http://folding.stanford.edu/
12:48:36:      Copyright: (c) 2009-2016 Stanford University
12:48:36:         Author: Joseph Coffland <joseph@cauldrondevelopment.com>
12:48:36:           Args: --child --lifeline 1837 /etc/fahclient/config.xml --run-as
12:48:36:                 fahclient --pid-file=/var/run/fahclient.pid --daemon
12:48:36:         Config: /etc/fahclient/config.xml
12:48:36:******************************** Build ********************************
12:48:36:        Version: 7.4.16
12:48:36:           Date: Jan 6 2017
12:48:36:           Time: 08:08:33
12:48:36:     Repository: Git
12:48:36:       Revision: e12187cbb0bd6937c067b9749af011374563b7b9
12:48:36:         Branch: master
12:48:36:       Compiler: GNU 4.9.2
12:48:36:        Options: -std=gnu++98 -O3 -funroll-loops -ffast-math -mfpmath=sse
12:48:36:                 -fno-unsafe-math-optimizations -msse2
12:48:36:       Platform: linux2 4.8.0-2-amd64
12:48:36:           Bits: 64
12:48:36:           Mode: Release
12:48:36:******************************* System ********************************
12:48:36:            CPU: AMD FX(tm)-6300 Six-Core Processor
12:48:36:         CPU ID: AuthenticAMD Family 21 Model 2 Stepping 0
12:48:36:           CPUs: 6
12:48:36:         Memory: 7.70GiB
12:48:36:    Free Memory: 7.19GiB
12:48:36:        Threads: POSIX_THREADS
12:48:36:     OS Version: 4.4
12:48:36:    Has Battery: false
12:48:36:     On Battery: false
12:48:36:     UTC Offset: -5
12:48:36:            PID: 1839
12:48:36:            CWD: /var/lib/fahclient
12:48:36:             OS: Linux 4.4.0-21-generic x86_64
12:48:36:        OS Arch: AMD64
12:48:36:           GPUs: 4
12:48:36:          GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:5 GP104 [GeForce GTX 1080]
12:48:36:          GPU 1: Bus:8 Slot:0 Func:0 NVIDIA:5 GP102 [GeForce GTX 1080 Ti]
12:48:36:          GPU 2: Bus:9 Slot:0 Func:0 NVIDIA:5 GP104 [GeForce GTX 1080]
12:48:36:          GPU 3: Bus:10 Slot:0 Func:0 NVIDIA:5 GP104 [GeForce GTX 1080]
12:48:36:  CUDA Device 0: Platform:0 Device:0 Bus:8 Slot:0 Compute:6.1 Driver:8.0
12:48:36:  CUDA Device 1: Platform:0 Device:1 Bus:1 Slot:0 Compute:6.1 Driver:8.0
12:48:36:  CUDA Device 2: Platform:0 Device:2 Bus:9 Slot:0 Compute:6.1 Driver:8.0
12:48:36:  CUDA Device 3: Platform:0 Device:3 Bus:10 Slot:0 Compute:6.1 Driver:8.0
12:48:36:OpenCL Device 0: Platform:0 Device:0 Bus:8 Slot:0 Compute:1.2 Driver:381.9
12:48:36:OpenCL Device 1: Platform:0 Device:1 Bus:1 Slot:0 Compute:1.2 Driver:381.9
12:48:36:OpenCL Device 2: Platform:0 Device:2 Bus:9 Slot:0 Compute:1.2 Driver:381.9
12:48:36:OpenCL Device 3: Platform:0 Device:3 Bus:10 Slot:0 Compute:1.2 Driver:381.9
12:48:36:***********************************************************************
12:48:36:<config>
12:48:36:  <!-- Client Control -->
12:48:36:  <fold-anon v='true'/>
12:48:36:
12:48:36:  <!-- Folding Core -->
12:48:36:  <core-priority v='low'/>
12:48:36:
12:48:36:  <!-- Folding Slot Configuration -->
12:48:36:  <gpu v='false'/>
12:48:36:
12:48:36:  <!-- Network -->
12:48:36:  <proxy v=':8080'/>
12:48:36:
12:48:36:  <!-- Slot Control -->
12:48:36:  <power v='full'/>
12:48:36:
12:48:36:  <!-- User Information -->
12:48:36:  <passkey v='********************************'/>
12:48:36:  <team v='224497'/>
12:48:36:  <user v='DarthMouse_ALL_1GD5nCZbh7gNo1SESPLT24xEd2Jsu4rTP9'/>
12:48:36:
12:48:36:  <!-- Work Unit Control -->
12:48:36:  <next-unit-percentage v='100'/>
12:48:36:
12:48:36:  <!-- Folding Slots -->
12:48:36:  <slot id='1' type='GPU'>
12:48:36:    <gpu-index v='0'/>
12:48:36:  </slot>
12:48:36:  <slot id='2' type='GPU'>
12:48:36:    <gpu-index v='2'/>
12:48:36:  </slot>
12:48:36:  <slot id='3' type='GPU'>
12:48:36:    <gpu-index v='3'/>
12:48:36:  </slot>
12:48:36:  <slot id='0' type='GPU'>
12:48:36:    <gpu-index v='1'/>
12:48:36:  </slot>
12:48:36:</config>
12:48:36:Switching to user fahclient
12:48:36:Trying to access database...
12:48:36:Successfully acquired database lock
12:48:36:Enabled folding slot 01: READY gpu:0:GP104 [GeForce GTX 1080]
12:48:36:Enabled folding slot 02: READY gpu:2:GP104 [GeForce GTX 1080]
12:48:36:Enabled folding slot 03: READY gpu:3:GP104 [GeForce GTX 1080]
12:48:36:Enabled folding slot 00: READY gpu:1:GP102 [GeForce GTX 1080 Ti]
12:48:36:WU00:FS00:Starting
12:48:36:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/fahwebx.stanford.edu/cores/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 00 -suffix 01 -version 704 -lifeline 1839 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
12:48:36:WU00:FS00:Started FahCore on PID 1850
12:48:36:WU00:FS00:Core PID:1854
12:48:36:WU00:FS00:FahCore 0x21 started
12:48:37:WU02:FS02:Starting
12:48:37:WU02:FS02:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/fahwebx.stanford.edu/cores/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 02 -suffix 01 -version 704 -lifeline 1839 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 2 -cuda-device 2 -gpu 2
12:48:37:WU02:FS02:Started FahCore on PID 1862
12:48:37:WU02:FS02:Core PID:1866
12:48:37:WU02:FS02:FahCore 0x21 started
12:48:28:WU03:FS03:Starting
12:48:28:WU03:FS03:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/fahwebx.stanford.edu/cores/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 03 -suffix 01 -version 704 -lifeline 1839 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 3 -cuda-device 3 -gpu 3
12:48:28:WU03:FS03:Started FahCore on PID 1893
12:48:28:WU03:FS03:Core PID:1897
12:48:28:WU03:FS03:FahCore 0x21 started
12:48:28:WU01:FS01:Starting
12:48:28:WU01:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/fahwebx.stanford.edu/cores/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 01 -suffix 01 -version 704 -lifeline 1839 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 1 -cuda-device 1 -gpu 1
12:48:28:WU01:FS01:Started FahCore on PID 1903
12:48:28:WU01:FS01:Core PID:1907
12:48:28:WU01:FS01:FahCore 0x21 started
12:48:29:WU01:FS01:0x21:*********************** Log Started 2017-05-11T12:48:28Z ***********************
12:48:29:WU01:FS01:0x21:Project: 13204 (Run 36, Clone 12, Gen 6)
12:48:29:WU01:FS01:0x21:Unit: 0x00000004ab436c6657894f0cc036ca96
12:48:29:WU01:FS01:0x21:CPU: 0x00000000000000000000000000000000
12:48:29:WU01:FS01:0x21:Machine: 1
12:48:29:WU01:FS01:0x21:Reading tar file core.xml
12:48:29:WU01:FS01:0x21:Reading tar file integrator.xml
12:48:29:WU01:FS01:0x21:Reading tar file state.xml
12:48:29:WU00:FS00:0x21:*********************** Log Started 2017-05-11T12:48:29Z ***********************
12:48:29:WU00:FS00:0x21:Project: 10494 (Run 10, Clone 19, Gen 342)
12:48:29:WU00:FS00:0x21:Unit: 0x000001f08ca304f555de946bf85b2d05
12:48:29:WU00:FS00:0x21:CPU: 0x00000000000000000000000000000000
12:48:29:WU00:FS00:0x21:Machine: 0
12:48:29:WU00:FS00:0x21:Digital signatures verified
12:48:29:WU00:FS00:0x21:Folding@home GPU Core21 Folding@home Core
12:48:29:WU00:FS00:0x21:Version 0.0.18
12:48:29:WU00:FS00:0x21:  Found a checkpoint file
12:48:29:WU02:FS02:0x21:*********************** Log Started 2017-05-11T12:48:29Z ***********************
12:48:29:WU02:FS02:0x21:Project: 10496 (Run 160, Clone 12, Gen 111)
12:48:29:WU02:FS02:0x21:Unit: 0x0000008b8ca304f556bbb16e03550839
12:48:29:WU02:FS02:0x21:CPU: 0x00000000000000000000000000000000
12:48:29:WU02:FS02:0x21:Machine: 2
12:48:29:WU02:FS02:0x21:Digital signatures verified
12:48:29:WU02:FS02:0x21:Folding@home GPU Core21 Folding@home Core
12:48:29:WU02:FS02:0x21:Version 0.0.18
12:48:29:WU02:FS02:0x21:  Found a checkpoint file
12:48:29:WU03:FS03:0x21:*********************** Log Started 2017-05-11T12:48:29Z ***********************
12:48:29:WU03:FS03:0x21:Project: 10496 (Run 12, Clone 50, Gen 41)
12:48:29:WU03:FS03:0x21:Unit: 0x000000348ca304f5588959cb91dd4207
12:48:29:WU03:FS03:0x21:CPU: 0x00000000000000000000000000000000
12:48:29:WU03:FS03:0x21:Machine: 3
12:48:29:WU03:FS03:0x21:Digital signatures verified
12:48:29:WU03:FS03:0x21:Folding@home GPU Core21 Folding@home Core
12:48:29:WU03:FS03:0x21:Version 0.0.18
12:48:33:WU01:FS01:0x21:Reading tar file system.xml
12:48:35:WU01:FS01:0x21:Digital signatures verified
12:48:35:WU01:FS01:0x21:Folding@home GPU Core21 Folding@home Core
12:48:35:WU01:FS01:0x21:Version 0.0.18
12:49:09:WU02:FS02:0x21:Completed 1625000 out of 2000000 steps (81%)
12:49:09:WU02:FS02:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
12:49:13:WU03:FS03:0x21:Completed 0 out of 2000000 steps (0%)
12:49:13:WU03:FS03:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
12:49:28:WU00:FS00:0x21:Completed 900000 out of 1000000 steps (90%)
12:49:28:WU00:FS00:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
12:50:06:WU01:FS01:0x21:Completed 0 out of 2000000 steps (0%)
12:50:06:WU01:FS01:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
12:50:39:WU02:FS02:0x21:Completed 1640000 out of 2000000 steps (82%)
12:51:16:WU03:FS03:0x21:Completed 20000 out of 2000000 steps (1%)
12:51:36:WU01:FS01:0x21:Completed 20000 out of 2000000 steps (1%)
12:51:56:WU00:FS00:0x21:Completed 910000 out of 1000000 steps (91%)
12:52:39:WU02:FS02:0x21:Completed 1660000 out of 2000000 steps (83%)
12:53:04:WU01:FS01:0x21:Completed 40000 out of 2000000 steps (2%)
12:53:17:WU03:FS03:0x21:Completed 40000 out of 2000000 steps (2%)
12:54:22:WU00:FS00:0x21:Completed 920000 out of 1000000 steps (92%)
12:54:33:WU01:FS01:0x21:Completed 60000 out of 2000000 steps (3%)
12:54:40:WU02:FS02:0x21:Completed 1680000 out of 2000000 steps (84%)
12:55:19:WU03:FS03:0x21:Completed 60000 out of 2000000 steps (3%)
12:56:03:WU01:FS01:0x21:Completed 80000 out of 2000000 steps (4%)
12:56:41:WU02:FS02:0x21:Completed 1700000 out of 2000000 steps (85%)
12:56:47:WU00:FS00:0x21:Completed 930000 out of 1000000 steps (93%)
12:57:20:WU03:FS03:0x21:Completed 80000 out of 2000000 steps (4%)
12:57:32:WU01:FS01:0x21:Completed 100000 out of 2000000 steps (5%)
12:58:42:WU02:FS02:0x21:Completed 1720000 out of 2000000 steps (86%)
12:59:02:WU01:FS01:0x21:Completed 120000 out of 2000000 steps (6%)
12:59:13:WU00:FS00:0x21:Completed 940000 out of 1000000 steps (94%)
12:59:21:WU03:FS03:0x21:Completed 100000 out of 2000000 steps (5%)
13:00:32:WU01:FS01:0x21:Completed 140000 out of 2000000 steps (7%)
13:00:43:WU02:FS02:0x21:Completed 1740000 out of 2000000 steps (87%)
13:01:22:WU03:FS03:0x21:Completed 120000 out of 2000000 steps (6%)
13:01:38:WU00:FS00:0x21:Completed 950000 out of 1000000 steps (95%)
13:02:02:WU01:FS01:0x21:Completed 160000 out of 2000000 steps (8%)
13:02:52:WU02:FS02:0x21:Completed 1760000 out of 2000000 steps (88%)
13:03:31:WU01:FS01:0x21:Completed 180000 out of 2000000 steps (9%)
13:03:32:WU03:FS03:0x21:Completed 140000 out of 2000000 steps (7%)
13:04:20:WU00:FS00:0x21:Completed 960000 out of 1000000 steps (96%)
13:04:53:WU02:FS02:0x21:Completed 1780000 out of 2000000 steps (89%)
13:05:01:WU01:FS01:0x21:Completed 200000 out of 2000000 steps (10%)
13:05:33:WU03:FS03:0x21:Completed 160000 out of 2000000 steps (8%)
13:06:37:WU01:FS01:0x21:Completed 220000 out of 2000000 steps (11%)
13:06:46:WU00:FS00:0x21:Completed 970000 out of 1000000 steps (97%)
13:06:54:WU02:FS02:0x21:Completed 1800000 out of 2000000 steps (90%)
13:07:34:WU03:FS03:0x21:Completed 180000 out of 2000000 steps (9%)
13:08:07:WU01:FS01:0x21:Completed 240000 out of 2000000 steps (12%)
13:08:55:WU02:FS02:0x21:Completed 1820000 out of 2000000 steps (91%)
13:09:12:WU00:FS00:0x21:Completed 980000 out of 1000000 steps (98%)
steve@fah01 /var/lib/fahclient $


I reinstalled the client and am up and folding now, with no problems since this morning. Maybe a restart would have done the same. I'm just curious, as I wasn't able to find this exact situation using the forum search.
My thanks to my very indulgent wife
http://folding.extremeoverclocking.com/user_summary.php?s=&u=712804

3 AMD Linux rigs 3, 4, and 5 GPUs 7 X GTX 1080, 5 X GTX 1080 TI
SteveWillis
 
Posts: 194
Joined: Fri Apr 15, 2016 12:42 am

Re: WU_STALLED then a series of BAD_WORK_UNIT errors

Postby bruce » Wed May 17, 2017 11:53 pm

I'm afraid that ../wuresults_01.dat is a relative path, so where it points depends on the directory the core starts in. Each WU is downloaded into a new directory inside /var/lib/fahclient/work, which might be 00 or 01 or 02 or a larger number.

The file called wuresults_01.dat is a temporary file that's created soon after the WU reaches 100% and is deleted when that WU uploads or expires. Without more information from your log between those two times for the particular WU that you're searching for, there's just too much we don't know about what happened. What doesn't make sense is how you happen to have so many different WUs all identified as id:02, especially since there is no subdirectory 02 and it can contain only one WU.
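
The two points above (one numbered directory per WU, and a results file addressed by a relative path) can be illustrated with a short Python sketch. It is not how FAHClient itself works internally: `missing_wu_dirs` and `resolve_results_path` are made-up helpers, and the working directory `work/02/01` is only an assumption suggested by the `-dir 02 -suffix 01` arguments in the posted log.

```python
import os
import posixpath
import tempfile

def missing_wu_dirs(work_dir, wu_ids):
    """Return the WU ids that have no matching subdirectory in work_dir."""
    return [wu for wu in wu_ids if not os.path.isdir(os.path.join(work_dir, wu))]

def resolve_results_path(core_cwd, rel="../wuresults_01.dat"):
    """Resolve the relative path from the error message against an ASSUMED cwd."""
    return posixpath.normpath(posixpath.join(core_cwd, rel))

# Demo against a throwaway directory that mimics the posted `ls -ltr` listing
# (00, 01, 03 and 04 exist, 02 does not).
with tempfile.TemporaryDirectory() as work:
    for name in ("00", "01", "03", "04"):
        os.mkdir(os.path.join(work, name))
    missing = missing_wu_dirs(work, ["00", "01", "02", "03", "04"])

# Assumption: the core runs inside <work>/<WU id>/<suffix>.
resolved = resolve_results_path("/var/lib/fahclient/work/02/01")
print(missing, resolved)
```

If that assumption about the working directory holds, a missing 02 directory would account for both messages in the log: there is no current directory to report, and the parent that should hold wuresults_01.dat is gone as well.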

Do you always start FAHClient with the script provided? The permissions don't look right.
sudo /etc/init.d/FAHClient start
(Depending on how you installed, that may happen automatically on reboot.)
bruce
 
Posts: 21182
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: WU_STALLED then a series of BAD_WORK_UNIT errors

Postby SteveWillis » Thu May 18, 2017 2:21 am

Hi Bruce
On both of my computers the client starts at boot. I've been folding for quite a while and have never seen this problem before, but the 1080 Ti GPU that initially stalled has only been installed for about a week.
The entire log is about 10x too big to post, but if you are up to it, here is a link to it.

https://drive.google.com/open?id=0B5tlKJoTKnBqTkFkbGFnNVZjZTA
SteveWillis
 
Posts: 194
Joined: Fri Apr 15, 2016 12:42 am

Re: WU_STALLED then a series of BAD_WORK_UNIT errors

Postby bruce » Thu May 18, 2017 4:54 am

The following WUs were reported as missing when you restarted FAHClient. I didn't find any of them being processed in the log you posted (although I only searched for a representative few).

That means that something happened to them in the PREVIOUS log. Go to the logs directory, where previous logs are filed by date-time. Can you find the log where they were processed?

01:45:23:WU02:FS00: project:11431 run:3 clone:13 gen:22
03:00:12:WU02:FS01: project:13204 run:23 clone:7 gen:130
03:30:24:WU02:FS00: project:9175 run:25 clone:0 gen:354
04:30:17:WU02:FS00: project:13204 run:32 clone:12 gen:58
04:45:11:WU02:FS01: project:9431 run:649 clone:0 gen:300
05:15:53:WU02:FS02: project:13204 run:26 clone:6 gen:124
05:30:21:WU02:FS03: project:9414 run:658 clone:0 gen:166
06:15:11:WU02:FS01: project:9178 run:11 clone:10 gen:104
07:30:16:WU02:FS00: project:13204 run:23 clone:10 gen:178
07:45:08:WU02:FS01: project:9180 run:10 clone:1 gen:279
08:00:30:WU02:FS03: project:9189 run:2 clone:80 gen:193
08:48:14:WU02:FS02: project:10494 run:0 clone:22 gen:421


You should find the strings in the following format when the WU starts processing and the previous format when the WU attempts to upload.
Code: Select all
Project: 11431 (Run 3, Clone 13, Gen 22)
Project: 13204 (Run 23, Clone 7, Gen 130)
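
Since the same WU appears in two different spellings, a search of older logs is easier if both are reduced to one key. The following Python sketch shows one way to do that; the regular expressions are assumptions based only on the two formats quoted in this thread, and `wu_key`, `start_format` and `find_wu` are made-up helper names.

```python
import re

# Upload-time format: project:11431 run:3 clone:13 gen:22
UPLOAD_RE = re.compile(r"project:(\d+) run:(\d+) clone:(\d+) gen:(\d+)")
# Start-time format: Project: 11431 (Run 3, Clone 13, Gen 22)
START_RE = re.compile(r"Project:\s*(\d+)\s*\(Run (\d+), Clone (\d+), Gen (\d+)\)")

def wu_key(line):
    """Return (project, run, clone, gen) as ints if the line names a WU, else None."""
    m = UPLOAD_RE.search(line) or START_RE.search(line)
    return tuple(int(g) for g in m.groups()) if m else None

def start_format(key):
    """Render a key in the start-of-processing format."""
    return "Project: {} (Run {}, Clone {}, Gen {})".format(*key)

def find_wu(lines, key):
    """Return every line that mentions the given WU in either format."""
    return [line.rstrip("\n") for line in lines if wu_key(line) == key]

# Lines copied from the logs posted in this thread.
lines = [
    "12:48:29:WU01:FS01:0x21:Project: 13204 (Run 36, Clone 12, Gen 6)",
    "01:45:23:WU02:FS00:Sending unit results: id:02 state:SEND error:FAULTY project:11431 run:3 clone:13 gen:22 core:0x21 unit:0x0000001c8ca304e858e137b895baf162",
    "03:00:12:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:13204 run:23 clone:7 gen:130 core:0x21 unit:0x00000051ab436c6657894f0c932af21f",
]
hits = find_wu(lines, (11431, 3, 13, 22))
print(start_format((11431, 3, 13, 22)), len(hits))
```

Comparing parsed numbers rather than raw substrings avoids a false match such as gen:22 against gen:220. To search an older log, pass its open file object as `lines`.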
bruce
 
Posts: 21182
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

