Faulty Work Units

If you think it might be a driver problem, see viewforum.php?f=79

Moderators: Site Moderators, FAHC Science Team

The REAL Specter
Posts: 5
Joined: Wed Apr 01, 2020 2:25 pm

Faulty Work Units

Post by The REAL Specter »

Good Morning everyone.
For over a day now, I have been trying to fold on my Nvidia 1070 Ti. Everything started out fine, but recently (late on April 30th, let's say) the jobs started running for a few minutes, then faulting out with a faulty-job message and terminating. Sometimes this happens at a few percent complete, sometimes at almost half complete. This is getting really frustrating.

I am not overclocked, and I checked the drivers; they are current. The thing is, it WAS working fine and then stopped working.
Is this something on my end? Any advice?
The name of this machine is SpecterUpRig.

Thank you for your assistance.
Aaron
davidcoton
Posts: 1102
Joined: Wed Nov 05, 2008 3:19 pm
Location: Cambridge, UK

Re: Faulty Work Units

Post by davidcoton »

Please post your log, otherwise we're working blind.
Go to Advanced Control, log tab, Refresh and Copy. Then post it here in Code tags.
The REAL Specter
Posts: 5
Joined: Wed Apr 01, 2020 2:25 pm

Re: Faulty Work Units

Post by The REAL Specter »

David, thank you for the reply. I should have thought of that. I did some reinstalling and will need to let the log catch something, as I cleared it after getting into Advanced Control.
It seems the problem is that it writes a checkpoint, and when it goes to resume or continue from there, it crashes, reporting a bad state and a faulty project, with some particle location error or something.
I'll have to provide a copy/paste of the log for you. I am writing now just to let you know I have seen your post and am working on it.
Thank you for your assistance.
Aaron
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Faulty Work Units

Post by bruce »

FAHControl typically displays the "tail" of the active log. If it's been running a while and you want to see the top, davidcoton's "Refresh" is an essential step ... and perhaps turn off "Follow". Backup copies are also available in FAH's working files (see below, depending on which OS you're running).
The REAL Specter
Posts: 5
Joined: Wed Apr 01, 2020 2:25 pm

Re: Faulty Work Units

Post by The REAL Specter »

Code:

*********************** Log Started 2020-04-01T20:35:11Z ***********************
20:36:13:WU01:FS01:Connecting to 65.254.110.245:8080
20:36:14:WU01:FS01:Connecting to 65.254.110.245:8080
20:36:14:WARNING:WU01:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
20:36:14:WU01:FS01:Connecting to 18.218.241.186:80
20:36:14:WU01:FS01:Assigned to work server 40.114.52.201
20:36:14:WU01:FS01:Requesting new work unit for slot 01: READY gpu:0:GP104 [GeForce GTX 1070 Ti] 8186 from 40.114.52.201
20:36:14:WU01:FS01:Connecting to 40.114.52.201:8080
20:36:35:WARNING:WU01:FS01:WorkServer connection failed on port 8080 trying 80
20:36:35:WU01:FS01:Connecting to 40.114.52.201:80
20:37:08:ERROR:WU01:FS01:Exception: 10002: Received short response, expected 512 bytes, got 0
20:37:08:WU01:FS01:Connecting to 65.254.110.245:8080
20:37:09:WU01:FS01:Assigned to work server 128.252.203.10
20:37:09:WU01:FS01:Requesting new work unit for slot 01: READY gpu:0:GP104 [GeForce GTX 1070 Ti] 8186 from 128.252.203.10
20:37:09:WU01:FS01:Connecting to 128.252.203.10:8080
20:37:48:ERROR:WU01:FS01:Exception: 10002: Received short response, expected 512 bytes, got 0
20:38:08:WU01:FS01:Connecting to 65.254.110.245:8080
20:38:09:WARNING:WU01:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
20:38:09:WU01:FS01:Connecting to 18.218.241.186:80
20:38:09:WARNING:WU01:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
20:38:09:ERROR:WU01:FS01:Exception: Could not get an assignment
20:39:46:WU01:FS01:Connecting to 65.254.110.245:8080
20:39:46:WU01:FS01:Assigned to work server 13.90.152.57
20:39:46:WU01:FS01:Requesting new work unit for slot 01: READY gpu:0:GP104 [GeForce GTX 1070 Ti] 8186 from 13.90.152.57
20:39:46:WU01:FS01:Connecting to 13.90.152.57:8080
20:40:08:WU01:FS01:Downloading 86.23MiB
20:40:14:WU01:FS01:Download 17.25%
20:40:20:WU01:FS01:Download 45.73%
20:40:26:WU01:FS01:Download 70.45%
20:40:32:WU01:FS01:Download 94.51%
20:40:32:WU01:FS01:Download complete
20:40:33:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:11781 run:0 clone:8812 gen:23 core:0x22 unit:0x000000280d5a98395e7588dd5e35dea2
20:40:33:WU01:FS01:Downloading core from http://cores.foldingathome.org/v7/win/64bit/Core_22.fah
20:40:33:WU01:FS01:Connecting to cores.foldingathome.org:80
20:40:33:WU01:FS01:FahCore 22: Downloading 4.04MiB
20:40:34:WU01:FS01:FahCore 22: Download complete
20:40:34:WU01:FS01:Valid core signature
20:40:34:WU01:FS01:Unpacked 13.49MiB to cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe
20:40:34:WU01:FS01:Starting
20:40:34:WU01:FS01:Running FahCore: C:\Folding\FAHClient/FAHCoreWrapper.exe "C:\Folding\New folder\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe" -dir 01 -suffix 01 -version 705 -lifeline 5776 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
20:40:34:WU01:FS01:Started FahCore on PID 9536
20:40:35:WU01:FS01:Core PID:6508
20:40:35:WU01:FS01:FahCore 0x22 started
20:40:36:WU01:FS01:0x22:*********************** Log Started 2020-04-01T20:40:35Z ***********************
20:40:36:WU01:FS01:0x22:*************************** Core22 Folding@home Core ***************************
20:40:36:WU01:FS01:0x22:       Type: 0x22
20:40:36:WU01:FS01:0x22:       Core: Core22
20:40:36:WU01:FS01:0x22:    Website: https://foldingathome.org/
20:40:36:WU01:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
20:40:36:WU01:FS01:0x22:     Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
20:40:36:WU01:FS01:0x22:             <rafal.wiewiora@choderalab.org>
20:40:36:WU01:FS01:0x22:       Args: -dir 01 -suffix 01 -version 705 -lifeline 9536 -checkpoint 15
20:40:36:WU01:FS01:0x22:             -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device
20:40:36:WU01:FS01:0x22:             0 -gpu 0
20:40:36:WU01:FS01:0x22:     Config: <none>
20:40:36:WU01:FS01:0x22:************************************ Build *************************************
20:40:36:WU01:FS01:0x22:    Version: 0.0.2
20:40:36:WU01:FS01:0x22:       Date: Dec 6 2019
20:40:36:WU01:FS01:0x22:       Time: 21:30:31
20:40:36:WU01:FS01:0x22: Repository: Git
20:40:36:WU01:FS01:0x22:   Revision: abeb39247cc72df5af0f63723edafadb23d5dfbe
20:40:36:WU01:FS01:0x22:     Branch: HEAD
20:40:36:WU01:FS01:0x22:   Compiler: Visual C++ 2008
20:40:36:WU01:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
20:40:36:WU01:FS01:0x22:   Platform: win32 10
20:40:36:WU01:FS01:0x22:       Bits: 64
20:40:36:WU01:FS01:0x22:       Mode: Release
20:40:36:WU01:FS01:0x22:************************************ System ************************************
20:40:36:WU01:FS01:0x22:        CPU: AMD Phenom(tm) II X6 1075T Processor
20:40:36:WU01:FS01:0x22:     CPU ID: AuthenticAMD Family 16 Model 10 Stepping 0
20:40:36:WU01:FS01:0x22:       CPUs: 6
20:40:36:WU01:FS01:0x22:     Memory: 8.00GiB
20:40:36:WU01:FS01:0x22:Free Memory: 5.26GiB
20:40:36:WU01:FS01:0x22:    Threads: WINDOWS_THREADS
20:40:36:WU01:FS01:0x22: OS Version: 6.2
20:40:36:WU01:FS01:0x22:Has Battery: false
20:40:36:WU01:FS01:0x22: On Battery: false
20:40:36:WU01:FS01:0x22: UTC Offset: -4
20:40:36:WU01:FS01:0x22:        PID: 6508
20:40:36:WU01:FS01:0x22:        CWD: C:\Folding\New folder\work
20:40:36:WU01:FS01:0x22:         OS: Windows 10 Pro
20:40:36:WU01:FS01:0x22:    OS Arch: AMD64
20:40:36:WU01:FS01:0x22:********************************************************************************
20:40:36:WU01:FS01:0x22:Project: 11781 (Run 0, Clone 8812, Gen 23)
20:40:36:WU01:FS01:0x22:Unit: 0x000000280d5a98395e7588dd5e35dea2
20:40:36:WU01:FS01:0x22:Reading tar file core.xml
20:40:36:WU01:FS01:0x22:Reading tar file integrator.xml
20:40:36:WU01:FS01:0x22:Reading tar file state.xml
20:40:36:WU01:FS01:0x22:Reading tar file system.xml
20:40:36:WU01:FS01:0x22:Digital signatures verified
20:40:36:WU01:FS01:0x22:Folding@home GPU Core22 Folding@home Core
20:40:36:WU01:FS01:0x22:Version 0.0.2
20:41:59:WU01:FS01:0x22:Completed 0 out of 1000000 steps (0%)
20:41:59:WU01:FS01:0x22:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
20:44:16:WU01:FS01:0x22:Completed 10000 out of 1000000 steps (1%)
20:46:05:WU01:FS01:0x22:Completed 20000 out of 1000000 steps (2%)
20:47:18:WU01:FS01:0x22:ERROR:exception: clWaitForEvents
20:47:18:WU01:FS01:0x22:Saving result file ..\logfile_01.txt
20:47:18:WU01:FS01:0x22:Saving result file checkpt.crc
20:47:18:WU01:FS01:0x22:Saving result file science.log
20:47:18:WU01:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
20:47:18:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
20:47:18:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:11781 run:0 clone:8812 gen:23 core:0x22 unit:0x000000280d5a98395e7588dd5e35dea2
20:47:18:WU01:FS01:Uploading 10.00KiB to 13.90.152.57
20:47:18:WU01:FS01:Connecting to 13.90.152.57:8080
20:47:18:WU01:FS01:Upload complete
20:47:18:WU01:FS01:Server responded WORK_ACK (400)
20:47:19:WU01:FS01:Cleaning up
20:47:19:WU02:FS01:Connecting to 65.254.110.245:8080
20:47:19:WU02:FS01:Assigned to work server 128.252.203.10
20:47:19:WU02:FS01:Requesting new work unit for slot 01: READY gpu:0:GP104 [GeForce GTX 1070 Ti] 8186 from 128.252.203.10
20:47:20:WU02:FS01:Connecting to 128.252.203.10:8080
20:47:41:WARNING:WU02:FS01:WorkServer connection failed on port 8080 trying 80
20:47:41:WU02:FS01:Connecting to 128.252.203.10:80
20:48:28:WU02:FS01:Downloading 86.24MiB
20:48:34:WU02:FS01:Download 29.86%
20:48:40:WU02:FS01:Download 63.78%
20:48:46:WU02:FS01:Download 97.55%
20:48:46:WU02:FS01:Download complete
20:48:46:WU02:FS01:Received Unit: id:02 state:DOWNLOAD error:NO_ERROR project:11764 run:0 clone:6785 gen:18 core:0x22 unit:0x0000002280fccb0a5e7112fd760a63f4
20:48:46:WU02:FS01:Starting
20:48:46:WU02:FS01:Running FahCore: C:\Folding\FAHClient/FAHCoreWrapper.exe "C:\Folding\New folder\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe" -dir 02 -suffix 01 -version 705 -lifeline 5776 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
20:48:46:WU02:FS01:Started FahCore on PID 9276
20:48:46:WU02:FS01:Core PID:3932
20:48:46:WU02:FS01:FahCore 0x22 started
20:48:47:WU02:FS01:0x22:*********************** Log Started 2020-04-01T20:48:46Z ***********************
20:48:47:WU02:FS01:0x22:*************************** Core22 Folding@home Core ***************************
20:48:47:WU02:FS01:0x22:       Type: 0x22
20:48:47:WU02:FS01:0x22:       Core: Core22
20:48:47:WU02:FS01:0x22:    Website: https://foldingathome.org/
20:48:47:WU02:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
20:48:47:WU02:FS01:0x22:     Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
20:48:47:WU02:FS01:0x22:             <rafal.wiewiora@choderalab.org>
20:48:47:WU02:FS01:0x22:       Args: -dir 02 -suffix 01 -version 705 -lifeline 9276 -checkpoint 15
20:48:47:WU02:FS01:0x22:             -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device
20:48:47:WU02:FS01:0x22:             0 -gpu 0
20:48:47:WU02:FS01:0x22:     Config: <none>
20:48:47:WU02:FS01:0x22:************************************ Build *************************************
20:48:47:WU02:FS01:0x22:    Version: 0.0.2
20:48:47:WU02:FS01:0x22:       Date: Dec 6 2019
20:48:47:WU02:FS01:0x22:       Time: 21:30:31
20:48:47:WU02:FS01:0x22: Repository: Git
20:48:47:WU02:FS01:0x22:   Revision: abeb39247cc72df5af0f63723edafadb23d5dfbe
20:48:47:WU02:FS01:0x22:     Branch: HEAD
20:48:47:WU02:FS01:0x22:   Compiler: Visual C++ 2008
20:48:47:WU02:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
20:48:47:WU02:FS01:0x22:   Platform: win32 10
20:48:47:WU02:FS01:0x22:       Bits: 64
20:48:47:WU02:FS01:0x22:       Mode: Release
20:48:47:WU02:FS01:0x22:************************************ System ************************************
20:48:47:WU02:FS01:0x22:        CPU: AMD Phenom(tm) II X6 1075T Processor
20:48:47:WU02:FS01:0x22:     CPU ID: AuthenticAMD Family 16 Model 10 Stepping 0
20:48:47:WU02:FS01:0x22:       CPUs: 6
20:48:47:WU02:FS01:0x22:     Memory: 8.00GiB
20:48:47:WU02:FS01:0x22:Free Memory: 5.38GiB
20:48:47:WU02:FS01:0x22:    Threads: WINDOWS_THREADS
20:48:47:WU02:FS01:0x22: OS Version: 6.2
20:48:47:WU02:FS01:0x22:Has Battery: false
20:48:47:WU02:FS01:0x22: On Battery: false
20:48:47:WU02:FS01:0x22: UTC Offset: -4
20:48:47:WU02:FS01:0x22:        PID: 3932
20:48:47:WU02:FS01:0x22:        CWD: C:\Folding\New folder\work
20:48:47:WU02:FS01:0x22:         OS: Windows 10 Pro
20:48:47:WU02:FS01:0x22:    OS Arch: AMD64
20:48:47:WU02:FS01:0x22:********************************************************************************
20:48:47:WU02:FS01:0x22:Project: 11764 (Run 0, Clone 6785, Gen 18)
20:48:47:WU02:FS01:0x22:Unit: 0x0000002280fccb0a5e7112fd760a63f4
20:48:47:WU02:FS01:0x22:Reading tar file core.xml
20:48:47:WU02:FS01:0x22:Reading tar file integrator.xml
20:48:47:WU02:FS01:0x22:Reading tar file state.xml
20:48:47:WU02:FS01:0x22:Reading tar file system.xml
20:48:48:WU02:FS01:0x22:Digital signatures verified
20:48:48:WU02:FS01:0x22:Folding@home GPU Core22 Folding@home Core
20:48:48:WU02:FS01:0x22:Version 0.0.2
20:50:07:WU02:FS01:0x22:Completed 0 out of 1000000 steps (0%)
20:50:08:WU02:FS01:0x22:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
20:52:24:WU02:FS01:0x22:Completed 10000 out of 1000000 steps (1%)
20:54:13:WU02:FS01:0x22:Completed 20000 out of 1000000 steps (2%)
20:56:02:WU02:FS01:0x22:Completed 30000 out of 1000000 steps (3%)
20:56:40:WU02:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
20:56:40:WU02:FS01:0x22:Following exception occured: Particle coordinate is nan
20:57:03:WU02:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
20:57:03:WU02:FS01:0x22:Following exception occured: Particle coordinate is nan
20:57:25:WU02:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
20:57:25:WU02:FS01:0x22:Following exception occured: Particle coordinate is nan
20:57:25:WU02:FS01:0x22:ERROR:114: Max Retries Reached
20:57:25:WU02:FS01:0x22:Saving result file ..\logfile_01.txt
20:57:25:WU02:FS01:0x22:Saving result file badstate-0.xml
20:57:25:WU02:FS01:0x22:Saving result file badstate-1.xml
20:57:25:WU02:FS01:0x22:Saving result file badstate-2.xml
20:57:25:WU02:FS01:0x22:Saving result file checkpt.crc
20:57:25:WU02:FS01:0x22:Saving result file science.log
20:57:25:WU02:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
20:57:26:WARNING:WU02:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
20:57:26:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:11764 run:0 clone:6785 gen:18 core:0x22 unit:0x0000002280fccb0a5e7112fd760a63f4
20:57:26:WU02:FS01:Uploading 59.12MiB to 128.252.203.10
20:57:26:WU02:FS01:Connecting to 128.252.203.10:8080
20:57:27:WU01:FS01:Connecting to 65.254.110.245:8080
20:57:27:WARNING:WU01:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
20:57:27:WU01:FS01:Connecting to 18.218.241.186:80
20:57:27:WU01:FS01:Assigned to work server 128.252.203.10
20:57:27:WU01:FS01:Requesting new work unit for slot 01: READY gpu:0:GP104 [GeForce GTX 1070 Ti] 8186 from 128.252.203.10
20:57:27:WU01:FS01:Connecting to 128.252.203.10:8080
Last edited by Joe_H on Thu Apr 02, 2020 3:28 am, edited 1 time in total.
Reason: added Code tags to log
The REAL Specter
Posts: 5
Joined: Wed Apr 01, 2020 2:25 pm

Re: Faulty Work Units

Post by The REAL Specter »

FWIW, no, my system is not overclocked. I do not know whether this affects it, but I also have the work slider set to medium. As I said previously, it was running fine, as you can tell by my score, and I changed nothing; then for the past two days this has been happening.
Frustrating.
Thank you for your help.
Aaron

Edit: let me also add that my GPU temperature is barely hitting 60 degrees, so it is not overheating either.
The REAL Specter
Posts: 5
Joined: Wed Apr 01, 2020 2:25 pm

Re: Faulty Work Units

Post by The REAL Specter »

And here is another fault:

21:12:33:WU01:FS01:0x22:Completed 60000 out of 1000000 steps (6%)
21:14:21:WU01:FS01:0x22:Completed 70000 out of 1000000 steps (7%)
21:16:08:WU01:FS01:0x22:Completed 80000 out of 1000000 steps (8%)
21:17:55:WU01:FS01:0x22:Completed 90000 out of 1000000 steps (9%)
21:19:01:WU01:FS01:0x22:ERROR:exception: clWaitForEvents
21:19:01:WU01:FS01:0x22:Saving result file ..\logfile_01.txt
21:19:01:WU01:FS01:0x22:Saving result file checkpointState.xml
21:19:02:WU01:FS01:0x22:Saving result file checkpt.crc
21:19:02:WU01:FS01:0x22:Saving result file positions.xtc
21:19:02:WU01:FS01:0x22:Saving result file science.log
21:19:02:WU01:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
21:19:03:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
21:19:03:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:11781 run:0 clone:2453 gen:20 core:0x22 unit:0x000000230d5a98395e73c518cd794122
21:19:03:WU01:FS01:Uploading 42.88MiB to 13.90.152.57
21:19:03:WU01:FS01:Connecting to 13.90.152.57:8080
21:19:03:WU02:FS01:Connecting to 65.254.110.245:8080


21:42:56:WU02:FS01:0x22:Completed 110000 out of 1000000 steps (11%)
21:44:43:WU02:FS01:0x22:Completed 120000 out of 1000000 steps (12%)
Joe_H
Site Admin
Posts: 7868
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Faulty Work Units

Post by Joe_H »

I am seeing errors related to OpenCL not working right. As you are on Windows, could you check whether Windows Update recently applied a video driver update? The drivers supplied through MS Windows Update leave out OpenCL support, and can leave you with either no OpenCL support or a version that is incompatible with the driver just updated.

Also could you post the first 200 lines of the log as mentioned above. That gives hardware, software and client configuration information that is useful for figuring out what is going on.
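
To make the two failure patterns in the posted log easier to tell apart, here is a minimal Python sketch that counts them per work unit. It assumes the log is available as plain text in the same format as the excerpts above; the function name `count_failures` and the category labels are made up for this illustration and are not part of the FAH client.

```python
import re
from collections import Counter

# Failure strings copied from the log excerpts in this thread.
# The labels on the right are this sketch's own wording (an assumption).
SIGNATURES = {
    "clWaitForEvents": "OpenCL exception",
    "Particle coordinate is nan": "bad state (NaN coordinates)",
    "Max Retries Reached": "gave up after repeated bad states",
}

# Matches core output lines such as
# "20:47:18:WU01:FS01:0x22:ERROR:exception: clWaitForEvents"
CORE_LINE = re.compile(
    r"^\d{2}:\d{2}:\d{2}:(WU\d+):(FS\d+):0x[0-9a-fA-F]+:(.*)$"
)


def count_failures(log_text):
    """Count known failure signatures per work-unit queue entry (e.g. 'WU01')."""
    counts = Counter()
    for line in log_text.splitlines():
        match = CORE_LINE.match(line.strip())
        if not match:
            continue  # client-level lines (downloads, uploads) are skipped
        wu, _slot, message = match.groups()
        for needle, label in SIGNATURES.items():
            if needle in message:
                counts[(wu, label)] += 1
    return counts
```

Run against the log above, this would show whether a given unit died on an OpenCL exception or on repeated NaN bad states, which may help separate driver problems from unstable trajectories.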

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
toTOW
Site Moderator
Posts: 6309
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France
Contact:

Re: Faulty Work Units

Post by toTOW »

Project: 11781 (Run 0, Clone 8812, Gen 23)
Failed 5 times on 5 different donors. It's a bad WU, and a WU is automatically pulled out of the system after 5 errors. There's no Gen 24 for it.

Project: 11764 (Run 0, Clone 6785, Gen 18)
This trajectory seems unstable, but it has been completed successfully by someone else. Gen 19 has been completed too ...

Project: 11781 (Run 0, Clone 2453, Gen 20)
Failed twice before being completed. Gen 21 not yet completed ...

All these WUs have one thing in common: they are all from the biggest project (182,699 atoms) of the COVID-19 studies. Is it pushing some cards to their limits, hence the increased failure rate?
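
For anyone who wants to collect the Project/Run/Clone/Gen values of failed units from their own log before reporting them, here is a minimal Python sketch. It assumes the "Sending unit results" line format seen in the log posted earlier in this thread; the function name `faulty_units` is made up for this example.

```python
import re

# Matches the client's "Sending unit results" lines as they appear in the
# log posted in this thread (other client versions may format them differently).
RESULT_LINE = re.compile(
    r"Sending unit results: id:\d+ state:\w+ error:(?P<error>\w+) "
    r"project:(?P<project>\d+) run:(?P<run>\d+) "
    r"clone:(?P<clone>\d+) gen:(?P<gen>\d+)"
)


def faulty_units(log_text):
    """Return (project, run, clone, gen) for every unit reported as FAULTY."""
    units = []
    for match in RESULT_LINE.finditer(log_text):
        if match.group("error") == "FAULTY":
            units.append(
                tuple(int(match.group(key))
                      for key in ("project", "run", "clone", "gen"))
            )
    return units
```

The resulting tuples are the same identifiers quoted in this post, so they can be pasted into a report as they are.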

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.