Core_22 Problem 14300 Stuck in Core Outdated

Moderators: Site Moderators, FAHC Science Team

Nuitari
Posts: 80
Joined: Sun Jun 09, 2019 4:03 am
Hardware configuration: 1x Nvidia 1050ti
1x Nvidia 1660Super
1x Nvidia GTX 660
1x Nvidia 1060 3gb
1x AMD rx570
2x AMD rx560
1x AMD Ryzen 7 PRO 1700
1x AMD Ryzen 7 3700X
1x AMD Phenom II
1x AMD A8-9600
1x Intel i5-4590S

Project 13400 3 bad WUs

Post by Nuitari »

Got 3 bad WUs for Project 13400, spread across 2 rigs:

project:13400 run:137 clone:46 gen:0
project:13400 run:38 clone:61 gen:0
project:13400 run:15 clone:20 gen:0

Relevant log snippets:
First rig

02:07:25:WU00:FS01:0x22:ERROR:Discrepancy: Forces are blowing up! 0 0
02:07:25:WU00:FS01:0x22:Saving result file ../logfile_01.txt
02:07:25:WU00:FS01:0x22:Saving result file science.log
02:07:30:WU00:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
02:07:30:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
02:07:30:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:13400 run:137 clone:46 gen:0 core:0x22 unit:0x0000000312bc7d9a5ea38da61f4d1e7f
02:07:30:WU00:FS01:Uploading 2.33KiB to 18.188.125.154
02:07:30:WU00:FS01:Connecting to 18.188.125.154:8080

02:08:08:WU03:FS01:0x22:ERROR:Discrepancy: Forces are blowing up! 0 0
02:08:08:WU03:FS01:0x22:Saving result file ../logfile_01.txt
02:08:08:WU03:FS01:0x22:Saving result file science.log
02:08:08:WU03:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
02:08:09:WARNING:WU03:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
02:08:09:WU03:FS01:Sending unit results: id:03 state:SEND error:FAULTY project:13400 run:38 clone:61 gen:0 core:0x22 unit:0x0000000012bc7d9a5ea38da706fce4a5

Second rig

02:09:24:WU03:FS04:0x22:Completed 0 out of 2000000 steps (0%)
02:09:24:WU03:FS04:0x22:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
02:13:21:WU03:FS04:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
02:13:21:WU03:FS04:0x22:Following exception occured: Particle coordinate is nan
02:14:32:WU03:FS04:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
02:14:32:WU03:FS04:0x22:Following exception occured: Particle coordinate is nan
02:15:32:WU03:FS04:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
02:15:32:WU03:FS04:0x22:Following exception occured: Particle coordinate is nan
02:15:32:WU03:FS04:0x22:ERROR:114: Max Retries Reached
02:15:32:WU03:FS04:0x22:Saving result file ../logfile_01.txt
02:15:32:WU03:FS04:0x22:Saving result file badstate-0.xml
02:15:33:WU03:FS04:0x22:Saving result file badstate-1.xml
02:15:33:WU03:FS04:0x22:Saving result file badstate-2.xml
02:15:34:WU03:FS04:0x22:Saving result file checkpt.crc
02:15:34:WU03:FS04:0x22:Saving result file globals.csv
02:15:34:WU03:FS04:0x22:Saving result file science.log
02:15:34:WU03:FS04:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
02:15:35:WARNING:WU03:FS04:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
02:15:35:WU03:FS04:Sending unit results: id:03 state:SEND error:FAULTY project:13400 run:15 clone:20 gen:0 core:0x22 unit:0x0000000112bc7d9a5ea36f7519413d2d
02:15:36:WU03:FS04:Uploading 2.94MiB to 18.188.125.154
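
For anyone collecting the same details from their own logs, here is a minimal sketch that pulls the project/run/clone/gen values out of the `error:FAULTY` lines. The regular expression and the `faulty_units` helper are illustrative assumptions based only on the line format visible in the snippets above; they are not part of FAHClient, and other client versions may format these lines differently.

```python
import re

# Field order is taken from the "Sending unit results" lines in this thread;
# this is an assumption about the log format, not a documented interface.
FAULTY_RE = re.compile(
    r"error:FAULTY project:(\d+) run:(\d+) clone:(\d+) gen:(\d+)"
)


def faulty_units(log_text):
    """Return unique (project, run, clone, gen) tuples marked FAULTY, in log order."""
    seen = []
    for match in FAULTY_RE.finditer(log_text):
        prcg = tuple(int(group) for group in match.groups())
        if prcg not in seen:
            seen.append(prcg)
    return seen


# Three lines copied from the rig 1 log; only two of them report a FAULTY unit.
SAMPLE = """\
02:07:30:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:13400 run:137 clone:46 gen:0 core:0x22 unit:0x0000000312bc7d9a5ea38da61f4d1e7f
02:07:39:WU03:FS01:Received Unit: id:03 state:DOWNLOAD error:NO_ERROR project:13400 run:38 clone:61 gen:0 core:0x22 unit:0x0000000012bc7d9a5ea38da706fce4a5
02:08:09:WU03:FS01:Sending unit results: id:03 state:SEND error:FAULTY project:13400 run:38 clone:61 gen:0 core:0x22 unit:0x0000000012bc7d9a5ea38da706fce4a5
"""

for project, run, clone, gen in faulty_units(SAMPLE):
    print(f"project:{project} run:{run} clone:{clone} gen:{gen}")
```

The `NO_ERROR` download line is skipped on purpose, so a unit only shows up once it has actually been returned as faulty.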

Rig 1:

*********************** Log Started 2020-04-25T01:17:40Z ***********************
01:17:40:****************************** FAHClient ******************************
01:17:40:        Version: 7.6.11
01:17:40:         Author: Joseph Coffland <joseph@cauldrondevelopment.com>
01:17:40:      Copyright: 2020 foldingathome.org
01:17:40:       Homepage: https://foldingathome.org/
01:17:40:           Date: Apr 23 2020
01:17:40:           Time: 19:19:59
01:17:40:       Revision: e87ca3fff7450346da217202d837a0c66491ef19
01:17:40:         Branch: master
01:17:40:       Compiler: GNU 8.3.0
01:17:40:        Options: -std=c++11 -ffunction-sections -fdata-sections -O3
01:17:40:                 -funroll-loops -fno-pie
01:17:40:       Platform: linux2 4.19.0-5-amd64
01:17:40:           Bits: 64
01:17:40:           Mode: Release
01:17:40:         Config: /root/fahclient_aurora/config.xml
01:17:40:******************************** CBang ********************************
01:17:40:           Date: Apr 17 2020
01:17:40:           Time: 18:10:13
01:17:40:       Revision: 2fb0be7809c5e45287a122ca5fbc15b5ae859a3b
01:17:40:         Branch: master
01:17:40:       Compiler: GNU 8.3.0
01:17:40:        Options: -std=c++11 -ffunction-sections -fdata-sections -O3
01:17:40:                 -funroll-loops -fno-pie -fPIC
01:17:40:       Platform: linux2 4.19.0-5-amd64
01:17:40:           Bits: 64
01:17:40:           Mode: Release
01:17:40:******************************* System ********************************
01:17:40:            CPU: AMD Phenom(tm) II X4 925 Processor
01:17:40:         CPU ID: AuthenticAMD Family 16 Model 4 Stepping 2
01:17:40:           CPUs: 4
01:17:40:         Memory: 23.50GiB
01:17:40:    Free Memory: 22.67GiB
01:17:40:        Threads: POSIX_THREADS
01:17:40:     OS Version: 4.15
01:17:40:    Has Battery: false
01:17:40:     On Battery: false
01:17:40:     UTC Offset: -4
01:17:40:            PID: 1486
01:17:40:            CWD: /root/fahclient_aurora
01:17:40:             OS: Linux 4.15.0-43-generic x86_64
01:17:40:        OS Arch: AMD64
01:17:40:           GPUs: 5
01:17:40:          GPU 0: Bus:1 Slot:0 Func:0 AMD:5 Ellesmere XT [Radeon RX
01:17:40:                 470/480/570/580/590]
01:17:40:          GPU 1: Bus:6 Slot:0 Func:0 AMD:5 Baffin [Polaris11]
01:17:40:          GPU 2: Bus:7 Slot:0 Func:0 AMD:5 Baffin XT [Radeon RX 460]
01:17:40:          GPU 3: Bus:8 Slot:0 Func:0 AMD:5 Ellesmere XT [Radeon RX
01:17:40:                 470/480/570/580/590]
01:17:40:          GPU 4: Bus:9 Slot:0 Func:0 AMD:5 Ellesmere XT [Radeon RX
01:17:40:                 470/480/570/580/590]
01:17:40:           CUDA: Not detected: Failed to open dynamic library 'libcuda.so':
01:17:40:                 libcuda.so: cannot open shared object file: No such file or
01:17:40:                 directory
01:17:40:OpenCL Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:1.2 Driver:2527.3
01:17:40:OpenCL Device 1: Platform:0 Device:1 Bus:6 Slot:0 Compute:1.2 Driver:2527.3
01:17:40:OpenCL Device 2: Platform:0 Device:2 Bus:7 Slot:0 Compute:1.2 Driver:2527.3
01:17:40:OpenCL Device 3: Platform:0 Device:3 Bus:8 Slot:0 Compute:1.2 Driver:2527.3
01:17:40:OpenCL Device 4: Platform:0 Device:4 Bus:9 Slot:0 Compute:1.2 Driver:2527.3
01:17:40:******************************* libFAH ********************************
01:17:40:           Date: Apr 15 2020
01:17:40:           Time: 21:43:24
01:17:40:       Revision: 216968bc7025029c841ed6e36e81a03a316890d3
01:17:40:         Branch: master
01:17:40:       Compiler: GNU 8.3.0
01:17:40:        Options: -std=c++11 -ffunction-sections -fdata-sections -O3
01:17:40:                 -funroll-loops -fno-pie
01:17:40:       Platform: linux2 4.19.0-5-amd64
01:17:40:           Bits: 64
01:17:40:           Mode: Release
01:17:40:***********************************************************************
---SNIP---
02:07:02:WU00:FS01:Started FahCore on PID 3996
02:07:02:WU00:FS01:Core PID:4000
02:07:02:WU00:FS01:FahCore 0x22 started
02:07:03:WU00:FS01:0x22:*********************** Log Started 2020-04-25T02:07:02Z ***********************
02:07:03:WU00:FS01:0x22:*************************** Core22 Folding@home Core ***************************
02:07:03:WU00:FS01:0x22:       Type: 0x22
02:07:03:WU00:FS01:0x22:       Core: Core22
02:07:03:WU00:FS01:0x22:    Website: https://foldingathome.org/
02:07:03:WU00:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
02:07:03:WU00:FS01:0x22:     Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
02:07:03:WU00:FS01:0x22:             <rafal.wiewiora@choderalab.org>
02:07:03:WU00:FS01:0x22:       Args: -dir 00 -suffix 01 -version 706 -lifeline 3996 -checkpoint 15
02:07:03:WU00:FS01:0x22:             -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
02:07:03:WU00:FS01:0x22:     Config: <none>
02:07:03:WU00:FS01:0x22:************************************ Build *************************************
02:07:03:WU00:FS01:0x22:    Version: 0.0.5
02:07:03:WU00:FS01:0x22:       Date: Apr 22 2020
02:07:03:WU00:FS01:0x22:       Time: 03:57:11
02:07:03:WU00:FS01:0x22: Repository: Git
02:07:03:WU00:FS01:0x22:   Revision: 2d69202c898bd9bb3e093f51cd32bf411c2a0388
02:07:03:WU00:FS01:0x22:     Branch: HEAD
02:07:03:WU00:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
02:07:03:WU00:FS01:0x22:    Options: -std=c++11 -O3 -funroll-loops
02:07:03:WU00:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
02:07:03:WU00:FS01:0x22:       Bits: 64
02:07:03:WU00:FS01:0x22:       Mode: Release
02:07:03:WU00:FS01:0x22:************************************ System ************************************
02:07:03:WU00:FS01:0x22:        CPU: AMD Phenom(tm) II X4 925 Processor
02:07:03:WU00:FS01:0x22:     CPU ID: AuthenticAMD Family 16 Model 4 Stepping 2
02:07:03:WU00:FS01:0x22:       CPUs: 4
02:07:03:WU00:FS01:0x22:     Memory: 23.50GiB
02:07:03:WU00:FS01:0x22:Free Memory: 18.64GiB
02:07:03:WU00:FS01:0x22:    Threads: POSIX_THREADS
02:07:03:WU00:FS01:0x22: OS Version: 4.15
02:07:03:WU00:FS01:0x22:Has Battery: false
02:07:03:WU00:FS01:0x22: On Battery: false
02:07:03:WU00:FS01:0x22: UTC Offset: -4
02:07:03:WU00:FS01:0x22:        PID: 4000
02:07:03:WU00:FS01:0x22:        CWD: /root/fahclient_aurora/work
02:07:03:WU00:FS01:0x22:         OS: Linux 4.15.0-43-generic x86_64
02:07:03:WU00:FS01:0x22:    OS Arch: AMD64
02:07:03:WU00:FS01:0x22:********************************************************************************
02:07:03:WU00:FS01:0x22:Project: 13400 (Run 137, Clone 46, Gen 0)
02:07:03:WU00:FS01:0x22:Unit: 0x0000000312bc7d9a5ea38da61f4d1e7f
02:07:03:WU00:FS01:0x22:Reading tar file core.xml
02:07:03:WU00:FS01:0x22:Reading tar file integrator.xml
02:07:03:WU00:FS01:0x22:Reading tar file state.xml
02:07:03:WU00:FS01:0x22:Reading tar file system.xml
02:07:03:WU00:FS01:0x22:Digital signatures verified
02:07:03:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
02:07:03:WU00:FS01:0x22:Version 0.0.5
02:07:25:WU00:FS01:0x22:ERROR:Discrepancy: Forces are blowing up! 0 0
02:07:25:WU00:FS01:0x22:Saving result file ../logfile_01.txt
02:07:25:WU00:FS01:0x22:Saving result file science.log
02:07:30:WU00:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
02:07:30:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
02:07:30:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:13400 run:137 clone:46 gen:0 core:0x22 unit:0x0000000312bc7d9a5ea38da61f4d1e7f
02:07:30:WU00:FS01:Uploading 2.33KiB to 18.188.125.154
02:07:30:WU00:FS01:Connecting to 18.188.125.154:8080
02:07:30:WU03:FS01:Connecting to 65.254.110.245:80
02:07:30:WU00:FS01:Upload complete
02:07:33:WU00:FS01:Server responded WORK_ACK (400)
02:07:33:WU00:FS01:Cleaning up
02:07:34:WU03:FS01:Assigned to work server 18.188.125.154
02:07:34:WU03:FS01:Requesting new work unit for slot 01: READY gpu:0:Ellesmere XT [Radeon RX 470/480/570/580/590] from 18.188.125.154
02:07:34:WU03:FS01:Connecting to 18.188.125.154:8080
02:07:37:WU03:FS01:Downloading 9.84MiB
02:07:39:WU03:FS01:Download complete
02:07:39:WU03:FS01:Received Unit: id:03 state:DOWNLOAD error:NO_ERROR project:13400 run:38 clone:61 gen:0 core:0x22 unit:0x0000000012bc7d9a5ea38da706fce4a5
02:07:40:WU03:FS01:Starting
02:07:40:WU03:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /root/fahclient_aurora/cores/cores.foldingathome.org/v7/lin/64bit/Core_22.fah/FahCore_22 -dir 03 -suffix 01 -version 706 -lifeline 1486 -checkpoint 15 -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
02:07:40:WU03:FS01:Started FahCore on PID 4061
02:07:40:WU03:FS01:Core PID:4065
02:07:40:WU03:FS01:FahCore 0x22 started
02:07:40:WU03:FS01:0x22:*********************** Log Started 2020-04-25T02:07:40Z ***********************
02:07:40:WU03:FS01:0x22:*************************** Core22 Folding@home Core ***************************
02:07:40:WU03:FS01:0x22:       Type: 0x22
02:07:40:WU03:FS01:0x22:       Core: Core22
02:07:40:WU03:FS01:0x22:    Website: https://foldingathome.org/
02:07:40:WU03:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
02:07:40:WU03:FS01:0x22:     Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
02:07:40:WU03:FS01:0x22:             <rafal.wiewiora@choderalab.org>
02:07:40:WU03:FS01:0x22:       Args: -dir 03 -suffix 01 -version 706 -lifeline 4061 -checkpoint 15
02:07:40:WU03:FS01:0x22:             -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
02:07:40:WU03:FS01:0x22:     Config: <none>
02:07:40:WU03:FS01:0x22:************************************ Build *************************************
02:07:40:WU03:FS01:0x22:    Version: 0.0.5
02:07:40:WU03:FS01:0x22:       Date: Apr 22 2020
02:07:40:WU03:FS01:0x22:       Time: 03:57:11
02:07:40:WU03:FS01:0x22: Repository: Git
02:07:40:WU03:FS01:0x22:   Revision: 2d69202c898bd9bb3e093f51cd32bf411c2a0388
02:07:40:WU03:FS01:0x22:     Branch: HEAD
02:07:40:WU03:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
02:07:40:WU03:FS01:0x22:    Options: -std=c++11 -O3 -funroll-loops
02:07:40:WU03:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
02:07:40:WU03:FS01:0x22:       Bits: 64
02:07:40:WU03:FS01:0x22:       Mode: Release
02:07:40:WU03:FS01:0x22:************************************ System ************************************
02:07:40:WU03:FS01:0x22:        CPU: AMD Phenom(tm) II X4 925 Processor
02:07:40:WU03:FS01:0x22:     CPU ID: AuthenticAMD Family 16 Model 4 Stepping 2
02:07:40:WU03:FS01:0x22:       CPUs: 4
02:07:40:WU03:FS01:0x22:     Memory: 23.50GiB
02:07:40:WU03:FS01:0x22:Free Memory: 20.70GiB
02:07:40:WU03:FS01:0x22:    Threads: POSIX_THREADS
02:07:40:WU03:FS01:0x22: OS Version: 4.15
02:07:40:WU03:FS01:0x22:Has Battery: false
02:07:40:WU03:FS01:0x22: On Battery: false
02:07:40:WU03:FS01:0x22: UTC Offset: -4
02:07:40:WU03:FS01:0x22:        PID: 4065
02:07:40:WU03:FS01:0x22:        CWD: /root/fahclient_aurora/work
02:07:40:WU03:FS01:0x22:         OS: Linux 4.15.0-43-generic x86_64
02:07:40:WU03:FS01:0x22:    OS Arch: AMD64
02:07:40:WU03:FS01:0x22:********************************************************************************
02:07:40:WU03:FS01:0x22:Project: 13400 (Run 38, Clone 61, Gen 0)
02:07:40:WU03:FS01:0x22:Unit: 0x0000000012bc7d9a5ea38da706fce4a5
02:07:40:WU03:FS01:0x22:Reading tar file core.xml
02:07:40:WU03:FS01:0x22:Reading tar file integrator.xml
02:07:40:WU03:FS01:0x22:Reading tar file state.xml
02:07:40:WU03:FS01:0x22:Reading tar file system.xml
02:07:41:WU03:FS01:0x22:Digital signatures verified
02:07:41:WU03:FS01:0x22:Folding@home GPU Core22 Folding@home Core
02:07:41:WU03:FS01:0x22:Version 0.0.5
02:08:08:WU03:FS01:0x22:ERROR:Discrepancy: Forces are blowing up! 0 0
02:08:08:WU03:FS01:0x22:Saving result file ../logfile_01.txt
02:08:08:WU03:FS01:0x22:Saving result file science.log
02:08:08:WU03:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
02:08:09:WARNING:WU03:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
02:08:09:WU03:FS01:Sending unit results: id:03 state:SEND error:FAULTY project:13400 run:38 clone:61 gen:0 core:0x22 unit:0x0000000012bc7d9a5ea38da706fce4a5
02:08:09:WU03:FS01:Uploading 2.33KiB to 18.188.125.154
02:08:09:WU03:FS01:Connecting to 18.188.125.154:8080
02:08:10:WU03:FS01:Upload complete
02:08:10:WU00:FS01:Connecting to 65.254.110.245:80
02:08:10:WU03:FS01:Server responded WORK_ACK (400)

Rig 2:

*********************** Log Started 2020-04-24T17:57:43Z ***********************
17:57:43:****************************** FAHClient ******************************
17:57:43:        Version: 7.6.11
17:57:43:         Author: Joseph Coffland <joseph@cauldrondevelopment.com>
17:57:43:      Copyright: 2020 foldingathome.org
17:57:43:       Homepage: https://foldingathome.org/
17:57:43:           Date: Apr 23 2020
17:57:43:           Time: 19:19:59
17:57:43:       Revision: e87ca3fff7450346da217202d837a0c66491ef19
17:57:43:         Branch: master
17:57:43:       Compiler: GNU 8.3.0
17:57:43:        Options: -std=c++11 -ffunction-sections -fdata-sections -O3
17:57:43:                 -funroll-loops -fno-pie
17:57:43:       Platform: linux2 4.19.0-5-amd64
17:57:43:           Bits: 64
17:57:43:           Mode: Release
17:57:43:         Config: /root/fahclient_saruman/config.xml
17:57:43:******************************** CBang ********************************
17:57:43:           Date: Apr 17 2020
17:57:43:           Time: 18:10:13
17:57:43:       Revision: 2fb0be7809c5e45287a122ca5fbc15b5ae859a3b
17:57:43:         Branch: master
17:57:43:       Compiler: GNU 8.3.0
17:57:43:        Options: -std=c++11 -ffunction-sections -fdata-sections -O3
17:57:43:                 -funroll-loops -fno-pie -fPIC
17:57:43:       Platform: linux2 4.19.0-5-amd64
17:57:43:           Bits: 64
17:57:43:           Mode: Release
17:57:43:******************************* System ********************************
17:57:43:            CPU: AMD A8-9600 RADEON R7, 10 COMPUTE CORES 4C+6G
17:57:43:         CPU ID: AuthenticAMD Family 21 Model 101 Stepping 1
17:57:43:           CPUs: 4
17:57:43:         Memory: 6.74GiB
17:57:43:    Free Memory: 5.24GiB
17:57:43:        Threads: POSIX_THREADS
17:57:43:     OS Version: 4.13
17:57:43:    Has Battery: false
17:57:43:     On Battery: false
17:57:43:     UTC Offset: -4
17:57:43:            PID: 13565
17:57:43:            CWD: /root/fahclient_saruman
17:57:43:             OS: Linux 4.13.0-26-generic x86_64
17:57:43:        OS Arch: AMD64
17:57:43:           GPUs: 7
17:57:43:          GPU 0: Bus:0 Slot:1 Func:0 AMD:5 Carrizo [Radeon R7/R6/R5 Series]
17:57:43:          GPU 1: Bus:7 Slot:0 Func:0 AMD:5 Ellesmere XT [Radeon RX
17:57:43:                 470/480/570/580/590]
17:57:43:          GPU 2: Bus:8 Slot:0 Func:0 AMD:5 Ellesmere XT [Radeon RX
17:57:43:                 470/480/570/580/590]
17:57:43:          GPU 3: Bus:9 Slot:0 Func:0 AMD:5 Ellesmere XT [Radeon RX
17:57:43:                 470/480/570/580/590]
17:57:43:          GPU 4: Bus:10 Slot:0 Func:0 AMD:5 Ellesmere XT [Radeon RX
17:57:43:                 470/480/570/580/590]
17:57:43:          GPU 5: Bus:13 Slot:0 Func:0 AMD:5 Ellesmere XT [Radeon RX
17:57:43:                 470/480/570/580/590]
17:57:43:          GPU 6: Bus:14 Slot:0 Func:0 AMD:5 Baffin XT [Radeon RX 460]
17:57:43:           CUDA: Not detected: Failed to open dynamic library 'libcuda.so':
17:57:43:                 libcuda.so: cannot open shared object file: No such file or
17:57:43:                 directory
17:57:43:OpenCL Device 0: Platform:0 Device:0 Bus:0 Slot:1 Compute:1.2 Driver:2527.3
17:57:43:OpenCL Device 1: Platform:0 Device:1 Bus:7 Slot:0 Compute:1.2 Driver:2527.3
17:57:43:OpenCL Device 2: Platform:0 Device:2 Bus:8 Slot:0 Compute:1.2 Driver:2527.3
17:57:43:OpenCL Device 3: Platform:0 Device:3 Bus:9 Slot:0 Compute:1.2 Driver:2527.3
17:57:43:OpenCL Device 4: Platform:0 Device:4 Bus:10 Slot:0 Compute:1.2 Driver:2527.3
17:57:43:OpenCL Device 5: Platform:0 Device:5 Bus:13 Slot:0 Compute:1.2 Driver:2527.3
17:57:43:OpenCL Device 6: Platform:0 Device:6 Bus:14 Slot:0 Compute:1.2 Driver:2527.3
17:57:43:******************************* libFAH ********************************
17:57:43:           Date: Apr 15 2020
17:57:43:           Time: 21:43:24
17:57:43:       Revision: 216968bc7025029c841ed6e36e81a03a316890d3
17:57:43:         Branch: master
17:57:43:       Compiler: GNU 8.3.0
17:57:43:        Options: -std=c++11 -ffunction-sections -fdata-sections -O3
17:57:43:                 -funroll-loops -fno-pie
17:57:43:       Platform: linux2 4.19.0-5-amd64
17:57:43:           Bits: 64
17:57:43:           Mode: Release
17:57:43:***********************************************************************
---SNIP---
02:09:01:WU03:FS04:Unpacked 9.32MiB to cores/cores.foldingathome.org/v7/lin/64bit/Core_22.fah/FahCore_22
02:09:01:WU03:FS04:Starting
02:09:01:WU03:FS04:Running FahCore: /usr/bin/FAHCoreWrapper /root/fahclient_saruman/cores/cores.foldingathome.org/v7/lin/64bit/Core_22.fah/FahCore_22 -dir 03 -suffix 01 -version 706 -lifeline 13565 -checkpoint 15 -gpu-vendor amd -opencl-platform 0 -opencl-device 2 -gpu 2
02:09:01:WU03:FS04:Started FahCore on PID 19067
02:09:01:WU03:FS04:Core PID:19071
02:09:01:WU03:FS04:FahCore 0x22 started
02:09:01:WU03:FS04:0x22:*********************** Log Started 2020-04-25T02:09:01Z ***********************
02:09:01:WU03:FS04:0x22:*************************** Core22 Folding@home Core ***************************
02:09:01:WU03:FS04:0x22:       Type: 0x22
02:09:01:WU03:FS04:0x22:       Core: Core22
02:09:01:WU03:FS04:0x22:    Website: https://foldingathome.org/
02:09:01:WU03:FS04:0x22:  Copyright: (c) 2009-2018 foldingathome.org
02:09:01:WU03:FS04:0x22:     Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
02:09:01:WU03:FS04:0x22:             <rafal.wiewiora@choderalab.org>
02:09:01:WU03:FS04:0x22:       Args: -dir 03 -suffix 01 -version 706 -lifeline 19067 -checkpoint 15
02:09:01:WU03:FS04:0x22:             -gpu-vendor amd -opencl-platform 0 -opencl-device 2 -gpu 2
02:09:01:WU03:FS04:0x22:     Config: <none>
02:09:01:WU03:FS04:0x22:************************************ Build *************************************
02:09:01:WU03:FS04:0x22:    Version: 0.0.5
02:09:01:WU03:FS04:0x22:       Date: Apr 22 2020
02:09:01:WU03:FS04:0x22:       Time: 03:57:11
02:09:01:WU03:FS04:0x22: Repository: Git
02:09:01:WU03:FS04:0x22:   Revision: 2d69202c898bd9bb3e093f51cd32bf411c2a0388
02:09:01:WU03:FS04:0x22:     Branch: HEAD
02:09:01:WU03:FS04:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
02:09:01:WU03:FS04:0x22:    Options: -std=c++11 -O3 -funroll-loops
02:09:01:WU03:FS04:0x22:   Platform: linux2 4.19.76-linuxkit
02:09:01:WU03:FS04:0x22:       Bits: 64
02:09:01:WU03:FS04:0x22:       Mode: Release
02:09:01:WU03:FS04:0x22:************************************ System ************************************
02:09:01:WU03:FS04:0x22:        CPU: AMD A8-9600 RADEON R7, 10 COMPUTE CORES 4C+6G
02:09:01:WU03:FS04:0x22:     CPU ID: AuthenticAMD Family 21 Model 101 Stepping 1
02:09:01:WU03:FS04:0x22:       CPUs: 4
02:09:01:WU03:FS04:0x22:     Memory: 6.74GiB
02:09:01:WU03:FS04:0x22:Free Memory: 5.27GiB
02:09:01:WU03:FS04:0x22:    Threads: POSIX_THREADS
02:09:01:WU03:FS04:0x22: OS Version: 4.13
02:09:01:WU03:FS04:0x22:Has Battery: false
02:09:01:WU03:FS04:0x22: On Battery: false
02:09:01:WU03:FS04:0x22: UTC Offset: -4
02:09:01:WU03:FS04:0x22:        PID: 19071
02:09:01:WU03:FS04:0x22:        CWD: /root/fahclient_saruman/work
02:09:01:WU03:FS04:0x22:         OS: Linux 4.13.0-26-generic x86_64
02:09:01:WU03:FS04:0x22:    OS Arch: AMD64
02:09:01:WU03:FS04:0x22:********************************************************************************
02:09:01:WU03:FS04:0x22:Project: 13400 (Run 15, Clone 20, Gen 0)
02:09:01:WU03:FS04:0x22:Unit: 0x0000000112bc7d9a5ea36f7519413d2d
02:09:01:WU03:FS04:0x22:Reading tar file core.xml
02:09:01:WU03:FS04:0x22:Reading tar file integrator.xml
02:09:01:WU03:FS04:0x22:Reading tar file state.xml
02:09:01:WU03:FS04:0x22:Reading tar file system.xml
02:09:02:WU03:FS04:0x22:Digital signatures verified
02:09:02:WU03:FS04:0x22:Folding@home GPU Core22 Folding@home Core
02:09:02:WU03:FS04:0x22:Version 0.0.5
02:09:24:WU03:FS04:0x22:Completed 0 out of 2000000 steps (0%)
02:09:24:WU03:FS04:0x22:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
02:13:21:WU03:FS04:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
02:13:21:WU03:FS04:0x22:Following exception occured: Particle coordinate is nan
02:14:32:WU03:FS04:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
02:14:32:WU03:FS04:0x22:Following exception occured: Particle coordinate is nan
02:15:32:WU03:FS04:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
02:15:32:WU03:FS04:0x22:Following exception occured: Particle coordinate is nan
02:15:32:WU03:FS04:0x22:ERROR:114: Max Retries Reached
02:15:32:WU03:FS04:0x22:Saving result file ../logfile_01.txt
02:15:32:WU03:FS04:0x22:Saving result file badstate-0.xml
02:15:33:WU03:FS04:0x22:Saving result file badstate-1.xml
02:15:33:WU03:FS04:0x22:Saving result file badstate-2.xml
02:15:34:WU03:FS04:0x22:Saving result file checkpt.crc
02:15:34:WU03:FS04:0x22:Saving result file globals.csv
02:15:34:WU03:FS04:0x22:Saving result file science.log
02:15:34:WU03:FS04:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
02:15:35:WARNING:WU03:FS04:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
02:15:35:WU03:FS04:Sending unit results: id:03 state:SEND error:FAULTY project:13400 run:15 clone:20 gen:0 core:0x22 unit:0x0000000112bc7d9a5ea36f7519413d2d

No overclock anywhere.
Yes, these cards were previously used for mining. The one in rig 1 had just been converted to FAH, and the project 13400 WUs were its first, so there could be an issue there. However, one of those WUs has already been confirmed as faulty by other users.

The one in rig 2, however, has been running FAH for a while and has returned a few WUs successfully. Its WU has also been reported as faulty by another user.

In rig 1 the card is an RX 570 by Asus; in rig 2 it is an RX 570 by MSI (I can try to get a more precise SKU if needed).
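
The logs show two different failure signatures: the rig 1 units abort almost immediately with "Forces are blowing up!", while the rig 2 unit goes through three "Bad State detected" retries before ending with "Max Retries Reached". A rough sketch for telling the two apart in a log follows. The `summarize_failures` helper and its line pattern are assumptions derived from the excerpts in this post, not an official log parser.

```python
import re

# Core log lines look like "HH:MM:SS:WUxx:FSyy:0x22:message" in this thread.
# Client-level lines (e.g. "HH:MM:SS:WARNING:...") deliberately do not match.
CORE_LINE = re.compile(
    r"^\d\d:\d\d:\d\d:(WU\d+):(FS\d+):0x[0-9a-fA-F]+:(.*)$"
)


def summarize_failures(log_text):
    """Map (WU, slot) to its bad-state retry count and last ERROR message."""
    summary = {}
    for line in log_text.splitlines():
        match = CORE_LINE.match(line)
        if not match:
            continue
        wu, slot, message = match.groups()
        is_bad_state = message.startswith("Bad State detected")
        is_error = message.startswith("ERROR:")
        if not (is_bad_state or is_error):
            continue
        entry = summary.setdefault((wu, slot), {"bad_states": 0, "fatal": None})
        if is_bad_state:
            entry["bad_states"] += 1
        else:
            entry["fatal"] = message[len("ERROR:"):]
    return summary


# Lines copied from both rigs' logs and combined here purely for illustration.
SAMPLE = """\
02:07:25:WU00:FS01:0x22:ERROR:Discrepancy: Forces are blowing up! 0 0
02:07:30:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
02:13:21:WU03:FS04:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
02:14:32:WU03:FS04:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
02:15:32:WU03:FS04:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
02:15:32:WU03:FS04:0x22:ERROR:114: Max Retries Reached
"""

for (wu, slot), info in summarize_failures(SAMPLE).items():
    print(wu, slot, info["bad_states"], info["fatal"])
```

Both signatures end in the same `BAD_WORK_UNIT (114 = 0x72)` return code, so the retry count is the only thing in this summary that separates them.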
Nuitari

Re: Core_22 Problem 14300 Stuck in Core Outdated

Post by Nuitari »

I paused my other slots to get the core update and I'm now seeing the messages:

02:16:06:WARNING:FS03:Size of positions 1980 does not match topology 1958
02:18:58:WU02:FS02:0x22:Completed 365000 out of 500000 steps (73%)
02:21:53:WU05:FS03:0x22:Completed 4400000 out of 8000000 steps (55%)
02:22:00:WARNING:FS03:Size of positions 1980 does not match topology 1958
02:22:50:WARNING:FS01:Size of positions 1980 does not match topology 1958
02:25:06:WU01:FS04:0x22:Completed 80000 out of 8000000 steps (1%)
02:25:28:WU00:FS00:0xa7:Completed 127500 out of 250000 steps (51%)
02:26:00:WU02:FS02:0x22:Completed 370000 out of 500000 steps (74%)
02:27:46:WU04:FS01:0x22:Completed 1920000 out of 8000000 steps (24%)
02:27:58:WARNING:FS03:Size of positions 1980 does not match topology 1958

These folding slots have always been very stable and have always been dedicated to FAH.

PRCG (project, run, clone, gen) for the slots:
14549 (0, 111, 75)
14549 (0, 202, 70)
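
To see which slots are affected and by how much the two sizes differ, the warnings can be tallied with a short script. This is only a sketch: the `mismatch_counts` helper is hypothetical and its pattern assumes the exact warning wording shown in the log above.

```python
import re
from collections import Counter

# Pattern assumed from the warning text in this post.
MISMATCH_RE = re.compile(
    r"WARNING:(FS\d+):Size of positions (\d+) does not match topology (\d+)"
)


def mismatch_counts(log_text):
    """Count warnings per (slot, positions, topology) combination."""
    counts = Counter()
    for slot, positions, topology in MISMATCH_RE.findall(log_text):
        counts[(slot, int(positions), int(topology))] += 1
    return counts


# The warning lines from the log excerpt above, plus one progress line.
SAMPLE = """\
02:16:06:WARNING:FS03:Size of positions 1980 does not match topology 1958
02:18:58:WU02:FS02:0x22:Completed 365000 out of 500000 steps (73%)
02:22:00:WARNING:FS03:Size of positions 1980 does not match topology 1958
02:22:50:WARNING:FS01:Size of positions 1980 does not match topology 1958
02:27:58:WARNING:FS03:Size of positions 1980 does not match topology 1958
"""

for (slot, positions, topology), n in sorted(mismatch_counts(SAMPLE).items()):
    print(f"{slot}: {n} warning(s), positions - topology = {positions - topology}")
```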
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud
Contact:

Re: Core_22 Problem 14300 Stuck in Core Outdated

Post by PantherX »

Welcome to the F@H Forum, Faxon,

Unfortunately, the Project is limited to Linux only. There was a brief period when it was assigned to Windows, but it was switched back to Linux. To fix this, pause the client and delete the folder 01 here:
C:\Users\Folding Rig 2\AppData\Roaming\FAHClient\work

This will get you a WU that can fold on Windows and doesn't require the updated version of FahCore_22.

EDIT: Project is limited to OS, not FahCore_22 0.0.5
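
For those who prefer to script the cleanup, here is a cautious sketch of the folder deletion described above. The `remove_work_unit_dir` helper is hypothetical and not part of FAHClient. It assumes the client has already been paused, and it defaults to a dry run so nothing is deleted unless that is requested explicitly.

```python
import shutil
from pathlib import Path


def remove_work_unit_dir(work_dir, wu_dir_name, dry_run=True):
    """Delete one work-unit folder under the client's work directory.

    Returns the path that was (or, in a dry run, would be) removed,
    or None if no such folder exists.
    """
    target = Path(work_dir) / wu_dir_name
    if not target.is_dir():
        return None
    if not dry_run:
        shutil.rmtree(target)
    return target


if __name__ == "__main__":
    # Path taken from the post above; adjust it for your own installation.
    work = r"C:\Users\Folding Rig 2\AppData\Roaming\FAHClient\work"
    # Dry run: reports the folder that would be removed without touching it.
    print(remove_work_unit_dir(work, "01"))
```

Calling it again with `dry_run=False` performs the actual deletion, which is worth doing only after the dry run has printed the expected path.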
JohnChodera
Pande Group Member
Posts: 470
Joined: Fri Feb 22, 2013 9:59 pm

Re: Core_22 Problem 14300 Stuck in Core Outdated

Post by JohnChodera »

We're aware of the "Size of positions 1980 does not match topology 1958" warning and will attempt to fix it in 0.0.6! It just affects the viewer.

core22 has been rolled out for Windows and Linux, but 13400/13401 are Linux-only because the CustomIntegrator workload we use for those projects seems to have performance issues on Windows.

Apologies for the oddities here, and thanks for sticking with it while we iron out the issues!
JohnChodera

Re: Project 13400 3 bad WUs

Post by JohnChodera »

Thanks for the thorough report, especially the (run,clone,frame) tuples! We're trying to figure out what is going on here, and whether it was those RUNs that were problematic or if we have some sort of interesting new bug this project has only just uncovered.
We're seeing a ~7% failure rate on 13400, which may be due to our preparation procedure for this new project, or may be something else.
We'll test locally.
Nuitari

Re: Core_22 Problem 14300 Stuck in Core Outdated

Post by Nuitari »

No worries about the oddities, I've been working in IT for way too long.

A new warning popped up:
04:07:35:WARNING:FS02:Size of positions 50726 does not match topology 50500
Nuitari

Re: Project 13400 3 bad WUs

Post by Nuitari »

Do you need more information?
Is there a better place to track this?

My Nvidia GTX 1660 Super started working on 13400 (109, 65, 0) and so far hasn't crashed out. The WU status shows that 3 people previously reported it as faulty.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project 13400 3 bad WUs

Post by bruce »

This problem has been resolved. The WUs were not, in fact, bad. They were moved from Beta testing to ADVANCED testing a few minutes before the required updated FAHCore was installed on the server. It should have happened in the opposite order, of course. If you still have not resolved this issue, simply pause the WU and then allow it to continue. The updated FAHCore will download and everything will be fine.
Nuitari

Re: Project 13400 3 bad WUs

Post by Nuitari »

bruce wrote: This problem has been resolved. The WUs were not, in fact, bad. They were moved from Beta testing to ADVANCED testing a few minutes before the required updated FAHCore was installed on the server. It should have happened in the opposite order, of course. If you still have not resolved this issue, simply pause the WU and then allow it to continue. The updated FAHCore will download and everything will be fine.
No, these 3 WUs were folded with the 0.0.5 core and failed with the 0.0.5 core.
bruce

Re: Core_22 Problem 14300 Stuck in Core Outdated

Post by bruce »

Really? What were the Run/Clone/Gen numbers?
Nuitari

Re: Core_22 Problem 14300 Stuck in Core Outdated

Post by Nuitari »

bruce wrote: Really? What were the Run/Clone/Gen numbers?
project:13400 run:137 clone:46 gen:0
project:13400 run:38 clone:61 gen:0
project:13400 run:15 clone:20 gen:0
JohnChodera

Re: Core_22 Problem 14300 Stuck in Core Outdated

Post by JohnChodera »

I'm looking into this! We aren't quite sure if it's the initial conditions, the new nonequilibrium integrator (which uses capabilities of OpenMM we haven't used on FAH before), or perhaps something in the core.
Failure rates are low enough to get useful data for prioritizing compound synthesis, so I hope folks can bear with us for a few more days given the circumstances. We'll smooth out the sharp edges soon.
Thanks!
Nuitari

Re: Core_22 Problem 14300 Stuck in Core Outdated

Post by Nuitari »

JohnChodera wrote: I'm looking into this! We aren't quite sure if it's the initial conditions, the new nonequilibrium integrator (which uses capabilities of OpenMM we haven't used on FAH before), or perhaps something in the core.
Failure rates are low enough to get useful data for prioritizing compound synthesis, so I hope folks can bear with us for a few more days given the circumstances. We'll smooth out the sharp edges soon.
Thanks!
Not sure if there is any debugging interest, but project:13400 run:109 clone:65 gen:0 ran successfully on an NVIDIA 1660 Super.