Watchdog shutdowns on 16926 WUs


ShootThePicture
Posts: 10
Joined: Thu Sep 24, 2020 4:39 am

Watchdog shutdowns on 16926 WUs

Post by ShootThePicture »

Since last night (11/29/20), I've been getting WUs from project 16926 that trigger a "Watchdog soft shutdown", then a "hard shutdown", then "WU_STALLED". After several more attempts, the client just returns the WU incomplete. I noticed other people are having some issues with this project, but I didn't see this specific problem anywhere. All my computers are Macs. The computer that the logs are from is running macOS 10.14.6.

Code:

*********************** Log Started 2020-11-25T22:01:35Z ***********************
22:01:35:Trying to access database...
22:01:35:Successfully acquired database lock
22:01:35:Read GPUs.txt
22:01:35:Enabled folding slot 00: READY cpu:7
22:01:35:****************************** FAHClient ******************************
22:01:35:    Version: 7.6.13
22:01:35:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
22:01:35:  Copyright: 2020 foldingathome.org
22:01:35:   Homepage: https://foldingathome.org/
22:01:35:       Date: Apr 27 2020
22:01:35:       Time: 21:20:45
22:01:35:   Revision: 5a652817f46116b6e135503af97f18e094414e3b
22:01:35:     Branch: master
22:01:35:   Compiler: GNU 4.2.1 Compatible Apple LLVM 11.0.0 (clang-1100.0.33.8)
22:01:35:    Options: -std=c++11 -O3 -funroll-loops -mmacosx-version-min=10.7
22:01:35:             -Wno-unused-local-typedefs -stdlib=libc++
22:01:35:   Platform: darwin 19.2.0
22:01:35:       Bits: 64
22:01:35:       Mode: Release
22:01:35:     Config: /Library/Application Support/FAHClient/config.xml
22:01:35:******************************** CBang ********************************
22:01:35:       Date: Apr 24 2020
22:01:35:       Time: 17:07:50
22:01:35:   Revision: ea081a3b3b0f4a37c4d0440b4f1bc184197c7797
22:01:35:     Branch: master
22:01:35:   Compiler: GNU 4.2.1 Compatible Apple LLVM 11.0.0 (clang-1100.0.33.8)
22:01:35:    Options: -std=c++11 -O3 -funroll-loops -mmacosx-version-min=10.7
22:01:35:             -Wno-unused-local-typedefs -stdlib=libc++ -fPIC
22:01:35:   Platform: darwin 19.2.0
22:01:35:       Bits: 64
22:01:35:       Mode: Release
22:01:35:******************************* System ********************************
22:01:35:        CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
22:01:35:     CPU ID: GenuineIntel Family 6 Model 60 Stepping 3
22:01:35:       CPUs: 8
22:01:35:     Memory: 32.00GiB
22:01:35:Free Memory: 9.36GiB
22:01:35:    Threads: POSIX_THREADS
22:01:35: OS Version: 10.14
22:01:35:Has Battery: false
22:01:35: On Battery: false
22:01:35: UTC Offset: -8
22:01:35:        PID: 5624
22:01:35:        CWD: /Library/Application Support/FAHClient
22:01:35:         OS: Darwin 18.7.0 x86_64
22:01:35:    OS Arch: AMD64
22:01:35:       GPUs: 1
22:01:35:      GPU 0: Bus:1 Slot:0 Func:0 AMD:5 Amethyst XT [Radeon R9 M295X]
22:01:35:       CUDA: Not detected: Failed to open dynamic library 'libcuda.dylib':
22:01:35:             dlopen(libcuda.dylib, 1): image not found
22:01:35:     OpenCL: Not detected: Failed to open dynamic library 'libOpenCL.dylib':
22:01:35:             dlopen(libOpenCL.dylib, 1): image not found
22:01:35:******************************* libFAH ********************************
22:01:35:       Date: Apr 15 2020
22:01:35:       Time: 14:43:28
22:01:35:   Revision: 216968bc7025029c841ed6e36e81a03a316890d3
22:01:35:     Branch: master
22:01:35:   Compiler: GNU 4.2.1 Compatible Apple LLVM 11.0.0 (clang-1100.0.33.8)
22:01:35:    Options: -std=c++11 -O3 -funroll-loops -mmacosx-version-min=10.7
22:01:35:             -Wno-unused-local-typedefs -stdlib=libc++
22:01:35:   Platform: darwin 19.2.0
22:01:35:       Bits: 64
22:01:35:       Mode: Release
22:01:35:***********************************************************************
22:01:35:<config>
22:01:35:  <!-- Network -->
22:01:35:  <proxy v=':8080'/>
22:01:35:
22:01:35:  <!-- User Information -->
22:01:35:  <user v='JumpRaven'/>
22:01:35:
22:01:35:  <!-- Folding Slots -->
22:01:35:  <slot id='0' type='CPU'/>
22:01:35:</config>

Code:

17:15:04:WU01:FS00:Connecting to assign1.foldingathome.org:80
17:15:04:WU01:FS00:Assigned to work server 129.32.209.204
17:15:04:WU01:FS00:Requesting new work unit for slot 00: RUNNING cpu:7 from 129.32.209.204
17:15:04:WU01:FS00:Connecting to 129.32.209.204:8080
17:15:05:WU01:FS00:Downloading 49.00KiB
17:15:05:WU01:FS00:Download complete
17:15:05:WU01:FS00:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:16926 run:97 clone:534 gen:1 core:0xa8 unit:0x000000048120d1cc5fbd37bf634e817c
17:17:52:WU00:FS00:0xa8:Completed 500000 out of 500000 steps (100%)
17:17:54:WU00:FS00:0xa8:Saving result file ../logfile_01.txt
17:17:54:WU00:FS00:0xa8:Saving result file frame61.gro
17:17:54:WU00:FS00:0xa8:Saving result file frame61.xtc
17:17:54:WU00:FS00:0xa8:Saving result file md.log
17:17:54:WU00:FS00:0xa8:Saving result file science.log
17:17:54:WU00:FS00:0xa8:Saving result file state.cpt
17:17:54:WU00:FS00:0xa8:Folding@home Core Shutdown: FINISHED_UNIT
17:17:54:WU00:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
17:17:54:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:16812 run:4 clone:134 gen:61 core:0xa8 unit:0x00000046b2aec48a5f74f1b146e98ca6
17:17:54:WU00:FS00:Uploading 8.34MiB to 178.174.196.138
17:17:54:WU01:FS00:Starting
17:17:54:WU00:FS00:Connecting to 178.174.196.138:8080
17:17:54:WU01:FS00:Running FahCore: /usr/local/bin/FAHCoreWrapper "/Library/Application Support/FAHClient/cores/cores.foldingathome.org/osx/64bit-avx2-256/a8-0.0.9/Core_a8.fah/FahCore_a8" -dir 01 -suffix 01 -version 706 -lifeline 5624 -checkpoint 15 -np 7
17:17:54:WU01:FS00:Started FahCore on PID 22437
17:17:54:WU01:FS00:Core PID:22438
17:17:54:WU01:FS00:FahCore 0xa8 started
17:17:55:WU01:FS00:0xa8:*********************** Log Started 2020-11-30T17:17:54Z ***********************
17:17:55:WU01:FS00:0xa8:************************** Gromacs Folding@home Core ***************************
17:17:55:WU01:FS00:0xa8:       Core: Gromacs
17:17:55:WU01:FS00:0xa8:       Type: 0xa8
17:17:55:WU01:FS00:0xa8:    Version: 0.0.9
17:17:55:WU01:FS00:0xa8:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
17:17:55:WU01:FS00:0xa8:  Copyright: 2020 foldingathome.org
17:17:55:WU01:FS00:0xa8:   Homepage: https://foldingathome.org/
17:17:55:WU01:FS00:0xa8:       Date: Oct 29 2020
17:17:55:WU01:FS00:0xa8:       Time: 13:33:44
17:17:55:WU01:FS00:0xa8:   Revision: a2332c71664fb4eb279de78f77799f38b2fc0696
17:17:55:WU01:FS00:0xa8:     Branch: master
17:17:55:WU01:FS00:0xa8:   Compiler: GNU Apple LLVM 12.0.0 (clang-1200.0.32.2)
17:17:55:WU01:FS00:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -stdlib=libc++ -O3
17:17:55:WU01:FS00:0xa8:             -funroll-loops -fno-pie -mmacosx-version-min=10.7
17:17:55:WU01:FS00:0xa8:   Platform: darwin 19.6.0
17:17:55:WU01:FS00:0xa8:       Bits: 64
17:17:55:WU01:FS00:0xa8:       Mode: Release
17:17:55:WU01:FS00:0xa8:       SIMD: avx2_256
17:17:55:WU01:FS00:0xa8:     OpenMP: ON
17:17:55:WU01:FS00:0xa8:       CUDA: OFF
17:17:55:WU01:FS00:0xa8:       Args: -dir 01 -suffix 01 -version 706 -lifeline 22437 -checkpoint 15 -np
17:17:55:WU01:FS00:0xa8:             7
17:17:55:WU01:FS00:0xa8:************************************ libFAH ************************************
17:17:55:WU01:FS00:0xa8:       Date: Oct 29 2020
17:17:55:WU01:FS00:0xa8:       Time: 13:29:34
17:17:55:WU01:FS00:0xa8:   Revision: a2332c71664fb4eb279de78f77799f38b2fc0696
17:17:55:WU01:FS00:0xa8:     Branch: master
17:17:55:WU01:FS00:0xa8:   Compiler: GNU Apple LLVM 12.0.0 (clang-1200.0.32.2)
17:17:55:WU01:FS00:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -stdlib=libc++ -O3
17:17:55:WU01:FS00:0xa8:             -funroll-loops -fno-pie -mmacosx-version-min=10.7
17:17:55:WU01:FS00:0xa8:   Platform: darwin 19.6.0
17:17:55:WU01:FS00:0xa8:       Bits: 64
17:17:55:WU01:FS00:0xa8:       Mode: Release
17:17:55:WU01:FS00:0xa8:************************************ CBang *************************************
17:17:55:WU01:FS00:0xa8:       Date: Oct 29 2020
17:17:55:WU01:FS00:0xa8:       Time: 13:28:52
17:17:55:WU01:FS00:0xa8:   Revision: a2332c71664fb4eb279de78f77799f38b2fc0696
17:17:55:WU01:FS00:0xa8:     Branch: master
17:17:55:WU01:FS00:0xa8:   Compiler: GNU Apple LLVM 12.0.0 (clang-1200.0.32.2)
17:17:55:WU01:FS00:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -stdlib=libc++ -O3
17:17:55:WU01:FS00:0xa8:             -funroll-loops -fno-pie -mmacosx-version-min=10.7 -fPIC
17:17:55:WU01:FS00:0xa8:   Platform: darwin 19.6.0
17:17:55:WU01:FS00:0xa8:       Bits: 64
17:17:55:WU01:FS00:0xa8:       Mode: Release
17:17:55:WU01:FS00:0xa8:************************************ System ************************************
17:17:55:WU01:FS00:0xa8:        CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
17:17:55:WU01:FS00:0xa8:     CPU ID: GenuineIntel Family 6 Model 60 Stepping 3
17:17:55:WU01:FS00:0xa8:       CPUs: 8
17:17:55:WU01:FS00:0xa8:     Memory: 32.00GiB
17:17:55:WU01:FS00:0xa8:Free Memory: 6.38GiB
17:17:55:WU01:FS00:0xa8:    Threads: POSIX_THREADS
17:17:55:WU01:FS00:0xa8: OS Version: 10.14
17:17:55:WU01:FS00:0xa8:Has Battery: false
17:17:55:WU01:FS00:0xa8: On Battery: false
17:17:55:WU01:FS00:0xa8: UTC Offset: -8
17:17:55:WU01:FS00:0xa8:        PID: 22438
17:17:55:WU01:FS00:0xa8:        CWD: /Library/Application Support/FAHClient/work
17:17:55:WU01:FS00:0xa8:********************************************************************************
17:17:55:WU01:FS00:0xa8:Project: 16926 (Run 97, Clone 534, Gen 1)
17:17:55:WU01:FS00:0xa8:Unit: 0x000000048120d1cc5fbd37bf634e817c
17:17:55:WU01:FS00:0xa8:Reading tar file core.xml
17:17:55:WU01:FS00:0xa8:Reading tar file frame1.tpr
17:17:55:WU01:FS00:0xa8:Digital signatures verified
17:17:55:WU01:FS00:0xa8:Calling: mdrun -c frame1.gro -s frame1.tpr -x frame1.xtc -cpt 15 -nt 7 -ntmpi 1
17:17:55:WU01:FS00:0xa8:Steps: first=0 total=0
17:18:00:WU00:FS00:Upload 17.99%
17:18:06:WU00:FS00:Upload 41.99%
17:18:12:WU00:FS00:Upload 65.98%
17:18:18:WU00:FS00:Upload 89.97%
17:18:21:WU00:FS00:Upload complete
17:18:21:WU00:FS00:Server responded WORK_ACK (400)
17:18:21:WU00:FS00:Final credit estimate, 10744.00 points
17:18:21:WU00:FS00:Cleaning up
17:27:56:WU01:FS00:0xa8:Watchdog triggered, requesting soft shutdown down
17:37:55:WU01:FS00:0xa8:Watchdog shutdown failed, hard shutdown triggered
17:37:56:WARNING:WU01:FS00:FahCore returned: WU_STALLED (127 = 0x7f)
17:37:56:WU01:FS00:Starting
17:37:56:WU01:FS00:Running FahCore: /usr/local/bin/FAHCoreWrapper "/Library/Application Support/FAHClient/cores/cores.foldingathome.org/osx/64bit-avx2-256/a8-0.0.9/Core_a8.fah/FahCore_a8" -dir 01 -suffix 01 -version 706 -lifeline 5624 -checkpoint 15 -np 7
17:37:56:WU01:FS00:Started FahCore on PID 22537
17:37:56:WU01:FS00:Core PID:22538
17:37:56:WU01:FS00:FahCore 0xa8 started
17:37:56:WU01:FS00:0xa8:*********************** Log Started 2020-11-30T17:37:56Z ***********************
17:37:56:WU01:FS00:0xa8:************************** Gromacs Folding@home Core ***************************
17:37:56:WU01:FS00:0xa8:       Core: Gromacs
17:37:56:WU01:FS00:0xa8:       Type: 0xa8
17:37:56:WU01:FS00:0xa8:    Version: 0.0.9
17:37:56:WU01:FS00:0xa8:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
17:37:56:WU01:FS00:0xa8:  Copyright: 2020 foldingathome.org
17:37:56:WU01:FS00:0xa8:   Homepage: https://foldingathome.org/
17:37:56:WU01:FS00:0xa8:       Date: Oct 29 2020
17:37:56:WU01:FS00:0xa8:       Time: 13:33:44
17:37:56:WU01:FS00:0xa8:   Revision: a2332c71664fb4eb279de78f77799f38b2fc0696
17:37:56:WU01:FS00:0xa8:     Branch: master
17:37:56:WU01:FS00:0xa8:   Compiler: GNU Apple LLVM 12.0.0 (clang-1200.0.32.2)
17:37:56:WU01:FS00:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -stdlib=libc++ -O3
17:37:56:WU01:FS00:0xa8:             -funroll-loops -fno-pie -mmacosx-version-min=10.7
17:37:56:WU01:FS00:0xa8:   Platform: darwin 19.6.0
17:37:56:WU01:FS00:0xa8:       Bits: 64
17:37:56:WU01:FS00:0xa8:       Mode: Release
17:37:56:WU01:FS00:0xa8:       SIMD: avx2_256
17:37:56:WU01:FS00:0xa8:     OpenMP: ON
17:37:56:WU01:FS00:0xa8:       CUDA: OFF
17:37:56:WU01:FS00:0xa8:       Args: -dir 01 -suffix 01 -version 706 -lifeline 22537 -checkpoint 15 -np
17:37:56:WU01:FS00:0xa8:             7
17:37:56:WU01:FS00:0xa8:************************************ libFAH ************************************
17:37:56:WU01:FS00:0xa8:       Date: Oct 29 2020
17:37:56:WU01:FS00:0xa8:       Time: 13:29:34
17:37:56:WU01:FS00:0xa8:   Revision: a2332c71664fb4eb279de78f77799f38b2fc0696
17:37:56:WU01:FS00:0xa8:     Branch: master
17:37:56:WU01:FS00:0xa8:   Compiler: GNU Apple LLVM 12.0.0 (clang-1200.0.32.2)
17:37:56:WU01:FS00:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -stdlib=libc++ -O3
17:37:56:WU01:FS00:0xa8:             -funroll-loops -fno-pie -mmacosx-version-min=10.7
17:37:56:WU01:FS00:0xa8:   Platform: darwin 19.6.0
17:37:56:WU01:FS00:0xa8:       Bits: 64
17:37:56:WU01:FS00:0xa8:       Mode: Release
17:37:56:WU01:FS00:0xa8:************************************ CBang *************************************
17:37:56:WU01:FS00:0xa8:       Date: Oct 29 2020
17:37:56:WU01:FS00:0xa8:       Time: 13:28:52
17:37:56:WU01:FS00:0xa8:   Revision: a2332c71664fb4eb279de78f77799f38b2fc0696
17:37:56:WU01:FS00:0xa8:     Branch: master
17:37:56:WU01:FS00:0xa8:   Compiler: GNU Apple LLVM 12.0.0 (clang-1200.0.32.2)
17:37:56:WU01:FS00:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -stdlib=libc++ -O3
17:37:56:WU01:FS00:0xa8:             -funroll-loops -fno-pie -mmacosx-version-min=10.7 -fPIC
17:37:56:WU01:FS00:0xa8:   Platform: darwin 19.6.0
17:37:56:WU01:FS00:0xa8:       Bits: 64
17:37:56:WU01:FS00:0xa8:       Mode: Release
17:37:56:WU01:FS00:0xa8:************************************ System ************************************
17:37:56:WU01:FS00:0xa8:        CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
17:37:56:WU01:FS00:0xa8:     CPU ID: GenuineIntel Family 6 Model 60 Stepping 3
17:37:56:WU01:FS00:0xa8:       CPUs: 8
17:37:56:WU01:FS00:0xa8:     Memory: 32.00GiB
17:37:56:WU01:FS00:0xa8:Free Memory: 6.19GiB
17:37:56:WU01:FS00:0xa8:    Threads: POSIX_THREADS
17:37:56:WU01:FS00:0xa8: OS Version: 10.14
17:37:56:WU01:FS00:0xa8:Has Battery: false
17:37:56:WU01:FS00:0xa8: On Battery: false
17:37:56:WU01:FS00:0xa8: UTC Offset: -8
17:37:56:WU01:FS00:0xa8:        PID: 22538
17:37:56:WU01:FS00:0xa8:        CWD: /Library/Application Support/FAHClient/work
17:37:56:WU01:FS00:0xa8:********************************************************************************
17:37:56:WU01:FS00:0xa8:Project: 16926 (Run 97, Clone 534, Gen 1)
17:37:56:WU01:FS00:0xa8:Unit: 0x000000048120d1cc5fbd37bf634e817c
17:37:56:WU01:FS00:0xa8:Reading tar file core.xml
17:37:56:WU01:FS00:0xa8:Reading tar file frame1.tpr
17:37:56:WU01:FS00:0xa8:Digital signatures verified
17:37:56:WU01:FS00:0xa8:Calling: mdrun -c frame1.gro -s frame1.tpr -x frame1.xtc -cpt 15 -nt 7 -ntmpi 1
17:37:56:WU01:FS00:0xa8:Steps: first=0 total=0
17:47:57:WU01:FS00:0xa8:Watchdog triggered, requesting soft shutdown down
17:57:57:WU01:FS00:0xa8:Watchdog shutdown failed, hard shutdown triggered
17:57:57:WARNING:WU01:FS00:FahCore returned: WU_STALLED (127 = 0x7f)
17:57:58:WU01:FS00:Starting
17:57:58:WU01:FS00:Running FahCore: /usr/local/bin/FAHCoreWrapper "/Library/Application Support/FAHClient/cores/cores.foldingathome.org/osx/64bit-avx2-256/a8-0.0.9/Core_a8.fah/FahCore_a8" -dir 01 -suffix 01 -version 706 -lifeline 5624 -checkpoint 15 -np 7
17:57:58:WU01:FS00:Started FahCore on PID 22601
17:57:58:WU01:FS00:Core PID:22603
17:57:58:WU01:FS00:FahCore 0xa8 started
17:57:58:WU01:FS00:0xa8:*********************** Log Started 2020-11-30T17:57:58Z ***********************
17:57:58:WU01:FS00:0xa8:************************** Gromacs Folding@home Core ***************************
17:57:58:WU01:FS00:0xa8:       Core: Gromacs
17:57:58:WU01:FS00:0xa8:       Type: 0xa8
17:57:58:WU01:FS00:0xa8:    Version: 0.0.9
17:57:58:WU01:FS00:0xa8:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
17:57:58:WU01:FS00:0xa8:  Copyright: 2020 foldingathome.org
17:57:58:WU01:FS00:0xa8:   Homepage: https://foldingathome.org/
17:57:58:WU01:FS00:0xa8:       Date: Oct 29 2020
17:57:58:WU01:FS00:0xa8:       Time: 13:33:44
17:57:58:WU01:FS00:0xa8:   Revision: a2332c71664fb4eb279de78f77799f38b2fc0696
17:57:58:WU01:FS00:0xa8:     Branch: master
17:57:58:WU01:FS00:0xa8:   Compiler: GNU Apple LLVM 12.0.0 (clang-1200.0.32.2)
17:57:58:WU01:FS00:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -stdlib=libc++ -O3
17:57:58:WU01:FS00:0xa8:             -funroll-loops -fno-pie -mmacosx-version-min=10.7
17:57:58:WU01:FS00:0xa8:   Platform: darwin 19.6.0
17:57:58:WU01:FS00:0xa8:       Bits: 64
17:57:58:WU01:FS00:0xa8:       Mode: Release
17:57:58:WU01:FS00:0xa8:       SIMD: avx2_256
17:57:58:WU01:FS00:0xa8:     OpenMP: ON
17:57:58:WU01:FS00:0xa8:       CUDA: OFF
17:57:58:WU01:FS00:0xa8:       Args: -dir 01 -suffix 01 -version 706 -lifeline 22601 -checkpoint 15 -np
17:57:58:WU01:FS00:0xa8:             7
17:57:58:WU01:FS00:0xa8:************************************ libFAH ************************************
17:57:58:WU01:FS00:0xa8:       Date: Oct 29 2020
17:57:58:WU01:FS00:0xa8:       Time: 13:29:34
17:57:58:WU01:FS00:0xa8:   Revision: a2332c71664fb4eb279de78f77799f38b2fc0696
17:57:58:WU01:FS00:0xa8:     Branch: master
17:57:58:WU01:FS00:0xa8:   Compiler: GNU Apple LLVM 12.0.0 (clang-1200.0.32.2)
17:57:58:WU01:FS00:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -stdlib=libc++ -O3
17:57:58:WU01:FS00:0xa8:             -funroll-loops -fno-pie -mmacosx-version-min=10.7
17:57:58:WU01:FS00:0xa8:   Platform: darwin 19.6.0
17:57:58:WU01:FS00:0xa8:       Bits: 64
17:57:58:WU01:FS00:0xa8:       Mode: Release
17:57:58:WU01:FS00:0xa8:************************************ CBang *************************************
17:57:58:WU01:FS00:0xa8:       Date: Oct 29 2020
17:57:58:WU01:FS00:0xa8:       Time: 13:28:52
17:57:58:WU01:FS00:0xa8:   Revision: a2332c71664fb4eb279de78f77799f38b2fc0696
17:57:58:WU01:FS00:0xa8:     Branch: master
17:57:58:WU01:FS00:0xa8:   Compiler: GNU Apple LLVM 12.0.0 (clang-1200.0.32.2)
17:57:58:WU01:FS00:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -stdlib=libc++ -O3
17:57:58:WU01:FS00:0xa8:             -funroll-loops -fno-pie -mmacosx-version-min=10.7 -fPIC
17:57:58:WU01:FS00:0xa8:   Platform: darwin 19.6.0
17:57:58:WU01:FS00:0xa8:       Bits: 64
17:57:58:WU01:FS00:0xa8:       Mode: Release
17:57:58:WU01:FS00:0xa8:************************************ System ************************************
17:57:58:WU01:FS00:0xa8:        CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
17:57:58:WU01:FS00:0xa8:     CPU ID: GenuineIntel Family 6 Model 60 Stepping 3
17:57:58:WU01:FS00:0xa8:       CPUs: 8
17:57:58:WU01:FS00:0xa8:     Memory: 32.00GiB
17:57:58:WU01:FS00:0xa8:Free Memory: 4.54GiB
17:57:58:WU01:FS00:0xa8:    Threads: POSIX_THREADS
17:57:58:WU01:FS00:0xa8: OS Version: 10.14
17:57:58:WU01:FS00:0xa8:Has Battery: false
17:57:58:WU01:FS00:0xa8: On Battery: false
17:57:58:WU01:FS00:0xa8: UTC Offset: -8
17:57:58:WU01:FS00:0xa8:        PID: 22603
17:57:58:WU01:FS00:0xa8:        CWD: /Library/Application Support/FAHClient/work
17:57:58:WU01:FS00:0xa8:********************************************************************************
17:57:58:WU01:FS00:0xa8:Project: 16926 (Run 97, Clone 534, Gen 1)
17:57:58:WU01:FS00:0xa8:Unit: 0x000000048120d1cc5fbd37bf634e817c
17:57:58:WU01:FS00:0xa8:Reading tar file core.xml
17:57:58:WU01:FS00:0xa8:Reading tar file frame1.tpr
17:57:58:WU01:FS00:0xa8:Digital signatures verified
17:57:58:WU01:FS00:0xa8:Calling: mdrun -c frame1.gro -s frame1.tpr -x frame1.xtc -cpt 15 -nt 7 -ntmpi 1
17:57:58:WU01:FS00:0xa8:Steps: first=0 total=0
18:07:59:WU01:FS00:0xa8:Watchdog triggered, requesting soft shutdown down
18:17:59:WU01:FS00:0xa8:Watchdog shutdown failed, hard shutdown triggered
18:18:00:WARNING:WU01:FS00:FahCore returned: WU_STALLED (127 = 0x7f)
18:18:00:WU01:FS00:Starting
18:18:00:WU01:FS00:Running FahCore: /usr/local/bin/FAHCoreWrapper "/Library/Application Support/FAHClient/cores/cores.foldingathome.org/osx/64bit-avx2-256/a8-0.0.9/Core_a8.fah/FahCore_a8" -dir 01 -suffix 01 -version 706 -lifeline 5624 -checkpoint 15 -np 7
18:18:00:WU01:FS00:Started FahCore on PID 22643
18:18:00:WU01:FS00:Core PID:22644
18:18:00:WU01:FS00:FahCore 0xa8 started
18:18:00:WU01:FS00:0xa8:*********************** Log Started 2020-11-30T18:18:00Z ***********************
18:18:00:WU01:FS00:0xa8:************************** Gromacs Folding@home Core ***************************
18:18:00:WU01:FS00:0xa8:       Core: Gromacs
18:18:00:WU01:FS00:0xa8:       Type: 0xa8
18:18:00:WU01:FS00:0xa8:    Version: 0.0.9
18:18:00:WU01:FS00:0xa8:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
18:18:00:WU01:FS00:0xa8:  Copyright: 2020 foldingathome.org
18:18:00:WU01:FS00:0xa8:   Homepage: https://foldingathome.org/
18:18:00:WU01:FS00:0xa8:       Date: Oct 29 2020
18:18:00:WU01:FS00:0xa8:       Time: 13:33:44
18:18:00:WU01:FS00:0xa8:   Revision: a2332c71664fb4eb279de78f77799f38b2fc0696
18:18:00:WU01:FS00:0xa8:     Branch: master
18:18:00:WU01:FS00:0xa8:   Compiler: GNU Apple LLVM 12.0.0 (clang-1200.0.32.2)
18:18:00:WU01:FS00:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -stdlib=libc++ -O3
18:18:00:WU01:FS00:0xa8:             -funroll-loops -fno-pie -mmacosx-version-min=10.7
18:18:00:WU01:FS00:0xa8:   Platform: darwin 19.6.0
18:18:00:WU01:FS00:0xa8:       Bits: 64
18:18:00:WU01:FS00:0xa8:       Mode: Release
18:18:00:WU01:FS00:0xa8:       SIMD: avx2_256
18:18:00:WU01:FS00:0xa8:     OpenMP: ON
18:18:00:WU01:FS00:0xa8:       CUDA: OFF
18:18:00:WU01:FS00:0xa8:       Args: -dir 01 -suffix 01 -version 706 -lifeline 22643 -checkpoint 15 -np
18:18:00:WU01:FS00:0xa8:             7
18:18:00:WU01:FS00:0xa8:************************************ libFAH ************************************
18:18:00:WU01:FS00:0xa8:       Date: Oct 29 2020
18:18:00:WU01:FS00:0xa8:       Time: 13:29:34
18:18:00:WU01:FS00:0xa8:   Revision: a2332c71664fb4eb279de78f77799f38b2fc0696
18:18:00:WU01:FS00:0xa8:     Branch: master
18:18:00:WU01:FS00:0xa8:   Compiler: GNU Apple LLVM 12.0.0 (clang-1200.0.32.2)
18:18:00:WU01:FS00:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -stdlib=libc++ -O3
18:18:00:WU01:FS00:0xa8:             -funroll-loops -fno-pie -mmacosx-version-min=10.7
18:18:00:WU01:FS00:0xa8:   Platform: darwin 19.6.0
18:18:00:WU01:FS00:0xa8:       Bits: 64
18:18:00:WU01:FS00:0xa8:       Mode: Release
18:18:00:WU01:FS00:0xa8:************************************ CBang *************************************
18:18:00:WU01:FS00:0xa8:       Date: Oct 29 2020
18:18:00:WU01:FS00:0xa8:       Time: 13:28:52
18:18:00:WU01:FS00:0xa8:   Revision: a2332c71664fb4eb279de78f77799f38b2fc0696
18:18:00:WU01:FS00:0xa8:     Branch: master
18:18:00:WU01:FS00:0xa8:   Compiler: GNU Apple LLVM 12.0.0 (clang-1200.0.32.2)
18:18:00:WU01:FS00:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -stdlib=libc++ -O3
18:18:00:WU01:FS00:0xa8:             -funroll-loops -fno-pie -mmacosx-version-min=10.7 -fPIC
18:18:00:WU01:FS00:0xa8:   Platform: darwin 19.6.0
18:18:00:WU01:FS00:0xa8:       Bits: 64
18:18:00:WU01:FS00:0xa8:       Mode: Release
18:18:00:WU01:FS00:0xa8:************************************ System ************************************
18:18:00:WU01:FS00:0xa8:        CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
18:18:00:WU01:FS00:0xa8:     CPU ID: GenuineIntel Family 6 Model 60 Stepping 3
18:18:00:WU01:FS00:0xa8:       CPUs: 8
18:18:00:WU01:FS00:0xa8:     Memory: 32.00GiB
18:18:00:WU01:FS00:0xa8:Free Memory: 4.93GiB
18:18:00:WU01:FS00:0xa8:    Threads: POSIX_THREADS
18:18:00:WU01:FS00:0xa8: OS Version: 10.14
18:18:00:WU01:FS00:0xa8:Has Battery: false
18:18:00:WU01:FS00:0xa8: On Battery: false
18:18:00:WU01:FS00:0xa8: UTC Offset: -8
18:18:00:WU01:FS00:0xa8:        PID: 22644
18:18:00:WU01:FS00:0xa8:        CWD: /Library/Application Support/FAHClient/work
18:18:00:WU01:FS00:0xa8:********************************************************************************
18:18:00:WU01:FS00:0xa8:Project: 16926 (Run 97, Clone 534, Gen 1)
18:18:00:WU01:FS00:0xa8:Unit: 0x000000048120d1cc5fbd37bf634e817c
18:18:00:WU01:FS00:0xa8:Reading tar file core.xml
18:18:00:WU01:FS00:0xa8:Reading tar file frame1.tpr
18:18:00:WU01:FS00:0xa8:Digital signatures verified
18:18:00:WU01:FS00:0xa8:Calling: mdrun -c frame1.gro -s frame1.tpr -x frame1.xtc -cpt 15 -nt 7 -ntmpi 1
18:18:00:WU01:FS00:0xa8:Steps: first=0 total=0
18:28:02:WU01:FS00:0xa8:Watchdog triggered, requesting soft shutdown down
18:38:01:WU01:FS00:0xa8:Watchdog shutdown failed, hard shutdown triggered
18:38:02:WARNING:WU01:FS00:FahCore returned: WU_STALLED (127 = 0x7f)
18:38:02:WU01:FS00:Starting
18:38:02:WU01:FS00:Running FahCore: /usr/local/bin/FAHCoreWrapper "/Library/Application Support/FAHClient/cores/cores.foldingathome.org/osx/64bit-avx2-256/a8-0.0.9/Core_a8.fah/FahCore_a8" -dir 01 -suffix 01 -version 706 -lifeline 5624 -checkpoint 15 -np 7
18:38:02:WU01:FS00:Started FahCore on PID 22719
18:38:02:WU01:FS00:Core PID:22721
18:38:02:WU01:FS00:FahCore 0xa8 started
18:38:03:WU01:FS00:0xa8:*********************** Log Started 2020-11-30T18:38:02Z ***********************
18:38:03:WU01:FS00:0xa8:************************** Gromacs Folding@home Core ***************************
18:38:03:WU01:FS00:0xa8:       Core: Gromacs
18:38:03:WU01:FS00:0xa8:       Type: 0xa8
18:38:03:WU01:FS00:0xa8:    Version: 0.0.9
18:38:03:WU01:FS00:0xa8:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
18:38:03:WU01:FS00:0xa8:  Copyright: 2020 foldingathome.org
18:38:03:WU01:FS00:0xa8:   Homepage: https://foldingathome.org/
18:38:03:WU01:FS00:0xa8:       Date: Oct 29 2020
18:38:03:WU01:FS00:0xa8:       Time: 13:33:44
18:38:03:WU01:FS00:0xa8:   Revision: a2332c71664fb4eb279de78f77799f38b2fc0696
18:38:03:WU01:FS00:0xa8:     Branch: master
18:38:03:WU01:FS00:0xa8:   Compiler: GNU Apple LLVM 12.0.0 (clang-1200.0.32.2)
18:38:03:WU01:FS00:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -stdlib=libc++ -O3
18:38:03:WU01:FS00:0xa8:             -funroll-loops -fno-pie -mmacosx-version-min=10.7
18:38:03:WU01:FS00:0xa8:   Platform: darwin 19.6.0
18:38:03:WU01:FS00:0xa8:       Bits: 64
18:38:03:WU01:FS00:0xa8:       Mode: Release
18:38:03:WU01:FS00:0xa8:       SIMD: avx2_256
18:38:03:WU01:FS00:0xa8:     OpenMP: ON
18:38:03:WU01:FS00:0xa8:       CUDA: OFF
18:38:03:WU01:FS00:0xa8:       Args: -dir 01 -suffix 01 -version 706 -lifeline 22719 -checkpoint 15 -np
18:38:03:WU01:FS00:0xa8:             7
18:38:03:WU01:FS00:0xa8:************************************ libFAH ************************************
18:38:03:WU01:FS00:0xa8:       Date: Oct 29 2020
18:38:03:WU01:FS00:0xa8:       Time: 13:29:34
18:38:03:WU01:FS00:0xa8:   Revision: a2332c71664fb4eb279de78f77799f38b2fc0696
18:38:03:WU01:FS00:0xa8:     Branch: master
18:38:03:WU01:FS00:0xa8:   Compiler: GNU Apple LLVM 12.0.0 (clang-1200.0.32.2)
18:38:03:WU01:FS00:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -stdlib=libc++ -O3
18:38:03:WU01:FS00:0xa8:             -funroll-loops -fno-pie -mmacosx-version-min=10.7
18:38:03:WU01:FS00:0xa8:   Platform: darwin 19.6.0
18:38:03:WU01:FS00:0xa8:       Bits: 64
18:38:03:WU01:FS00:0xa8:       Mode: Release
18:38:03:WU01:FS00:0xa8:************************************ CBang *************************************
18:38:03:WU01:FS00:0xa8:       Date: Oct 29 2020
18:38:03:WU01:FS00:0xa8:       Time: 13:28:52
18:38:03:WU01:FS00:0xa8:   Revision: a2332c71664fb4eb279de78f77799f38b2fc0696
18:38:03:WU01:FS00:0xa8:     Branch: master
18:38:03:WU01:FS00:0xa8:   Compiler: GNU Apple LLVM 12.0.0 (clang-1200.0.32.2)
18:38:03:WU01:FS00:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -stdlib=libc++ -O3
18:38:03:WU01:FS00:0xa8:             -funroll-loops -fno-pie -mmacosx-version-min=10.7 -fPIC
18:38:03:WU01:FS00:0xa8:   Platform: darwin 19.6.0
18:38:03:WU01:FS00:0xa8:       Bits: 64
18:38:03:WU01:FS00:0xa8:       Mode: Release
18:38:03:WU01:FS00:0xa8:************************************ System ************************************
18:38:03:WU01:FS00:0xa8:        CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
18:38:03:WU01:FS00:0xa8:     CPU ID: GenuineIntel Family 6 Model 60 Stepping 3
18:38:03:WU01:FS00:0xa8:       CPUs: 8
18:38:03:WU01:FS00:0xa8:     Memory: 32.00GiB
18:38:03:WU01:FS00:0xa8:Free Memory: 4.69GiB
18:38:03:WU01:FS00:0xa8:    Threads: POSIX_THREADS
18:38:03:WU01:FS00:0xa8: OS Version: 10.14
18:38:03:WU01:FS00:0xa8:Has Battery: false
18:38:03:WU01:FS00:0xa8: On Battery: false
18:38:03:WU01:FS00:0xa8: UTC Offset: -8
18:38:03:WU01:FS00:0xa8:        PID: 22721
18:38:03:WU01:FS00:0xa8:        CWD: /Library/Application Support/FAHClient/work
18:38:03:WU01:FS00:0xa8:********************************************************************************
18:38:03:WU01:FS00:0xa8:Project: 16926 (Run 97, Clone 534, Gen 1)
18:38:03:WU01:FS00:0xa8:Unit: 0x000000048120d1cc5fbd37bf634e817c
18:38:03:WU01:FS00:0xa8:Reading tar file core.xml
18:38:03:WU01:FS00:0xa8:Reading tar file frame1.tpr
18:38:03:WU01:FS00:0xa8:Digital signatures verified
18:38:03:WU01:FS00:0xa8:Calling: mdrun -c frame1.gro -s frame1.tpr -x frame1.xtc -cpt 15 -nt 7 -ntmpi 1
18:38:03:WU01:FS00:0xa8:Steps: first=0 total=0
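
For anyone wanting to check how often this cycle repeats in a longer log, the stalls can be tallied per work unit. The following is a minimal sketch in Python, assuming the line layout seen in the log above (`HH:MM:SS:WUnn:FSnn:...`). The name `count_stalls` and the two regular expressions are illustrative helpers, not part of FAHClient, and other client versions may format these lines differently.

```python
import re
from collections import Counter

# Client-level line written when a stalled core is killed, e.g.
# 17:37:56:WARNING:WU01:FS00:FahCore returned: WU_STALLED (127 = 0x7f)
STALL_RE = re.compile(
    r"^\d\d:\d\d:\d\d:WARNING:(WU\d+):(FS\d+):FahCore returned: WU_STALLED"
)

# Core-level banner naming the work unit, e.g.
# 17:17:55:WU01:FS00:0xa8:Project: 16926 (Run 97, Clone 534, Gen 1)
PROJECT_RE = re.compile(
    r"^\d\d:\d\d:\d\d:(WU\d+):(FS\d+):0x[0-9a-f]+:"
    r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)"
)


def count_stalls(lines):
    """Return a Counter mapping (project, run, clone, gen) to WU_STALLED count."""
    current = {}  # (WU id, slot id) -> most recent PRCG seen for that entry
    stalls = Counter()
    for line in lines:
        m = PROJECT_RE.match(line)
        if m:
            wu, slot, *prcg = m.groups()
            current[(wu, slot)] = tuple(int(x) for x in prcg)
            continue
        m = STALL_RE.match(line)
        if m:
            key = current.get((m.group(1), m.group(2)))
            if key is not None:  # ignore stalls with no banner seen yet
                stalls[key] += 1
    return stalls


# A few lines copied from the log above, covering two failed attempts.
sample = [
    "17:17:55:WU01:FS00:0xa8:Project: 16926 (Run 97, Clone 534, Gen 1)",
    "17:27:56:WU01:FS00:0xa8:Watchdog triggered, requesting soft shutdown down",
    "17:37:55:WU01:FS00:0xa8:Watchdog shutdown failed, hard shutdown triggered",
    "17:37:56:WARNING:WU01:FS00:FahCore returned: WU_STALLED (127 = 0x7f)",
    "17:37:56:WU01:FS00:0xa8:Project: 16926 (Run 97, Clone 534, Gen 1)",
    "17:57:57:WARNING:WU01:FS00:FahCore returned: WU_STALLED (127 = 0x7f)",
]

for prcg, n in count_stalls(sample).items():
    print(prcg, n)
```

To run it on a real log, pass an open file instead of `sample`, for example `count_stalls(open("log.txt"))`, where `log.txt` stands in for wherever the client log actually lives on the machine.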
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud

Re: Watchdog shutdowns on 16926 WUs

Post by PantherX »

Thanks for posting about this. I will ask around and see what I can find :)
ShootThePicture
Posts: 10
Joined: Thu Sep 24, 2020 4:39 am

Re: Watchdog shutdowns on 16926 WUs

Post by ShootThePicture »

This is an old post, but I just woke up for the day and my logs are showing this is happening again, so I'm glad you're looking into it. Thanks!
Hopfgeist
Posts: 71
Joined: Thu Jul 09, 2020 12:07 pm
Hardware configuration: Dell T420, 2x Xeon E5-2470 v2, NetBSD 10, SunFire X2270 M2, 2x Xeon X5675, NetBSD 9; various other Linux/NetBSD PCs, Macs and virtual servers.
Location: Germany

Re: Watchdog shutdowns on 16926 WUs

Post by Hopfgeist »

I'm having a very similar problem with a 16926 work unit. I even deleted the core and had the client re-download it, but no luck.

Specific work unit:
Project: 16926 (Run 57, Clone 678, Gen 7)

Here's how it goes:

Code:

*********************** Log Started 2020-12-16T16:31:55Z ***********************
16:31:55:******************************* libFAH ********************************
16:31:55:       Date: Oct 20 2020
16:31:55:       Time: 20:36:39
16:31:55:   Revision: 5ca109d295a6245e2a2f590b3d0085ad5e567aeb
16:31:55:     Branch: master
16:31:55:   Compiler: GNU 8.3.0
16:31:55:    Options: -faligned-new -std=c++11 -fsigned-char -ffunction-sections
16:31:55:             -fdata-sections -O3 -funroll-loops -fno-pie
16:31:55:   Platform: linux2 5.8.0-1-amd64
16:31:55:       Bits: 64
16:31:55:       Mode: Release
16:31:55:****************************** FAHClient ******************************
16:31:55:    Version: 7.6.21
16:31:55:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
16:31:55:  Copyright: 2020 foldingathome.org
16:31:55:   Homepage: https://foldingathome.org/
16:31:55:       Date: Oct 20 2020
16:31:55:       Time: 20:39:00
16:31:55:   Revision: 6efbf0e138e22d3963e6a291f78dcb9c6422a278
16:31:55:     Branch: master
16:31:55:   Compiler: GNU 8.3.0
16:31:55:    Options: -faligned-new -std=c++11 -fsigned-char -ffunction-sections
16:31:55:             -fdata-sections -O3 -funroll-loops -fno-pie
16:31:55:   Platform: linux2 5.8.0-1-amd64
16:31:55:       Bits: 64
16:31:55:       Mode: Release
16:31:55:       Args: --user=aeroclub-nrw_Hopfgeist --team=264169
16:31:55:             --passkey=******************************** --password=********
16:31:55:             --chdir /home/bernd/FAH --gpu=false --smp=true --cpus=4
16:31:55:             --log-color=false --next-unit-percentage 100 --pid=true --pid-file
16:31:55:             /var/run/FAHClient.pid
16:31:55:     Config: /usr/home/bernd/FAH/config.xml
16:31:55:******************************** CBang ********************************
16:31:55:       Date: Oct 20 2020
16:31:55:       Time: 18:37:59
16:31:55:   Revision: 7e4ce85225d7eaeb775e87c31740181ca603de60
16:31:55:     Branch: master
16:31:55:   Compiler: GNU 8.3.0
16:31:55:    Options: -faligned-new -std=c++11 -fsigned-char -ffunction-sections
16:31:55:             -fdata-sections -O3 -funroll-loops -fno-pie -fPIC
16:31:55:   Platform: linux2 5.8.0-1-amd64
16:31:55:       Bits: 64
16:31:55:       Mode: Release
16:31:55:******************************* System ********************************
16:31:55:        CPU: Intel(R) Xeon(R) CPU X3430 @ 2.40GHz
16:31:55:     CPU ID: GenuineIntel Family 6 Model 30 Stepping 5
16:31:55:       CPUs: 4
16:31:55:     Memory: 11.99GiB
16:31:55:Free Memory: 5.83GiB
16:31:55:    Threads: POSIX_THREADS
16:31:55: OS Version: 3.11
16:31:55:Has Battery: false
16:31:55: On Battery: false
16:31:55: UTC Offset: 1
16:31:55:        PID: 22453
16:31:55:        CWD: /
16:31:55:         OS: Linux 3.11.6 x86_64
16:31:55:    OS Arch: AMD64
16:31:55:       GPUs: 0
16:31:55:       CUDA: Not detected: Failed to open dynamic library 'libcuda.so':
16:31:55:             libcuda.so: cannot open shared object file: No such file or
16:31:55:             directory
16:31:55:     OpenCL: Not detected: Failed to open dynamic library 'libOpenCL.so':
16:31:55:             libOpenCL.so: cannot open shared object file: No such file or
16:31:55:             directory
16:31:55:***********************************************************************
16:31:55:<config>
16:31:55:  <!-- Network -->
16:31:55:  <proxy v=':8080'/>
16:31:55:
16:31:55:  <!-- Folding Slots -->
16:31:55:  <slot id='0' type='CPU'/>
16:31:55:</config>
16:31:55:Trying to access database...
16:31:55:Successfully acquired database lock
16:31:55:FS00:Initialized folding slot 00: cpu:4
16:31:55:WU01:FS00:Starting
16:31:55:WU01:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /usr/home/user/FAH/cores/cores.foldingathome.org/lin/64bit-sse2/a8-0.0.9/Core_a8.fah/FahCore_a8 -dir 01 -suffix 01 -version 706 -lifeline 22453 -checkpoint 15 -np 4
16:31:55:WU01:FS00:Started FahCore on PID 28654
16:31:55:WU01:FS00:Core PID:29102
16:31:55:WU01:FS00:FahCore 0xa8 started
16:31:56:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
16:31:56:WU01:FS00:Starting
16:31:56:WU01:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /usr/home/user/FAH/cores/cores.foldingathome.org/lin/64bit-sse2/a8-0.0.9/Core_a8.fah/FahCore_a8 -dir 01 -suffix 01 -version 706 -lifeline 22453 -checkpoint 15 -np 4
16:31:56:WU01:FS00:Started FahCore on PID 26747
16:31:56:WU01:FS00:Core PID:1434
16:31:56:WU01:FS00:FahCore 0xa8 started
16:31:56:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
And so on ad infinitum ...

So far I've seen the problem only with the Linux client; none of my other clients is currently working on project 16926.

Cheers,
HG.
Dell PowerEdge T420: 2x Xeon E5-2470 v2
Joe_H
Site Admin
Posts: 7856
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Watchdog shutdowns on 16926 WUs

Post by Joe_H »

Assignment of Project 16926 WUs was suspended over a week ago; just delete the WU.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Hopfgeist

Re: Watchdog shutdowns on 16926 WUs

Post by Hopfgeist »

Joe_H wrote: Assignment of Project 16926 WUs was suspended over a week ago; just delete the WU.
Well, it seems that not all servers got the memo. This was today:

Code: Select all

[...]
******************************* Date: 2020-12-16 *******************************
[...]
15:58:45:WU01:FS00:Connecting to assign1.foldingathome.org:80
15:58:46:WU01:FS00:Assigned to work server 129.32.209.204
15:58:46:WU01:FS00:Requesting new work unit for slot 00: cpu:4 from 129.32.209.204
15:58:46:WU01:FS00:Connecting to 129.32.209.204:8080
15:58:47:WU01:FS00:Downloading 49.00KiB
15:58:47:WU01:FS00:Download complete
15:58:48:WU01:FS00:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:16926 run:57 clone:678 gen:7 core:0xa8 unit:0x0000000c8120d1cc00000000003902a6
15:58:48:WU01:FS00:Starting
No big problem, I hope. I deleted it and will carry on; I'm just letting you know that Project 16926 work units are apparently still being assigned right now (or at least were a few hours ago).
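
For anyone who wants to check whether their own client is still being handed units from a suspended project, here is a minimal Python sketch. It assumes the log uses the same "Received Unit: ... project:N run:N clone:N gen:N" layout as the excerpt above; the function names are made up for illustration and are not part of FAHClient.

```python
import re

# Pulls the project/run/clone/gen fields out of "Received Unit" log lines.
# The field layout is assumed from the log excerpts in this thread.
UNIT = re.compile(
    r"Received Unit: .*?project:(\d+) run:(\d+) clone:(\d+) gen:(\d+)"
)


def received_units(lines):
    """Return a list of (project, run, clone, gen) tuples, in log order."""
    units = []
    for line in lines:
        match = UNIT.search(line)
        if match:
            units.append(tuple(int(group) for group in match.groups()))
    return units


def projects_seen(lines):
    """Return the sorted set of project numbers assigned in the log."""
    return sorted({unit[0] for unit in received_units(lines)})
```

Passing an open log file to `projects_seen` would list every project number the client has received, so a suspended project showing up there is easy to spot.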


Cheers,
HG.
Joe_H

Re: Watchdog shutdowns on 16926 WUs

Post by Joe_H »

I will pass that on to the researcher. They shut down the project when they ran into a problem that resulted in WUs like this failing. They had enough data already and, if more is needed, may release a new project in the future once they figure out what the problem was.
Hopfgeist

Re: Watchdog shutdowns on 16926 WUs

Post by Hopfgeist »

Joe_H wrote: I will pass that on to the researcher. They shut down the project when they ran into a problem that resulted in WUs like this failing. They had enough data already and, if more is needed, may release a new project in the future once they figure out what the problem was.
Sure, thanks.

HG.
ShootThePicture

Re: Watchdog shutdowns on 16926 WUs

Post by ShootThePicture »

Like Hopfgeist said, they are still releasing these WUs. I was receiving them this morning.

Code: Select all

14:18:34:WU00:FS00:Connecting to assign1.foldingathome.org:80
14:18:34:WU00:FS00:Assigned to work server 129.32.209.204
14:18:34:WU00:FS00:Requesting new work unit for slot 00: READY cpu:7 from 129.32.209.204
14:18:34:WU00:FS00:Connecting to 129.32.209.204:8080
14:18:35:WU00:FS00:Downloading 49.00KiB
14:18:35:WU00:FS00:Download complete
14:18:35:WU00:FS00:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:16926 run:49 clone:737 gen:6 core:0xa8 unit:0x000000078120d1cc00000000003102e1
14:18:35:WU00:FS00:Starting
14:18:35:WU00:FS00:Running FahCore: /usr/local/bin/FAHCoreWrapper "/Library/Application Support/FAHClient/cores/cores.foldingathome.org/osx/64bit-avx2-256/a8-0.0.9/Core_a8.fah/FahCore_a8" -dir 00 -suffix 01 -version 706 -lifeline 48 -checkpoint 15 -np 7
14:18:35:WU00:FS00:Started FahCore on PID 10054
14:18:35:WU00:FS00:Core PID:10056
14:18:35:WU00:FS00:FahCore 0xa8 started
14:18:36:WU00:FS00:0xa8:*********************** Log Started 2020-12-16T14:18:36Z ***********************
14:18:36:WU00:FS00:0xa8:************************** Gromacs Folding@home Core ***************************
14:18:36:WU00:FS00:0xa8:       Core: Gromacs
14:18:36:WU00:FS00:0xa8:       Type: 0xa8
14:18:36:WU00:FS00:0xa8:    Version: 0.0.9
14:18:36:WU00:FS00:0xa8:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
14:18:36:WU00:FS00:0xa8:  Copyright: 2020 foldingathome.org
14:18:36:WU00:FS00:0xa8:   Homepage: https://foldingathome.org/
14:18:36:WU00:FS00:0xa8:       Date: Oct 29 2020
14:18:36:WU00:FS00:0xa8:       Time: 13:33:44
14:18:36:WU00:FS00:0xa8:   Revision: a2332c71664fb4eb279de78f77799f38b2fc0696
14:18:36:WU00:FS00:0xa8:     Branch: master
14:18:36:WU00:FS00:0xa8:   Compiler: GNU Apple LLVM 12.0.0 (clang-1200.0.32.2)
14:18:36:WU00:FS00:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -stdlib=libc++ -O3
14:18:36:WU00:FS00:0xa8:             -funroll-loops -fno-pie -mmacosx-version-min=10.7
14:18:36:WU00:FS00:0xa8:   Platform: darwin 19.6.0
14:18:36:WU00:FS00:0xa8:       Bits: 64
14:18:36:WU00:FS00:0xa8:       Mode: Release
14:18:36:WU00:FS00:0xa8:       SIMD: avx2_256
14:18:36:WU00:FS00:0xa8:     OpenMP: ON
14:18:36:WU00:FS00:0xa8:       CUDA: OFF
14:18:36:WU00:FS00:0xa8:       Args: -dir 00 -suffix 01 -version 706 -lifeline 10054 -checkpoint 15 -np
14:18:36:WU00:FS00:0xa8:             7
14:18:36:WU00:FS00:0xa8:************************************ libFAH ************************************
14:18:36:WU00:FS00:0xa8:       Date: Oct 29 2020
14:18:36:WU00:FS00:0xa8:       Time: 13:29:34
14:18:36:WU00:FS00:0xa8:   Revision: a2332c71664fb4eb279de78f77799f38b2fc0696
14:18:36:WU00:FS00:0xa8:     Branch: master
14:18:36:WU00:FS00:0xa8:   Compiler: GNU Apple LLVM 12.0.0 (clang-1200.0.32.2)
14:18:36:WU00:FS00:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -stdlib=libc++ -O3
14:18:36:WU00:FS00:0xa8:             -funroll-loops -fno-pie -mmacosx-version-min=10.7
14:18:36:WU00:FS00:0xa8:   Platform: darwin 19.6.0
14:18:36:WU00:FS00:0xa8:       Bits: 64
14:18:36:WU00:FS00:0xa8:       Mode: Release
14:18:36:WU00:FS00:0xa8:************************************ CBang *************************************
14:18:36:WU00:FS00:0xa8:       Date: Oct 29 2020
14:18:36:WU00:FS00:0xa8:       Time: 13:28:52
14:18:36:WU00:FS00:0xa8:   Revision: a2332c71664fb4eb279de78f77799f38b2fc0696
14:18:36:WU00:FS00:0xa8:     Branch: master
14:18:36:WU00:FS00:0xa8:   Compiler: GNU Apple LLVM 12.0.0 (clang-1200.0.32.2)
14:18:36:WU00:FS00:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -stdlib=libc++ -O3
14:18:36:WU00:FS00:0xa8:             -funroll-loops -fno-pie -mmacosx-version-min=10.7 -fPIC
14:18:36:WU00:FS00:0xa8:   Platform: darwin 19.6.0
14:18:36:WU00:FS00:0xa8:       Bits: 64
14:18:36:WU00:FS00:0xa8:       Mode: Release
14:18:36:WU00:FS00:0xa8:************************************ System ************************************
14:18:36:WU00:FS00:0xa8:        CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
14:18:36:WU00:FS00:0xa8:     CPU ID: GenuineIntel Family 6 Model 60 Stepping 3
14:18:36:WU00:FS00:0xa8:       CPUs: 8
14:18:36:WU00:FS00:0xa8:     Memory: 32.00GiB
14:18:36:WU00:FS00:0xa8:Free Memory: 8.98GiB
14:18:36:WU00:FS00:0xa8:    Threads: POSIX_THREADS
14:18:36:WU00:FS00:0xa8: OS Version: 10.14
14:18:36:WU00:FS00:0xa8:Has Battery: false
14:18:36:WU00:FS00:0xa8: On Battery: false
14:18:36:WU00:FS00:0xa8: UTC Offset: -8
14:18:36:WU00:FS00:0xa8:        PID: 10056
14:18:36:WU00:FS00:0xa8:        CWD: /Library/Application Support/FAHClient/work
14:18:36:WU00:FS00:0xa8:********************************************************************************
14:18:36:WU00:FS00:0xa8:Project: 16926 (Run 49, Clone 737, Gen 6)
14:18:36:WU00:FS00:0xa8:Unit: 0x000000078120d1cc00000000003102e1
14:18:36:WU00:FS00:0xa8:Reading tar file core.xml
14:18:36:WU00:FS00:0xa8:Reading tar file frame6.tpr
14:18:36:WU00:FS00:0xa8:Digital signatures verified
14:18:36:WU00:FS00:0xa8:Calling: mdrun -c frame6.gro -s frame6.tpr -x frame6.xtc -cpt 15 -nt 7 -ntmpi 1
14:18:36:WU00:FS00:0xa8:Steps: first=0 total=0
14:28:37:WU00:FS00:0xa8:Watchdog triggered, requesting soft shutdown down
14:38:37:WU00:FS00:0xa8:Watchdog shutdown failed, hard shutdown triggered
14:38:37:WARNING:WU00:FS00:FahCore returned: WU_STALLED (127 = 0x7f)
14:38:38:WU00:FS00:Starting
14:38:38:WU00:FS00:Running FahCore: /usr/local/bin/FAHCoreWrapper "/Library/Application Support/FAHClient/cores/cores.foldingathome.org/osx/64bit-avx2-256/a8-0.0.9/Core_a8.fah/FahCore_a8" -dir 00 -suffix 01 -version 706 -lifeline 48 -checkpoint 15 -np 7
14:38:38:WU00:FS00:Started FahCore on PID 10105
14:38:38:WU00:FS00:Core PID:10106
14:38:38:WU00:FS00:FahCore 0xa8 started
14:38:38:WU00:FS00:0xa8:*********************** Log Started 2020-12-16T14:38:38Z ***********************
14:38:38:WU00:FS00:0xa8:************************** Gromacs Folding@home Core ***************************
14:38:38:WU00:FS00:0xa8:       Core: Gromacs
14:38:38:WU00:FS00:0xa8:       Type: 0xa8
14:38:38:WU00:FS00:0xa8:    Version: 0.0.9
14:38:38:WU00:FS00:0xa8:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
14:38:38:WU00:FS00:0xa8:  Copyright: 2020 foldingathome.org
14:38:38:WU00:FS00:0xa8:   Homepage: https://foldingathome.org/
14:38:38:WU00:FS00:0xa8:       Date: Oct 29 2020
14:38:38:WU00:FS00:0xa8:       Time: 13:33:44
14:38:38:WU00:FS00:0xa8:   Revision: a2332c71664fb4eb279de78f77799f38b2fc0696
14:38:38:WU00:FS00:0xa8:     Branch: master
14:38:38:WU00:FS00:0xa8:   Compiler: GNU Apple LLVM 12.0.0 (clang-1200.0.32.2)
14:38:38:WU00:FS00:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -stdlib=libc++ -O3
14:38:38:WU00:FS00:0xa8:             -funroll-loops -fno-pie -mmacosx-version-min=10.7
14:38:38:WU00:FS00:0xa8:   Platform: darwin 19.6.0
14:38:38:WU00:FS00:0xa8:       Bits: 64
14:38:38:WU00:FS00:0xa8:       Mode: Release
14:38:38:WU00:FS00:0xa8:       SIMD: avx2_256
14:38:38:WU00:FS00:0xa8:     OpenMP: ON
14:38:38:WU00:FS00:0xa8:       CUDA: OFF
14:38:38:WU00:FS00:0xa8:       Args: -dir 00 -suffix 01 -version 706 -lifeline 10105 -checkpoint 15 -np
14:38:38:WU00:FS00:0xa8:             7
14:38:38:WU00:FS00:0xa8:************************************ libFAH ************************************
14:38:38:WU00:FS00:0xa8:       Date: Oct 29 2020
14:38:38:WU00:FS00:0xa8:       Time: 13:29:34
14:38:38:WU00:FS00:0xa8:   Revision: a2332c71664fb4eb279de78f77799f38b2fc0696
14:38:38:WU00:FS00:0xa8:     Branch: master
14:38:38:WU00:FS00:0xa8:   Compiler: GNU Apple LLVM 12.0.0 (clang-1200.0.32.2)
14:38:38:WU00:FS00:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -stdlib=libc++ -O3
14:38:38:WU00:FS00:0xa8:             -funroll-loops -fno-pie -mmacosx-version-min=10.7
14:38:38:WU00:FS00:0xa8:   Platform: darwin 19.6.0
14:38:38:WU00:FS00:0xa8:       Bits: 64
14:38:38:WU00:FS00:0xa8:       Mode: Release
14:38:38:WU00:FS00:0xa8:************************************ CBang *************************************
14:38:38:WU00:FS00:0xa8:       Date: Oct 29 2020
14:38:38:WU00:FS00:0xa8:       Time: 13:28:52
14:38:38:WU00:FS00:0xa8:   Revision: a2332c71664fb4eb279de78f77799f38b2fc0696
14:38:38:WU00:FS00:0xa8:     Branch: master
14:38:38:WU00:FS00:0xa8:   Compiler: GNU Apple LLVM 12.0.0 (clang-1200.0.32.2)
14:38:38:WU00:FS00:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -stdlib=libc++ -O3
14:38:38:WU00:FS00:0xa8:             -funroll-loops -fno-pie -mmacosx-version-min=10.7 -fPIC
14:38:38:WU00:FS00:0xa8:   Platform: darwin 19.6.0
14:38:38:WU00:FS00:0xa8:       Bits: 64
14:38:38:WU00:FS00:0xa8:       Mode: Release
14:38:38:WU00:FS00:0xa8:************************************ System ************************************
14:38:38:WU00:FS00:0xa8:        CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
14:38:38:WU00:FS00:0xa8:     CPU ID: GenuineIntel Family 6 Model 60 Stepping 3
14:38:38:WU00:FS00:0xa8:       CPUs: 8
14:38:38:WU00:FS00:0xa8:     Memory: 32.00GiB
14:38:38:WU00:FS00:0xa8:Free Memory: 8.92GiB
14:38:38:WU00:FS00:0xa8:    Threads: POSIX_THREADS
14:38:38:WU00:FS00:0xa8: OS Version: 10.14
14:38:38:WU00:FS00:0xa8:Has Battery: false
14:38:38:WU00:FS00:0xa8: On Battery: false
14:38:38:WU00:FS00:0xa8: UTC Offset: -8
14:38:38:WU00:FS00:0xa8:        PID: 10106
14:38:38:WU00:FS00:0xa8:        CWD: /Library/Application Support/FAHClient/work
14:38:38:WU00:FS00:0xa8:********************************************************************************
14:38:38:WU00:FS00:0xa8:Project: 16926 (Run 49, Clone 737, Gen 6)
14:38:38:WU00:FS00:0xa8:Unit: 0x000000078120d1cc00000000003102e1
14:38:38:WU00:FS00:0xa8:Reading tar file core.xml
14:38:38:WU00:FS00:0xa8:Reading tar file frame6.tpr
14:38:38:WU00:FS00:0xa8:Digital signatures verified
14:38:38:WU00:FS00:0xa8:Calling: mdrun -c frame6.gro -s frame6.tpr -x frame6.xtc -cpt 15 -nt 7 -ntmpi 1
14:38:38:WU00:FS00:0xa8:Steps: first=0 total=0
14:48:40:WU00:FS00:0xa8:Watchdog triggered, requesting soft shutdown down
14:58:39:WU00:FS00:0xa8:Watchdog shutdown failed, hard shutdown triggered
14:58:40:WARNING:WU00:FS00:FahCore returned: WU_STALLED (127 = 0x7f)
14:58:40:WU00:FS00:Starting
14:58:40:WU00:FS00:Running FahCore: /usr/local/bin/FAHCoreWrapper "/Library/Application Support/FAHClient/cores/cores.foldingathome.org/osx/64bit-avx2-256/a8-0.0.9/Core_a8.fah/FahCore_a8" -dir 00 -suffix 01 -version 706 -lifeline 48 -checkpoint 15 -np 7
14:58:40:WU00:FS00:Started FahCore on PID 10167
14:58:40:WU00:FS00:Core PID:10168
14:58:40:WU00:FS00:FahCore 0xa8 started
14:58:40:WU00:FS00:0xa8:*********************** Log Started 2020-12-16T14:58:40Z ***********************
14:58:40:WU00:FS00:0xa8:************************** Gromacs Folding@home Core ***************************
14:58:40:WU00:FS00:0xa8:       Core: Gromacs
14:58:40:WU00:FS00:0xa8:       Type: 0xa8
14:58:40:WU00:FS00:0xa8:    Version: 0.0.9
14:58:40:WU00:FS00:0xa8:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
14:58:40:WU00:FS00:0xa8:  Copyright: 2020 foldingathome.org
14:58:40:WU00:FS00:0xa8:   Homepage: https://foldingathome.org/
14:58:40:WU00:FS00:0xa8:       Date: Oct 29 2020
14:58:40:WU00:FS00:0xa8:       Time: 13:33:44
14:58:40:WU00:FS00:0xa8:   Revision: a2332c71664fb4eb279de78f77799f38b2fc0696
14:58:40:WU00:FS00:0xa8:     Branch: master
14:58:40:WU00:FS00:0xa8:   Compiler: GNU Apple LLVM 12.0.0 (clang-1200.0.32.2)
14:58:40:WU00:FS00:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -stdlib=libc++ -O3
14:58:40:WU00:FS00:0xa8:             -funroll-loops -fno-pie -mmacosx-version-min=10.7
14:58:40:WU00:FS00:0xa8:   Platform: darwin 19.6.0
14:58:40:WU00:FS00:0xa8:       Bits: 64
14:58:40:WU00:FS00:0xa8:       Mode: Release
14:58:40:WU00:FS00:0xa8:       SIMD: avx2_256
14:58:40:WU00:FS00:0xa8:     OpenMP: ON
14:58:40:WU00:FS00:0xa8:       CUDA: OFF
14:58:40:WU00:FS00:0xa8:       Args: -dir 00 -suffix 01 -version 706 -lifeline 10167 -checkpoint 15 -np
14:58:40:WU00:FS00:0xa8:             7
14:58:40:WU00:FS00:0xa8:************************************ libFAH ************************************
14:58:40:WU00:FS00:0xa8:       Date: Oct 29 2020
14:58:40:WU00:FS00:0xa8:       Time: 13:29:34
14:58:40:WU00:FS00:0xa8:   Revision: a2332c71664fb4eb279de78f77799f38b2fc0696
14:58:40:WU00:FS00:0xa8:     Branch: master
14:58:40:WU00:FS00:0xa8:   Compiler: GNU Apple LLVM 12.0.0 (clang-1200.0.32.2)
14:58:40:WU00:FS00:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -stdlib=libc++ -O3
14:58:40:WU00:FS00:0xa8:             -funroll-loops -fno-pie -mmacosx-version-min=10.7
14:58:40:WU00:FS00:0xa8:   Platform: darwin 19.6.0
14:58:40:WU00:FS00:0xa8:       Bits: 64
14:58:40:WU00:FS00:0xa8:       Mode: Release
14:58:40:WU00:FS00:0xa8:************************************ CBang *************************************
14:58:40:WU00:FS00:0xa8:       Date: Oct 29 2020
14:58:40:WU00:FS00:0xa8:       Time: 13:28:52
14:58:40:WU00:FS00:0xa8:   Revision: a2332c71664fb4eb279de78f77799f38b2fc0696
14:58:40:WU00:FS00:0xa8:     Branch: master
14:58:40:WU00:FS00:0xa8:   Compiler: GNU Apple LLVM 12.0.0 (clang-1200.0.32.2)
14:58:40:WU00:FS00:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -stdlib=libc++ -O3
14:58:40:WU00:FS00:0xa8:             -funroll-loops -fno-pie -mmacosx-version-min=10.7 -fPIC
14:58:40:WU00:FS00:0xa8:   Platform: darwin 19.6.0
14:58:40:WU00:FS00:0xa8:       Bits: 64
14:58:40:WU00:FS00:0xa8:       Mode: Release
14:58:40:WU00:FS00:0xa8:************************************ System ************************************
14:58:40:WU00:FS00:0xa8:        CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
14:58:40:WU00:FS00:0xa8:     CPU ID: GenuineIntel Family 6 Model 60 Stepping 3
14:58:40:WU00:FS00:0xa8:       CPUs: 8
14:58:40:WU00:FS00:0xa8:     Memory: 32.00GiB
14:58:40:WU00:FS00:0xa8:Free Memory: 7.68GiB
14:58:40:WU00:FS00:0xa8:    Threads: POSIX_THREADS
14:58:40:WU00:FS00:0xa8: OS Version: 10.14
14:58:40:WU00:FS00:0xa8:Has Battery: false
14:58:40:WU00:FS00:0xa8: On Battery: false
14:58:40:WU00:FS00:0xa8: UTC Offset: -8
14:58:40:WU00:FS00:0xa8:        PID: 10168
14:58:40:WU00:FS00:0xa8:        CWD: /Library/Application Support/FAHClient/work
14:58:40:WU00:FS00:0xa8:********************************************************************************
14:58:40:WU00:FS00:0xa8:Project: 16926 (Run 49, Clone 737, Gen 6)
14:58:40:WU00:FS00:0xa8:Unit: 0x000000078120d1cc00000000003102e1
14:58:40:WU00:FS00:0xa8:Reading tar file core.xml
14:58:40:WU00:FS00:0xa8:Reading tar file frame6.tpr
14:58:40:WU00:FS00:0xa8:Digital signatures verified
14:58:40:WU00:FS00:0xa8:Calling: mdrun -c frame6.gro -s frame6.tpr -x frame6.xtc -cpt 15 -nt 7 -ntmpi 1
14:58:40:WU00:FS00:0xa8:Steps: first=0 total=0
15:08:41:WU00:FS00:0xa8:Watchdog triggered, requesting soft shutdown down
15:18:41:WU00:FS00:0xa8:Watchdog shutdown failed, hard shutdown triggered
15:18:42:WARNING:WU00:FS00:FahCore returned: WU_STALLED (127 = 0x7f)
15:18:42:WU00:FS00:Starting
15:18:42:WU00:FS00:Running FahCore: /usr/local/bin/FAHCoreWrapper "/Library/Application Support/FAHClient/cores/cores.foldingathome.org/osx/64bit-avx2-256/a8-0.0.9/Core_a8.fah/FahCore_a8" -dir 00 -suffix 01 -version 706 -lifeline 48 -checkpoint 15 -np 7
15:18:42:WU00:FS00:Started FahCore on PID 10204
15:18:42:WU00:FS00:Core PID:10205
15:18:42:WU00:FS00:FahCore 0xa8 started
15:18:42:WU00:FS00:0xa8:*********************** Log Started 2020-12-16T15:18:42Z ***********************
15:18:42:WU00:FS00:0xa8:************************** Gromacs Folding@home Core ***************************
15:18:42:WU00:FS00:0xa8:       Core: Gromacs
15:18:42:WU00:FS00:0xa8:       Type: 0xa8
15:18:42:WU00:FS00:0xa8:    Version: 0.0.9
15:18:42:WU00:FS00:0xa8:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
15:18:42:WU00:FS00:0xa8:  Copyright: 2020 foldingathome.org
15:18:42:WU00:FS00:0xa8:   Homepage: https://foldingathome.org/
15:18:42:WU00:FS00:0xa8:       Date: Oct 29 2020
15:18:42:WU00:FS00:0xa8:       Time: 13:33:44
15:18:42:WU00:FS00:0xa8:   Revision: a2332c71664fb4eb279de78f77799f38b2fc0696
15:18:42:WU00:FS00:0xa8:     Branch: master
15:18:42:WU00:FS00:0xa8:   Compiler: GNU Apple LLVM 12.0.0 (clang-1200.0.32.2)
15:18:42:WU00:FS00:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -stdlib=libc++ -O3
15:18:42:WU00:FS00:0xa8:             -funroll-loops -fno-pie -mmacosx-version-min=10.7
15:18:42:WU00:FS00:0xa8:   Platform: darwin 19.6.0
15:18:42:WU00:FS00:0xa8:       Bits: 64
15:18:42:WU00:FS00:0xa8:       Mode: Release
15:18:42:WU00:FS00:0xa8:       SIMD: avx2_256
15:18:42:WU00:FS00:0xa8:     OpenMP: ON
15:18:42:WU00:FS00:0xa8:       CUDA: OFF
15:18:42:WU00:FS00:0xa8:       Args: -dir 00 -suffix 01 -version 706 -lifeline 10204 -checkpoint 15 -np
15:18:42:WU00:FS00:0xa8:             7
15:18:42:WU00:FS00:0xa8:************************************ libFAH ************************************
15:18:42:WU00:FS00:0xa8:       Date: Oct 29 2020
15:18:42:WU00:FS00:0xa8:       Time: 13:29:34
15:18:42:WU00:FS00:0xa8:   Revision: a2332c71664fb4eb279de78f77799f38b2fc0696
15:18:42:WU00:FS00:0xa8:     Branch: master
15:18:42:WU00:FS00:0xa8:   Compiler: GNU Apple LLVM 12.0.0 (clang-1200.0.32.2)
15:18:42:WU00:FS00:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -stdlib=libc++ -O3
15:18:42:WU00:FS00:0xa8:             -funroll-loops -fno-pie -mmacosx-version-min=10.7
15:18:42:WU00:FS00:0xa8:   Platform: darwin 19.6.0
15:18:42:WU00:FS00:0xa8:       Bits: 64
15:18:42:WU00:FS00:0xa8:       Mode: Release
15:18:42:WU00:FS00:0xa8:************************************ CBang *************************************
15:18:42:WU00:FS00:0xa8:       Date: Oct 29 2020
15:18:42:WU00:FS00:0xa8:       Time: 13:28:52
15:18:42:WU00:FS00:0xa8:   Revision: a2332c71664fb4eb279de78f77799f38b2fc0696
15:18:42:WU00:FS00:0xa8:     Branch: master
15:18:42:WU00:FS00:0xa8:   Compiler: GNU Apple LLVM 12.0.0 (clang-1200.0.32.2)
15:18:42:WU00:FS00:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -stdlib=libc++ -O3
15:18:42:WU00:FS00:0xa8:             -funroll-loops -fno-pie -mmacosx-version-min=10.7 -fPIC
15:18:42:WU00:FS00:0xa8:   Platform: darwin 19.6.0
15:18:42:WU00:FS00:0xa8:       Bits: 64
15:18:42:WU00:FS00:0xa8:       Mode: Release
15:18:42:WU00:FS00:0xa8:************************************ System ************************************
15:18:42:WU00:FS00:0xa8:        CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
15:18:42:WU00:FS00:0xa8:     CPU ID: GenuineIntel Family 6 Model 60 Stepping 3
15:18:42:WU00:FS00:0xa8:       CPUs: 8
15:18:42:WU00:FS00:0xa8:     Memory: 32.00GiB
15:18:42:WU00:FS00:0xa8:Free Memory: 7.68GiB
15:18:42:WU00:FS00:0xa8:    Threads: POSIX_THREADS
15:18:42:WU00:FS00:0xa8: OS Version: 10.14
15:18:42:WU00:FS00:0xa8:Has Battery: false
15:18:42:WU00:FS00:0xa8: On Battery: false
15:18:42:WU00:FS00:0xa8: UTC Offset: -8
15:18:42:WU00:FS00:0xa8:        PID: 10205
15:18:42:WU00:FS00:0xa8:        CWD: /Library/Application Support/FAHClient/work
15:18:42:WU00:FS00:0xa8:********************************************************************************
15:18:42:WU00:FS00:0xa8:Project: 16926 (Run 49, Clone 737, Gen 6)
15:18:42:WU00:FS00:0xa8:Unit: 0x000000078120d1cc00000000003102e1
15:18:42:WU00:FS00:0xa8:Reading tar file core.xml
15:18:42:WU00:FS00:0xa8:Reading tar file frame6.tpr
15:18:42:WU00:FS00:0xa8:Digital signatures verified
15:18:42:WU00:FS00:0xa8:Calling: mdrun -c frame6.gro -s frame6.tpr -x frame6.xtc -cpt 15 -nt 7 -ntmpi 1
15:18:42:WU00:FS00:0xa8:Steps: first=0 total=0
15:28:43:WU00:FS00:0xa8:Watchdog triggered, requesting soft shutdown down
15:38:43:WU00:FS00:0xa8:Watchdog shutdown failed, hard shutdown triggered
15:38:44:WARNING:WU00:FS00:FahCore returned: WU_STALLED (127 = 0x7f)
15:38:44:WU00:FS00:Starting
15:38:44:WU00:FS00:Running FahCore: /usr/local/bin/FAHCoreWrapper "/Library/Application Support/FAHClient/cores/cores.foldingathome.org/osx/64bit-avx2-256/a8-0.0.9/Core_a8.fah/FahCore_a8" -dir 00 -suffix 01 -version 706 -lifeline 48 -checkpoint 15 -np 7
15:38:44:WU00:FS00:Started FahCore on PID 10240
15:38:44:WU00:FS00:Core PID:10242
15:38:44:WU00:FS00:FahCore 0xa8 started
15:38:44:WU00:FS00:0xa8:*********************** Log Started 2020-12-16T15:38:44Z ***********************
15:38:44:WU00:FS00:0xa8:************************** Gromacs Folding@home Core ***************************
15:38:44:WU00:FS00:0xa8:       Core: Gromacs
15:38:44:WU00:FS00:0xa8:       Type: 0xa8
15:38:44:WU00:FS00:0xa8:    Version: 0.0.9
15:38:44:WU00:FS00:0xa8:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
15:38:44:WU00:FS00:0xa8:  Copyright: 2020 foldingathome.org
15:38:44:WU00:FS00:0xa8:   Homepage: https://foldingathome.org/
15:38:44:WU00:FS00:0xa8:       Date: Oct 29 2020
15:38:44:WU00:FS00:0xa8:       Time: 13:33:44
15:38:44:WU00:FS00:0xa8:   Revision: a2332c71664fb4eb279de78f77799f38b2fc0696
15:38:44:WU00:FS00:0xa8:     Branch: master
15:38:44:WU00:FS00:0xa8:   Compiler: GNU Apple LLVM 12.0.0 (clang-1200.0.32.2)
15:38:44:WU00:FS00:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -stdlib=libc++ -O3
15:38:44:WU00:FS00:0xa8:             -funroll-loops -fno-pie -mmacosx-version-min=10.7
15:38:44:WU00:FS00:0xa8:   Platform: darwin 19.6.0
15:38:44:WU00:FS00:0xa8:       Bits: 64
15:38:44:WU00:FS00:0xa8:       Mode: Release
15:38:44:WU00:FS00:0xa8:       SIMD: avx2_256
15:38:44:WU00:FS00:0xa8:     OpenMP: ON
15:38:44:WU00:FS00:0xa8:       CUDA: OFF
15:38:44:WU00:FS00:0xa8:       Args: -dir 00 -suffix 01 -version 706 -lifeline 10240 -checkpoint 15 -np
15:38:44:WU00:FS00:0xa8:             7
15:38:44:WU00:FS00:0xa8:************************************ libFAH ************************************
15:38:44:WU00:FS00:0xa8:       Date: Oct 29 2020
15:38:44:WU00:FS00:0xa8:       Time: 13:29:34
15:38:44:WU00:FS00:0xa8:   Revision: a2332c71664fb4eb279de78f77799f38b2fc0696
15:38:44:WU00:FS00:0xa8:     Branch: master
15:38:44:WU00:FS00:0xa8:   Compiler: GNU Apple LLVM 12.0.0 (clang-1200.0.32.2)
15:38:44:WU00:FS00:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -stdlib=libc++ -O3
15:38:44:WU00:FS00:0xa8:             -funroll-loops -fno-pie -mmacosx-version-min=10.7
15:38:44:WU00:FS00:0xa8:   Platform: darwin 19.6.0
15:38:44:WU00:FS00:0xa8:       Bits: 64
15:38:44:WU00:FS00:0xa8:       Mode: Release
15:38:44:WU00:FS00:0xa8:************************************ CBang *************************************
15:38:44:WU00:FS00:0xa8:       Date: Oct 29 2020
15:38:44:WU00:FS00:0xa8:       Time: 13:28:52
15:38:44:WU00:FS00:0xa8:   Revision: a2332c71664fb4eb279de78f77799f38b2fc0696
15:38:44:WU00:FS00:0xa8:     Branch: master
15:38:44:WU00:FS00:0xa8:   Compiler: GNU Apple LLVM 12.0.0 (clang-1200.0.32.2)
15:38:44:WU00:FS00:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -stdlib=libc++ -O3
15:38:44:WU00:FS00:0xa8:             -funroll-loops -fno-pie -mmacosx-version-min=10.7 -fPIC
15:38:44:WU00:FS00:0xa8:   Platform: darwin 19.6.0
15:38:44:WU00:FS00:0xa8:       Bits: 64
15:38:44:WU00:FS00:0xa8:       Mode: Release
15:38:44:WU00:FS00:0xa8:************************************ System ************************************
15:38:44:WU00:FS00:0xa8:        CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
15:38:44:WU00:FS00:0xa8:     CPU ID: GenuineIntel Family 6 Model 60 Stepping 3
15:38:44:WU00:FS00:0xa8:       CPUs: 8
15:38:44:WU00:FS00:0xa8:     Memory: 32.00GiB
15:38:44:WU00:FS00:0xa8:Free Memory: 7.68GiB
15:38:44:WU00:FS00:0xa8:    Threads: POSIX_THREADS
15:38:44:WU00:FS00:0xa8: OS Version: 10.14
15:38:44:WU00:FS00:0xa8:Has Battery: false
15:38:44:WU00:FS00:0xa8: On Battery: false
15:38:44:WU00:FS00:0xa8: UTC Offset: -8
15:38:44:WU00:FS00:0xa8:        PID: 10242
15:38:44:WU00:FS00:0xa8:        CWD: /Library/Application Support/FAHClient/work
15:38:44:WU00:FS00:0xa8:********************************************************************************
15:38:44:WU00:FS00:0xa8:Project: 16926 (Run 49, Clone 737, Gen 6)
15:38:44:WU00:FS00:0xa8:Unit: 0x000000078120d1cc00000000003102e1
15:38:44:WU00:FS00:0xa8:Reading tar file core.xml
15:38:44:WU00:FS00:0xa8:Reading tar file frame6.tpr
15:38:44:WU00:FS00:0xa8:Digital signatures verified
15:38:44:WU00:FS00:0xa8:Calling: mdrun -c frame6.gro -s frame6.tpr -x frame6.xtc -cpt 15 -nt 7 -ntmpi 1
15:38:44:WU00:FS00:0xa8:Steps: first=0 total=0
15:48:46:WU00:FS00:0xa8:Watchdog triggered, requesting soft shutdown down
15:58:45:WU00:FS00:0xa8:Watchdog shutdown failed, hard shutdown triggered
15:58:45:WARNING:WU00:FS00:FahCore returned: WU_STALLED (127 = 0x7f)
15:58:45:WARNING:WU00:FS00:Too many errors, failing
15:58:45:WU00:FS00:Sending unit results: id:00 state:SEND error:FAILED project:16926 run:49 clone:737 gen:6 core:0xa8 unit:0x000000078120d1cc00000000003102e1
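
The pattern in the log above (the same unit restarting and returning WU_STALLED until the client gives up) can be picked out of a long log with a few lines of Python. This is only a sketch: it assumes the "FahCore returned: STATUS (N = 0xNN)" line format shown in this thread, and the function names and the default threshold are illustrative choices, not anything defined by FAHClient.

```python
import re
from collections import Counter

# Matches FAHClient log lines of the form seen in this thread, e.g.
#   14:38:37:WARNING:WU00:FS00:FahCore returned: WU_STALLED (127 = 0x7f)
#   16:31:56:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
RETURNED = re.compile(
    r"(WU\d+):(FS\d+):FahCore returned: (\w+) \((\d+) = 0x[0-9a-fA-F]+\)"
)


def count_core_returns(lines):
    """Count FahCore return statuses per (work unit, slot, status)."""
    counts = Counter()
    for line in lines:
        match = RETURNED.search(line)
        if match:
            wu, slot, status, _code = match.groups()
            counts[(wu, slot, status)] += 1
    return counts


def repeated_failures(lines, threshold=2):
    """Return the (wu, slot, status) keys seen at least `threshold` times."""
    counts = count_core_returns(lines)
    return sorted(key for key, n in counts.items() if n >= threshold)
```

Calling `repeated_failures(open("log.txt"))` on a saved copy of the log (the file name is just an example) would list each work unit and slot that keeps returning the same status, which also catches the INTERRUPTED loop reported earlier in the thread.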
rkv_2401
Posts: 6
Joined: Thu Nov 18, 2021 5:20 am

Re: Watchdog shutdowns on 16926 WUs

Post by rkv_2401 »

I have a 'watchdog triggered, soft shutdown' error on 18201. It's still "running" to 100% in accordance with the time it should take, but it's using no GPU and barely any CPU.