13434/13435 Stalled Work Units

Moderators: Site Moderators, FAHC Science Team

Dotious
Posts: 4
Joined: Sat Dec 12, 2020 1:00 am

Post by Dotious »

I have received 3 WUs for projects 13434 and 13435 over the last couple of days. All of them eventually completed, but each one threw a "WU_STALLED" flag 2-3 times per simulation. I haven't seen this on any other work units over the last few months. These WUs were all assigned to my RTX 2070 Super.

13434 (248, 1, 28)
13434 (104, 3, 45)
13435 (348, 1, 21)

Code:

18:48:28:WU00:FS01:0x22:Completed 3650000 out of 5000000 steps (73%)
19:00:47:WU00:FS01:0x22:Watchdog triggered, requesting soft shutdown down
19:10:47:WU00:FS01:0x22:Watchdog shutdown failed, hard shutdown triggered
19:10:47:WARNING:WU00:FS01:FahCore returned an unknown error code which probably indicates that it crashed
19:10:47:WARNING:WU00:FS01:FahCore returned: WU_STALLED (127 = 0x7f)
19:10:47:WU00:FS01:Starting
19:10:47:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\user\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/win/64bit/22-0.0.13/Core_22.fah/FahCore_22.exe -dir 00 -suffix 01 -version 706 -lifeline 3568 -checkpoint 30 -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor nvidia -gpu 0 -gpu-usage 100
19:10:47:WU00:FS01:Started FahCore on PID 11696
19:10:47:WU00:FS01:Core PID:3988
19:10:47:WU00:FS01:FahCore 0x22 started
19:10:48:WU00:FS01:0x22:*********************** Log Started 2021-01-11T19:10:47Z ***********************
19:10:48:WU00:FS01:0x22:*************************** Core22 Folding@home Core ***************************
19:10:48:WU00:FS01:0x22:       Core: Core22
19:10:48:WU00:FS01:0x22:       Type: 0x22
19:10:48:WU00:FS01:0x22:    Version: 0.0.13
19:10:48:WU00:FS01:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
19:10:48:WU00:FS01:0x22:  Copyright: 2020 foldingathome.org
19:10:48:WU00:FS01:0x22:   Homepage: https://foldingathome.org/
19:10:48:WU00:FS01:0x22:       Date: Sep 19 2020
19:10:48:WU00:FS01:0x22:       Time: 02:35:58
19:10:48:WU00:FS01:0x22:   Revision: 571cf95de6de2c592c7c3ed48fcfb2e33e9ea7d3
19:10:48:WU00:FS01:0x22:     Branch: core22-0.0.13
19:10:48:WU00:FS01:0x22:   Compiler: Visual C++ 2015
19:10:48:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
19:10:48:WU00:FS01:0x22:             -DOPENMM_GIT_HASH="\"189320d0\""
19:10:48:WU00:FS01:0x22:   Platform: win32 10
19:10:48:WU00:FS01:0x22:       Bits: 64
19:10:48:WU00:FS01:0x22:       Mode: Release
19:10:48:WU00:FS01:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
19:10:48:WU00:FS01:0x22:             <peastman@stanford.edu>
19:10:48:WU00:FS01:0x22:       Args: -dir 00 -suffix 01 -version 706 -lifeline 11696 -checkpoint 30
19:10:48:WU00:FS01:0x22:             -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor
19:10:48:WU00:FS01:0x22:             nvidia -gpu 0 -gpu-usage 100
19:10:48:WU00:FS01:0x22:************************************ libFAH ************************************
19:10:48:WU00:FS01:0x22:       Date: Sep 7 2020
19:10:48:WU00:FS01:0x22:       Time: 19:09:56
19:10:48:WU00:FS01:0x22:   Revision: 44301ed97b996b63fe736bb8073f22209cb2b603
19:10:48:WU00:FS01:0x22:     Branch: HEAD
19:10:48:WU00:FS01:0x22:   Compiler: Visual C++ 2015
19:10:48:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
19:10:48:WU00:FS01:0x22:   Platform: win32 10
19:10:48:WU00:FS01:0x22:       Bits: 64
19:10:48:WU00:FS01:0x22:       Mode: Release
19:10:48:WU00:FS01:0x22:************************************ CBang *************************************
19:10:48:WU00:FS01:0x22:       Date: Sep 7 2020
19:10:48:WU00:FS01:0x22:       Time: 19:08:30
19:10:48:WU00:FS01:0x22:   Revision: 33fcfc2b3ed2195a423606a264718e31e6b3903f
19:10:48:WU00:FS01:0x22:     Branch: HEAD
19:10:48:WU00:FS01:0x22:   Compiler: Visual C++ 2015
19:10:48:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
19:10:48:WU00:FS01:0x22:   Platform: win32 10
19:10:48:WU00:FS01:0x22:       Bits: 64
19:10:48:WU00:FS01:0x22:       Mode: Release
19:10:48:WU00:FS01:0x22:************************************ System ************************************
19:10:48:WU00:FS01:0x22:        CPU: Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
19:10:48:WU00:FS01:0x22:     CPU ID: GenuineIntel Family 6 Model 158 Stepping 10
19:10:48:WU00:FS01:0x22:       CPUs: 12
19:10:48:WU00:FS01:0x22:     Memory: 15.94GiB
19:10:48:WU00:FS01:0x22:Free Memory: 9.41GiB
19:10:48:WU00:FS01:0x22:    Threads: WINDOWS_THREADS
19:10:48:WU00:FS01:0x22: OS Version: 6.2
19:10:48:WU00:FS01:0x22:Has Battery: false
19:10:48:WU00:FS01:0x22: On Battery: false
19:10:48:WU00:FS01:0x22: UTC Offset: -8
19:10:48:WU00:FS01:0x22:        PID: 3988
19:10:48:WU00:FS01:0x22:        CWD: C:\Users\user\AppData\Roaming\FAHClient\work
19:10:48:WU00:FS01:0x22:************************************ OpenMM ************************************
19:10:48:WU00:FS01:0x22:   Revision: 189320d0
19:10:48:WU00:FS01:0x22:********************************************************************************
19:10:48:WU00:FS01:0x22:Project: 13434 (Run 104, Clone 3, Gen 45)
19:10:48:WU00:FS01:0x22:Unit: 0x00000000000000000000000000000000
19:10:48:WU00:FS01:0x22:Digital signatures verified
19:10:48:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
19:10:48:WU00:FS01:0x22:Version 0.0.13
19:10:48:WU00:FS01:0x22:  Checkpoint write interval: 250000 steps (5%) [20 total]
19:10:48:WU00:FS01:0x22:  JSON viewer frame write interval: 50000 steps (1%) [100 total]
19:10:48:WU00:FS01:0x22:  XTC frame write interval: 250000 steps (5%) [20 total]
19:10:48:WU00:FS01:0x22:  Global context and integrator variables write interval: disabled
19:10:48:WU00:FS01:0x22:There are 4 platforms available.
19:10:48:WU00:FS01:0x22:Platform 0: Reference
19:10:48:WU00:FS01:0x22:Platform 1: CPU
19:10:48:WU00:FS01:0x22:Platform 2: OpenCL
19:10:48:WU00:FS01:0x22:  opencl-device 0 specified
19:10:48:WU00:FS01:0x22:Platform 3: CUDA
19:10:48:WU00:FS01:0x22:  cuda-device 0 specified
19:10:54:WU00:FS01:0x22:Attempting to create CUDA context:
19:10:54:WU00:FS01:0x22:  Configuring platform CUDA
19:10:56:WU00:FS01:0x22:  Using CUDA and gpu 0
19:10:56:WU00:FS01:0x22:Completed 3500000 out of 5000000 steps (70%)
19:13:35:WU00:FS01:0x22:Completed 3550000 out of 5000000 steps (71%)
19:16:15:WU00:FS01:0x22:Completed 3600000 out of 5000000 steps (72%)
19:18:54:WU00:FS01:0x22:Completed 3650000 out of 5000000 steps (73%)
19:21:34:WU00:FS01:0x22:Completed 3700000 out of 5000000 steps (74%)
19:24:13:WU00:FS01:0x22:Completed 3750000 out of 5000000 steps (75%)
19:24:14:WU00:FS01:0x22:Checkpoint completed at step 3750000
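
For anyone who wants to count these stalls without reading the whole log, here is a minimal Python sketch. It assumes the two line formats seen in the excerpt above (the `Completed N out of M steps` progress lines and the `FahCore returned: WU_STALLED` warning). The names `find_stalls` and `elapsed` are made up for this example and are not part of FAHClient. Log timestamps carry no date, so the sketch assumes at most one midnight rollover between two stamps, and it does not try to notice when a new WU reuses the same WU/slot ID.

```python
import re
from datetime import datetime, timedelta

# Progress line, e.g.
# "18:48:28:WU00:FS01:0x22:Completed 3650000 out of 5000000 steps (73%)"
PROGRESS_RE = re.compile(
    r"^(\d\d:\d\d:\d\d):(WU\d+):(FS\d+):0x[0-9a-fA-F]+:Completed (\d+) out of \d+ steps"
)
# Stall warning, e.g.
# "19:10:47:WARNING:WU00:FS01:FahCore returned: WU_STALLED (127 = 0x7f)"
STALL_RE = re.compile(
    r"^(\d\d:\d\d:\d\d):WARNING:(WU\d+):(FS\d+):FahCore returned: WU_STALLED"
)


def elapsed(start, end):
    """Time between two HH:MM:SS stamps (assumes at most one midnight rollover)."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta if delta >= timedelta(0) else delta + timedelta(days=1)


def find_stalls(lines):
    """Return one record per WU_STALLED warning.

    "lost" is the time between the last progress line before the stall and
    the first progress line that reaches the same step count again; it stays
    None if the log ends before progress is regained.
    """
    stalls = []
    last_progress = {}  # (wu, slot) -> (timestamp, steps) of the latest progress line
    pending = {}        # (wu, slot) -> stall record still waiting to catch up
    for raw in lines:
        line = raw.strip()
        m = PROGRESS_RE.match(line)
        if m:
            ts, wu, slot, steps = m.group(1), m.group(2), m.group(3), int(m.group(4))
            key = (wu, slot)
            rec = pending.get(key)
            if rec is not None and steps >= rec["steps_before"]:
                rec["recovered_at"] = ts
                rec["lost"] = elapsed(rec["last_progress_at"], ts)
                del pending[key]
            last_progress[key] = (ts, steps)
            continue
        m = STALL_RE.match(line)
        if m and (m.group(2), m.group(3)) in last_progress:
            key = (m.group(2), m.group(3))
            prev_ts, prev_steps = last_progress[key]
            rec = {
                "wu": key[0],
                "slot": key[1],
                "stalled_at": m.group(1),
                "last_progress_at": prev_ts,
                "steps_before": prev_steps,
                "recovered_at": None,
                "lost": None,
            }
            stalls.append(rec)
            pending[key] = rec
    return stalls
```

Calling `find_stalls(open(path))` on a saved log yields the records. For the excerpt above it should report a single stall on WU00/FS01 at 19:10:47, with the 73% mark regained at 19:18:54, roughly 30 minutes after the last progress line at 18:48:28.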

Code:

18:38:46:******************************* System ********************************
18:38:46:            CPU: Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
18:38:46:         CPU ID: GenuineIntel Family 6 Model 158 Stepping 10
18:38:46:           CPUs: 12
18:38:46:         Memory: 15.94GiB
18:38:46:    Free Memory: 9.28GiB
18:38:46:        Threads: WINDOWS_THREADS
18:38:46:     OS Version: 6.2
18:38:46:    Has Battery: false
18:38:46:     On Battery: false
18:38:46:     UTC Offset: -8
18:38:46:            PID: 3568
18:38:46:            CWD: C:\Users\user\AppData\Roaming\FAHClient
18:38:46:  Win32 Service: false
18:38:46:             OS: Windows 10 Enterprise
18:38:46:        OS Arch: AMD64
18:38:46:           GPUs: 2
18:38:46:          GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:8 TU104 [GeForce RTX 2070 SUPER]
18:38:46:                 8218
18:38:46:          GPU 1: Bus:2 Slot:0 Func:0 NVIDIA:5 GM204 [GeForce GTX 970] 3494
18:38:46:  CUDA Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:7.5 Driver:11.2
18:38:46:  CUDA Device 1: Platform:0 Device:1 Bus:2 Slot:0 Compute:5.2 Driver:11.2
18:38:46:OpenCL Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:1.2 Driver:461.9
18:38:46:OpenCL Device 1: Platform:0 Device:1 Bus:2 Slot:0 Compute:1.2 Driver:461.9
18:38:46:***********************************************************************
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud
Contact:

Re: 13434/13435 Stalled Work Units

Post by PantherX »

Welcome to the F@H Forum Dotious,

Do you know what you (or your system) were doing just before you encountered the "WU_STALLED (127 = 0x7f)" error? Is this something that you can reproduce on demand? If yes, what exact steps do you take?
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
Dotious
Posts: 4
Joined: Sat Dec 12, 2020 1:00 am

Re: 13434/13435 Stalled Work Units

Post by Dotious »

Thanks for the welcome, PantherX,

These have all occurred while I was away from the computer, so I'm not sure whether something happened in the background. I haven't tried to reproduce it on demand. I run Folding@home at a throttled power setting (70%) using the EVGA X1 app, which keeps the clock near stock speeds. I've folded for a few months with this power setting and hadn't encountered these stalls until the last few days. These are larger WUs; do you think the throttling might affect them?

I haven't received any more WUs from these projects since I posted. All WUs I have received in the last day have completed without issues. I'll keep an eye on the logs and see if it pops up again.
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am

Re: 13434/13435 Stalled Work Units

Post by PantherX »

Reducing the power target for the GPU shouldn't have any impact on WUs as long as the target is within specifications. I think the GPU would only become unstable if the target were set too low, so anything within the normal range should be fine.
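
One way to check whether a power target is "within specifications" is to compare the enforced limit against the range the driver reports. The sketch below is only an illustration: it assumes `nvidia-smi` is on the PATH and that the target set in a tool such as EVGA X1 shows up as the driver's `power.limit` value. The function names and the sample numbers in the comments are made up for this example.

```python
import subprocess

# Query fields listed by `nvidia-smi --help-query-gpu`.
QUERY = "index,name,power.limit,power.min_limit,power.max_limit,power.default_limit"


def parse_power_rows(csv_text):
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU.

    Rows whose power fields are not numeric (for example "[N/A]") are skipped.
    """
    rows = []
    for line in csv_text.strip().splitlines():
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != 6:
            continue
        try:
            limit, low, high, default = (float(f) for f in fields[2:])
        except ValueError:
            continue
        if default <= 0:
            continue
        rows.append({
            "index": int(fields[0]),
            "name": fields[1],
            "limit_w": limit,
            "within_range": low <= limit <= high,
            "percent_of_default": round(100 * limit / default),
        })
    return rows


def query_power_limits():
    """Run nvidia-smi and return the parsed rows (requires an Nvidia driver)."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_power_rows(result.stdout)


# Illustrative input only (made-up wattages, not measured on any real card):
# parse_power_rows("0, Example GPU, 150.50, 125.00, 260.00, 215.00")
```

If `within_range` is true, the limit sits inside what the driver itself allows, which is the situation described above as fine.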
Dotious
Posts: 4
Joined: Sat Dec 12, 2020 1:00 am

Re: 13434/13435 Stalled Work Units

Post by Dotious »

Looks like I may have been a bit hasty in singling out these projects. Two work units from other projects also threw the "WU_STALLED" flag last night, though I forgot to note which projects. Now I'm thinking it may have been an issue with the latest Nvidia driver (461.09). I reverted to 460.89 last night and haven't had any stalls since. I'll keep an eye on my logs over the next few days.

The good news is that all WUs ended up finishing and returning results, even if they took an extra 30-60 minutes each.
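
To confirm which driver the client actually saw after a rollback, the version can be read from the system block that FAHClient prints at startup. This is a rough Python sketch based on the `OpenCL Device N: ... Driver:X` lines posted earlier in the thread. The function names are made up for this example. Note that the log prints `461.9` while the posts say `461.09`; these look like the same release written two ways, so the sketch compares versions numerically rather than as text.

```python
import re

# Matches e.g. "18:38:46:OpenCL Device 0: Platform:0 Device:0 Bus:1 Slot:0
# Compute:1.2 Driver:461.9" from the client's system block.
OPENCL_RE = re.compile(r"OpenCL Device (\d+):.*\bDriver:(\d+(?:\.\d+)*)")


def version_key(text):
    """Turn "461.09" into (461, 9) so that "461.9" and "461.09" compare equal."""
    return tuple(int(part) for part in text.split("."))


def opencl_driver_versions(lines):
    """Map OpenCL device index -> driver version string as printed in the log."""
    found = {}
    for line in lines:
        m = OPENCL_RE.search(line)
        if m:
            found[int(m.group(1))] = m.group(2)
    return found


def devices_newer_than(lines, baseline):
    """Return the device indexes whose driver is newer than `baseline`."""
    base = version_key(baseline)
    return sorted(
        index
        for index, version in opencl_driver_versions(lines).items()
        if version_key(version) > base
    )
```

For example, `devices_newer_than(open(path), "460.89")` returning an empty list after a restart would suggest the rollback took effect for every device the client detected.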
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am

Re: 13434/13435 Stalled Work Units

Post by PantherX »

Hmm, I really want to update the Nvidia drivers to get the security fix, but they haven't updated the Studio Driver (460.89), so I don't know whether that's a driver issue or not :(

I haven't seen any issues reported with the Game Ready Driver (461.09 WHQL), so at this point I would say it's unlikely. However, you might consider reinstalling the driver with the clean-install option selected. Alternatively, you can use NVCleanstall (https://www.techpowerup.com/download/te ... leanstall/), which I haven't used but have heard good things about.