WU stalled...after it was done?

thecomputerdude
Posts: 6
Joined: Sun Nov 22, 2009 7:27 am


Post by thecomputerdude »

06:04:31:WU01:FS00:0xa7:Completed 249557 out of 250000 steps (99%)
06:25:07:WU01:FS00:0xa7:Completed 250000 out of 250000 steps (100%)
06:35:07:WARNING:WU01:FS00:FahCore returned an unknown error code which probably indicates that it crashed
06:35:07:WARNING:WU01:FS00:FahCore returned: WU_STALLED (127 = 0x7f)
But a simple pause/restart cleared the error (the WU had been left stuck in the queue for a few hours):

09:59:31:FS00:Unpaused
09:59:31:WU01:FS00:Starting
09:59:31:WU01:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\Administrator\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/avx/Core_a7.fah/FahCore_a7.exe -dir 01 -suffix 01 -version 706 -lifeline 5868 -checkpoint 15 -np 24
09:59:31:WU01:FS00:Started FahCore on PID 1612
09:59:31:WU01:FS00:Core PID:7020
09:59:31:WU01:FS00:FahCore 0xa7 started
09:59:32:WU01:FS00:0xa7:*********************** Log Started 2020-04-21T09:59:31Z ***********************
09:59:32:WU01:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
09:59:32:WU01:FS00:0xa7:       Type: 0xa7
09:59:32:WU01:FS00:0xa7:       Core: Gromacs
09:59:32:WU01:FS00:0xa7:       Args: -dir 01 -suffix 01 -version 706 -lifeline 1612 -checkpoint 15 -np
09:59:32:WU01:FS00:0xa7:             24
09:59:32:WU01:FS00:0xa7:************************************ CBang *************************************
09:59:32:WU01:FS00:0xa7:       Date: Oct 26 2019
09:59:32:WU01:FS00:0xa7:       Time: 01:38:25
09:59:32:WU01:FS00:0xa7:   Revision: c46a1a011a24143739ac7218c5a435f66777f62f
09:59:32:WU01:FS00:0xa7:     Branch: master
09:59:32:WU01:FS00:0xa7:   Compiler: Visual C++ 2008
09:59:32:WU01:FS00:0xa7:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
09:59:32:WU01:FS00:0xa7:   Platform: win32 10
09:59:32:WU01:FS00:0xa7:       Bits: 64
09:59:32:WU01:FS00:0xa7:       Mode: Release
09:59:32:WU01:FS00:0xa7:************************************ System ************************************
09:59:32:WU01:FS00:0xa7:        CPU: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
09:59:32:WU01:FS00:0xa7:     CPU ID: GenuineIntel Family 6 Model 63 Stepping 2
09:59:32:WU01:FS00:0xa7:       CPUs: 24
09:59:32:WU01:FS00:0xa7:     Memory: 127.31GiB
09:59:32:WU01:FS00:0xa7:Free Memory: 123.93GiB
09:59:32:WU01:FS00:0xa7:    Threads: WINDOWS_THREADS
09:59:32:WU01:FS00:0xa7: OS Version: 6.2
09:59:32:WU01:FS00:0xa7:Has Battery: false
09:59:32:WU01:FS00:0xa7: On Battery: false
09:59:32:WU01:FS00:0xa7: UTC Offset: -7
09:59:32:WU01:FS00:0xa7:        PID: 7020
09:59:32:WU01:FS00:0xa7:        CWD: C:\Users\Administrator\AppData\Roaming\FAHClient\work
09:59:32:WU01:FS00:0xa7:******************************** Build - libFAH ********************************
09:59:32:WU01:FS00:0xa7:    Version: 0.0.18
09:59:32:WU01:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
09:59:32:WU01:FS00:0xa7:  Copyright: 2019 foldingathome.org
09:59:32:WU01:FS00:0xa7:   Homepage: https://foldingathome.org/
09:59:32:WU01:FS00:0xa7:       Date: Oct 26 2019
09:59:32:WU01:FS00:0xa7:       Time: 01:52:30
09:59:32:WU01:FS00:0xa7:   Revision: c1e3513b1bc0c16013668f2173ee969e5995b38e
09:59:32:WU01:FS00:0xa7:     Branch: master
09:59:32:WU01:FS00:0xa7:   Compiler: Visual C++ 2008
09:59:32:WU01:FS00:0xa7:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
09:59:32:WU01:FS00:0xa7:   Platform: win32 10
09:59:32:WU01:FS00:0xa7:       Bits: 64
09:59:32:WU01:FS00:0xa7:       Mode: Release
09:59:32:WU01:FS00:0xa7:************************************ Build *************************************
09:59:32:WU01:FS00:0xa7:       SIMD: avx_256
09:59:32:WU01:FS00:0xa7:********************************************************************************
09:59:32:WU01:FS00:0xa7:Project: 16802 (Run 5, Clone 290, Gen 3)
09:59:32:WU01:FS00:0xa7:Unit: 0x0000000581d59d695e95d85d111a5faa
09:59:32:WU01:FS00:0xa7:Digital signatures verified
09:59:32:WU01:FS00:0xa7:Calling: mdrun -s frame3.tpr -o frame3.trr -cpi state.cpt -cpt 15 -nt 24
09:59:37:WU01:FS00:0xa7:Steps: first=750000 total=250000
09:59:42:WU01:FS00:0xa7:Completed 250001 out of 250000 steps (100%)
09:59:42:WU01:FS00:0xa7:Saving result file ..\logfile_01.txt
09:59:42:WU01:FS00:0xa7:Saving result file ener.edr
09:59:42:WU01:FS00:0xa7:Saving result file frame3.trr
09:59:42:WU01:FS00:0xa7:Saving result file md.log
09:59:42:WU01:FS00:0xa7:Saving result file science.log
09:59:42:WU01:FS00:0xa7:Folding@home Core Shutdown: FINISHED_UNIT
09:59:43:WU01:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
09:59:43:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:16802 run:5 clone:290 gen:3 core:0xa7 unit:0x0000000581d59d695e95d85d111a5faa
Is this just buggy behavior or something that I can correct? It's my first time seeing it happen (ever), on a new client/machine.
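
For anyone who wants to check their own log for the same pattern, here is a minimal sketch that pulls out the "FahCore returned" lines shown above. The regular expression is inferred only from the two lines in this log, so it is an assumption about the format rather than a documented FAHClient specification, and `core_returns` is a hypothetical helper name.

```python
import re

# Matches lines such as:
#   06:35:07:WARNING:WU01:FS00:FahCore returned: WU_STALLED (127 = 0x7f)
#   09:59:43:WU01:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
# The format is inferred from the log excerpt above (assumption).
RETURN_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):(?:WARNING:)?"
    r"(?P<wu>WU\d+):(?P<slot>FS\d+):FahCore returned: "
    r"(?P<status>\w+) \((?P<code>\d+) = 0x[0-9a-fA-F]+\)$"
)


def core_returns(lines):
    """Return (time, wu, slot, status, code) for each 'FahCore returned:' line."""
    rows = []
    for line in lines:
        m = RETURN_RE.match(line.strip())
        if m:
            rows.append((m["time"], m["wu"], m["slot"], m["status"], int(m["code"])))
    return rows


sample = [
    "06:25:07:WU01:FS00:0xa7:Completed 250000 out of 250000 steps (100%)",
    "06:35:07:WARNING:WU01:FS00:FahCore returned an unknown error code "
    "which probably indicates that it crashed",
    "06:35:07:WARNING:WU01:FS00:FahCore returned: WU_STALLED (127 = 0x7f)",
    "09:59:43:WU01:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)",
]

for row in core_returns(sample):
    print(row)
```

Any status other than FINISHED_UNIT on a WU that had already reached 100% would be the situation described in this post.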
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud

Re: WU stalled...after it was done?

Post by PantherX »

Welcome back thecomputerdude,

Do you have any anti-virus/anti-malware/anti-spyware software that could potentially "stall" the WU? If so, add exceptions for FAH to that security software. I am glad that the WU wasn't lost and that it managed to finish up. Also, it might be worth checking the HDD/SSD for any issues.
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
thecomputerdude
Posts: 6
Joined: Sun Nov 22, 2009 7:27 am

Re: WU stalled...after it was done?

Post by thecomputerdude »

PantherX wrote:Welcome back thecomputerdude,

Do you have any anti-virus/anti-malware/anti-spyware software that could potentially "stall" the WU? If so, add exceptions for FAH to that security software. I am glad that the WU wasn't lost and that it managed to finish up. Also, it might be worth checking the HDD/SSD for any issues.
No AV software installed, running Windows Server 2016 Datacenter (with nothing extra configured over a normal desktop instance). Lenovo IMM is reporting all good on I/O with no controller or drive (Micron S630DC) errors.


And thanks for the welcome, it's been so long that I didn't realize I had an account until I went to make an account again :lol:
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am
Location: Land Of The Long White Cloud

Re: WU stalled...after it was done?

Post by PantherX »

If this was the first time it happened, I wouldn't worry about it too much. However, if it happens again, then it might be something to investigate :)
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: WU stalled...after it was done?

Post by bruce »

PantherX wrote:I am glad that the WU wasn't lost and it managed to finish up.
I am too. This used to be a bug and I complained about it. Apparently it has been fixed.

If the last checkpoint was at, say, 98% and the WU progressed to 100%, the core immediately went into the code that compresses, zips, signs and writes the results for uploading. If an error occurred during those steps, the 98% checkpoint had already been deleted and the WU was LOST. It looks like somebody may have rewritten that part of the process so that a recovery is now possible.
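
To illustrate the ordering issue, here is a minimal sketch of a "package first, delete the checkpoint last" sequence. This is not the actual FahCore code, which I have not seen; `finalize_work_unit` and the `results.zip` name are hypothetical, signing is left out, and only `state.cpt` is taken from the log above.

```python
import os
import tempfile
import zipfile


def finalize_work_unit(work_dir, result_names,
                       checkpoint_name="state.cpt",
                       archive_name="results.zip"):
    """Package the results; delete the checkpoint only after that succeeded."""
    checkpoint = os.path.join(work_dir, checkpoint_name)
    archive = os.path.join(work_dir, archive_name)

    # Build the archive under a temporary name in the same directory, so a
    # failure never leaves a half-written file under the final name.
    fd, tmp_path = tempfile.mkstemp(dir=work_dir, suffix=".tmp")
    os.close(fd)
    try:
        with zipfile.ZipFile(tmp_path, "w", zipfile.ZIP_DEFLATED) as zf:
            for name in result_names:
                zf.write(os.path.join(work_dir, name), arcname=name)
        # Rename into place once the archive is complete.
        os.replace(tmp_path, archive)
    except Exception:
        # Packaging failed: drop the partial archive but KEEP the checkpoint,
        # so the work unit can be resumed and finalized again later.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise

    # Only now is it safe to discard the checkpoint.
    if os.path.exists(checkpoint):
        os.remove(checkpoint)
    return archive
```

With this ordering, an error while packaging leaves the last checkpoint in place, which would match the recovery seen in the log above: the core restarted, found the run already complete, and went straight to saving the result files.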