11403 stuck at 99%

Posted: Sat Nov 19, 2016 3:56 pm
by dschief
I have 11403 (2,9,327) stuck in work queue 01: status = ready, ETA 12 min 9 sec, points at 45960 and dropping.
It's been that way for several hours. It looks like it got that far and then just moved on to the next WU.
The box has two GTX 760s, and they're busy crunching other WUs. This is a new one for me.

Re: 11403 stuck at 99%

Posted: Sat Nov 19, 2016 10:22 pm
by bruce
Please post the applicable portions of your log. We also need to know more about the hardware that FAH detected and what configuration settings are in use.

Code:

******************************* System ********************************
CPU
CPU ID
CPUs: 
Memory: 
Free Memory: 
Threads:
OS Version: 
Has Battery:
On Battery:
UTC Offset: 
PID: 
CWD:
OS: Windows 7 Ultimate
OS Arch:
GPUs: 2
GPU 0: Bus:? Slot:? NVIDIA: [GTX 760]
GPU 1: Bus:? Slot:? NVIDIA: [GTX 760]
CUDA:
***********************************************************************
<config> 
...
...
</config>
Enabled folding slot 00: READY ? ? ?
Enabled folding slot 01: READY ? ? ?
Enabled folding slot 02: READY ? ? ?
***********************************************************************

Which WUs are you talking about?

FYI: Projects (almost) never get stuck at 99%. More often they hit some kind of error earlier, and the progress display keeps advancing until it reaches 99%.
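
One way to look for an earlier error is to filter the log for warning or error lines that belong to the WU and slot in question. The following is only a rough sketch, not an official FAH tool: it assumes log lines shaped like `HH:MM:SS:WARNING:WUnn:FSnn:message`, the `ERROR` level is included as an assumption, and the function name `problems_for` is made up for illustration.

```python
import re

# Assumed line shape: HH:MM:SS:[WARNING:|ERROR:]WUnn:FSnn:message
# Ordinary progress lines have no level and are skipped.
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):"
    r"(?:(?P<level>WARNING|ERROR):)?"
    r"WU(?P<wu>\d+):FS(?P<slot>\d+):(?P<msg>.*)$"
)

def problems_for(lines, wu, slot):
    """Return (time, level, message) for flagged lines of one WU/slot pair."""
    found = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m["level"] and m["wu"] == wu and m["slot"] == slot:
            found.append((m["time"], m["level"], m["msg"]))
    return found

if __name__ == "__main__":
    sample = [
        "11:19:54:WU01:FS01:0x21:Completed 4950000 out of 5000000 steps (99%)",
        "11:31:16:WARNING:WU01:FS01:FahCore returned: UNKNOWN_ENUM (127 = 0x7f)",
    ]
    for time, level, msg in problems_for(sample, "01", "01"):
        print(time, level, msg)
```

Run against the full log, with the WU and slot numbers taken from the stalled queue entry, this lists only the flagged lines so the first failure is easier to spot.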

Re: 11403 stuck at 99%

Posted: Sun Nov 20, 2016 2:04 am
by dschief
One WU; specs are in the first post. It was at 99%, successfully downloaded the next WU, then threw an error.

Code:

11:08:27:WU01:FS01:0x21:Completed 4900000 out of 5000000 steps (98%)
11:10:03:WU00:FS02:0x21:Completed 3550000 out of 5000000 steps (71%)
11:19:54:WU01:FS01:0x21:Completed 4950000 out of 5000000 steps (99%)
11:19:55:WU02:FS01:Connecting to 171.67.108.45:80
11:19:55:WU02:FS01:Assigned to work server 140.163.4.231
11:19:55:WU02:FS01:Requesting new work unit for slot 01: RUNNING gpu:0:GK104 [GeForce GTX 760] from 140.163.4.231
11:19:55:WU02:FS01:Connecting to 140.163.4.231:8080
11:19:56:WU02:FS01:Downloading 16.95MiB
11:20:02:WU02:FS01:Download 10.33%
11:20:08:WU02:FS01:Download 21.02%
11:20:14:WU02:FS01:Download 32.09%
11:20:20:WU02:FS01:Download 42.78%
11:20:26:WU02:FS01:Download 53.11%
11:20:32:WU02:FS01:Download 64.18%
11:20:38:WU02:FS01:Download 74.87%
11:20:44:WU02:FS01:Download 85.94%
11:20:50:WU02:FS01:Download 96.27%
11:20:51:WU02:FS01:Download complete
11:20:52:WU02:FS01:Received Unit: id:02 state:DOWNLOAD error:NO_ERROR project:11710 run:3 clone:133 gen:18 core:0x21 unit:0x0000001c8ca304e75814df4ca4990d45
11:23:27:WU00:FS02:0x21:Completed 3600000 out of 5000000 steps (72%)
11:31:16:WARNING:WU01:FS01:FahCore returned an unknown error code which probably indicates that it crashed
11:31:16:WARNING:WU01:FS01:FahCore returned: UNKNOWN_ENUM (127 = 0x7f)

System info:

Code:

17:28:23:******************************** Build ********************************
17:28:23:      Version: 7.4.4
17:28:23:         Date: Mar 4 2014
17:28:23:         Time: 20:26:54
17:28:23:      SVN Rev: 4130
17:28:23:       Branch: fah/trunk/client
17:28:23:     Compiler: Intel(R) C++ MSVC 1500 mode 1200
17:28:23:      Options: /TP /nologo /EHa /Qdiag-disable:4297,4103,1786,279 /Ox -arch:SSE
17:28:23:               /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qopenmp /Qrestrict /MT /Qmkl
17:28:23:     Platform: win32 XP
17:28:23:         Bits: 32
17:28:23:         Mode: Release
17:28:23:******************************* System ********************************
17:28:23:          CPU: Intel(R) Core(TM)2 CPU 6400 @ 2.13GHz
17:28:23:       CPU ID: GenuineIntel Family 6 Model 15 Stepping 2
17:28:23:         CPUs: 2
17:28:23:       Memory: 4.00GiB
17:28:23:  Free Memory: 2.90GiB
17:28:23:      Threads: WINDOWS_THREADS
17:28:23:   OS Version: 6.1
17:28:23:  Has Battery: false
17:28:23:   On Battery: false
17:28:23:   UTC Offset: -8
17:28:23:          PID: 3672
17:28:23:          CWD: C:/Users/jim/AppData/Roaming/FAHClient
17:28:23:           OS: Windows 7 Professional
17:28:23:      OS Arch: AMD64
17:28:23:         GPUs: 2
17:28:23:        GPU 0: NVIDIA:3 GK104 [GeForce GTX 760]
17:28:23:        GPU 1: NVIDIA:3 GK104 [GeForce GTX 760]
17:28:23:         CUDA: 3.0
17:28:23:  CUDA Driver: 8000
17:28:23:Win32 Service: false
17:28:23:***********************************************************************
17:28:23:<config>
17:28:23:  <!-- Folding Slots -->
17:28:23:</config>
17:28:23:Connecting to assign-GPU.stanford.edu:80
17:28:23:Updated GPUs.txt
17:28:23:Read GPUs.txt
17:28:23:Trying to access database...
17:28:23:Successfully acquired database lock
17:28:23:Enabled folding slot 00: PAUSED cpu:1 (not configured)
17:28:23:Enabled folding slot 01: PAUSED gpu:0:GK104 [GeForce GTX 760] (not configured)
17:28:23:Enabled folding slot 02: PAUSED gpu:1:GK104 [GeForce GTX 760] (not configured)
17:29:24:Saving configuration to config.xml
17:29:24:<config>

Re: 11403 stuck at 99%

Posted: Mon Nov 21, 2016 2:34 am
by bruce
I wouldn't call that "stuck" or "stalled". It does look like an error at the precise time when it started to write the final data.

11:08:27:WU01:FS01:0x21:Completed 4900000 out of 5000000 steps (98%)
11:19:54:WU01:FS01:0x21:Completed 4950000 out of 5000000 steps (99%)
11:31:16:WARNING:WU01:FS01:FahCore returned an unknown error code which probably indicates that it crashed
11:31:16:WARNING:WU01:FS01:FahCore returned: UNKNOWN_ENUM (127 = 0x7f)

11:19:54 - 11:08:27 = 11m 27s
11:31:16 - 11:19:54 = 11m 22s
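
The interval arithmetic above can be checked with a short script. This is only a sketch: it assumes the `HH:MM:SS` prefix seen in the quoted log lines and that both timestamps fall on the same day. The helper names `elapsed` and `fmt_delta` are made up for illustration.

```python
from datetime import datetime, timedelta

def elapsed(start: str, end: str) -> timedelta:
    """Return end - start for two HH:MM:SS log timestamps (same day assumed)."""
    fmt = "%H:%M:%S"
    return datetime.strptime(end, fmt) - datetime.strptime(start, fmt)

def fmt_delta(delta: timedelta) -> str:
    """Format a timedelta as 'Xm Ys'."""
    minutes, seconds = divmod(int(delta.total_seconds()), 60)
    return f"{minutes}m {seconds}s"

if __name__ == "__main__":
    print(fmt_delta(elapsed("11:08:27", "11:19:54")))  # 98% -> 99%: 11m 27s
    print(fmt_delta(elapsed("11:19:54", "11:31:16")))  # 99% -> warning: 11m 22s
```

The two gaps are nearly equal, which fits the idea that the core failed at about the moment the last 1% would have finished.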

Personally, I've got some application (probably not FAH) on one machine that gradually fills up all of RAM, and the memory leak crashes my system in any number of different ways. One possible explanation is that your RAM was (almost) full: there was enough room to process FAH, but not enough when it tried to load the code that compresses the results for uploading. (Something like that would probably cause a crash reported as an unknown error.)

Was there anything in the Windows event log?

Did FAH continue to run after that error?