Mixed AMD/nVidia system - FINISHED_UNIT == UNKNOWN_ENUM??

Postby FalconFour » Fri Mar 27, 2020 8:03 pm

I've got a laptop with an Intel integrated GPU, an nVidia discrete GPU, and an AMD GPU over Thunderbolt 3. I spent a good chunk of time last night trying to get the nVidia and AMD GPUs to coexist; the saga and logs are here: https://www.reddit.com/r/foldingathome/ ... _now_with/ (an old thread, revived due to Reddit missing its moderators).

After getting it working, I set off to bed. Woke up this morning to some VERY strange log messages. This is from the AMD slot:
Code: Select all
13:20:06:WU00:FS01:0x22:Completed 1760000 out of 2000000 steps (88%)
13:22:20:WU00:FS01:0x22:Completed 1780000 out of 2000000 steps (89%)
13:24:35:WU00:FS01:0x22:Completed 1800000 out of 2000000 steps (90%)
13:26:57:WU00:FS01:0x22:Completed 1820000 out of 2000000 steps (91%)
13:29:13:WU00:FS01:0x22:Completed 1840000 out of 2000000 steps (92%)
13:31:43:WU00:FS01:0x22:Completed 1860000 out of 2000000 steps (93%)
13:33:58:WU00:FS01:0x22:Completed 1880000 out of 2000000 steps (94%)
13:36:13:WU00:FS01:0x22:Completed 1900000 out of 2000000 steps (95%)
13:38:37:WU00:FS01:0x22:Completed 1920000 out of 2000000 steps (96%)
13:40:53:WU00:FS01:0x22:Completed 1940000 out of 2000000 steps (97%)
13:43:14:WU00:FS01:0x22:Completed 1960000 out of 2000000 steps (98%)
13:45:32:WU00:FS01:0x22:Completed 1980000 out of 2000000 steps (99%)
13:45:32:WU01:FS01:Connecting to 65.254.110.245:8080
13:45:32:WARNING:WU01:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
13:45:32:WU01:FS01:Connecting to 18.218.241.186:80
13:45:33:WARNING:WU01:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
13:45:33:ERROR:WU01:FS01:Exception: Could not get an assignment
13:45:33:WU01:FS01:Connecting to 65.254.110.245:8080
13:45:33:WU01:FS01:Assigned to work server 40.114.52.201
13:45:33:WU01:FS01:Requesting new work unit for slot 01: RUNNING gpu:1:Fiji XT [Radeon R9 Fury X] from 40.114.52.201
13:45:33:WU01:FS01:Connecting to 40.114.52.201:8080
13:45:55:WARNING:WU01:FS01:WorkServer connection failed on port 8080 trying 80
13:45:55:WU01:FS01:Connecting to 40.114.52.201:80
13:47:18:ERROR:WU01:FS01:Exception: 10002: Received short response, expected 512 bytes, got 0
13:47:18:WU01:FS01:Connecting to 65.254.110.245:8080
13:47:19:WARNING:WU01:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
13:47:19:WU01:FS01:Connecting to 18.218.241.186:80
13:47:19:WU01:FS01:Assigned to work server 128.252.203.10
13:47:19:WU01:FS01:Requesting new work unit for slot 01: RUNNING gpu:1:Fiji XT [Radeon R9 Fury X] from 128.252.203.10
13:47:19:WU01:FS01:Connecting to 128.252.203.10:8080
13:47:41:WARNING:WU01:FS01:WorkServer connection failed on port 8080 trying 80
13:47:41:WU01:FS01:Connecting to 128.252.203.10:80
13:47:59:WU00:FS01:0x22:Completed 2000000 out of 2000000 steps (100%)
13:48:01:ERROR:WU01:FS01:Exception: 10002: Received short response, expected 512 bytes, got 0
13:48:04:WU00:FS01:0x22:Saving result file ..\logfile_01.txt
13:48:04:WU00:FS01:0x22:Saving result file checkpointState.xml
13:48:07:WU00:FS01:0x22:Saving result file checkpt.crc
13:48:07:WU00:FS01:0x22:Saving result file positions.xtc
13:48:10:WU00:FS01:0x22:Saving result file science.log
13:48:10:WU00:FS01:0x22:Folding@home Core Shutdown: FINISHED_UNIT
13:48:13:WARNING:WU00:FS01:FahCore returned an unknown error code which probably indicates that it crashed
13:48:13:WARNING:WU00:FS01:FahCore returned: UNKNOWN_ENUM (-1073740791 = 0xc0000409)
13:48:14:WU00:FS01:Starting
13:48:14:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" [...]\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe -dir 00 -suffix 01 -version 705 -lifeline 12988 -checkpoint 15 -gpu-vendor amd -opencl-platform 1 -opencl-device 0 -gpu 0
13:48:14:WU00:FS01:Started FahCore on PID 1260
13:48:14:WU00:FS01:Core PID:9952
13:48:14:WU00:FS01:FahCore 0x22 started
13:48:14:WU00:FS01:0x22:*********************** Log Started 2020-03-27T13:48:14Z ***********************
13:48:14:WU00:FS01:0x22:*************************** Core22 Folding@home Core ***************************
13:48:14:WU00:FS01:0x22:       Type: 0x22
13:48:14:WU00:FS01:0x22:       Core: Core22
13:48:14:WU00:FS01:0x22:    Website: https://foldingathome.org/
13:48:14:WU00:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
13:48:14:WU00:FS01:0x22:     Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
13:48:14:WU00:FS01:0x22:             <rafal.wiewiora@choderalab.org>
13:48:14:WU00:FS01:0x22:       Args: -dir 00 -suffix 01 -version 705 -lifeline 1260 -checkpoint 15
13:48:14:WU00:FS01:0x22:             -gpu-vendor amd -opencl-platform 1 -opencl-device 0 -gpu 0
13:48:14:WU00:FS01:0x22:     Config: <none>
13:48:14:WU00:FS01:0x22:************************************ Build *************************************
13:48:14:WU00:FS01:0x22:    Version: 0.0.2
13:48:14:WU00:FS01:0x22:       Date: Dec 6 2019
13:48:14:WU00:FS01:0x22:       Time: 21:30:31
13:48:14:WU00:FS01:0x22: Repository: Git
13:48:14:WU00:FS01:0x22:   Revision: abeb39247cc72df5af0f63723edafadb23d5dfbe
13:48:14:WU00:FS01:0x22:     Branch: HEAD
13:48:14:WU00:FS01:0x22:   Compiler: Visual C++ 2008
13:48:14:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
13:48:14:WU00:FS01:0x22:   Platform: win32 10
13:48:14:WU00:FS01:0x22:       Bits: 64
13:48:14:WU00:FS01:0x22:       Mode: Release
13:48:14:WU00:FS01:0x22:************************************ System ************************************
13:48:14:WU00:FS01:0x22:        CPU: Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
13:48:14:WU00:FS01:0x22:     CPU ID: GenuineIntel Family 6 Model 158 Stepping 9
13:48:14:WU00:FS01:0x22:       CPUs: 8
13:48:14:WU00:FS01:0x22:     Memory: 15.88GiB
13:48:14:WU00:FS01:0x22:Free Memory: 10.41GiB
13:48:14:WU00:FS01:0x22:    Threads: WINDOWS_THREADS
13:48:14:WU00:FS01:0x22: OS Version: 6.2
13:48:14:WU00:FS01:0x22:Has Battery: true
13:48:14:WU00:FS01:0x22: On Battery: false
13:48:14:WU00:FS01:0x22: UTC Offset: -7
13:48:14:WU00:FS01:0x22:        PID: 9952
13:48:14:WU00:FS01:0x22:        CWD: [...]\FAHClient\work
13:48:14:WU00:FS01:0x22:         OS: Windows 10 Pro
13:48:14:WU00:FS01:0x22:    OS Arch: AMD64
13:48:14:WU00:FS01:0x22:********************************************************************************
13:48:14:WU00:FS01:0x22:Project: 11748 (Run 0, Clone 6851, Gen 3)
13:48:14:WU00:FS01:0x22:Unit: 0x000000058ca304e75e6bb11c3a3521df
13:48:14:WU00:FS01:0x22:Reading tar file core.xml
13:48:14:WU00:FS01:0x22:Reading tar file integrator.xml
13:48:14:WU00:FS01:0x22:Reading tar file state.xml
13:48:16:WU00:FS01:0x22:Reading tar file system.xml
13:48:16:WU00:FS01:0x22:Digital signatures verified
13:48:16:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
13:48:16:WU00:FS01:0x22:Version 0.0.2
13:48:56:WU01:FS01:Connecting to 65.254.110.245:8080
13:48:56:WARNING:WU01:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
13:48:56:WU01:FS01:Connecting to 18.218.241.186:80
13:48:56:WARNING:WU01:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
13:48:56:ERROR:WU01:FS01:Exception: Could not get an assignment
13:51:33:WU01:FS01:Connecting to 65.254.110.245:8080
13:51:33:WARNING:WU01:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
13:51:33:WU01:FS01:Connecting to 18.218.241.186:80
13:51:33:WARNING:WU01:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
13:51:33:ERROR:WU01:FS01:Exception: Could not get an assignment
13:55:47:WU01:FS01:Connecting to 65.254.110.245:8080
13:55:47:WARNING:WU01:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
13:55:47:WU01:FS01:Connecting to 18.218.241.186:80


This happened twice in the same way. This is the earlier one:
Code: Select all
13:31:43:WU00:FS01:0x22:Completed 1860000 out of 2000000 steps (93%)
13:33:58:WU00:FS01:0x22:Completed 1880000 out of 2000000 steps (94%)
13:36:13:WU00:FS01:0x22:Completed 1900000 out of 2000000 steps (95%)
13:38:37:WU00:FS01:0x22:Completed 1920000 out of 2000000 steps (96%)
13:40:53:WU00:FS01:0x22:Completed 1940000 out of 2000000 steps (97%)
13:43:14:WU00:FS01:0x22:Completed 1960000 out of 2000000 steps (98%)
13:45:32:WU00:FS01:0x22:Completed 1980000 out of 2000000 steps (99%)
13:45:32:WU01:FS01:Connecting to 65.254.110.245:8080
13:45:32:WARNING:WU01:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
13:45:32:WU01:FS01:Connecting to 18.218.241.186:80
13:45:33:WARNING:WU01:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
13:45:33:ERROR:WU01:FS01:Exception: Could not get an assignment
13:45:33:WU01:FS01:Connecting to 65.254.110.245:8080
13:45:33:WU01:FS01:Assigned to work server 40.114.52.201
13:45:33:WU01:FS01:Requesting new work unit for slot 01: RUNNING gpu:1:Fiji XT [Radeon R9 Fury X] from 40.114.52.201
13:45:33:WU01:FS01:Connecting to 40.114.52.201:8080
13:45:55:WARNING:WU01:FS01:WorkServer connection failed on port 8080 trying 80
13:45:55:WU01:FS01:Connecting to 40.114.52.201:80
13:47:18:ERROR:WU01:FS01:Exception: 10002: Received short response, expected 512 bytes, got 0
13:47:18:WU01:FS01:Connecting to 65.254.110.245:8080
13:47:19:WARNING:WU01:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
13:47:19:WU01:FS01:Connecting to 18.218.241.186:80
13:47:19:WU01:FS01:Assigned to work server 128.252.203.10
13:47:19:WU01:FS01:Requesting new work unit for slot 01: RUNNING gpu:1:Fiji XT [Radeon R9 Fury X] from 128.252.203.10
13:47:19:WU01:FS01:Connecting to 128.252.203.10:8080
13:47:41:WARNING:WU01:FS01:WorkServer connection failed on port 8080 trying 80
13:47:41:WU01:FS01:Connecting to 128.252.203.10:80
13:47:59:WU00:FS01:0x22:Completed 2000000 out of 2000000 steps (100%)
13:48:01:ERROR:WU01:FS01:Exception: 10002: Received short response, expected 512 bytes, got 0
13:48:04:WU00:FS01:0x22:Saving result file ..\logfile_01.txt
13:48:04:WU00:FS01:0x22:Saving result file checkpointState.xml
13:48:07:WU00:FS01:0x22:Saving result file checkpt.crc
13:48:07:WU00:FS01:0x22:Saving result file positions.xtc
13:48:10:WU00:FS01:0x22:Saving result file science.log
13:48:10:WU00:FS01:0x22:Folding@home Core Shutdown: FINISHED_UNIT
13:48:13:WARNING:WU00:FS01:FahCore returned an unknown error code which probably indicates that it crashed
13:48:13:WARNING:WU00:FS01:FahCore returned: UNKNOWN_ENUM (-1073740791 = 0xc0000409)
13:48:14:WU00:FS01:Starting
13:48:14:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" [...]\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe -dir 00 -suffix 01 -version 705 -lifeline 12988 -checkpoint 15 -gpu-vendor amd -opencl-platform 1 -opencl-device 0 -gpu 0
13:48:14:WU00:FS01:Started FahCore on PID 1260
13:48:14:WU00:FS01:Core PID:9952
13:48:14:WU00:FS01:FahCore 0x22 started
09:06:31:WU00:FS01:Connecting to 65.254.110.245:8080
09:06:32:WARNING:WU00:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
09:06:32:WU00:FS01:Connecting to 18.218.241.186:80
09:06:32:WARNING:WU00:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
09:06:32:ERROR:WU00:FS01:Exception: Could not get an assignment
09:09:09:WU00:FS01:Connecting to 65.254.110.245:8080
09:09:09:WARNING:WU00:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
09:09:09:WU00:FS01:Connecting to 18.218.241.186:80
09:09:09:WARNING:WU00:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
09:09:09:ERROR:WU00:FS01:Exception: Could not get an assignment
09:13:23:WU00:FS01:Connecting to 65.254.110.245:8080
09:13:23:WARNING:WU00:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
09:13:23:WU00:FS01:Connecting to 18.218.241.186:80


In particular, note this section:
Code: Select all
13:48:07:WU00:FS01:0x22:Saving result file positions.xtc
13:48:10:WU00:FS01:0x22:Saving result file science.log
13:48:10:WU00:FS01:0x22:Folding@home Core Shutdown: FINISHED_UNIT
13:48:13:WARNING:WU00:FS01:FahCore returned an unknown error code which probably indicates that it crashed
13:48:13:WARNING:WU00:FS01:FahCore returned: UNKNOWN_ENUM (-1073740791 = 0xc0000409)
13:48:14:WU00:FS01:Starting
13:48:14:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" [...]\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe -dir 00 -suffix 01 -version 705 -lifeline 12988 -checkpoint 15 -gpu-vendor amd -opencl-platform 1 -opencl-device 0 -gpu 0


For some reason it seems to be taking the FINISHED_UNIT return value and somehow, by inexplicable miracle of nonsense, interpreting that as UNKNOWN_ENUM and asking the core to start work again... which of course fails... but it doesn't do anything with the result. Nothing at all. It seems to just throw it all away, not even sending a result of failure or success to the server.

?Que?

Obligatory system info:
Code: Select all
*********************** Log Started 2020-03-27T09:03:21Z ***********************
09:03:21:************************* Folding@home Client *************************
09:03:21:        Website: https://foldingathome.org/
09:03:21:      Copyright: (c) 2009-2018 foldingathome.org
09:03:21:         Author: Joseph Coffland <joseph@cauldrondevelopment.com>
09:03:21:           Args:
09:03:21:         Config: [...]\FAHClient\config.xml
09:03:21:******************************** Build ********************************
09:03:21:        Version: 7.5.1
09:03:21:           Date: May 11 2018
09:03:21:           Time: 13:06:32
09:03:21:     Repository: Git
09:03:21:       Revision: 4705bf53c635f88b8fe85af7675557e15d491ff0
09:03:21:         Branch: master
09:03:21:       Compiler: Visual C++ 2008
09:03:21:        Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
09:03:21:       Platform: win32 10
09:03:21:           Bits: 32
09:03:21:           Mode: Release
09:03:21:******************************* System ********************************
09:03:21:            CPU: Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
09:03:21:         CPU ID: GenuineIntel Family 6 Model 158 Stepping 9
09:03:21:           CPUs: 8
09:03:21:         Memory: 15.88GiB
09:03:21:    Free Memory: 8.13GiB
09:03:21:        Threads: WINDOWS_THREADS
09:03:21:     OS Version: 6.2
09:03:21:    Has Battery: true
09:03:21:     On Battery: false
09:03:21:     UTC Offset: -7
09:03:21:            PID: 12988
09:03:21:            CWD: [...]\FAHClient
09:03:21:             OS: Windows 10 Enterprise
09:03:21:        OS Arch: AMD64
09:03:21:           GPUs: 2
09:03:21:          GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:4 GM108 [GeForce 940MX]
09:03:21:          GPU 1: Bus:9 Slot:0 Func:0 AMD:5 Fiji XT [Radeon R9 Fury X]
09:03:21:  CUDA Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:5.0 Driver:11.0
09:03:21:OpenCL Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:1.2 Driver:445.75
09:03:21:OpenCL Device 1: Platform:1 Device:0 Bus:9 Slot:0 Compute:1.2 Driver:2906.10
09:03:21:OpenCL Device 2: Platform:2 Device:0 Bus:NA Slot:NA Compute:2.1 Driver:24.20
09:03:21:  Win32 Service: false
09:03:21:***********************************************************************


FWIW, it had previously been working (using the AMD external GPU only) before I fought to get the nVidia and AMD GPUs working under the same umbrella. I reinstalled the nVidia driver first to get it up and running, then reinstalled the AMD driver once nVidia was working. This is because AMD and nVidia seem to share an "opencl.dll" file in \windows\system32, which causes a conflict over who "owns" the OpenCL platform. But now that both seem to be working (both show high activity and crunch WUs), the only problem seems to be in how F@H handles it.
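
As a sanity check that both vendors ended up with their own OpenCL platform, the GPU and OpenCL lines from the system info can be matched by bus and slot. This is only a sketch: the regular expressions are inferred from the system-info block in this thread rather than from any F@H documentation, and `GPU_RE`, `OCL_RE` and `opencl_index_per_gpu` are hypothetical names.

```python
import re

# Patterns inferred from the system-info block in this thread (an assumption,
# not an official log format specification).
GPU_RE = re.compile(r"GPU (\d+): Bus:(\d+) Slot:(\d+) Func:\d+ (\w+):\d+")
OCL_RE = re.compile(
    r"OpenCL Device \d+: Platform:(\d+) Device:(\d+) Bus:(\w+) Slot:(\w+)"
)


def opencl_index_per_gpu(lines):
    """Map GPU index -> (vendor, opencl_platform, opencl_device) by bus/slot."""
    gpus = {}    # (bus, slot) -> (gpu_index, vendor)
    opencl = {}  # (bus, slot) -> (platform, device)
    for line in lines:
        gpu = GPU_RE.search(line)
        if gpu:
            gpus[(gpu.group(2), gpu.group(3))] = (int(gpu.group(1)), gpu.group(4))
            continue
        ocl = OCL_RE.search(line)
        if ocl:
            opencl[(ocl.group(3), ocl.group(4))] = (
                int(ocl.group(1)),
                int(ocl.group(2)),
            )
    # Keep only GPUs that have an OpenCL device on the same bus and slot.
    return {
        index: (vendor,) + opencl[location]
        for location, (index, vendor) in gpus.items()
        if location in opencl
    }


# Lines copied from the system info above.
SYSTEM_INFO = [
    "09:03:21:          GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:4 GM108 [GeForce 940MX]",
    "09:03:21:          GPU 1: Bus:9 Slot:0 Func:0 AMD:5 Fiji XT [Radeon R9 Fury X]",
    "09:03:21:OpenCL Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:1.2 Driver:445.75",
    "09:03:21:OpenCL Device 1: Platform:1 Device:0 Bus:9 Slot:0 Compute:1.2 Driver:2906.10",
    "09:03:21:OpenCL Device 2: Platform:2 Device:0 Bus:NA Slot:NA Compute:2.1 Driver:24.20",
]

print(opencl_index_per_gpu(SYSTEM_INFO))
```

With the lines above, the AMD card on bus 9 pairs with OpenCL platform 1, device 0, which agrees with the `-opencl-platform 1 -opencl-device 0` arguments in the FahCore command line. The third OpenCL device reports `Bus:NA`, so it is not matched to either listed GPU.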
FalconFour
 
Posts: 5
Joined: Fri Sep 05, 2008 12:57 pm

Re: Mixed AMD/nVidia system - FINISHED_UNIT == UNKNOWN_ENUM?

Postby foldy » Fri Mar 27, 2020 8:44 pm

I guess that is the problem:
13:48:01:ERROR:WU01:FS01:Exception: 10002: Received short response, expected 512 bytes, got 0

It means the connection to the upload server was truncated. Maybe a firewall is in place and somehow interrupting the upload?

(On the other hand, this also seems to be a bug in FAHClient, as it does not retry the failed upload later...)
foldy
 
Posts: 1865
Joined: Sat Dec 01, 2012 4:43 pm

Re: Mixed AMD/nVidia system - FINISHED_UNIT == UNKNOWN_ENUM?

Postby FalconFour » Fri Mar 27, 2020 9:00 pm

foldy wrote: I guess that is the problem:
13:48:01:ERROR:WU01:FS01:Exception: 10002: Received short response, expected 512 bytes, got 0

It means the connection to the upload server was truncated. Maybe a firewall is in place and somehow interrupting the upload?

(On the other hand, this also seems to be a bug in FAHClient, as it does not retry the failed upload later...)

Pretty sure that's a red herring - it happens all throughout the log and is most likely a log message from the WU downloader (which is running as the WU nears completion), not the uploader. There's definitely no firewall at play here.
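
One way to check the WU01-versus-WU00 point is to split each log line into its fields and group the WARNING/ERROR messages by work-unit tag. A minimal sketch, assuming the `time:[LEVEL:]WUnn:FSnn:message` layout inferred from the excerpts in this thread (not from any F@H documentation); `LINE_RE` and `problems_by_wu` are hypothetical names:

```python
import re

# Field layout inferred from the log excerpts in this thread (an assumption).
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):"
    r"(?:(?P<level>WARNING|ERROR):)?"
    r"(?P<wu>WU\d+):(?P<slot>FS\d+):"
    r"(?P<message>.*)$"
)


def problems_by_wu(lines):
    """Group WARNING/ERROR messages by the work-unit tag that logged them."""
    grouped = {}
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("level"):
            grouped.setdefault(match.group("wu"), []).append(match.group("message"))
    return grouped


# Lines copied from the AMD slot log earlier in the thread.
SAMPLE = [
    "13:47:59:WU00:FS01:0x22:Completed 2000000 out of 2000000 steps (100%)",
    "13:48:01:ERROR:WU01:FS01:Exception: 10002: Received short response, expected 512 bytes, got 0",
    "13:48:10:WU00:FS01:0x22:Folding@home Core Shutdown: FINISHED_UNIT",
    "13:48:13:WARNING:WU00:FS01:FahCore returned: UNKNOWN_ENUM (-1073740791 = 0xc0000409)",
]

for wu, messages in problems_by_wu(SAMPLE).items():
    print(wu, messages)
```

On these sample lines the 10002 exception is tagged WU01 (the unit being requested), while the UNKNOWN_ENUM warning is tagged WU00 (the unit that just finished), so the two messages belong to different work units on the same slot.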

CPU slot finished up just fine seconds ago:
Code: Select all
19:48:47:WU04:FS00:0xa7:Completed 455000 out of 500000 steps (91%)
19:49:53:WU04:FS00:0xa7:Completed 460000 out of 500000 steps (92%)
19:50:59:WU04:FS00:0xa7:Completed 465000 out of 500000 steps (93%)
19:52:04:WU04:FS00:0xa7:Completed 470000 out of 500000 steps (94%)
19:53:09:WU04:FS00:0xa7:Completed 475000 out of 500000 steps (95%)
19:54:17:WU04:FS00:0xa7:Completed 480000 out of 500000 steps (96%)
19:55:24:WU04:FS00:0xa7:Completed 485000 out of 500000 steps (97%)
19:56:36:WU04:FS00:0xa7:Completed 490000 out of 500000 steps (98%)
19:57:47:WU04:FS00:0xa7:Completed 495000 out of 500000 steps (99%)
19:57:48:WU02:FS00:Connecting to 65.254.110.245:8080
19:57:48:WARNING:WU02:FS00:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
19:57:48:WU02:FS00:Connecting to 18.218.241.186:80
19:57:49:WARNING:WU02:FS00:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
19:57:49:ERROR:WU02:FS00:Exception: Could not get an assignment
19:57:49:WU02:FS00:Connecting to 65.254.110.245:8080
19:57:49:WU02:FS00:Assigned to work server 155.247.164.213
19:57:49:WU02:FS00:Requesting new work unit for slot 00: RUNNING cpu:6 from 155.247.164.213
19:57:49:WU02:FS00:Connecting to 155.247.164.213:8080
19:57:49:WU02:FS00:Downloading 1.24MiB
19:57:50:WU02:FS00:Download complete
19:57:50:WU02:FS00:Received Unit: id:02 state:DOWNLOAD error:NO_ERROR project:14308 run:2 clone:125 gen:23 core:0xa7 unit:0x0000001d9bf7a4d55e66c409dd06404b
19:58:56:WU04:FS00:0xa7:Completed 500000 out of 500000 steps (100%)
19:58:57:WU04:FS00:0xa7:Saving result file ..\logfile_01.txt
19:58:57:WU04:FS00:0xa7:Saving result file frame23.trr
19:58:57:WU04:FS00:0xa7:Saving result file md.log
19:58:57:WU04:FS00:0xa7:Saving result file pullf.xvg
19:58:57:WU04:FS00:0xa7:Saving result file pullx.xvg
19:58:57:WU04:FS00:0xa7:Saving result file science.log
19:58:57:WU04:FS00:0xa7:Saving result file traj_comp.xtc
19:58:57:WU04:FS00:0xa7:Folding@home Core Shutdown: FINISHED_UNIT
19:58:58:WU04:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
19:58:58:WU04:FS00:Sending unit results: id:04 state:SEND error:NO_ERROR project:14198 run:4 clone:241 gen:23 core:0xa7 unit:0x0000001b9bf7a4d55e66cfb46140bc6f
19:58:58:WU04:FS00:Uploading 2.64MiB to 155.247.164.213
19:58:58:WU02:FS00:Starting
19:58:58:WU04:FS00:Connecting to 155.247.164.213:8080
19:58:58:WU02:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" [...]\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/avx/Core_a7.fah/FahCore_a7.exe -dir 02 -suffix 01 -version 705 -lifeline 12988 -checkpoint 15 -np 6
19:58:58:WU02:FS00:Started FahCore on PID 14008
19:58:58:WU02:FS00:Core PID:6128
19:58:58:WU02:FS00:FahCore 0xa7 started
FalconFour
 
Posts: 5
Joined: Fri Sep 05, 2008 12:57 pm

Re: Mixed AMD/nVidia system - FINISHED_UNIT == UNKNOWN_ENUM?

Postby Joe_H » Sat Mar 28, 2020 6:53 am

FalconFour wrote:
Code: Select all
13:48:07:WU00:FS01:0x22:Saving result file positions.xtc
13:48:10:WU00:FS01:0x22:Saving result file science.log
13:48:10:WU00:FS01:0x22:Folding@home Core Shutdown: FINISHED_UNIT
13:48:13:WARNING:WU00:FS01:FahCore returned an unknown error code which probably indicates that it crashed
13:48:13:WARNING:WU00:FS01:FahCore returned: UNKNOWN_ENUM (-1073740791 = 0xc0000409)
13:48:14:WU00:FS01:Starting
13:48:14:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" [...]\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe -dir 00 -suffix 01 -version 705 -lifeline 12988 -checkpoint 15 -gpu-vendor amd -opencl-platform 1 -opencl-device 0 -gpu 0


FalconFour wrote: For some reason it seems to be taking the FINISHED_UNIT return value and somehow, by inexplicable miracle of nonsense, interpreting that as UNKNOWN_ENUM and asking the core to start work again... which of course fails... but it doesn't do anything with the result. Nothing at all. It seems to just throw it all away, not even sending a result of failure or success to the server.


The FINISHED_UNIT is the red herring; the real cause is shown in the next two lines:
Code: Select all
13:48:13:WARNING:WU00:FS01:FahCore returned an unknown error code which probably indicates that it crashed
13:48:13:WARNING:WU00:FS01:FahCore returned: UNKNOWN_ENUM (-1073740791 = 0xc0000409)


The client is reporting a Windows error code that was passed to it when the folding core crashed. This happened during the post-completion phase, in which the WU is packed up with some other essential files to be sent back to the work server (WS). The log shows those files being saved just before the crash.

Someone with a deeper understanding of Windows error codes would have to look at this; I just recognize it as one.

Depending on what exact files are left in the directory after such a crash, the client will attempt to restart the WU. It may start over at the very beginning, at the last checkpoint, or dump it as a corrupted WU.
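
For anyone who wants to look the code up: the negative number in the log is the same 32-bit value as the hex one, just printed as a signed integer. The sketch below masks it back to unsigned and checks it against a small, hand-picked subset of NTSTATUS values; `KNOWN_NTSTATUS` and `decode_exit_code` are hypothetical names, and the table is nowhere near exhaustive. As far as I know, 0xC0000409 is defined by Windows as STATUS_STACK_BUFFER_OVERRUN, a status that is also used for deliberate fail-fast aborts, so on its own it does not say which component went wrong.

```python
from typing import Tuple

# Small hand-picked subset of NTSTATUS values (an illustration, not a full list).
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000135: "STATUS_DLL_NOT_FOUND",
    0xC000013A: "STATUS_CONTROL_C_EXIT",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}


def decode_exit_code(signed_code: int) -> Tuple[int, str]:
    """Reinterpret a signed 32-bit exit code as unsigned and name it if known."""
    unsigned = signed_code & 0xFFFFFFFF
    return unsigned, KNOWN_NTSTATUS.get(unsigned, "unrecognized")


unsigned, name = decode_exit_code(-1073740791)
print(f"0x{unsigned:08x} -> {name}")  # 0xc0000409 -> STATUS_STACK_BUFFER_OVERRUN
```

By contrast, the CPU slot log earlier in the thread shows `FINISHED_UNIT (100 = 0x64)`, a small positive value the client recognizes as its own return code.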

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Joe_H
Site Admin
 
Posts: 6107
Joined: Tue Apr 21, 2009 5:41 pm
Location: W. MA

