GPU stopped working ...

TomTiddler167
Posts: 6
Joined: Wed Sep 30, 2020 4:32 pm

GPU stopped working ...

Post by TomTiddler167 »

I'm running FAH on an Intel NUC8i7HVK (Hades Canyon), which has both an Intel Iris 630 GPU and an AMD RX Vega M XT/ M GH GPU. I only configure the RX Vega in the BIOS. Up until Monday, FAH worked flawlessly, running slots on both my CPU and the GPU (AMD Vega). On Monday, Intel updated the Iris 630 driver, and somehow this made both GPUs undetectable by FAH. I have reverted the driver, and now everything is working again. Sigh.

I'm fairly new to the kinds of information you will need to investigate further. Can you give me some clues please?
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: GPU stopped working ...

Post by bruce »

(See below in my Signature block)

FAH puts a lot of information at the top of its log file showing how the system was configured during initialization. Your brief description of the configuration is helpful, but there are lots of other details that may or may not be useful in isolating your problem, and I can't predict which ones we'll need. Asking a lot of questions would get us to the same result, but that requires many more interactions.

FAH has had difficulties (such as yours) when there are multiple OpenCL drivers. The Development department is working on updates to FAHControl which may deal with your issue, but they're not ready to release, so we'll probably have to submit your case to them to see whether a fix is already pending or needs to be revised. The top of the old log showing the failure would potentially be useful ... and maybe we can fix it manually.

Do you have any specific apps that use OpenCL on the iGPU?

Previous log files are retained in the logs subdirectory of the FAH data directory, or you can reproduce the failure by updating the driver and then reverting it again.
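
For anyone digging through the retained logs, here is a minimal sketch of how the relevant lines could be pulled out. It assumes the data directory is under AppData\Roaming\FAHClient (as the CWD line in the log posted later in this thread suggests) and that saved logs end in .txt; both are assumptions to adjust. The marker strings are taken from that same log excerpt, and the helper names `find_detection_errors` and `scan_log_dir` are hypothetical.

```python
from pathlib import Path

# Substrings that mark detection problems in a FAHClient log header
# (taken from the failing log quoted later in this thread).
MARKERS = ("Not detected", "No compute devices matched")


def find_detection_errors(lines):
    """Return the lines that mention a GPU/OpenCL detection failure."""
    return [ln.rstrip("\n") for ln in lines if any(m in ln for m in MARKERS)]


def scan_log_dir(log_dir):
    """Yield (file name, matching lines) for every log that has matches."""
    for path in sorted(Path(log_dir).glob("*.txt")):
        with path.open(errors="replace") as fh:
            hits = find_detection_errors(fh)
        if hits:
            yield path.name, hits


if __name__ == "__main__":
    # Assumed location; adjust to your own FAH data directory.
    log_dir = Path.home() / "AppData" / "Roaming" / "FAHClient" / "logs"
    if log_dir.is_dir():
        for name, hits in scan_log_dir(log_dir):
            print(name)
            for hit in hits:
                print("   ", hit)
```

Note that the CUDA "Not detected" line is expected on a machine without an Nvidia card, so only the OpenCL and "No compute devices matched" hits are likely to matter here.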
TomTiddler167
Posts: 6
Joined: Wed Sep 30, 2020 4:32 pm

Re: GPU stopped working ...

Post by TomTiddler167 »

Thanks "bruce" for the info. Unfortunately, I'm a bit pressed for time over the next 2 days, and since the Intel driver installation doesn't permit a "rollback" on the drivers (Sigh), I have to do a complete system restore to get back to a working FAH client after testing the new driver. I'll have some info for it all at the weekend.
TomTiddler167
Posts: 6
Joined: Wed Sep 30, 2020 4:32 pm

Re: GPU stopped working ...

Post by TomTiddler167 »

OK, I have the "top" of the logs for the system running with the 2 different drivers ......

Here's the first, the system works with this driver (Intel Driver version 26.20.100.8141 for Iris 630)

15:44:23:WU01:FS01:0x22:*********************** Log Started 2020-10-06T15:44:23Z ***********************
15:44:23:WU01:FS01:0x22:*************************** Core22 Folding@home Core ***************************
15:44:23:WU01:FS01:0x22:       Core: Core22
15:44:23:WU01:FS01:0x22:       Type: 0x22
15:44:23:WU01:FS01:0x22:    Version: 0.0.13
15:44:23:WU01:FS01:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
15:44:23:WU01:FS01:0x22:  Copyright: 2020 foldingathome.org
15:44:23:WU01:FS01:0x22:   Homepage: https://foldingathome.org/
15:44:23:WU01:FS01:0x22:       Date: Sep 19 2020
15:44:23:WU01:FS01:0x22:       Time: 02:35:58
15:44:23:WU01:FS01:0x22:   Revision: 571cf95de6de2c592c7c3ed48fcfb2e33e9ea7d3
15:44:23:WU01:FS01:0x22:     Branch: core22-0.0.13
15:44:23:WU01:FS01:0x22:   Compiler: Visual C++ 2015
15:44:23:WU01:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
15:44:23:WU01:FS01:0x22:             -DOPENMM_GIT_HASH="\"189320d0\""
15:44:23:WU01:FS01:0x22:   Platform: win32 10
15:44:23:WU01:FS01:0x22:       Bits: 64
15:44:23:WU01:FS01:0x22:       Mode: Release
15:44:23:WU01:FS01:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
15:44:23:WU01:FS01:0x22:             <peastman@stanford.edu>
15:44:23:WU01:FS01:0x22:       Args: -dir 01 -suffix 01 -version 706 -lifeline 8292 -checkpoint 15
15:44:23:WU01:FS01:0x22:             -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
15:44:23:WU01:FS01:0x22:************************************ libFAH ************************************
15:44:23:WU01:FS01:0x22:       Date: Sep 7 2020
15:44:23:WU01:FS01:0x22:       Time: 19:09:56
15:44:23:WU01:FS01:0x22:   Revision: 44301ed97b996b63fe736bb8073f22209cb2b603
15:44:23:WU01:FS01:0x22:     Branch: HEAD
15:44:23:WU01:FS01:0x22:   Compiler: Visual C++ 2015
15:44:23:WU01:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
15:44:23:WU01:FS01:0x22:   Platform: win32 10
15:44:23:WU01:FS01:0x22:       Bits: 64
15:44:23:WU01:FS01:0x22:       Mode: Release
15:44:23:WU01:FS01:0x22:************************************ CBang *************************************
15:44:23:WU01:FS01:0x22:       Date: Sep 7 2020
15:44:23:WU01:FS01:0x22:       Time: 19:08:30
15:44:23:WU01:FS01:0x22:   Revision: 33fcfc2b3ed2195a423606a264718e31e6b3903f
15:44:23:WU01:FS01:0x22:     Branch: HEAD
15:44:23:WU01:FS01:0x22:   Compiler: Visual C++ 2015
15:44:23:WU01:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
15:44:23:WU01:FS01:0x22:   Platform: win32 10
15:44:23:WU01:FS01:0x22:       Bits: 64
15:44:23:WU01:FS01:0x22:       Mode: Release
15:44:23:WU01:FS01:0x22:************************************ System ************************************
15:44:23:WU01:FS01:0x22:        CPU: Intel(R) Core(TM) i7-8809G CPU @ 3.10GHz
15:44:23:WU01:FS01:0x22:     CPU ID: GenuineIntel Family 6 Model 158 Stepping 9
15:44:23:WU01:FS01:0x22:       CPUs: 8
15:44:23:WU01:FS01:0x22:     Memory: 15.92GiB
15:44:23:WU01:FS01:0x22:Free Memory: 8.10GiB
15:44:23:WU01:FS01:0x22:    Threads: WINDOWS_THREADS
15:44:23:WU01:FS01:0x22: OS Version: 6.2
15:44:23:WU01:FS01:0x22:Has Battery: false
15:44:23:WU01:FS01:0x22: On Battery: false
15:44:23:WU01:FS01:0x22: UTC Offset: -4
15:44:23:WU01:FS01:0x22:        PID: 23388
15:44:24:WU01:FS01:0x22:        CWD: C:\Users\iante\AppData\Roaming\FAHClient\work
15:44:24:WU01:FS01:0x22:************************************ OpenMM ************************************
15:44:24:WU01:FS01:0x22:   Revision: 189320d0
15:44:24:WU01:FS01:0x22:********************************************************************************
15:44:24:WU01:FS01:0x22:Project: 16918 (Run 147, Clone 117, Gen 104)
15:44:24:WU01:FS01:0x22:Unit: 0x000000880002894c5f176177ae4d5e77
15:44:24:WU01:FS01:0x22:Reading tar file core.xml
15:44:24:WU01:FS01:0x22:Reading tar file integrator.xml
15:44:24:WU01:FS01:0x22:Reading tar file state.xml
15:44:24:WU01:FS01:0x22:Reading tar file system.xml
15:44:24:WU01:FS01:0x22:Digital signatures verified
15:44:24:WU01:FS01:0x22:Folding@home GPU Core22 Folding@home Core
15:44:24:WU01:FS01:0x22:Version 0.0.13
15:44:24:WU01:FS01:0x22:  Checkpoint write interval: 100000 steps (2%) [50 total]
15:44:24:WU01:FS01:0x22:  JSON viewer frame write interval: 50000 steps (1%) [100 total]
15:44:24:WU01:FS01:0x22:  XTC frame write interval: 250000 steps (5%) [20 total]
15:44:24:WU01:FS01:0x22:  Global context and integrator variables write interval: disabled
15:44:24:WU01:FS01:0x22:There are 3 platforms available.
15:44:24:WU01:FS01:0x22:Platform 0: Reference
15:44:24:WU01:FS01:0x22:Platform 1: CPU
15:44:24:WU01:FS01:0x22:Platform 2: OpenCL
15:44:24:WU01:FS01:0x22:  opencl-device 0 specified
15:44:29:WU00:FS01:Upload 30.95%
15:44:32:WU01:FS01:0x22:Attempting to create OpenCL context:
15:44:32:WU01:FS01:0x22:  Configuring platform OpenCL
15:44:35:WU00:FS01:Upload 74.28%
15:44:40:WU00:FS01:Upload complete
15:44:40:WU00:FS01:Server responded WORK_ACK (400)
15:44:40:WU00:FS01:Final credit estimate, 39220.00 points
15:44:40:WU00:FS01:Cleaning up
15:44:43:WU01:FS01:0x22:  Using OpenCL on platformId 0 and gpu 0
15:44:43:WU01:FS01:0x22:Completed 0 out of 5000000 steps (0%)
15:44:43:WU01:FS01:0x22:Checkpoint completed at step 0
15:46:17:WU02:FS00:0xa8:Completed 295000 out of 500000 steps (59%)
15:49:55:WU02:FS00:0xa8:Completed 300000 out of 500000 steps (60%)
15:54:03:WU02:FS00:0xa8:Completed 305000 out of 500000 steps (61%)
15:56:31:WU01:FS01:0x22:Completed 50000 out of 5000000 steps (1%)
15:57:41:WU02:FS00:0xa8:Completed 310000 out of 500000 steps (62%)
16:01:29:WU02:FS00:0xa8:Completed 315000 out of 500000 steps (63%)
16:05:53:WU02:FS00:0xa8:Completed 320000 out of 500000 steps (64%)
16:10:15:WU02:FS00:0xa8:Completed 325000 out of 500000 steps (65%)
16:10:37:WU01:FS01:0x22:Completed 100000 out of 5000000 steps (2%)
16:10:38:WU01:FS01:0x22:Checkpoint completed at step 100000
16:14:20:WU02:FS00:0xa8:Completed 330000 out of 500000 steps (66%)
16:18:39:WU02:FS00:0xa8:Completed 335000 out of 500000 steps (67%)
16:22:52:WU02:FS00:0xa8:Completed 340000 out of 500000 steps (68%)
16:25:48:WU01:FS01:0x22:Completed 150000 out of 5000000 steps (3%)
16:27:16:WU02:FS00:0xa8:Completed 345000 out of 500000 steps (69%)
16:31:17:WU02:FS00:0xa8:Completed 350000 out of 500000 steps (70%)
___________________________________________________________________

...... and now here's the top of the log from the failing driver (Intel Driver Version 27.20.100.8681 for Iris 630)

*********************** Log Started 2020-10-06T18:36:38Z ***********************
18:36:38:Trying to access database...
18:36:38:Successfully acquired database lock
18:36:38:Read GPUs.txt
18:36:38:Enabled folding slot 00: PAUSED cpu:7 (by user)
18:36:38:Enabled folding slot 01: PAUSED gpu:0:[Radeon RX Vega M XT/ M GH] (by user)
18:36:38:ERROR:No compute devices matched GPU #0 {
18:36:38:ERROR:  "vendor": 4098,
18:36:38:ERROR:  "device": 26956,
18:36:38:ERROR:  "type": 1,
18:36:38:ERROR:  "species": 5,
18:36:38:ERROR:  "description": "[Radeon RX Vega M XT/ M GH]"
18:36:38:ERROR:}.  You may need to update your graphics drivers.
18:36:38:****************************** FAHClient ******************************
18:36:38:      Version: 7.6.13
18:36:38:       Author: Joseph Coffland <joseph@cauldrondevelopment.com>
18:36:38:    Copyright: 2020 foldingathome.org
18:36:38:     Homepage: https://foldingathome.org/
18:36:38:         Date: Apr 27 2020
18:36:38:         Time: 21:21:01
18:36:38:     Revision: 5a652817f46116b6e135503af97f18e094414e3b
18:36:38:       Branch: master
18:36:38:     Compiler: Visual C++ 2008
18:36:38:      Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
18:36:38:     Platform: win32 10
18:36:38:         Bits: 32
18:36:38:         Mode: Release
18:36:38:       Config: C:\Users\iante\AppData\Roaming\FAHClient\config.xml
18:36:38:******************************** CBang ********************************
18:36:38:         Date: Apr 24 2020
18:36:38:         Time: 17:07:55
18:36:38:     Revision: ea081a3b3b0f4a37c4d0440b4f1bc184197c7797
18:36:38:       Branch: master
18:36:38:     Compiler: Visual C++ 2008
18:36:38:      Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
18:36:38:     Platform: win32 10
18:36:38:         Bits: 32
18:36:38:         Mode: Release
18:36:38:******************************* System ********************************
18:36:38:          CPU: Intel(R) Core(TM) i7-8809G CPU @ 3.10GHz
18:36:38:       CPU ID: GenuineIntel Family 6 Model 158 Stepping 9
18:36:38:         CPUs: 8
18:36:38:       Memory: 15.92GiB
18:36:38:  Free Memory: 12.13GiB
18:36:38:      Threads: WINDOWS_THREADS
18:36:38:   OS Version: 6.2
18:36:38:  Has Battery: false
18:36:38:   On Battery: false
18:36:38:   UTC Offset: -4
18:36:38:          PID: 13896
18:36:38:          CWD: C:\Users\iante\AppData\Roaming\FAHClient
18:36:38:Win32 Service: false
18:36:38:           OS: Windows 10 Enterprise
18:36:38:      OS Arch: AMD64
18:36:38:         GPUs: 1
18:36:38:        GPU 0: Bus:1 Slot:0 Func:0 AMD:5 [Radeon RX Vega M XT/ M GH]
18:36:38:         CUDA: Not detected: Failed to open dynamic library 'nvcuda.dll': The
18:36:38:               specified module could not be found.
18:36:38:
18:36:38:       OpenCL: Not detected: clGetPlatformIDs() returned -1001
18:36:38:******************************* libFAH ********************************
18:36:38:         Date: Apr 15 2020
18:36:38:         Time: 14:53:14
18:36:38:     Revision: 216968bc7025029c841ed6e36e81a03a316890d3
18:36:38:       Branch: master
18:36:38:     Compiler: Visual C++ 2008
18:36:38:      Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
18:36:38:     Platform: win32 10
18:36:38:         Bits: 32
18:36:38:         Mode: Release
18:36:38:***********************************************************************
18:36:38:<config>
18:36:38:  <!-- Folding Core -->
18:36:38:  <core-priority v='low'/>
18:36:38:
18:36:38:  <!-- Folding Slot Configuration -->
18:36:38:  <gpu v='TRUE'/>
18:36:38:
18:36:38:  <!-- Network -->
18:36:38:  <proxy v=':8080'/>
18:36:38:
---------------------------------------------

Hope this helps to get to the bottom of this. When I say "failing", I mean that the driver appears to work for the Intel iGPU itself; it just messes up the AMD RX Vega somehow. And of course, I don't know how to get an Intel iGPU working with FAH.
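
As a reading aid for the failing log, the sketch below decodes two of its values. It relies on two facts from outside the thread: -1001 is `CL_PLATFORM_NOT_FOUND_KHR` in the Khronos `cl_khr_icd` extension, and 0x1002 is AMD's PCI vendor ID. The helper names `opencl_error_name` and `describe_gpu` are hypothetical, and the lookup tables are deliberately tiny.

```python
import re

# From the Khronos cl_khr_icd extension: the ICD loader returns this code
# when it cannot find any installed OpenCL platform.
CL_ERRORS = {-1001: "CL_PLATFORM_NOT_FOUND_KHR"}

# A few PCI vendor IDs (public registry values).
PCI_VENDORS = {0x1002: "AMD", 0x8086: "Intel", 0x10DE: "NVIDIA"}


def opencl_error_name(log_line):
    """Extract the numeric code after 'returned' and map it to a name."""
    match = re.search(r"returned (-?\d+)", log_line)
    if match is None:
        return None
    code = int(match.group(1))
    return CL_ERRORS.get(code, f"unknown ({code})")


def describe_gpu(vendor, device):
    """Format the decimal vendor/device IDs from the log as hex PCI IDs."""
    name = PCI_VENDORS.get(vendor, "unknown vendor")
    return f"{name} {vendor:#06x}:{device:#06x}"


line = "18:36:38:       OpenCL: Not detected: clGetPlatformIDs() returned -1001"
print(opencl_error_name(line))
print(describe_gpu(4098, 26956))
```

If this reading is right, the client still sees the Vega on the PCI bus ("GPUs: 1"), but the OpenCL loader reports no platforms at all. That would be consistent with the new Intel driver disturbing the OpenCL registration rather than anything changing in the Radeon driver, though the log alone does not prove it.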

Mod Edit: Added Code Tags - PantherX
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: GPU stopped working ...

Post by bruce »

The Intel GPUs are not supported yet. They are being tested by a select group of beta testers and we're waiting for their reports. Please disable FAH from using the Iris 630. You can set "Pause-on-start" to "true" for that device or you can disable it in the BIOS.
ipkh
Posts: 175
Joined: Thu Jul 16, 2015 2:03 pm

Re: GPU stopped working ...

Post by ipkh »

In this instance, restoring the system or pausing the slot is the only option. The two GPUs share the output, and Intel handles the driver for both.
It is possible that Intel needs to issue a fixed driver to accommodate this use case.


I really wish FAH had a separate testing environment for these true beta tests. Putting the enablement bits in the public GPUs.txt is causing quite a few problems and I doubt the majority of affected users know to come by the forum.
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am
Location: Land Of The Long White Cloud

Re: GPU stopped working ...

Post by PantherX »

ipkh wrote:...I really wish FAH had a separate testing environment for these true beta tests. Putting the enablement bits in the public GPUs.txt is causing quite a few problems and I doubt the majority of affected users know to come by the forum.
There are some ideas floating around. Once the automated GPU benchmarking is in place, the next step might be a different system for testing those GPUs. While it might not be a completely separate system, it would be better than the current approach of testing in production (PRD).
TomTiddler167
Posts: 6
Joined: Wed Sep 30, 2020 4:32 pm

Re: GPU stopped working ...

Post by TomTiddler167 »

bruce wrote:The Intel GPUs are not supported yet. They are being tested by a select group of beta testers and we're waiting for their reports. Please disable FAH from using the Iris 630. You can set "Pause-on-start" to "true" for that device or you can disable it in the BIOS.

OK, so you say "You can set "Pause-on-start" to "true" for that device " - How do I do that??

Incidentally, in the logs above, the only difference between the two is the Intel driver for the Iris GPU; there were no changes to the Radeon drivers.
ajm
Posts: 754
Joined: Sat Mar 21, 2020 5:22 am
Location: Lucerne, Switzerland

Re: GPU stopped working ...

Post by ajm »

TomTiddler167 wrote:
bruce wrote:The Intel GPUs are not supported yet. They are being tested by a select group of beta testers and we're waiting for their reports. Please disable FAH from using the Iris 630. You can set "Pause-on-start" to "true" for that device or you can disable it in the BIOS.

OK, so you say "You can set "Pause-on-start" to "true" for that device " - How do I do that??
Advanced Control (aka FAHControl) -> Configure -> Slots -> double-click the relevant slot -> scroll to "Extra slot options (expert only)" -> Add -> Name: pause-on-start, Value: true -> OK -> OK -> OK -> Save
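
For illustration only, the steps above should end up as a slot option in config.xml. The sketch below shows roughly what that change looks like, working on an in-memory sample rather than a real file. The `<slot>` layout in `SAMPLE` is an assumption modelled on the `v='...'` option style visible in the log earlier in this thread, and `set_pause_on_start` is a hypothetical helper, not part of FAH.

```python
import xml.etree.ElementTree as ET

# Hypothetical config fragment; the <slot> layout is an assumption, not
# copied from a real config.xml.
SAMPLE = """<config>
  <gpu v='TRUE'/>
  <slot id='0' type='CPU'/>
  <slot id='1' type='GPU'/>
</config>"""


def set_pause_on_start(config_xml, slot_id, value="true"):
    """Return the config XML with pause-on-start set on the given slot."""
    root = ET.fromstring(config_xml)
    for slot in root.iter("slot"):
        if slot.get("id") == str(slot_id):
            option = slot.find("pause-on-start")
            if option is None:
                # Option not present yet: create it under the slot.
                option = ET.SubElement(slot, "pause-on-start")
            option.set("v", value)
            break
    else:
        raise ValueError(f"no slot with id {slot_id}")
    return ET.tostring(root, encoding="unicode")


print(set_pause_on_start(SAMPLE, 1))
```

The FAHControl route described above is the safer one; this is only meant to show what the setting amounts to.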