GPU stopped working ...

Postby TomTiddler167 » Wed Sep 30, 2020 5:43 pm

I'm running on an Intel NUC NUC8i7HVK (Hades Canyon), which has both an Intel Iris 630 and an AMD RX Vega M XT/ M GH GPU. I only configure the RX Vega in the BIOS. Up until Monday, FAH worked flawlessly, running slots on both my CPU and the GPU (AMD Vega). On Monday, Intel updated the Iris 630 driver, and somehow this made both GPUs undetectable by FAH. I have reverted the driver, and now everything is working again. Sigh.

I'm fairly new to the kinds of information you will need to investigate further. Can you give me some clues please?
TomTiddler167
 
Posts: 4
Joined: Wed Sep 30, 2020 5:32 pm

Re: GPU stopped working ...

Postby bruce » Wed Sep 30, 2020 5:58 pm

(See below in my Signature block)

FAH puts a lot of information at the top of its log file showing us how the system was configured during initialization. Your brief description of the configuration is helpful, but there are lots of other things that may or may not be useful in isolating your problem, and I can't predict which ones we'll need. Asking a lot of questions would get us to the same result, but that requires many more interactions.

FAH has had difficulties (such as yours) when there are multiple OpenCL drivers. The Development department is working on updates to FAHControl which may deal with your issue, but they're not ready for release, so we'll probably have to submit your case to them to see whether a fix is already pending or needs to be revised. The top of the old log showing the failure would potentially be useful ... and maybe we can fix it manually.

Do you have any specific apps that use OpenCL on the iGPU?

Previous log files are retained in the logs subdirectory of the FAH data directory. Alternatively, you can install the updated driver and then revert it again to reproduce the failure.
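
For anyone gathering the same information, here is a minimal Python sketch for pulling the top of the most recent retained logs. It relies only on what is stated above, that older logs sit in a `logs` subdirectory of the FAH data directory. The function names `newest_logs` and `head` are made up for illustration, and the sketch makes no assumption about how the log files are named.

```python
from pathlib import Path

def newest_logs(data_dir, count=3):
    """Return up to `count` files from <data_dir>/logs, newest first."""
    logs_dir = Path(data_dir) / "logs"
    if not logs_dir.is_dir():
        return []
    files = [p for p in logs_dir.iterdir() if p.is_file()]
    # Sort by modification time so no file naming scheme has to be assumed.
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:count]

def head(path, max_lines=120):
    """Return the first `max_lines` lines of a log file as a list of strings."""
    lines = []
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        for i, line in enumerate(fh):
            if i >= max_lines:
                break
            lines.append(line.rstrip("\n"))
    return lines
```

Typical use would be something like `for log in newest_logs(data_dir): print("\n".join(head(log)))`, where `data_dir` is the FAH data directory on your own machine.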
bruce
 
Posts: 20532
Joined: Thu Nov 29, 2007 11:13 pm
Location: So. Cal.

Re: GPU stopped working ...

Postby TomTiddler167 » Wed Sep 30, 2020 8:27 pm

Thanks "bruce" for the info. Unfortunately, I'm a bit pressed for time over the next 2 days, and since the Intel driver installation doesn't permit a "rollback" on the drivers (Sigh), I have to do a complete system restore to get back to a working FAH client after testing the new driver. I'll have some info for it all at the weekend.
TomTiddler167
 
Posts: 4
Joined: Wed Sep 30, 2020 5:32 pm

Re: GPU stopped working ...

Postby TomTiddler167 » Tue Oct 06, 2020 7:57 pm

OK, I have the "top" of the logs for the system running with the 2 different drivers ......

Here's the first, the system works with this driver (Intel Driver version 26.20.100.8141 for Iris 630)

Code:
15:44:23:WU01:FS01:0x22:*********************** Log Started 2020-10-06T15:44:23Z ***********************
15:44:23:WU01:FS01:0x22:*************************** Core22 Folding@home Core ***************************
15:44:23:WU01:FS01:0x22:       Core: Core22
15:44:23:WU01:FS01:0x22:       Type: 0x22
15:44:23:WU01:FS01:0x22:    Version: 0.0.13
15:44:23:WU01:FS01:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
15:44:23:WU01:FS01:0x22:  Copyright: 2020 foldingathome.org
15:44:23:WU01:FS01:0x22:   Homepage: https://foldingathome.org/
15:44:23:WU01:FS01:0x22:       Date: Sep 19 2020
15:44:23:WU01:FS01:0x22:       Time: 02:35:58
15:44:23:WU01:FS01:0x22:   Revision: 571cf95de6de2c592c7c3ed48fcfb2e33e9ea7d3
15:44:23:WU01:FS01:0x22:     Branch: core22-0.0.13
15:44:23:WU01:FS01:0x22:   Compiler: Visual C++ 2015
15:44:23:WU01:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
15:44:23:WU01:FS01:0x22:             -DOPENMM_GIT_HASH="\"189320d0\""
15:44:23:WU01:FS01:0x22:   Platform: win32 10
15:44:23:WU01:FS01:0x22:       Bits: 64
15:44:23:WU01:FS01:0x22:       Mode: Release
15:44:23:WU01:FS01:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
15:44:23:WU01:FS01:0x22:             <peastman@stanford.edu>
15:44:23:WU01:FS01:0x22:       Args: -dir 01 -suffix 01 -version 706 -lifeline 8292 -checkpoint 15
15:44:23:WU01:FS01:0x22:             -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
15:44:23:WU01:FS01:0x22:************************************ libFAH ************************************
15:44:23:WU01:FS01:0x22:       Date: Sep 7 2020
15:44:23:WU01:FS01:0x22:       Time: 19:09:56
15:44:23:WU01:FS01:0x22:   Revision: 44301ed97b996b63fe736bb8073f22209cb2b603
15:44:23:WU01:FS01:0x22:     Branch: HEAD
15:44:23:WU01:FS01:0x22:   Compiler: Visual C++ 2015
15:44:23:WU01:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
15:44:23:WU01:FS01:0x22:   Platform: win32 10
15:44:23:WU01:FS01:0x22:       Bits: 64
15:44:23:WU01:FS01:0x22:       Mode: Release
15:44:23:WU01:FS01:0x22:************************************ CBang *************************************
15:44:23:WU01:FS01:0x22:       Date: Sep 7 2020
15:44:23:WU01:FS01:0x22:       Time: 19:08:30
15:44:23:WU01:FS01:0x22:   Revision: 33fcfc2b3ed2195a423606a264718e31e6b3903f
15:44:23:WU01:FS01:0x22:     Branch: HEAD
15:44:23:WU01:FS01:0x22:   Compiler: Visual C++ 2015
15:44:23:WU01:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
15:44:23:WU01:FS01:0x22:   Platform: win32 10
15:44:23:WU01:FS01:0x22:       Bits: 64
15:44:23:WU01:FS01:0x22:       Mode: Release
15:44:23:WU01:FS01:0x22:************************************ System ************************************
15:44:23:WU01:FS01:0x22:        CPU: Intel(R) Core(TM) i7-8809G CPU @ 3.10GHz
15:44:23:WU01:FS01:0x22:     CPU ID: GenuineIntel Family 6 Model 158 Stepping 9
15:44:23:WU01:FS01:0x22:       CPUs: 8
15:44:23:WU01:FS01:0x22:     Memory: 15.92GiB
15:44:23:WU01:FS01:0x22:Free Memory: 8.10GiB
15:44:23:WU01:FS01:0x22:    Threads: WINDOWS_THREADS
15:44:23:WU01:FS01:0x22: OS Version: 6.2
15:44:23:WU01:FS01:0x22:Has Battery: false
15:44:23:WU01:FS01:0x22: On Battery: false
15:44:23:WU01:FS01:0x22: UTC Offset: -4
15:44:23:WU01:FS01:0x22:        PID: 23388
15:44:24:WU01:FS01:0x22:        CWD: C:\Users\iante\AppData\Roaming\FAHClient\work
15:44:24:WU01:FS01:0x22:************************************ OpenMM ************************************
15:44:24:WU01:FS01:0x22:   Revision: 189320d0
15:44:24:WU01:FS01:0x22:********************************************************************************
15:44:24:WU01:FS01:0x22:Project: 16918 (Run 147, Clone 117, Gen 104)
15:44:24:WU01:FS01:0x22:Unit: 0x000000880002894c5f176177ae4d5e77
15:44:24:WU01:FS01:0x22:Reading tar file core.xml
15:44:24:WU01:FS01:0x22:Reading tar file integrator.xml
15:44:24:WU01:FS01:0x22:Reading tar file state.xml
15:44:24:WU01:FS01:0x22:Reading tar file system.xml
15:44:24:WU01:FS01:0x22:Digital signatures verified
15:44:24:WU01:FS01:0x22:Folding@home GPU Core22 Folding@home Core
15:44:24:WU01:FS01:0x22:Version 0.0.13
15:44:24:WU01:FS01:0x22:  Checkpoint write interval: 100000 steps (2%) [50 total]
15:44:24:WU01:FS01:0x22:  JSON viewer frame write interval: 50000 steps (1%) [100 total]
15:44:24:WU01:FS01:0x22:  XTC frame write interval: 250000 steps (5%) [20 total]
15:44:24:WU01:FS01:0x22:  Global context and integrator variables write interval: disabled
15:44:24:WU01:FS01:0x22:There are 3 platforms available.
15:44:24:WU01:FS01:0x22:Platform 0: Reference
15:44:24:WU01:FS01:0x22:Platform 1: CPU
15:44:24:WU01:FS01:0x22:Platform 2: OpenCL
15:44:24:WU01:FS01:0x22:  opencl-device 0 specified
15:44:29:WU00:FS01:Upload 30.95%
15:44:32:WU01:FS01:0x22:Attempting to create OpenCL context:
15:44:32:WU01:FS01:0x22:  Configuring platform OpenCL
15:44:35:WU00:FS01:Upload 74.28%
15:44:40:WU00:FS01:Upload complete
15:44:40:WU00:FS01:Server responded WORK_ACK (400)
15:44:40:WU00:FS01:Final credit estimate, 39220.00 points
15:44:40:WU00:FS01:Cleaning up
15:44:43:WU01:FS01:0x22:  Using OpenCL on platformId 0 and gpu 0
15:44:43:WU01:FS01:0x22:Completed 0 out of 5000000 steps (0%)
15:44:43:WU01:FS01:0x22:Checkpoint completed at step 0
15:46:17:WU02:FS00:0xa8:Completed 295000 out of 500000 steps (59%)
15:49:55:WU02:FS00:0xa8:Completed 300000 out of 500000 steps (60%)
15:54:03:WU02:FS00:0xa8:Completed 305000 out of 500000 steps (61%)
15:56:31:WU01:FS01:0x22:Completed 50000 out of 5000000 steps (1%)
15:57:41:WU02:FS00:0xa8:Completed 310000 out of 500000 steps (62%)
16:01:29:WU02:FS00:0xa8:Completed 315000 out of 500000 steps (63%)
16:05:53:WU02:FS00:0xa8:Completed 320000 out of 500000 steps (64%)
16:10:15:WU02:FS00:0xa8:Completed 325000 out of 500000 steps (65%)
16:10:37:WU01:FS01:0x22:Completed 100000 out of 5000000 steps (2%)
16:10:38:WU01:FS01:0x22:Checkpoint completed at step 100000
16:14:20:WU02:FS00:0xa8:Completed 330000 out of 500000 steps (66%)
16:18:39:WU02:FS00:0xa8:Completed 335000 out of 500000 steps (67%)
16:22:52:WU02:FS00:0xa8:Completed 340000 out of 500000 steps (68%)
16:25:48:WU01:FS01:0x22:Completed 150000 out of 5000000 steps (3%)
16:27:16:WU02:FS00:0xa8:Completed 345000 out of 500000 steps (69%)
16:31:17:WU02:FS00:0xa8:Completed 350000 out of 500000 steps (70%)

___________________________________________________________________

...... and now here's the top of the log from the failing driver (Intel Driver Version 27.20.100.8681 for Iris 630)
Code:
*********************** Log Started 2020-10-06T18:36:38Z ***********************
18:36:38:Trying to access database...
18:36:38:Successfully acquired database lock
18:36:38:Read GPUs.txt
18:36:38:Enabled folding slot 00: PAUSED cpu:7 (by user)
18:36:38:Enabled folding slot 01: PAUSED gpu:0:[Radeon RX Vega M XT/ M GH] (by user)
18:36:38:ERROR:No compute devices matched GPU #0 {
18:36:38:ERROR:  "vendor": 4098,
18:36:38:ERROR:  "device": 26956,
18:36:38:ERROR:  "type": 1,
18:36:38:ERROR:  "species": 5,
18:36:38:ERROR:  "description": "[Radeon RX Vega M XT/ M GH]"
18:36:38:ERROR:}.  You may need to update your graphics drivers.
18:36:38:****************************** FAHClient ******************************
18:36:38:      Version: 7.6.13
18:36:38:       Author: Joseph Coffland <joseph@cauldrondevelopment.com>
18:36:38:    Copyright: 2020 foldingathome.org
18:36:38:     Homepage: https://foldingathome.org/
18:36:38:         Date: Apr 27 2020
18:36:38:         Time: 21:21:01
18:36:38:     Revision: 5a652817f46116b6e135503af97f18e094414e3b
18:36:38:       Branch: master
18:36:38:     Compiler: Visual C++ 2008
18:36:38:      Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
18:36:38:     Platform: win32 10
18:36:38:         Bits: 32
18:36:38:         Mode: Release
18:36:38:       Config: C:\Users\iante\AppData\Roaming\FAHClient\config.xml
18:36:38:******************************** CBang ********************************
18:36:38:         Date: Apr 24 2020
18:36:38:         Time: 17:07:55
18:36:38:     Revision: ea081a3b3b0f4a37c4d0440b4f1bc184197c7797
18:36:38:       Branch: master
18:36:38:     Compiler: Visual C++ 2008
18:36:38:      Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
18:36:38:     Platform: win32 10
18:36:38:         Bits: 32
18:36:38:         Mode: Release
18:36:38:******************************* System ********************************
18:36:38:          CPU: Intel(R) Core(TM) i7-8809G CPU @ 3.10GHz
18:36:38:       CPU ID: GenuineIntel Family 6 Model 158 Stepping 9
18:36:38:         CPUs: 8
18:36:38:       Memory: 15.92GiB
18:36:38:  Free Memory: 12.13GiB
18:36:38:      Threads: WINDOWS_THREADS
18:36:38:   OS Version: 6.2
18:36:38:  Has Battery: false
18:36:38:   On Battery: false
18:36:38:   UTC Offset: -4
18:36:38:          PID: 13896
18:36:38:          CWD: C:\Users\iante\AppData\Roaming\FAHClient
18:36:38:Win32 Service: false
18:36:38:           OS: Windows 10 Enterprise
18:36:38:      OS Arch: AMD64
18:36:38:         GPUs: 1
18:36:38:        GPU 0: Bus:1 Slot:0 Func:0 AMD:5 [Radeon RX Vega M XT/ M GH]
18:36:38:         CUDA: Not detected: Failed to open dynamic library 'nvcuda.dll': The
18:36:38:               specified module could not be found.
18:36:38:
18:36:38:       OpenCL: Not detected: clGetPlatformIDs() returned -1001
18:36:38:******************************* libFAH ********************************
18:36:38:         Date: Apr 15 2020
18:36:38:         Time: 14:53:14
18:36:38:     Revision: 216968bc7025029c841ed6e36e81a03a316890d3
18:36:38:       Branch: master
18:36:38:     Compiler: Visual C++ 2008
18:36:38:      Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
18:36:38:     Platform: win32 10
18:36:38:         Bits: 32
18:36:38:         Mode: Release
18:36:38:***********************************************************************
18:36:38:<config>
18:36:38:  <!-- Folding Core -->
18:36:38:  <core-priority v='low'/>
18:36:38:
18:36:38:  <!-- Folding Slot Configuration -->
18:36:38:  <gpu v='TRUE'/>
18:36:38:
18:36:38:  <!-- Network -->
18:36:38:  <proxy v=':8080'/>
18:36:38:


---------------------------------------------

Hope this helps to get to the bottom of this. When I say "failing", I mean that the driver appears to work for the actual Intel iGPU; it just messes up the AMD RX Vega somehow. And of course, I don't know how to get an Intel iGPU working with FAH.
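
The two failure signatures in the second log can be picked out mechanically. Below is a small Python sketch (not part of FAH; the name `diagnose` and the regular expressions are illustrative assumptions based only on the log lines pasted above). It flags the "No compute devices matched" error, prints the vendor and device IDs in hex, and decodes the OpenCL status code. In the Khronos ICD loader extension, -1001 is `CL_PLATFORM_NOT_FOUND_KHR`, meaning the loader found no OpenCL platform at all, which fits the observation that the AMD GPU disappears as well. Only that one code is mapped here.

```python
import re

# OpenCL ICD loader status codes (cl_khr_icd); only the one seen in this thread.
CL_ERRORS = {-1001: "CL_PLATFORM_NOT_FOUND_KHR"}

# Patterns modelled on the log lines quoted in this thread (an assumption,
# not an official description of the FAHClient log format).
OPENCL_RE = re.compile(r"OpenCL: Not detected: (\w+)\(\) returned (-?\d+)")
NO_MATCH_RE = re.compile(r"ERROR:No compute devices matched GPU #(\d+)")
FIELD_RE = re.compile(r'ERROR:\s+"(vendor|device)": (\d+)')

def diagnose(log_text):
    """Return a list of human-readable findings for GPU detection failures."""
    findings = []
    for line in log_text.splitlines():
        m = OPENCL_RE.search(line)
        if m:
            code = int(m.group(2))
            name = CL_ERRORS.get(code, "unknown code")
            findings.append(f"{m.group(1)} returned {code} ({name})")
            continue
        m = NO_MATCH_RE.search(line)
        if m:
            findings.append(f"no compute device matched GPU #{m.group(1)}")
            continue
        m = FIELD_RE.search(line)
        if m:
            # PCI IDs are usually quoted in hex, e.g. 4098 is 0x1002 (AMD).
            findings.append(f"{m.group(1)} id {int(m.group(2)):#06x}")
    return findings
```

Running `diagnose` over the failing log should report the unmatched GPU #0, its IDs in hex, and the decoded `clGetPlatformIDs` status; the working log produces no findings because none of these lines appear in it.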

Mod Edit: Added Code Tags - PantherX
TomTiddler167
 
Posts: 4
Joined: Wed Sep 30, 2020 5:32 pm

Re: GPU stopped working ...

Postby bruce » Thu Oct 08, 2020 10:54 am

The Intel GPUs are not supported yet. They are being tested by a select group of beta testers and we're waiting for their reports. Please disable FAH from using the Iris 630. You can set "Pause-on-start" to "true" for that device or you can disable it in the BIOS.
bruce
 
Posts: 20532
Joined: Thu Nov 29, 2007 11:13 pm
Location: So. Cal.

Re: GPU stopped working ...

Postby ipkh » Fri Oct 09, 2020 10:19 pm

In this instance, the only options are a system restore or pausing the slot. The two GPUs share the display output, and Intel handles the driver for both.
It is possible that Intel needs to issue a fixed driver to accommodate this use case.


I really wish FAH had a separate testing environment for these true beta tests. Putting the enablement bits in the public GPUs.txt is causing quite a few problems and I doubt the majority of affected users know to come by the forum.
ipkh
 
Posts: 172
Joined: Thu Jul 16, 2015 3:03 pm

Re: GPU stopped working ...

Postby PantherX » Mon Oct 12, 2020 7:59 am

ipkh wrote:...I really wish FAH had a separate testing environment for these true beta tests. Putting the enablement bits in the public GPUs.txt is causing quite a few problems and I doubt the majority of affected users know to come by the forum.

There are some ideas floating around. Once the automated GPU benchmarking is in place, the next step might be to have a different system for testing those GPUs. While it might not be a completely different system, it would be better than the current practice of testing in production (PRD).
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

PantherX
Site Moderator
 
Posts: 7017
Joined: Wed Dec 23, 2009 10:33 am
Location: Land Of The Long White Cloud

Re: GPU stopped working ...

Postby TomTiddler167 » Fri Oct 16, 2020 4:24 pm

bruce wrote:The Intel GPUs are not supported yet. They are being tested by a select group of beta testers and we're waiting for their reports. Please disable FAH from using the Iris 630. You can set "Pause-on-start" to "true" for that device or you can disable it in the BIOS.



OK, so you say "You can set "Pause-on-start" to "true" for that device " - How do I do that??

Incidentally, in the logs above, the only difference between the two is the Intel driver for the Iris GPU; there were no changes to the Radeon drivers.
TomTiddler167
 
Posts: 4
Joined: Wed Sep 30, 2020 5:32 pm

Re: GPU stopped working ...

Postby ajm » Fri Oct 16, 2020 6:19 pm

TomTiddler167 wrote:
bruce wrote:The Intel GPUs are not supported yet. They are being tested by a select group of beta testers and we're waiting for their reports. Please disable FAH from using the Iris 630. You can set "Pause-on-start" to "true" for that device or you can disable it in the BIOS.



OK, so you say "You can set "Pause-on-start" to "true" for that device " - How do I do that??


Advanced Control (aka FAHControl) -> Configure -> Slots -> double-click the slot concerned -> Scroll to "Extra slot options (expert only)" -> Add -> Name: pause-on-start Value: true -> OK -> OK -> OK -> Save
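
For those who prefer to see what that option amounts to in the configuration file, here is a hedged Python sketch. It assumes that config.xml stores each slot as a `<slot id='N' ...>` element and each slot option as a child element with a `v` attribute, in the same style as the `<gpu v='TRUE'/>` line in the log above; the slot layout itself is not shown in this thread, so treat it as an assumption. The function name `set_pause_on_start` is made up for illustration. The FAHControl steps above remain the safer route.

```python
import xml.etree.ElementTree as ET

def set_pause_on_start(config_xml, slot_id, paused=True):
    """Return config XML text with pause-on-start set on the given slot.

    Assumes slots look like <slot id='1' type='GPU'/> and options are child
    elements such as <pause-on-start v='true'/> (an assumption, see above).
    Note: XML comments in the input are not preserved by ElementTree.
    """
    root = ET.fromstring(config_xml)
    for slot in root.iter("slot"):
        if slot.get("id") == str(slot_id):
            opt = slot.find("pause-on-start")
            if opt is None:
                opt = ET.SubElement(slot, "pause-on-start")
            opt.set("v", "true" if paused else "false")
            break
    else:
        # No slot matched; refuse to guess which one was meant.
        raise ValueError(f"no slot with id {slot_id}")
    return ET.tostring(root, encoding="unicode")
```

The function works on a string and returns a string, so it never touches the real file; if you do write the result back, keep a backup of the original config.xml and make the change while the client is not running.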
ajm
 
Posts: 705
Joined: Sat Mar 21, 2020 6:22 am
Location: Lucerne, Switzerland

