Cuda and OpenCL not detected with GTX960

It seems that a lot of GPU problems revolve around specific versions of drivers. Though NVidia has their own support structure, you can often learn from information reported by others who fold.

Moderators: Site Moderators, FAHC Science Team

FlexibleToast
Posts: 8
Joined: Fri Sep 04, 2020 7:24 pm

Cuda and OpenCL not detected with GTX960

Post by FlexibleToast »

I'm running F@H within Fedora 32 virtual machines hosted on an oVirt cluster using GPU passthrough. I created a template that has working Nvidia 450.X drivers and FAHClient. I deployed the template on my three hosts, and when I pass through the K4200 that came with each of them, F@H works as expected. However, when I make another VM and pass through a GTX960 I had lying around, FAHClient says it can't find CUDA or OpenCL, even though the VM is exactly the same as the three working ones. I'm not sure what to check.

Exact error:

Code:

CUDA Not detected: cuInit() returned 101
OpenCL Not detected: clGetPlatformIDs() returned -1001
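
For reference, the two return codes can be looked up in the public headers: in the CUDA driver API, 101 is `CUDA_ERROR_INVALID_DEVICE`, and in the OpenCL ICD extension, -1001 is `CL_PLATFORM_NOT_FOUND_KHR`. A minimal Python sketch of that lookup follows; the `describe` helper and the small tables are illustrative only, cover just a few codes, and are not part of FAHClient.

```python
# A few return codes from the CUDA driver API (cuda.h) and the
# OpenCL ICD extension (cl_ext.h). The tables are deliberately incomplete.
CUDA_RESULTS = {
    0: "CUDA_SUCCESS",
    100: "CUDA_ERROR_NO_DEVICE",
    101: "CUDA_ERROR_INVALID_DEVICE",
}
OPENCL_RESULTS = {
    0: "CL_SUCCESS",
    -1001: "CL_PLATFORM_NOT_FOUND_KHR",
}


def describe(api, code):
    """Return the symbolic name for a return code (hypothetical helper)."""
    if api == "cuda":
        table = CUDA_RESULTS
    elif api == "opencl":
        table = OPENCL_RESULTS
    else:
        raise ValueError(f"unknown API: {api}")
    return table.get(code, f"unknown code {code}")


print("cuInit():", describe("cuda", 101))
print("clGetPlatformIDs():", describe("opencl", -1001))
```

Neither name says why the driver failed; they only confirm that no usable CUDA device and no OpenCL platform were visible to the client.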
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Cuda and OpenCL not detected with GTX960

Post by bruce »

FAH's support is limited to GPUs that are listed in GPUs.txt. Is your GPU supported?

I see two versions of the [GeForce GTX 960] with PCIe codes
0x10de:0x1401 and 0x10de:0x1406. Is your GPU one of those, or does it have different PCIe codes?
ipkh
Posts: 175
Joined: Thu Jul 16, 2015 2:03 pm

Re: Cuda and OpenCL not detected with GTX960

Post by ipkh »

I don't believe Nvidia supports virtualization with non Tesla and Quadro cards.
FlexibleToast
Posts: 8
Joined: Fri Sep 04, 2020 7:24 pm

Re: Cuda and OpenCL not detected with GTX960

Post by FlexibleToast »

bruce wrote:FAH's support is limited to GPUs that are listed in GPUs.txt. Is your GPU supported?

I see two versions of the [GeForce GTX 960] with PCIe codes
0x10de:0x1401 and 0x10de:0x1406. Is your GPU one of those, or does it have different PCIe codes?
Running `FAHClient --lspci` returns

Code:

...
0x10de:0x1401:5:0:0:NVIDIA Corporation:
0x10de:0x0fba:6:0:0:NVIDIA Corporation:
...
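
The comparison against the two IDs bruce listed can be sketched in Python. This assumes the first two colon-separated fields of each `FAHClient --lspci` line are the vendor and device IDs in hex, which matches the two lines shown but is otherwise an assumption; `SUPPORTED_GTX_960` and `parse_ids` are hypothetical names, and the set holds only the two IDs mentioned in this thread, not the full GPUs.txt.

```python
# The two GeForce GTX 960 IDs mentioned earlier in the thread.
SUPPORTED_GTX_960 = {(0x10DE, 0x1401), (0x10DE, 0x1406)}


def parse_ids(line):
    """Extract (vendor, device) from a line such as
    '0x10de:0x1401:5:0:0:NVIDIA Corporation:' (field layout assumed)."""
    fields = line.strip().split(":")
    return int(fields[0], 16), int(fields[1], 16)


lines = [
    "0x10de:0x1401:5:0:0:NVIDIA Corporation:",
    "0x10de:0x0fba:6:0:0:NVIDIA Corporation:",
]

for line in lines:
    ids = parse_ids(line)
    status = "listed GTX 960 ID" if ids in SUPPORTED_GTX_960 else "not a listed GTX 960 ID"
    print(f"0x{ids[0]:04x}:0x{ids[1]:04x} -> {status}")
```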
ipkh wrote:I don't believe Nvidia supports virtualization with non Tesla and Quadro cards.
I think you're thinking of vGPU support, which is definitely limited to newer Quadro and Tesla cards. Even the old K4200 doesn't make that cut. However, this is direct GPU passthrough. The VM has total authority over that PCI slot, as if the card were physically connected to it. The benefit is that the VM gets full access to the card, while the downside is that the VM obviously can't be made highly available. A vGPU setup can be made highly available, and I think you can even pool multiple GPUs, essentially making your GPUs software defined.
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud

Re: Cuda and OpenCL not detected with GTX960

Post by PantherX »

Welcome to the F@H Forum FlexibleToast,

Can you please post the log file? Ensure you include the first 100 lines which will inform us of what the system configuration is and what the client settings are. If you require guidance, please view this topic: viewtopic.php?f=24&t=26036
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
FlexibleToast
Posts: 8
Joined: Fri Sep 04, 2020 7:24 pm

Re: Cuda and OpenCL not detected with GTX960

Post by FlexibleToast »

Code:

*********************** Log Started 2020-09-06T17:54:44Z ***********************
17:54:44:Trying to access database...
17:54:44:Successfully acquired database lock
17:54:44:Read GPUs.txt
17:54:44:Enabled folding slot 00: READY cpu:6
17:54:45:Enabled folding slot 01: READY gpu:0:GM206 [GeForce GTX 960] 2308
17:54:46:ERROR:No compute devices matched GPU #0 {
17:54:46:ERROR:  "vendor": 4318,
17:54:46:ERROR:  "device": 5121,
17:54:46:ERROR:  "type": 2,
17:54:46:ERROR:  "species": 5,
17:54:46:ERROR:  "description": "GM206 [GeForce GTX 960] 2308"
17:54:46:ERROR:}.  You may need to update your graphics drivers.
17:54:47:****************************** FAHClient ******************************
17:54:47:    Version: 7.6.13
17:54:47:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
17:54:47:  Copyright: 2020 foldingathome.org
17:54:47:   Homepage: https://foldingathome.org/
17:54:47:       Date: Apr 28 2020
17:54:47:       Time: 04:20:27
17:54:47:   Revision: 5a652817f46116b6e135503af97f18e094414e3b
17:54:47:     Branch: master
17:54:47:   Compiler: GNU 4.9.4
17:54:47:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
17:54:47:   Platform: linux2 4.19.0-5-amd64
17:54:47:       Bits: 64
17:54:47:       Mode: Release
17:54:47:       Args: --child /etc/fahclient/config.xml --run-as fahclient
17:54:47:             --pid-file=/var/run/fahclient.pid --daemon -v
17:54:47:     Config: /etc/fahclient/config.xml
17:54:47:******************************** CBang ********************************
17:54:47:       Date: Apr 25 2020
17:54:47:       Time: 00:07:55
17:54:47:   Revision: ea081a3b3b0f4a37c4d0440b4f1bc184197c7797
17:54:47:     Branch: master
17:54:47:   Compiler: GNU 4.9.4
17:54:47:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
17:54:47:             -fPIC
17:54:47:   Platform: linux2 4.19.0-5-amd64
17:54:47:       Bits: 64
17:54:47:       Mode: Release
17:54:47:******************************* System ********************************
17:54:47:        CPU: Intel Core Processor (Haswell, no TSX)
17:54:47:     CPU ID: GenuineIntel Family 6 Model 60 Stepping 1
17:54:47:       CPUs: 8
17:54:47:     Memory: 3.83GiB
17:54:47:Free Memory: 3.51GiB
17:54:47:    Threads: POSIX_THREADS
17:54:47: OS Version: 5.8
17:54:47:Has Battery: false
17:54:47: On Battery: false
17:54:47: UTC Offset: -5
17:54:47:        PID: 844
17:54:47:        CWD: /var/lib/fahclient
17:54:47:         OS: Linux 5.8.4-200.fc32.x86_64 x86_64
17:54:47:    OS Arch: AMD64
17:54:47:       GPUs: 1
17:54:47:      GPU 0: Bus:5 Slot:0 Func:0 NVIDIA:5 GM206 [GeForce GTX 960] 2308
17:54:47:       CUDA: Not detected: cuInit() returned 101
17:54:47:     OpenCL: Not detected: clGetPlatformIDs() returned -1001
17:54:47:******************************* libFAH ********************************
17:54:47:       Date: Apr 15 2020
17:54:47:       Time: 21:43:27
17:54:47:   Revision: 216968bc7025029c841ed6e36e81a03a316890d3
17:54:47:     Branch: master
17:54:47:   Compiler: GNU 4.9.4
17:54:47:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
17:54:47:   Platform: linux2 4.19.0-5-amd64
17:54:47:       Bits: 64
17:54:47:       Mode: Release
17:54:47:***********************************************************************
17:54:47:<config>
17:54:47:  <!-- Client Control -->
17:54:47:  <client-threads v='6'/>
17:54:47:  <cycle-rate v='4'/>
17:54:47:  <cycles v='-1'/>
17:54:47:  <disable-sleep-when-active v='true'/>
17:54:47:  <exit-when-done v='false'/>
17:54:47:  <fold-anon v='false'/>
17:54:47:  <idle-seconds v='300'/>
17:54:47:  <open-web-control v='false'/>
17:54:47:  <update-gpus-txt v='true'/>
17:54:47:
17:54:47:  <!-- Configuration -->
17:54:47:  <config-rotate v='true'/>
17:54:47:  <config-rotate-dir v='configs'/>
17:54:47:  <config-rotate-max v='16'/>
17:54:47:
17:54:47:  <!-- Debugging -->
17:54:47:  <assignment-servers>
17:54:47:    assign1.foldingathome.org assign2.foldingathome.org assign3.foldingathome.org assign4.foldingathome.org 
17:54:47:  </assignment-servers>
17:54:47:  <auth-as v='true'/>
17:54:47:  <capture-directory v='capture'/>
17:54:47:  <capture-on-error v='false'/>
17:54:47:  <capture-packets v='false'/>
17:54:47:  <capture-requests v='false'/>
17:54:47:  <capture-responses v='false'/>
17:54:47:  <capture-sockets v='false'/>
17:54:47:  <debug-sockets v='false'/>
17:54:47:  <exception-locations v='true'/>
17:54:47:  <stack-traces v='false'/>
17:54:47:
17:54:47:  <!-- Error Handling -->
17:54:47:  <max-slot-errors v='10'/>
17:54:47:  <max-unit-errors v='5'/>
17:54:47:
17:54:47:  <!-- Folding Core -->
17:54:47:  <checkpoint v='15'/>
17:54:47:  <core-priority v='idle'/>
17:54:47:  <cpu-usage v='100'/>
17:54:47:  <gpu-usage v='100'/>
17:54:47:  <no-assembly v='false'/>
17:54:47:
17:54:47:  <!-- Folding Slot Configuration -->
17:54:47:  <cause v='ANY'/>
17:54:47:  <client-subtype v='LINUX'/>
17:54:47:  <client-type v='normal'/>
17:54:47:  <cpu-species v='X86_PENTIUM_II'/>
17:54:47:  <cpu-type v='AMD64'/>
17:54:47:  <cpus v='-1'/>
17:54:47:  <disable-viz v='false'/>
17:54:47:  <gpu v='false'/>
17:54:47:  <max-packet-size v='normal'/>
17:54:47:  <os-species v='UNKNOWN'/>
17:54:47:  <os-type v='LINUX'/>
17:54:47:  <project-key v='0'/>
17:54:47:  <smp v='true'/>
17:54:47:
17:54:47:  <!-- GUI -->
17:54:47:  <gui-enabled v='true'/>
17:54:47:
17:54:47:  <!-- HTTP Server -->
17:54:47:  <allow v='127.0.0.1 192.168.0.30'/>
17:54:47:  <connection-timeout v='60'/>
17:54:47:  <deny v='0/0'/>
17:54:47:  <http-addresses v='0:7396'/>
17:54:47:  <https-addresses v=''/>
17:54:47:  <max-connect-time v='900'/>
17:54:47:  <max-connections v='800'/>
17:54:47:  <max-request-length v='52428800'/>
17:54:47:  <min-connect-time v='300'/>
17:54:47:
17:54:47:  <!-- Logging -->
17:54:47:  <log v='/var/lib/fahclient/log.txt'/>
17:54:47:  <log-color v='true'/>
17:54:47:  <log-crlf v='false'/>
17:54:47:  <log-date v='false'/>
17:54:47:  <log-date-periodically v='21600'/>
17:54:47:  <log-domain v='false'/>
17:54:47:  <log-header v='true'/>
17:54:47:  <log-level v='true'/>
17:54:47:  <log-no-info-header v='true'/>
17:54:47:  <log-redirect v='false'/>
17:54:47:  <log-rotate v='true'/>
17:54:47:  <log-rotate-dir v='/var/lib/fahclient/logs'/>
17:54:47:  <log-rotate-max v='16'/>
17:54:47:  <log-short-level v='false'/>
17:54:47:  <log-simple-domains v='true'/>
17:54:47:  <log-thread-id v='false'/>
17:54:47:  <log-thread-prefix v='true'/>
17:54:47:  <log-time v='true'/>
17:54:47:  <log-to-screen v='true'/>
17:54:47:  <log-truncate v='false'/>
17:54:47:  <verbosity v='3'/>
17:54:47:
17:54:47:  <!-- Network -->
17:54:47:  <proxy v=':8080'/>
17:54:47:  <proxy-enable v='false'/>
17:54:47:  <proxy-pass v='*****'/>
17:54:47:  <proxy-user v=''/>
17:54:47:
17:54:47:  <!-- Process Control -->
17:54:47:  <child v='true'/>
17:54:47:  <daemon v='true'/>
17:54:47:  <fork v='false'/>
17:54:47:  <pid v='false'/>
17:54:47:  <pid-file v='/var/run/fahclient.pid'/>
17:54:47:  <respawn v='false'/>
17:54:47:  <service v='false'/>
17:54:47:
17:54:47:  <!-- Remote Command Server -->
17:54:47:  <command-address v='0.0.0.0'/>
17:54:47:  <command-allow-no-pass v='127.0.0.1'/>
17:54:47:  <command-deny-no-pass v='0/0'/>
17:54:47:  <command-enable v='true'/>
17:54:47:  <command-port v='36330'/>
17:54:47:  <password v='*****'/>
17:54:47:
17:54:47:  <!-- Slot Control -->
17:54:47:  <idle v='false'/>
17:54:47:  <max-shutdown-wait v='60'/>
17:54:47:  <pause-on-battery v='true'/>
17:54:47:  <pause-on-start v='false'/>
17:54:47:  <paused v='false'/>
17:54:47:  <power v='medium'/>
17:54:47:
17:54:47:  <!-- User Information -->
17:54:47:  <machine-id v='0'/>
17:54:47:  <passkey v='*****'/>
17:54:47:  <team v='11812'/>
17:54:47:  <user v='FlexibleToast'/>
17:54:47:
17:54:47:  <!-- Web Server -->
17:54:47:  <web-allow v='127.0.0.1'/>
17:54:47:  <web-deny v='0/0'/>
17:54:47:  <web-enable v='true'/>
17:54:47:
17:54:47:  <!-- Web Server Sessions -->
17:54:47:  <session-cookie v='sid'/>
17:54:47:  <session-lifetime v='86400'/>
17:54:47:  <session-timeout v='3600'/>
17:54:47:
17:54:47:  <!-- Work Unit Control -->
17:54:47:  <dump-after-deadline v='true'/>
17:54:47:  <max-queue v='16'/>
17:54:47:  <max-units v='0'/>
17:54:47:  <next-unit-percentage v='99'/>
17:54:47:  <stall-detection-enabled v='false'/>
17:54:47:  <stall-percent v='5'/>
17:54:47:  <stall-timeout v='1800'/>
17:54:47:
17:54:47:  <!-- Folding Slots -->
17:54:47:  <slot id='0' type='CPU'>
17:54:47:    <machine-id v='0'/>
17:54:47:  </slot>
17:54:47:  <slot id='1' type='GPU'>
17:54:47:    <gpu-index v='0'/>
17:54:47:    <machine-id v='1'/>
17:54:47:  </slot>
17:54:47:</config>
17:54:47:WU00:FS00:Starting
17:54:47:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/lin/64bit-avx-256/a7-0.0.19/Core_a7.fah/FahCore_a7 -dir 00 -suffix 01 -version 706 -lifeline 844 -checkpoint 15 -np 6
17:54:47:WU00:FS00:Started FahCore on PID 1166
17:54:47:WU00:FS00:Core PID:1170
17:54:47:WU00:FS00:FahCore 0xa7 started
17:54:47:WU01:FS01:Connecting to assign1.foldingathome.org:80
17:54:47:WU00:FS00:0xa7:*********************** Log Started 2020-09-06T17:54:47Z ***********************
17:54:47:WU00:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
17:54:47:WU00:FS00:0xa7:       Type: 0xa7
17:54:47:WU00:FS00:0xa7:       Core: Gromacs
17:54:47:WU00:FS00:0xa7:       Args: -dir 00 -suffix 01 -version 706 -lifeline 1166 -checkpoint 15 -np 6
17:54:47:WU00:FS00:0xa7:************************************ CBang *************************************
17:54:47:WU00:FS00:0xa7:       Date: Nov 27 2019
17:54:47:WU00:FS00:0xa7:       Time: 11:26:54
17:54:47:WU00:FS00:0xa7:   Revision: d25803215b59272441049dfa05a0a9bf7a6e3c48
17:54:47:WU00:FS00:0xa7:     Branch: master
17:54:47:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
17:54:47:WU00:FS00:0xa7:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
17:54:47:WU00:FS00:0xa7:             -fno-pie -fPIC
17:54:47:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
17:54:47:WU00:FS00:0xa7:       Bits: 64
17:54:47:WU00:FS00:0xa7:       Mode: Release
17:54:47:WU00:FS00:0xa7:************************************ System ************************************
17:54:47:WU00:FS00:0xa7:        CPU: Intel Core Processor (Haswell, no TSX)
17:54:47:WU00:FS00:0xa7:     CPU ID: GenuineIntel Family 6 Model 60 Stepping 1
17:54:47:WU00:FS00:0xa7:       CPUs: 8
17:54:47:WU00:FS00:0xa7:     Memory: 3.83GiB
17:54:47:WU00:FS00:0xa7:Free Memory: 3.32GiB
17:54:47:WU00:FS00:0xa7:    Threads: POSIX_THREADS
17:54:47:WU00:FS00:0xa7: OS Version: 5.8
17:54:47:WU00:FS00:0xa7:Has Battery: false
17:54:47:WU00:FS00:0xa7: On Battery: false
17:54:47:WU00:FS00:0xa7: UTC Offset: -5
17:54:47:WU00:FS00:0xa7:        PID: 1170
17:54:47:WU00:FS00:0xa7:        CWD: /var/lib/fahclient/work
17:54:47:WU00:FS00:0xa7:******************************** Build - libFAH ********************************
17:54:47:WU00:FS00:0xa7:    Version: 0.0.19
17:54:47:WU00:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
17:54:47:WU00:FS00:0xa7:  Copyright: 2019 foldingathome.org
17:54:47:WU00:FS00:0xa7:   Homepage: https://foldingathome.org/
17:54:47:WU00:FS00:0xa7:       Date: Nov 26 2019
17:54:47:WU00:FS00:0xa7:       Time: 00:41:42
17:54:47:WU00:FS00:0xa7:   Revision: d5b5c747532224f986b7cd02c968ed9a20c16d6e
17:54:47:WU00:FS00:0xa7:     Branch: master
17:54:47:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
17:54:47:WU00:FS00:0xa7:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
17:54:47:WU00:FS00:0xa7:             -fno-pie
17:54:47:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
17:54:47:WU00:FS00:0xa7:       Bits: 64
17:54:47:WU00:FS00:0xa7:       Mode: Release
17:54:47:WU00:FS00:0xa7:************************************ Build *************************************
17:54:47:WU00:FS00:0xa7:       SIMD: avx_256
17:54:47:WU00:FS00:0xa7:********************************************************************************
17:54:47:WU00:FS00:0xa7:Project: 14379 (Run 2862, Clone 4, Gen 445)
17:54:47:WU00:FS00:0xa7:Unit: 0x000001f7455e42075e932f06ec7aee4f
17:54:47:WU00:FS00:0xa7:Digital signatures verified
17:54:47:WU00:FS00:0xa7:Calling: mdrun -s frame445.tpr -o frame445.trr -cpi state.cpt -cpt 15 -nt 6
17:54:48:WU00:FS00:0xa7:Steps: first=0 total=250000
17:54:48:WU01:FS01:Assigned to work server 192.0.2.1
17:54:48:WU01:FS01:Requesting new work unit for slot 01: READY gpu:0:GM206 [GeForce GTX 960] 2308 from 192.0.2.1
17:54:48:WU01:FS01:Connecting to 192.0.2.1:8080
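
The "vendor" and "device" values in the ERROR block of the log are printed in decimal, while GPUs.txt and lspci use hex. A small Python sketch of the conversion (the `pci_id` helper is a hypothetical name) shows they are the same IDs discussed above:

```python
def pci_id(vendor, device):
    """Format decimal vendor/device numbers as a hex PCI ID pair."""
    return f"0x{vendor:04x}:0x{device:04x}"


# Values copied from the "No compute devices matched GPU #0" block in the log.
print(pci_id(4318, 5121))  # 0x10de:0x1401
```

So the client did match the card to the GM206 [GeForce GTX 960] entry; the failure is at the driver level, not in the GPU list.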
ipkh
Posts: 175
Joined: Thu Jul 16, 2015 2:03 pm

Re: Cuda and OpenCL not detected with GTX960

Post by ipkh »

This log shows that the graphics card is detected and that its PCIe identification is recognized as compatible.
It still doesn't have working drivers, though, so I'd try installing the Nvidia drivers.
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am
Location: Land Of The Long White Cloud

Re: Cuda and OpenCL not detected with GTX960

Post by PantherX »

This would be the sequence of events to ensure that you fold on your Nvidia GPU under Linux:
1) Ensure that Linux is fully updated.
2) Install the proprietary drivers by Nvidia
3) Install OpenCL package (sudo apt-get install ocl-icd-opencl-dev)
4) Install FAHClient
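
As a rough check related to step 3: the OpenCL ICD loader discovers vendor drivers through small `.icd` text files, which on Linux conventionally live in `/etc/OpenCL/vendors`. The Python sketch below lists whatever is registered there. The `registered_icds` helper is a hypothetical name, and the directory may differ on some distributions, so treat the default path as an assumption.

```python
from pathlib import Path


def registered_icds(vendors_dir="/etc/OpenCL/vendors"):
    """Map each .icd file name to the library name it points at.

    Returns an empty dict if the directory does not exist.
    """
    base = Path(vendors_dir)
    if not base.is_dir():
        return {}
    return {p.name: p.read_text().strip() for p in sorted(base.glob("*.icd"))}


if __name__ == "__main__":
    icds = registered_icds()
    if not icds:
        print("No OpenCL ICD files found")
    for name, library in icds.items():
        print(f"{name}: {library}")
```

An Nvidia entry being present only shows that the driver registered itself; the driver can still fail to initialize the card.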
FlexibleToast
Posts: 8
Joined: Fri Sep 04, 2020 7:24 pm

Re: Cuda and OpenCL not detected with GTX960

Post by FlexibleToast »

PantherX wrote:This would be the sequence of events to ensure that you fold on your Nvidia GPU under Linux:
1) Ensure that Linux is fully updated.
2) Install the proprietary drivers by Nvidia
3) Install OpenCL package (sudo apt-get install ocl-icd-opencl-dev)
4) Install FAHClient
1) Yep, brand new installation, fully updated with dnf.
2) Yep, I installed directly from Nvidia using their .run package.
3) This one was apparently not done. Of course, on Fedora it is 'dnf install ocl-icd-devel.x86_64', not apt. I'm not sure how necessary this is, though, since the same template worked with all the K4200 GPUs. I went ahead and installed it and rebooted, but I still have the same error.
gunnarre
Posts: 567
Joined: Sun May 24, 2020 7:23 pm
Location: Norway

Re: Cuda and OpenCL not detected with GTX960

Post by gunnarre »

Try to run the command "clinfo" as root and see if you see the card listed. Then try to run the same command as the user fahclient, assuming that is the user that the FAHClient runs under.

If the outputs differ between root and the fahclient user, then it's a permission issue and you need to add the fahclient user to the group(s) "video" and/or "render" to give it access to the video card; then restart the client.

Edit: If it doesn't work, please post the output of the clinfo command.
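
The group check can also be sketched in Python using the standard `pwd` and `grp` modules. The helper names `groups_of` and `missing_groups` are hypothetical, and the user name `fahclient` is assumed from the `--run-as fahclient` argument in the log.

```python
import grp
import pwd


def groups_of(user):
    """Return the names of all groups the user belongs to
    (primary group plus supplementary groups)."""
    primary_gid = pwd.getpwnam(user).pw_gid
    names = {g.gr_name for g in grp.getgrall() if user in g.gr_mem}
    names.add(grp.getgrgid(primary_gid).gr_name)
    return names


def missing_groups(user_groups, wanted=("video", "render")):
    """Return the wanted groups that the user is not a member of."""
    return [g for g in wanted if g not in user_groups]


if __name__ == "__main__":
    try:
        print("missing:", missing_groups(groups_of("fahclient")))
    except KeyError:
        print("user or group entry not found on this system")
```

This only reports membership; whether those groups actually gate access to the GPU device nodes depends on the distribution and driver.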
Online: GTX 1660 Super, GTX 1080, GTX 1050 Ti 4G OC, RX580 + occasional CPU folding in the cold.
Offline: Radeon HD 7770, GTX 960, GTX 950
FlexibleToast
Posts: 8
Joined: Fri Sep 04, 2020 7:24 pm

Re: Cuda and OpenCL not detected with GTX960

Post by FlexibleToast »

This is really bizarre. Even though the card is visible in lspci and the drivers were installed straight from Nvidia, it is being reported with an unknown error. This happens not just in Fedora 32 but also in CentOS 8, and I couldn't get it to work in Windows 10 either.

Code:

[root@fah-client-3 ~]# clinfo
Number of platforms                               0
[root@fah-client-3 ~]# nvidia-smi
Unable to determine the device handle for GPU 0000:05:00.0: Unknown Error
Seems to me the problem is with oVirt or the hardware. The 960 was working when it came out of my brother's PC. Might try it in another one of my hosts, maybe the MB slot is just bad.
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am
Location: Land Of The Long White Cloud

Re: Cuda and OpenCL not detected with GTX960

Post by PantherX »

It's weird that it wasn't able to work on Windows 10. It should fold fine on Windows once you have installed the right version. Do you know what the error message was under Windows?
gunnarre
Posts: 567
Joined: Sun May 24, 2020 7:23 pm
Location: Norway

Re: Cuda and OpenCL not detected with GTX960

Post by gunnarre »

I'm not familiar with oVirt, but with libvirt you need to pass through all PCIe devices in a whole IOMMU group to the virtual instance before it will work. For example, a single graphics card connected directly to the CPU may show up as two PCI devices in lspci, one video device and one audio device, which together make up one IOMMU group. You need to pass both the audio and video logical PCI devices to the virtual instance. For a GPU which is connected to a chipset, you might need to pass through the whole chipset of the machine into the virtual instance, if all the devices connected to the chipset show up as a single IOMMU group.

The latter might be acceptable for a headless server, because you can replace any chipset devices that the host gives up to the guest with, e.g., a USB network card connected to the BIOS flash USB port (which doesn't go through the chipset). The block diagram in the motherboard manual is very useful here, because IOMMU groups typically correspond to I/O paths out of the CPU.
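
On a Linux host, the IOMMU grouping described above is exposed under `/sys/kernel/iommu_groups/<group>/devices/`. A minimal Python sketch for listing it follows; the helper names `iommu_groups` and `group_of` are hypothetical, and the PCI address in the usage example is a placeholder, since the host-side address of the GTX 960 is not given in this thread.

```python
from pathlib import Path


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group name to the sorted PCI addresses it contains.

    Returns an empty dict if the directory is missing (IOMMU disabled
    or a non-Linux system).
    """
    base = Path(root)
    groups = {}
    if not base.is_dir():
        return groups
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        if devices_dir.is_dir():
            groups[group_dir.name] = sorted(d.name for d in devices_dir.iterdir())
    return groups


def group_of(address, groups):
    """Return (group name, devices) for the group containing address, or None."""
    for name, devices in groups.items():
        if address in devices:
            return name, devices
    return None


if __name__ == "__main__":
    # Placeholder address; substitute the GPU's address as seen on the host.
    print(group_of("0000:05:00.0", iommu_groups()))
```

Every address returned alongside the GPU would need to be passed through together, which is what the IOMMU group column in the hypervisor's host-device dialog reflects.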
FlexibleToast
Posts: 8
Joined: Fri Sep 04, 2020 7:24 pm

Re: Cuda and OpenCL not detected with GTX960

Post by FlexibleToast »

PantherX wrote:It's weird that it wasn't able to work on Windows 10. It should fold fine on Windows once you have installed the right version. Do you know what the error message was under Windows?
It's a driver error, despite installing the driver directly from Nvidia.
gunnarre wrote:I'm not familiar with oVirt, but with libvirt you need to pass through all PCIe devices in a whole IOMMU group to the virtual instance before it will work. For example, a single graphics card connected directly to the CPU may show up as two PCI devices in lspci, one video device and one audio device, which together make up one IOMMU group. You need to pass both the audio and video logical PCI devices to the virtual instance. For a GPU which is connected to a chipset, you might need to pass through the whole chipset of the machine into the virtual instance, if all the devices connected to the chipset show up as a single IOMMU group.
Yes, I pass both the GPU and audio through to the VM. It seems the oVirt team has thought about this: when you go to add a host device to a VM, one of the columns is the IOMMU group. So I know I've passed through the entire group, which is just the GPU and audio.

More and more, this is looking like a hardware issue. I'm going to try the card in a different host and see if that helps. It would be very coincidental timing if the card died between taking it out of my brother's machine and putting it in my oVirt host. It did sit around for a while before it went in, though, so it's not entirely impossible...
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am
Location: Land Of The Long White Cloud

Re: Cuda and OpenCL not detected with GTX960

Post by PantherX »

FlexibleToast wrote:...It's a driver error, despite installing the driver directly from Nvidia...
That's usually an indication that something isn't right. Do you know what the error was? Did a search on the internet for that error message lead anywhere?

There's a small chance that when you removed the GPU from the system, if you weren't grounded, a build-up of static electricity could have damaged the GPU when you touched it.