No OpenCL 1.2+ support detected for Vega 64 [RESOLVED]

It seems that a lot of GPU problems revolve around specific versions of drivers. Though AMD has their own support structure, you can often learn from information reported by others who fold.

Postby hdastwb » Thu Nov 19, 2020 5:18 am

I've been folding for a while now, but a few months ago my GPU mysteriously stopped taking WUs and I haven't been able to get it folding again since.

My GPU is a Radeon RX Vega 64; I'm running Gentoo with the latest 5.9.8 kernel, version 3.9 of the ROCm packages, and FAHClient version 7.6.21:
03:20:51:******************************* System ********************************
03:20:51:        CPU: Intel(R) Core(TM) i9-7940X CPU @ 3.10GHz
03:20:51:     CPU ID: GenuineIntel Family 6 Model 85 Stepping 4
03:20:51:       CPUs: 28
03:20:51:     Memory: 23.19GiB
03:20:51:Free Memory: 18.31GiB
03:20:51:    Threads: POSIX_THREADS
03:20:51: OS Version: 5.9
03:20:51:Has Battery: false
03:20:51: On Battery: false
03:20:51: UTC Offset: -5
03:20:51:        PID: 67530
03:20:51:        CWD: /opt/foldingathome
03:20:51:         OS: Linux 5.9.8-gentoo x86_64
03:20:51:    OS Arch: AMD64
03:20:51:       GPUs: 1
03:20:51:      GPU 0: Bus:103 Slot:0 Func:0 AMD:5 Vega 10 XL/XT [Radeon RX Vega 56/64]
03:20:51:       CUDA: Not detected: Failed to open dynamic library 'libcuda.so':
03:20:51:             libcuda.so: cannot open shared object file: No such file or
03:20:51:             directory
03:20:51:***********************************************************************


clinfo seems to indicate support for OpenCL 2.0 and FP64:
Number of platforms                               1
  Platform Name                                   AMD Accelerated Parallel Processing
  Platform Vendor                                 Advanced Micro Devices, Inc.
  Platform Version                                OpenCL 2.0 AMD-APP.dbg (3204.0)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_amd_event_callback
  Platform Extensions function suffix             AMD

  Platform Name                                   AMD Accelerated Parallel Processing
Number of devices                                 1
  Device Name                                     gfx900
  Device Vendor                                   Advanced Micro Devices, Inc.
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 2.0
  Driver Version                                  3204.0 (HSA1.1,LC)
  Device OpenCL C Version                         OpenCL C 2.0
  Device Type                                     GPU
  Device Board Name (AMD)                         Vega 10 XL/XT [Radeon RX Vega 56/64]
  Device Topology (AMD)                           PCI-E, 67:00.0
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               64
  SIMD per compute unit (AMD)                     4
  SIMD width (AMD)                                16
  SIMD instruction width (AMD)                    1
  Max clock frequency                             1630MHz
  Graphics IP (AMD)                               9.0
  Device Partition                                (core)
    Max number of sub-devices                     64
    Supported partition types                     None
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x1024
  Max work group size                             256
  Preferred work group size (AMD)                 256
  Max work group size (AMD)                       1024
  Preferred work group size multiple              64
  Wavefront width (AMD)                           64
  Preferred / native vector sizes                 
    char                                                 4 / 4       
    short                                                2 / 2       
    int                                                  1 / 1       
    long                                                 1 / 1       
    half                                                 1 / 1        (cl_khr_fp16)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             No
    Round to nearest                              No
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
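
As a quick sanity check on the listing above, here is a minimal Python sketch that pulls the `Device Version` line out of clinfo text and compares it against the 1.2 minimum named in the FAHClient warning. The helper names (`opencl_version`, `device_versions`, `meets_minimum`) are made up for illustration, and this only inspects what clinfo prints; it is not how FAHClient performs its own detection, which this thread does not describe.

```python
import re

def opencl_version(value):
    """Parse 'OpenCL 2.0 ...' into (2, 0); return None if it does not match."""
    match = re.match(r"OpenCL (\d+)\.(\d+)", value.strip())
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

def device_versions(clinfo_text):
    """Collect the parsed version from every 'Device Version' line."""
    versions = []
    for line in clinfo_text.splitlines():
        stripped = line.strip()
        # 'Device OpenCL C Version' does not start with this prefix, so it is skipped.
        if stripped.startswith("Device Version"):
            versions.append(opencl_version(stripped[len("Device Version"):]))
    return versions

def meets_minimum(version, minimum=(1, 2)):
    """True when a parsed version is at least the required (major, minor)."""
    return version is not None and version >= minimum

# Sample lines copied from the clinfo output in this post.
sample = """\
  Device Name                                     gfx900
  Device Version                                  OpenCL 2.0
  Device OpenCL C Version                         OpenCL C 2.0
"""

for version in device_versions(sample):
    print(version, meets_minimum(version))
```

If this reports that the minimum is met while FAHClient still disables the slot, the mismatch is presumably in how the client finds or loads the OpenCL library rather than in the version the driver advertises.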


FAHClient --lspci gives me this entry for the card:
0x1002:0x687f:103:0:0:Advanced Micro Devices, Inc. [AMD/ATI]:


And the corresponding entry in GPUs.txt seems to indicate that it is considered species 5:
0x1002:0x687f:1:5:Vega 10 XL/XT [Radeon RX Vega 56/64]
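
The lookup described above can be sketched in Python as follows. The field layouts are assumptions inferred from the two lines shown in this post and from the `Bus:103 Slot:0 Func:0 AMD:5` entry in the log: the `--lspci` line is read as `vendor:device:bus:slot:function:vendor name`, and the GPUs.txt line as `vendor:device:type:species:description`. The names `GpuEntry`, `parse_gpus_txt_line`, `parse_lspci_line` and `find_entry` are hypothetical, not part of FAHClient.

```python
from typing import NamedTuple, Optional

class GpuEntry(NamedTuple):
    vendor_id: int
    device_id: int
    gpu_type: int   # assumed: vendor family code (1 appears to correspond to AMD here)
    species: int
    description: str

def parse_gpus_txt_line(line: str) -> GpuEntry:
    # Split on the first four colons only, so the description is kept whole.
    vendor, device, gpu_type, species, description = line.strip().split(":", 4)
    return GpuEntry(int(vendor, 16), int(device, 16),
                    int(gpu_type), int(species), description)

def parse_lspci_line(line: str):
    # Only the first two fields (PCI vendor and device IDs) are needed for matching.
    fields = line.strip().split(":")
    return int(fields[0], 16), int(fields[1], 16)

def find_entry(lspci_line, gpus_txt_lines) -> Optional[GpuEntry]:
    """Return the first GPUs.txt entry whose IDs match the lspci line, if any."""
    ids = parse_lspci_line(lspci_line)
    for raw in gpus_txt_lines:
        entry = parse_gpus_txt_line(raw)
        if (entry.vendor_id, entry.device_id) == ids:
            return entry
    return None

# Both lines are copied from this post.
lspci = "0x1002:0x687f:103:0:0:Advanced Micro Devices, Inc. [AMD/ATI]:"
gpus = ["0x1002:0x687f:1:5:Vega 10 XL/XT [Radeon RX Vega 56/64]"]
print(find_entry(lspci, gpus))
```

A match here only confirms that the card is listed in GPUs.txt; it says nothing about whether the client can actually load an OpenCL runtime for it.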


However, FAHClient is giving me this warning and disabling the slot:
03:20:51:WARNING:FS01:No CUDA or OpenCL 1.2+ support detected for GPU slot 01: gpu:103:0 Vega 10 XL/XT [Radeon RX Vega 56/64].  Disabling.
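
For anyone checking a longer log for the same symptom, a small Python sketch like the one below can list every slot that was disabled. It assumes the `HH:MM:SS:WARNING:FSnn:message` layout seen in the line above; the function name `disabled_slots` is made up for this example.

```python
import re

# Assumed layout, based on the warning above: time, level, slot tag, message.
WARNING_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}):WARNING:(FS\d+):(.*)$")

def disabled_slots(log_text):
    """Return (time, slot, message) for each warning that ends in 'Disabling.'."""
    results = []
    for line in log_text.splitlines():
        match = WARNING_RE.match(line)
        if match and match.group(3).rstrip().endswith("Disabling."):
            results.append((match.group(1), match.group(2), match.group(3)))
    return results

# Two lines copied from the log excerpts in this post.
log = (
    "03:20:51:       GPUs: 1\n"
    "03:20:51:WARNING:FS01:No CUDA or OpenCL 1.2+ support detected for GPU slot 01: "
    "gpu:103:0 Vega 10 XL/XT [Radeon RX Vega 56/64].  Disabling.\n"
)

for time, slot, message in disabled_slots(log):
    print(time, slot, message)
```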


Is there something that I'm missing here that explains why FAHClient does not think my GPU has adequate OpenCL support? What else can I try to get to the bottom of this?
Last edited by hdastwb on Sat Apr 03, 2021 7:58 pm, edited 1 time in total.
hdastwb
 
Posts: 4
Joined: Tue Oct 20, 2020 4:04 am

Re: No OpenCL 1.2+ support detected for Vega 64 on Gentoo

Postby Joe_H » Thu Nov 19, 2020 7:19 am

Problems have been reported with the latest Linux kernel and using OpenCL, etc. One such report is here - viewtopic.php?f=108&t=36443 - in connection with Fedora.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Joe_H
Site Admin
 
Posts: 6974
Joined: Tue Apr 21, 2009 5:41 pm
Location: W. MA

Re: No OpenCL 1.2+ support detected for Vega 64 on Gentoo

Postby Whompithian » Thu Nov 19, 2020 7:48 am

FAHClient refuses to acknowledge the OpenCL provided by ROCm. Ever since the change that prevents GPU work from being assigned when the client fails to detect OpenCL, it has been necessary to run the AMDGPU-Pro OpenCL libraries for AMD hardware.
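
To see which OpenCL implementation is actually registered on a system, one option is to look at the ICD files used by the Khronos ICD loader. The Python sketch below assumes the usual Linux location `/etc/OpenCL/vendors`, where each `.icd` file names the library to load; the path may differ on a given distribution, and `list_icds` is a made-up helper name.

```python
from pathlib import Path

def list_icds(vendors_dir="/etc/OpenCL/vendors"):
    """Map each .icd file name to the library named on its first non-empty line."""
    directory = Path(vendors_dir)
    if not directory.is_dir():
        return {}
    found = {}
    for icd in sorted(directory.glob("*.icd")):
        lines = [l.strip() for l in icd.read_text().splitlines() if l.strip()]
        found[icd.name] = lines[0] if lines else ""
    return found

# Prints nothing if the directory is missing or holds no .icd files.
for name, library in list_icds().items():
    print(f"{name}: {library}")
```

The output shows whether the ROCm or the AMDGPU-Pro library (or both) is registered, which helps when switching between the two stacks.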
Whompithian
 
Posts: 26
Joined: Thu Jun 25, 2020 1:40 am

Re: No OpenCL 1.2+ support detected for Vega 64 on Gentoo

Postby hdastwb » Fri Nov 20, 2020 8:53 am

Joe_H wrote:Problems have been reported with the latest Linux kernel and using OpenCL, etc. One such report is here - viewtopic.php?f=108&t=36443 - in connection with Fedora.

Thanks for the pointer; I tried falling back to 5.8.18 as suggested, but had no luck. Based on some of the other threads, it looks like the issue with the 5.9 kernels might just be an NVIDIA problem: https://www.phoronix.com/scan.php?page=news_item&px=NVIDIA-Linux-5.9-Delayed

Whompithian wrote:FAHClient refuses to acknowledge the OpenCL provided by ROCm. Ever since the change that prevents GPU work from being assigned when the client fails to detect OpenCL, it has been necessary to run the AMDGPU-Pro OpenCL libraries for AMD hardware.

This is very disappointing, since not having to mess with closed-source graphics drivers is one of the main reasons I went with AMD, but these things do happen. Installing AMDGPU-Pro on Gentoo doesn't seem to be simple, though. The packages that I've found so far only install the legacy version, which doesn't support RX Vega cards, and the documentation that I've found mostly just suggests installing ROCm instead, since it's considered to be higher quality and better supported. I'll have to see what sort of installation I can cobble together when I get more time to look into this.

I've also stumbled across this GitHub issue, which looks superficially similar: https://github.com/FoldingAtHome/fah-issues/issues/1589
I also tried installing the CUDA libraries, as suggested in that issue, just in case, but to no avail.
hdastwb
 
Posts: 4
Joined: Tue Oct 20, 2020 4:04 am

Re: No OpenCL 1.2+ support detected for Vega 64 on Gentoo

Postby hdastwb » Sat Nov 21, 2020 6:47 am

I managed to set up an AMDGPU-Pro installation using a modified version of this package based on the instructions in viewtopic.php?f=81&t=33353, and now my clinfo output looks like this:
Number of platforms                               1
  Platform Name                                   AMD Accelerated Parallel Processing
  Platform Vendor                                 Advanced Micro Devices, Inc.
  Platform Version                                OpenCL 2.1 AMD-APP (3180.7)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
  Platform Host timer resolution                  1ns
  Platform Extensions function suffix             AMD

  Platform Name                                   AMD Accelerated Parallel Processing
Number of devices                                 1
  Device Name                                     gfx900
  Device Vendor                                   Advanced Micro Devices, Inc.
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 2.0 AMD-APP (3180.7)
  Driver Version                                  3180.7 (PAL,HSAIL)
  Device OpenCL C Version                         OpenCL C 2.0
  Device Type                                     GPU
  Device Board Name (AMD)                         Radeon RX Vega
  Device Topology (AMD)                           PCI-E, 67:00.0
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               64
  SIMD per compute unit (AMD)                     4
  SIMD width (AMD)                                16
  SIMD instruction width (AMD)                    1
  Max clock frequency                             1630MHz
  Graphics IP (AMD)                               9.0
  Device Partition                                (core)
    Max number of sub-devices                     64
    Supported partition types                     None
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x1024
  Max work group size                             256
  Preferred work group size (AMD)                 256
  Max work group size (AMD)                       1024
  Preferred work group size multiple              64
  Wavefront width (AMD)                           64
  Preferred / native vector sizes                 
    char                                                 4 / 4       
    short                                                2 / 2       
    int                                                  1 / 1       
    long                                                 1 / 1       
    half                                                 1 / 1        (cl_khr_fp16)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             No
    Round to nearest                              No
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No


I'm still seeing the same error message, though, so I'll have to keep looking.
hdastwb
 
Posts: 4
Joined: Tue Oct 20, 2020 4:04 am

Re: No OpenCL 1.2+ support detected for Vega 64 on Gentoo

Postby hdastwb » Sat Apr 03, 2021 7:57 pm

A follow-up: I tried re-enabling the GPU after updating to the 5.10.27 kernel and version 4.1.0 of the ROCm packages, and now it's folding again!
hdastwb
 
Posts: 4
Joined: Tue Oct 20, 2020 4:04 am

