Radeon RX Vega M GH

Post by **Tap** » Sun Mar 15, 2020 7:13 am

I have the Radeon RX Vega M GH, the GPU that comes on board the Hades Canyon NUC8i7HVK.
The full line for it from lspci -nn is:

Code:

01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Polaris 22 XT [Radeon RX Vega M GH] [1002:694c] (rev c0)
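For matching a card against a GPU list, only the bracketed vendor:device pair at the end of the lspci -nn line matters; the earlier "[0300]" is the PCI class code. A minimal Python sketch of pulling that pair out; the helper name pci_ids is hypothetical and not part of any F@h tool.

```python
import re

# A "[xxxx:xxxx]" pair of 4-digit hex numbers is the vendor:device ID.
# The class code "[0300]" has no colon, so it never matches.
_ID_RE = re.compile(r"\[([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\]")


def pci_ids(lspci_line: str) -> tuple[int, int]:
    """Return (vendor_id, device_id) parsed from one lspci -nn line."""
    matches = _ID_RE.findall(lspci_line)
    if not matches:
        raise ValueError("no [vendor:device] pair found")
    vendor, device = matches[-1]  # the ID pair is the last bracketed pair on the line
    return int(vendor, 16), int(device, 16)


line = ("01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. "
        "[AMD/ATI] Polaris 22 XT [Radeon RX Vega M GH] [1002:694c] (rev c0)")
vendor, device = pci_ids(line)
print(f"0x{vendor:04x}:0x{device:04x}")  # 0x1002:0x694c
```

The same helper also works on the "Subsystem:" line of lspci -vvv -nn, which carries the board maker's pair ([8086:2073] here) rather than the chip's.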

Interestingly, your GPUs.txt already has an entry with this same vendor/device ID, under a different name:

Code:

0x1002:0x694c:1:4:[Radeon RX Vega M XT]
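The GPUs.txt line is colon-separated, so it can be compared directly with the lspci IDs. In this sketch the field layout is an assumption: vendor ID, device ID, two small integer codes (assumed here to be a GPU type and a "species"/generation class), then the bracketed description. GpuEntry and parse_gpus_txt_line are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class GpuEntry:
    vendor: int
    device: int
    gpu_type: int  # assumed meaning of the third field
    species: int   # assumed meaning of the fourth field
    name: str


def parse_gpus_txt_line(line: str) -> GpuEntry:
    """Parse one GPUs.txt entry of the (assumed) form vendor:device:type:species:[name]."""
    # Split on the first four colons only, so a description containing ':' survives.
    vendor, device, gpu_type, species, rest = line.strip().split(":", 4)
    # int(..., 16) accepts the "0x" prefix used in the file.
    return GpuEntry(int(vendor, 16), int(device, 16), int(gpu_type), int(species), rest.strip("[]"))


entry = parse_gpus_txt_line("0x1002:0x694c:1:4:[Radeon RX Vega M XT]")
lspci_ids = (0x1002, 0x694c)  # from the lspci -nn line in this post
if (entry.vendor, entry.device) == lspci_ids:
    print(f"ID matches GPUs.txt entry named {entry.name!r}")
```

A match on the ID with a different name is exactly the "clashing vendor/product ID" situation: lookup is by ID, so the GH and XT variants would share one entry.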

However, when I try to force-add it via the advanced client, the assignment server gives me the work server IP 192.0.2.1, even though my CPU is getting jobs fine.

I would love to be able to fold with both my CPU and GPU. This GPU is hard to get details on, and the clashing vendor/product ID doesn't help, so I'd be more than happy to give more info if needed, run tests, etc.

Output from lspci -vvv -nn:

Code:

01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Polaris 22 XT [Radeon RX Vega M GH] [1002:694c] (rev c0) (prog-if 00 [VGA controller])
        Subsystem: Intel Corporation Polaris 22 XT [Radeon RX Vega M GH] [8086:2073]
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 164
        Region 0: Memory at 2000000000 (64-bit, prefetchable) [size=4G]
        Region 2: Memory at 2100000000 (64-bit, prefetchable) [size=2M]
        Region 4: I/O ports at e000 [size=256]
        Region 5: Memory at db500000 (32-bit, non-prefetchable) [size=256K]
        Expansion ROM at 000c0000 [disabled] [size=128K]
        Capabilities: [48] Vendor Specific Information: Len=08 <?>
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM L1, Exit Latency L1 <1us
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s (downgraded), Width x8 (ok)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis-, NROPrPrP-, LTR+
                         10BitTagComp-, 10BitTagReq-, OBFF Not Supported, ExtFmt+, EETLPPrefix+, MaxEETLPPrefixes 1
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS-
                         AtomicOpsCap: 32bit+ 64bit+ 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
                         AtomicOpsCtl: ReqEn+
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
                         EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
        Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee007f8  Data: 0000
        Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [150 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP+ BadDLLP+ Rollover- Timeout- AdvNonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [200 v1] Resizable BAR <?>
        Capabilities: [270 v1] Secondary PCI Express
                LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
                LaneErrStat: 0
        Capabilities: [2b0 v1] Address Translation Service (ATS)
                ATSCap: Invalidate Queue Depth: 00
                ATSCtl: Enable-, Smallest Translation Unit: 00
        Capabilities: [2c0 v1] Page Request Interface (PRI)
                PRICtl: Enable- Reset-
                PRISta: RF- UPRGI- Stopped+
                Page Request Capacity: 00000020, Page Request Allocation: 00000000
        Capabilities: [2d0 v1] Process Address Space ID (PASID)
                PASIDCap: Exec+ Priv+, Max PASID Width: 10
                PASIDCtl: Enable- Exec- Priv-
        Capabilities: [320 v1] Latency Tolerance Reporting
                Max snoop latency: 71680ns
                Max no snoop latency: 71680ns
        Capabilities: [328 v1] Alternative Routing-ID Interpretation (ARI)
                ARICap: MFVC- ACS-, Next Function: 1
                ARICtl: MFVC- ACS-, Function Group: 0
        Capabilities: [370 v1] L1 PM Substates
                L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
                          PortCommonModeRestoreTime=0us PortTPowerOnTime=170us
                L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
                           T_CommonMode=0us LTR1.2_Threshold=0ns
                L1SubCtl2: T_PwrOn=10us
        Kernel driver in use: amdgpu
        Kernel modules: amdgpu

Output from clinfo -a:

Code:

Number of platforms                               1
  Platform Name                                   Clover
  Platform Vendor                                 Mesa
  Platform Version                                OpenCL 1.1 Mesa 19.3.4
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             MESA

  Platform Name                                   Clover
Number of devices                                 1
  Device Name                                     AMD VEGAM (DRM 3.36.0, 5.5.9-arch1-2, LLVM 9.0.1)
  Device Vendor                                   AMD
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.1 Mesa 19.3.4
  Driver Version                                  19.3.4
  Device OpenCL C Version                         OpenCL C 1.1 
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               24
  Max clock frequency                             1190MHz
  Device Partition                                (n/a)
    Max number of sub-devices                     0
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
  Preferred work group size multiple              64
  Preferred / native vector sizes                 
    char                                                16 / 16      
    short                                                8 / 8       
    int                                                  4 / 4       
    long                                                 2 / 2       
    half                                                 8 / 8        (cl_khr_fp16)
    float                                                4 / 4       
    double                                               2 / 2        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              4294967296 (4GiB)
  Error Correction support                        No
  Max memory allocation                           3435973836 (3.2GiB)
  Unified memory for Host and Device              No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       32768 bits (4096 bytes)
  Preferred alignment for atomics                 
  Global Memory cache type                        None
  Global Memory cache size                        0
  Global Memory cache line size                   0 bytes
  Image support                                   No
    Max number of samplers per kernel             32
    Max size for 1D images from buffer            2147483647 pixels
    Max 1D or 2D image array size                 2048 images
    Max 2D image size                             32768x32768 pixels
    Max 3D image size                             4096x4096x4096 pixels
    Max number of read image args                 32
    Max number of write image args                32
  Local memory type                               Local
  Local memory size                               32768 (32KiB)
  Max number of constant args                     16
  Max constant buffer size                        2147483647 (2GiB)
  Max size of kernel argument                     1024
  Queue properties                                
    Out-of-order execution                        No
    Profiling                                     Yes
  Queue properties (on host)                      
    Out-of-order execution                        No
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      0ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
  printf() buffer size                            0
  Built-in kernels                                (n/a)
  Device Extensions                               cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp64 cl_khr_fp16

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  Clover
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [MESA]
  clCreateContext(NULL, ...) [default]            Success [MESA]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 Clover
    Device Name                                   AMD VEGAM (DRM 3.36.0, 5.5.9-arch1-2, LLVM 9.0.1)
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 Clover
    Device Name                                   AMD VEGAM (DRM 3.36.0, 5.5.9-arch1-2, LLVM 9.0.1)
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 Clover
    Device Name                                   AMD VEGAM (DRM 3.36.0, 5.5.9-arch1-2, LLVM 9.0.1)

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.12
  ICD loader Profile                              OpenCL 2.2

Post by **bruce** » Sun Mar 15, 2020 7:30 am

Unfortunately your device (Polaris) does not support double-precision floating point. All GPU projects now require the ability to process such instructions (at least very slowly).

Sorry, but it's simply too limited/old.

Post by **Tap** » Sun Mar 15, 2020 7:32 am

bruce wrote:Unfortunately your device (Polaris) does not support double-precision floating point. All GPU projects now require the ability to process such instructions (at least very slowly).

clinfo reports otherwise:

Code:

  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No

So that makes no sense to me.
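Whether the driver exposes double precision can be checked mechanically: an OpenCL device advertises it by listing cl_khr_fp64 in its Device Extensions string. A minimal Python sketch using the extension string from the clinfo -a output in the first post; supports_fp64 is a hypothetical helper. This only shows that FP64 is exposed, not how fast the hardware executes it, which is the distinction bruce draws in the next reply.

```python
# Device Extensions string copied from the clinfo -a output in the first post.
EXTENSIONS = (
    "cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics "
    "cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics "
    "cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics "
    "cl_khr_int64_extended_atomics cl_khr_fp64 cl_khr_fp16"
)


def supports_fp64(extensions: str) -> bool:
    """True if an OpenCL device extension string advertises cl_khr_fp64."""
    # Compare whole space-separated tokens so a longer extension name cannot match by accident.
    return "cl_khr_fp64" in extensions.split()


print(supports_fp64(EXTENSIONS))  # True
```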

Post by **bruce** » Sun Mar 15, 2020 7:43 am

Well, maybe my source of information is incorrect. At what GFLOPS rate does the hardware process double-precision instructions? (It's not a question of whether OpenCL supports it.) I don't believe there is more than one device identified with the codes [1002:694c].

Post by **Tap** » Sun Mar 15, 2020 7:54 am

According to clpeak, double-precision compute is between 223.42 and 225.22 GFLOPS.
Keep in mind that other applications on my PC (parts of Firefox, Discord, etc.) may be using some compute power, but this is the result of my testing.
Full output:

Code:

Platform: Clover
  Device: AMD VEGAM (DRM 3.36.0, 5.5.9-arch1-2, LLVM 9.0.1)
    Driver version  : 19.3.4 (Linux x64)
    Compute units   : 24
    Clock frequency : 1190 MHz

    Global memory bandwidth (GBPS)
      float   : 149.87
      float2  : 154.52
      float4  : 161.63
      float8  : 162.02
      float16 : 71.93

    Single-precision compute (GFLOPS)
      float   : 3510.37
      float2  : 3402.92
      float4  : 3391.99
      float8  : 3367.31
      float16 : 3320.23

    Half-precision compute (GFLOPS)
      half   : 3506.79
      half2  : 3157.70
      half4  : 3176.88
      half8  : 3177.25
      half16 : 3140.93

    Double-precision compute (GFLOPS)
      double   : 225.22
      double2  : 225.13
      double4  : 224.82
      double8  : 224.61
      double16 : 223.42

    Integer compute (GIOPS)
      int   : 716.67
      int2  : 716.73
      int4  : 718.68
      int8  : 719.60
      int16 : 715.49

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 7.87
      enqueueReadBuffer          : 7.97
      enqueueMapBuffer(for read) : 11930.46
        memcpy from mapped ptr   : 7.99
      enqueueUnmap(after write)  : 11955.37
        memcpy to mapped ptr     : 7.98

    Kernel launch latency : 38.13 us
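The clpeak numbers can be sanity-checked against a back-of-the-envelope theoretical peak. The architectural constants below are assumptions from general knowledge of AMD's GCN/Polaris design, not from this thread: 64 stream processors per compute unit, one fused multiply-add (counted as 2 FLOPs) per lane per clock, and an FP64 rate of 1/16 of FP32. The compute-unit count and clock come from the clinfo/clpeak output.

```python
COMPUTE_UNITS = 24   # clinfo "Max compute units"
CLOCK_GHZ = 1.190    # clinfo "Max clock frequency 1190MHz"
LANES_PER_CU = 64    # assumption: GCN has 64 stream processors per CU
FLOPS_PER_LANE = 2   # one fused multiply-add per clock counts as 2 FLOPs
FP64_RATIO = 1 / 16  # assumption: Polaris FP64 runs at 1/16 the FP32 rate

# Theoretical peaks in GFLOPS.
peak_fp32 = COMPUTE_UNITS * LANES_PER_CU * FLOPS_PER_LANE * CLOCK_GHZ
peak_fp64 = peak_fp32 * FP64_RATIO

# Best clpeak results from the run above.
measured_fp32 = 3510.37
measured_fp64 = 225.22

print(f"theoretical FP32 {peak_fp32:.1f} GFLOPS, FP64 {peak_fp64:.1f} GFLOPS")
print(f"measured FP64:FP32 ratio is 1/{measured_fp32 / measured_fp64:.1f}")
```

Under these assumptions the theoretical FP64 peak is about 228 GFLOPS, close to the roughly 225 GFLOPS clpeak measured, and the measured FP64:FP32 ratio is about 1/15.6. That is consistent with the hardware running double precision at a low native rate rather than with a driver failing to expose it.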

Post by **Joe_H** » Sun Mar 15, 2020 7:56 am

This Radeon RX Vega M GH is one of the Polaris-based GPU chips that AMD is selling to Intel to be incorporated with a Kaby Lake CPU on a single chip carrier. We ran into this a few months ago, the problem is not whether the GPU supports DP, but that the drivers from Intel have lacked OpenCL support good enough to use with the F@h software.

Bruce may not recall that specific topic, but mention of the chip reminded me of that.

Post by **bruce** » Sun Mar 15, 2020 8:02 am

Yes, I remember that. Remove Intel's OpenCL. We can support only OpenCL that talks directly to the Polaris, not the OpenCL that talks to the Intel iGPU. I don't know how that's managed on the high/low-speed switchable GPUs.
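One way to see which OpenCL devices are actually the Polaris is to filter clinfo's device list on the OpenCL vendor ID: 0x1002 is AMD, while an Intel iGPU exposed through Intel's OpenCL stack would report Intel's 0x8086. A minimal Python sketch, assuming clinfo -a text as input; devices_by_vendor and amd_devices are hypothetical helpers, not part of F@h or clinfo, and the Intel device name in the test is made up for illustration.

```python
# Vendor IDs as reported in clinfo's "Device Vendor ID" field.
AMD_VENDOR_ID = 0x1002
INTEL_VENDOR_ID = 0x8086


def devices_by_vendor(clinfo_text: str) -> list[tuple[str, int]]:
    """Pair each 'Device Name' with the 'Device Vendor ID' that follows it in clinfo -a output."""
    devices: list[tuple[str, int]] = []
    name = None
    for line in clinfo_text.splitlines():
        # clinfo separates a property name from its value with a run of spaces;
        # the property names themselves contain only single spaces.
        key, _, value = line.strip().partition("  ")
        value = value.strip()
        if key == "Device Name":
            name = value
        elif key == "Device Vendor ID" and name is not None:
            devices.append((name, int(value, 16)))
            name = None
    return devices


def amd_devices(clinfo_text: str) -> list[str]:
    """Names of OpenCL devices whose vendor ID is AMD's."""
    return [n for n, vid in devices_by_vendor(clinfo_text) if vid == AMD_VENDOR_ID]


sample = """\
  Device Name                                     AMD VEGAM (DRM 3.36.0, 5.5.9-arch1-2, LLVM 9.0.1)
  Device Vendor                                   AMD
  Device Vendor ID                                0x1002
"""
print(amd_devices(sample))
```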

Post by **Tap** » Sun Mar 15, 2020 8:03 am

Joe_H wrote:We ran into this a few months ago, the problem is...
the drivers from Intel have lacked OpenCL support good enough to use with the F@h software.

In my case I'm running the open source AMDGPU drivers from the Arch Linux repos, so this may be a different situation.
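Which kernel driver owns the card can be confirmed without lspci by reading sysfs, where each PCI function (with its domain prefix, so 01:00.0 becomes 0000:01:00.0) has vendor and device files and a driver symlink. This only shows the kernel side (amdgpu here, matching lspci's "Kernel driver in use"); the OpenCL userspace layer (Mesa's Clover or AMDGPU-Pro) is chosen separately. A minimal sketch; pci_info is a hypothetical helper, and sysfs_root is a parameter only so it can be pointed at a test directory.

```python
import os
from pathlib import Path


def pci_info(bdf: str, sysfs_root: str = "/sys/bus/pci/devices") -> dict:
    """Read vendor ID, device ID and bound kernel driver for one PCI function from sysfs."""
    dev = Path(sysfs_root) / bdf
    info = {
        # sysfs stores these as hex strings such as "0x1002\n".
        "vendor": int((dev / "vendor").read_text().strip(), 16),
        "device": int((dev / "device").read_text().strip(), 16),
        "driver": None,
    }
    # "driver" is a symlink to .../drivers/<name>; it is absent if no driver is bound.
    link = dev / "driver"
    if link.is_symlink():
        info["driver"] = Path(os.readlink(link)).name
    return info
```

On the NUC, pci_info("0000:01:00.0") would be expected to report vendor 0x1002, device 0x694c and driver "amdgpu", the same facts the lspci -vvv -nn output shows.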

Post by **Tap** » Sun Mar 15, 2020 8:05 am

Here's what I have installed when it comes to OpenCL:

Code:

[silvea@pikanuc ~]$ pacman -Qs opencl 
local/clinfo 2.2.18.04.06-1
    A simple OpenCL application that enumerates all available platform and device properties
local/libclc 0.2.0+589+9aa6f35-1
    Library requirements of the OpenCL C programming language
local/ocl-icd 2.2.12-3
    OpenCL ICD Bindings
local/opencl-headers 2:2.2.20170516-2
    OpenCL (Open Computing Language) header files
local/opencl-mesa 19.3.4-2
    OpenCL support for AMD/ATI Radeon mesa drivers

I can disable the iGPU in the BIOS if you'd like me to re-run the tests.
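The ocl-icd package in that list is the OpenCL ICD loader: clinfo and clpeak talk to it, and it finds the actual implementations (Mesa's Clover here, or AMDGPU-Pro's) through small .icd text files, on Linux usually in /etc/OpenCL/vendors. Listing those files shows which OpenCL platforms can appear at all, which matters for the AMDGPU-Pro attempt later in the thread. A minimal sketch assuming that default directory; registered_icds is a hypothetical helper, and the mesa.icd / libMesaOpenCL.so.1 pair in the test is an illustrative assumption.

```python
from pathlib import Path


def registered_icds(vendors_dir: str = "/etc/OpenCL/vendors") -> dict[str, str]:
    """Map each *.icd file to the OpenCL library it tells the ICD loader to open."""
    root = Path(vendors_dir)
    if not root.is_dir():
        return {}
    # Each .icd file holds one line: the library name or path for the loader to dlopen().
    return {icd.name: icd.read_text().strip() for icd in sorted(root.glob("*.icd"))}


for name, library in registered_icds().items():
    print(f"{name} -> {library}")
```

If removing the Mesa package also removes the only .icd file that points at a working library, the loader has nothing to offer, which would match clinfo then showing no OpenCL devices.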

Post by **JimboPalmer** » Sun Mar 15, 2020 8:07 am

Yes, one of the constant facts is that users need drivers direct from AMD or Nvidia to fold with a GPU. We usually run into this on Linux, where folks hope generic drivers might work.
This is an AMD GPU, but the only driver support comes from Intel, and it just does not provide the needed capabilities.

Similarly, if you get a GPU driver from Microsoft through Windows Update, you can no longer fold until you reinstall a driver from AMD or Nvidia.

Those other companies' drivers just don't do what F@H needs.

I suspect the GPU has the hardware to fold, just not the driver software.

Post by **Tap** » Sun Mar 15, 2020 8:27 am

JimboPalmer wrote:Yes, one of the constant facts is that users need drivers direct from AMD or Nvidia to fold with a GPU.

I am going to give the amdgpu-pro OpenCL driver/layer a shot: it's a closed-source, proprietary implementation that runs on top of the open-source one. I'll update here if the CL benchmark reports anything different.

Post by **bruce** » Sun Mar 15, 2020 8:33 am

Contact me by PM if you are successful (for a test).

Post by **Tap** » Sun Mar 15, 2020 9:00 am

It seems that I cannot get it going with the AMDGPU-Pro drivers. I can install them, but clinfo shows them trying to pass through (only somewhat successfully) to the open-source Mesa drivers, and removing the Mesa implementation leaves no OpenCL devices at all. When I force clpeak to run with the AMDGPU-Pro platform, it outright fails to initialize a device.
As for the BIOS change, disabling the iGPU did not seem to affect its performance. I'm not entirely sure what's going on there, but double-precision throughput is just as low as the previous tests showed.

This seems like a dead end, sadly. I'd be happy to test other ideas if anyone has any, but I've hit a brick wall here. I guess I'll be doing CPU-only folding for the time being.

Folding Forum