GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_UNIT

It seems that a lot of GPU problems revolve around specific versions of drivers. Though NVidia has their own support structure, you can often learn from information reported by others who fold.

Post by florinandrei » Fri Mar 27, 2020 1:41 am

Ubuntu 16.04. Nvidia drivers version 430.64 installed from graphics-drivers/ppa

All I get is this:

00:32:09:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
00:32:09:WU00:FS01:0x22:Version 0.0.2
00:32:13:WU00:FS01:0x22:ERROR:exception: Error compiling kernel:
00:32:13:WU00:FS01:0x22:Saving result file ../logfile_01.txt
00:32:13:WU00:FS01:0x22:Saving result file science.log
00:32:13:WU00:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
00:32:14:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
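
For anyone triaging a log like this one, the fields can be pulled apart mechanically. What follows is a minimal Python sketch, assuming the `HH:MM:SS:[LEVEL:]WUnn:FSnn:[0xNN:]message` layout seen in the excerpt above. The regular expressions and the names `parse_fah_line` and `parse_return_code` are illustrative and are not part of FAHClient.

```python
import re

# Layout inferred from the log excerpt above; not an official FAHClient format spec.
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):"
    r"(?:(?P<level>WARNING|ERROR):)?"
    r"(?P<wu>WU\d+):(?P<slot>FS\d+):"
    r"(?:(?P<core>0x[0-9a-fA-F]+):)?"
    r"(?P<message>.*)$"
)

# Matches e.g. "FahCore returned: BAD_WORK_UNIT (114 = 0x72)"
RETURN_RE = re.compile(
    r"FahCore returned: (?P<name>\w+) \((?P<dec>\d+) = (?P<hex>0x[0-9a-fA-F]+)\)"
)


def parse_fah_line(line):
    """Split one log line into its fields, or return None if it does not match."""
    m = LINE_RE.match(line.strip())
    return m.groupdict() if m else None


def parse_return_code(message):
    """Return (name, decimal_code, codes_agree) or None if no return code is present."""
    m = RETURN_RE.search(message)
    if not m:
        return None
    dec = int(m.group("dec"))
    hx = int(m.group("hex"), 16)
    return m.group("name"), dec, dec == hx


if __name__ == "__main__":
    sample = "00:32:14:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)"
    fields = parse_fah_line(sample)
    print(fields["level"], fields["wu"], fields["slot"])
    print(parse_return_code(fields["message"]))
```

The decimal and hexadecimal codes in the last line are the same number (0x72 is 7 × 16 + 2 = 114), so the line carries one status, `BAD_WORK_UNIT`, and nothing about why the kernel failed to compile.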
Last edited by florinandrei on Fri Mar 27, 2020 2:23 am, edited 3 times in total.
florinandrei
 
Posts: 14
Joined: Fri Feb 12, 2010 1:16 am
Location: California

Re: GTX1060 driver 430.64 Error compiling kernel

Post by florinandrei » Fri Mar 27, 2020 1:50 am

Let me add the log:

Code:
*********************** Log Started 2020-03-27T00:48:56Z ***********************
00:48:56:************************* Folding@home Client *************************
00:48:56:        Website: https://foldingathome.org/
00:48:56:      Copyright: (c) 2009-2018 foldingathome.org
00:48:56:         Author: Joseph Coffland <joseph@cauldrondevelopment.com>
00:48:56:           Args: --child --lifeline 9316 /etc/fahclient/config.xml --run-as
00:48:56:                 fahclient --pid-file=/var/run/fahclient.pid --daemon
00:48:56:         Config: /etc/fahclient/config.xml
00:48:56:******************************** Build ********************************
00:48:56:        Version: 7.5.1
00:48:56:           Date: May 11 2018
00:48:56:           Time: 19:59:04
00:48:56:     Repository: Git
00:48:56:       Revision: 4705bf53c635f88b8fe85af7675557e15d491ff0
00:48:56:         Branch: master
00:48:56:       Compiler: GNU 6.3.0 20170516
00:48:56:        Options: -std=gnu++98 -O3 -funroll-loops
00:48:56:       Platform: linux2 4.14.0-3-amd64
00:48:56:           Bits: 64
00:48:56:           Mode: Release
00:48:56:******************************* System ********************************
00:48:56:            CPU: AMD FX(tm)-4300 Quad-Core Processor
00:48:56:         CPU ID: AuthenticAMD Family 21 Model 2 Stepping 0
00:48:56:           CPUs: 4
00:48:56:         Memory: 15.65GiB
00:48:56:    Free Memory: 11.92GiB
00:48:56:        Threads: POSIX_THREADS
00:48:56:     OS Version: 4.4
00:48:56:    Has Battery: false
00:48:56:     On Battery: false
00:48:56:     UTC Offset: -7
00:48:56:            PID: 9318
00:48:56:            CWD: /var/lib/fahclient
00:48:56:             OS: Linux 4.4.0-176-generic x86_64
00:48:56:        OS Arch: AMD64
00:48:56:           GPUs: 1
00:48:56:          GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:7 GP106 [GeForce GTX 1060 6GB] 4372
00:48:56:  CUDA Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:6.1 Driver:10.1
00:48:56:OpenCL Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:1.2 Driver:430.64
00:48:56:***********************************************************************
00:48:56:<config>
00:48:56:  <!-- Folding Slot Configuration -->
00:48:56:  <smp v='false'/>
00:48:56:
00:48:56:  <!-- User Information -->
00:48:56:  <user v='florinandrei'/>
00:48:56:
00:48:56:  <!-- Folding Slots -->
00:48:56:  <slot id='1' type='GPU'/>
00:48:56:</config>
00:48:56:Switching to user fahclient
00:48:56:Trying to access database...
00:48:56:Successfully acquired database lock
00:48:56:Enabled folding slot 01: READY gpu:0:GP106 [GeForce GTX 1060 6GB] 4372
00:48:56:WU00:FS01:Connecting to 65.254.110.245:8080

Re: GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_U

Post by Joe_H » Sat Mar 28, 2020 12:49 am

Did you also install the NVIDIA OpenCL support?
Joe_H
Site Admin
 
Posts: 6426
Joined: Tue Apr 21, 2009 5:41 pm
Location: W. MA

Re: GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_U

Post by florinandrei » Sat Mar 28, 2020 1:27 am

I think so.

Code:
# clinfo
Number of platforms                               1
  Platform Name                                   NVIDIA CUDA
  Platform Vendor                                 NVIDIA Corporation
  Platform Version                                OpenCL 1.2 CUDA 10.1.120
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer
  Platform Extensions function suffix             NV

  Platform Name                                   NVIDIA CUDA
Number of devices                                 1
  Device Name                                     GeForce GTX 1060 6GB
  Device Vendor                                   NVIDIA Corporation
  Device Vendor ID                                0x10de
  Device Version                                  OpenCL 1.2 CUDA
  Driver Version                                  430.64
  Device OpenCL C Version                         OpenCL C 1.2
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Topology (NV)                            PCI-E, 01:00.0
  Max compute units                               10
  Max clock frequency                             1771MHz
  Compute Capability (NV)                         6.1
  Device Partition                                (core)
    Max number of sub-devices                     1
    Supported partition types                     None
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x64
  Max work group size                             1024
=== CL_PROGRAM_BUILD_LOG ===
  Preferred work group size multiple             
  Warp size (NV)                                  32
  Preferred / native vector sizes                 
    char                                                 1 / 1       
    short                                                1 / 1       
    int                                                  1 / 1       
    long                                                 1 / 1       
    half                                                 0 / 0        (n/a)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Address bits                                    64, Little-Endian
  Global memory size                              6368198656 (5.931GiB)
  Error Correction support                        No
  Max memory allocation                           1592049664 (1.483GiB)
  Unified memory for Host and Device              No
  Integrated memory (NV)                          No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       4096 bits (512 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        163840
  Global Memory cache line                        128 bytes
  Image support                                   Yes
    Max number of samplers per kernel             32
    Max size for 1D images from buffer            134217728 pixels
    Max 1D or 2D image array size                 2048 images
    Max 2D image size                             16384x32768 pixels
    Max 3D image size                             16384x16384x16384 pixels
    Max number of read image args                 256
    Max number of write image args                16
  Local memory type                               Local
  Local memory size                               49152 (48KiB)
  Registers per block (NV)                        65536
  Max constant buffer size                        65536 (64KiB)
  Max number of constant args                     9
  Max size of kernel argument                     4352 (4.25KiB)
  Queue properties                               
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Prefer user sync for interop                    No
  Profiling timer resolution                      1000ns
  Execution capabilities                         
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Kernel execution timeout (NV)                 Yes
  Concurrent copy and kernel execution (NV)       Yes
    Number of async copy engines                  2
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                               
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  NVIDIA CUDA
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [NV]
  clCreateContext(NULL, ...) [default]            Success [NV]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  No platform

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.8
  ICD loader Profile                              OpenCL 1.2
   NOTE:   your OpenCL library declares to support OpenCL 1.2,
      but it seems to support up to OpenCL 2.1 too.


BTW, this system used to mine cryptocurrency with ccminer on the GPU a while ago, but with older Nvidia drivers. All I did was remove all nvidia* packages and then reinstall the latest driver (430) from the PPA:

https://launchpad.net/~graphics-drivers ... ubuntu/ppa
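
One detail in the clinfo output above may be worth a second look: there is a `=== CL_PROGRAM_BUILD_LOG ===` line, and `Preferred work group size multiple` has no value. As far as I know, clinfo obtains that value by building a small test kernel, so an empty value next to a build-log marker could mean that OpenCL kernel compilation already fails outside of FAH. That reading is an assumption, not something confirmed in this thread. A minimal Python sketch for spotting the same pattern in saved clinfo output (the function name `find_clinfo_red_flags` is made up for illustration):

```python
def find_clinfo_red_flags(clinfo_text):
    """Return notes about spots in clinfo output that may hint at a failed kernel build."""
    notes = []
    for lineno, raw in enumerate(clinfo_text.splitlines(), start=1):
        line = raw.strip()
        if "CL_PROGRAM_BUILD_LOG" in line:
            # Marker seen in the output above; assumed to precede a kernel build log.
            notes.append(f"line {lineno}: build-log marker present")
        elif line == "Preferred work group size multiple":
            # The property name is printed, but no value follows it on the line.
            notes.append(f"line {lineno}: 'Preferred work group size multiple' has no value")
    return notes


if __name__ == "__main__":
    sample = (
        "  Max work group size                             1024\n"
        "=== CL_PROGRAM_BUILD_LOG ===\n"
        "  Preferred work group size multiple             \n"
        "  Warp size (NV)                                  32\n"
    )
    for note in find_clinfo_red_flags(sample):
        print(note)
```

An empty result does not prove that the OpenCL compiler works; it only means that neither of the two patterns is present.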

Re: GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_U

Post by florinandrei » Sat Mar 28, 2020 1:43 am

I just checked, ewbf-miner works just fine mining Zcash on this GPU with the 430 drivers. I see it in the output from nvidia-smi.

Code:
$ nvidia-smi
Fri Mar 27 17:41:29 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.64       Driver Version: 430.64       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  Off  | 00000000:01:00.0  On |                  N/A |
| 49%   79C    P2   116W / 120W |    615MiB /  6073MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1709      G   /usr/lib/xorg/Xorg                            32MiB |
|    0     11871      C   ewbf-miner                                   571MiB |
+-----------------------------------------------------------------------------+


So the GPU-related stuff is not completely broken. At least the crypto apps still work (though at a financial loss).

Re: GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_U

Post by Joe_H » Sat Mar 28, 2020 1:46 am

Looks like the OpenCL support is there. It has been over a year since I set up my 16.04 system successfully; the only other thing you might need is the OpenCL-dev package. My notes are buried under a bunch of stuff at the moment, so hopefully someone who has done this more recently can add to this topic and get you the rest of the way there.

Re: GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_U

Post by florinandrei » Sat Mar 28, 2020 1:52 am

I think I have it already:

Code:
# dpkg -l | grep nvidia
ii  nvidia-430                            430.64-0ubuntu0~gpu16.04.2                      amd64        NVIDIA binary driver - version 430.64
ii  nvidia-opencl-dev:amd64               7.5.18-0ubuntu1                                 amd64        NVIDIA OpenCL development files
ii  nvidia-opencl-icd-430                 430.64-0ubuntu0~gpu16.04.2                      amd64        NVIDIA OpenCL ICD
ii  nvidia-prime                          0.8.2                                           amd64        Tools to enable NVIDIA's Prime
ii  nvidia-settings                       440.64.00-0ubuntu1                              amd64        Tool for configuring the NVIDIA graphics driver

Re: GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_U

Post by florinandrei » Sat Mar 28, 2020 2:26 am

Current list of installed packages, still failing:

Code:
# dpkg -l | grep nvidia
ii  nvidia-430                            430.64-0ubuntu0~gpu16.04.2                      amd64        NVIDIA binary driver - version 430.64
ii  nvidia-cuda-dev                       7.5.18-0ubuntu1                                 amd64        NVIDIA CUDA development files
ii  nvidia-cuda-doc                       7.5.18-0ubuntu1                                 all          NVIDIA CUDA and OpenCL documentation
ii  nvidia-cuda-gdb                       7.5.18-0ubuntu1                                 amd64        NVIDIA CUDA Debugger (GDB)
ii  nvidia-cuda-toolkit                   7.5.18-0ubuntu1                                 amd64        NVIDIA CUDA development toolkit
ii  nvidia-opencl-dev:amd64               7.5.18-0ubuntu1                                 amd64        NVIDIA OpenCL development files
ii  nvidia-opencl-icd-430                 430.64-0ubuntu0~gpu16.04.2                      amd64        NVIDIA OpenCL ICD
ii  nvidia-prime                          0.8.2                                           amd64        Tools to enable NVIDIA's Prime
ii  nvidia-profiler                       7.5.18-0ubuntu1                                 amd64        NVIDIA Profiler for CUDA and OpenCL
ii  nvidia-settings                       440.64.00-0ubuntu1                              amd64        Tool for configuring the NVIDIA graphics driver
ii  nvidia-visual-profiler                7.5.18-0ubuntu1                                 amd64        NVIDIA Visual Profiler for CUDA and OpenCL
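
The listing mixes several version families: the 430.64 driver and OpenCL ICD from the PPA, 7.5.18 toolkit and development packages, and a 440.64.00 `nvidia-settings`. Whether that mix has anything to do with the compile failure is not established in this thread. The Python sketch below only makes the mix visible by grouping installed packages by the leading component of their version string. The function name `group_by_major_version` is illustrative, and versions with an epoch prefix (such as `1:2.3`) are not handled.

```python
from collections import defaultdict


def group_by_major_version(dpkg_output):
    """Group package names from `dpkg -l` output by the leading version component."""
    groups = defaultdict(list)
    for line in dpkg_output.splitlines():
        # Columns: status, name, version, architecture, description.
        fields = line.split(None, 4)
        # Fully installed packages carry the "ii" status; skip everything else.
        if len(fields) < 4 or fields[0] != "ii":
            continue
        name, version = fields[1], fields[2]
        major = version.split(".", 1)[0]
        groups[major].append(name)
    return dict(groups)


if __name__ == "__main__":
    sample = """\
# dpkg -l | grep nvidia
ii  nvidia-430             430.64-0ubuntu0~gpu16.04.2  amd64  NVIDIA binary driver - version 430.64
ii  nvidia-opencl-dev:amd64  7.5.18-0ubuntu1           amd64  NVIDIA OpenCL development files
ii  nvidia-opencl-icd-430  430.64-0ubuntu0~gpu16.04.2  amd64  NVIDIA OpenCL ICD
ii  nvidia-settings        440.64.00-0ubuntu1          amd64  Tool for configuring the NVIDIA graphics driver
"""
    for major, names in sorted(group_by_major_version(sample).items()):
        print(major, names)
```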

Re: GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_U

Post by florinandrei » Sat Mar 28, 2020 2:38 am

Is there a way to increase the verbosity of the log? It fails to compile the kernel, fine, but how exactly does it fail? Seeing the actual compiler error would help identify the cause.

Re: GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_U

Post by Joe_H » Sat Mar 28, 2020 4:32 pm

You can increase the FAHClient logging verbosity, but as I recall the error messages from the folding core may not show up there. The error report might be in science.log, or in another log associated with the processing in the work folder. Those might be too ephemeral in an error like this to capture. FAHClient collects many, but not all, of the messages from the folding cores processing a WU and enters them into its log.
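
Since those files may disappear quickly, one option is to scan the work folder for them right after a failure. The Python sketch below is a rough illustration under a few assumptions: the file names `science.log` and `logfile_*.txt` are taken from the "Saving result file" lines in the logs in this thread, the work folder location has to be supplied by the caller, and the function name `collect_error_lines` is made up.

```python
from pathlib import Path

# Substrings that mark interesting lines; chosen from the log excerpts in this thread.
KEYWORDS = ("ERROR", "exception")


def collect_error_lines(work_dir, keywords=KEYWORDS):
    """Yield (path, line_number, text) for log lines containing any keyword."""
    root = Path(work_dir)
    candidates = sorted(root.rglob("science.log")) + sorted(root.rglob("logfile_*.txt"))
    for path in candidates:
        try:
            text = path.read_text(errors="replace")
        except OSError:
            # The file may vanish or be unreadable if the client cleans up the work unit.
            continue
        for number, line in enumerate(text.splitlines(), start=1):
            if any(k in line for k in keywords):
                yield path, number, line.rstrip()


if __name__ == "__main__":
    import sys

    # Pass the work folder as the first argument; defaults to the current directory.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, number, line in collect_error_lines(target):
        print(f"{path}:{number}: {line}")
```

Reading the client's work folder will likely require suitable permissions, and the script can only report what the core actually wrote before the files were removed.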

Re: GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_U

Post by florinandrei » Sun Mar 29, 2020 11:07 pm

I've downgraded the drivers to 390.116 (apt-get install nvidia-390, with the PPA) and now the GPU core seems to be crunching WUs.

Code:
# dpkg -l | grep nvidia
ii  nvidia-390                            390.116-0ubuntu1                                amd64        NVIDIA binary driver - version 390.116
ii  nvidia-opencl-dev:amd64               7.5.18-0ubuntu1                                 amd64        NVIDIA OpenCL development files
ii  nvidia-opencl-icd-390                 390.116-0ubuntu1                                amd64        NVIDIA OpenCL ICD
ii  nvidia-prime                          0.8.2                                           amd64        Tools to enable NVIDIA's Prime
ii  nvidia-settings                       440.64.00-0ubuntu1                              amd64        Tool for configuring the NVIDIA graphics driver


Code:
21:56:38:WU00:FS01:0x22:Project: 11753 (Run 0, Clone 2636, Gen 10)
21:56:38:WU00:FS01:0x22:Unit: 0x000000119bf7a4d55e6d76c7d507bf1f
21:56:38:WU00:FS01:0x22:Reading tar file core.xml
21:56:38:WU00:FS01:0x22:Reading tar file integrator.xml
21:56:38:WU00:FS01:0x22:Reading tar file state.xml
21:56:39:WU00:FS01:0x22:Reading tar file system.xml
21:56:40:WU00:FS01:0x22:Digital signatures verified
21:56:40:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
21:56:40:WU00:FS01:0x22:Version 0.0.2
21:57:06:WU00:FS01:0x22:Completed 0 out of 1000000 steps (0%)
21:57:06:WU00:FS01:0x22:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
21:59:49:WU00:FS01:0x22:Completed 10000 out of 1000000 steps (1%)
22:02:31:WU00:FS01:0x22:Completed 20000 out of 1000000 steps (2%)


But, folks, this is not right. That's a very old driver. The current driver series is 44x. A lot of people will install the latest. A lot of Ubuntu users, like me, will follow the recommendation from the PPA, which is the 43x series - and FAH will fail.

Can we get the Linux GPU core to work with more recent drivers?

Thanks!

Re: GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_U

Post by kevinjos » Mon Mar 30, 2020 5:00 pm

I am running FAH on Ubuntu 18.04 with Nvidia Driver Version: 440.64.00 + CUDA Version: 10.2.

Are you able to compile the Nvidia examples on your system with 430.xx installed? https://docs.nvidia.com/cuda/cuda-insta ... g-examples
kevinjos
 
Posts: 4
Joined: Sun Mar 29, 2020 10:04 pm

Re: GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_U

Post by m1geo » Tue Mar 31, 2020 3:26 am

I, too, have exactly this issue:

Xubuntu 19.10
GeForce GTX 1070 8GB
Nvidia Driver 440.64 (also tried 435, 430).
I have nvidia-opencl-dev, ocl-icd-opencl-dev, etc. installed.

Machine info:
Code:
01:22:59:************************* Folding@home Client *************************
01:22:59:        Website: https://foldingathome.org/
01:22:59:      Copyright: (c) 2009-2018 foldingathome.org
01:22:59:         Author: Joseph Coffland <joseph@cauldrondevelopment.com>
01:22:59:           Args: --child --lifeline 5517 /etc/fahclient/config.xml --run-as
01:22:59:                 fahclient --pid-file=/var/run/fahclient.pid --daemon
01:22:59:         Config: /etc/fahclient/config.xml
01:22:59:******************************** Build ********************************
01:22:59:        Version: 7.5.1
01:22:59:           Date: May 11 2018
01:22:59:           Time: 19:59:04
01:22:59:     Repository: Git
01:22:59:       Revision: 4705bf53c635f88b8fe85af7675557e15d491ff0
01:22:59:         Branch: master
01:22:59:       Compiler: GNU 6.3.0 20170516
01:22:59:        Options: -std=gnu++98 -O3 -funroll-loops
01:22:59:       Platform: linux2 4.14.0-3-amd64
01:22:59:           Bits: 64
01:22:59:           Mode: Release
01:22:59:******************************* System ********************************
01:22:59:            CPU: AMD Ryzen 9 3900X 12-Core Processor
01:22:59:         CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
01:22:59:           CPUs: 24
01:22:59:         Memory: 62.79GiB
01:22:59:    Free Memory: 58.91GiB
01:22:59:        Threads: POSIX_THREADS
01:22:59:     OS Version: 5.3
01:22:59:    Has Battery: false
01:22:59:     On Battery: false
01:22:59:     UTC Offset: 1
01:22:59:            PID: 5519
01:22:59:            CWD: /var/lib/fahclient
01:22:59:             OS: Linux 5.3.0-42-generic x86_64
01:22:59:        OS Arch: AMD64
01:22:59:           GPUs: 1
01:22:59:          GPU 0: Bus:9 Slot:0 Func:0 NVIDIA:7 GP104 [GeForce GTX 1070] 6463
01:22:59:  CUDA Device 0: Platform:0 Device:0 Bus:9 Slot:0 Compute:6.1 Driver:10.2
01:22:59:OpenCL Device 0: Platform:0 Device:0 Bus:9 Slot:0 Compute:1.2 Driver:440.64
01:22:59:***********************************************************************


I am seeing the same error as others above:

Code:
01:45:09:WU02:FS01:Connecting to 65.254.110.245:8080
01:45:09:WARNING:WU02:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
01:45:09:WU02:FS01:Connecting to 18.218.241.186:80
01:45:10:WU02:FS01:Assigned to work server 40.114.52.201
01:45:10:WU02:FS01:Requesting new work unit for slot 01: READY gpu:0:GP104 [GeForce GTX 1070] 6463 from 40.114.52.201
01:45:10:WU02:FS01:Connecting to 40.114.52.201:8080
01:45:33:WU02:FS01:Downloading 29.59MiB
01:45:39:WU02:FS01:Download 22.39%
01:45:45:WU02:FS01:Download 36.97%
01:45:51:WU02:FS01:Download 56.82%
01:45:57:WU02:FS01:Download 69.92%
01:46:03:WU02:FS01:Download 90.62%
01:46:05:WU02:FS01:Download complete
01:46:05:WU02:FS01:Received Unit: id:02 state:DOWNLOAD error:NO_ERROR project:11778 run:0 clone:30675 gen:6 core:0x22 unit:0x0000000b287234c95e792f710bd2efa9
01:46:05:WU02:FS01:Starting
01:46:05:WU02:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/Core_22.fah/FahCore_22 -dir 02 -suffix 01 -version 705 -lifeline 5519 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
01:46:05:WU02:FS01:Started FahCore on PID 8162
01:46:05:WU02:FS01:Core PID:8166
01:46:05:WU02:FS01:FahCore 0x22 started
01:46:05:WU02:FS01:0x22:*********************** Log Started 2020-03-31T01:46:05Z ***********************
01:46:05:WU02:FS01:0x22:*************************** Core22 Folding@home Core ***************************
01:46:05:WU02:FS01:0x22:       Type: 0x22
01:46:05:WU02:FS01:0x22:       Core: Core22
01:46:05:WU02:FS01:0x22:    Website: https://foldingathome.org/
01:46:05:WU02:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
01:46:05:WU02:FS01:0x22:     Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
01:46:05:WU02:FS01:0x22:             <rafal.wiewiora@choderalab.org>
01:46:05:WU02:FS01:0x22:       Args: -dir 02 -suffix 01 -version 705 -lifeline 8162 -checkpoint 15
01:46:05:WU02:FS01:0x22:             -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device
01:46:05:WU02:FS01:0x22:             0 -gpu 0
01:46:05:WU02:FS01:0x22:     Config: <none>
01:46:05:WU02:FS01:0x22:************************************ Build *************************************
01:46:05:WU02:FS01:0x22:    Version: 0.0.2
01:46:05:WU02:FS01:0x22:       Date: Dec 6 2019
01:46:05:WU02:FS01:0x22:       Time: 21:20:17
01:46:05:WU02:FS01:0x22: Repository: Git
01:46:05:WU02:FS01:0x22:   Revision: f87d92b58abdf7e6bf2e173cfbc4dc3e837c7042
01:46:05:WU02:FS01:0x22:     Branch: core22
01:46:05:WU02:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
01:46:05:WU02:FS01:0x22:    Options: -std=gnu++98 -O3 -funroll-loops
01:46:05:WU02:FS01:0x22:   Platform: linux2 4.9.87-linuxkit-aufs
01:46:05:WU02:FS01:0x22:       Bits: 64
01:46:05:WU02:FS01:0x22:       Mode: Release
01:46:05:WU02:FS01:0x22:************************************ System ************************************
01:46:05:WU02:FS01:0x22:        CPU: AMD Ryzen 9 3900X 12-Core Processor
01:46:05:WU02:FS01:0x22:     CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
01:46:05:WU02:FS01:0x22:       CPUs: 24
01:46:05:WU02:FS01:0x22:     Memory: 62.79GiB
01:46:05:WU02:FS01:0x22:Free Memory: 57.96GiB
01:46:05:WU02:FS01:0x22:    Threads: POSIX_THREADS
01:46:05:WU02:FS01:0x22: OS Version: 5.3
01:46:05:WU02:FS01:0x22:Has Battery: false
01:46:05:WU02:FS01:0x22: On Battery: false
01:46:05:WU02:FS01:0x22: UTC Offset: 1
01:46:05:WU02:FS01:0x22:        PID: 8166
01:46:05:WU02:FS01:0x22:        CWD: /var/lib/fahclient/work
01:46:05:WU02:FS01:0x22:         OS: Linux 5.3.0-42-generic x86_64
01:46:05:WU02:FS01:0x22:    OS Arch: AMD64
01:46:05:WU02:FS01:0x22:********************************************************************************
01:46:05:WU02:FS01:0x22:Project: 11778 (Run 0, Clone 30675, Gen 6)
01:46:05:WU02:FS01:0x22:Unit: 0x0000000b287234c95e792f710bd2efa9
01:46:05:WU02:FS01:0x22:Reading tar file core.xml
01:46:05:WU02:FS01:0x22:Reading tar file integrator.xml
01:46:05:WU02:FS01:0x22:Reading tar file state.xml
01:46:05:WU02:FS01:0x22:Reading tar file system.xml
01:46:05:WU02:FS01:0x22:Digital signatures verified
01:46:05:WU02:FS01:0x22:Folding@home GPU Core22 Folding@home Core
01:46:05:WU02:FS01:0x22:Version 0.0.2
01:46:11:WU02:FS01:0x22:Completed 0 out of 2000000 steps (0%)
01:46:11:WU02:FS01:0x22:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
01:46:15:WU02:FS01:0x22:ERROR:exception: clWaitForEvents
01:46:15:WU02:FS01:0x22:Saving result file ../logfile_01.txt
01:46:15:WU02:FS01:0x22:Saving result file checkpt.crc
01:46:15:WU02:FS01:0x22:Saving result file science.log
01:46:15:WU02:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
01:46:45:WARNING:WU02:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
01:46:45:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:11778 run:0 clone:30675 gen:6 core:0x22 unit:0x0000000b287234c95e792f710bd2efa9
01:46:45:WU02:FS01:Uploading 9.00KiB to 40.114.52.201
01:48:59:WU02:FS01:Upload 100.00%
01:49:33:WU02:FS01:Upload complete
01:49:33:WU02:FS01:Server responded WORK_ACK (400)
01:49:33:WU02:FS01:Cleaning up
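
The exception in this log names `clWaitForEvents`, which is an OpenCL API function, so the failure is on the OpenCL side of the core. The "Running FahCore:" line shows which platform and device indices the client passed to the core. A minimal Python sketch for extracting them from such a line follows; the function name `parse_core_args` is illustrative, and the flag handling simply assumes every flag is followed by exactly one value, as in the line above.

```python
import shlex


def parse_core_args(log_line):
    """Return a dict of '-flag' -> value taken from a 'Running FahCore:' log line."""
    _, _, command = log_line.partition("Running FahCore:")
    tokens = shlex.split(command)
    flags = {}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        has_value = i + 1 < len(tokens) and not tokens[i + 1].startswith("-")
        if tok.startswith("-") and has_value:
            flags[tok] = tokens[i + 1]
            i += 2
        else:
            # Executable paths and value-less tokens are skipped.
            i += 1
    return flags


if __name__ == "__main__":
    line = (
        "01:46:05:WU02:FS01:Running FahCore: /usr/bin/FAHCoreWrapper "
        "/var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/Core_22.fah/FahCore_22 "
        "-dir 02 -suffix 01 -version 705 -lifeline 5519 -checkpoint 15 -gpu-vendor nvidia "
        "-opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0"
    )
    args = parse_core_args(line)
    print(args["-opencl-platform"], args["-opencl-device"], args["-cuda-device"])
```

Comparing these indices with the platform and device order reported by clinfo is one way to check that the core is pointed at the intended GPU.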


Very occasionally, like once every 20 times, I'll see this:

Code:
02:00:29:WU00:FS01:0x22:Unit: 0x0000002480fccb0a5e6ebfa00627303e
02:00:29:WU00:FS01:0x22:Reading tar file core.xml
02:00:29:WU00:FS01:0x22:Reading tar file integrator.xml
02:00:29:WU00:FS01:0x22:Reading tar file state.xml
02:00:29:WU00:FS01:0x22:Reading tar file system.xml
02:00:29:WU00:FS01:0x22:Digital signatures verified
02:00:29:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
02:00:29:WU00:FS01:0x22:Version 0.0.2
02:00:37:WU00:FS01:0x22:Completed 0 out of 1000000 steps (0%)
02:00:37:WU00:FS01:0x22:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
02:00:58:WU00:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
02:00:58:WU00:FS01:0x22:Following exception occured: Particle coordinate is nan
02:01:16:WU00:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
02:01:16:WU00:FS01:0x22:Following exception occured: Particle coordinate is nan


The CPU was overclocked, but I put it completely back to stock settings and it made no difference. Still the same errors.
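
To put a number on "once every 20 times", the different exception messages in a saved log can be tallied. The Python sketch below is a rough illustration that keys on the two phrasings visible in the logs in this thread, `ERROR:exception:` and `Following exception occured:` (spelled as the core prints it). The function name `tally_exceptions` is made up.

```python
import re
from collections import Counter

# The two phrasings are copied from the log excerpts above, including the
# core's own spelling "occured". A trailing colon on the message is dropped.
EXC_RE = re.compile(
    r"(?:ERROR:exception:|Following exception occured:)\s*(?P<what>.+?)\s*:?\s*$"
)


def tally_exceptions(log_text):
    """Count how often each exception message appears in a FAHClient log."""
    counts = Counter()
    for line in log_text.splitlines():
        m = EXC_RE.search(line)
        if m:
            counts[m.group("what")] += 1
    return counts


if __name__ == "__main__":
    sample = """\
01:46:15:WU02:FS01:0x22:ERROR:exception: clWaitForEvents
02:00:58:WU00:FS01:0x22:Following exception occured: Particle coordinate is nan
02:01:16:WU00:FS01:0x22:Following exception occured: Particle coordinate is nan
00:32:13:WU00:FS01:0x22:ERROR:exception: Error compiling kernel:
"""
    for message, count in tally_exceptions(sample).most_common():
        print(count, message)
```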

Any help greatly appreciated. I'd like this machine running long term as it's always on and usually not doing much at all. It's used for occasional FPGA builds.

Thanks!
m1geo
 
Posts: 10
Joined: Tue Mar 31, 2020 3:07 am
Location: Cambridge, UK

Re: GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_U

Post by m1geo » Wed Apr 01, 2020 2:53 am

As an update to my post yesterday, I can compile CUDA applications and they work.


bandwidthTest:
Code:
george@ryzen:~/cuda-samples/Samples/bandwidthTest$ make
/usr/bin/nvcc -ccbin g++ -I../../Common  -m64    -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75 -o bandwidthTest.o -c bandwidthTest.cu
/usr/bin/nvcc -ccbin g++   -m64      -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75 -o bandwidthTest bandwidthTest.o
mkdir -p ../../bin/x86_64/linux/release
cp bandwidthTest ../../bin/x86_64/linux/release
george@ryzen:~/cuda-samples/Samples/bandwidthTest$ ./bandwidthTest
[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: GeForce GTX 1070
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)   Bandwidth(GB/s)
   32000000         13.1

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)   Bandwidth(GB/s)
   32000000         13.5

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)   Bandwidth(GB/s)
   32000000         196.1

Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.


The same story with deviceQuery:
Code: Select all
george@ryzen:~/cuda-samples/Samples/deviceQuery$ make
/usr/bin/nvcc -ccbin g++ -I../../Common  -m64    -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75 -o deviceQuery.o -c deviceQuery.cpp
/usr/bin/nvcc -ccbin g++   -m64      -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75 -o deviceQuery deviceQuery.o
mkdir -p ../../bin/x86_64/linux/release
cp deviceQuery ../../bin/x86_64/linux/release
george@ryzen:~/cuda-samples/Samples/deviceQuery$ ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 1070"
  CUDA Driver Version / Runtime Version          10.2 / 10.1
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 8116 MBytes (8510701568 bytes)
  (15) Multiprocessors, (128) CUDA Cores/MP:     1920 CUDA Cores
  GPU Max Clock rate:                            1835 MHz (1.84 GHz)
  Memory Clock rate:                             4004 Mhz
  Memory Bus Width:                              256-bit
  L2 Cache Size:                                 2097152 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 9 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 10.1, NumDevs = 1
Result = PASS
george@ryzen:~/cuda-samples/Samples/deviceQuery$
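
For anyone comparing their own deviceQuery output against the one above, here is a minimal sketch that pulls the driver and runtime versions out of the text and checks that the driver is at least as new as the runtime. It assumes the usual CUDA rule that the driver must not be older than the runtime an application was built against. The function names are made up for illustration and are not part of any NVIDIA tool.

```python
import re


def parse_cuda_versions(device_query_output):
    """Return (driver, runtime) as tuples of ints, e.g. ((10, 2), (10, 1))."""
    match = re.search(
        r"CUDA Driver Version / Runtime Version\s+(\d+)\.(\d+)\s*/\s*(\d+)\.(\d+)",
        device_query_output,
    )
    if match is None:
        raise ValueError("no driver/runtime version line found")
    major_d, minor_d, major_r, minor_r = (int(group) for group in match.groups())
    return (major_d, minor_d), (major_r, minor_r)


def driver_supports_runtime(driver, runtime):
    # Assumption: a driver at least as new as the runtime is compatible.
    return driver >= runtime


# The version line exactly as it appears in the deviceQuery output above.
sample = "  CUDA Driver Version / Runtime Version          10.2 / 10.1\n"
driver, runtime = parse_cuda_versions(sample)
print(driver, runtime, driver_supports_runtime(driver, runtime))
```

With the 10.2 / 10.1 pair reported above the check passes, which matches the samples building and running cleanly.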


I hacked the matrix multiply example into an infinite loop and I can see the GPU utilisation hit 100% inside nvidia-settings, so I think everything is working.
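
The same utilisation check can be done from a terminal instead of nvidia-settings. This is a rough sketch, assuming nvidia-smi is on the PATH. The helper names are hypothetical, and the script returns None if the tool is missing or fails.

```python
import shutil
import subprocess

# nvidia-smi query that prints one utilisation percentage per GPU, no header or units.
QUERY = ["nvidia-smi", "--query-gpu=utilization.gpu", "--format=csv,noheader,nounits"]


def parse_utilization(output):
    """Turn the CSV output (one line per GPU) into a list of integer percentages."""
    values = []
    for line in output.splitlines():
        text = line.strip()
        if text.isdigit():  # skip blanks and entries such as "[N/A]"
            values.append(int(text))
    return values


def read_utilization():
    """Return per-GPU utilisation, or None when nvidia-smi is unavailable or fails."""
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    except (subprocess.CalledProcessError, OSError):
        return None
    return parse_utilization(result.stdout)


if __name__ == "__main__":
    print(read_utilization())
```

Running it while the looping matrix multiply is active should show a value close to 100 for the busy card.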

Any suggestions would be gratefully received. Thanks!
m1geo
 
Posts: 10
Joined: Tue Mar 31, 2020 3:07 am
Location: Cambridge, UK

Re: GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_U

Postby ipkh » Wed Apr 01, 2020 3:15 am

I can't see anything wrong with your configuration. Everything in the log file seems to back up that your card is detected properly and that both CUDA and OpenCL are detected.
My only thought is that a kernel update might be needed, as I personally run Xubuntu 18.04 with a newer kernel.
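
To compare kernels between machines, something like the sketch below prints the running kernel release next to the loaded NVIDIA kernel module version. It assumes the module exposes its version at /proc/driver/nvidia/version. If that file is not there, the script prints None for it. The helper names are only for illustration.

```python
import platform
from pathlib import Path

# Assumed location: the NVIDIA kernel module reports its version here once loaded.
NVIDIA_VERSION_FILE = Path("/proc/driver/nvidia/version")


def first_line(path):
    """Return the first line of a text file, or None if it cannot be read."""
    try:
        with open(path, encoding="utf-8", errors="replace") as handle:
            return handle.readline().strip()
    except OSError:
        return None


def system_summary():
    """Collect the kernel release and the NVIDIA module version line."""
    return {
        "kernel": platform.release(),
        "nvidia_module": first_line(NVIDIA_VERSION_FILE),
    }


if __name__ == "__main__":
    for key, value in system_summary().items():
        print(f"{key}: {value}")
```

Posting both lines alongside the FAH log makes it easier to spot whether a working and a failing machine differ in kernel or in driver module.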
ipkh
 
Posts: 134
Joined: Thu Jul 16, 2015 3:03 pm
