GPU Disabled

It seems that a lot of GPU problems revolve around specific versions of drivers. Though NVidia has their own support structure, you can often learn from information reported by others who fold.

Moderators: Site Moderators, FAHC Science Team

ralwing
Posts: 1
Joined: Fri Feb 12, 2021 9:15 pm

GPU Disabled

Post by ralwing »

Hi,
My GPU has been working fine for a few months now, but suddenly, after a reboot, I see it's disabled.

*********************** Log Started 2021-02-12T13:18:57Z ***********************
13:18:57:******************************* libFAH ********************************
13:18:57:           Date: Oct 20 2020
13:18:57:           Time: 20:36:39
13:18:57:       Revision: 5ca109d295a6245e2a2f590b3d0085ad5e567aeb
13:18:57:         Branch: master
13:18:57:       Compiler: GNU 8.3.0
13:18:57:        Options: -faligned-new -std=c++11 -fsigned-char -ffunction-sections
13:18:57:                 -fdata-sections -O3 -funroll-loops -fno-pie
13:18:57:       Platform: linux2 5.8.0-1-amd64
13:18:57:           Bits: 64
13:18:57:           Mode: Release
13:18:57:****************************** FAHClient ******************************
13:18:57:        Version: 7.6.21
13:18:57:         Author: Joseph Coffland <joseph@cauldrondevelopment.com>
13:18:57:      Copyright: 2020 foldingathome.org
13:18:57:       Homepage: https://foldingathome.org/
13:18:57:           Date: Oct 20 2020
13:18:57:           Time: 20:39:00
13:18:57:       Revision: 6efbf0e138e22d3963e6a291f78dcb9c6422a278
13:18:57:         Branch: master
13:18:57:       Compiler: GNU 8.3.0
13:18:57:        Options: -faligned-new -std=c++11 -fsigned-char -ffunction-sections
13:18:57:                 -fdata-sections -O3 -funroll-loops -fno-pie
13:18:57:       Platform: linux2 5.8.0-1-amd64
13:18:57:           Bits: 64
13:18:57:           Mode: Release
13:18:57:           Args: -v --chdir /var/snap/folding-at-home-fcole90/common
13:18:57:         Config: /var/snap/folding-at-home-fcole90/common/config.xml
13:18:57:******************************** CBang ********************************
13:18:57:           Date: Oct 20 2020
13:18:57:           Time: 18:37:59
13:18:57:       Revision: 7e4ce85225d7eaeb775e87c31740181ca603de60
13:18:57:         Branch: master
13:18:57:       Compiler: GNU 8.3.0
13:18:57:        Options: -faligned-new -std=c++11 -fsigned-char -ffunction-sections
13:18:57:                 -fdata-sections -O3 -funroll-loops -fno-pie -fPIC
13:18:57:       Platform: linux2 5.8.0-1-amd64
13:18:57:           Bits: 64
13:18:57:           Mode: Release
13:18:57:******************************* System ********************************
13:18:57:            CPU: AMD Ryzen 5 3600X 6-Core Processor
13:18:57:         CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
13:18:57:           CPUs: 12
13:18:57:         Memory: 31.37GiB
13:18:57:    Free Memory: 30.55GiB
13:18:57:        Threads: POSIX_THREADS
13:18:57:     OS Version: 5.8
13:18:57:    Has Battery: false
13:18:57:     On Battery: false
13:18:57:     UTC Offset: 1
13:18:57:            PID: 880
13:18:57:            CWD: /var/snap/folding-at-home-fcole90/81
13:18:57:             OS: Linux 5.8.0-43-generic x86_64
13:18:57:        OS Arch: AMD64
13:18:57:           GPUs: 1
13:18:57:          GPU 0: Bus:6 Slot:0 Func:0 NVIDIA:1
13:18:57:  CUDA Device 0: Platform:0 Device:0 Bus:6 Slot:0 Compute:7.5 Driver:11.2
13:18:57:OpenCL Device 0: Platform:0 Device:0 Bus:6 Slot:0 Compute:1.2 Driver:460.27
13:18:57:***********************************************************************
13:18:57:<config>
13:18:57:  <!-- Client Control -->
13:18:57:  <client-threads v='6'/>
13:18:57:  <cycle-rate v='4'/>
13:18:57:  <cycles v='-1'/>
13:18:57:  <disable-sleep-when-active v='true'/>
13:18:57:  <exit-when-done v='false'/>
13:18:57:  <fold-anon v='false'/>
13:18:57:  <idle-seconds v='300'/>
13:18:57:  <open-web-control v='false'/>
13:18:57:  <update-gpus-txt v='true'/>
13:18:57:
13:18:57:  <!-- Configuration -->
13:18:57:  <config-rotate v='true'/>
13:18:57:  <config-rotate-dir v='configs'/>
13:18:57:  <config-rotate-max v='16'/>
13:18:57:
13:18:57:  <!-- Debugging -->
13:18:57:  <assignment-servers>
13:18:57:    assign1.foldingathome.org assign2.foldingathome.org assign3.foldingathome.org assign4.foldingathome.org 
13:18:57:  </assignment-servers>
13:18:57:  <auth-as v='true'/>
13:18:57:  <capture-directory v='capture'/>
13:18:57:  <capture-on-error v='false'/>
13:18:57:  <capture-packets v='false'/>
13:18:57:  <capture-requests v='false'/>
13:18:57:  <capture-responses v='false'/>
13:18:57:  <capture-sockets v='false'/>
13:18:57:  <debug-sockets v='false'/>
13:18:57:  <exception-locations v='true'/>
13:18:57:  <stack-traces v='false'/>
13:18:57:
13:18:57:  <!-- Error Handling -->
13:18:57:  <max-slot-errors v='10'/>
13:18:57:  <max-unit-errors v='5'/>
13:18:57:
13:18:57:  <!-- Folding Core -->
13:18:57:  <checkpoint v='15'/>
13:18:57:  <core-priority v='idle'/>
13:18:57:  <cpu-usage v='100'/>
13:18:57:  <gpu-usage v='100'/>
13:18:57:  <no-assembly v='false'/>
13:18:57:
13:18:57:  <!-- Folding Slot Configuration -->
13:18:57:  <cause v='ANY'/>
13:18:57:  <client-subtype v='LINUX'/>
13:18:57:  <client-type v='normal'/>
13:18:57:  <cpu-species v='X86_AMD'/>
13:18:57:  <cpu-type v='AMD64'/>
13:18:57:  <cpus v='-1'/>
13:18:57:  <disable-viz v='false'/>
13:18:57:  <gpu v='true'/>
13:18:57:  <gpu-beta v='false'/>
13:18:57:  <max-packet-size v='normal'/>
13:18:57:  <os-species v='UNKNOWN'/>
13:18:57:  <os-type v='LINUX'/>
13:18:57:  <project-key v='0'/>
13:18:57:  <smp v='true'/>
13:18:57:
13:18:57:  <!-- GUI -->
13:18:57:  <gui-enabled v='true'/>
13:18:57:
13:18:57:  <!-- HTTP Server -->
13:18:57:  <allow v='127.0.0.1'/>
13:18:57:  <connection-timeout v='60'/>
13:18:57:  <deny v='0/0'/>
13:18:57:  <http-addresses v='0:7396'/>
13:18:57:  <https-addresses v=''/>
13:18:57:  <max-connect-time v='900'/>
13:18:57:  <max-connections v='800'/>
13:18:57:  <max-request-length v='52428800'/>
13:18:57:  <min-connect-time v='300'/>
13:18:57:
13:18:57:  <!-- Logging -->
13:18:57:  <log v='log.txt'/>
13:18:57:  <log-color v='true'/>
13:18:57:  <log-crlf v='false'/>
13:18:57:  <log-date v='false'/>
13:18:57:  <log-date-periodically v='21600'/>
13:18:57:  <log-domain v='false'/>
13:18:57:  <log-header v='true'/>
13:18:57:  <log-level v='true'/>
13:18:57:  <log-no-info-header v='true'/>
13:18:57:  <log-redirect v='false'/>
13:18:57:  <log-rotate v='true'/>
13:18:57:  <log-rotate-dir v='logs'/>
13:18:57:  <log-rotate-max v='16'/>
13:18:57:  <log-short-level v='false'/>
13:18:57:  <log-simple-domains v='true'/>
13:18:57:  <log-thread-id v='false'/>
13:18:57:  <log-thread-prefix v='true'/>
13:18:57:  <log-time v='true'/>
13:18:57:  <log-to-screen v='true'/>
13:18:57:  <log-truncate v='false'/>
13:18:57:  <verbosity v='3'/>
13:18:57:
13:18:57:  <!-- Network -->
13:18:57:  <proxy v=':8080'/>
13:18:57:  <proxy-enable v='false'/>
13:18:57:  <proxy-pass v='*****'/>
13:18:57:  <proxy-user v=''/>
13:18:57:
13:18:57:  <!-- Process Control -->
13:18:57:  <child v='false'/>
13:18:57:  <daemon v='false'/>
13:18:57:  <fork v='false'/>
13:18:57:  <pid v='false'/>
13:18:57:  <pid-file v='FAHClient.pid'/>
13:18:57:  <respawn v='false'/>
13:18:57:  <service v='false'/>
13:18:57:
13:18:57:  <!-- Remote Command Server -->
13:18:57:  <command-address v='0.0.0.0'/>
13:18:57:  <command-allow-no-pass v='127.0.0.1'/>
13:18:57:  <command-deny-no-pass v='0/0'/>
13:18:57:  <command-enable v='true'/>
13:18:57:  <command-port v='36330'/>
13:18:57:
13:18:57:  <!-- Slot Control -->
13:18:57:  <auto-conf v='true'/>
13:18:57:  <idle v='false'/>
13:18:57:  <max-shutdown-wait v='60'/>
13:18:57:  <pause-on-battery v='true'/>
13:18:57:  <pause-on-start v='false'/>
13:18:57:  <paused v='false'/>
13:18:57:  <power v='full'/>
13:18:57:
13:18:57:  <!-- User Information -->
13:18:57:  <machine-id v='0'/>
13:18:57:  <passkey v='*****'/>
13:18:57:  <team v='yyy'/>
13:18:57:  <user v='xxx'/>
13:18:57:
13:18:57:  <!-- Web Server -->
13:18:57:  <web-allow v='127.0.0.1'/>
13:18:57:  <web-deny v='0/0'/>
13:18:57:  <web-enable v='true'/>
13:18:57:
13:18:57:  <!-- Web Server Sessions -->
13:18:57:  <session-cookie v='sid'/>
13:18:57:  <session-lifetime v='86400'/>
13:18:57:  <session-timeout v='3600'/>
13:18:57:
13:18:57:  <!-- Work Unit Control -->
13:18:57:  <dump-after-deadline v='true'/>
13:18:57:  <max-queue v='16'/>
13:18:57:  <max-units v='0'/>
13:18:57:  <next-unit-percentage v='99'/>
13:18:57:  <stall-detection-enabled v='false'/>
13:18:57:  <stall-percent v='5'/>
13:18:57:  <stall-timeout v='1800'/>
13:18:57:
13:18:57:  <!-- Folding Slots -->
13:18:57:  <slot id='0' type='CPU'>
13:18:57:    <paused v='true'/>
13:18:57:  </slot>
13:18:57:  <slot id='1' type='GPU'>
13:18:57:    <pci-bus v='6'/>
13:18:57:    <pci-slot v='0'/>
13:18:57:  </slot>
13:18:57:</config>
13:18:57:Trying to access database...
13:18:57:Successfully acquired database lock
13:18:57:FS00:Initialized folding slot 00: cpu:11
13:18:57:WARNING:FS01:Disabling beta GPU slot 01: gpu:6:0.  Beta GPUs can be tested for no points by setting ``gpu-beta=true`` in the configuration.
nvidia-smi:

Fri Feb 12 22:22:45 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.27.04    Driver Version: 460.27.04    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 206...  On   | 00000000:06:00.0  On |                  N/A |
| 55%   45C    P2    37W / 184W |    696MiB /  7981MiB |      9%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1079      G   /usr/lib/xorg/Xorg                 59MiB |
|    0   N/A  N/A      1864      G   /usr/lib/xorg/Xorg                335MiB |
|    0   N/A  N/A      1992      G   /usr/bin/gnome-shell               57MiB |
|    0   N/A  N/A      3382      G   ...gAAAAAAAAA --shared-files       10MiB |
|    0   N/A  N/A      3671      G   ...AAAAAAAA== --shared-files       28MiB |
|    0   N/A  N/A      4134      C   /usr/NX/bin/nxnode.bin            149MiB |
|    0   N/A  N/A      8878      G   ...gAAAAAAAAA --shared-files       40MiB |
+-----------------------------------------------------------------------------+

I can't downgrade CUDA or the NVIDIA drivers. Any other ideas?
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: GPU Disabled

Post by bruce »

FAHCore_22 manages the CUDA version. To support a wide range of GPUs, everything you need for CUDA is downloaded with that FAHCore. If you have installed CUDA yourself, or you try to change its version, you are probably creating conflicts.

FAH assumes you have installed the NVIDIA driver directly from nvidia.com.

OpenCL is not managed by the FAHCore; it is packaged with the NVIDIA driver.

I'm not quite sure what made FAH decide your GPU was a beta GPU but that's probably why it was disabled.
13:18:57:WARNING:FS01:Disabling beta GPU slot 01: gpu:6:0. Beta GPUs can be tested for no points by setting ``gpu-beta=true`` in the configuration.
A beta GPU is not the same as client-type beta.
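
For anyone who wants to spot this condition without reading the whole log, here is a minimal sketch that scans log text for the warning quoted above. The regular expression is modeled only on the one line shown in this thread; the function name is made up for illustration, and other client versions may word the message differently.

```python
import re

# Pattern modeled on the single warning line quoted in this thread
# (assumption: other FAHClient versions may format it differently).
BETA_WARNING = re.compile(
    r"WARNING:FS(?P<slot>\d+):Disabling beta GPU slot \d+: "
    r"gpu:(?P<bus>\d+):(?P<pci_slot>\d+)"
)


def find_disabled_gpu_slots(log_text):
    """Return a (slot, pci_bus, pci_slot) tuple for each beta-GPU warning."""
    found = []
    for line in log_text.splitlines():
        match = BETA_WARNING.search(line)
        if match:
            found.append(
                (int(match["slot"]), int(match["bus"]), int(match["pci_slot"]))
            )
    return found


if __name__ == "__main__":
    sample = (
        "13:18:57:FS00:Initialized folding slot 00: cpu:11\n"
        "13:18:57:WARNING:FS01:Disabling beta GPU slot 01: gpu:6:0.  Beta GPUs "
        "can be tested for no points by setting ``gpu-beta=true`` in the "
        "configuration.\n"
    )
    print(find_disabled_gpu_slots(sample))
```

An empty list means no slot was disabled for this reason, at least as far as this pattern can tell.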
bikeaddict
Posts: 186
Joined: Sun May 03, 2020 1:20 am

Re: GPU Disabled

Post by bikeaddict »

My machines were getting that beta message while having problems with DNS lookups. I think the client was trying to download the GPUs.txt file and got confused when the download failed.
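
A quick way to test the DNS theory is to check whether the download host resolves at all. The sketch below assumes the host is apps.foldingathome.org (the host in the GPUs.txt link given elsewhere in this thread); `can_resolve` is a made-up helper name, and the `resolver` parameter exists only so the check can be exercised without a network.

```python
import socket


def can_resolve(hostname, resolver=socket.getaddrinfo):
    """Return True if hostname resolves to at least one address."""
    try:
        return len(resolver(hostname, 443)) > 0
    except OSError:
        # socket.gaierror (DNS failure) is a subclass of OSError.
        return False


# Example call, to be run on the affected machine:
#   can_resolve("apps.foldingathome.org")
```

If this returns False right after boot but True a minute later, that points to the client starting before name resolution is available.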
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: GPU Disabled

Post by bruce »

It's not clear what GPU you have or what video driver is running. You might run lspci and see what it tells you. In your log I see:

GPU 0: Bus:6 Slot:0 Func:0 NVIDIA:1
CUDA Device 0: Platform:0 Device:0 Bus:6 Slot:0 Compute:7.5 Driver:11.2

Elsewhere it says:

GeForce RTX 206...

which doesn't describe a supported GPU that I can find. GPUs classified as NVIDIA:1 are not supported, which is why it's being called a beta GPU. The RTX 2060 (if that's what you actually have) should be listed by 'lspci' and should be found in GPUs.txt.

If your local copy of GPUs.txt is corrupted, delete it and restart FAHClient and it will download a fresh copy.
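
To check whether a given card is actually listed in your local copy, something like the following sketch could help. It rests on an assumption about the file format: that each line of GPUs.txt has five colon-separated fields (vendor ID, device ID, type, species, description) with the IDs in hexadecimal. The sample entry and the function names are made up for illustration; the ID to look up is the `vvvv:dddd` pair that `lspci -nn` prints in square brackets.

```python
def parse_gpus_txt(text):
    """Map (vendor_id, device_id) -> description for each well-formed line.

    Assumed line format: vendor:device:type:species:description
    """
    table = {}
    for line in text.splitlines():
        fields = line.strip().split(":", 4)
        if len(fields) != 5:
            continue  # blank, truncated, or otherwise malformed line
        vendor, device, _gpu_type, _species, description = fields
        try:
            key = (int(vendor, 16), int(device, 16))
        except ValueError:
            continue  # IDs were not hexadecimal numbers
        table[key] = description
    return table


def lookup(table, pci_id):
    """Look up an ID given as 'vvvv:dddd' (hex, as shown by `lspci -nn`)."""
    vendor, device = pci_id.split(":")
    return table.get((int(vendor, 16), int(device, 16)))


if __name__ == "__main__":
    # Made-up entry, not a real line from GPUs.txt.
    sample = "0x1234:0xabcd:2:7:Example GPU [made-up entry]\n\nnot a valid line\n"
    table = parse_gpus_txt(sample)
    print(lookup(table, "1234:abcd"))
    print(lookup(table, "1234:ffff"))
```

An empty table from a real file is a strong hint that the local copy is empty or damaged and should be deleted as described above.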

Please reset the verbosity to the default value of 3.
gunnarre
Posts: 567
Joined: Sun May 24, 2020 7:23 pm
Location: Norway

Re: GPU Disabled

Post by gunnarre »

I had the same failure mode. I deleted GPUs.txt and restarted the client, which made the GPU start folding again. Could it be that some of the servers are delivering a bogus GPUs.txt file to the clients?
Online: GTX 1660 Super, GTX 1080, GTX 1050 Ti 4G OC, RX580 + occasional CPU folding in the cold.
Offline: Radeon HD 7770, GTX 960, GTX 950
Joe_H
Site Admin
Posts: 7854
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: GPU Disabled

Post by Joe_H »

There should be just one source for the GPUs.txt file. What can happen is that the file does not download completely, or that it is modified by AV scans. The causes vary, and they can be the same ones that interfere with other downloads and uploads done by the client.

One common problem is the client starting up before the system it runs on has fully established its internet connection. In this case the client may delete the existing file and replace it with an empty file when the download fails.
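
A minimal sketch of how one might detect that empty-file case and move the file out of the way, so the client fetches a fresh copy on its next start. The path is passed in rather than hard-coded because the data directory differs between installs; the `.bad` suffix and the function name are arbitrary choices made here.

```python
import tempfile
from pathlib import Path


def quarantine_if_empty(gpus_txt, min_bytes=1):
    """Rename an empty GPUs.txt to GPUs.txt.bad; return the new path or None.

    Returns None when the file is missing or already has at least
    min_bytes of content.
    """
    path = Path(gpus_txt)
    if not path.exists() or path.stat().st_size >= min_bytes:
        return None
    backup = path.with_name(path.name + ".bad")
    path.replace(backup)
    return backup


if __name__ == "__main__":
    # Demo in a throwaway directory, not a real FAHClient data directory.
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "GPUs.txt"
        target.write_bytes(b"")  # simulate the failed download
        print(quarantine_if_empty(target))
```

Stop the client before running this against a real data directory, then restart it afterwards.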
iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
gunnarre
Posts: 567
Joined: Sun May 24, 2020 7:23 pm
Location: Norway

Re: GPU Disabled

Post by gunnarre »

Yes, that is likely what happened. The computer in question only has wireless internet, and it doesn't always establish the network connection before FAH starts.
bikeaddict
Posts: 186
Joined: Sun May 03, 2020 1:20 am

Re: GPU Disabled

Post by bikeaddict »

Since it dumps whatever WU it was working on when this happens, I set pause-on-start to true in case it ever happens again. The minor downside is I have to manually restart folding after a reboot or a FAHClient restart.
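
For reference, a sketch of making that change from a script rather than through the client's UI. It assumes config.xml uses the same `<option v='...'/>` layout as the config dump in the first post's log, and that the client is stopped while the file is edited; `set_option` is a made-up helper, not part of FAHClient.

```python
import xml.etree.ElementTree as ET


def set_option(config_xml, name, value):
    """Return config_xml with top-level option <name v='value'/> set."""
    root = ET.fromstring(config_xml)
    node = root.find(name)  # direct children only; <slot> contents untouched
    if node is None:
        node = ET.SubElement(root, name)
    node.set("v", value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    before = "<config><pause-on-start v='false'/><slot id='1' type='GPU'/></config>"
    print(set_option(before, "pause-on-start", "true"))
```

Note that ElementTree drops XML comments and may change quoting when it rewrites the document, so keep a backup of the original file.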
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: GPU Disabled

Post by bruce »

bikeaddict wrote:Since it dumps whatever WU it was working on when this happens, I set pause-on-start to true in case it ever happens again. The minor downside is I have to manually restart folding after a reboot or a FAHClient restart.
I'm not sure that will work. When FAHClient restarts, it normally reconfigures your GPUs. If it doesn't find your GPU in GPUs.txt, then it does not configure a GPU, which presumably leaves a half-processed WU with no device that can process it. The WU will be dumped.

I consider it a bug that FAHClient can start before the internet is fully initialized, but fixing that is classed as an enhancement, so it probably won't happen any time soon. My recommendation would be to remove the auto-start script (depending on your OS) and manually start FAHClient, using the script that was provided during the install, once the internet is functional. Automatically dumping the active WU isn't a good idea. Sending yourself a message as a reminder to start FAHClient is a good idea, too.
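
If starting by hand is too easy to forget, the same idea can be approximated with a small wrapper that waits for name resolution before launching the client. This is only a sketch under assumptions: it treats a successful DNS lookup of apps.foldingathome.org as "the internet is functional", and the command that starts FAHClient is whatever you pass on the command line, since it depends on your install. It does not fix the underlying behavior.

```python
import socket
import subprocess
import sys
import time

HOST = "apps.foldingathome.org"  # assumed to be the host the client needs


def wait_for_dns(hostname, timeout_s=300.0, interval_s=5.0,
                 resolver=socket.getaddrinfo, sleep=time.sleep):
    """Retry DNS resolution until it succeeds or timeout_s has elapsed."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            resolver(hostname, 443)
            return True
        except OSError:  # includes socket.gaierror
            if time.monotonic() >= deadline:
                return False
            sleep(interval_s)


if __name__ == "__main__":
    command = sys.argv[1:]  # e.g. the start script from your install
    if not command:
        print("usage: wait_then_start.py <command that starts FAHClient>")
    elif wait_for_dns(HOST):
        subprocess.run(command, check=True)
    else:
        sys.exit("network did not come up; FAHClient was not started")
```

The script name `wait_then_start.py` is hypothetical. Hooking it into your OS startup in place of the stock auto-start entry is left to whatever mechanism your system uses.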
Itachi4eva
Posts: 3
Joined: Sat Jun 03, 2023 1:46 am

Re: GPU Disabled

Post by Itachi4eva »

Hi there, I had the same problem. However, I had another instance of Folding@Home installed on my computer at home, so I simply copied the GPUs.txt file from the C:\ProgramData\FAHClient folder on the working PC to the same location in the new installation. (The ProgramData folder is usually hidden, so you have to unhide it using Folder Options.)

If this bug is so simple, then the Folding@Home team need to just put the GPUs.txt file on their website so you can download and replace it.

Note that I had the correct NVIDIA drivers already installed, and it wasn't a case of the internet not being connected before the application started: the program wasn't set to run at startup, and the internet was connected before I manually launched the program, yet I still encountered the "Beta GPU" error message. I hope this helps.
Joe_H
Site Admin
Posts: 7854
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: GPU Disabled

Post by Joe_H »

Itachi4eva wrote: Sat Jun 03, 2023 1:52 am If this bug is so simple, then the Folding@Home team need to just put the GPUs.txt file on their website so you can download and replace it.
The download link has been posted many times - https://apps.foldingathome.org/GPUs.txt
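
For those who prefer to script the replacement, here is a hedged sketch that downloads the file from that link and swaps it in only if the response body is non-empty, so a failed download cannot leave an empty GPUs.txt behind. The destination path is yours to supply, and the function names are made up; the `fetcher` parameter exists only so the logic can be tried without a network.

```python
import os
import tempfile
import urllib.request

GPUS_TXT_URL = "https://apps.foldingathome.org/GPUs.txt"  # link from this post


def fetch(url, timeout_s=30):
    """Return the raw bytes served at url."""
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        return response.read()


def refresh_gpus_txt(dest_path, fetcher=fetch, url=GPUS_TXT_URL):
    """Replace dest_path with a fresh download; return False if body is empty."""
    data = fetcher(url)
    if not data.strip():
        return False  # keep the existing file rather than overwrite it
    directory = os.path.dirname(os.path.abspath(dest_path))
    # Write to a temp file in the same directory, then rename over the target.
    # mkstemp creates the file readable only by the current user; adjust the
    # permissions if the client runs under a different account.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.replace(tmp_path, dest_path)
    except BaseException:
        os.unlink(tmp_path)
        raise
    return True
```

Stop FAHClient first, call `refresh_gpus_txt` with the path to GPUs.txt in your client's data directory, then start the client again.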