Unable to run NVIDIA GPU with driver 535 [Solved]

jloflin
Posts: 9
Joined: Mon Apr 06, 2020 2:17 pm

Post by jloflin »

I just installed FAH V8.1.18 this morning because V7.6.1 would not use my GPU. The CPU was found and started folding immediately, but my GPU, an Nvidia RTX 2060, was not found on the setup page. When I dropped down to Nvidia driver 525, everything worked. Here is my log from my attempt with the 535 driver. If anyone can point out why the GPU wasn't seen, I would appreciate it.

I am running Linux Mint V21.2.

Code: Select all

16:06:31:I1:*********************** Folding@home Client ***********************
16:06:31:I1: Version: 8.1.18
16:06:31:I1: Author: Joseph Coffland 
16:06:31:I1: Org: foldingathome.org
16:06:31:I1: Copyright: 2023 foldingathome.org
16:06:31:I1: Homepage: https://foldingathome.org/
16:06:31:I1: License: https://www.gnu.org/licenses/gpl-3.0.txt
16:06:31:I1: Date: Apr 18 2023
16:06:31:I1: Time: 12:09:09
16:06:31:I1: Revision: 80a3d5eb8f60f7833de2954087682958b511895c
16:06:31:I1: Branch: master
16:06:31:I1: Compiler: GNU 10.2.1 20210110
16:06:31:I1: Options: -faligned-new -std=c++17 -fsigned-char -ffunction-sections
16:06:31:I1: -fdata-sections -O3 -funroll-loops -fno-pie
16:06:31:I1: Platform: linux 5.10.0-16-cloud-amd64
16:06:31:I1: Bits: 64
16:06:31:I1: Mode: Release
16:06:31:I1: Args: --log=/var/log/fah-client/log.txt
16:06:31:I1: --log-rotate-dir=/var/log/fah-client/
16:06:31:I1:****************************** CBang ******************************
16:06:31:I1: Version: 1.7.2
16:06:31:I1: Author: Joseph Coffland 
16:06:31:I1: Org: Cauldron Development LLC
16:06:31:I1: Copyright: Cauldron Development LLC, 2003-2023
16:06:31:I1: Homepage: https://cauldrondevelopment.com/
16:06:31:I1: License: GPL 2+
16:06:31:I1: Date: Apr 14 2023
16:06:31:I1: Time: 16:26:30
16:06:31:I1: Revision: ac8bbdd5bb93c01679a881f5962fed800bf29e58
16:06:31:I1: Branch: master
16:06:31:I1: Compiler: GNU 10.2.1 20210110
16:06:31:I1: Options: -faligned-new -std=c++17 -fsigned-char -ffunction-sections
16:06:31:I1: -fdata-sections -O3 -funroll-loops -fno-pie -fPIC
16:06:31:I1: Platform: linux 5.10.0-16-cloud-amd64
16:06:31:I1: Bits: 64
16:06:31:I1: Mode: Release
16:06:31:I1:***************************** System ******************************
16:06:31:I1: CPU: AMD Ryzen 9 3900X 12-Core Processor
16:06:31:I1: CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
16:06:31:I1: CPUs: 24
16:06:31:I1: Memory: 31.27GiB
16:06:31:I1:Free Memory: 30.70GiB
16:06:31:I1: Threads: POSIX_THREADS
16:06:31:I1: OS Version: 5.15
16:06:31:I1:Has Battery: false
16:06:31:I1: On Battery: false
16:06:31:I1: UTC Offset: -6
16:06:31:I1: PID: 980
16:06:31:I1: CWD: /var/lib/fah-client
16:06:31:I1: Exec: /usr/bin/fah-client
16:06:31:I1:*******************************************************************
16:06:31:I2:
16:06:31:I1:Opening Database
16:06:31:I1:Listening for HTTP on 127.0.0.1:7396
16:06:31:I3:id = 1+sDgcOpZoS9yXJUQR5PI1od3HBjODohpyeXUQ6qa4E=
16:06:31:I3:Loading work unit 1 to group '' with ID 6Nv0fCGb_TS-ToxwvBY_HvlW1iRv0EP_s2ozw2_vOk8
16:06:31:I3:Loaded 1 wus.
16:06:31:E :Exception: clGetPlatformIDs() returned -1001
16:06:31:E :Exception: cuInit() returned 100
16:06:31:I3:gpus = {
16:06:31:I3: "gpu:09:00:00": {"vendor": 4318, "device": 7944, "type": "nvidia", "supported": false}
16:06:31:I3:}
16:06:31:I1:Loaded cores/fahcore-a8-lin-64bit-avx2_256-0.0.12/FahCore_a8
16:06:31:I3::WU1:Running FahCore: /var/lib/fah-client/cores/fahcore-a8-lin-64bit-avx2_256-0.0.12/FahCore_a8 -dir 6Nv0fCGb_TS-ToxwvBY_HvlW1iRv0EP_s2ozw2_vOk8 -suffix 01 -version 8.1.18 -lifeline 980 -np 23
16:06:31:I3::WU1:Started FahCore on PID 1047
16:06:32:I1::WU1:*********************** Log Started 2023-08-10T16:06:31Z ***********************
16:06:32:I1::WU1:************************** Gromacs Folding@home Core ***************************
16:06:32:I1::WU1: Core: Gromacs
16:06:32:I1::WU1: Type: 0xa8
16:06:32:I1::WU1: Version: 0.0.12
16:06:32:I1::WU1: Author: Joseph Coffland 
16:06:32:I1::WU1: Copyright: 2020 foldingathome.org
16:06:32:I1::WU1: Homepage: https://foldingathome.org/
16:06:32:I1::WU1: Date: Jan 16 2021
16:06:32:I1::WU1: Time: 19:24:44
16:06:32:I1::WU1: Compiler: GNU 8.3.0
16:06:32:I1::WU1: Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
16:06:32:I1::WU1: -fdata-sections -O3 -funroll-loops -fno-pie
16:06:32:I1::WU1: Platform: linux2 4.15.0-128-generic
16:06:32:I1::WU1: Bits: 64
16:06:32:I1::WU1: Mode: Release
16:06:32:I1::WU1: SIMD: avx2_256
16:06:32:I1::WU1: OpenMP: ON
16:06:32:I1::WU1: CUDA: OFF
16:06:32:I1::WU1: Args: -dir 6Nv0fCGb_TS-ToxwvBY_HvlW1iRv0EP_s2ozw2_vOk8 -suffix 01
16:06:32:I1::WU1: -version 8.1.18 -lifeline 980 -np 23
16:06:32:I1::WU1:************************************ libFAH ************************************
16:06:32:I1::WU1: Date: Jan 16 2021
16:06:32:I1::WU1: Time: 19:21:38
16:06:32:I1::WU1: Compiler: GNU 8.3.0
16:06:32:I1::WU1: Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
16:06:32:I1::WU1: -fdata-sections -O3 -funroll-loops -fno-pie
16:06:32:I1::WU1: Platform: linux2 4.15.0-128-generic
16:06:32:I1::WU1: Bits: 64
16:06:32:I1::WU1: Mode: Release
16:06:32:I1::WU1:************************************ CBang *************************************
16:06:32:I1::WU1: Date: Jan 16 2021
16:06:32:I1::WU1: Time: 19:21:24
16:06:32:I1::WU1: Compiler: GNU 8.3.0
16:06:32:I1::WU1: Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
16:06:32:I1::WU1: -fdata-sections -O3 -funroll-loops -fno-pie -fPIC
16:06:32:I1::WU1: Platform: linux2 4.15.0-128-generic
16:06:32:I1::WU1: Bits: 64
16:06:32:I1::WU1: Mode: Release
16:06:32:I1::WU1:************************************ System ************************************
16:06:32:I1::WU1: CPU: AMD Ryzen 9 3900X 12-Core Processor
16:06:32:I1::WU1: CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
16:06:32:I1::WU1: CPUs: 24
16:06:32:I1::WU1: Memory: 31.27GiB
16:06:32:I1::WU1:Free Memory: 30.63GiB
16:06:32:I1::WU1: Threads: POSIX_THREADS
16:06:32:I1::WU1: OS Version: 5.15
16:06:32:I1::WU1:Has Battery: false
16:06:32:I1::WU1: On Battery: false
16:06:32:I1::WU1: UTC Offset: -6
16:06:32:I1::WU1: PID: 1047
16:06:32:I1::WU1: CWD: /var/lib/fah-client/work
16:06:32:I1::WU1:******************************************************************************** 
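The `gpus` entry in this log records the card's PCI IDs in decimal: 4318 is 0x10de, NVIDIA's PCI vendor ID, and 7944 is 0x1f08. As a minimal sketch (Python, not part of the client; `decode_gpu_entry` is a hypothetical helper), the entry can be decoded into the `vendor:device` form that `lspci -nn` prints, to confirm the client is looking at the right card:

```python
import json
import re

# Matches one entry of the client's "gpus = {...}" dump, e.g.
#   "gpu:09:00:00": {"vendor": 4318, "device": 7944, "type": "nvidia", "supported": false}
GPU_ENTRY = re.compile(r'"(gpu:[^"]+)":\s*(\{.*\})')


def decode_gpu_entry(log_line: str) -> dict:
    """Parse a gpus entry and add hex PCI IDs comparable with `lspci -nn`."""
    match = GPU_ENTRY.search(log_line)
    if match is None:
        raise ValueError("no gpu entry found in line")
    info = json.loads(match.group(2))  # the entry body is plain JSON
    info["id"] = match.group(1)
    info["pci_id"] = f'{info["vendor"]:04x}:{info["device"]:04x}'
    return info


if __name__ == "__main__":
    line = ('16:06:31:I3: "gpu:09:00:00": {"vendor": 4318, "device": 7944, '
            '"type": "nvidia", "supported": false}')
    entry = decode_gpu_entry(line)
    # prints: gpu:09:00:00 10de:1f08 not supported
    print(entry["id"], entry["pci_id"], "supported" if entry["supported"] else "not supported")
```

Since the card does appear in the `gpus` list, only flagged `"supported": false`, this suggests the PCI device itself was found and the failure happened at the OpenCL/CUDA initialisation logged just before it.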
Last edited by jloflin on Sun Aug 13, 2023 3:27 pm, edited 1 time in total.
toTOW
Site Moderator
Posts: 6367
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: Unable to run NVIDIA GPU with driver 535

Post by toTOW »

16:06:31:E :Exception: clGetPlatformIDs() returned -1001
16:06:31:E :Exception: cuInit() returned 100
These errors mean that the client can't open the OpenCL and CUDA libraries... they were probably the same with the v7 client. This is usually caused by permission issues.
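For reference, the numeric codes correspond to named constants: -1001 is `CL_PLATFORM_NOT_FOUND_KHR` from the OpenCL ICD loader extension (`cl_khr_icd`), and `cuInit()` returning 100 is `CUDA_ERROR_NO_DEVICE` in the CUDA driver API (999, reported elsewhere in this thread, is `CUDA_ERROR_UNKNOWN`). A small Python sketch that annotates such log lines; the lookup tables only cover the codes seen here, and `explain()` is a hypothetical helper:

```python
import re
from typing import Optional

# Only the codes that appear in this thread; values come from the OpenCL
# cl_khr_icd extension and the CUDA driver API (CUresult).
OPENCL_ERRORS = {
    0: "CL_SUCCESS",
    -1001: "CL_PLATFORM_NOT_FOUND_KHR (ICD loader found no OpenCL platform)",
}
CUDA_ERRORS = {
    0: "CUDA_SUCCESS",
    100: "CUDA_ERROR_NO_DEVICE (no CUDA-capable device detected)",
    999: "CUDA_ERROR_UNKNOWN",
}

# Matches e.g. "clGetPlatformIDs() returned -1001" or "cuInit() returned 100".
CALL_RE = re.compile(r"(clGetPlatformIDs|cuInit)\(\) returned (-?\d+)")


def explain(log_line: str) -> Optional[str]:
    """Return a readable name for the OpenCL/CUDA error in a client log line."""
    match = CALL_RE.search(log_line)
    if match is None:
        return None
    call, code = match.group(1), int(match.group(2))
    table = OPENCL_ERRORS if call.startswith("cl") else CUDA_ERRORS
    name = table.get(code, f"unlisted code {code}")
    return f"{call}: {name}"


if __name__ == "__main__":
    for line in (
        "16:06:31:E :Exception: clGetPlatformIDs() returned -1001",
        "16:06:31:E :Exception: cuInit() returned 100",
        "CUDA not supported: cuInit() returned 999",
    ):
        print(explain(line))
```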

Is the client started with the service command created by the client installer?

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
jloflin

Re: Unable to run NVIDIA GPU with driver 535

Post by jloflin »

toTOW wrote:
These errors mean that the client can't open the OpenCL and CUDA libraries... they were probably the same with the v7 client. This is usually caused by permission issues.

Is the client started with the service command created by the client installer?
Yes. I followed the directions at:
https://foldingathome.org/foldinghome-v ... de/?lng=en
and the GPU part of the program would not work with the Nvidia 535 driver, but works perfectly with the Nvidia 525 driver.
toTOW
Site Moderator

Re: Unable to run NVIDIA GPU with driver 535

Post by toTOW »

There are no references to service commands or how to start/stop the client on this page... so how do you start the client?

It looks like this: viewtopic.php?p=361627#p361627
jloflin

Re: Unable to run NVIDIA GPU with driver 535

Post by jloflin »

I followed these directions from the V8 Client Guide:
Client home page

The client home page is the first screen you will encounter. After connecting, it will display the status of your F@H client. If you have not yet configured a username, team or passkey, a dialog will pop up asking you to do so via the Settings page or to choose to fold anonymously.
This dialog appears if you have not yet configured your client.

Once configured, the header at the top of the page will show your username, team and the points earned so far. The buttons in the top right provide quick access to the client settings and log viewer.

Below the header, in the body of the page, is a large green Start Folding button. After configuring your user settings, click this button to get going. Click it again when you want to pause folding.
With Nvidia driver 535, the GPU line on the Settings page showed as disabled, and the enable box was greyed out so I could not click it. With the Nvidia 525 driver, I clicked the enable box on the Settings page and the GPU started folding.
toTOW
Site Moderator

Re: Unable to run NVIDIA GPU with driver 535

Post by toTOW »

If the drivers were the issue, we would know it: everyone would be complaining. And I have a v7 client running perfectly fine with those drivers.

If you look at this thread, you'll see that the client behaves differently right after installation than when you restart it later: viewtopic.php?t=39636
jloflin

Re: Unable to run NVIDIA GPU with driver 535

Post by jloflin »

So, just to check things out, I paused folding (Nvidia driver 525), went to the Driver Manager, selected the Nvidia 535 driver, applied it, rebooted, restarted folding, and now it works. I have no idea what went wrong or what fixed it.
Anyway, it's running correctly now.
AthanSpod
Posts: 11
Joined: Wed Mar 25, 2020 8:27 am

Re: Unable to run NVIDIA GPU with driver 535 [Solved]

Post by AthanSpod »

There's absolutely something weird going on with the v8 client.

I had this working a few months back, when preparing to use F@H as heating over the colder months. I then paused it and did nothing with it other than letting the service start at boot (this is my desktop, which I currently shut down each night).

Today I decided I needed that bit of heat, so unpaused the client. The CPU picked up a work unit immediately. The GPU did not.

The log of the failed attempts, after multiple runs of

Code: Select all

systemctl restart fah-client.service
showed these errors:

Code: Select all

OpenCL not supported: clGetPlatformIDs() returned -1001
CUDA not supported: cuInit() returned 999

I then tried starting it directly, as the `fah-client` user, under strace to see what was going on. It magically started working and downloaded cores for the GPU. After exiting that and running `systemctl start fah-client.service`, it's now happily running on both CPU and GPU. Specifically, I used:

Code: Select all

cd /var/lib/fah-client && strace -o fah-client -s 4096 -f -ff /usr/bin/fah-client --config=/etc/fah-client/config.xml --log=/var/log/fah-client/log.txt --log-rotate-dir=/var/log/fah-client/
(I should have given strace's -o argument a full path under /var/tmp/strace/fah-client/, but not doing so only meant the strace output ended up in /var/lib/fah-client/.)
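With `-ff`, strace writes one trace per process, named after the `-o` prefix plus the PID (here `fah-client.<pid>` in /var/lib/fah-client/). A rough Python sketch for skimming those files: it prints failed calls on paths mentioning nvidia, cuda or opencl, plus every `execve()`. It assumes strace's default output format and only looks at the first string argument of each call, so treat it as a filter, not a complete analysis:

```python
import re
import sys
from pathlib import Path
from typing import Iterable, Iterator, Tuple

# One strace line: optional "[pid N] " prefix, syscall name, the first quoted
# string argument (usually a path), and the return value / errno name.
SYSCALL_RE = re.compile(
    r'^(?:\[pid\s+\d+\]\s+)?(?P<name>\w+)\([^"]*?"(?P<path>[^"]*)"'
    r'.*\)\s+=\s+(?P<ret>-?\d+)(?:\s+(?P<err>E[A-Z0-9]+))?'
)
KEYWORDS = ("nvidia", "cuda", "opencl")


def interesting_calls(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (syscall, path, result) for failed GPU-related calls and all execve()."""
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match is None:
            continue
        name, path = match.group("name"), match.group("path")
        failed = match.group("ret") == "-1"
        gpu_related = any(key in path.lower() for key in KEYWORDS)
        if name == "execve" or (failed and gpu_related):
            yield name, path, match.group("err") or match.group("ret")


def main(trace_dir: str, prefix: str = "fah-client") -> None:
    # "-o fah-client -ff" produces one file per process: fah-client.<pid>
    for trace in sorted(Path(trace_dir).glob(f"{prefix}.*")):
        with trace.open(errors="replace") as handle:
            for name, path, result in interesting_calls(handle):
                print(f"{trace.name}: {name}({path}) -> {result}")


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else ".")
```

Run it with the trace directory as its argument, e.g. `python3 scan_strace.py /var/lib/fah-client` (the script name is arbitrary).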

This is on a Debian 12/bookworm system, using the package from https://download.foldingathome.org/rele ... _amd64.deb. The output of `id fah-client`:

Code: Select all

uid=997(fah-client) gid=996(fah-client) groups=996(fah-client),44(video),137(render)
but that seems moot, as the main device nodes are readable and writable by everyone:

Code: Select all

16:54:19 0$ ls -l /dev/nvidia*
crw-rw-rw- 1 root root 195, 254 Nov  3 08:25 /dev/nvidia-modeset
crw-rw-rw- 1 root root 238,   0 Nov  3 08:25 /dev/nvidia-uvm
crw-rw-rw- 1 root root 238,   1 Nov  3 08:25 /dev/nvidia-uvm-tools
crw-rw-rw- 1 root root 195,   0 Nov  3 08:25 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Nov  3 08:25 /dev/nvidiactl

/dev/nvidia-caps:
total 0
cr-------- 1 root root 241, 1 Nov  3 08:25 nvidia-cap1
cr--r--r-- 1 root root 241, 2 Nov  3 08:25 nvidia-cap2
I did not install, uninstall or reinstall anything between it complaining about those CUDA calls and it starting to work again. This feels like there's some other issue in the v8 client code that results in the CUDA errors being logged, rather than those errors pointing to the root cause.
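The permissions above look fine, but they only matter if the nodes exist when the client starts. A minimal Python sketch that reports, for the current user, whether the usual NVIDIA nodes exist, are character devices, and can be opened read/write; run it as the service user (for example with `sudo -u fah-client python3 ...`) to match the client's view. The node list is an assumption based on the listing above:

```python
import glob
import os
import stat
from typing import Dict, Iterable, Tuple


def check_nodes(paths: Iterable[str]) -> Dict[str, Tuple[bool, bool, bool]]:
    """Map each path to (exists, is_char_device, readable_and_writable)."""
    result = {}
    for path in paths:
        try:
            mode = os.stat(path).st_mode
        except FileNotFoundError:
            result[path] = (False, False, False)
            continue
        # os.access checks against the real uid/gid of this process
        rw = os.access(path, os.R_OK | os.W_OK)
        result[path] = (True, stat.S_ISCHR(mode), rw)
    return result


if __name__ == "__main__":
    # Assumed node names, taken from the listing above; /dev/nvidia-uvm is
    # the node belonging to the nvidia_uvm module used by CUDA.
    nodes = ["/dev/nvidiactl", "/dev/nvidia-uvm", *sorted(glob.glob("/dev/nvidia[0-9]*"))]
    for path, (exists, is_chr, rw) in check_nodes(nodes).items():
        print(f"{path}: exists={exists} char_device={is_chr} rw={rw}")
```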
AthanSpod

Re: Unable to run NVIDIA GPU with driver 535 [Solved]

Post by AthanSpod »

On further investigation, I looked at the supplied systemd unit. It has some lines aimed at protecting the system from side effects caused by the client:

Code: Select all

PrivateTmp=yes
NoNewPrivileges=yes
ProtectSystem=full
ProtectHome=yes
So I first rebooted with the `PrivateTmp` line commented out: no change, and the web page still showed "Resources not available" for the GPU. Doing the same with only `NoNewPrivileges` commented out made it work.
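When bisecting unit settings like this, it helps to confirm what systemd actually applied after each reboot, since drop-in overrides are merged on top of the packaged unit. `systemctl show <unit> --property=...` prints the effective values as `KEY=VALUE` lines; a minimal Python wrapper (the helper names are hypothetical):

```python
import subprocess
from typing import Dict

# The hardening directives discussed in this post.
HARDENING = ("NoNewPrivileges", "PrivateTmp", "ProtectSystem", "ProtectHome")


def parse_show(output: str) -> Dict[str, str]:
    """Turn `systemctl show` KEY=VALUE lines into a dict."""
    props = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key] = value
    return props


def hardening_settings(unit: str = "fah-client.service") -> Dict[str, str]:
    """Ask systemd for the effective values of the hardening directives."""
    cmd = ["systemctl", "show", unit, "--property=" + ",".join(HARDENING)]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_show(out)


if __name__ == "__main__":
    for key, value in hardening_settings().items():
        print(f"{key}={value}")
```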

This is with NVIDIA driver 535.216.01 (I'll try the 550 series later today; I had been using the 560 series, but those don't have the recent security fix). I'm using my own kernel, but its .config is copied from Debian's.
AthanSpod

Re: Unable to run NVIDIA GPU with driver 535 [Solved]

Post by AthanSpod »

Edit: I brainfarted; it's `NoNewPrivileges` that makes the difference, not `ProtectSystem`. Editing... done.

Using an override (via `systemctl edit fah-client.service`) to set NoNewPrivileges to "no" also gets the GPU working, as expected. Going by the systemd.exec man page, this would mean that something in the code is calling `execve()` on a setuid or setgid binary. Is there any chance the libcuda calls are actually doing something like that?
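To confirm whether the flag really reached the running client, the kernel reports it per process as the `NoNewPrivs:` field of `/proc/<pid>/status` (1 when set, available since Linux 4.10), and it is inherited by child processes such as the FahCores. A Linux-only Python sketch; looking the client up by the process name `fah-client` is an assumption based on `/usr/bin/fah-client`:

```python
import os
import sys
from typing import Iterator, Optional


def parse_no_new_privs(status_text: str) -> Optional[bool]:
    """Extract the NoNewPrivs flag from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("NoNewPrivs:"):
            return line.split()[1] == "1"
    return None  # field absent (kernels older than 4.10 do not report it)


def pids_named(name: str) -> Iterator[int]:
    """Yield PIDs whose /proc/<pid>/comm equals name."""
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as handle:
                if handle.read().strip() == name:
                    yield int(entry)
        except OSError:
            continue  # process exited or is not readable


if __name__ == "__main__":
    name = sys.argv[1] if len(sys.argv) > 1 else "fah-client"
    for pid in pids_named(name):
        with open(f"/proc/{pid}/status") as handle:
            print(pid, "NoNewPrivs =", parse_no_new_privs(handle.read()))
```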

Of course, it's possible there's something else going on, perhaps some sort of race condition, and this is a red herring. I'll report back if it stops working for me again. Right now I'm going to install the latest 550-series driver.
Marcos FRM
Posts: 23
Joined: Fri Feb 23, 2024 6:26 pm

Re: Unable to run NVIDIA GPU with driver 535 [Solved]

Post by Marcos FRM »

AthanSpod wrote: Mon Nov 04, 2024 11:51 am
Using an override (i.e. `systemctl edit fah-client.service`) to set NoNewPrivileges to "no" also results in the GPU working, as expected. Now, going by the systemd.exec man page this would mean that something in the code is utilising `execve()` on a setuid or setgid binary. Is there any chance the libcuda calls are actually doing some such thing ?
This might be the case.

https://manpages.ubuntu.com/manpages/or ... obe.1.html

I expected device nodes to be created by the modules themselves, or at least for the driver to have udev rules for this purpose.

Removing NoNewPrivileges= from the systemd unit file is unfortunate.
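The nvidia-modprobe man page linked above describes a helper that loads the NVIDIA kernel modules and creates the device files, and it is normally installed setuid root so unprivileged processes can use it. Under `NoNewPrivileges=yes` the kernel ignores that setuid bit, so if libcuda does run it (the theory above, not confirmed here), it would run without root and could not load `nvidia_uvm`. A small Python sketch to check how the helper is installed locally; `describe()` and `privilege_bits()` are hypothetical helpers:

```python
import os
import shutil
import stat
from typing import Optional


def privilege_bits(mode: int) -> str:
    """Describe the setuid/setgid bits of an st_mode value."""
    bits = []
    if mode & stat.S_ISUID:
        bits.append("setuid")
    if mode & stat.S_ISGID:
        bits.append("setgid")
    return "+".join(bits) or "none"


def describe(program: str) -> Optional[str]:
    """Locate `program` on PATH and summarise its owner, mode and special bits."""
    path = shutil.which(program)
    if path is None:
        return None
    info = os.stat(path)
    return (f"{path}: owner uid {info.st_uid}, mode {stat.filemode(info.st_mode)}, "
            f"special bits: {privilege_bits(info.st_mode)}")


if __name__ == "__main__":
    print(describe("nvidia-modprobe") or "nvidia-modprobe not found on PATH")
```

A setuid-root install shows up as mode `-rwsr-xr-x` with owner uid 0.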
toTOW
Site Moderator

Re: Unable to run NVIDIA GPU with driver 535 [Solved]

Post by toTOW »

Please update your client to the latest version. v8.1.x is old, full of bugs, and no longer supported.
calxalot
Site Moderator
Posts: 1156
Joined: Sat Dec 08, 2007 1:33 am
Location: San Francisco, CA

Re: Unable to run NVIDIA GPU with driver 535 [Solved]

Post by calxalot »

They are. An old thread was kinda hijacked.
AthanSpod

Re: Unable to run NVIDIA GPU with driver 535 [Solved]

Post by AthanSpod »

Indeed, it's a little hidden in the link, but I'm using 8.3.18.
AthanSpod

Re: Unable to run NVIDIA GPU with driver 535 [Solved]

Post by AthanSpod »

Alternative solution.

1. It's the `nvidia_uvm` module that needs to be loaded for CUDA support.
2. Ensuring this is already loaded allows fah-client to detect and use CUDA support.
3. This also works if the fah-client.service systemd unit still has `NoNewPrivileges=yes` in effect.

Presumably anyone not seeing this issue already has the module loaded (or auto-loaded) by some other means. I've now put a line for it in my `/etc/modules`.
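`/proc/modules` lists loaded kernel modules one per line with the name first (the same data `lsmod` formats), so checking for `nvidia_uvm` before starting the service is straightforward. A minimal Python sketch; on Debian-style systems, a line containing just `nvidia_uvm` in `/etc/modules` loads it at boot, as described above, and `modprobe nvidia_uvm` as root loads it for the current session:

```python
from typing import Iterable


def module_loaded(proc_modules: Iterable[str], name: str = "nvidia_uvm") -> bool:
    """True if `name` appears in the first column of /proc/modules-style lines."""
    return any(line.split()[:1] == [name] for line in proc_modules)


if __name__ == "__main__":
    with open("/proc/modules") as handle:
        loaded = module_loaded(handle)
    if loaded:
        print("nvidia_uvm is loaded")
    else:
        print("nvidia_uvm is NOT loaded; try 'modprobe nvidia_uvm' as root")
```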