[NOT solved] CUDA and OpenCL not detected (Nvidia; Manjaro)

Postby Pezlu » Thu Apr 16, 2020 3:31 pm

Hello everyone, I started folding a few days ago and I'm new to the forums.
After folding for a few days I noticed I never got a WU for my GPU (my CPU has always folded fine) so I checked the logs, did some research and found out that I had to install OpenCL. I reinstalled the driver, various OpenCL packages and the fah client, and managed to get my GPU to fold, but only until I reboot; whenever I restart my PC, I get error messages.

I've read a lot of topics about issues with Nvidia cards and OpenCL, but none that addresses my specific issue. Also, my current situation could shed some light on other similar issues that have been reported, since in my case it's partially working.

I'm on Manjaro Linux (Arch-based) and my card is an Nvidia GeForce GTX 660.

I currently have the following OpenCL related packages installed:
    ocl-icd
    opencl-nvidia-440xx
    lib32-ocl-icd
    lib32-opencl-nvidia-440xx
    opencl-headers
    clinfo
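
One quick check with these packages installed is whether an NVIDIA ICD file is actually registered with the OpenCL loader. The sketch below is only illustrative: it assumes the conventional ICD directory /etc/OpenCL/vendors, and the function name list_icd_vendors is made up for this example.

```shell
#!/usr/bin/env bash
# List the OpenCL ICD files registered in a vendors directory.
# Assumption: the loader reads /etc/OpenCL/vendors (the usual default);
# pass another directory as the first argument if your setup differs.
list_icd_vendors() {
    local vendors_dir="${1:-/etc/OpenCL/vendors}"
    local f
    for f in "$vendors_dir"/*.icd; do
        # If the glob matched nothing, the literal pattern does not exist.
        [ -e "$f" ] && basename "$f"
    done
    return 0
}

# Example usage (not run automatically):
#   list_icd_vendors        # an nvidia .icd entry should be listed
#   clinfo -l               # lists the platforms and devices clinfo can see
```

If the ICD file is listed and clinfo shows the card for your own user, but the client still reports no OpenCL platform, the difference is more likely in how the service is run than in the packages themselves.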

After installing these packages, stopping foldingathome.service, reinstalling the fah client (foldingathome-noroot), and starting foldingathome.service again, my GPU was recognized and started folding. However, when I reboot, I get the following errors:
ERROR:WU00:FS01:Failed to start core: OpenCL device matching slot 1 not found, try setting 'opencl-index' manually

WU00:FS01:0x22:ERROR:exception: There is no registered Platform called "OpenCL"


Log snippet:
*********************** Log Started 2020-04-16T13:56:00Z ***********************
13:58:14:FS01:Unpaused
13:58:14:WU00:FS01:Starting
13:58:14:ERROR:WU00:FS01:Failed to start core: OpenCL device matching slot 1 not found, try setting 'opencl-index' manually
13:58:18:WU00:FS01:Starting
13:58:18:ERROR:WU00:FS01:Failed to start core: OpenCL device matching slot 1 not found, try setting 'opencl-index' manually
13:58:48:FS01:Paused
13:58:57:FS01:Unpaused
13:59:18:WU00:FS01:Starting
13:59:18:WU00:FS01:Running FahCore: /opt/fah/FAHCoreWrapper /var/lib/private/fah/cores/cores.foldingathome.org/v7/lin/64bit/Core_22.fah/FahCore_22 -dir 00 -suffix 01 -version 705 -lifeline 882 -checkpoint 15 -gpu-vendor nvidia -opencl-device 0 -gpu 0
13:59:18:WU00:FS01:Started FahCore on PID 2587
13:59:18:WU00:FS01:Core PID:2591
13:59:18:WU00:FS01:FahCore 0x22 started
13:59:20:WU00:FS01:0x22:*********************** Log Started 2020-04-16T13:59:20Z ***********************
13:59:20:WU00:FS01:0x22:*************************** Core22 Folding@home Core ***************************
13:59:20:WU00:FS01:0x22:       Type: 0x22
13:59:20:WU00:FS01:0x22:       Core: Core22
13:59:20:WU00:FS01:0x22:    Website: https://foldingathome.org/
13:59:20:WU00:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
13:59:20:WU00:FS01:0x22:     Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
13:59:20:WU00:FS01:0x22:             <rafal.wiewiora@choderalab.org>
13:59:20:WU00:FS01:0x22:       Args: -dir 00 -suffix 01 -version 705 -lifeline 2587 -checkpoint 15
13:59:20:WU00:FS01:0x22:             -gpu-vendor nvidia -opencl-device 0 -gpu 0
13:59:20:WU00:FS01:0x22:     Config: <none>
13:59:20:WU00:FS01:0x22:************************************ Build *************************************
13:59:20:WU00:FS01:0x22:    Version: 0.0.2
13:59:20:WU00:FS01:0x22:       Date: Dec 6 2019
13:59:20:WU00:FS01:0x22:       Time: 21:20:17
13:59:20:WU00:FS01:0x22: Repository: Git
13:59:20:WU00:FS01:0x22:   Revision: f87d92b58abdf7e6bf2e173cfbc4dc3e837c7042
13:59:20:WU00:FS01:0x22:     Branch: core22
13:59:20:WU00:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
13:59:20:WU00:FS01:0x22:    Options: -std=gnu++98 -O3 -funroll-loops
13:59:20:WU00:FS01:0x22:   Platform: linux2 4.9.87-linuxkit-aufs
13:59:20:WU00:FS01:0x22:       Bits: 64
13:59:20:WU00:FS01:0x22:       Mode: Release
13:59:20:WU00:FS01:0x22:************************************ System ************************************
13:59:20:WU00:FS01:0x22:        CPU: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
13:59:20:WU00:FS01:0x22:     CPU ID: GenuineIntel Family 6 Model 42 Stepping 7
13:59:20:WU00:FS01:0x22:       CPUs: 8
13:59:20:WU00:FS01:0x22:     Memory: 7.76GiB
13:59:20:WU00:FS01:0x22:Free Memory: 4.01GiB
13:59:20:WU00:FS01:0x22:    Threads: POSIX_THREADS
13:59:20:WU00:FS01:0x22: OS Version: 5.6
13:59:20:WU00:FS01:0x22:Has Battery: false
13:59:20:WU00:FS01:0x22: On Battery: false
13:59:20:WU00:FS01:0x22: UTC Offset: 2
13:59:20:WU00:FS01:0x22:        PID: 2591
13:59:20:WU00:FS01:0x22:        CWD: /var/lib/private/fah/work
13:59:20:WU00:FS01:0x22:         OS: Linux 5.6.3-2-MANJARO x86_64
13:59:20:WU00:FS01:0x22:    OS Arch: AMD64
13:59:20:WU00:FS01:0x22:********************************************************************************
13:59:20:WU00:FS01:0x22:Project: 11742 (Run 0, Clone 8123, Gen 58)
13:59:20:WU00:FS01:0x22:Unit: 0x000000558ca304f15e6bc4f29544d655
13:59:20:WU00:FS01:0x22:Digital signatures verified
13:59:20:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
13:59:20:WU00:FS01:0x22:Version 0.0.2
13:59:20:WU00:FS01:0x22:  Found a checkpoint file
13:59:20:WU00:FS01:0x22:ERROR:exception: There is no registered Platform called "OpenCL"
13:59:20:WU00:FS01:0x22:Saving result file ../logfile_01.txt
13:59:20:WU00:FS01:0x22:Saving result file checkpointState.xml
13:59:23:WU00:FS01:0x22:Saving result file checkpt.crc
13:59:23:WU00:FS01:0x22:Saving result file positions.xtc
13:59:24:WU00:FS01:0x22:Saving result file science.log
13:59:24:WU00:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
13:59:24:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
13:59:24:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:11742 run:0 clone:8123 gen:58 core:0x22 unit:0x000000558ca304f15e6bc4f29544d655
13:59:24:WU00:FS01:Uploading 4.08MiB to 140.163.4.241
13:59:24:WU00:FS01:Connecting to 140.163.4.241:8080
13:59:25:WU01:FS01:Connecting to 65.254.110.245:8080
13:59:26:WARNING:WU01:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
13:59:26:WU01:FS01:Connecting to 18.218.241.186:80
13:59:26:WARNING:WU01:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
13:59:26:ERROR:WU01:FS01:Exception: Could not get an assignment
13:59:27:WU01:FS01:Connecting to 65.254.110.245:8080
13:59:27:WARNING:WU01:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
13:59:27:WU01:FS01:Connecting to 18.218.241.186:80
13:59:28:WARNING:WU01:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
13:59:28:ERROR:WU01:FS01:Exception: Could not get an assignment
13:59:30:WU00:FS01:Upload 65.87%
13:59:34:WU00:FS01:Upload complete
13:59:34:WU00:FS01:Server responded WORK_ACK (400)
13:59:34:WU00:FS01:Cleaning up
14:00:27:WU01:FS01:Connecting to 65.254.110.245:8080
14:00:27:WARNING:WU01:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
14:00:27:WU01:FS01:Connecting to 18.218.241.186:80
14:00:27:WARNING:WU01:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
14:00:27:ERROR:WU01:FS01:Exception: Could not get an assignment
14:02:04:WU01:FS01:Connecting to 65.254.110.245:8080
14:02:04:WARNING:WU01:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
14:02:04:WU01:FS01:Connecting to 18.218.241.186:80
14:02:05:WARNING:WU01:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
14:02:05:ERROR:WU01:FS01:Exception: Could not get an assignment
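
To tell quickly after a reboot whether the problem is back, the two error messages quoted above can be counted in the client log. This is a minimal sketch: the function name count_opencl_errors is made up for this example, and the log path in the usage comment is an assumption based on the working directory shown in the log.

```shell
#!/usr/bin/env bash
# Count log lines containing either of the two OpenCL errors quoted above.
count_opencl_errors() {
    local log_file="$1"
    # -c prints the number of matching lines; -E enables the | alternation.
    grep -cE 'OpenCL device matching slot [0-9]+ not found|There is no registered Platform called "OpenCL"' "$log_file"
}

# Example usage (the path is an assumption; adjust it to your install):
#   count_opencl_errors /var/lib/private/fah/log.txt
```

A count of 0 right after the service starts suggests the GPU slot came up cleanly; anything higher means the core failed to find an OpenCL platform at least once.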


I can "force" the GPU to restart by doing the following:
    running systemctl stop foldingathome.service and systemctl disable foldingathome.service
    uninstalling foldingathome-noroot
    rebooting
    installing foldingathome-noroot
    running cd /opt/fah and sudo ./FAHClient --configure
    running systemctl enable --now foldingathome.service

After doing this, the GPU works. However, when I reboot, I get the errors I mentioned earlier again, and I have to redo the whole procedure.
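
The procedure above can be sketched as two shell functions, one for each side of the reboot. This is only an outline under stated assumptions: the run helper and the function names are made up for this example, removal is assumed to go through pacman -R, and the reinstall from the AUR is left as a manual step because it depends on the AUR helper in use. By default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of the reset procedure, split around the reboot.
# With DRY_RUN=1 (the default) commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

before_reboot() {
    # systemctl may ask for authentication when run as a normal user.
    run systemctl stop foldingathome.service
    run systemctl disable foldingathome.service
    # Assumption: the package is removed with pacman.
    run sudo pacman -R foldingathome-noroot
    # Reboot manually after this point.
}

after_reboot() {
    # Reinstall foldingathome-noroot from the AUR first (not shown here).
    # The subshell keeps the directory change local to the configure step.
    (
        run cd /opt/fah &&
        run sudo ./FAHClient --configure
    )
    run systemctl enable --now foldingathome.service
}

# Example usage (not run automatically):
#   before_reboot              # prints the commands
#   DRY_RUN=0 before_reboot    # actually runs them
```

Keeping the dry run as the default makes it easy to review the exact commands before anything is stopped or removed.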

The issue obviously resides in my setup, but I can't pinpoint it.

Thanks in advance for all the help! I hope I'll soon be able to fold a little more.
Last edited by Pezlu on Fri Apr 17, 2020 3:55 pm, edited 2 times in total.
Pezlu
 
Posts: 7
Joined: Thu Apr 16, 2020 3:03 pm

Re: OpenCL not detected after reboot (Geforce GTX 660; Manja

Postby Pezlu » Thu Apr 16, 2020 11:04 pm

Update: after various attempts (documented here) I found out that I did not have CUDA installed and also that foldingathome-noroot has issues detecting CUDA. I installed foldingathome (the root version) in its place and now everything works.

This information may be useful to anyone who is struggling to get their GPU to fold under Linux.

If anyone knows a way to let foldingathome-noroot detect CUDA as if it were root, I'd be happy to know and revert to that client. Anyway, I'm marking the thread as solved.

Re: [SOLVED] OpenCL not detected after reboot (Nvidia; Manja

Postby Pezlu » Fri Apr 17, 2020 3:54 pm

EDIT: the issue is being discussed in this topic in the Manjaro forum: https://forum.manjaro.org/t/foldingatho ... l/136566/1
Basically, the new version of the foldingathome package in the AUR does not run as root, and this seems to stop it from being able to detect CUDA and OpenCL. Thus, the GPU is recognized, but cannot fold and is usually not assigned any WU.
Other issues in the original post have been solved, including the "resetting config.xml" I mentioned below.

Original post:
I retract what I posted yesterday: today's update brought back the issue. The foldingathome package was updated to 7.6.8-2 (as well as fahcontrol, 7.6.8.1-1, and fahviewer, 7.6.8-1).
However, it now works similarly to foldingathome-noroot:
  • it uses /var/lib/private/fah/ as a working directory (as opposed to /opt/fah/, which was used by the root version up until this morning);
  • it does NOT detect CUDA and OpenCL;
  • it does not even add a GPU slot by default, and if I try to add one (either via fahcontrol or by manually editing config.xml) it's immediately removed, as if the file were reset;
  • after the update the line <gpu v='false'/> was added to config.xml, although I was able to change it to true.
The package foldingathome-noroot is not available any more in the AUR, so I guess this is intended to be the new default.
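
To see whether an update has reset the GPU setting again, the value of the gpu element can be read straight from config.xml. This is a rough sketch: the function name gpu_setting is made up for this example, and the config path in the usage comment is an assumption based on the working directory mentioned above.

```shell
#!/usr/bin/env bash
# Print the value of the <gpu v='...'/> element in a FAHClient config.xml.
gpu_setting() {
    local config_file="$1"
    # Accepts single or double quotes around the value and prints only
    # the value itself (for example "false" or "true").
    sed -nE "s/.*<gpu v=['\"]([a-z]+)['\"] *\/>.*/\1/p" "$config_file"
}

# Example usage (the path is an assumption; adjust it to your install):
#   gpu_setting /var/lib/private/fah/config.xml
```

No output means the file has no gpu element at all, which is a different situation from an explicit false.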

I'm removing the "solved" mark from the topic, hoping someone can help.

Re: [NOT solved] CUDA and OpenCL not detected (Nvidia; Manja

Postby Perforator-UFO » Sat Apr 18, 2020 12:20 am

Same issue here.
New install of "folding-at-home-fcole90" via snap; I'm on the latest Manjaro with the latest 5.6 kernel.
Perforator-UFO
 
Posts: 1
Joined: Sat Apr 18, 2020 12:14 am

Re: [NOT solved] CUDA and OpenCL not detected (Nvidia; Manja

Postby Pezlu » Sat Apr 18, 2020 9:52 am

There is a new version (7.6.8-7) that seems to work for me. It's better to start from scratch, though: disable the service, delete the old working folder, install the new version, reboot, and re-enable the service, just to be sure.

Re: [NOT solved] CUDA and OpenCL not detected (Nvidia; Manja

Postby PantherX » Sat Apr 18, 2020 9:55 am

There's an even newer version, V7.6.9: viewtopic.php?f=24&t=34466

Please note that I have read 3 reports so far about the client not detecting the GPU and this is what I have proposed as a workaround: viewtopic.php?p=327039#p327039
PantherX
Site Moderator
 
Posts: 6345
Joined: Wed Dec 23, 2009 10:33 am
Location: Land Of The Long White Cloud

Re: [NOT solved] CUDA and OpenCL not detected (Nvidia; Manja

Postby Pezlu » Sat Apr 18, 2020 11:45 am

Thank you. The issue seems to arise from a recent change in the AUR version: basically, it doesn't run as root anymore, so CUDA and OpenCL are not recognized. However, the community is quickly trying to fix it. Actually, it seems to be working almost perfectly for me right now, although after a reboot I sometimes have to restart the service manually.

For those using the AUR version, having a look at the comments on AUR might shed some light: https://aur.archlinux.org/packages/foldingathome/

Re: [NOT solved] CUDA and OpenCL not detected (Nvidia; Manja

Postby ipkh » Sat Apr 18, 2020 1:00 pm

Issues with an AUR package are best addressed with the distribution's development community. By repackaging the software, the distribution maintainer takes over responsibility for installation issues.

Note that distributions get special permission from the folding group to package the software.
ipkh
 
Posts: 134
Joined: Thu Jul 16, 2015 3:03 pm

Re: [NOT solved] CUDA and OpenCL not detected (Nvidia; Manja

Postby Pezlu » Sat Apr 18, 2020 1:08 pm

Yes, I've commented on the AUR page now, since it's the maintainer's responsibility. I wasn't sure where my issue stemmed from when I first posted here.

Thank you for all the suggestions; in any case everything works right now (with a workaround, but still works)!
