AMD 6750XT & ROCm OpenCL 5.4.1 & Core22 & Linux libstdc++ conflict

Posted: Sat Dec 17, 2022 3:11 am
by tchiers
I've got a system with an AMD 6750XT in it running Ubuntu 22.04. An AMDGPU driver update was released this week and I wanted to install it to see if it would include the ROCm OpenCL perf improvements seen on Windows earlier this year.

Instead, GPU folding stopped working altogether. :evil:
fahclient would find the GPU OK, download a work unit, and start it, only to choke immediately with BAD_WORK_UNIT. This repeated until the client auto-disabled the GPU slot.

This looked and felt like an OpenCL problem - OpenCL has been troublesome for me in the past when folding on a GPU on Linux. BUT - this time OpenCL appeared to be working fine elsewhere in the system: clinfo ran normally, fahbench could find and use the GPU, and even the fahclient log reported the OpenCL device:

Code: Select all

04:27:21:******************************* System ********************************
04:27:21:            CPU: AMD Ryzen 7 5800X 8-Core Processor
04:27:21:         CPU ID: AuthenticAMD Family 25 Model 33 Stepping 2
04:27:21:           CPUs: 16
04:27:21:         Memory: 31.27GiB
04:27:21:    Free Memory: 24.57GiB
04:27:21:        Threads: POSIX_THREADS
04:27:21:     OS Version: 5.15
04:27:21:    Has Battery: false
04:27:21:     On Battery: false
04:27:21:     UTC Offset: -6
04:27:21:            PID: 42255
04:27:21:            CWD: /var/lib/fahclient
04:27:21:             OS: Linux 5.15.0-56-generic x86_64
04:27:21:        OS Arch: AMD64
04:27:21:           GPUs: 1
04:27:21:          GPU 0: Bus:9 Slot:0 Func:0 AMD:6 Navi 22 XT-XL [Radeon RX
04:27:21:                 6700/6700XT/6800M]
04:27:21:           CUDA: Not detected: Failed to open dynamic library 'libcuda.so':
04:27:21:                 libcuda.so: cannot open shared object file: No such file or
04:27:21:                 directory
04:27:21:OpenCL Device 0: Platform:0 Device:0 Bus:9 Slot:0 Compute:2.0 Driver:3513.0

The telltale for this problem was further down in the log, where Core22 actually tries to start:

Code: Select all

04:28:03:WU00:FS01:0x22:Project: 18909 (Run 37, Clone 4, Gen 31)
04:28:03:WU00:FS01:0x22:Reading tar file core.xml
04:28:03:WU00:FS01:0x22:Reading tar file integrator.xml
04:28:03:WU00:FS01:0x22:Reading tar file state.xml
04:28:03:WU00:FS01:0x22:Reading tar file system.xml
04:28:03:WU00:FS01:0x22:Digital signatures verified
04:28:03:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
04:28:03:WU00:FS01:0x22:Version 0.0.20
04:28:03:WU00:FS01:0x22:  Checkpoint write interval: 62500 steps (5%) [20 total]
04:28:03:WU00:FS01:0x22:  JSON viewer frame write interval: 12500 steps (1%) [100 total]
04:28:03:WU00:FS01:0x22:  XTC frame write interval: 25000 steps (2%) [50 total]
04:28:03:WU00:FS01:0x22:  Global context and integrator variables write interval: disabled
04:28:03:WU00:FS01:0x22:There are 2 platforms available.
04:28:03:WU00:FS01:0x22:Platform 0: Reference
04:28:03:WU00:FS01:0x22:Platform 1: CPU
04:28:03:WU00:FS01:0x22:opencl-device was set but OpenCL platform could not be found.
04:28:03:WU00:FS01:0x22:ERROR:126: Neither CUDA nor OpenCL is available.

It should have found three platforms, including OpenCL:

Code: Select all

02:27:17:WU01:FS01:0x22:There are 3 platforms available.
02:27:17:WU01:FS01:0x22:Platform 0: Reference
02:27:17:WU01:FS01:0x22:Platform 1: CPU
02:27:17:WU01:FS01:0x22:Platform 2: OpenCL

I traced the problem to OpenMM, and eventually figured it out.

Core22 ships from the Folding@home work servers with a bundled libstdc++.so.6 that only goes up to GLIBCXX_3.4.28, but libamdocl64.so (the AMD ROCm OpenCL implementation) requires GLIBCXX_3.4.29.

Fortunately the system libstdc++ (/usr/lib/x86_64-linux-gnu/libstdc++.so.6) provides up to GLIBCXX_3.4.30, so it can just be swapped in.
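
To compare the versions on your own system, here is a minimal sketch. It assumes GNU grep and GNU sort (for `sort -V`), and it only scans each file for version-tag strings, so treat the result as a quick indication rather than a full symbol-version analysis. The function name `max_glibcxx` is made up for this example.

```shell
#!/bin/sh
# Print the highest GLIBCXX_x.y.z version tag that appears in a shared library.
# For libstdc++.so.6 this is roughly the newest version it provides; for a
# library that links against libstdc++ (such as libamdocl64.so) it is roughly
# the newest version it needs.
max_glibcxx() {
    # -a: treat the binary as text, -o: print only the matches, -E: extended regex
    LC_ALL=C grep -aoE 'GLIBCXX_[0-9]+(\.[0-9]+)*' "$1" | sort -uV | tail -n 1
}

# Report on every library path passed on the command line.
for lib in "$@"; do
    if [ -f "$lib" ]; then
        printf '%s: %s\n' "$lib" "$(max_glibcxx "$lib")"
    fi
done
```

Saved under a name of your choice, it can be run with the bundled libstdc++.so.6 from the Core_22.fah directory, the system libstdc++.so.6, and your libamdocl64.so as arguments. The location of libamdocl64.so depends on how ROCm was installed, so look that path up on your system. If the number printed for libamdocl64.so is higher than the one for the bundled libstdc++.so.6, you are hitting this conflict.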

Workaround
Configure the GPU slot, and enable it.
Let it fail and get disabled. By then the core has been downloaded, so the directory below exists.

Code: Select all

cd /var/lib/fahclient/cores/cores.foldingathome.org/lin/64bit/22-0.0.20/Core_22.fah/
sudo rm libstdc++.so.6
sudo ln -s /usr/lib/x86_64-linux-gnu/libstdc++.so.6 libstdc++.so.6

Re-enable the GPU slot and it should work.

This workaround will last until a new core version is released or something else clears your cores.foldingathome.org cache. Then you will need to apply it again in the new directory.
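
If you need to re-apply it after a core update, the same rm and ln -s steps can be wrapped in a small script. This is only a sketch: the function name `relink_libstdcxx` is made up, and it assumes that every core directory needing the swap is named Core_22.fah, as in the path above.

```shell
#!/bin/sh
# Replace every bundled libstdc++.so.6 found in a Core_22.fah directory under
# the given cores directory with a symlink to the given system library.
relink_libstdcxx() {
    cores_dir=$1
    system_lib=$2
    if [ ! -e "$system_lib" ]; then
        echo "system library not found: $system_lib" >&2
        return 1
    fi
    # -type f matches only regular files, so copies that were already replaced
    # by a symlink are left alone.
    find "$cores_dir" -type f -name 'libstdc++.so.6' -path '*/Core_22.fah/*' |
    while IFS= read -r bundled; do
        rm -- "$bundled"
        ln -s -- "$system_lib" "$bundled"
        printf 'relinked %s -> %s\n' "$bundled" "$system_lib"
    done
}

# Only act when both paths are given on the command line.
if [ "$#" -eq 2 ]; then
    relink_libstdcxx "$1" "$2"
fi
```

Run it with sudo while the GPU slot is disabled, passing /var/lib/fahclient/cores and /usr/lib/x86_64-linux-gnu/libstdc++.so.6 as the two arguments, then re-enable the slot.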

Fix
The Folding@home team needs to update the version of libstdc++ they ship with the core.

Oh, and does the new ROCm improve folding performance?
Estimated PPD 2110772
8-)

Re: AMD 6750XT & ROCm OpenCL 5.4.1 & Core22 & Linux libstdc++ conflict

Posted: Sun Dec 18, 2022 9:41 pm
by toTOW
I forwarded your post to FahCore and OpenMM devs.

Re: AMD 6750XT & ROCm OpenCL 5.4.1 & Core22 & Linux libstdc++ conflict

Posted: Wed Dec 21, 2022 12:36 am
by JohnChodera
Thanks for the heads up, @tchiers and @toTOW!
We'll look into a solution.

~ John Chodera // MSKCC

Re: AMD 6750XT & ROCm OpenCL 5.4.1 & Core22 & Linux libstdc++ conflict

Posted: Mon Dec 26, 2022 5:32 pm
by jonorok
Thank you!

This fixed it for me, too.

RX 6700XT, ROCm 5.4, on Pop!_OS 22.04.

Re: AMD 6750XT & ROCm OpenCL 5.4.1 & Core22 & Linux libstdc++ conflict

Posted: Tue Mar 21, 2023 10:23 pm
by hashibs
Your troubleshooting has saved me a lot of frustration!

I'd been waiting some months to upgrade one of my folding PCs to Ubuntu 22.04, until AMD drivers for the RX 6700 XT were available and had "stabilized" a bit. I finally decided to do that today, and afterward was unpleasantly surprised to find the GPU idle and this error in the log:

Code: Select all

17:56:18:WU01:FS01:0x22:ERROR:125: Failed to create a GPU-enabled OpenMM Context.

The amdgpu-install script didn't throw any errors, the output of clinfo seemed correct, and so did the config files, permissions, etc. I searched around, somehow found this page, and gave your steps a shot. Works great. Thanks for making it easier to contribute!

Re: AMD 6750XT & ROCm OpenCL 5.4.1 & Core22 & Linux libstdc++ conflict

Posted: Tue Apr 04, 2023 8:40 pm
by muziqaz
OK, this is very interesting. However, it does not explain why I can fold with fahcore_22 and the shipped libs (old GLIBCXX) when I run fahcore_22 on its own. I only get no OpenCL platform when it is run through fahclient. Or does fahclient require a newer GLIBCXX?
I will try this workaround and see if it helps, and then give suggestions to the devs. This needs to be handled automatically.

Re: AMD 6750XT & ROCm OpenCL 5.4.1 & Core22 & Linux libstdc++ conflict

Posted: Tue Apr 04, 2023 9:26 pm
by muziqaz
OK, this actually works. Incredible, and quite alarming, as this might happen again in the future, and only several failed WUs would alert us to start acting :(

Re: AMD 6750XT & ROCm OpenCL 5.4.1 & Core22 & Linux libstdc++ conflict

Posted: Sat Sep 16, 2023 12:36 pm
by SovietReimu1917
Why does Core22 contain libstdc++.so.6?
I think it should use the system one, since it is a basic library installed on most systems.

Re: AMD 6750XT & ROCm OpenCL 5.4.1 & Core22 & Linux libstdc++ conflict

Posted: Sat Sep 16, 2023 1:20 pm
by muziqaz
SovietReimu1917 wrote: Sat Sep 16, 2023 12:36 pm
Why does Core22 contain libstdc++.so.6?
I think it should use the system one, since it is a basic library installed on most systems.

Nearly all libs in the fahcore folder are just-in-case libs, so that users do not need to hunt for any libs that are missing from their system for some reason :)
If the FAH devs were sure that 100% of systems contained all the necessary libs, they would not package them together with fahcore :)

Re: AMD 6750XT & ROCm OpenCL 5.4.1 & Core22 & Linux libstdc++ conflict

Posted: Mon Sep 18, 2023 6:22 pm
by toTOW
We had so many compatibility issues with those libs that are sometimes never updated by the user or the distribution that it's safer to provide them with the core ... :roll:

Re: AMD 6750XT & ROCm OpenCL 5.4.1 & Core22 & Linux libstdc++ conflict

Posted: Sat Nov 04, 2023 4:42 am
by DarkFoss
Thank you, this fixed it for me as well. Kubuntu 22.04 LTS, ROCm 5.7.0, ASRock 7900XTX.
As an aside, I downloaded FAHBench and did something similar, but copied libstdc++.so.6.0.30 into the folder, rm'ed the old libstdc++.so.6 that was inside, and made the symlink point to the copied libstdc++.so.6.0.30. FAHBench now works fine.
I'm thinking that doing the same within the core might be a better workaround. I'll give it a try after these WUs finish up.
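
For anyone who wants to try the copy variant, the steps can be sketched as follows. The function name `copy_libstdcxx` is made up for illustration, and `readlink -f` is assumed to be available (GNU coreutils provides it).

```shell
#!/bin/sh
# Copy variant: place a real copy of the system libstdc++ next to the core
# (or FAHBench) binaries and point libstdc++.so.6 at that copy.
copy_libstdcxx() {
    target_dir=$1
    # Resolve symlinks so the versioned file (e.g. libstdc++.so.6.0.30) is copied.
    real_lib=$(readlink -f -- "$2") || return 1
    name=$(basename -- "$real_lib")
    if [ "$name" = "libstdc++.so.6" ]; then
        echo "refusing: resolved file has the same name as the link" >&2
        return 1
    fi
    cp -- "$real_lib" "$target_dir/$name" &&
    rm -f -- "$target_dir/libstdc++.so.6" &&
    ln -s -- "$name" "$target_dir/libstdc++.so.6"
}

# Only act when a target directory and a library path are both given.
if [ "$#" -eq 2 ]; then
    copy_libstdcxx "$1" "$2"
fi
```

The first argument is the folder holding the bundled library and the second is the system libstdc++.so.6. Because the link is relative and the copy sits in the same folder, the folder no longer depends on the system library after an OS update, but the copy will not pick up system updates either.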

Re: AMD 6750XT & ROCm OpenCL 5.4.1 & Core22 & Linux libstdc++ conflict

Posted: Sat Feb 03, 2024 12:55 pm
by L0nerism
After fighting with getting a Vega 64 on Debian Sid (ROCm v5.7.3) to fold Core 22 work units, I'm making this post to thank you for this workaround. :D