Some information for Kepler GPU Users

It seems that a lot of GPU problems revolve around specific versions of drivers. Though NVidia has their own support structure, you can often learn from information reported by others who fold.

Moderators: Site Moderators, FAHC Science Team

psaam0001
Posts: 383
Joined: Mon May 18, 2020 2:02 am
Location: Ruckersville, Virginia, USA

Some information for Kepler GPU Users

Post by psaam0001 »

I do not remember seeing this information posted here, so I thought I would share two important links about Kepler-based GPU support in future Windows Hardware Quality Labs (WHQL) certified drivers, starting with updates released during or after October 2021:

Support Plan for Kepler Series Cards: https://nvidia.custhelp.com/app/answers ... /related/1

List of Kepler Series GeForce Devices*: https://nvidia.custhelp.com/app/answers ... UyMQ%3D%3D

* I presume that this may also affect Kepler-based workstation cards, unless NVidia specifically says otherwise.

I do not know at this time whether these support changes will also be implemented in the Linux drivers or kernels, so I would caution you to start planning to upgrade the folding GPUs on your Linux desktops as well.

Paul
FalconFour
Posts: 29
Joined: Fri Sep 05, 2008 11:57 am

Re: Some information for Kepler GPU Users

Post by FalconFour »

I use 2x nVidia Grid K520 cards (each 2x GF 760 equivalents - which are Kepler) in one of my room-heating folding rigs, running Ubuntu 20.x LTS, and after a series of normal WUs, suddenly I started getting errors on each: "Error compiling program: nvrtc: error: invalid value for --gpu-architecture (-arch)". They had been working normally, then suddenly, they just weren't - about a week ago.

Problem is, they fall back to a broken OpenCL implementation. The OpenCL fallback always piles them all onto the same GPU index, so only 1 of the 4 GPUs was getting work - and that 1 would have 4 WUs running simultaneously. I caught that and necked it down to 1 WU by pausing the others and finishing one at a time - fairly slowly (40% slower).

What gives? Did F@H silently drop support for Kepler as well? This is under Linux and no driver changes were done - they just stopped working right. If I were to guess, not being able to see the command line it gives to "nvrtc", I'd suspect it's using a new compile flag that only supports newer GPUs to give them the advantage. Great, understandable - but how about recognizing setups that don't support it, and keeping them back on the old flags?
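
One possible reading of the nvrtc error, offered as a sketch rather than a confirmed diagnosis: CUDA 11 toolkits dropped compute capability 3.0 targets, and the GK104 chip in the GTX 760 and GRID K520 is a 3.0 part. If the core ships a CUDA 11-era nvrtc (an assumption; the thread does not say which toolkit it uses), a request for compute_30 would be refused no matter how new the driver is. The helper names and the floor value below are illustrative and are not taken from the F@H code.

```python
# Assumption: the runtime compiler (nvrtc) bundled with the core rejects
# any target below compute capability 3.5, as CUDA 11.x toolkits do.
MIN_NVRTC_CAPABILITY = (3, 5)


def arch_flag(capability):
    """Build the nvrtc architecture option for a (major, minor) capability."""
    major, minor = capability
    return f"--gpu-architecture=compute_{major}{minor}"


def nvrtc_can_target(capability, floor=MIN_NVRTC_CAPABILITY):
    """Return True when the capability is at or above the assumed floor."""
    return tuple(capability) >= tuple(floor)


# Compute capabilities of cards mentioned in this thread.
cards = {
    "GeForce GTX 760 (GK104)": (3, 0),
    "GRID K520 (2x GK104)": (3, 0),
    "GeForce GTX 1050 Ti": (6, 1),
}

for name, capability in cards.items():
    status = "accepted" if nvrtc_can_target(capability) else "rejected"
    print(f"{name}: {arch_flag(capability)} -> {status}")
```

Under that assumption both GK104 cards come out as rejected, which would match the "invalid value for --gpu-architecture (-arch)" message followed by the fall back to OpenCL.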

They're old cards, but the focus of my folding is to use old hardware to contribute useful work while heating my room. No sense buying new hardware for that. It'd be preferable to keep CUDA working so these are maximally utilized, but even just having the OpenCL fall-back working properly would be a start. The nVidia drivers don't seem to play nice with F@H client, so at startup it gives the error "OpenCL: Not detected: Failed to open dynamic library 'libOpenCL.so': libOpenCL.so: cannot open shared object file: No such file or directory" during enumeration of GPUs -- yet it still folds using the GPU when it runs the fallback.

Due to the complete lack of documentation on GPU slot flags, I also can't find a way to get the slots to map to GPUs in the OpenCL flags given to the WU processes. It just sends "-1" and assumes they'll figure it out (they don't, lol).
Neil-B
Posts: 2027
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: Some information for Kepler GPU Users

Post by Neil-B »

FalconFour wrote:What gives? Did F@H silently drop support for Kepler as well?
https://foldingforum.org/viewtopic.php?f=24&t=37545
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070

(Green/Bold = Active)
FalconFour
Posts: 29
Joined: Fri Sep 05, 2008 11:57 am

Re: Some information for Kepler GPU Users

Post by FalconFour »

craaaaaaaaap :cry:

welp it was a nice run I suppose. My just-as-old AMD Radeon R9 280x card in my MacPro1,1 continues to fold though :lol: for now!

Getting OpenCL fallback working properly (for multi-multi-GPU systems) would be nice, but at a 40% handicap on already-slow cards, it honestly struggles to complete WUs in a timely manner as it is. It was doing well under CUDA so that's a shame. :/

Oh well, I'm blasting points off like a rocket with a 3060ti in the living room PC though 8-)
Neil-B
Posts: 2027
Joined: Sun Mar 22, 2020 5:52 pm
Location: UK

Re: Some information for Kepler GPU Users

Post by Neil-B »

FalconFour wrote:Oh well, I'm blasting points off like a rocket with a 3060ti in the living room PC though 8-)
What you should be gaining in throughput/contribution to science on your RTX3060Ti with the latest core may well offset the loss of throughput/contribution to science of the older kit ... it may however not resolve any loss-of-space-heater issues :(
psaam0001
Posts: 383
Joined: Mon May 18, 2020 2:02 am
Location: Ruckersville, Virginia, USA

Re: Some information for Kepler GPU Users

Post by psaam0001 »

Neil-B wrote:What you should be gaining in throughput/contribution to science on your RTX3060Ti with the latest core may well offset the loss of throughput/contribution to science of the older kit ... it may however not resolve any loss-of-space-heater issues :(
I know I had better get ready to add some Noctua, Fractal or Corsair case fans to my folding rigs, once I am able to do the upgrades.... Hmmm, I may not need to turn the heat on in my room afterwards.

Paul
Neil-B
Posts: 2027
Joined: Sun Mar 22, 2020 5:52 pm
Location: UK

Re: Some information for Kepler GPU Users

Post by Neil-B »

psaam0001 wrote:I know I had better get ready to add some Noctua, Fractal or Corsair case fans to my folding rigs, once I am able to do the upgrades.... Hmmm, I may not need to turn the heat on in my room afterwards.

Paul
My case is a Lian Li Air ... It has 12x120mm and 2x80mm fans managing airflow through the case - the CPU cooler is a 360mm AIO using three of the 120mm fans - the RTX3070 is a tri-fan Asus ROG Strix OC so three more fans there - and the PSU is a Corsair HX1000i so that is a 140mm fan ... and yes that does allow me to OC the GPU to 2025 and keep the office nice and warm ... want to add an RTX3080Ti LC but need the 2nd mortgage before I can get that ;)
psaam0001
Posts: 383
Joined: Mon May 18, 2020 2:02 am
Location: Ruckersville, Virginia, USA

Re: Some information for Kepler GPU Users

Post by psaam0001 »

My current FAH hardware is housed in Fractal Core 1000 cases (micro-ATX)... The one running the GTX 1050ti and GTX 1650 Super (this GPU is destined for the e-cycling bin) will be moved into a Cooler Master HAF XB EVO LAN/test bench case. The upgrades will happen as soon as I am fully employed.

Paul
FalconFour
Posts: 29
Joined: Fri Sep 05, 2008 11:57 am

Re: Some information for Kepler GPU Users

Post by FalconFour »

holy crap if any of y'all are thinking of e-wasting (read: sending to the grinder) any full size nVidia GPUs newer than Kepler, send it my way... just retired my GRID cards because of aforementioned issues. But if it's still new enough to fold on, geez, cmon. One of my rigs (the GRID system) doesn't even have a case, it's just a motherboard, PSU, cards on a table :lol:
psaam0001
Posts: 383
Joined: Mon May 18, 2020 2:02 am
Location: Ruckersville, Virginia, USA

Re: Some information for Kepler GPU Users

Post by psaam0001 »

It will be a while before I can e-cycle that 1650 Super... The GTX 1050ti is likely going to move to the system that only has a 450 watt PSU for now, once I swap in a Ryzen 9 5950X in place of the Ryzen 3 3200G.

I'm starting to think of building an inexpensive system to put the good GT 1030 and the soon-to-be-repaired GT 1030 on. No definite plans on that though, as I'm trying to get a car (so I can work to pay for my 'hobby'). Waiting for the supply chain issues in the US to clear up is also a factor.

Paul

Update (12/15/2021): I found the correct replacement fans for both that Zotac GTX 1650 Super, and the Gigabyte GT 1030 :D... I just need to wait a couple of weeks to get them (let the holiday packages get through first). So if everything works, I'll start saving for the inexpensive "Cancer Buster" FAH system.
Last edited by psaam0001 on Wed Dec 15, 2021 12:56 pm, edited 1 time in total.
FalconFour
Posts: 29
Joined: Fri Sep 05, 2008 11:57 am

Re: Some information for Kepler GPU Users

Post by FalconFour »

OK. I'm back with a little more experimentation. First, I should mention I was only using Ubuntu for folding because I'd found a slight performance advantage to it that made up for its significant difficulty in setup. That no longer seems to be the case - especially if you tweak the process priority for the fahcore_22.exe process (boost it to high = keeps the GPU really well fed).

So, I loaded up one of the two GRID K520 cards into another PC with Windows 10 (x64, 21h1), paired with none other than... a genuine "full fledged" GeForce GTX 760 (exactly the same chip as the K520s have two of). Windows promptly installed a really old driver from 2015 for the GRID card, and that took the GTX 760's driver down as well. I knew that wouldn't work for folding. So, I fiddled the INF and installer files for driver version 471.41 a little (the driver that was previously installed for the 760), added the GRID's PCI ID to them, rebooted with driver signature enforcement disabled, and voila... it installed gracefully for all 3 "cards" the first try.

Now equipped with driver 471.41, I installed Folding@Home. It immediately saw and snatched-up all 3 GPUs. However... they all failed to initialize CUDA -- again, "invalid value for --gpu-architecture (-arch)". They fall back to OpenCL, providing a total of 250k PPD for the whole system (consuming 600 watts, oof). OpenCL is just awful for these cards -- and in fact, I got the 760 (before the GRIDs) in part because of the gigantic performance boost CUDA support added (some 55% for the 760s I have).

That error string is the confusing part. Just like under Linux, I am running a presumably supported version (471.41 > 461.09), but this compiler error persists. Is it possible this is merely a mistake in implementation that could be easily fixed, giving just a little more life to these old cards?
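
The driver comparison in the post (471.41 > 461.09) can be written out as a small check. This is only a sketch: the function names are made up for illustration, and 461.09 is simply the minimum quoted in the post, not a value read from the client.

```python
def parse_driver_version(text):
    """Turn a dotted driver version such as '471.41' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(installed, minimum):
    """Compare versions field by field rather than as strings or floats."""
    return parse_driver_version(installed) >= parse_driver_version(minimum)


# Values taken from the post above.
print(meets_minimum("471.41", "461.09"))
```

If the check passes and the compile error still appears, the driver version is probably not what is being refused; the architecture value in the error message points at the GPU generation instead.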

A last ditch effort to grasp at hope for these cards :lol: Hopefully beneficial to more than myself, as I'm sure I can't be the only one sad to lose support.
folddban
Posts: 2
Joined: Tue Dec 14, 2021 7:24 pm

Re: Some information for Kepler GPU Users

Post by folddban »

Yup, same issue here. New to folding and getting this error on a 760. I tried downgrading the driver to get an older CUDA version, but I got the dropping WUs error. If any devs are reading this, I'm open to helping with testing.

EDIT:
I tried installing NVIDIA driver 460 using their installer (not available in Manjaro's mhwd). Things got messy and I couldn't get it to work at all, not even Xorg. It left a lot of libraries that I had to delete manually, since pacman was refusing to install. I then installed 470 with mhwd, with no CUDA package installed (I never had one), and now CUDA seems to work! I don't know what happened, but as far as I can tell I have the same setup as when it was failing.
FalconFour
Posts: 29
Joined: Fri Sep 05, 2008 11:57 am

Re: Some information for Kepler GPU Users

Post by FalconFour »

folddban wrote:I then installed 470 with mhwd, with no CUDA package installed (I never had one), and now CUDA seems to work! I don't know what happened, but as far as I can tell I have the same setup as when it was failing.
Wait, how do you know CUDA is working? Does the "configuring CUDA context" phase succeed, or is it falling back to OpenCL in the logs? I've resigned myself to retiring my Kepler GPUs (the Grid K520s; I already sold the 760) and replacing them with Tesla M40s (Titan X / GF 980ti GPUs, basically). Far more difficult to work with, requiring a modern board that supports "above 4G decoding" in BIOS -- all my F@H PCs are 2010-2015 era and don't support it. So a real PITA and additional expense to swallow to get a new(er) board and CPU to support them. But at just-north-of 1 million PPD each, I'm pleased with the result, mostly.
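
One rough way to answer that question from a log is to search it for the two messages quoted earlier in this thread. The sketch below assumes only that those strings appear verbatim in the log; the function name and the result keys are invented for illustration, and the location of the log file is left to the reader.

```python
# Substrings copied from the error messages quoted earlier in this thread.
CUDA_FAILURE = "invalid value for --gpu-architecture"
OPENCL_MISSING = "Failed to open dynamic library 'libOpenCL.so'"


def scan_log(lines):
    """Return the 1-based line numbers where each known message appears."""
    findings = {"cuda_compile_failed": [], "opencl_library_missing": []}
    for number, line in enumerate(lines, start=1):
        if CUDA_FAILURE in line:
            findings["cuda_compile_failed"].append(number)
        if OPENCL_MISSING in line:
            findings["opencl_library_missing"].append(number)
    return findings


# Sample input built from the messages reported above.
sample = [
    "Error compiling program: nvrtc: error: invalid value for --gpu-architecture (-arch)",
    "OpenCL: Not detected: Failed to open dynamic library 'libOpenCL.so': "
    "libOpenCL.so: cannot open shared object file: No such file or directory",
]
print(scan_log(sample))
```

To run it against a real log, pass an open file object to `scan_log`. An empty `cuda_compile_failed` list only shows that this particular error was not logged; it does not prove that CUDA is in use.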

Interested to see if you're really getting CUDA running, as I might load those K520s up again.
folddban
Posts: 2
Joined: Tue Dec 14, 2021 7:24 pm

Re: Some information for Kepler GPU Users

Post by folddban »

Sorry for the late reply. I figured out that CUDA only worked for some WUs that were being processed with core22 0.0.16; it does not work (the CUDA context fails) with core22 0.0.18. I'm not sure why it fails, since my driver and CUDA versions are greater than the minimum required. I will post a suggestion for the devs ASAP. I'm also not sure why the 0.0.16 version works: I'm on Linux, and the post linked above says it shouldn't work.