Linux folding with multiple Nvidia GPUs

Postby hiigaran » Sun Feb 04, 2018 3:46 pm

Right, so my two folding rigs have been chugging along nicely for a long time now, without any issues. Each has the following specs:

AsRock X99 WS-E
Xeon E5-2609
4x GTX 1080

At the moment, they both use an old 80 gig SATA drive. My goal was to install a small M.2 drive to replace these, so I could do away with the hard drives. Unfortunately, I have run into issues. Rather than add a whole heap of text, I'll just refer to this question of mine from the Ubuntu StackExchange. In every case the problem seems to stem from getting Nvidia's drivers to work: even after a fresh install of each of the distros mentioned there, the system reboots to a blank screen after GRUB and the splash screen.

I'm wondering if it's just the desktop environment that does not work, so to that end, I'd like to ask if anyone here has experience setting up Nvidia GPU folding rigs without a GUI. So suppose I've installed Ubuntu Server, and tasksel only has Standard System Utilities and SSH server selected. How would I be able to proceed?

(I've had so much downtime on one of my rigs because of this!)
hiigaran
 
Posts: 134
Joined: Thu Nov 17, 2011 7:01 pm

Re: Linux folding with multiple Nvidia GPUs

Postby des1957 » Sun Feb 04, 2018 8:32 pm

Running 5x 1080 Ti on Mint 18.3. After a fresh install, I open Driver Manager and have the option of switching from Nouveau to the latest Nvidia driver. Select the Nvidia driver, let it install and restart. It boots fine and I'm running the latest Nvidia driver. Did a fresh install this morning.
des1957
 
Posts: 29
Joined: Fri Jan 04, 2013 4:20 pm

Re: Linux folding with multiple Nvidia GPUs

Postby hiigaran » Sun Feb 04, 2018 10:36 pm

Odd. That was one of the things I had tried, though it was on Lubuntu at the time.

I'm currently on Ubuntu Server, and it seems to be the most promising so far. Only thing is that I'm not sure how to get the Xorg processes to start on each GPU. It will fold, but without being able to manually set the fan speed to full via nvidia-settings, I can't run it safely.

Re: Linux folding with multiple Nvidia GPUs

Postby bollix47 » Sun Feb 04, 2018 11:41 pm

Have you tried creating a shell script and running it as part of the startup jobs?

Example from a 2-GPU system:

Code:
#!/bin/bash
# GPU 0: switch its fan to manual control, then hold it at 75%
nvidia-settings -a "[gpu:0]/GPUFanControlState=1"
nvidia-settings -a "[fan:0]/GPUTargetFanSpeed=75"
# GPU 1: same again, at 60%
nvidia-settings -a "[gpu:1]/GPUFanControlState=1"
nvidia-settings -a "[fan:1]/GPUTargetFanSpeed=60"


Obviously you would need more lines and different fan speeds, but this does work as long as you have enabled everything in xorg.conf:

Code:
sudo nvidia-xconfig -a --enable-all-gpus --cool-bits=28

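For reference, --cool-bits takes a bitmask. Going by the Coolbits section of Nvidia's Linux driver README, 28 is 4 + 8 + 16: manual fan control, clock offset controls and overvoltage. A small bash sketch that decodes a value; decode_coolbits is a made-up name for illustration, and bits above 16 are deliberately ignored:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list which Coolbits flags a value enables.
# Bit meanings follow the Coolbits section of Nvidia's Linux driver README;
# bits above 16 are ignored by this sketch.
decode_coolbits() {
  local value=$1 bit
  # Reject anything that is not a non-negative integer.
  if ! [[ $value =~ ^[0-9]+$ ]]; then
    echo "not a number: $value" >&2
    return 1
  fi
  for bit in 1 2 4 8 16; do
    # 10# forces base 10 so inputs like "08" are not read as octal.
    (( (10#$value & bit) != 0 )) || continue
    case $bit in
      1)  echo "1: clock overclocking on older (pre-Fermi) GPUs" ;;
      2)  echo "2: SLI across GPUs with different amounts of memory" ;;
      4)  echo "4: manual fan control" ;;
      8)  echo "8: clock offset controls (Fermi and later)" ;;
      16) echo "16: overvoltage (Fermi and later)" ;;
    esac
  done
}
```

Running `decode_coolbits 28` prints the 4, 8 and 16 lines. Only bit 4 is strictly needed for the fan script; the other two just unlock clock and voltage controls.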

Disclaimer: All my systems have ubuntu-desktop installed so I'm not 100% certain if the above will work on the server version.
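For a 4x GTX 1080 rig, the per-GPU lines can be generated with a loop rather than copied by hand. A minimal sketch, assuming each card exposes exactly one fan whose index matches its GPU index (worth confirming with `nvidia-settings -q fans`) and that it runs where nvidia-settings can reach an X display. set_fan_speeds and the NVIDIA_SETTINGS override are hypothetical names for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: put every GPU's fan under manual control at one speed.
# Assumes fan N belongs to GPU N (one fan per card).
# NVIDIA_SETTINGS can be overridden (e.g. with echo) for a dry run.
set_fan_speeds() {
  local speed=$1 count=$2 i
  local cmd=${NVIDIA_SETTINGS:-nvidia-settings}
  # Only accept a whole-number percentage from 0 to 100.
  if ! [[ $speed =~ ^[0-9]+$ ]] || (( 10#$speed > 100 )); then
    echo "speed must be 0-100, got: $speed" >&2
    return 1
  fi
  if ! [[ $count =~ ^[0-9]+$ ]]; then
    echo "bad GPU count: $count" >&2
    return 1
  fi
  for (( i = 0; i < 10#$count; i++ )); do
    # One nvidia-settings call per GPU, setting both attributes at once.
    "$cmd" -a "[gpu:$i]/GPUFanControlState=1" \
           -a "[fan:$i]/GPUTargetFanSpeed=$speed" || return 1
  done
}
```

Example: `set_fan_speeds 90 "$(nvidia-smi -L | wc -l)"`, since `nvidia-smi -L` prints one line per GPU. Prefixing the call with `NVIDIA_SETTINGS=echo` shows the commands without running them.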
bollix47
 
Posts: 2871
Joined: Sun Dec 02, 2007 6:04 am
Location: Canada

Re: Linux folding with multiple Nvidia GPUs

Postby hiigaran » Mon Feb 05, 2018 4:16 am

Yep. No change to the fans on boot.

Re: Linux folding with multiple Nvidia GPUs

Postby bollix47 » Mon Feb 05, 2018 11:21 am

A few questions:

Did you make your script executable?
Have you tried executing it in a terminal to see what errors, if any, show?
Where exactly did you add your script to get it to run during boot?
Did you check dmesg for errors?
If you run nvidia-smi in a terminal does it report any errors?
If you tried to use 100% in your script try using a lower number like 90%.
What are the contents of /etc/X11/xorg.conf?

You may not be able to do what you're trying to do using just the basic server version. You may need to add ubuntu-desktop in order to get what you need to run X commands.

Re: Linux folding with multiple Nvidia GPUs

Postby SteveWillis » Mon Feb 05, 2018 4:42 pm

Personally, since the fans are the only mechanical part of a GPU and I don't want to wear them out prematurely, I don't like them to run over 80%, and I think the built-in fan speed control works pretty well. Just my two cents' worth.

1080 and 1080TI GPUs on Linux Mint
SteveWillis
 
Posts: 409
Joined: Fri Apr 15, 2016 1:42 am

Re: Linux folding with multiple Nvidia GPUs

Postby hiigaran » Mon Feb 05, 2018 5:08 pm

The fans won't outlive their use. Watercooling upgrades are planned at some unknown point in time. Besides, the automatic fan controls are unreliable and make the GPUs run unnecessarily hot.

bollix47 wrote:A few questions:

Did you make your script executable?
Have you tried executing it in a terminal to see what errors, if any, show?
Where exactly did you add your script to get it to run during boot?
Did you check dmesg for errors?
If you run nvidia-smi in a terminal does it report any errors?
If you tried to use 100% in your script try using a lower number like 90%.
What are the contents of /etc/X11/xorg.conf?

You may not be able to do what you're trying to do using just the basic server version. You may need to add ubuntu-desktop in order to get what you need to run X commands.


1: chmod +x was performed, yes.

2: The error when executing any nvidia-settings command, with or without additional parameters, is:

Code:
Unable to init server: Could not connect: Connection refused
ERROR The control display is undefined; please run 'nvidia-settings --help' for usage information


This message appears for every nvidia-settings line contained within the script.

3: crontab -e with @reboot

4: What kind of messages should I be on the lookout for here? Contents seem okay to me.

5: Nope

6: No difference, for the reason in point 2

7: Xorg.conf pastebin
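The "control display is undefined" error in point 2 appears when nvidia-settings has no X display to talk to. A cron @reboot job starts without DISPLAY set and may run before X is up at all (and on a bare Ubuntu Server install there may be no X server to wait for). A sketch of a wrapper that waits for the display's socket and then points nvidia-settings at it with `-c` (`--ctrl-display`). The `:0` display, the /tmp/.X11-unix socket directory and the 60-second timeout are assumptions; depending on the display manager, XAUTHORITY may also need to be set. wait_for_x and X_SOCKET_DIR are hypothetical names:

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until X display $1 (e.g. :0) has created its
# socket, giving up after $2 seconds. Returns 0 once the socket exists.
# X_SOCKET_DIR overrides the assumed socket directory.
wait_for_x() {
  local display=${1:-:0} timeout=${2:-60}
  local sock_dir=${X_SOCKET_DIR:-/tmp/.X11-unix}
  # ":0" -> "0", ":1.0" -> "1": drop the colon and any screen suffix.
  local num=${display#:}
  num=${num%%.*}
  local waited=0
  while [ ! -e "$sock_dir/X$num" ]; do
    if (( waited >= timeout )); then
      echo "X display $display not up after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$(( waited + 1 ))
  done
}

# Example use in the @reboot script:
#   wait_for_x :0 60 &&
#     nvidia-settings -c :0 -a "[gpu:0]/GPUFanControlState=1" \
#                           -a "[fan:0]/GPUTargetFanSpeed=75"
```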

Since you mentioned dmesg and smi, I'll expand a little on those as well. Ubuntu Server will boot just fine into a terminal. However, once lightdm or any other display manager is installed, it boots to a graphical login screen. User input is responsive locally in this GUI for a few seconds, then it freezes. Neither keyboard nor mouse will work, and the mouse pointer can be seen stuck in place. Alt F1 or any similar keys will not take you back to a terminal. However, I can still SSH into the system remotely and all terminal functions work just fine, so the issue seems to be at least related to the display manager. I tried replacing lightdm with gdm3 or sddm, but they all had the same problem.

While a display manager is installed, smi throws up this error and nothing more:

Code:
Unable to determine the device handle for GPU 000:04:00.0: GPU is lost. Reboot the system to recover this GPU.


dmesg near the end:

Code:
NVRM: Xid (PCI:0000:04:00): 79, GPU has fallen off the bus.


This is confusing because Xid 79 suggests a hardware problem: a faulty GPU or insufficient power. However, it can't be either of these. This rig works when the old hard drive is connected and running the older Mint; I'm running it that way right now, since I've had far too much downtime trying to install a newer system. Power is definitely not an issue either: 1600 watts from a reliable PSU, and again, it works when the old hard drive is in.
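When comparing boots (old drive vs. new install), it can help to pull just the Xid codes out of the kernel log instead of reading all of dmesg. A rough sketch that reads dmesg-style text on stdin and prints each PCI address with its Xid code; it assumes the `NVRM: Xid (PCI:address): code, message` format shown above, and xid_summary is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read kernel log text on stdin and print
# "<pci address> Xid <code>" for every NVRM Xid line, e.g. from
#   NVRM: Xid (PCI:0000:04:00): 79, GPU has fallen off the bus.
xid_summary() {
  sed -nE 's/.*NVRM: Xid \(PCI:([0-9a-fA-F:.]+)\): ([0-9]+),.*/\1 Xid \2/p'
}
```

Example: `dmesg | xid_summary | sort | uniq -c` counts each code per card, which makes it easy to see whether it is always the same slot (here 04:00) that drops off the bus.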

Oddly enough, I decided to try installing the exact same versions of Mint and the Nvidia drivers that are on that hard drive, with the same kernel as well. I still get the blank screen after doing nothing but installing the drivers and rebooting. I've tried this with the .run file directly from Nvidia for 390, as well as the PPA method for 390, 387, and 384. Mint also lists 384 as the recommended driver in its GUI driver settings, which I have also tried.

Re: Linux folding with multiple Nvidia GPUs

Postby bruce » Mon Feb 05, 2018 5:39 pm

@hiigaran
I've encountered the same problem any time I tried to use NV's .run file while the display manager is in the boot sequence. You've added a lot more information describing details I have not encountered since I give up ... reinstall Linux ... and then use a driver package that's repackaged by Ubuntu.

I've read other reports that suggest that you have to blacklist Nouveau but I didn't try that myself.
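On Ubuntu and Mint, blacklisting Nouveau is usually done with a file in /etc/modprobe.d followed by an initramfs rebuild. A sketch of the file-writing step; write_nouveau_blacklist is a hypothetical name, and the file name blacklist-nouveau.conf is only a convention (modprobe reads any *.conf file in that directory):

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a modprobe config that stops nouveau loading.
# $1 is the config path (defaults to the usual modprobe.d location);
# writing the default path needs root.
write_nouveau_blacklist() {
  local conf=${1:-/etc/modprobe.d/blacklist-nouveau.conf}
  # "blacklist" stops automatic loading; modeset=0 disables its KMS
  # in case something loads the module anyway.
  cat > "$conf" <<'EOF'
blacklist nouveau
options nouveau modeset=0
EOF
}
```

After running it as root, rebuild the initramfs with `sudo update-initramfs -u` so the blacklist also applies during early boot, then reboot; `lsmod | grep nouveau` should then print nothing.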
bruce
 
Posts: 20009
Joined: Thu Nov 29, 2007 11:13 pm
Location: So. Cal.

Re: Linux folding with multiple Nvidia GPUs

Postby bollix47 » Mon Feb 05, 2018 5:47 pm

Have you given any consideration to 16.04 server? Ubuntu 17.10 is buggy, and although I did get it working with the desktop version, I had to jump through a lot of hoops to get it running properly. The first was getting off the Wayland default:
https://itsfoss.com/switch-xorg-wayland/
I'm not sure why that would affect the server version, but I'd either use 16.04 or wait until 18.04, where the default will be Ubuntu on Xorg and Wayland will be an option.
Also, I'd use the desktop version since X is probably being totally ignored in the server version and even though you've managed to create an xorg.conf it may be ignored at boot.
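The switch described at that link can also be made for the login screen itself through GDM's config file. A sketch, assuming Ubuntu 17.10's GDM, whose config is /etc/gdm3/custom.conf and ships a commented-out `#WaylandEnable=false` line under `[daemon]`; disable_gdm_wayland is a hypothetical name, and GNU sed (as on Ubuntu) is assumed:

```shell
#!/usr/bin/env bash
# Hypothetical helper: make GDM use Xorg instead of Wayland.
# $1 = GDM config (default /etc/gdm3/custom.conf); needs root for the default.
disable_gdm_wayland() {
  local conf=${1:-/etc/gdm3/custom.conf}
  if grep -Eq '^[#[:space:]]*WaylandEnable=' "$conf"; then
    # Uncomment (or overwrite) the existing setting.
    sed -i -E 's/^[#[:space:]]*WaylandEnable=.*/WaylandEnable=false/' "$conf"
  elif grep -q '^\[daemon\]' "$conf"; then
    # No setting yet: add it directly under the [daemon] header (GNU sed).
    sed -i '/^\[daemon\]/a WaylandEnable=false' "$conf"
  else
    # No [daemon] section at all: append one.
    printf '[daemon]\nWaylandEnable=false\n' >> "$conf"
  fi
}
```

The change takes effect the next time GDM starts, e.g. after a reboot.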

Re: Linux folding with multiple Nvidia GPUs

Postby hiigaran » Mon Feb 05, 2018 8:10 pm

I see the following packages installed matching 'wayland' in any part of their names:

Code:
    libwayland-client0/artful,now 1.14.0-1 amd64 [installed,automatic]
    libwayland-cursor0/artful,now 1.14.0-1 amd64 [installed,automatic]
    libwayland-egl1-mesa/artful-updates,now 17.2.4-0ubuntu1~17.10.2 amd64 [installed,automatic]
    libwayland-server0/artful,now 1.14.0-1 amd64 [installed,automatic]


These 4 libs don't seem sufficient to be running wayland on their own, but regardless, I'll see if I can remove them later. I'm currently unable to install because for some reason the installer and installed system cannot connect to the internet. It even gave up trying to automatically set the time in the installer. This makes no sense, given that the original system still has a connection.

bruce wrote:@hiigaran
I've encountered the same problem any time I tried to use NV's .run file while the display manager is in the boot sequence. You've added a lot more information describing details I have not encountered since I give up ... reinstall Linux ... and then use a driver package that's repackaged by Ubuntu.

I've read other reports that suggest that you have to blacklist Nouveau but I didn't try that myself.


I think I've performed over 40 reinstalls over the last 3 or 4 days! Trying to make sure there are no loose ends when trying something new.

Nouveau has been blacklisted, yes. I've made sure to do that whether I've used the .run file or via the ppa.

EDIT: Made a little progress. Ran the following commands:

Code:
sudo dpkg --add-architecture i386
sudo apt update
sudo apt install libc6:i386
sudo apt install libstdc++6:i386


Running the nvidia-settings command without any parameters now gives me this:

Code:
libgtk-3.so.0: cannot open shared object file: No such file or directory
libnvidia-gtk3.so: cannot open shared object file: No such file or directory
libgtk-x11-2.0.so.0: cannot open shared object file: No such file or directory
libnvidia-gtk2.so: cannot open shared object file: No such file or directory


Now the question is, which package(s) contains these libraries?
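One way to answer that is to pull the library names out of the error text and look each one up with apt-file, which is a separate package (`sudo apt install apt-file`, then `sudo apt-file update`). For example, on Ubuntu libgtk-3.so.0 comes from libgtk-3-0 and libgtk-x11-2.0.so.0 from libgtk2.0-0. A sketch; missing_libs and find_lib_packages are hypothetical names, and the lookup step assumes apt-file is installed:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read loader errors on stdin and print each missing
# library name once, e.g. from
#   libgtk-3.so.0: cannot open shared object file: No such file or directory
missing_libs() {
  sed -nE 's/^[[:space:]]*([^[:space:]:]+\.so[^[:space:]:]*): cannot open shared object file.*/\1/p' |
    awk '!seen[$0]++'
}

# Look each name up in the package index (requires apt-file).
find_lib_packages() {
  local lib
  while read -r lib; do
    apt-file search "$lib"
  done
}
```

Example: `nvidia-settings 2>&1 | missing_libs | find_lib_packages` (the `2>&1` is there in case the messages go to stderr).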

Re: Linux folding with multiple Nvidia GPUs

Postby bollix47 » Mon Feb 05, 2018 10:30 pm

Here is a short summary of what I would do if I wanted 17.10:

1. Install the Desktop amd64 version of Ubuntu 17.10
2. At the first login click on the gear and switch to Ubuntu on Xorg
3. Open Software & Updates and click on the Additional Drivers tab
4. It should indicate you're using Nouveau
5. Click the button next to the Nvidia proprietary drivers ... IIRC they are 384.111 **
6. Click on Apply changes & reboot

** If they are not listed then you may need to close Software & Updates and add a repo:
Code:
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update


After doing that and running Software Updater you should be able to add coolbits to xorg, reboot and create your fan profiles.
Code:
sudo nvidia-xconfig -a --enable-all-gpus --cool-bits=28 --allow-empty-initial-configuration


There are numerous other things to do after a clean install; here are a few I like to do:
1. Install gdebi and use it to install any .deb files that you may need
2. Install the older python version if needed
3. Install latest version (7.4.16) of FAHClient & FAHControl
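On a system without the Software & Updates GUI (such as the Ubuntu Server install discussed earlier in the thread), the driver choice can also be made from the command line with the ubuntu-drivers tool from the ubuntu-drivers-common package. A sketch that picks the recommended package out of `ubuntu-drivers devices` output; recommended_driver is a hypothetical name, and the `driver : <package> - ... recommended` line format is assumed from typical output:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ubuntu-drivers devices` output on stdin and
# print each recommended driver package once, even with several GPUs listed.
recommended_driver() {
  # Lines look like: "driver   : nvidia-384 - distro non-free recommended"
  awk '$1 == "driver" && /recommended/ { if (!seen[$3]++) print $3 }'
}
```

Example: `pkg=$(ubuntu-drivers devices | recommended_driver) && sudo apt install "$pkg"`. `sudo ubuntu-drivers autoinstall` does the equivalent in one step.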

Re: Linux folding with multiple Nvidia GPUs

Postby bruce » Tue Feb 06, 2018 12:40 am

bollix47 wrote:Have you given any consideration to 16.04 server? Ubuntu 17.10 is buggy and although I did get it working with the desktop version I had to jump through a lot of hoops to get it running properly

I've encountered this problem since 14.04 LTS or maybe earlier. In all cases, it seems that there's still a problem either in the way NVidia's install works or the screen manager recovers from whatever error is happening. I believe that the installations work on all server versions but not the corresponding desktop versions.

People have documented manual contortions that avoid leaving you with a non-bootable desktop version. The fact that the .deb works but the .run does not suggests that the team that builds the .deb packages knows how to avoid the pitfalls. Why doesn't NV know that, and why haven't they incorporated the appropriate error-recovery methodology into the distributed sequence?

Here's an old topic on this same problem. It gives some suggestions that might help us today (with minor modifications).
Tips for installing Nvidia .run files in Linux

Re: Linux folding with multiple Nvidia GPUs

Postby hiigaran » Tue Feb 06, 2018 5:01 am

bollix47 wrote:Here is a short summary of what I would do if I wanted 17.10:

1. Install the Desktop amd64 version of Ubuntu 17.10
2. At the first login click on the gear and switch to Ubuntu on Xorg
3. Open Software & Updates and click on the Additional Drivers tab
4. It should indicate you're using Nouveau
5. Click the button next to the Nvidia proprietary drivers ... IIRC they are 384.111 **
6. Click on Apply changes & reboot


1: Done
2: There's no gear. Skipped this step
3: Done
4: Yep
5: Yep. And done.
6: Done

Reboot results in the same problem. After logging in, it freezes a few seconds after the desktop appears.

Okay, forget this whole thing. Let's try something completely different. How can I copy all the data from the old hard drive to the new M.2? The hard drive is 80 GB in capacity and the M.2 is 60 GB, but the amount of used space on the hard drive is no more than 20 GB.
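Since only about 20 GB is in use, a file-level copy fits on the 60 GB M.2 even though a block-level image of the whole 80 GB drive would not. A sketch using rsync, assuming the new drive is already partitioned, formatted and mounted (the /mnt/m2 mount point is just an example) and that the copy runs from a live USB or with the system otherwise idle; clone_root and the RSYNC override are hypothetical names:

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a root filesystem from $1 to $2 with rsync,
# preserving permissions, ACLs, xattrs and hard links, and skipping the
# contents of virtual/temporary filesystems. RSYNC may be overridden (e.g. echo).
clone_root() {
  local src=$1 dst=$2
  if [ -z "$src" ] || [ -z "$dst" ]; then
    echo "usage: clone_root SOURCE DESTINATION" >&2
    return 1
  fi
  # Leading "/" anchors each pattern at the top of the copied tree.
  local excludes=(
    --exclude='/dev/*' --exclude='/proc/*' --exclude='/sys/*'
    --exclude='/tmp/*' --exclude='/run/*' --exclude='/mnt/*'
    --exclude='/media/*' --exclude='/lost+found'
  )
  # Trailing slashes make rsync copy the contents of src into dst.
  "${RSYNC:-rsync}" -aAXH --numeric-ids "${excludes[@]}" "${src%/}/" "${dst%/}/"
}
```

Example, as root: `clone_root / /mnt/m2`. The copy would still need its /etc/fstab pointed at the new partition's UUID (shown by `blkid`) and GRUB installed on the M.2 drive before it can boot on its own.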

Re: Linux folding with multiple Nvidia GPUs

Postby bollix47 » Tue Feb 06, 2018 4:01 pm

Not sure where you were looking for the 'gear', but it's there right after you select your username, and skipping this step defeats the whole purpose of the exercise, since it means you're using Wayland, and Wayland is buggy. I installed 17.10 on a system last night and had no problems, including using the gear in the login screen to change to Xorg.

As far as your copy question goes, there are many standalone copy programs out there with their own support facilities ... for example, I've had some success with FileZilla. gl