GPU memtest failure [Linux GPU folding]

Re: GPU memtest failure

Postby davidcoton » Sun Feb 10, 2013 12:21 am

I'm using the sudo command. That worked for stopping the DM when I installed the NVIDIA drivers the first time; the difference is that now NVIDIA is installed, the DM doesn't seem to stop the same way. I may have to unpack the NVIDIA installer to see whether OpenGL can be installed separately while the display driver is running. If that fails, I guess it's another full reinstall of Ubuntu, when I have time and SMP folding permits.

David
davidcoton
 
Posts: 952
Joined: Wed Nov 05, 2008 3:19 pm
Location: Cambridge, UK

Re: GPU memtest failure

Postby art_l_j_PlanetAMD64 » Sun Feb 10, 2013 1:09 am

davidcoton wrote:I'm using the sudo command. That worked for stopping the DM when I installed the NVIDIA drivers the first time; the difference is that now NVIDIA is installed, the DM doesn't seem to stop the same way. I may have to unpack the NVIDIA installer to see whether OpenGL can be installed separately while the display driver is running. If that fails, I guess it's another full reinstall of Ubuntu, when I have time and SMP folding permits.

David

OK, I ran this command on my #5 system:
ps aux | sort -k 3,3 >ps_X_list.txt
and here is the last part of the file 'ps_X_list.txt':
root       840  0.0  0.0      0     0 ?        S    03:15   0:00 [hd-audio0]
root         9  0.0  0.0      0     0 ?        S    03:14   0:00 [events/0]
statd     1090  0.0  0.0  14384   848 ?        Ss   03:15   0:00 /sbin/rpc.statd
109       1473  0.1  1.5 230852 31756 ?        Sl   03:15   1:25 /usr/bin/FAHClient --child --lifeline 1373 /etc/fahclient/config.xml --run-as fahclient --pid-file=/var/run/fahclient.pid --daemon
art       3263  1.2  1.4 340572 29584 tty1     Sl   13:40   2:28 /usr/bin/python /usr/bin/FAHControl
art       3374 16.3  2.3 2676496 49136 ?       Ssl  16:15   5:36 .\FahCore_15.exe -dir work/ -suffix 01 -nice 19 -checkpoint 15 -verbose -lifeline 8 -version 641                                     
109       1485  165  3.9 161088 80804 ?        SNl  03:15 1351:20 /var/lib/fahclient/cores/www.stanford.edu/~pande/Linux/AMD64/Core_a4.fah/FahCore_a4 -dir 01 -suffix 01 -version 702 -lifeline 1481 -checkpoint 15 -np 2
root      3120  8.6  2.1 142244 44840 tty8     Ss+  13:40  16:32 /usr/bin/X -nolisten tcp :0 -auth /tmp/serverauth.OFJTOODti8
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
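Note that `sort -k 3,3` compares the %CPU column as text, which is why 165 lands between 16.3 and 8.6 above and the header line ends up last. A minimal sketch of a numeric sort that keeps the header on top, assuming procps-style `ps aux` output; `sort_ps_by_cpu` is a hypothetical helper name:

```shell
# Hypothetical helper: read `ps aux` output on stdin, print the header line
# first, then the remaining lines sorted numerically by %CPU (column 3),
# highest first. LC_ALL=C makes "." the decimal point whatever the locale.
sort_ps_by_cpu() {
    IFS= read -r header || return 0   # first line is the ps header
    printf '%s\n' "$header"
    LC_ALL=C sort -k 3,3nr            # n = numeric, r = descending
}

# Usage:
#   ps aux | sort_ps_by_cpu > ps_X_list.txt
```

procps `ps` can also do the sorting by itself with `ps aux --sort=-pcpu`.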
So it looks like the /usr/bin/X process, PID 3120 (in my #5 system), is running the X server, and not any Display Manager (DM).

This explains why the '/etc/init.d/?dm stop' command only worked once: running the 'startx' command from the CLI did not restart the DM, it just started the X server directly.
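The same check can be scripted. A minimal sketch, assuming the process names gdm, gdm3, lightdm, kdm and xdm cover the display managers in question (that list is an assumption, extend it as needed); `x_owner` is a hypothetical helper that reads `ps -e -o comm=` output on stdin:

```shell
# Hypothetical helper: given one command name per line (as printed by
# `ps -e -o comm=`), report whether a display manager is running, only a
# bare X server (e.g. one started with startx), or neither.
x_owner() {
    dm=""; x=""
    while read -r name; do
        case "$name" in
            gdm|gdm3|lightdm|kdm|xdm) dm=$name ;;   # assumed DM names
            X|Xorg) x=yes ;;                        # the X server itself
        esac
    done
    if [ -n "$dm" ]; then
        echo "dm:$dm"
    elif [ -n "$x" ]; then
        echo "bare-x"
    else
        echo "none"
    fi
}

# Usage:
#   ps -e -o comm= | x_owner
```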

EDIT:
The correct way to do the whole process is this, but using the 275.43 driver, not the one shown in that link. At the end of the NVIDIA driver installation, the right thing to do is to reboot the computer, not to try to get back into the GUI right away. Besides, a reboot is required to load the new NVIDIA kernel module.

I hope this helps!

Art
art_l_j_PlanetAMD64
Over 1.04 Billion Total Points
Over 185,000 Work Units
Over 3,800,000 PPD
Overall rank (if points are combined) 20 of 1721690
In memory of my Mother May 12th 1923 - February 10th 2012
art_l_j_PlanetAMD64
 
Posts: 741
Joined: Sun May 30, 2010 2:28 pm

Re: GPU memtest failure

Postby art_l_j_PlanetAMD64 » Sun Feb 10, 2013 4:03 am

davidcoton wrote:Ubuntu 12.10 relatively new install. Nvidia 550Ti. Currently NO NV video drivers (default nouveau drivers working apparently correctly). Grub also reports no video drivers, so unable to access the grub menu at startup.

This version does not seem to have a root console available, but I tried the gdm3 stop command under sudo. The GDM3 script does not exist.

David

David, you should be able to show the full GRUB menu if you hold down the (right) <Shift> key while the system is booting. From "Ubuntu 12.10 Simplifies GRUB Boot Menu":
By default the GRUB menu is hidden and it is only shown when a system fails to shut down properly, or the (right) Shift key is held down during the boot time.


You can also customize the GRUB menu (or make it appear if it doesn't) by following the instructions in the Ubuntu documentation page "Grub2":
1. On a new installation of Ubuntu 9.10 or later with no other installed operating systems, GRUB 2 will boot directly to the login prompt or Desktop. No menu will be displayed.

2. Hold down (right) SHIFT to display the menu during boot. In certain cases, pressing the ESC key may also display the menu.

8. The user can create a custom file in which the user can place his own menu entries. This file will not be overwritten. By default, a custom file named 40_custom is available for use in the /etc/grub.d folder.

9. The primary configuration file for changing menu display settings is called grub and by default is located in the /etc/default folder.

10. There are multiple files for configuring the menu - /etc/default/grub mentioned above, and all the files in the /etc/grub.d/ directory.
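In the stock Ubuntu /etc/default/grub of that era, the menu is hidden by the GRUB_HIDDEN_TIMEOUT lines. A minimal sketch of making it always visible, assuming that stock layout and GNU sed; `show_grub_menu` is a hypothetical helper, and it edits whatever file you pass it, so try it on a copy first:

```shell
# Hypothetical helper: comment out the GRUB_HIDDEN_TIMEOUT* lines (which hide
# the menu) and set GRUB_TIMEOUT to the given number of seconds (default 10).
# Uses GNU sed's in-place editing (-i).
show_grub_menu() {
    file=$1
    secs=${2:-10}
    sed -i \
        -e 's/^GRUB_HIDDEN_TIMEOUT/#GRUB_HIDDEN_TIMEOUT/' \
        -e "s/^GRUB_TIMEOUT=.*/GRUB_TIMEOUT=$secs/" \
        "$file"
}

# Usage (as root), then regenerate grub.cfg so the change takes effect:
#   cp /etc/default/grub /etc/default/grub.bak
#   show_grub_menu /etc/default/grub 10
#   update-grub
```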

This should get you to where you want to be, namely logged in as root in a straight-text CLI, no GUI or X server in sight! :D

I hope this helps,
Art

Re: GPU memtest failure

Postby davidcoton » Sun Feb 10, 2013 2:10 pm

Hi Art,

Well, some progress. You reminded me that accessing the Grub menu requires the right shift key, so now I can get into recovery, use fsck to enable the filing system, and then use the root console to reinstall the nvidia driver. (I had found that in the documentation before, but just forgot and was trying the left shift.)

However, it turns out the OpenGL lib is installed by default; that wasn't the problem. Everything (?) is installed, but I still get the libGL error when starting the F@H client.

Who needs adventure games, when you can spend days on end installing GPU F@H on Linux?

David

Re: GPU memtest failure

Postby art_l_j_PlanetAMD64 » Sun Feb 10, 2013 4:10 pm

davidcoton wrote:Hi Art,

Well, some progress. You reminded me that accessing the Grub menu requires the right shift key, so now I can get into recovery, use fsck to enable the filing system, and then use the root console to reinstall the nvidia driver. (I had found that in the documentation before, but just forgot and was trying the left shift.)

However, it turns out the OpenGL lib is installed by default; that wasn't the problem. Everything (?) is installed, but I still get the libGL error when starting the F@H client.

Who needs adventure games, when you can spend days on end installing GPU F@H on Linux?

David

Hi David, sorry to hear about your trials and tribulations. :(

Did you do this part?
13. Test to make sure the wrapper is properly linked:
ldd ~/.wine/drive_c/windows/system32/cudart.dll

The output should look something like this:
            linux-gate.so.1 =>  (0xf7706000)
            libcudart.so.3 => /usr/local/cuda/lib/libcudart.so.3 (0xf7697000)
            libwine.so.1 => /usr/lib32/libwine.so.1 (0xf7556000)
            libm.so.6 => /lib32/libm.so.6 (0xf752f000)
            libc.so.6 => /lib32/libc.so.6 (0xf73cc000)
            libdl.so.2 => /lib32/libdl.so.2 (0xf73c8000)
            libpthread.so.0 => /lib32/libpthread.so.0 (0xf73af000)
            librt.so.1 => /lib32/librt.so.1 (0xf73a6000)
            libstdc++.so.6 => /usr/lib32/libstdc++.so.6 (0xf72b7000)
            libgcc_s.so.1 => /usr/lib32/libgcc_s.so.1 (0xf72a7000)
            /lib/ld-linux.so.2 (0xf7707000)

That should show you if anything's not linked correctly.
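Any library the loader cannot resolve shows up in ldd output as `=> not found`. A minimal sketch that lists only those entries, so a long ldd listing can be checked at a glance; `missing_libs` is a hypothetical helper:

```shell
# Hypothetical helper: read ldd output on stdin and print the name of each
# library that ldd reports as "not found". Exit status is 1 if any are missing.
missing_libs() {
    awk '/=> not found/ { print $1; bad = 1 } END { exit bad }'
}

# Usage:
#   ldd ~/.wine/drive_c/windows/system32/cudart.dll | missing_libs \
#       && echo "all libraries resolved"
```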

I just found this guide, which has some good information about getting rid of nouveau: "Debian Linux 6: Install Nvidia Proprietary Unix Driver".
All the commands should also work on Ubuntu.

Once logged into the single user mode, remove the following packages (if installed):
apt-get --purge remove xserver-xorg-video-nouveau nvidia-kernel-common nvidia-kernel-dkms nvidia-glx nvidia-smi


Search for all installed nvidia packages and delete them (do not skip this step):
dpkg --list | grep -i --color nvidia


Type the following command to install the Unix driver:
sh NVIDIA-Linux-x86_64-275.43.run


Just follow the on-screen instructions. Make sure you upgrade xorg.conf when prompted. Finally, reboot the system:
shutdown -r now
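The "search and delete" step can be scripted as well. A minimal sketch, assuming the usual `dpkg --list` layout where installed packages start with `ii` and the package name is the second column; `nvidia_packages` is a hypothetical helper, and its output should be reviewed before it is handed to apt-get:

```shell
# Hypothetical helper: read `dpkg --list` output on stdin and print the name
# of every installed package (status "ii") whose name contains "nvidia",
# matching case-insensitively like `grep -i`.
nvidia_packages() {
    awk '$1 == "ii" && tolower($2) ~ /nvidia/ { print $2 }'
}

# Usage (as root, in single-user mode, after reviewing the list):
#   pkgs=$(dpkg --list | nvidia_packages)
#   [ -n "$pkgs" ] && apt-get --purge remove $pkgs
```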


I went through the whole procedure again this morning, and you're right, it's not much fun when things do not go according to plan. I just built an 8th SMP/GPU folding system, mainly to use as a test bed for automating the startup of the Linux/Wine/GPU3 FAH client. When I'm done, the GPU3 client will automatically start and stop, just like the SMP client does now under Linux.

Keep hangin' in there! :ewink:

Art

Re: GPU memtest failure

Postby davidcoton » Sun Feb 10, 2013 4:35 pm

Thanks Art for all the suggestions.

I think I've followed all the steps, some several times, and I hope I've repeated all the relevant later steps for each repeat of earlier steps. I've got libGL files for both 32-bit and 64-bit, in reasonable directories (AFAICT), but it still looks as though the F@H client is picking the wrong version. I don't have any idea why or what to do about it. There have been reports of problems finding libGL.so.1 from the v7 Control (or Client?) on Ubuntu, so I may need to go back to 12.04 or even earlier. Previous working builds on my system may have been upgrades rather than clean installs. But I don't think (at the moment) it's an issue with the NVIDIA driver installation (except, possibly, its interaction with Ubuntu 12.10).

I'll probably give up for now, until I've got a more promising lead -- currently I'm re-reading all 24 pages of the install guide thread to see if there are any clues. I didn't find anything directly relevant searching here for libGL -- other reports seem to be about the library not being found at all, rather than the ELF class problem I've got.
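An "ELF class" error means a 32-bit program was handed a 64-bit library (or the reverse). The fifth byte of every ELF file (offset 4, EI_CLASS) records which: 1 for 32-bit, 2 for 64-bit. A minimal sketch that reads it with od, so each libGL copy can be checked; `elf_class` is a hypothetical helper, and `file -L <path>` reports the same thing in more detail:

```shell
# Hypothetical helper: print "32-bit" or "64-bit" for an ELF file by reading
# the EI_CLASS byte (offset 4) of its header; "not-elf" if the magic is wrong.
elf_class() {
    magic=$(od -An -t x1 -N 4 "$1" 2>/dev/null | tr -d ' ')
    [ "$magic" = "7f454c46" ] || { echo "not-elf"; return 1; }   # \177 E L F
    case $(od -An -t u1 -j 4 -N 1 "$1" | tr -d ' ') in
        1) echo "32-bit" ;;
        2) echo "64-bit" ;;
        *) echo "unknown"; return 1 ;;
    esac
}

# Usage (example paths, adjust to wherever your libGL copies live):
#   for f in /usr/lib32/libGL.so.1 /usr/lib/libGL.so.1; do
#       printf '%s: ' "$f"; elf_class "$f"
#   done
```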

David

Re: GPU memtest failure

Postby art_l_j_PlanetAMD64 » Sun Feb 10, 2013 6:05 pm

davidcoton wrote:Thanks Art for all the suggestions.

I think I've followed all the steps, some several times, and I hope I've repeated all the relevant later steps for each repeat of earlier steps. I've got libGL files for both 32-bit and 64-bit, in reasonable directories (AFAICT), but it still looks as though the F@H client is picking the wrong version. I don't have any idea why or what to do about it. There have been reports of problems finding libGL.so.1 from the v7 Control (or Client?) on Ubuntu, so I may need to go back to 12.04 or even earlier. Previous working builds on my system may have been upgrades rather than clean installs. But I don't think (at the moment) it's an issue with the NVIDIA driver installation (except, possibly, its interaction with Ubuntu 12.10).

I'll probably give up for now, until I've got a more promising lead -- currently I'm re-reading all 24 pages of the install guide thread to see if there are any clues. I didn't find anything directly relevant searching here for libGL -- other reports seem to be about the library not being found at all, rather than the ELF class problem I've got.

David

David, have you tried setting the LD_LIBRARY_PATH variable? It looks like the dynamic linker is picking the wrong (64-bit) libGL rather than the right (32-bit) one. Pointing LD_LIBRARY_PATH at the directory that holds the 32-bit version should fix that. LD_LIBRARY_PATH is usually empty unless it has been set earlier:
$ echo $LD_LIBRARY_PATH

$ export LD_LIBRARY_PATH=/usr/lib32
$ echo $LD_LIBRARY_PATH
/usr/lib32
$
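Note that `export LD_LIBRARY_PATH=/usr/lib32` replaces any value that was already there. A minimal sketch of adding /usr/lib32 in front of the existing value instead; `prepend_path` is a hypothetical helper:

```shell
# Hypothetical helper: print DIR prepended to the colon-separated list LIST,
# leaving LIST unchanged if DIR is already in it, and avoiding a stray empty
# entry when LIST is empty.
prepend_path() {
    dir=$1
    list=$2
    case ":$list:" in
        *":$dir:"*) printf '%s\n' "$list" ;;
        *) printf '%s\n' "$dir${list:+:$list}" ;;
    esac
}

# Usage:
#   export LD_LIBRARY_PATH=$(prepend_path /usr/lib32 "$LD_LIBRARY_PATH")
```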

Also, the ld man page shows that, when linking, you can get a link map using the --print-map option. The output should be redirected to a file, because it's likely to be very big. Then you can see exactly where everything is coming from.

The man page also says:
The linker uses the following search paths to locate required shared libraries:
1. Any directories specified by -rpath-link options.

2. Any directories specified by -rpath options. The difference between -rpath and -rpath-link is that directories specified by -rpath options are included in the executable and used at runtime, whereas the -rpath-link option is only effective at link time. Searching -rpath in this way is only supported by native linkers and cross linkers which have been configured with the --with-sysroot option.

3. On an ELF system, for native linkers, if the -rpath and -rpath-link options were not used, search the contents of the environment variable "LD_RUN_PATH".

4. On SunOS, if the -rpath option was not used, search any directories specified using -L options.

5. For a native linker, search the contents of the environment variable "LD_LIBRARY_PATH".

6. For a native ELF linker, the directories in "DT_RUNPATH" or "DT_RPATH" of a shared library are searched for shared libraries needed by it. The "DT_RPATH" entries are ignored if "DT_RUNPATH" entries exist.

7. The default directories, normally /lib and /usr/lib.

8. For a native linker on an ELF system, if the file /etc/ld.so.conf exists, the list of directories found in that file.

If the required shared library is not found, the linker will issue a warning and continue with the link.
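Strictly speaking, that list describes ld at link time. When an already-built program such as the F@H client starts, the search is done by the dynamic loader (ld.so), which consults LD_LIBRARY_PATH, the library's own RPATH/RUNPATH entries, the cache that ldconfig builds from /etc/ld.so.conf, and finally the default directories (see ld.so(8)). A minimal sketch for listing which 32-bit and 64-bit copies of a library that cache knows about, assuming the usual `ldconfig -p` line format on an x86-64 system; `cached_libs` is a hypothetical helper:

```shell
# Hypothetical helper: read `ldconfig -p` output on stdin and print
# "<arch> <path>" for each cached entry whose soname is exactly NAME.
# On x86-64, entries tagged "x86-64" are 64-bit and plain "(libc6)"
# entries are 32-bit.
cached_libs() {
    name=$1
    awk -v n="$name" '$1 == n {
        arch = ($2 ~ /x86-64/) ? "64-bit" : "32-bit"
        print arch, $NF          # last field is the library path
    }'
}

# Usage:
#   /sbin/ldconfig -p | cached_libs libGL.so.1
```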

Re: GPU memtest failure

Postby davidcoton » Sun Feb 10, 2013 6:28 pm

Progress! Adding /usr/lib32 to LD_LIBRARY_PATH fixed that problem. Now when I run the client, I've got:

Folding@Home User Configuration

fixme:win:EnumDisplayDevicesW ((null),0,0x32b02c,0x00000000), stub!
[18:18:54] NVIDIA GeForce GTX 550 Ti detected
[18:18:54] At present your GPU is not supported or you need a current driver.
You may wish to consider running our standard client,
which you can download at folding.stanford.edu.


That's with 310.90 drivers at present (I know I may need to go back some), so the message hardly makes sense.

David

Re: GPU memtest failure

Postby art_l_j_PlanetAMD64 » Sun Feb 10, 2013 6:58 pm

davidcoton wrote:Progress! Adding /usr/lib32 to LD_LIBRARY_PATH fixed that problem. Now when I run the client, I've got:

Folding@Home User Configuration

fixme:win:EnumDisplayDevicesW ((null),0,0x32b02c,0x00000000), stub!
[18:18:54] NVIDIA GeForce GTX 550 Ti detected
[18:18:54] At present your GPU is not supported or you need a current driver.
You may wish to consider running our standard client,
which you can download at folding.stanford.edu.


That's with 310.90 drivers at present (I know I may need to go back some), so the message hardly makes sense.

David

OK, you need to go back to the 275.43 driver, as that is the last one that I found to be compatible with the v6.41 FAH GPU client. It works for me on all 3 Debian Linux systems here.

Get it like this:
wget http://us.download.nvidia.com/XFree86/Linux-x86_64/275.43/NVIDIA-Linux-x86_64-275.43.run
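If you need to try other driver versions, the same URL pattern may be reusable. A minimal sketch, assuming NVIDIA keeps the `XFree86/Linux-x86_64/<version>/NVIDIA-Linux-x86_64-<version>.run` layout of the 275.43 link for other versions (not guaranteed); `nvidia_run_url` is a hypothetical helper:

```shell
# Hypothetical helper: print the download URL for a given x86_64 driver
# version, following the pattern of the 275.43 link above (an assumption
# for other versions).
nvidia_run_url() {
    ver=$1
    printf 'http://us.download.nvidia.com/XFree86/Linux-x86_64/%s/NVIDIA-Linux-x86_64-%s.run\n' \
        "$ver" "$ver"
}

# Usage:
#   wget "$(nvidia_run_url 275.43)"
```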


You're almost there! :D

Art

Re: GPU memtest failure

Postby davidcoton » Sun Feb 10, 2013 8:45 pm

Success! Many thanks Art for your help.

In the end I could only go back to the 304 driver, because:
  • The 275 driver seems to need a v2 kernel, so it won't install (Ubuntu 12.10 has a v3 kernel).
  • The 295 driver will install, but the system won't start (blank screen, even in failsafeX mode).

But I'm not sure even that was necessary: it turned out I needed to use -forcegpu even with -configonly.

Watch out for further cries of pain if it falls over tonight.

Thanks again

(BTW, do you run a small nuclear power station and live in the Arctic? How else can you manage power in and heat out for your folding farm? :D )

David

Re: GPU memtest failure

Postby art_l_j_PlanetAMD64 » Sun Feb 10, 2013 9:04 pm

davidcoton wrote:Success! Many thanks Art for your help.

In the end I could only go back to the 304 driver, because:
  • The 275 driver seems to need a v2 kernel, so it won't install (Ubuntu 12.10 has a v3 kernel).
  • The 295 driver will install, but the system won't start (blank screen, even in failsafeX mode).

But I'm not sure even that was necessary: it turned out I needed to use -forcegpu even with -configonly.

Watch out for further cries of pain if it falls over tonight.

Thanks again

(BTW, do you run a small nuclear power station and live in the Arctic? How else can you manage power in and heat out for your folding farm? :D )

David

David, that's fantastic! :D

Well, it does help to live in the "Great White North" (== Canada) in the middle of Winter! :ewink:

Yeah, the farm does eat up the power, with 8 computers and 11 total GPUs:
  • 8 x GTX 660 Ti OC (Gigabyte GV-N66TOC-2GD)
  • 2 x GTX 460 (Zotac ZT-40401-10P)
  • 1 x GTX 285 (MSI N285GTX SuperPipe OC)
Even in the middle of winter here, I've got the windows open and big fans moving the air around. Just like in the old days, in the computer room of a VAX 11/780, with ice-cold hurricane-force winds. :shock:

Keep on folding,
Art

Re: GPU memtest failure

Postby Joe_H » Sun Feb 10, 2013 11:23 pm

art_l_j_PlanetAMD64 wrote:Just like in the old days, in the computer room of a VAX 11/780, with ice-cold hurricane-force winds.

That takes me back. I used to manage a server room with two 780s, two 785s, an 8600, two 6310s, a 6420 and assorted other machines over the years from Convex, IBM, PR1ME and more. It got real quiet when the 800 A panel tripped, and even more so when the 600 A panel for the AC went.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Joe_H
Site Admin
 
Posts: 3766
Joined: Tue Apr 21, 2009 4:41 pm
Location: W. MA
