[solved] fah6: relocation error [DEBIAN/sidux]

Postby ThunderRd » Tue Jun 01, 2010 9:05 am

Does anyone have any thoughts on this log from my sidux machine, running the 6.29 client on SMP?

I have never had a hint of a problem on it, but today, after finishing a WU, it did this:

Code: Select all
Writing final coordinates.
[20:31:38] Completed 500000 out of 500000 steps  (100%)

 Average load imbalance: 2.3 %
 Part of the total run time spent waiting due to load imbalance: 0.9 %


   Parallel run - timing based on wallclock.

               NODE (s)   Real (s)      (%)
       Time:  18397.176  18397.176    100.0
                       5h06:37
               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:     49.896      3.226      0.709     33.842

Thanx for Using GROMACS - Have a Nice Day

[20:31:38] DynamicWrapper: Finished Work Unit: sleep=10000
[20:31:48]
[20:31:48] Finished Work Unit:
[20:31:48] - Reading up to 3699216 from "work/wudata_05.trr": Read 3699216
[20:31:48] trr file hash check passed.
[20:31:48] edr file hash check passed.
[20:31:48] logfile size: 66253
[20:31:48] Leaving Run
[20:31:51] - Writing 3800621 bytes of core data to disk...
[20:31:52]   ... Done.
[20:34:19] - Shutting down core
[20:34:19]
[20:34:19] Folding@home Core Shutdown: FINISHED_UNIT
[20:34:41] CoreStatus = 64 (100)
[20:34:41] Unit 5 finished with 73 percent of time to deadline remaining.
[20:34:41] Updated performance fraction: 0.755146
[20:34:41] Sending work to server
[20:34:41] Project: 6051 (Run 0, Clone 60, Gen 29)


[20:34:41] + Attempting to send results [May 31 20:34:41 UTC]
[20:34:41] - Reading file work/wuresults_05.dat from core
[20:34:41]   (Read 3800621 bytes from disk)
[20:34:41] Connecting to http://171.64.65.54:8080/
[20:35:17] Posted data.
[20:35:18] Initial: 0000; - Uploaded at ~100 kB/s
[20:35:18] - Averaged speed for that direction ~86 kB/s
[20:35:18] + Results successfully sent
[20:35:18] Thank you for your contribution to Folding@Home.
[20:35:18] + Number of Units Completed: 1135

[20:36:05] Trying to send all finished work units
[20:36:05] + No unsent completed units remaining.
[20:36:05] - Preparing to get new work unit...
[20:36:05] Cleaning up work directory
[20:36:05] + Attempting to get work packet
[20:36:05] Passkey found
[20:36:05] - Will indicate memory of 2013 MB
[20:36:05] - Connecting to assignment server
[20:36:05] Connecting to http://assign.stanford.edu:8080/
fah6: relocation error: /lib/libnss_files.so.2: symbol __rawmemchr, version GLIBC_2.2.5 not defined in file libc.so.6 with link time reference
thunderrd@OPTERON-185:~/FAH6$
thunderrd@OPTERON-185:~/FAH6$ fah

Note: Please read the license agreement (fah6 -license). Further
use of this software requires that you have read and accepted this agreement.

2 cores detected


--- Opening Log file [June 1 08:57:56 UTC]


# Linux SMP Console Edition ###################################################
###############################################################################

                       Folding@Home Client Version 6.29

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /home/thunderrd/FAH6
Executable: ./fah6
Arguments: -smp -advmethods -verbosity 9

[08:57:56] - Ask before connecting: No
[08:57:56] - User name: ThunderRd (Team 45)
[08:57:56] - User ID: 576C1872220C527
[08:57:56] - Machine ID: 1
[08:57:56]
[08:57:56] Work directory not found. Creating...
[08:57:56] Could not open work queue, generating new queue...
[08:57:56] - Preparing to get new work unit...
[08:57:56] Cleaning up work directory
[08:57:56] - Autosending finished units... [08:57:56]
[08:57:56] + Attempting to get work packet
[08:57:56] Trying to send all finished work units
[08:57:56] Passkey found
[08:57:56] + No unsent completed units remaining.
[08:57:56] - Autosend completed
[08:57:56] - Will indicate memory of 2013 MB
[08:57:56] - Connecting to assignment server
[08:57:56] Connecting to http://assign.stanford.edu:8080/
fah6: relocation error: /lib/libnss_files.so.2: symbol __rawmemchr, version GLIBC_2.2.5 not defined in file libc.so.6 with link time reference
thunderrd@OPTERON-185:~/FAH6$


Interestingly enough, I ran apt-get dist-upgrade just yesterday, and everything seemed to upgrade flawlessly. I'm wondering if this could be similar to the Fedora 13 problem people have been complaining about lately, maybe a version mismatch or something? Is this machine suddenly a candidate for the alternate fah executables, fah6.static or fah6_alt?
Last edited by ThunderRd on Tue Jun 01, 2010 1:43 pm, edited 1 time in total.
ASUS Maximus Extreme X38 - QX9650@4.2G - 8G Corsair Dominator DDR3-2000 - GTX470 - Win7 Pro, Driver 305.68 running GPU3 + SMP
ASUS P5Q Pro Turbo P45 - Q6600@3.5G - 4G HyperX DDR2-1066 - GT440 - Gentoo/aptosid, Driver 304.51 running GPU3 [in WINE] + SMP
ThunderRd
 
Posts: 123
Joined: Sun Dec 02, 2007 5:30 am
Location: Nong Khai, Thailand

Re: fah6: relocation error, never seen before

Postby toTOW » Tue Jun 01, 2010 9:17 am

It's a glibc problem. It's known to affect the latest Ubuntu and Fedora distributions.

Here are two links to the issues:
viewtopic.php?f=44&t=13064
viewtopic.php?f=44&t=12939

There are fixes posted by tear.
Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.

FAH-Addict : latest news, tests and reviews about Folding@Home project.

toTOW
Site Moderator
 
Posts: 7999
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: fah6: relocation error, never seen before

Postby ThunderRd » Tue Jun 01, 2010 11:52 am

I read those threads before I posted; I just wasn't sure if the problem was the same. The Hus fix is for "Could not CosmHTTPOpen", which I do not see, but tear has deprecated his fix as of yesterday and said to use it: is it correct for me? I mean, running sidux or Debian Sid is a bit touchy at times, and I don't want to screw something up...

In the meantime, while waiting for someone to post, I tried fah6.static and fah6_alt.

fah6_alt runs, but returns an endless "no appropriate server available". fah6.static gives me the immediate relocation error. What does this mean to me?

Thanks for the help, totow.
ThunderRd
 
Posts: 123
Joined: Sun Dec 02, 2007 5:30 am
Location: Nong Khai, Thailand

Re: fah6: relocation error, never seen before

Postby gizmo » Tue Jun 01, 2010 12:10 pm

On Fedora, the problem appears to have been worked around by simply running nscd. You could try that and see what happens.
gizmo
 
Posts: 34
Joined: Mon Sep 21, 2009 1:35 am

Re: fah6: relocation error, never seen before

Postby tear » Tue Jun 01, 2010 1:18 pm

Couldn't CosmHTTPOpen
Cannot get ID from server
relocation error

are all manifestations of pretty much the same problem* (due to subtle differences between Ubuntu and Fedora you get different output, but the root cause is the same).

The nscd workaround will also work on Ubuntu, but you'll need to explicitly enable host caching (in /etc/nscd.conf) per smoking2000's post.

Having said that, the sequence of operations would be (I haven't tested it; I'm traveling):
Code: Select all
sudo apt-get install nscd
# open /etc/nscd.conf with your $EDITOR (root privileges are needed to save it)
# find the following line:
#         enable-cache            hosts           no
# change "no" to "yes"
# save and exit the editor
sudo service nscd restart
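
For anyone who would rather script the edit than open an editor, here is a rough sketch. It has not been run against a real /etc/nscd.conf; the function name enable_hosts_cache is made up for illustration, and it assumes the directive sits on a single line in the form shown above. Back up the file before trying it.

```shell
# Hypothetical helper (not part of nscd): flip "enable-cache hosts no" to "yes"
# in an nscd.conf-style file given as the first argument.
enable_hosts_cache() {
    conf="$1"
    tmp="$(mktemp)" || return 1
    # Match the directive regardless of how much whitespace separates the fields.
    sed -E 's/^([[:space:]]*enable-cache[[:space:]]+hosts[[:space:]]+)no([[:space:]]*)$/\1yes\2/' "$conf" > "$tmp" \
        && cat "$tmp" > "$conf"   # write back in place so ownership and mode are kept
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

To apply it for real, the function would have to run as root against /etc/nscd.conf, followed by the same "sudo service nscd restart" as above.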


EDIT, Jul 21:
Upon installation "nscd" should already be configured to start
automatically; if, for any reason, that's not happening, issue:
Code: Select all
sudo update-rc.d nscd enable


EDIT, Aug 26
*) which is: 6.x FAH client Linux binaries are non-portable and work only by accident; numerous details are available.

HTH,
tear
Last edited by tear on Thu Aug 26, 2010 3:39 pm, edited 3 times in total.
One man's ceiling is another man's floor.
tear
 
Posts: 924
Joined: Sun Dec 02, 2007 4:08 am
Location: Rocky Mountains

Re: fah6: relocation error, never seen before

Postby ThunderRd » Tue Jun 01, 2010 1:42 pm

Yep, manually enabling the hosts caching in /etc/nscd.conf did it perfectly. It's running now, so it appears there are no significant differences in sidux.

Thanks for the kind help, gentlemen. You too, gizmo ;) See you at HQ.
ThunderRd
 
Posts: 123
Joined: Sun Dec 02, 2007 5:30 am
Location: Nong Khai, Thailand

Re: [solved] fah6: relocation error [DEBIAN/sidux]

Postby pcowley » Sat Jul 10, 2010 6:09 am

This fix is very strange: it worked on one machine, which got me a little excited, but alas, not on the other.

I have 2 x AMD 64 dual core machines. On my primary one, the nscd fix above worked. But on the other one it did not work after following the same procedure. I still get:
Code: Select all
[06:04:25] Loaded queue successfully.
[06:04:25]
[06:04:25] - Autosending finished units... [06:04:25]
[06:04:25] + Processing work unit
[06:04:25] Trying to send all finished work units
[06:04:25] Core required: FahCore_a3.exe
[06:04:25] + No unsent completed units remaining.
[06:04:25] - Autosend completed
[06:04:25] Core not found.
[06:04:25] - Core is not present or corrupted.
[06:04:25] - Attempting to download new core...
[06:04:25] + Downloading new core: FahCore_a3.exe
[06:04:25] Downloading core (/~pande/Linux/AMD64/Core_a3.fah from www.stanford.edu)
fah6: relocation error: /lib/libnss_files.so.2: symbol __rawmemchr, version GLIBC_2.2.5 not defined in file libc.so.6 with link time reference


I have also cold-booted the machine it did not work on, but it made no difference (not that I really expected it to!)
I am running Ubuntu 10.04 on both machines and they both have all the latest updates.

Bugger!!! <grin>

Any further ideas?

Cheers
Pete
pcowley
 
Posts: 20
Joined: Mon Sep 01, 2008 10:12 am

Re: [solved] fah6: relocation error [DEBIAN/sidux]

Postby tear » Sat Jul 10, 2010 6:47 am

I've received one report of intermittent failures with nscd. If you restart the client, say, 5 times, does it make any difference?

Also, please post the complete log (starting at the client's very startup) next time.

It seems like your client downloaded a WU successfully and is now attempting to fetch the FahCore.
Can you confirm whether the WU download occurred on Ubuntu 10.04? (That would suggest something did work.)

Did you modify /etc/nscd.conf to enable host caching? Can you double check please?
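
One quick way to double-check the setting is to grep for an uncommented directive. This is just a sketch: hosts_cache_enabled is a made-up helper name, and it assumes the single-line "enable-cache hosts yes" form discussed earlier in the thread.

```shell
# Hypothetical helper (not part of nscd or the FAH client):
# exits 0 only if the given nscd.conf-style file has the hosts cache enabled.
hosts_cache_enabled() {
    # Look for an uncommented line of the form: enable-cache hosts yes
    grep -Eq '^[[:space:]]*enable-cache[[:space:]]+hosts[[:space:]]+yes[[:space:]]*$' "$1" 2>/dev/null
}

# Example:
#   if hosts_cache_enabled /etc/nscd.conf; then echo "hosts cache enabled"; else echo "hosts cache NOT enabled"; fi
```

A missing or unreadable file is treated the same as "not enabled", which seems like the safer default when deciding whether to start the client.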


tear
tear
 
Posts: 924
Joined: Sun Dec 02, 2007 4:08 am
Location: Rocky Mountains

Re: [solved] fah6: relocation error [DEBIAN/sidux]

Postby johnT89 » Mon Jul 19, 2010 11:58 am

Aren't there any security or miscellaneous issues with installing nscd and enabling host cache?
johnT89
 
Posts: 56
Joined: Sun Aug 23, 2009 4:43 pm

Re: [solved] fah6: relocation error [DEBIAN/sidux]

Postby tear » Mon Jul 19, 2010 3:06 pm

Some concerns had been expressed by Debian glibc maintainers (see here) but unless you use NIS/Kerberos logons you should be fine.

tear
tear
 
Posts: 924
Joined: Sun Dec 02, 2007 4:08 am
Location: Rocky Mountains

Re: [solved] fah6: relocation error [DEBIAN/sidux]

Postby noorman » Wed Jul 21, 2010 9:40 pm

.

Had done the change to the nscd conf and it worked, till tonight.

Can't remember if I had done a system update yesterday, but when I tried to launch SMP Linux tonight, I got the 'relocation' error again.

Checked the setting in the nscd.conf file and it was still at 'yes'.
Conclusion, I needed to try a restart of the service.
And, indeed, that was needed to enable Linux SMP again ...

Was this due to a system update or caused by something else? (The service had been stopped, but not (knowingly) by me ...)
Any clues?

.
- stopped Linux SMP w. HT on i7-860@3.5 GHz
....................................
Folded since 10-06-04 till 09-2010
noorman
 
Posts: 553
Joined: Sun Dec 02, 2007 2:26 pm
Location: Belgium, near the International Sea-Port of Antwerp

Re: [solved] fah6: relocation error [DEBIAN/sidux]

Postby tear » Wed Jul 21, 2010 10:04 pm

"service" command works run-time; changes made by it do not last across reboots.

You'd need to use chkconfig (or a similar tool) to store the preference.
I don't remember how to do it off the top of my head (at work now) but look
and you shall find :-)


tear
tear
 
Posts: 924
Joined: Sun Dec 02, 2007 4:08 am
Location: Rocky Mountains

Re: [solved] fah6: relocation error [DEBIAN/sidux]

Postby tear » Thu Jul 22, 2010 3:12 am

Home now. Two datapoints:
1. I recalled that when I had installed nscd it got set up for automatic startup
2. The tool you need to use is update-rc.d, i.e. "sudo update-rc.d nscd enable" -- this will store your preference
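
Since the report earlier in the thread was that the service had stopped without anyone noticing, a pre-launch check may help. The sketch below is an assumption-laden illustration: process_running is a made-up helper, and it relies on Linux exposing /proc/<pid>/comm (kernels older than 2.6.33 do not have that file).

```shell
# Hypothetical helper: succeed if a process with exactly this name is running.
# Linux-only assumption: reads /proc/<pid>/comm rather than relying on ps or pgrep.
process_running() {
    name="$1"
    for comm_file in /proc/[0-9]*/comm; do
        [ -r "$comm_file" ] || continue
        # A process may exit between the glob and the read, so ignore read errors.
        if [ "$(cat "$comm_file" 2>/dev/null)" = "$name" ]; then
            return 0
        fi
    done
    return 1
}

# Example pre-launch check (commented out: needs root and a real nscd install):
#   process_running nscd || sudo service nscd start
#   ./fah6 -smp -advmethods -verbosity 9
```

This only works around a stopped daemon for the current session; "sudo update-rc.d nscd enable" is still what stores the startup preference.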


HTH,
tear
tear
 
Posts: 924
Joined: Sun Dec 02, 2007 4:08 am
Location: Rocky Mountains

Re: [solved] fah6: relocation error [DEBIAN/sidux]

Postby noorman » Thu Jul 22, 2010 8:42 am

.

Very, very odd.
I had nscd installed and configured shortly after it was posted, and I've been running Linux SMP at night and on weekends (lower power tariff here)!
In all these runs I never had this error again. I run this system in dual-boot, WinXP and Ubuntu 10.04 Desktop.
So it has been rebooted and even shut down (for air cooling maintenance) several times without any hiccups.

Only last night, to my utter surprise, I got this message, so I linked it to the possibility that a system upgrade had changed one or more settings ...

Anyway, I tracked back to that original post about the nscd solution and found the update; I just ran that command and hope this is fixed now.

Thanks to 'tear' for getting this fix in so soon. I'm a happy Folder again. Last night I just reran the service start command; that fixed it for that session :D

.
Last edited by noorman on Thu Jul 22, 2010 2:31 pm, edited 1 time in total.
noorman
 
Posts: 553
Joined: Sun Dec 02, 2007 2:26 pm
Location: Belgium, near the International Sea-Port of Antwerp

Re: [solved] fah6: relocation error [DEBIAN/sidux]

Postby tear » Thu Jul 22, 2010 1:47 pm

I wouldn't be surprised if there was a bug in nscd that would cause it to... crash given some... input (or something).

Your report makes it the third (you said it was a sudden change in behaviour):
-- the first one was reported through non-forum channels
-- the second one is here: viewtopic.php?f=44&t=15297 (whoever locked the thread apparently didn't give the subject a lot of thought)

Every one of them happened on Ubuntu too.

Let me know if it happens again.


Thanks,
tear
tear
 
Posts: 924
Joined: Sun Dec 02, 2007 4:08 am
Location: Rocky Mountains
