Two servers down -- work in progress

Any announcements about FAH policy, servers and new projects will be made here.

Moderators: Site Moderators, FAHC Science Team

VijayPande
Pande Group Member
Posts: 2058
Joined: Fri Nov 30, 2007 6:25 am
Location: Stanford

Two servers down -- work in progress

Post by VijayPande »

We are working on two servers right now:
171.64.65.83
128.59.74.4
We don't have an ETA, but I hope both will be back up later today.
Prof. Vijay Pande, PhD
Departments of Chemistry, Structural Biology, and Computer Science
Chair, Biophysics
Director, Folding@home Distributed Computing Project
Stanford University
mrshirts
Pande Group Member
Posts: 54
Joined: Sat Apr 26, 2008 4:32 am

Re: Two servers down -- work in progress

Post by mrshirts »

Unfortunately, it looks like there are RAID problems with 128.59.74.4. The RAID will need to be rebuilt, which will likely take a few days. It's possible that jobs can be accepted before then with the correct configuration, but I can't say for sure. I will continue to post here as more information comes in.

128.59.74.4 has been a troubled server, and it will be retired once the RAID array is back up and it has accepted the outstanding jobs.
toTOW
Site Moderator
Posts: 6309
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: Two servers down -- work in progress

Post by toTOW »

mrshirts wrote: Update on 128.59.74.4:
The good news is that all the data (2 TB) is safe. I was able to rebuild and mount the RAID. The bad news is that the server won't boot normally. Since it's actually at Columbia (where I don't work anymore), I'm a bit at the mercy of the IT support staff there in terms of getting it up and running again. The current plan is therefore to copy off the data needed to continue the projects, and to try to relay the IP to a different machine, putting it into accept-only mode. The timeline is probably going to be about a week, unfortunately.

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
toTOW
Site Moderator
Posts: 6309
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: Two servers down -- work in progress

Post by toTOW »

mrshirts wrote: Latest update: another machine should be in place at Columbia on Monday, so I should be able to forward WUs to a working server at that point. I'll post again on Monday with an update. Apologies for the inconvenience -- 128.59.74.4 will of course be retired, and things should work more smoothly here at U Va, since I'll be directly maintaining the machines.

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
MstrBlstr
Posts: 578
Joined: Thu Nov 29, 2007 7:03 pm
Location: Texas

Re: Two servers down -- work in progress

Post by MstrBlstr »

mrshirts wrote: The IT guy at Columbia has a backlog, so the computer to relay the WUs is not up yet. I'll keep updating as more information becomes available. Since the timeout is long (60 days), hopefully most people will not be affected in the end.
-=MB=-
toTOW
Site Moderator
Posts: 6309
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: Two servers down -- work in progress

Post by toTOW »

mrshirts wrote: Good news! Although we were not able to bring the machine up at Columbia, we set up another machine and I configured it to forward the traffic to the new Virginia machine. I can see the WUs rolling in as I write. All the old data is preserved as well.

Because of this port forwarding, it will take a little longer to set things up so that the credits are put into the database automatically, but we'll get it taken care of. At this point, the WUs are being accepted, the stats files are being created, and they will be processed, though that may take until Monday. But the major problem has been fixed.

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.