plfah1-*.mskcc.org temporarily going down for maintenance
Moderators: Site Moderators, FAHC Science Team
-
- Pande Group Member
- Posts: 470
- Joined: Fri Feb 22, 2013 9:59 pm
plfah1-*.mskcc.org temporarily going down for maintenance
`plfah1-*.mskcc.org` is having some temporary RAID issues, so the work servers are being suspended for a few hours.
~ The Chodera lab
-
- Posts: 2574
- Joined: Mon Feb 16, 2009 4:12 am
- Location: Greenwood MS USA
Re: plfah1-*.mskcc.org temporarily going down for maintenanc
Thank you for notifying us!
(It is always comforting to know it is not something we did)
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
-
- Pande Group Member
- Posts: 470
- Joined: Fri Feb 22, 2013 9:59 pm
Re: plfah1-*.mskcc.org temporarily going down for maintenanc
Update: The RAID rebuild failed and the controller may be faulty, but we're going to try reseating the cabling first in case that fixes the issue. If not, we'll replace the controller and start rebuilding the RAID, bringing the server back online once the rebuild is complete.
We've heard some sporadic reports that the WUs did not have unaffected servers listed as collection servers (CSs), so we're coordinating with some other FAH Consortium labs to add more offsite collection servers so that disruption will be minimal in case this ever happens again in the future.
More updates soon. Again, our apologies for the downtime here.
The affected server IP range is 140.163.4.231-140.163.4.235
-
- Pande Group Member
- Posts: 470
- Joined: Fri Feb 22, 2013 9:59 pm
Re: plfah1-*.mskcc.org temporarily going down for maintenanc
UPDATE: Reseating the cables did not resolve the issue, so Dell is dispatching a technician and parts within 4 hours today to replace the RAID controller, drawer, and SAS chain cable.
More updates on ETA for restoration once the RAID has started rebuilding.
~ The Chodera Lab
-
- Pande Group Member
- Posts: 470
- Joined: Fri Feb 22, 2013 9:59 pm
Re: plfah1-*.mskcc.org temporarily going down for maintenanc
The hardware vendor apparently doesn't consider the chassis drawer to be covered by our 4-hour onsite warranty, so it is having a replacement drawer shipped instead. Unfortunately, this means the earliest we project being back online following the RAID rebuild is Thu 25 Jan.
Apologies again for the disruption, and I'll update if there is any new information in the meantime.
~ The Chodera lab
-
- Pande Group Member
- Posts: 470
- Joined: Fri Feb 22, 2013 9:59 pm
Re: plfah1-*.mskcc.org temporarily going down for maintenanc
Update: Dell is dispatching a tech with the replacement part this morning! Hopefully we will be back online sooner than planned!
~ The Chodera Lab
-
- Pande Group Member
- Posts: 470
- Joined: Fri Feb 22, 2013 9:59 pm
Re: plfah1-*.mskcc.org temporarily going down for maintenanc
UPDATE: Our awesome Open Systems Group and datacenter team now have the hardware replaced and the RAID is rebuilding, with an ETA for completion of 60+ hours.
~ The Chodera lab
-
- Pande Group Member
- Posts: 470
- Joined: Fri Feb 22, 2013 9:59 pm
Re: plfah1-*.mskcc.org temporarily going down for maintenanc
UPDATE: Estimates suggest approximately 40 hours remain for RAID rebuild.
-
- Posts: 2574
- Joined: Mon Feb 16, 2009 4:12 am
- Location: Greenwood MS USA
Re: plfah1-*.mskcc.org temporarily going down for maintenanc
Dr Chodera,
The donors get error messages with IP addresses, while you have reported the downtime of a server by DNS name. Would it be possible to give us the IP address so we could use your estimated time to rebuild to address donor issues?
Yours
Jimbo
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
-
- Site Admin
- Posts: 7878
- Joined: Tue Apr 21, 2009 4:41 pm
- Hardware configuration: Mac Pro 2.8 quad 12 GB smp4, MacBook Pro 2.9 i7 8 GB smp2
- Location: W. MA
Re: plfah1-*.mskcc.org temporarily going down for maintenanc
These WSs are not currently accepting any connections, so I doubt there will be any reports for a bit. But if you look at the Server Status page, the ones that are down are IP addresses 140.163.4.231-235 and 140.163.4.241-245.
iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
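For anyone matching an IP address from a client error message against the ranges quoted above, here is a minimal Python sketch. The ranges are simply the ones mentioned in this thread (not an official or current list), and the names `AFFECTED_RANGES` and `is_affected` are made up for illustration.

```python
import ipaddress

# Inclusive ranges as quoted in this thread; treat them as an assumption,
# not an authoritative list of affected work servers.
AFFECTED_RANGES = [
    ("140.163.4.231", "140.163.4.235"),
    ("140.163.4.241", "140.163.4.245"),
]


def is_affected(ip: str) -> bool:
    """Return True if ip falls inside any of the quoted ranges."""
    addr = ipaddress.ip_address(ip)
    return any(
        ipaddress.ip_address(low) <= addr <= ipaddress.ip_address(high)
        for low, high in AFFECTED_RANGES
    )


if __name__ == "__main__":
    # Example addresses: one inside a quoted range, one outside.
    for ip in ("140.163.4.232", "140.163.4.240"):
        print(ip, is_affected(ip))
```

If a donor's log shows a failed connection to an address for which this returns True, the failure most likely traces back to the outage described in this thread rather than to anything on the donor's machine.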
-
- Pande Group Member
- Posts: 470
- Joined: Fri Feb 22, 2013 9:59 pm
Re: plfah1-*.mskcc.org temporarily going down for maintenanc
WSs 140.163.4.231-233 are back online! Thanks for your patience.
-
- Pande Group Member
- Posts: 470
- Joined: Fri Feb 22, 2013 9:59 pm
Re: plfah1-*.mskcc.org temporarily going down for maintenanc
I'll be sure to report IP addresses in the subject line next time. Sorry for the hassle!
-
- Posts: 1596
- Joined: Tue May 28, 2013 12:14 pm
- Location: Tokyo
Re: plfah1-*.mskcc.org temporarily going down for maintenanc
Thanks for the efforts and greetings to the IT support team
Please contribute your logs to http://ppd.fahmm.net