General problem with Stanford Servers?

Moderators: Site Moderators, FAHC Science Team

Mr.Nosmo
Posts: 26
Joined: Fri Oct 17, 2008 9:04 am
Hardware configuration: i7-980X@4200 (AirCooled) w/6GB DDR3-1600 (8-8-8-20), 200GB Vertex2 SSD & 3xGTS-250 w/22" Eizo Monitors.

General problem with Stanford Servers?

Post by Mr.Nosmo »

Lately I have had issues with getting work and uploading work, and there are quite a few posts in the forum about this from other folders. Maybe it's because of "Internet Security" software, ISPs, or other factors?

Does Stanford have a general issue with the servers, or is it me asking too much? I'm not used to downtime, because I used to work on IBM zSeries systems, and downtime is something "we" can't accept...

I'm not complaining, and I'll continue to fold as long as the project runs (or until I die), since I can afford the electricity bill. But I'd be happy to get a bit of info about the servers and maybe start a brainstorm on how the uptime can be improved...

Please chime in with some positive input!
John_Weatherman
Posts: 289
Joined: Sun Dec 02, 2007 4:31 am
Location: Carrizo Plain National Monument, California
Contact:

Re: General problem with Stanford Servers?

Post by John_Weatherman »

There have been some server issues, but nothing major. If you're having problems, post your log file and it can be checked out. A little downtime is something folders must get used to, but luckily it doesn't normally mess up the science. And that's what we're all here for.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: General problem with Stanford Servers?

Post by bruce »

IBM is in the business of providing high-availability solutions. If a merchant can't get a credit card authorization in a few seconds, it's a serious failure. FAH is not like that.

FAH takes a distributed approach where (almost) any component of the system can fail occasionally, and it may take some time to repair. As long as the overall functionality isn't seriously impacted, some downtime is acceptable; i.e., a component can be down as long as the overall system is still functional.
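That tolerance argument can be made concrete with a back-of-the-envelope availability calculation. The 95% per-server uptime below is purely an illustrative assumption, not a measured FAH figure:

```python
# Back-of-the-envelope: if each of N interchangeable work servers is
# independently up a fraction p of the time, the chance that at least
# one of them can hand out work is 1 - (1 - p)**N.
# The 0.95 uptime figure is an assumption for illustration only.
def fleet_availability(p, n_servers):
    return 1 - (1 - p) ** n_servers

for n in (1, 2, 4):
    print(f"{n} server(s): {fleet_availability(0.95, n):.6f} effective uptime")
```

Even modest per-server uptime multiplies out to very high fleet availability, which is why a single down server rarely stops anyone from folding.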

As you collect gripes from others and attempt to draw overall conclusions about FAH (concluding that "we" can't accept downtime), consider the following:
> If I can't get work from my favorite server yet I can get work from some other server, should that be called a failure?
(>>> If the other server gives me a WU with a lower PPD, is that a failure?)
> If I can't upload right now but the result does find its way home in a few hours, should that be called a failure?
> If somebody discards a WU (intentionally or unintentionally), should that be called a FAH failure?
> If I can't get work for XX minutes, should that be called a failure?
(>>> ... and how should XX compare to the "few seconds" in my credit card authorization example?)
> If some points are temporarily "lost" but a re-credit is processed manually some time later, should that be called a failure?
7im
Posts: 10189
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona
Contact:

Re: General problem with Stanford Servers?

Post by 7im »

Yes, Stanford does have problems. They are underfunded, understaffed, and overworked students and faculty. They are not technology professionals (well, most aren't). As Bruce indicated, the project usually has many servers to provide and collect work, so individual server uptime is not a critical issue. And even when that doesn't work, the FAH client is designed to cache the completed work unit, download new work from a different server, and keep processing until a solution is found. Even then, a FAH data packet is not the same priority as a banking transaction data packet. In real life, the bank is the priority. (To me, the FAH data packet is more important, because I'm guaranteed to die someday, but never guaranteed to have a lot of money ;))
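That fall-back-to-another-server behavior can be sketched as a simple failover loop. Everything here is hypothetical for illustration (`fetch_work`, `demo_fetch`, and the server names are stand-ins, not the real client's code or hosts):

```python
import time

def fetch_work(servers, fetch, retries=3, backoff=0.0):
    """Ask each server in turn for a work unit; retry the whole list,
    pausing between rounds. One reachable server is enough, which is
    why individual-server downtime rarely stops a client from folding."""
    for _round in range(retries):
        for server in servers:
            try:
                return fetch(server)   # success on any server ends the search
            except ConnectionError:
                continue               # server down or slow: try the next one
        time.sleep(backoff)            # wait before the next full round
    raise RuntimeError("no work server reachable")

# Hypothetical demo: the first server always times out, the second answers.
def demo_fetch(server):
    if server == "work1.example.org":
        raise ConnectionError("timed out")
    return {"server": server, "unit": "WU-1234"}

print(fetch_work(["work1.example.org", "work2.example.org"], demo_fetch))
```

The real client also keeps completed results cached on disk until an upload finally succeeds, so a collection-server outage delays credit rather than losing work.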

Brainstorm...

1. Huge donation of IBM servers, IBM service, or cash, or all three.
2. Patience while Stanford continues the process of installing, converting over to, and optimizing the new servers that $100,000 purchased for them last year.
3. Patience while the client and server code is rewritten from the ground up to be more up to date, reliable, and easier to maintain. (The second half of #1 could help speed this part along.) The V7 client may help, but the new server code is a much bigger job.
4. Patience while the Pande Group gets its new crop of researchers up to speed and expands the research and the location of FAH servers to several other universities. (Co-location can be a great uptime helper, as long as the sites are well managed and well integrated.)
5. And if no patience is available, then tolerance is an acceptable alternative. ;)

Any additions to what's already taking place? :D


P.S. The head of the project addressed a similar issue... and I hope he doesn't mind if I repost it...
VijayPande wrote:One has to put this all in perspective. Supercomputer centers have 10x to 100x the budget we have for operations and still are often down over the weekends for much longer than FAH is when something unexpected comes up. We have some very dedicated people in our team -- people willing to do fixes on weekends and holidays -- but they do have to sleep. Also, running a FAH server is not like running apache (it is a lot more complex and people aren't familiar with it), so hiring a 3rd party firm to manage off hours wouldn't work (or would be very, very expensive).

So, if you see a problem that isn't being fixed and it's in between 10:30pm and 7:30am pacific time, odds are it will have to wait until 8:30am pacific time or so for someone to deal with it. We've built a lot of redundancy into FAH operations, but there are limits to this too, especially in very early beta projects like GPU3. Hopefully with this in mind, people can have a better sense of when fixes can be made, and how hard we work to fix them as quickly as possible.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
Bobby-Uschi
Posts: 70
Joined: Thu Jul 31, 2008 3:26 pm
Hardware configuration: PC1//C2Q-Q9450,GA-X48-DS5-NinjaMini,GTX285,2x160GB Western Sata2,2x1GB Geil800,Tagan 800W;XP Pro SP3-32Bit;
PC2//C2Q-Q2600k.GB-P67UD4-Freezer 7Pro,GTX285Leadtek,260 GB Western Sata2,4x2GB GeilPC3,OCZ600W;Win7-64Bit;Siemens 22"
Location: Deutschland

Re: General problem with Stanford Servers?

Post by Bobby-Uschi »

http://fah-web.stanford.edu/serverstat.html -????
It will not work for GPU2. No connection to any GPU server.
PC1//C2Q-Q9450,GA-X48-DS5-,2xGTX285,2x160GB Western Sata2,2x1GB Geil800,Tagan 800W;XP Pro SP3-32Bit
PC2//C2Q-Q2600k.GB-P67UD4-Freezer 7Pro,GTX285Leadtek,260 GB Western Sata2,4x2GB GeilPC3,OCZ600W;Win7-64Bit;Siemens 22"
toTOW
Site Moderator
Posts: 6312
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France
Contact:

Re: General problem with Stanford Servers?

Post by toTOW »

Some servers are pretty slow to answer, but I got work on my two NV GPUs after a few attempts ...

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
weedacres
Posts: 138
Joined: Mon Dec 24, 2007 11:18 pm
Hardware configuration: UserNames: weedacres_gpu ...
Location: Eastern Washington

Re: General problem with Stanford Servers?

Post by weedacres »

Lots of hand-tending this morning: I've had to manually restart GPU clients that hung while trying to download from 171.67.108.20 and .21.
Bobby-Uschi
Posts: 70
Joined: Thu Jul 31, 2008 3:26 pm
Hardware configuration: PC1//C2Q-Q9450,GA-X48-DS5-NinjaMini,GTX285,2x160GB Western Sata2,2x1GB Geil800,Tagan 800W;XP Pro SP3-32Bit;
PC2//C2Q-Q2600k.GB-P67UD4-Freezer 7Pro,GTX285Leadtek,260 GB Western Sata2,4x2GB GeilPC3,OCZ600W;Win7-64Bit;Siemens 22"
Location: Deutschland

Re: General problem with Stanford Servers?

Post by Bobby-Uschi »

Thanks toTOW,
Two machines are working again, but one is without work.
The servers are so slow...
Thank you
PC1//C2Q-Q9450,GA-X48-DS5-,2xGTX285,2x160GB Western Sata2,2x1GB Geil800,Tagan 800W;XP Pro SP3-32Bit
PC2//C2Q-Q2600k.GB-P67UD4-Freezer 7Pro,GTX285Leadtek,260 GB Western Sata2,4x2GB GeilPC3,OCZ600W;Win7-64Bit;Siemens 22"