171.64.122.78/171.64.122.76 down?

Moderators: Site Moderators, FAHC Science Team

nid-de-corbeau
Posts: 2
Joined: Thu Jan 31, 2008 8:40 am

171.64.122.78/171.64.122.76 down?

Post by nid-de-corbeau »

Since 23rd August, every day, I have received the following messages indicating that neither 171.64.122.78 nor 171.64.122.76 can accept my WU.
I am about to finish another one.
I have also posted in the group for server 171.64.122.78 but did not get any helpful reply.
Does anyone have any ideas as to how I can get the WU uploaded?

Thank you.

[04:36:54] - Couldn't send HTTP request to server
[04:36:54] + Could not connect to Work Server (results)
[04:36:54] (171.64.122.78:8080)
[04:36:54] - Error: Could not transmit unit 07 (completed August 23) to work server.


[04:36:54] + Attempting to send results
[04:36:55] - Couldn't send HTTP request to server
[04:36:55] + Could not connect to Work Server (results)
[04:36:55] (171.64.122.76:8080)
[04:36:55] Could not transmit unit 07 to Collection server; keeping in queue.
Baowoulf
Posts: 208
Joined: Wed Dec 12, 2007 8:44 pm
Hardware configuration: Pentium 4 2.8 GHz, 512MB DDR Ram, 128MB Radeon 9800, Creative Soundblaster Audigy 4 Pro
Location: Jupiter 6

Re: 171.64.122.78/171.64.122.76 down?

Post by Baowoulf »

171.64.122.78 is in REJECT atm on the server page. The other one seems OK; it doesn't list any WUs waiting to be returned, but it has a netload of 201.
ppetrone
Pande Group Member
Posts: 115
Joined: Wed Dec 12, 2007 6:20 pm
Location: Stanford

Re: 171.64.122.78/171.64.122.76 down?

Post by ppetrone »

Hi there,

Yes, the server is in REJECT. Sorry for the inconvenience. The server is currently under maintenance.
viewtopic.php?f=18&t=5064

Thanks,

Paula
nid-de-corbeau
Posts: 2
Joined: Thu Jan 31, 2008 8:40 am

Re: 171.64.122.78/171.64.122.76 down?

Post by nid-de-corbeau »

Hi. Sorry to be repetitive, but I haven't had a clear answer on this and, from other postings I see, I suspect that many contributors are affected by this problem and may not be aware of it.

1. OK, I know that 171.64.122.78:8080 has been down for weeks - fair enough.

2. But the only other server that my winfah attempts to connect to is supposed to be up, according to what is being said in the forums. Yet I have tried to connect to both of these servers, 171.64.122.78:8080 AND 171.64.122.76:8080, hundreds of times without success ("[15:45:27] - 295 failed uploads of this unit.").

winfah communicates very briefly with the [down] 171.64.122.78:8080, but it often spends 10 to 30 seconds communicating with 171.64.122.76:8080 before failing. That makes me think that there is communication, but that the work unit is failing to be sent for some reason that is not apparent.
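
One way to tell a fast refusal from a slow stall is to time a bare TCP connection to each address. The following is a minimal Python sketch, not part of the FAH client. `probe` is a hypothetical helper name, and a successful TCP connection only shows that the port accepts connections, not that the server will accept a result upload.

```python
import socket
import time


def probe(host, port, timeout=30.0):
    """Try one TCP connection and return (outcome, seconds elapsed).

    Illustrative helper only: it does not speak the FAH upload protocol.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            outcome = "connected"
    except ConnectionRefusedError:
        # The host answered at once, but nothing is listening on the port.
        outcome = "refused"
    except socket.timeout:
        # No answer at all within `timeout` seconds.
        outcome = "timed out"
    except OSError as error:
        # Any other network error (unreachable host, DNS failure, ...).
        outcome = f"failed: {error}"
    return outcome, time.monotonic() - start


# Example call (needs network access):
#   print(probe("171.64.122.76", 8080))
```

A quick "refused" suggests nothing is listening on the port, while a long delay before a timeout or failure is more consistent with an overloaded or unreachable server.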

Can anyone clarify why uploads to 171.64.122.76:8080 should fail hundreds of times while the server is supposedly up and accepting WUs?

And is it normal that winfah would attempt to connect only to these two servers, and to no others, for uploading work units, for weeks at a time?

I have tried using the qd utility to look at the work queue (output below), and I have tried deleting every finished work unit from the queue leaving only this one unit, and I have tried using the qfix utility to repair the queue. Nothing has helped. Nothing.

[15:45:27] + Attempting to send results
[15:45:27] - Reading file work/wuresults_07.dat from core
[15:45:27] (Read 1604493 bytes from disk)
[15:45:27] Connecting to http://171.64.122.76:8080/
[15:45:27] - Couldn't send HTTP request to server
[15:45:27] + Could not connect to Work Server (results)
[15:45:27] (171.64.122.76:8080)
[15:45:27] Could not transmit unit 07 to Collection server; keeping in queue.
[15:45:27] - Failed to send unit 07 to server
[15:45:27] ***** Got a SIGTERM signal (2)

----- qd.exe output ------------
qd released 18 June 2008 (fr 069)
qd executed Fri Sep 12 01:53:57 AUS Eastern Standard Time 2008 (Thu Sep 11 15:53:57 UTC 2008)
Queue version 5.01
Current index: 8
Index 9: empty

...(uploaded work units deleted)...

Index 6: deleted 234.00 pts
server: 171.65.103.160:8080; project: 2170
Folding: run 28, clone 170, generation 21; benchmark 1408; misc: 500, 200
issue: Wed Jul 30 00:46:23 2008; begin: Wed Jul 30 00:44:53 2008
end: ZERO; due: Sat Oct 04 00:44:53 2008 (66 days)
preferred: Sat Sep 13 00:44:53 2008 (45 days)
core URL: http://www.stanford.edu/~pande/Win32/x86Core_82.fah
CPU: 1,687 Pentium II/III; OS: 1,7 Win2K
tag: P2170R28C170F21
assignment info (le): Wed Jul 30 00:46:22 2008; BB79C9E1
CS: 171.64.122.76; P limit: 5241856
user: xxxxxx; team: 24; ID: yyyyyyyyyy; mach ID: 1
work/wudata_06.dat file size: 94250; WU type: Folding@Home

Index 7: ready for upload 225.00 pts (0.506 pt/hr) 3.4 X min speed
server: 171.64.122.78:8080; project: 4432, "p4432_Seq41_Amber03"
Folding: run 420, clone 1, generation 15; benchmark 1408; misc: 500, 102
issue: Tue Aug 05 13:08:51 2008; begin: Tue Aug 05 13:07:08 2008
end: Sun Aug 24 01:25:47 2008; due: Tue Oct 07 13:07:08 2008 (63 days)
preferred: Wed Sep 17 13:07:08 2008 (43 days)
core URL: http://www.stanford.edu/~pande/Win32/x86Core_78.fah (V1.90)
CPU: 1,687 Pentium II/III; OS: 1,7 Win2K
tag: P4432R420C1F15
assignment info (le): Tue Aug 05 13:08:49 2008; BB73BD70
CS: 171.64.122.76; upload failures: 295; P limit: 5241856
user: xxxxxx; team: 24; ID: yyyyyyyyyy; mach ID: 1
work/wudata_07.dat file size: 238621; WU type: Folding@Home

Index 8: finished 234.00 pts (0.705 pt/hr) 4.78 X min speed
server: 171.65.103.160:8080; project: 2170
Folding: run 60, clone 854, generation 6; benchmark 3468; misc: 500, 200
issue: Wed Aug 27 01:01:41 2008; begin: Wed Aug 27 01:01:44 2008
end: Tue Sep 09 20:44:10 2008; due: Sat Nov 01 02:01:44 2008 (66 days)
preferred: Sat Oct 11 01:01:44 2008 (45 days)
core URL: http://www.stanford.edu/~pande/Win32/x86Core_82.fah
CPU: 1,87 Pentium IV; OS: 1,7 Win2K
tag: P2170R60C854F6
assignment info (le): Wed Aug 27 01:01:39 2008; BB1EF70C
CS: 171.64.122.76; P limit: 5241856
user: xxxxxx; team: 24; ID: yyyyyyyyyy; mach ID: 1
work/wudata_08.dat file size: 93504; WU type: Folding@Home
Results successfully sent: Tue Jan 01 08:30:40 2008
Average download rate 71.990 KB/s (u=4); upload rate 34.864 KB/s (u=4)
Performance fraction 0.842710 (u=4)
Average pph: 0.648, ppd: 15.54, ppw: 108.8, ppy: 5676
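
The fields that matter for a stuck upload can be pulled out of qd output like the above with a short script. This is a rough Python sketch that assumes the line layout shown in this post (`Index N:`, `server:`, `CS:`). The name `pending_uploads` and the regular expressions are illustrative and are not part of qd.

```python
import re

# Patterns assume the qd line layout quoted in this post.
INDEX_LINE = re.compile(r"^Index (\d+): (.+)$")
SERVER_LINE = re.compile(r"^server: ([\d.]+:\d+);")
CS_LINE = re.compile(r"^CS: ([\d.]+);(?: upload failures: (\d+);)?")


def pending_uploads(qd_text):
    """Return the queue entries whose status starts with 'ready for upload'."""
    entries, current = [], None
    for raw in qd_text.splitlines():
        line = raw.strip()
        if m := INDEX_LINE.match(line):
            current = {"index": int(m.group(1)), "status": m.group(2),
                       "server": None, "cs": None, "upload_failures": 0}
            entries.append(current)
        elif current is not None:
            if m := SERVER_LINE.match(line):
                current["server"] = m.group(1)
            elif m := CS_LINE.match(line):
                current["cs"] = m.group(1)
                # The failure count is only printed when it is non-zero.
                current["upload_failures"] = int(m.group(2) or 0)
    return [e for e in entries if e["status"].startswith("ready for upload")]


# Lines copied from the qd output above.
SAMPLE = """\
Index 6: deleted 234.00 pts
server: 171.65.103.160:8080; project: 2170
CS: 171.64.122.76; P limit: 5241856
Index 7: ready for upload 225.00 pts (0.506 pt/hr) 3.4 X min speed
server: 171.64.122.78:8080; project: 4432, "p4432_Seq41_Amber03"
CS: 171.64.122.76; upload failures: 295; P limit: 5241856
"""

for entry in pending_uploads(SAMPLE):
    print(entry["index"], entry["server"], entry["cs"], entry["upload_failures"])
```

For the sample lines, only index 7 is reported, with its work server, its collection server (CS), and the failure count.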


----- qfix.exe output ------------
entry 9, status 0, address 0.0.0.0
entry 0, status 0, address 171.64.122.72:8080
entry 1, status 0, address 134.139.127.31:8080
entry 2, status 0, address 171.64.122.72:8080
entry 3, status 0, address 134.139.127.32:8080
entry 4, status 0, address 171.65.103.160:8080
entry 5, status 0, address 171.64.122.72:8080
entry 6, status 0, address 171.65.103.160:8080
entry 7, status 2, address 171.64.122.78:8080
Found results <work\wuresults_07.dat>: proj 4432, run 420, clone 1, gen 15
-- queue entry: proj 4432, run 420, clone 1, gen 15
-- already queued for upload
entry 8, status 0, address 171.65.103.160:8080
File is OK


Thanks for reading.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 171.64.122.78/171.64.122.76 down?

Post by bruce »

FAH was originally designed (v4 and earlier) to upload only to the server that issued the WU. That's the address you see listed in the output from qfix or queueinfo. In v5, the option to upload to a collection server was added for times when the primary server was down or too busy. When the CS gets busy, too, there's nothing you can do. At that point, only the guys at Stanford can do anything, and about the only option is to fix broken servers (if there are any) and reduce the number of new downloads from the work servers until they have time to upload whatever backlog has built up. Of course that can only be done if there are plenty of WUs to be distributed from lightly loaded servers that can handle the re-balancing.
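
The upload order described above can be sketched in a few lines of Python. This is an illustration of the described behavior only, not the client's actual code. `upload_result` and the `send` callable are hypothetical names, and the sketch assumes the v5 behavior of one work server plus one optional collection server.

```python
def upload_result(unit, work_server, collection_server, send):
    """Try the issuing work server first, then the collection server.

    `send(address, unit)` is a hypothetical callable that returns True
    when the upload succeeds. Pass collection_server=None to model the
    v4 behavior, where only the issuing server is tried.
    """
    targets = (("work server", work_server),
               ("collection server", collection_server))
    for role, address in targets:
        if address and send(address, unit):
            return f"unit {unit} sent to {role} {address}"
    # Both attempts failed: the result stays queued for a later retry.
    return f"unit {unit} kept in queue"


def always_fails(address, unit):
    """Stand-in for a period when both servers are down or too busy."""
    return False


print(upload_result("07", "171.64.122.78:8080", "171.64.122.76:8080", always_fails))
```

With both servers unavailable, the unit simply stays in the queue, which matches the "keeping in queue" messages in the logs earlier in the thread.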
Alphax
Posts: 24
Joined: Tue Aug 26, 2008 2:51 pm
Hardware configuration: Vista Home Premium 32-bit
Intel Core2 Duo T7700@2.4GHz, 3GB DDR2
NVIDIA GeForce 8600M GT, 256MB DDR2, PCI-E x16, 178.24 drivers

Re: 171.64.122.78/171.64.122.76 down?

Post by Alphax »

Well, I'd been having issues where I couldn't upload WUs to these servers (78/76) since August 12, and now it's Magically Fixed (lost a WU but had another 2-3 upload from queue in the last hour or so). I should probably upgrade my client sometime...
(Hardware info in profile)
ppetrone
Pande Group Member
Posts: 115
Joined: Wed Dec 12, 2007 6:20 pm
Location: Stanford

Re: 171.64.122.78/171.64.122.76 down?

Post by ppetrone »

Not sure about "Magically Fixed"... someone in the lab has worked it out. ;)
VijayPande
Pande Group Member
Posts: 2058
Joined: Fri Nov 30, 2007 6:25 am
Location: Stanford

Re: 171.64.122.78/171.64.122.76 down?

Post by VijayPande »

There's a lot going on here, with people working on improving the network, client code, and server code. It's the type of thing where the network issue was the straw that broke the camel's back, but we're looking to see what we can do to streamline everywhere to help. We're also pushing on the network side, which hopefully is improving.
bill93xfah
Posts: 3
Joined: Mon Aug 04, 2008 9:23 pm

Re: 171.64.122.78/171.64.122.76 down?

Post by bill93xfah »

I have been having this problem all week, too. It is very frustrating, especially since one of the servers I am trying to connect to shows a status of "accept".

Bill Denholm
Rreini
Posts: 3
Joined: Thu Jul 24, 2008 10:07 pm

Re: 171.64.122.78/171.64.122.76 down?

Post by Rreini »

Bump up, 'cause 171.64.122.76 isn't responding properly now -- netload is 202, and the server's not coming back with "OK".
JaredKFan
Posts: 2
Joined: Tue Oct 21, 2008 6:03 pm
Location: Crestline, CA

Re: 171.64.122.78/171.64.122.76 down?

Post by JaredKFan »

Bump too - server isn't responding for me to upload a finished work unit. netload is 200 currently.
ppetrone
Pande Group Member
Posts: 115
Joined: Wed Dec 12, 2007 6:20 pm
Location: Stanford

Re: 171.64.122.78/171.64.122.76 down?

Post by ppetrone »

It's up now. Let's wait and see whether the netload goes down in the next few hours.

Paula
eberlyml
Posts: 23
Joined: Sun Dec 02, 2007 10:17 pm
Location: Southeast Pennsylvania

Re: 171.64.122.78/171.64.122.76 down?

Post by eberlyml »

Yeah, I've got work waiting to go to .76 along with 171.64.65.111 that is in REJECT. Hopefully one of them is better soon :). Net load on .76 doesn't seem to be improving quickly.
eberlyml, team centos #48721
anandhanju
Posts: 526
Joined: Mon Dec 03, 2007 4:33 am
Location: Australia

Re: 171.64.122.78/171.64.122.76 down?

Post by anandhanju »

171.64.122.78 is in REJECT. In fact, it has been in this state for a week. Yet, I see the WUs RCVD column changing. Can someone kick it back to life?
ppetrone
Pande Group Member
Posts: 115
Joined: Wed Dec 12, 2007 6:20 pm
Location: Stanford

Re: 171.64.122.78/171.64.122.76 down?

Post by ppetrone »

Hi there,

Thanks. I just kicked it back up. Let's see if it keeps going.
Paula