171.67.108.33 - vsp05c - Problems?

Moderators: Site Moderators, FAHC Science Team

Fireball0236
Posts: 58
Joined: Sat Oct 09, 2010 6:05 am


Post by Fireball0236

Did something happen to this server?

At "Thu Nov 11 11:55:10 PST 2010" this server had a good 115k WUs to send. An update or two later, it had dropped to 10k, and now it's hovering randomly between 1 and 100. I also haven't seen this WS's projects (P1126x-1128x) on the project summary page lately.

Today, my clients have been assigned to this WS repeatedly, without much success (until finally being assigned to the WS of the 65xx projects):

Code:

[20:19:38] - Preparing to get new work unit...
[20:19:38] + Attempting to get work packet
[20:19:38] - Connecting to assignment server
[20:19:38] Connecting to http://assign.stanford.edu:8080/
[20:19:40] Posted data.
[20:19:40] Initial: 43AB; - Successful: assigned to (171.67.108.33).
[20:19:40] + News From Folding@Home: Welcome to Folding@Home
[20:19:40] Loaded queue successfully.
[20:19:40] Connecting to http://171.67.108.33:8080/
[20:19:41] Posted data.
[20:19:41] Initial: 0000; - Error: Bad packet type from server, expected work assignment
[20:19:41] - Attempt #1  to get work failed, and no other work to do.
             Waiting before retry.
[20:19:55] + Attempting to get work packet
[20:19:55] - Connecting to assignment server
[20:19:55] Connecting to http://assign.stanford.edu:8080/
[20:19:56] Posted data.
[20:19:56] Initial: 43AB; - Successful: assigned to (171.67.108.33).
[20:19:56] + News From Folding@Home: Welcome to Folding@Home
[20:19:56] Loaded queue successfully.
[20:19:56] Connecting to http://171.67.108.33:8080/
[20:19:56] Posted data.
[20:19:56] Initial: 0000; - Error: Bad packet type from server, expected work assignment
[20:19:56] - Attempt #2  to get work failed, and no other work to do.
             Waiting before retry.
[20:20:14] + Attempting to get work packet
[20:20:14] - Connecting to assignment server
[20:20:14] Connecting to http://assign.stanford.edu:8080/
[20:20:15] Posted data.
[20:20:15] Initial: 43AB; - Successful: assigned to (171.67.108.33).
[20:20:15] + News From Folding@Home: Welcome to Folding@Home
[20:20:15] Loaded queue successfully.
[20:20:15] Connecting to http://171.67.108.33:8080/
[20:20:16] Posted data.
[20:20:16] Initial: 0000; - Error: Bad packet type from server, expected work assignment
[20:20:16] - Attempt #3  to get work failed, and no other work to do.
             Waiting before retry.
[20:20:41] + Attempting to get work packet
[20:20:41] - Connecting to assignment server
[20:20:41] Connecting to http://assign.stanford.edu:8080/
[20:20:42] Posted data.
[20:20:42] Initial: 43AB; - Successful: assigned to (171.67.108.33).
[20:20:42] + News From Folding@Home: Welcome to Folding@Home
[20:20:42] Loaded queue successfully.
[20:20:42] Connecting to http://171.67.108.33:8080/
[20:20:43] Posted data.
[20:20:43] Initial: 0000; - Error: Bad packet type from server, expected work assignment
[20:20:43] - Attempt #4  to get work failed, and no other work to do.
             Waiting before retry.
[20:21:26] + Attempting to get work packet
[20:21:26] - Connecting to assignment server
[20:21:26] Connecting to http://assign.stanford.edu:8080/
[20:21:26] Posted data.
[20:21:26] Initial: 43AB; - Successful: assigned to (171.67.108.33).
[20:21:26] + News From Folding@Home: Welcome to Folding@Home
[20:21:27] Loaded queue successfully.
[20:21:27] Connecting to http://171.67.108.33:8080/
[20:21:27] Posted data.
[20:21:27] Initial: 0000; - Error: Bad packet type from server, expected work assignment
[20:21:27] - Attempt #5  to get work failed, and no other work to do.
             Waiting before retry.
[20:22:52] + Attempting to get work packet
[20:22:52] - Connecting to assignment server
[20:22:52] Connecting to http://assign.stanford.edu:8080/
[20:22:53] Posted data.
[20:22:53] Initial: 43AB; - Successful: assigned to (171.67.108.33).
[20:22:53] + News From Folding@Home: Welcome to Folding@Home
[20:22:53] Loaded queue successfully.
[20:22:53] Connecting to http://171.67.108.33:8080/
[20:22:54] Posted data.
[20:22:54] Initial: 0000; - Error: Bad packet type from server, expected work assignment
[20:22:54] - Attempt #6  to get work failed, and no other work to do.
             Waiting before retry.
[20:25:42] + Attempting to get work packet
[20:25:42] - Connecting to assignment server
[20:25:42] Connecting to http://assign.stanford.edu:8080/
[20:25:42] Posted data.
[20:25:42] Initial: 43AB; - Successful: assigned to (171.67.108.33).
[20:25:42] + News From Folding@Home: Welcome to Folding@Home
[20:25:42] Loaded queue successfully.
[20:25:42] Connecting to http://171.67.108.33:8080/
[20:25:43] Posted data.
[20:25:43] Initial: 0000; - Error: Bad packet type from server, expected work assignment
[20:25:43] - Attempt #7  to get work failed, and no other work to do.
             Waiting before retry.
[20:31:08] + Attempting to get work packet
[20:31:08] - Connecting to assignment server
[20:31:08] Connecting to http://assign.stanford.edu:8080/
[20:31:08] Posted data.
[20:31:08] Initial: 40AB; - Successful: assigned to (171.64.65.62).
[20:31:08] + News From Folding@Home: Welcome to Folding@Home
[20:31:09] Loaded queue successfully.
[20:31:09] Connecting to http://171.64.65.62:8080/
[20:31:10] Posted data.
[20:31:10] Initial: 0000; - Receiving payload (expected size: 502854)
[20:31:11] - Downloaded at ~491 kB/s
[20:31:11] - Averaged speed for that direction ~353 kB/s
[20:31:11] + Received work.
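
The retry pattern in the log above can be summarized by measuring how long the client waited between each failed attempt and the next one. The following is a minimal Python sketch run over an excerpt of that log; the helper name `retry_waits` is hypothetical, and it only reads timestamps, so it says nothing about how the client actually schedules its retries.

```python
import re
from datetime import datetime

# Lines copied from the log above: each failed attempt followed by the next retry.
LOG_EXCERPT = """\
[20:19:41] - Attempt #1  to get work failed, and no other work to do.
[20:19:55] + Attempting to get work packet
[20:19:56] - Attempt #2  to get work failed, and no other work to do.
[20:20:14] + Attempting to get work packet
[20:20:16] - Attempt #3  to get work failed, and no other work to do.
[20:20:41] + Attempting to get work packet
[20:20:43] - Attempt #4  to get work failed, and no other work to do.
[20:21:26] + Attempting to get work packet
[20:21:27] - Attempt #5  to get work failed, and no other work to do.
[20:22:52] + Attempting to get work packet
[20:22:54] - Attempt #6  to get work failed, and no other work to do.
[20:25:42] + Attempting to get work packet
[20:25:43] - Attempt #7  to get work failed, and no other work to do.
[20:31:08] + Attempting to get work packet
"""

# "[HH:MM:SS] message" -> timestamp and message text
LINE_RE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\] (.*)$")


def retry_waits(log_text):
    """Return the seconds waited between each failed attempt and the next retry."""
    waits = []
    failed_at = None
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip continuation lines such as "Waiting before retry."
        t = datetime.strptime(m.group(1), "%H:%M:%S")
        msg = m.group(2)
        if "to get work failed" in msg:
            failed_at = t
        elif "Attempting to get work packet" in msg and failed_at is not None:
            waits.append(int((t - failed_at).total_seconds()))
            failed_at = None
    return waits


if __name__ == "__main__":
    print(retry_waits(LOG_EXCERPT))
```

Running it prints `[14, 18, 25, 43, 85, 168, 325]`: after the first few attempts the wait roughly doubles each time, which looks like a backoff, though the log alone doesn't confirm the client's exact rule.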

If it's misbehaving, could someone give it a nudge? The 1126x-1128x projects were consistent ones that my clients liked :P


~ Fireball0236
Last edited by toTOW on Sun Nov 14, 2010 8:52 pm, edited 2 times in total.
Reason: IP address fixed in title.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 171.67.108.33 - vsp05c - Problems?

Post by bruce

One of the reasons we tell people it's bad to discard WUs is that FAH often has a shortage of WUs. What you're seeing here is one aspect of that issue.

Consider a server with, say, 10,000 WUs. What happens if 11,000 people all try to get WUs from that server? Simple answer: it runs out.

OK, what happens next? Well, that depends.
First, suppose there are lots of other servers with WUs for the same client. Donors looking for work will be redirected to other servers, and although they won't get the same project, they'll get an assignment.
Now suppose that there are no other servers with WUs for the same client. People will get messages saying there is no appropriate server to assign them work. Meanwhile, some people holding the first 10,000 WUs will begin to return their results, so the server will generate new WUs, which will be assigned immediately. The server may show a few hundred WUs, or it may run out again, depending on how quickly folks are returning WUs compared to how many folks are looking for new ones. Obviously, if those WUs have a preferred deadline of two to four weeks, every WU that is lost will be completely out of circulation for two to four weeks before it is reassigned. Under these conditions, every lost WU delays the project and also angers other donors who can't get work.
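
A toy model can make the cost of lost WUs concrete. The Python sketch below is a deliberately simplified, hypothetical model of one work server, not how the FAH servers actually work: the function `simulate`, its parameters, and the rule that every n-th assignment is discarded are assumptions made for illustration, and the time unit is arbitrary (read it as days).

```python
def simulate(days, trajectories, demand_per_day, return_days, deadline_days, lose_every):
    """Toy day-by-day model of one work server's WU pool (hypothetical, not FAH's real logic).

    Each trajectory has exactly one WU in existence at a time. A WU that is
    returned comes back after `return_days`, and the next WU for that
    trajectory is then available on the server. Every `lose_every`-th
    assignment is discarded by its donor instead, so that trajectory only
    becomes available again once `deadline_days` have passed.

    Returns a list of (WUs left on the server, requests that went unserved) per day.
    """
    available = trajectories
    due_back = {}  # day -> number of trajectories whose next WU becomes available
    assigned = 0
    history = []
    for day in range(days):
        available += due_back.pop(day, 0)
        served = min(available, demand_per_day)
        available -= served
        for _ in range(served):
            assigned += 1
            lost = assigned % lose_every == 0
            ready_day = day + (deadline_days if lost else return_days)
            due_back[ready_day] = due_back.get(ready_day, 0) + 1
        history.append((available, demand_per_day - served))
    return history


if __name__ == "__main__":
    # Same server and demand; only the share of discarded WUs differs.
    print(simulate(4, trajectories=10, demand_per_day=6, return_days=1, deadline_days=3, lose_every=1000))
    print(simulate(4, trajectories=10, demand_per_day=6, return_days=1, deadline_days=3, lose_every=2))
```

The first call prints `[(4, 0), (4, 0), (4, 0), (4, 0)]`: with almost no discarded WUs the server keeps a few spare and serves everyone. The second prints `[(4, 0), (1, 0), (0, 2), (0, 1)]`: when every second WU is discarded and only comes back at the deadline, the same server runs dry by the third day and starts turning donors away, even though the number of trajectories hasn't changed.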

Now let's consider a hybrid situation. Suppose that other servers have WUs that can be assigned to the same client, but they also have limited numbers of WUs. Some folks will get assignments from other servers; some will not.
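
The three scenarios above boil down to simple arithmetic. This Python sketch assumes, purely for illustration, that donors are served by the first-choice server until it is empty and then by other servers in a fixed order; the function `assign` and the stock numbers for the other servers are hypothetical, and the real assignment server's selection logic is not modeled.

```python
def assign(requests, primary_stock, fallback_stocks):
    """Hand out WUs from the first-choice server, then from other servers in order.

    Hypothetical simplification of WU assignment.
    Returns (from_primary, from_fallbacks, unserved).
    """
    from_primary = min(requests, primary_stock)
    remaining = requests - from_primary
    from_fallbacks = 0
    for stock in fallback_stocks:
        take = min(remaining, stock)  # each other server gives what it can
        from_fallbacks += take
        remaining -= take
    return from_primary, from_fallbacks, remaining


if __name__ == "__main__":
    # 11,000 donors asking a server that holds 10,000 WUs.
    print(assign(11_000, 10_000, [5_000, 5_000]))  # plenty of other servers
    print(assign(11_000, 10_000, []))              # no other servers
    print(assign(11_000, 10_000, [300, 200]))      # hybrid: others are short too
```

The three calls print `(10000, 1000, 0)`, `(10000, 0, 1000)` and `(10000, 500, 500)`: everyone gets work, 1,000 donors get nothing, or the shortfall is split between the two.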

That's the condition I think I see here. Nothing is fundamentally wrong with the server. Where have all the WUs gone? We can make some guesses, but I don't know of any way to find out for sure.

What can be done about it? Well, an obvious suggestion is to bring other servers or projects on-line, or to generate extra WUs for the existing projects. Since I'm not part of the scientific community, I don't really know what's involved in doing that. I can think of several reasons why it might be difficult, though one way or another these difficulties can be overcome:
1) Maybe the servers are already running at capacity. (Buy more server hardware? Repair defective servers?)
2) Maybe WUs for the upcoming projects have not been created yet, or the ones being tested are unstable. (The researchers need more time.)
3) Flood the servers with more WUs for existing projects. (Even if the server can handle it, this delays the project. We all know that science moves faster if we minimize the time a WU spends assigned to our machine, but have we ever considered the time a WU sits on a server waiting for somebody to work on it? Extra WUs delay a project, too.) (If a project needs a certain number of trajectories to provide reliable statistics, additional trajectories will not add value, so this would be a form of "busy work" rather than valid scientific research. Do we expect Stanford to waste processing time this way, particularly if they know that new projects coming on-line soon will need those CPUs?)
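
The point about WUs waiting on the server can be put in numbers with Little's law, a standard queueing result (not something specific to FAH): in steady state, the average time a WU waits equals the number of WUs queued divided by the rate at which they are handed out. A minimal Python sketch, where the assignment rate and queue sizes are hypothetical:

```python
def expected_wait_days(queued_wus, assignments_per_day):
    """Average time a WU waits on the server before assignment, via Little's law.

    Little's law (L = lambda * W) relates long-run averages in a stable queue:
    with L WUs waiting and lambda WUs handed out per day, the average wait W
    is L / lambda. Both inputs here are hypothetical.
    """
    if assignments_per_day <= 0:
        raise ValueError("assignments_per_day must be positive")
    return queued_wus / assignments_per_day


if __name__ == "__main__":
    rate = 10_000  # hypothetical: WUs handed out per day
    for queued in (10_000, 60_000, 120_000):
        print(queued, expected_wait_days(queued, rate))
```

At the same assignment rate, doubling the queue from 60,000 to 120,000 WUs doubles the average time a WU sits waiting, from about 6 to about 12 days, which is exactly the kind of delay point 3 is about.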
Fireball0236
Posts: 58
Joined: Sat Oct 09, 2010 6:05 am

Re: 171.67.108.33 - vsp05c - Problems?

Post by Fireball0236

Thanks for the explanation, bruce. That would mean, though, that some 100k people were assigned to the same server within one or two server update intervals and snagged all the WUs. (This is the first time since I started folding that I've noticed this server short of WUs; it has always had >60k WUs available.)

That was only a minor side remark in my original post. My main concern when starting the topic was this:

Code:

[20:19:41] Initial: 0000; - Error: Bad packet type from server, expected work assignment

...and getting it seven times in a row. If the WS has no work, I'd expect either not to be assigned to it at all, or to get a normal WU packet. Yesterday, another client of mine was assigned to this server four times and only got a good packet from it (P11266) on the fifth try. Is this also explained by there not being enough WUs available on that server?

Also, did I mistype the IP address? Sorry, toTOW >.<


~ Fireball0236