128.252.203.10 problem or WU?


Re: 128.252.203.10 problem or WU?

Post by level6 » Wed May 06, 2020 6:24 am

Ah, excellent info, thanks PantherX! And, that is great news, indeed.

I definitely have more lurking to do to understand these details better.
level6
 
Posts: 13
Joined: Tue May 05, 2020 3:35 am
Location: Dallas, Texas, USA

Re: 128.252.203.10 problem or WU?

Post by PantherX » Wed May 06, 2020 6:31 am

Oussebon wrote:...Anything to be done?

Apart from reporting it in the Forum, from where it is raised with the researcher, not much :( You can simply leave the client running and, hopefully, the completed WU will be uploaded before its expiration date. Beyond that, there's not much one can do unless the researchers ask for something specific.
PantherX
Site Moderator
 
Posts: 6589
Joined: Wed Dec 23, 2009 10:33 am
Location: Land Of The Long White Cloud

Re: 128.252.203.10 problem or WU?

Post by Oussebon » Wed May 06, 2020 1:18 pm

GDF wrote:This is only anecdotal, worked for me, and might have been complete coincidence. I paused the slot with the problem, waited for the server to reboot (which you can see on the serverstats page by watching uptime roll back to zero), then restarted the slot. The upload went right through.

Thanks for the tip. The server was just restarted but sadly no joy.

Fails at 0.23% as above.

Previously, it would occasionally start sending, get as far as 0.23%, and fail; the server would then "actively refuse" the connection on every subsequent attempt.

Now, it starts uploading every time it is meant to. It just always fails at 0.23%.

Same messages as per previously-posted logs (Transfer failed).

PantherX wrote:
Oussebon wrote:...Anything to be done?

Apart from reporting it in the Forum, from where it is raised with the researcher, not much :( You can simply leave the client running and, hopefully, the completed WU will be uploaded before its expiration date. Beyond that, there's not much one can do unless the researchers ask for something specific.


Thanks - and sorry, I initially missed your post somehow! As things have changed a little (first hurdle overcome, second hurdle still in the way), I hope the update helps them narrow it down. It would be a shame to waste WUs, after all.

Further edit: Scratch the above - it is back to the "actively refused" error message, as per my last post.
Oussebon
 
Posts: 5
Joined: Mon Mar 16, 2020 4:10 pm

Re: 128.252.203.10 problem or WU?

Post by GDF » Wed May 06, 2020 4:53 pm

PantherX wrote:I do understand your POV, and it negatively impacts all involved, both the researchers and the donors. However, considering that there are multiple labs involved (https://foldingathome.org/about/the-fol ... onsortium/) across the globe, in various countries dealing with various lock-down policies, even on a "good" day it would take a bit of time. In a pandemic situation it is a lot harder, but no one has given up; instead, they have doubled down and are working to improve various aspects to ensure that it is fixed. Sometimes labs have to involve their internal IT department, which can also add to the delay if it is a university infrastructure limitation such as internet or electricity.


Thanks for the welcome, and I understand the problem intimately, as remote server management is something I do as part of my day job. Right now I'm having to get explicit permission to be in the building where my hardware is located, so I'm grateful that nearly all of the process can be managed through the net. I'm also glad I don't have to hear about problems through word of mouth in a public forum!

I appreciate all the effort that is going into this and I'm happy to be able to play a tiny part.
GDF
 
Posts: 3
Joined: Mon May 04, 2020 8:53 pm

Re: 128.252.203.10 problem or WU?

Post by kalamai2 » Wed May 06, 2020 5:59 pm

This has been failing for me as well for at least a few hours (the log below was freshly started after a restart attempt). It does seem that redundancy/scaling on the work collection servers would be helpful.

Code:
*********************** Log Started 2020-05-06T16:37:53Z ***********************
16:37:53:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:11760 run:0 clone:6289 gen:29 core:0x22 unit:0x0000003180fccb0a5e6f0a8922c6cfcd
16:37:53:WU00:FS01:Uploading 23.06MiB to 128.252.203.10
16:37:53:WU00:FS01:Connecting to 128.252.203.10:8080
16:38:15:WARNING:WU00:FS01:WorkServer connection failed on port 8080 trying 80
16:38:15:WU00:FS01:Connecting to 128.252.203.10:80
16:38:36:WARNING:WU00:FS01:Exception: Failed to send results to work server: Failed to connect to 128.252.203.10:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
16:38:37:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:11760 run:0 clone:6289 gen:29 core:0x22 unit:0x0000003180fccb0a5e6f0a8922c6cfcd
16:38:37:WU00:FS01:Uploading 23.06MiB to 128.252.203.10
16:38:37:WU00:FS01:Connecting to 128.252.203.10:8080
16:38:58:WARNING:WU00:FS01:WorkServer connection failed on port 8080 trying 80
16:38:58:WU00:FS01:Connecting to 128.252.203.10:80
16:39:19:WARNING:WU00:FS01:Exception: Failed to send results to work server: Failed to connect to 128.252.203.10:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
16:39:37:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:11760 run:0 clone:6289 gen:29 core:0x22 unit:0x0000003180fccb0a5e6f0a8922c6cfcd
16:39:37:WU00:FS01:Uploading 23.06MiB to 128.252.203.10
16:39:37:WU00:FS01:Connecting to 128.252.203.10:8080
16:41:06:WU00:FS01:Upload 0.54%
16:41:06:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
16:41:14:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:11760 run:0 clone:6289 gen:29 core:0x22 unit:0x0000003180fccb0a5e6f0a8922c6cfcd
16:41:14:WU00:FS01:Uploading 23.06MiB to 128.252.203.10
16:41:14:WU00:FS01:Connecting to 128.252.203.10:8080
16:41:30:WU00:FS01:Upload 0.27%
16:42:38:WU00:FS01:Upload 0.54%
16:42:38:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
16:43:52:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:11760 run:0 clone:6289 gen:29 core:0x22 unit:0x0000003180fccb0a5e6f0a8922c6cfcd
16:43:52:WU00:FS01:Uploading 23.06MiB to 128.252.203.10
16:43:52:WU00:FS01:Connecting to 128.252.203.10:8080
16:45:29:WU00:FS01:Upload 0.54%
16:45:29:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
16:45:30:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:11760 run:0 clone:6289 gen:29 core:0x22 unit:0x0000003180fccb0a5e6f0a8922c6cfcd
16:45:30:WU00:FS01:Uploading 23.06MiB to 128.252.203.10
16:45:30:WU00:FS01:Connecting to 128.252.203.10:8080
16:45:49:WU00:FS01:Upload 0.54%
16:45:49:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
16:47:07:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:11760 run:0 clone:6289 gen:29 core:0x22 unit:0x0000003180fccb0a5e6f0a8922c6cfcd
16:47:07:WU00:FS01:Uploading 23.06MiB to 128.252.203.10
16:47:07:WU00:FS01:Connecting to 128.252.203.10:8080
16:47:13:WU00:FS01:Upload 20.33%
16:47:19:WU00:FS01:Upload 84.83%
16:47:54:WARNING:WU00:FS01:Exception: Failed to send results to work server: 10002: Received short response, expected 512 bytes, got 0
16:49:44:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:11760 run:0 clone:6289 gen:29 core:0x22 unit:0x0000003180fccb0a5e6f0a8922c6cfcd
16:49:44:WU00:FS01:Uploading 23.06MiB to 128.252.203.10
16:49:44:WU00:FS01:Connecting to 128.252.203.10:8080
16:53:01:WU00:FS01:Upload 1.63%
16:53:01:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
16:53:58:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:11760 run:0 clone:6289 gen:29 core:0x22 unit:0x0000003180fccb0a5e6f0a8922c6cfcd
16:53:59:WU00:FS01:Uploading 23.06MiB to 128.252.203.10
16:53:59:WU00:FS01:Connecting to 128.252.203.10:8080
16:54:01:WARNING:WU00:FS01:WorkServer connection failed on port 8080 trying 80
16:54:01:WU00:FS01:Connecting to 128.252.203.10:80
16:54:04:WARNING:WU00:FS01:Exception: Failed to send results to work server: Failed to connect to 128.252.203.10:80: No connection could be made because the target machine actively refused it.
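
The log above shows four distinct failure messages (a connect timeout, an active refusal, "Transfer failed", and a short response). As a rough aid for summarising such a log, here is a minimal Python sketch that tallies failed upload attempts by message. The category names are made up for illustration and the patterns only match the wording seen in this thread; this is not a FAHClient feature.

```python
import re
from collections import Counter

# Hypothetical category names; the patterns match the wording in the log above.
PATTERNS = [
    ("timeout", re.compile(r"did not properly respond")),
    ("refused", re.compile(r"actively refused")),
    ("transfer_failed", re.compile(r"Transfer failed")),
    ("short_response", re.compile(r"Received short response")),
]

def classify(line):
    """Return a category for a failed-upload line, or None for any other line."""
    if "Failed to send results to work server" not in line:
        return None
    for name, pattern in PATTERNS:
        if pattern.search(line):
            return name
    return "other"

def summarize(lines):
    """Count failed upload attempts by category."""
    return Counter(c for c in map(classify, lines) if c is not None)

# A few lines copied from the log above.
sample = [
    "16:38:36:WARNING:WU00:FS01:Exception: Failed to send results to work server: Failed to connect to 128.252.203.10:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.",
    "16:41:06:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed",
    "16:47:54:WARNING:WU00:FS01:Exception: Failed to send results to work server: 10002: Received short response, expected 512 bytes, got 0",
    "16:54:04:WARNING:WU00:FS01:Exception: Failed to send results to work server: Failed to connect to 128.252.203.10:80: No connection could be made because the target machine actively refused it.",
    "16:41:30:WU00:FS01:Upload 0.27%",
]
print(summarize(sample))
```

To run it over a whole log file, pass the open file object to `summarize` instead of `sample`.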
kalamai2
 
Posts: 2
Joined: Wed May 06, 2020 5:51 pm

Re: 128.252.203.10 problem or WU?

Post by kalamai2 » Wed May 06, 2020 8:05 pm

This finally uploaded for me. I noticed a fresh restart in the server stats and had also just updated my client version to .13. I imagine it was the server restart that got it working, but who knows :)

Thanks.

Mike
kalamai2
 
Posts: 2
Joined: Wed May 06, 2020 5:51 pm

Re: 128.252.203.10 problem or WU?

Post by Jeanne de Flandre » Sat May 09, 2020 10:50 am

Hi,

Uploads to 128.252.203.10 for Project 11760 have kept failing since 2020/05/06 21:43.

The estimated credit is already the same as the base credit. I don't care about my 'lost credit', but I do want the server to receive the result.

Code:
*********************** Log Started 2020-05-06T17:20:44Z ***********************
...
21:43:03:WU00:FS01:Uploading 23.09MiB to 128.252.203.10
21:43:03:WU00:FS01:Connecting to 128.252.203.10:8080
21:43:18:WU00:FS01:Upload 0.27%
21:43:42:WU00:FS01:Upload 0.54%
21:43:43:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
21:43:43:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:11760 run:0 clone:1179 gen:25 core:0x22 unit:0x0000003080fccb0a5e6d7cd5382edcbf
...
*********************** Log Started 2020-05-09T02:19:53Z ***********************
...
05:54:57:WU00:FS01:Uploading 23.09MiB to 128.252.203.10
05:54:57:WU00:FS01:Connecting to 128.252.203.10:8080
05:55:12:WU00:FS01:Upload 0.54%
05:55:12:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
Jeanne de Flandre
 
Posts: 3
Joined: Sat May 02, 2020 12:55 pm

Re: 128.252.203.10 problem or WU?

Post by PantherX » Sat May 09, 2020 12:09 pm

Welcome to the F@H Forum Jeanne de Flandre,

Please note that the Server 128.252.203.10 has an uptime of ~15 minutes, which means that it was recently restarted. Thus, your completed WU will hopefully be accepted soon :)
PantherX
Site Moderator
 
Posts: 6589
Joined: Wed Dec 23, 2009 10:33 am
Location: Land Of The Long White Cloud

Re: 128.252.203.10 problem or WU?

Post by Jeanne de Flandre » Sat May 09, 2020 12:35 pm

Thanks for your reply PantherX.
Each time I check https://apps.foldingathome.org/serverstats , 128.252.203.10 *always* has quite a short uptime, and now it says "a few seconds". So I guess that it keeps rebooting.
Jeanne de Flandre
 
Posts: 3
Joined: Sat May 02, 2020 12:55 pm

Re: 128.252.203.10 problem or WU?

Post by PantherX » Sat May 09, 2020 9:00 pm

Thanks for that, I have informed the researcher so let's see what happens :)
PantherX
Site Moderator
 
Posts: 6589
Joined: Wed Dec 23, 2009 10:33 am
Location: Land Of The Long White Cloud

Re: 128.252.203.10 problem or WU?

Post by bruce » Sat May 09, 2020 11:30 pm

anandhanju wrote:Thanks for your reports. The necessary folks have been notified and they will be looking into this.

Since the start of the COVID surge, the demand for new assignments has regularly exceeded the available bandwidth on FAH's servers. As fast as FAH could add more servers, the demand increased even more. Code was added to the Assignment Server to limit this excess to something on the order of what can actually be useful.

From my observations, there's nothing really limiting the bandwidth of the WUs being returned. When a FAHClient decides it's time to upload a result, it proceeds without any knowledge of how much inbound bandwidth is available. I frequently see very slow upload speeds, which IMHO indicates the inbound path is (probably) saturated. I can't think of a good way to manage that bandwidth other than to let an increasing percentage of upload transactions fail and redirect them to a Collection Server.

Rebooting the server, of course, terminates the active uploads, and it then takes some time for the backlogged clients to retry. That's not a very good system but, as I said, I can't think of a better solution.
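
To make the "fail and redirect" idea concrete, here is a minimal Python sketch of ordered fallback with randomised backoff. It is only an illustration under stated assumptions, not how FAHClient actually schedules retries: `send` is a hypothetical callable standing in for the real upload, the endpoint names are placeholders, and the jittered delay is a general technique for stopping a backlog of clients from all retrying at the same moment after a reboot.

```python
import random
import time

def upload_with_fallback(send, endpoints, max_rounds=3, base_delay=1.0, sleep=None):
    """Try each endpoint in order; after a full failed round, back off and retry.

    send(endpoint) is a hypothetical callable that returns True on success
    and raises OSError on failure. Returns the endpoint that accepted the
    upload, or None if every round failed.
    """
    sleep = sleep or time.sleep
    for round_no in range(max_rounds):
        for endpoint in endpoints:
            try:
                if send(endpoint):
                    return endpoint
            except OSError:
                continue  # this endpoint failed, try the next one
        if round_no < max_rounds - 1:
            # Exponential backoff with jitter so clients do not retry in lockstep.
            sleep(base_delay * (2 ** round_no) * random.uniform(0.5, 1.5))
    return None

# Demo with a fake sender: the work server refuses, the collection server accepts.
attempts = []

def fake_send(endpoint):
    attempts.append(endpoint)
    if endpoint.startswith("collection"):
        return True
    raise OSError("connection refused")

# Placeholder endpoint names, not real F@H servers.
endpoints = ["work.example:8080", "work.example:80", "collection.example:8080"]
print(upload_with_fallback(fake_send, endpoints, sleep=lambda s: None))
```

The order of `endpoints` mirrors what the logs in this thread show (port 8080, then port 80), with a collection server as the last resort.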

Comments anyone?
bruce
 
Posts: 19839
Joined: Thu Nov 29, 2007 11:13 pm
Location: So. Cal.

Re: 128.252.203.10 problem or WU?

Post by PantherX » Sat May 09, 2020 11:45 pm

Please note that the Server (128.252.203.10) has been poked and it seems to be stable now. Let's see if your WUs are now uploaded without issues :)

bruce wrote:...I can't think of a good way to manage that bandwidth other than to let an increasing percentage of upload transactions fail and redirect them to a Collection Server...

Is it possible to alternate the return target as WUs are allocated so that, say, WU A returns to the WS (Work Server) and WU B returns to the CS (Collection Server)? In other words, alternate the primary and secondary return servers. That way, the initial impact on the WS is "halved", but the CS goes from being a backup to being a production server. Plus, the links between the WS and CS would then be used continuously during production as opposed to only for backup.
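
As a rough illustration of that alternation, here is a minimal Python sketch. The function name and the dictionary layout are invented for this example and say nothing about how the Assignment Server really records return targets.

```python
from itertools import cycle

def assign_return_targets(wu_ids, work_server, collection_server):
    """Alternate which server is the primary return target for each WU."""
    orders = cycle([
        (work_server, collection_server),   # even positions: WS first
        (collection_server, work_server),   # odd positions: CS first
    ])
    return {
        wu: {"primary": primary, "secondary": secondary}
        for wu, (primary, secondary) in zip(wu_ids, orders)
    }

plan = assign_return_targets(["A", "B", "C"], "WS", "CS")
for wu, targets in plan.items():
    print(wu, targets["primary"], targets["secondary"])
```

Each WU still carries both servers, so the fallback path is unchanged; only the order in which they are tried differs from one WU to the next.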
PantherX
Site Moderator
 
Posts: 6589
Joined: Wed Dec 23, 2009 10:33 am
Location: Land Of The Long White Cloud

Re: 128.252.203.10 problem or WU?

Post by Jeanne de Flandre » Sun May 10, 2020 7:00 am

Thanks for your intervention. Now it is finally uploaded. :)

Code:
******************************* Date: 2020-05-09 *******************************
23:54:57:WU00:FS01:Uploading 23.09MiB to 128.252.203.10
23:54:57:WU00:FS01:Connecting to 128.252.203.10:8080
23:55:03:WU00:FS01:Upload 2.98%
...
23:58:15:WU00:FS01:Upload 98.51%
23:58:19:WU00:FS01:Upload complete
23:58:19:WU00:FS01:Server responded WORK_ACK (400)
23:58:19:WU00:FS01:Final credit estimate, 12884.00 points
23:58:19:WU00:FS01:Cleaning up
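
For a sense of the upload speed in this successful attempt, the timestamps above give a rough average rate. The short Python sketch below does the arithmetic; it assumes both timestamps fall on the same day and it includes connection setup time, so treat the result as an approximation only.

```python
from datetime import datetime

def elapsed_seconds(start, end, fmt="%H:%M:%S"):
    """Seconds between two same-day HH:MM:SS log timestamps."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())

size_mib = 23.09                                    # "Uploading 23.09MiB"
seconds = elapsed_seconds("23:54:57", "23:58:19")   # connect to "Upload complete"
rate_mib_s = size_mib / seconds
print(f"{seconds} s, {rate_mib_s * 1024:.0f} KiB/s")
```

That works out to 202 seconds, or about 117 KiB/s on average.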
Jeanne de Flandre
 
Posts: 3
Joined: Sat May 02, 2020 12:55 pm

Re: 128.252.203.10 problem or WU?

Post by PantherX » Sun May 10, 2020 7:32 am

That's great to hear! Thanks for the confirmation :)
PantherX
Site Moderator
 
Posts: 6589
Joined: Wed Dec 23, 2009 10:33 am
Location: Land Of The Long White Cloud
