Stuck "Send" to 52.224.109.74

Posted: Tue Apr 07, 2020 3:49 pm
by tirreus
Hi,

I am stuck at 100% in status "Send" for a day already in project 14369 (1207, 3, 4).

System log:
15:45:53:WU01:FS00:Trying to send results to collection server
15:45:53:WU01:FS00:Uploading 8.27MiB to 52.224.109.74
15:45:53:WU01:FS00:Connecting to 52.224.109.74:8080
15:46:14:WARNING:WU01:FS00:WorkServer connection failed on port 8080 trying 80
15:46:14:WU01:FS00:Connecting to 52.224.109.74:80
15:46:36:ERROR:WU01:FS00:Exception: Failed to connect to 52.224.109.74:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.

And repeats over and over and over...

Re: Stuck at 100% in status "Send"

Posted: Tue Apr 07, 2020 3:54 pm
by Jesse_V
That work server is up for me. Can you get to 52.224.109.74:80 in your web browser?
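
For anyone who would rather script that check than use a browser, here is a minimal Python sketch. It only tests whether a TCP connection opens on each port, which is a weaker test than a real upload. The names `can_connect` and `check_ports` are made up for this example and are not part of FAHClient.

```python
import socket


def can_connect(host, port, timeout=10.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ports(host, ports, timeout=10.0):
    """Try each port in turn and return {port: reachable}."""
    return {port: can_connect(host, port, timeout) for port in ports}
```

Calling `check_ports("52.224.109.74", (8080, 80))` tries the same two ports the client tries in the log above. A `False` for both from one connection and a `True` from another would suggest a routing or load problem rather than a server that is fully down.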

Re: Stuck at 100% in status "Send"

Posted: Tue Apr 07, 2020 3:59 pm
by bruce
The 100% notation isn't really applicable. It reports the status of the WU, i.e., that computations have been completed.

As far as http://52.224.109.74 is concerned, it seems to be working now. I'm not sure what happened earlier. What range of times is denoted by "over and over and over"?
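
One way to answer that from the log itself is to pull the timestamps of the ERROR lines. Below is a rough Python sketch, assuming the log format shown in the first post (an HH:MM:SS prefix with no date). `failure_window` is a hypothetical helper, not a FAHClient tool.

```python
import re
from datetime import datetime, timedelta

# Matches log lines such as
# "15:46:36:ERROR:WU01:FS00:Exception: Failed to connect to ..."
ERROR_LINE = re.compile(r"^(\d{2}:\d{2}:\d{2}):ERROR:")


def failure_window(log_lines):
    """Return (first, last, elapsed, count) for ERROR lines, or None if none exist.

    The timestamps carry no date, so a clock time earlier than the previous
    ERROR line is assumed to mean the log rolled past midnight.
    """
    times = []
    day_offset = timedelta(0)
    previous = None
    for line in log_lines:
        match = ERROR_LINE.match(line.strip())
        if not match:
            continue
        clock = datetime.strptime(match.group(1), "%H:%M:%S")
        if previous is not None and clock < previous:
            day_offset += timedelta(days=1)
        previous = clock
        times.append(clock + day_offset)
    if not times:
        return None
    first, last = times[0], times[-1]
    return first.strftime("%H:%M:%S"), last.strftime("%H:%M:%S"), last - first, len(times)
```

Feeding it the lines of a saved log file gives the first and last failure times plus the number of failed attempts, which is easier to report than "over and over".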

Re: Stuck at 100% in status "Send"

Posted: Tue Apr 07, 2020 7:55 pm
by uyaem
The server is reachable for me, but only after a significant amount of time. Load-related (on Azure? hmmm)?
It's showing a warning on the https://apps.foldingathome.org/serverstats page.

Re: Stuck at 100% in status "Send"

Posted: Wed Apr 08, 2020 1:27 am
by aol
I have this issue with 52.224.109.74 as well.
Many attempts to send results have failed to upload to this collection server.
http://fah4.eastus.cloudapp.azure.com:80 takes at least a minute to load, but always fails with a connection reset error first before Chrome tries again.

From the FAHControl Log:

01:12:29:WU01:FS01:Uploading 197.17MiB to 13.82.98.119
01:12:29:WU01:FS01:Connecting to 13.82.98.119:8080
01:12:29:WARNING:WU01:FS01:Exception: Failed to send results to work server: Transfer failed
01:12:29:WU01:FS01:Trying to send results to collection server
01:12:29:WU01:FS01:Uploading 197.17MiB to 52.224.109.74
01:12:29:WU01:FS01:Connecting to 52.224.109.74:8080
01:13:12:WU01:FS01:Upload 0.10%
01:13:12:ERROR:WU01:FS01:Exception: Transfer failed

So fah3.eastus.cloudapp.azure.com (the "Work Server" listed in FAHControl) seems to only assign and is rejecting answers.
Then it tries to send the work to fah4, but it only gets 0.10% of the way before too many connections to fah4 (I assume) make the server drop the connection, resulting in the "Transfer failed" exception.
From the server stats page, it looks like fah1, fah2, fah3, and fah5 on the Azure Cloud are all assign-only and are all directing completed work to this one poor fah4 server...
Why the stark contrast in number of servers listed as "Assign" to the number of servers listed as "Accept" (not to mention many of which seem to be down or error'd out)?

Re: Stuck at 100% in status "Send"

Posted: Wed Apr 08, 2020 6:45 am
by Neil-B
Assign means 'assign and accept' whereas Accept means 'accept only'

Re: Stuck at 100% in status "Send"

Posted: Wed Apr 08, 2020 9:09 am
by tirreus
The problem still persists.
I basically wrote this post in the hope that the FAH admins or the project owner would be informed and fix it. Apparently, that did not happen.
The run will expire in a few hours, so the work will be discarded and lost (and that is not nice: we are giving our HW for use and this way it is wasted, I would expect somebody cares about the system and projects).
Then a new WU will be downloaded, I guess. Hopefully I will be luckier this time and get work from a project owner who needs the results and cares about the work being done for him... ;)

Re: Stuck at 100% in status "Send"

Posted: Wed Apr 08, 2020 5:20 pm
by MatthewM
I now too have 2 WUs stuck in send, both are to 52.224.109.74 - I have been fine for weeks, then it started late yesterday - I just can't see this being on my end as nothing changed and internet works fine.

16:51:09:WARNING:WU02:FS01:Exception: Failed to send results to work server: 10002: Received short response, expected 512 bytes, got 0
16:52:14:ERROR:WU02:FS01:Exception: Transfer failed
16:52:26:WARNING:WU02:FS01:Exception: Failed to send results to work server: Transfer failed
16:53:30:ERROR:WU02:FS01:Exception: Transfer failed

Re: Stuck at 100% in status "Send"

Posted: Thu Apr 09, 2020 9:57 am
by PantherX
tirreus wrote:...I would expect somebody cares about the system and projects...
The F@H Team do care for all the donors and the work everyone has put in.

If we step back and consider the following:
1) A global pandemic that impacts everyone in 180+ countries. Keep in mind that a large portion of the F@H Team is based in the USA, which is currently dealing with the pandemic itself.
2) The F@H Team has been understaffed since day 1. That's the nature of research, so it's an uphill battle on most days.
3) The F@H Team is staffed by humans... they need to eat, sleep, rest, think and have lives outside of "work" (families, responsibilities, etc.).
4) There's an Easter break in the USA, which typically means that people take time off work for R&R and focus on themselves.
5) The F@H Team's focus is to meet the demand for WUs by increasing the F@H Server capacity... not an easy job as they have mostly on-premise hardware and are not 100% cloud.
6) The F@H Team has a single developer that looks after all the code for everything... a single human can't do several tasks at the time.
7) The F@H Team have some amazing corporate support and are working with them to increase the F@H Server capacity.

Looking at the bigger picture, I hope you understand that there's a lot happening in the background to meet the demand for WUs, to keep the donors up-to-date, and to manage their own lives during the pandemic. If you're keen to see how you can help F@H, why not join in for the Developer Fireside Chat: viewtopic.php?f=16&t=34136

Re: Stuck at 100% in status "Send"

Posted: Thu Apr 09, 2020 10:32 am
by iceman1992
PantherX wrote:6) The F@H Team has a single developer that looks after all the code for everything... a single human can't do several tasks at the time
This to me jumps out as a big problem. A single developer for everything? Including clients, controls, cores? Do they not have the resources to have a small dev team?

Re: Stuck at 100% in status "Send"

Posted: Thu Apr 09, 2020 11:11 am
by PantherX
iceman1992 wrote:
PantherX wrote:6) The F@H Team has a single developer that looks after all the code for everything... a single human can't do several tasks at the time
This to me jumps out as a big problem. A single developer for everything? Including clients, controls, cores? Do they not have the resources to have a small dev team?
Yep, a single developer Joseph Coffland who has been working with the F@H Team for several years. He looks after V7 clients (Linux, Mac, Windows), the code for all F@H Servers (AS, WS, CS) and some of the websites. Keep in mind that he also has other clients and projects to juggle so isn't an exclusive F@H developer. The budget that the F@H Team operates on is very slim.

Other F@H Team members might help out in bits and pieces, but keep in mind that they aren't developers; they are post-docs and grad students who study protein folding.

Re: Stuck at 100% in status "Send"

Posted: Thu Apr 09, 2020 11:26 am
by iceman1992
PantherX wrote:Yep, a single developer Joseph Coffland who has been working with the F@H Team for several years. He looks after V7 clients (Linux, Mac, Windows), the code for all F@H Servers (AS, WS, CS) and some of the websites. Keep in mind that he also has other clients and projects to juggle so isn't an exclusive F@H developer. The budget that the F@H Team operates on is very slim.
I can probably help out a little bit with the web stuff, I'm a webapp developer. Might be able to help with fahcontrol, looks like it's written in Python? Not sure what the servers are using

Re: Stuck at 100% in status "Send"

Posted: Mon Apr 13, 2020 4:41 am
by jcoffland
PantherX wrote:a single human can't do several tasks at the time
Of course I can.

In all seriousness, Folding@home needs more money. My income for the year is scraped together by four different universities. COVID-19 has blown this year's budget to hell. We've had lots of companies helping with hardware and many volunteers but getting cash is much harder. For the record, you can donate to WUSTL via https://foldingathome.org/donate and 100% of it goes to Folding@home.

Re: Stuck at 100% in status "Send"

Posted: Mon Apr 13, 2020 5:06 am
by anandhanju
Can you please post the section of the log that has the Project, Run, Clone and Gen numbers when it is trying to upload as well as the line that has the upload size? This will help the researchers get to the root of the problem sooner.

For example,
12:35:55:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:13878 run:0 clone:622 gen:17 core:0x22 unit:0x0000001334e06d4a5e80cfe592ad2446
12:35:55:WU01:FS01:Uploading 48.05MiB to 52.224.109.74
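
If the log is long, those two kinds of lines can be picked out with a short script. This is a minimal Python sketch with patterns based only on the two example lines above, so other client versions may need different ones. `summarize_uploads` is a hypothetical name, and the sketch simply pairs each "Uploading" line with the most recent "Sending unit results" line, which is an assumption that can break when several WUs upload at once.

```python
import re

# Patterns derived from the two example lines above (an assumption about the format).
UNIT_LINE = re.compile(
    r"project:(?P<project>\d+) run:(?P<run>\d+) "
    r"clone:(?P<clone>\d+) gen:(?P<gen>\d+)"
)
UPLOAD_LINE = re.compile(
    r"Uploading (?P<size>[\d.]+)(?P<unit>[KMG]iB|B) to (?P<server>[\d.]+)"
)


def summarize_uploads(log_lines):
    """Collect Project/Run/Clone/Gen plus upload size and destination per send attempt."""
    reports = []
    current = None
    for line in log_lines:
        unit = UNIT_LINE.search(line)
        if unit:
            # Start a new report for each "Sending unit results" line.
            current = {key: int(value) for key, value in unit.groupdict().items()}
            reports.append(current)
            continue
        upload = UPLOAD_LINE.search(line)
        if upload and current is not None:
            current["upload"] = upload["size"] + upload["unit"]
            current["server"] = upload["server"]
    return reports
```

Each returned dictionary holds the project, run, clone and gen numbers as integers, with the upload size and server address added once the matching "Uploading" line is seen.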

Re: Stuck at 100% in status "Send"

Posted: Mon Apr 13, 2020 5:15 am
by iceman1992
jcoffland wrote:For the record, you can donate to WUSTL via https://foldingathome.org/donate and 100% of it goes to Folding@home.
Do I need to fill in my complete address information there? I don't live in the US so it shouldn't be relevant, right?