Re: Failed to connect: 3.133.76.19 and 3.21.157.11

Posted: Tue Apr 28, 2020 8:03 am
by 1TM
Server 3.133.76.19 received my "NO ERROR" results (100% uploaded), then the server dumped them.

Re: Failed to connect: 3.133.76.19 and 3.21.157.11

Posted: Tue Apr 28, 2020 10:52 am
by excelblue
Also getting the same issues, even when I spin up my own proxy in us-east-2.

According to my proxy logs, I'm getting HTTP 413 (Request Entity Too Large) errors from 3.133.76.19 and 3.21.157.11 when trying to send results for project 16435.

Perhaps a misconfiguration?

Re: Failed to connect: 3.133.76.19 and 3.21.157.11

Posted: Tue Apr 28, 2020 10:54 am
by CaptainHalon
Same issue since yesterday: two different boxes can't upload results to 3.133.76.19/3.21.157.11, both for project 16434. Is there a way to just erase them and then exclude said project from future folding downloads?

Re: Failed to connect: 3.133.76.19 and 3.21.157.11

Posted: Tue Apr 28, 2020 11:45 am
by Neil-B
Please don't erase WUs. Your kit has done the hard work of folding the WU; now give the client a chance to upload it. Yes, there are issues connecting to the server, but those may get sorted before the WUs expire, so the results can still be valuable. If they don't upload by the expiration, the client will remove the WU itself.

There is no way to exclude a specific project from downloading … some people are choosing to use their firewalls to block specific servers but this will remove any chance of the processed WUs being uploaded and is really not recommended behaviour - but everyone makes their own choice.

Re: Failed to connect: 3.133.76.19 and 3.21.157.11

Posted: Tue Apr 28, 2020 12:15 pm
by 1TM
Recommendations on how to deal with such WUs are in a parallel thread, "Issues with a specific WU":
viewtopic.php?f=19&t=16526

The WU's "NO ERROR" results had been 100% uploaded, but the server rejected and dumped them.
I wasn't making a choice; I just followed the instructions linked above so that other WUs could run.

Re: Failed to connect: 3.133.76.19 and 3.21.157.11

Posted: Tue Apr 28, 2020 12:35 pm
by frest1
My logs on several clients are getting a bit full of this:

Code:

12:10:13:WU02:FS00:Sending unit results: id:02 state:SEND error:NO_ERROR project:16435 run:277 clone:0 gen:7 core:0x22 unit:0x0000000903854c135e9a4efb5a1a5bea
12:10:13:WU02:FS00:Uploading 141.54MiB to 3.133.76.19
12:10:13:WU02:FS00:Connecting to 3.133.76.19:8080
12:10:34:WARNING:WU02:FS00:WorkServer connection failed on port 8080 trying 80
12:10:34:WU02:FS00:Connecting to 3.133.76.19:80
12:10:34:WU02:FS00:Upload 0.04%
12:11:21:FS01:Finishing
12:11:42:WU02:FS00:Upload 0.13%
12:11:42:WARNING:WU02:FS00:Exception: Failed to send results to work server: Transfer failed
12:11:42:WU02:FS00:Trying to send results to collection server
12:11:42:WU02:FS00:Uploading 141.54MiB to 3.21.157.11
12:11:42:WU02:FS00:Connecting to 3.21.157.11:8080
12:12:03:WARNING:WU02:FS00:WorkServer connection failed on port 8080 trying 80
12:12:03:WU02:FS00:Connecting to 3.21.157.11:80
12:12:03:WU02:FS00:Upload 0.04%
12:12:04:ERROR:WU02:FS00:Exception: Transfer failed
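
For anyone whose logs are filling up the same way, here is a minimal Python sketch that condenses such a log into one record per upload attempt. It assumes only the line shapes visible in the excerpt above; the names `UPLOAD_RE`, `FAIL_RE`, and `summarize_uploads` are made up for this illustration and are not part of the FAH client.

```python
import re

# Matches the "Uploading <size>MiB to <host>" lines seen in the log above.
UPLOAD_RE = re.compile(
    r"^(\d\d:\d\d:\d\d):(WU\d+):(FS\d+):Uploading ([\d.]+)MiB to (\S+)$"
)
# Matches WARNING/ERROR lines that carry an "Exception:" message.
FAIL_RE = re.compile(
    r"^(\d\d:\d\d:\d\d):(?:WARNING|ERROR):(WU\d+):(FS\d+):Exception: (.+)$"
)


def summarize_uploads(lines):
    """Pair each upload attempt with the first exception logged after it
    for the same work unit and slot."""
    attempts = []
    for raw in lines:
        line = raw.strip()
        m = UPLOAD_RE.match(line)
        if m:
            _, wu, slot, size, host = m.groups()
            attempts.append({"wu": wu, "slot": slot,
                             "size_mib": float(size), "host": host,
                             "error": None})
            continue
        m = FAIL_RE.match(line)
        if m:
            _, wu, slot, message = m.groups()
            # Attach the failure to the latest attempt for this WU and slot.
            for attempt in reversed(attempts):
                if attempt["wu"] == wu and attempt["slot"] == slot:
                    if attempt["error"] is None:
                        attempt["error"] = message
                    break
    return attempts


# A few lines copied from the log above.
SAMPLE = """\
12:10:13:WU02:FS00:Uploading 141.54MiB to 3.133.76.19
12:10:34:WARNING:WU02:FS00:WorkServer connection failed on port 8080 trying 80
12:11:42:WARNING:WU02:FS00:Exception: Failed to send results to work server: Transfer failed
12:11:42:WU02:FS00:Uploading 141.54MiB to 3.21.157.11
12:12:04:ERROR:WU02:FS00:Exception: Transfer failed
"""

attempts = summarize_uploads(SAMPLE.splitlines())
for a in attempts:
    print(a["host"], a["size_mib"], a["error"])
```

Lines that do not match either pattern (progress percentages, the port 8080 fallback warning, other slots finishing) are simply ignored.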

Re: Failed to connect: 3.133.76.19 and 3.21.157.11

Posted: Tue Apr 28, 2020 12:37 pm
by CaptainHalon
Neil-B wrote: There is no way to exclude a specific project from downloading … some people are choosing to use their firewalls to block specific servers but this will remove any chance of the processed WUs being uploaded and is really not recommended behaviour - but everyone makes their own choice.
If I block the work server on the firewall, but not the collection server, can't the already-completed WUs still be uploaded? I just wanted to prevent boxes from downloading further WUs and wasting energy on results that may never upload.

Re: Failed to connect: 3.133.76.19 and 3.21.157.11

Posted: Tue Apr 28, 2020 1:01 pm
by 1TM
The work server is the first destination for uploading results.
Normally FAHControl should switch to the second destination, the collection server, if it can't connect to the first, but it does not yet do so in all cases.
Sometimes the server gets stuck: it accepts a connection and receives 100% of the results, but then responds with something odd such as (464) PLEASE_WAIT or (404) WORK_QUIT for hours or days, and FAHControl doesn't recognize this.

Hopefully future versions of FAHControl will recognize recurring 464 WAIT and 404 QUIT responses and send results to the second destination automatically.

Changing router settings is a last resort for when FAHControl does not recognize the stuck error state, and should only be done temporarily, either to re-route the already completed WU to the second destination or to keep getting work done.
If you have other WUs downloaded and running and you have an unlimited data connection, leave these stuck work units alone, as the server will eventually re-open.
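
To make the wished-for behaviour concrete, here is a rough Python sketch of the decision described above. It is purely hypothetical and is not how the current client works: the function name `pick_destination`, the `max_stuck` threshold, and the idea of keeping a list of recent response codes are all assumptions for illustration. Only the two codes, 464 and 404, come from the post.

```python
# Response codes named in the post above: 464 (PLEASE_WAIT) and 404 (WORK_QUIT).
STUCK_CODES = {464, 404}


def pick_destination(work_server_responses, max_stuck=3):
    """Return "collection" once the work server has answered with a stuck
    code max_stuck times in a row; otherwise keep using "work".

    work_server_responses: response codes from past upload attempts,
    oldest first. max_stuck is an arbitrary illustrative threshold.
    """
    streak = 0
    for code in work_server_responses:
        # Any other response resets the run of stuck answers.
        streak = streak + 1 if code in STUCK_CODES else 0
    return "collection" if streak >= max_stuck else "work"


print(pick_destination([464, 464, 464]))  # three stuck answers in a row
print(pick_destination([464, 400, 464]))  # run was interrupted by a 400
```

Counting only an unbroken run of stuck responses keeps a single odd answer from diverting uploads away from the work server.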

Re: Failed to connect: 3.133.76.19 and 3.21.157.11

Posted: Tue Apr 28, 2020 1:12 pm
by tomasmu
excelblue wrote: Also getting the same issues, even when I spin up my own proxy in us-east-2.

According to my proxy logs, I'm getting HTTP 413 (Request Entity Too Large) errors from 3.133.76.19 and 3.21.157.11 when trying to send results for project 16435.

Perhaps a misconfiguration?
Good catch!
Yes, looks like a configuration issue.

I did a quick check with Fiddler as a proxy, and I am also getting HTTP 413 from both servers (or 502 Bad Gateway).
That explains why my small work unit (<4 MB) was uploaded, but not the two large ones (>140 MB).

Code:

11:27:19:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:16434 run:143 clone:0 gen:6 core:0x22 unit:0x0000000703854c135e9cbacca7f07cfb
11:27:19:WU00:FS01:Uploading 140.41MiB to 3.133.76.19
11:27:19:WU00:FS01:Connecting to 127.0.0.1:8888
11:27:40:WARNING:WU00:FS01:Exception: Failed to send results to work server: 10001: Server responded: HTTP_BAD_GATEWAY
11:27:40:WU00:FS01:Trying to send results to collection server
11:27:40:WU00:FS01:Uploading 140.41MiB to 3.21.157.11
11:27:40:WU00:FS01:Connecting to 127.0.0.1:8888
11:27:41:ERROR:WU00:FS01:Exception: 10001: Server responded: HTTP_REQUEST_ENTITY_TOO_LARGE

Code:

11:27:19:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:16434 run:558 clone:3 gen:4 core:0x22 unit:0x0000000403854c135e9cbacca4000092
11:27:19:WU01:FS01:Uploading 140.73MiB to 3.133.76.19
11:27:19:WU01:FS01:Connecting to 127.0.0.1:8888
11:27:55:WARNING:WU01:FS01:Exception: Failed to send results to work server: 10001: Server responded: HTTP_REQUEST_ENTITY_TOO_LARGE
11:27:55:WU01:FS01:Trying to send results to collection server
11:27:55:WU01:FS01:Uploading 140.73MiB to 3.21.157.11
11:27:55:WU01:FS01:Connecting to 127.0.0.1:8888
11:27:56:ERROR:WU01:FS01:Exception: 10001: Server responded: HTTP_REQUEST_ENTITY_TOO_LARGE
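
As a minimal sketch of why a size cap would produce exactly this pattern, the Python snippet below mimics a server or proxy that rejects oversized request bodies with HTTP 413. The function `response_for_upload` and the 128 MiB value are assumptions for illustration only; the thread does not state what limit, if any, is actually configured on these servers.

```python
MIB = 1024 * 1024


def response_for_upload(body_bytes, max_body_bytes):
    """Mimic a server or proxy that rejects oversized request bodies."""
    if body_bytes > max_body_bytes:
        return 413  # Request Entity Too Large
    return 200


# ASSUMPTION: illustrative limit only; the real setting is unknown.
ASSUMED_LIMIT = 128 * MIB

# A small unit (about 4 MiB) passes; a 140.41 MiB unit is rejected.
small = response_for_upload(4 * MIB, ASSUMED_LIMIT)
large = response_for_upload(int(140.41 * MIB), ASSUMED_LIMIT)
print(small, large)
```

With any limit between the small and large sizes, the small unit gets through and the large ones are refused, which matches what the logs above show.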

Re: Failed to connect: 3.133.76.19 and 3.21.157.11

Posted: Tue Apr 28, 2020 2:06 pm
by jrweiss
Another one:

Code:

13:45:14:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:16434 run:89 clone:2 gen:3 core:0x22 unit:0x0000000303854c135e9cbacd24718d14
13:45:14:WU02:FS01:Uploading 140.41MiB to 3.133.76.19
13:45:14:WU02:FS01:Connecting to 3.133.76.19:8080
13:45:47:WU01:FS00:0xa7:Completed 345000 out of 500000 steps (69%)
13:46:01:WU02:FS01:Upload 0.13%
13:46:01:WARNING:WU02:FS01:Exception: Failed to send results to work server: Transfer failed
13:46:01:WU02:FS01:Trying to send results to collection server
13:46:01:WU02:FS01:Uploading 140.41MiB to 3.21.157.11
13:46:01:WU02:FS01:Connecting to 3.21.157.11:8080
13:46:01:ERROR:WU02:FS01:Exception: Transfer failed

Re: Failed to connect: 3.133.76.19 and 3.21.157.11

Posted: Tue Apr 28, 2020 4:42 pm
by Kebast
I tried creating a firewall rule blocking the work server, hoping to force the upload to the collection server, but that didn't work either. Still stuck.

Code:

15:46:09:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:16435 run:433 clone:0 gen:6 core:0x22 unit:0x0000000903854c135e9a4efb584bc493
15:46:09:WU02:FS01:Uploading 141.53MiB to 3.133.76.19
15:46:09:WU02:FS01:Connecting to 3.133.76.19:8080
15:46:30:WARNING:WU02:FS01:WorkServer connection failed on port 8080 trying 80
15:46:30:WU02:FS01:Connecting to 3.133.76.19:80
15:46:52:WARNING:WU02:FS01:Exception: Failed to send results to work server: Failed to connect to 3.133.76.19:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
15:46:52:WU02:FS01:Trying to send results to collection server
15:46:52:WU02:FS01:Uploading 141.53MiB to 3.21.157.11
15:46:52:WU02:FS01:Connecting to 3.21.157.11:8080
15:46:52:ERROR:WU02:FS01:Exception: Transfer failed
15:58:42:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:16435 run:433 clone:0 gen:6 core:0x22 unit:0x0000000903854c135e9a4efb584bc493
15:58:42:WU02:FS01:Uploading 141.53MiB to 3.133.76.19
15:58:42:WU02:FS01:Connecting to 3.133.76.19:8080
15:58:42:WARNING:WU02:FS01:WorkServer connection failed on port 8080 trying 80
15:58:42:WU02:FS01:Connecting to 3.133.76.19:80
15:58:42:WARNING:WU02:FS01:Exception: Failed to send results to work server: Failed to connect to 3.133.76.19:80: An attempt was made to access a socket in a way forbidden by its access permissions.
15:58:42:WU02:FS01:Trying to send results to collection server
15:58:42:WU02:FS01:Uploading 141.53MiB to 3.21.157.11
15:58:42:WU02:FS01:Connecting to 3.21.157.11:8080
15:58:42:ERROR:WU02:FS01:Exception: Transfer failed
16:00:19:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:16435 run:433 clone:0 gen:6 core:0x22 unit:0x0000000903854c135e9a4efb584bc493
16:00:19:WU02:FS01:Uploading 141.53MiB to 3.133.76.19
16:00:19:WU02:FS01:Connecting to 3.133.76.19:8080
16:00:19:WARNING:WU02:FS01:WorkServer connection failed on port 8080 trying 80
16:00:19:WU02:FS01:Connecting to 3.133.76.19:80
16:00:19:WARNING:WU02:FS01:Exception: Failed to send results to work server: Failed to connect to 3.133.76.19:80: An attempt was made to access a socket in a way forbidden by its access permissions.
16:00:19:WU02:FS01:Trying to send results to collection server
16:00:19:WU02:FS01:Uploading 141.53MiB to 3.21.157.11
16:00:19:WU02:FS01:Connecting to 3.21.157.11:8080
16:00:19:ERROR:WU02:FS01:Exception: Transfer failed
16:02:56:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:16435 run:433 clone:0 gen:6 core:0x22 unit:0x0000000903854c135e9a4efb584bc493
16:02:56:WU02:FS01:Uploading 141.53MiB to 3.133.76.19
16:02:56:WU02:FS01:Connecting to 3.133.76.19:8080
16:02:56:WARNING:WU02:FS01:WorkServer connection failed on port 8080 trying 80
16:02:56:WU02:FS01:Connecting to 3.133.76.19:80
16:02:56:WARNING:WU02:FS01:Exception: Failed to send results to work server: Failed to connect to 3.133.76.19:80: An attempt was made to access a socket in a way forbidden by its access permissions.
16:02:56:WU02:FS01:Trying to send results to collection server
16:02:56:WU02:FS01:Uploading 141.53MiB to 3.21.157.11
16:02:56:WU02:FS01:Connecting to 3.21.157.11:8080
16:02:56:ERROR:WU02:FS01:Exception: Transfer failed
16:07:11:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:16435 run:433 clone:0 gen:6 core:0x22 unit:0x0000000903854c135e9a4efb584bc493
16:07:11:WU02:FS01:Uploading 141.53MiB to 3.133.76.19
16:07:11:WU02:FS01:Connecting to 3.133.76.19:8080
16:07:11:WARNING:WU02:FS01:WorkServer connection failed on port 8080 trying 80
16:07:11:WU02:FS01:Connecting to 3.133.76.19:80
16:07:11:WARNING:WU02:FS01:Exception: Failed to send results to work server: Failed to connect to 3.133.76.19:80: An attempt was made to access a socket in a way forbidden by its access permissions.
16:07:11:WU02:FS01:Trying to send results to collection server
16:07:11:WU02:FS01:Uploading 141.53MiB to 3.21.157.11
16:07:11:WU02:FS01:Connecting to 3.21.157.11:8080
16:07:11:ERROR:WU02:FS01:Exception: Transfer failed
16:14:02:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:16435 run:433 clone:0 gen:6 core:0x22 unit:0x0000000903854c135e9a4efb584bc493
16:14:02:WU02:FS01:Uploading 141.53MiB to 3.133.76.19
16:14:02:WU02:FS01:Connecting to 3.133.76.19:8080
16:14:02:WARNING:WU02:FS01:WorkServer connection failed on port 8080 trying 80
16:14:02:WU02:FS01:Connecting to 3.133.76.19:80
16:14:02:WARNING:WU02:FS01:Exception: Failed to send results to work server: Failed to connect to 3.133.76.19:80: An attempt was made to access a socket in a way forbidden by its access permissions.
16:14:02:WU02:FS01:Trying to send results to collection server
16:14:02:WU02:FS01:Uploading 141.53MiB to 3.21.157.11
16:14:02:WU02:FS01:Connecting to 3.21.157.11:8080
16:14:02:ERROR:WU02:FS01:Exception: Transfer failed
16:25:08:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:16435 run:433 clone:0 gen:6 core:0x22 unit:0x0000000903854c135e9a4efb584bc493
16:25:08:WU02:FS01:Uploading 141.53MiB to 3.133.76.19
16:25:08:WU02:FS01:Connecting to 3.133.76.19:8080
16:25:08:WARNING:WU02:FS01:WorkServer connection failed on port 8080 trying 80
16:25:08:WU02:FS01:Connecting to 3.133.76.19:80
16:25:08:WARNING:WU02:FS01:Exception: Failed to send results to work server: Failed to connect to 3.133.76.19:80: An attempt was made to access a socket in a way forbidden by its access permissions.
16:25:08:WU02:FS01:Trying to send results to collection server
16:25:08:WU02:FS01:Uploading 141.53MiB to 3.21.157.11
16:25:08:WU02:FS01:Connecting to 3.21.157.11:8080
16:25:08:ERROR:WU02:FS01:Exception: Transfer failed

Re: Failed to connect: 3.133.76.19 and 3.21.157.11

Posted: Tue Apr 28, 2020 6:19 pm
by Artemios
It seems to be related to the size of the WU (project 16435 in my case). I have successfully uploaded a few WUs of about 127MB:

Code:

07:43:13:WU02:FS02:Sending unit results: id:02 state:SEND error:NO_ERROR project:16435 run:792 clone:0 gen:6 core:0x22 unit:0x0000000903854c135e9a4efadad5f974
07:43:13:WU02:FS02:Uploading 126.66MiB to 3.133.76.19
07:43:13:WU02:FS02:Connecting to 3.133.76.19:8080
07:43:25:WU02:FS02:Upload 0.15%
07:43:31:WU02:FS02:Upload 1.68%
...
07:49:19:WU02:FS02:Server responded WORK_ACK (400)
07:49:19:WU02:FS02:Final credit estimate, 157912.00 points
07:49:19:WU02:FS02:Cleaning up

but the following two WUs of about 142MB, which finished almost 2 days ago, still refuse to upload:

Code:

16:36:47:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:16435 run:646 clone:0 gen:3 core:0x22 unit:0x0000000503854c135e9a4efa4d43d93f
16:36:47:WU02:FS01:Uploading 141.53MiB to 3.133.76.19
16:36:47:WU02:FS01:Connecting to 3.133.76.19:8080
16:37:05:WU02:FS01:Upload 0.13%
16:37:05:WARNING:WU02:FS01:Exception: Failed to send results to work server: Transfer failed
16:37:05:WU02:FS01:Trying to send results to collection server
16:37:05:WU02:FS01:Uploading 141.53MiB to 3.21.157.11
16:37:05:WU02:FS01:Connecting to 3.21.157.11:8080
16:37:06:ERROR:WU02:FS01:Exception: Transfer failed

Code:

16:29:25:WU01:FS02:Sending unit results: id:01 state:SEND error:NO_ERROR project:16435 run:755 clone:0 gen:1 core:0x22 unit:0x0000000103854c135e9a4efa34d65490
16:29:25:WU01:FS02:Uploading 141.53MiB to 3.133.76.19
16:29:25:WU01:FS02:Connecting to 3.133.76.19:8080
16:29:46:WARNING:WU01:FS02:WorkServer connection failed on port 8080 trying 80
16:29:46:WU01:FS02:Connecting to 3.133.76.19:80
16:29:47:WU01:FS02:Upload 0.04%
16:30:27:WU01:FS02:Upload 0.13%
16:30:28:WARNING:WU01:FS02:Exception: Failed to send results to work server: Transfer failed
16:30:28:WU01:FS02:Trying to send results to collection server
16:30:28:WU01:FS02:Uploading 141.53MiB to 3.21.157.11
16:30:28:WU01:FS02:Connecting to 3.21.157.11:8080
16:30:28:ERROR:WU01:FS02:Exception: Transfer failed
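
If the failures really are caused by a size cap, the sizes reported in this thread's logs narrow down where that cap would sit. The short Python sketch below does the arithmetic; `bracket_limit` is a made-up helper, and the whole thing assumes size is the only factor, which nobody here has confirmed.

```python
# (size in MiB, uploaded successfully?) as reported in this thread's logs.
observations = [
    (126.66, True),
    (140.41, False),
    (140.73, False),
    (141.53, False),
    (141.54, False),
]


def bracket_limit(obs):
    """Return (largest size that uploaded, smallest size that failed)."""
    largest_ok = max(size for size, ok in obs if ok)
    smallest_failed = min(size for size, ok in obs if not ok)
    return largest_ok, smallest_failed


low, high = bracket_limit(observations)
# The size explanation only holds if every success is smaller than every failure.
consistent = low < high
print(f"Any size cap would lie between {low} and {high} MiB (consistent: {consistent})")
```

More reports of successful uploads just under 140 MiB, or failures just over 127 MiB, would tighten the range.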

Re: Failed to connect: 3.133.76.19 and 3.21.157.11

Posted: Tue Apr 28, 2020 7:09 pm
by CaptainHalon
Still reports of this, eh? And as I understand, the server admin was notified yesterday. Is the fix really that complicated or do they just not give a damn?

Re: Failed to connect: 3.133.76.19 and 3.21.157.11

Posted: Tue Apr 28, 2020 7:19 pm
by Neil-B
I know which of the options you offer I reckon is the case, but from the way you write your question I guess you believe the other to be the case … I have faith that they are trying their best to get this fixed, but if it involves configuration settings of new types of servers it may not just be a case of rebooting the kit.

For what it is worth the researchers will be pushing for this to be resolved as quickly as possible - it is impacting the science … It is however one of a number of fairly significant issues in play at the moment.

Re: Failed to connect: 3.133.76.19 and 3.21.157.11

Posted: Tue Apr 28, 2020 8:13 pm
by PantherX
Since the servers are physically located in different labs across the world, time zones come into play. Also, investigations involving cloud providers and sponsorships take time. Thus, we need to be patient and trust that they are doing all that they can; some COVID-19 projects are time critical, so there's no chance that this is being taken "lightly".