
Send Errors - 155.247.164.213 & .214

Posted: Sun Mar 15, 2020 7:04 am
by gordonbb
I'm getting WUs OK and processing them, but when they complete they fail to send to both the Work Server and the Collection Server.

Code:

03:46:20:WU01:FS00:0x22:Completed 1000000 out of 1000000 steps (100%)
03:46:26:WU01:FS00:0x22:Saving result file ../logfile_01.txt
03:46:26:WU01:FS00:0x22:Saving result file checkpointState.xml
03:46:26:WU01:FS00:0x22:Saving result file checkpt.crc
03:46:26:WU01:FS00:0x22:Saving result file positions.xtc
03:46:26:WU01:FS00:0x22:Saving result file science.log
03:46:26:WU01:FS00:0x22:Folding@home Core Shutdown: FINISHED_UNIT
03:46:26:WU01:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
03:46:26:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:11753 run:0 clone:363 gen:0 core:0x22 unit:0x000000009bf7a4d55e6d76bf9d7e206a
03:46:26:WU01:FS00:Uploading 49.92MiB to 155.247.164.213
03:46:26:WU01:FS00:Connecting to 155.247.164.213:8080
03:46:26:WARNING:WU01:FS00:Exception: Failed to send results to work server: Transfer failed
03:46:26:WU01:FS00:Trying to send results to collection server
03:46:26:WU01:FS00:Uploading 49.92MiB to 155.247.164.214
03:46:26:WU01:FS00:Connecting to 155.247.164.214:8080
03:46:27:ERROR:WU01:FS00:Exception: Transfer failed
... Multiple Attempts ...
06:46:16:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:11753 run:0 clone:363 gen:0 core:0x22 unit:0x000000009bf7a4d55e6d76bf9d7e206a
06:46:16:WU01:FS00:Uploading 49.92MiB to 155.247.164.213
06:46:16:WU01:FS00:Connecting to 155.247.164.213:8080
06:46:16:WARNING:WU01:FS00:Exception: Failed to send results to work server: Transfer failed
06:46:16:WU01:FS00:Trying to send results to collection server
06:46:16:WU01:FS00:Uploading 49.92MiB to 155.247.164.214
06:46:16:WU01:FS00:Connecting to 155.247.164.214:8080
06:46:17:ERROR:WU01:FS00:Exception: Transfer failed
Same thing, 2nd System - Same WS & CS

Code:

06:41:01:WU05:FS02:0x22:Completed 1960000 out of 2000000 steps (98%)
06:41:02:WU02:FS02:Sending unit results: id:02 state:SEND error:NO_ERROR project:11758 run:0 clone:248 gen:0 core:0x22 unit:0x000000009bf7a4d55e6d770fce597dbe
06:41:02:WU02:FS02:Uploading 55.24MiB to 155.247.164.213
06:41:02:WU02:FS02:Connecting to 155.247.164.213:8080
06:41:02:WARNING:WU02:FS02:Exception: Failed to send results to work server: Transfer failed
06:41:02:WU02:FS02:Trying to send results to collection server
06:41:02:WU02:FS02:Uploading 55.24MiB to 155.247.164.214
06:41:02:WU02:FS02:Connecting to 155.247.164.214:8080
06:41:03:ERROR:WU02:FS02:Exception: Transfer failed
06:47:54:WU02:FS02:Sending unit results: id:02 state:SEND error:NO_ERROR project:11758 run:0 clone:248 gen:0 core:0x22 unit:0x000000009bf7a4d55e6d770fce597dbe
06:47:54:WU02:FS02:Uploading 55.24MiB to 155.247.164.213
06:47:54:WU02:FS02:Connecting to 155.247.164.213:8080
06:47:54:WARNING:WU02:FS02:Exception: Failed to send results to work server: Transfer failed
06:47:54:WU02:FS02:Trying to send results to collection server
06:47:54:WU02:FS02:Uploading 55.24MiB to 155.247.164.214
06:47:54:WU02:FS02:Connecting to 155.247.164.214:8080
06:47:54:ERROR:WU02:FS02:Exception: Transfer failed
And on a third System

Code:

04:05:17:WU02:FS02:0x22:Saving result file ../logfile_01.txt
04:05:17:WU02:FS02:0x22:Saving result file checkpointState.xml
04:05:17:WU02:FS02:0x22:Saving result file checkpt.crc
04:05:17:WU02:FS02:0x22:Saving result file positions.xtc
04:05:17:WU02:FS02:0x22:Saving result file science.log
04:05:17:WU02:FS02:0x22:Folding@home Core Shutdown: FINISHED_UNIT
04:05:17:WU02:FS02:FahCore returned: FINISHED_UNIT (100 = 0x64)
04:05:17:WU02:FS02:Sending unit results: id:02 state:SEND error:NO_ERROR project:11758 run:0 clone:248 gen:0 core:0x22 unit:0x000000009bf7a4d55e6d770fce597dbe
04:05:17:WU02:FS02:Uploading 55.24MiB to 155.247.164.213
04:05:17:WU02:FS02:Connecting to 155.247.164.213:8080
04:05:18:WARNING:WU02:FS02:Exception: Failed to send results to work server: Transfer failed
04:05:18:WU02:FS02:Trying to send results to collection server
04:05:18:WU02:FS02:Uploading 55.24MiB to 155.247.164.214
04:05:18:WU02:FS02:Connecting to 155.247.164.214:8080
04:05:18:ERROR:WU02:FS02:Exception: Transfer failed
... Multiple Attempts ...
06:59:59:WU02:FS02:Sending unit results: id:02 state:SEND error:NO_ERROR project:11758 run:0 clone:248 gen:0 core:0x22 unit:0x000000009bf7a4d55e6d770fce597dbe
06:59:59:WU02:FS02:Uploading 55.24MiB to 155.247.164.213
06:59:59:WU02:FS02:Connecting to 155.247.164.213:8080
06:59:59:WARNING:WU02:FS02:Exception: Failed to send results to work server: Transfer failed
06:59:59:WU02:FS02:Trying to send results to collection server
06:59:59:WU02:FS02:Uploading 55.24MiB to 155.247.164.214
06:59:59:WU02:FS02:Connecting to 155.247.164.214:8080
06:59:59:ERROR:WU02:FS02:Exception: Transfer failed
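
For anyone who wants to tally these failures from a longer log, here is a minimal Python sketch. It assumes the line layout seen in the logs above (`HH:MM:SS:[WARNING:|ERROR:]WUxx:FSxx:message`) and attributes each "Transfer failed" to the most recent "Connecting to" line for the same WU and slot. The function name `count_failed_transfers` is made up for illustration and is not part of the FAH client.

```python
import re
from collections import Counter

# Line layout assumed from the logs above:
#   HH:MM:SS:[WARNING:|ERROR:]WUxx:FSxx:message
LINE_RE = re.compile(r"^\d\d:\d\d:\d\d:(?:WARNING:|ERROR:)?(WU\d+:FS\d+):(.*)$")
CONNECT_RE = re.compile(r"^Connecting to (\d+\.\d+\.\d+\.\d+):\d+")


def count_failed_transfers(log_lines):
    """Count 'Transfer failed' exceptions per server address.

    A failure is attributed to the server named in the most recent
    'Connecting to' line for the same work unit and slot.
    """
    last_server = {}      # "WUxx:FSxx" -> address last connected to
    failures = Counter()  # address -> number of failed transfers
    for line in log_lines:
        parsed = LINE_RE.match(line.strip())
        if not parsed:
            continue  # skips non-log lines such as "... Multiple Attempts ..."
        unit, message = parsed.groups()
        connect = CONNECT_RE.match(message)
        if connect:
            last_server[unit] = connect.group(1)
        elif "Transfer failed" in message and unit in last_server:
            failures[last_server[unit]] += 1
    return failures


# Four lines copied from the first log above.
sample = """\
03:46:26:WU01:FS00:Connecting to 155.247.164.213:8080
03:46:26:WARNING:WU01:FS00:Exception: Failed to send results to work server: Transfer failed
03:46:26:WU01:FS00:Connecting to 155.247.164.214:8080
03:46:27:ERROR:WU01:FS00:Exception: Transfer failed
""".splitlines()

for address, count in sorted(count_failed_transfers(sample).items()):
    print(address, count)
```

Feeding it a whole log file (for example `open(path).read().splitlines()`) gives one count per server, which makes it easy to see whether only .213 and .214 are affected.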

Re: Send Errors - 155.247.164.213 & .214

Posted: Sun Mar 15, 2020 7:32 am
by gordonbb
Checked the Server Stats and both these are showing as down.

Which explains things.

Hopefully the crew aren’t suffering from this surfeit of Lampreys.
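
Besides the Server Stats page, a quick way to check from your own machine is to try opening a TCP connection to the same address and port the client logs show (port 8080 above). This is only a rough sketch: `can_connect` is a name chosen here, and a port that accepts connections does not prove the server will accept an upload.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `can_connect("155.247.164.213", 8080)` returning False would match the "down" status, while True would only mean that something is listening on that port.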

Re: Send Errors - 155.247.164.213 & .214

Posted: Sun Mar 15, 2020 7:52 am
by bruce
Well, down means DOWN.

It's the middle of the night anywhere in the USA right now and volunteers should be sleeping ... preparing their immune system for another day of exposure to some random viruses.

I don't know who will be responsible for figuring out why the servers are down and fixing them, but it won't be until tomorrow. Without more info, I don't know if it's the responsibility of the FAH team at temple.edu or the campus network support folks.

Re: Send Errors - 155.247.164.213 & .214

Posted: Sun Mar 15, 2020 10:35 am
by ChrisKFoldingAtHome
Same here

Re: Send Errors - 155.247.164.213 & .214

Posted: Sun Mar 15, 2020 9:14 pm
by suchamoneypit
My clients can't connect to .214, so it is down. Are we able to switch servers so we can fold? Or is waiting the only option?

Re: Send Errors - 155.247.164.213 & .214

Posted: Mon Mar 16, 2020 12:27 am
by 0xbirb
suchamoneypit wrote: My clients can't connect to .214, so it is down. Are we able to switch servers so we can fold? Or is waiting the only option?
Most likely this is due to the sheer number of new donors as a result of both Intel and Nvidia tweeting about the PC Master Race on Reddit. :) It's a "good thing". Sort of like when the kid who had no friends just wanted a greetings card for his birthday, and people in the community responded and sent him a card - or rather enough cards to swim in. :D

If you still want to contribute, just leave your machine running. Eventually it should be able to pick up new WUs and start folding, provided that it has been configured correctly (which in most cases is likely just the defaults).

Re: Send Errors - 155.247.164.213 & .214

Posted: Mon Mar 16, 2020 4:06 am
by TiO2
Does the completed work eventually get deleted if it can't be sent to 155.247.164.213, or does it keep trying to send it until the server is back up?

Re: Send Errors - 155.247.164.213 & .214

Posted: Mon Mar 16, 2020 4:08 am
by TiO2
Also I noticed that 155.247.164.214's status is set to "Assign", while it's the collection server for my WU that's been trying to send for a couple of hours. It appears to be up but not accepting work units.

Re: Send Errors - 155.247.164.213 & .214

Posted: Mon Mar 16, 2020 8:43 am
by alxbelu
One of my machines has been trying to submit a WU for 11758 for over 24hrs now (72 attempts); during Sunday I noted that the servers (213 & 214) were mostly down, but as of this morning they seem to be up according to the server status page, yet I am still getting this (UTC time):

Code:

07:48:52:WU00:FS01:Uploading 55.24MiB to 155.247.164.213
07:48:52:WU00:FS01:Connecting to 155.247.164.213:8080
07:48:52:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
07:48:52:WU00:FS01:Trying to send results to collection server
07:48:52:WU00:FS01:Uploading 55.24MiB to 155.247.164.214
07:48:52:WU00:FS01:Connecting to 155.247.164.214:8080
07:48:56:ERROR:WU00:FS01:Exception: Transfer failed
07:53:06:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:11758 run:0 clone:1756 gen:0 core:0x22 unit:0x000000009bf7a4d55e6d771303ec7ef7
07:53:06:WU00:FS01:Uploading 55.24MiB to 155.247.164.213
07:53:06:WU00:FS01:Connecting to 155.247.164.213:8080
07:53:07:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
07:53:07:WU00:FS01:Trying to send results to collection server
07:53:07:WU00:FS01:Uploading 55.24MiB to 155.247.164.214
07:53:07:WU00:FS01:Connecting to 155.247.164.214:8080
07:53:07:ERROR:WU00:FS01:Exception: Transfer failed
07:59:58:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:11758 run:0 clone:1756 gen:0 core:0x22 unit:0x000000009bf7a4d55e6d771303ec7ef7
07:59:58:WU00:FS01:Uploading 55.24MiB to 155.247.164.213
07:59:58:WU00:FS01:Connecting to 155.247.164.213:8080
07:59:58:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
07:59:58:WU00:FS01:Trying to send results to collection server
07:59:58:WU00:FS01:Uploading 55.24MiB to 155.247.164.214
07:59:58:WU00:FS01:Connecting to 155.247.164.214:8080
07:59:59:ERROR:WU00:FS01:Exception: Transfer failed
08:11:03:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:11758 run:0 clone:1756 gen:0 core:0x22 unit:0x000000009bf7a4d55e6d771303ec7ef7
08:11:03:WU00:FS01:Uploading 55.24MiB to 155.247.164.213
08:11:03:WU00:FS01:Connecting to 155.247.164.213:8080
08:11:04:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
08:11:04:WU00:FS01:Trying to send results to collection server
08:11:04:WU00:FS01:Uploading 55.24MiB to 155.247.164.214
08:11:04:WU00:FS01:Connecting to 155.247.164.214:8080
08:11:04:ERROR:WU00:FS01:Exception: Transfer failed
08:29:00:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:11758 run:0 clone:1756 gen:0 core:0x22 unit:0x000000009bf7a4d55e6d771303ec7ef7
08:29:00:WU00:FS01:Uploading 55.24MiB to 155.247.164.213
08:29:00:WU00:FS01:Connecting to 155.247.164.213:8080
08:29:01:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
08:29:01:WU00:FS01:Trying to send results to collection server
08:29:01:WU00:FS01:Uploading 55.24MiB to 155.247.164.214
08:29:01:WU00:FS01:Connecting to 155.247.164.214:8080
08:29:01:ERROR:WU00:FS01:Exception: Transfer failed
I've reset the retry timer multiple times as it has extended well beyond 1hr (log indicates it's been up over 2hrs).

The same machine (and folding slot) has completed multiple other WUs while still trying to send this one, so it's not blocking anything. I'm just trying to figure out why it fails to submit the work even now, when the servers are reported to be up.
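
As a side note on the retry timer, the gaps between the five attempts in the log above can be worked out from the timestamps. The sketch below only does that arithmetic; `retry_gaps` is a made-up helper name, and the growth factor it prints is an observation from this one log, not a statement about how the client is implemented.

```python
from datetime import datetime, timedelta


def retry_gaps(timestamps):
    """Return the gaps in seconds between consecutive HH:MM:SS timestamps.

    Assumes the timestamps are in order and less than 24h apart; a timestamp
    earlier than its predecessor is treated as having crossed midnight.
    """
    times = [datetime.strptime(t, "%H:%M:%S") for t in timestamps]
    gaps = []
    for earlier, later in zip(times, times[1:]):
        delta = later - earlier
        if delta < timedelta(0):
            delta += timedelta(days=1)
        gaps.append(int(delta.total_seconds()))
    return gaps


# Times of the "Connecting to 155.247.164.213:8080" lines in the log above.
attempts = ["07:48:52", "07:53:06", "07:59:58", "08:11:03", "08:29:00"]
gaps = retry_gaps(attempts)
ratios = [round(b / a, 2) for a, b in zip(gaps, gaps[1:])]
print(gaps)    # [254, 412, 665, 1077]
print(ratios)  # [1.62, 1.61, 1.62]
```

So in this log each wait is roughly 1.6 times the previous one, which is how the delay ends up past an hour after enough failed attempts.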

Re: Send Errors - 155.247.164.213 & .214

Posted: Mon Mar 16, 2020 9:05 am
by Scraig
But why is the Estimated Credit constantly being reduced? It was hard-earned, and now it is shrinking minute by minute while the Collection Server 155.247.164.214 is not receiving any data. That's not OK.

Re: Send Errors - 155.247.164.213 & .214

Posted: Mon Mar 16, 2020 10:18 am
by alxbelu
I just got a new WU (11753) from .213, right after failing to upload the 11758 WU mentioned above (and right after restarting the FAH client):

Code:

10:13:31:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:11758 run:0 clone:1756 gen:0 core:0x22 unit:0x000000009bf7a4d55e6d771303ec7ef7
10:13:31:WU00:FS01:Uploading 55.24MiB to 155.247.164.213
10:13:31:WU00:FS01:Connecting to 155.247.164.213:8080
10:13:31:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
10:13:31:WU00:FS01:Trying to send results to collection server
10:13:31:WU00:FS01:Uploading 55.24MiB to 155.247.164.214
10:13:31:WU00:FS01:Connecting to 155.247.164.214:8080
10:13:32:ERROR:WU00:FS01:Exception: Transfer failed
10:13:37:WARNING:WU02:FS01:WorkServer connection failed on port 8080 trying 80
10:13:37:WU02:FS01:Connecting to 128.252.203.10:80
10:14:02:ERROR:WU02:FS01:Exception: 10002: Received short response, expected 512 bytes, got 0
10:14:16:WU02:FS01:Connecting to 65.254.110.245:8080
10:14:16:WU02:FS01:Assigned to work server 155.247.164.213
10:14:16:WU02:FS01:Requesting new work unit for slot 01: READY gpu:0:TU106M [GeForce RTX 2060 Mobile] from 155.247.164.213
10:14:16:WU02:FS01:Connecting to 155.247.164.213:8080
10:14:27:WU02:FS01:Downloading 11.98MiB
10:14:33:WU02:FS01:Download 84.00%
10:14:34:WU02:FS01:Download complete
10:14:34:WU02:FS01:Received Unit: id:02 state:DOWNLOAD error:NO_ERROR project:11753 run:0 clone:3574 gen:1 core:0x22 unit:0x000000029bf7a4d55e6d76caa76041b9
10:14:34:WU02:FS01:Starting

Re: Send Errors - 155.247.164.213 & .214

Posted: Mon Mar 16, 2020 12:35 pm
by NathanJanssens
suchamoneypit wrote: My clients can't connect to .214, so it is down. Are we able to switch servers so we can fold? Or is waiting the only option?
I would be interested in this as well. Instead of trying again and again to connect to a certain server, with each retry coming after a longer interval, is it possible to leave that server be for a bit (let's say after 3 or 4 failed attempts) and find another one?
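
To make the suggestion concrete, here is a small Python sketch of that policy: give each server a fixed number of tries, then move on to the next one. Everything in it is hypothetical (the function `upload_with_failover`, the `send` callable, the server names); it describes the idea being asked about, not how the FAH client behaves, and this thread does not settle whether a finished WU could be accepted by any server other than its own work and collection servers.

```python
def upload_with_failover(servers, send, max_failures=3):
    """Try each server up to `max_failures` times before moving to the next.

    `send` is a hypothetical callable that takes a server name and returns
    True if the upload succeeded. Returns the server that accepted the
    upload, or None if every server used up its attempts.
    """
    for server in servers:
        for _attempt in range(max_failures):
            if send(server):
                return server
    return None


# Stand-in for a real upload: records each call, and only "server-b" accepts.
calls = []


def fake_send(server):
    calls.append(server)
    return server == "server-b"


chosen = upload_with_failover(["server-a", "server-b"], fake_send)
print(chosen, len(calls))
```

A real version would also need a wait between attempts; that is left out here to keep the example short.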

Re: Send Errors - 155.247.164.213 & .214

Posted: Mon Mar 16, 2020 12:53 pm
by Scraig
bruce wrote: Well, down means DOWN.

It's the middle of the night anywhere in the USA right now and volunteers should be sleeping ... preparing their immune system for another day of exposure to some random viruses.

I don't know who will be responsible for figuring out why the servers are down and fixing them, but it won't be until tomorrow. Without more info, I don't know if it's the responsibility of the FAH team at temple.edu or the campus network support folks.
The Estimated Credit is now dropping every second. That's not fair; FAH should change that. The donors are very patient and want to support FAH. They should not be punished for a server failure.

Re: Send Errors - 155.247.164.213 & .214

Posted: Mon Mar 16, 2020 3:20 pm
by GeriCom76
Will these servers be back again?
I have been trying to upload a finished project for almost 2 days but keep getting an error message in the log file. I've been running a 24/7 system.
Project number: 11753

Re: Send Errors - 155.247.164.213 & .214

Posted: Mon Mar 16, 2020 3:33 pm
by TiO2
My GPU slot has also been attempting to send a completed WU to .214 for the last 12 hours. The WU is about to hit timeout.