Send Errors - 155.247.164.213 & .214

Moderators: Site Moderators, FAHC Science Team

Postby gordonbb » Sun Mar 15, 2020 8:04 am

I'm getting WUs OK and processing them, but once they complete they fail to send to both the Work Server and the Collection Server.
Code:
03:46:20:WU01:FS00:0x22:Completed 1000000 out of 1000000 steps (100%)
03:46:26:WU01:FS00:0x22:Saving result file ../logfile_01.txt
03:46:26:WU01:FS00:0x22:Saving result file checkpointState.xml
03:46:26:WU01:FS00:0x22:Saving result file checkpt.crc
03:46:26:WU01:FS00:0x22:Saving result file positions.xtc
03:46:26:WU01:FS00:0x22:Saving result file science.log
03:46:26:WU01:FS00:0x22:Folding@home Core Shutdown: FINISHED_UNIT
03:46:26:WU01:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
03:46:26:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:11753 run:0 clone:363 gen:0 core:0x22 unit:0x000000009bf7a4d55e6d76bf9d7e206a
03:46:26:WU01:FS00:Uploading 49.92MiB to 155.247.164.213
03:46:26:WU01:FS00:Connecting to 155.247.164.213:8080
03:46:26:WARNING:WU01:FS00:Exception: Failed to send results to work server: Transfer failed
03:46:26:WU01:FS00:Trying to send results to collection server
03:46:26:WU01:FS00:Uploading 49.92MiB to 155.247.164.214
03:46:26:WU01:FS00:Connecting to 155.247.164.214:8080
03:46:27:ERROR:WU01:FS00:Exception: Transfer failed
... Multiple Attempts ...
06:46:16:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:11753 run:0 clone:363 gen:0 core:0x22 unit:0x000000009bf7a4d55e6d76bf9d7e206a
06:46:16:WU01:FS00:Uploading 49.92MiB to 155.247.164.213
06:46:16:WU01:FS00:Connecting to 155.247.164.213:8080
06:46:16:WARNING:WU01:FS00:Exception: Failed to send results to work server: Transfer failed
06:46:16:WU01:FS00:Trying to send results to collection server
06:46:16:WU01:FS00:Uploading 49.92MiB to 155.247.164.214
06:46:16:WU01:FS00:Connecting to 155.247.164.214:8080
06:46:17:ERROR:WU01:FS00:Exception: Transfer failed

Same thing, 2nd System - Same WS & CS
Code:
06:41:01:WU05:FS02:0x22:Completed 1960000 out of 2000000 steps (98%)
06:41:02:WU02:FS02:Sending unit results: id:02 state:SEND error:NO_ERROR project:11758 run:0 clone:248 gen:0 core:0x22 unit:0x000000009bf7a4d55e6d770fce597dbe
06:41:02:WU02:FS02:Uploading 55.24MiB to 155.247.164.213
06:41:02:WU02:FS02:Connecting to 155.247.164.213:8080
06:41:02:WARNING:WU02:FS02:Exception: Failed to send results to work server: Transfer failed
06:41:02:WU02:FS02:Trying to send results to collection server
06:41:02:WU02:FS02:Uploading 55.24MiB to 155.247.164.214
06:41:02:WU02:FS02:Connecting to 155.247.164.214:8080
06:41:03:ERROR:WU02:FS02:Exception: Transfer failed
06:47:54:WU02:FS02:Sending unit results: id:02 state:SEND error:NO_ERROR project:11758 run:0 clone:248 gen:0 core:0x22 unit:0x000000009bf7a4d55e6d770fce597dbe
06:47:54:WU02:FS02:Uploading 55.24MiB to 155.247.164.213
06:47:54:WU02:FS02:Connecting to 155.247.164.213:8080
06:47:54:WARNING:WU02:FS02:Exception: Failed to send results to work server: Transfer failed
06:47:54:WU02:FS02:Trying to send results to collection server
06:47:54:WU02:FS02:Uploading 55.24MiB to 155.247.164.214
06:47:54:WU02:FS02:Connecting to 155.247.164.214:8080
06:47:54:ERROR:WU02:FS02:Exception: Transfer failed

And on a third System
Code:
04:05:17:WU02:FS02:0x22:Saving result file ../logfile_01.txt
04:05:17:WU02:FS02:0x22:Saving result file checkpointState.xml
04:05:17:WU02:FS02:0x22:Saving result file checkpt.crc
04:05:17:WU02:FS02:0x22:Saving result file positions.xtc
04:05:17:WU02:FS02:0x22:Saving result file science.log
04:05:17:WU02:FS02:0x22:Folding@home Core Shutdown: FINISHED_UNIT
04:05:17:WU02:FS02:FahCore returned: FINISHED_UNIT (100 = 0x64)
04:05:17:WU02:FS02:Sending unit results: id:02 state:SEND error:NO_ERROR project:11758 run:0 clone:248 gen:0 core:0x22 unit:0x000000009bf7a4d55e6d770fce597dbe
04:05:17:WU02:FS02:Uploading 55.24MiB to 155.247.164.213
04:05:17:WU02:FS02:Connecting to 155.247.164.213:8080
04:05:18:WARNING:WU02:FS02:Exception: Failed to send results to work server: Transfer failed
04:05:18:WU02:FS02:Trying to send results to collection server
04:05:18:WU02:FS02:Uploading 55.24MiB to 155.247.164.214
04:05:18:WU02:FS02:Connecting to 155.247.164.214:8080
04:05:18:ERROR:WU02:FS02:Exception: Transfer failed
... Multiple Attempts ...
06:59:59:WU02:FS02:Sending unit results: id:02 state:SEND error:NO_ERROR project:11758 run:0 clone:248 gen:0 core:0x22 unit:0x000000009bf7a4d55e6d770fce597dbe
06:59:59:WU02:FS02:Uploading 55.24MiB to 155.247.164.213
06:59:59:WU02:FS02:Connecting to 155.247.164.213:8080
06:59:59:WARNING:WU02:FS02:Exception: Failed to send results to work server: Transfer failed
06:59:59:WU02:FS02:Trying to send results to collection server
06:59:59:WU02:FS02:Uploading 55.24MiB to 155.247.164.214
06:59:59:WU02:FS02:Connecting to 155.247.164.214:8080
06:59:59:ERROR:WU02:FS02:Exception: Transfer failed
gordonbb
 
Posts: 476
Joined: Mon May 21, 2018 5:12 pm
Location: Great White North

Re: Send Errors - 155.247.164.213 & .214

Postby gordonbb » Sun Mar 15, 2020 8:32 am

Checked the Server Stats and both these are showing as down.

Which explains things.

Hopefully the crew aren’t suffering from this surfeit of Lampreys.
gordonbb
 
Posts: 476
Joined: Mon May 21, 2018 5:12 pm
Location: Great White North

Re: Send Errors - 155.247.164.213 & .214

Postby bruce » Sun Mar 15, 2020 8:52 am

Well, down means DOWN.

It's the middle of the night anywhere in the USA right now and volunteers should be sleeping ... preparing their immune systems for another day of exposure to some random viruses.

I don't know who will be responsible for figuring out why the servers are down and fixing them, but it won't be until tomorrow. Without more info, I don't know if it's the responsibility of the FAH team at temple.edu or the campus network support folks.
bruce
 
Posts: 19676
Joined: Thu Nov 29, 2007 11:13 pm
Location: So. Cal.

Re: Send Errors - 155.247.164.213 & .214

Postby ChrisKFoldingAtHome » Sun Mar 15, 2020 11:35 am

Same here
ChrisKFoldingAtHome
 
Posts: 3
Joined: Sun Mar 15, 2020 11:08 am

Re: Send Errors - 155.247.164.213 & .214

Postby suchamoneypit » Sun Mar 15, 2020 10:14 pm

My clients can't connect to .214, so it is down. Are we able to switch servers so we can fold? Or is waiting the only option?
suchamoneypit
 
Posts: 1
Joined: Mon Nov 05, 2018 4:09 am

Re: Send Errors - 155.247.164.213 & .214

Postby 0xbirb » Mon Mar 16, 2020 1:27 am

suchamoneypit wrote:My clients can't connect to .214, so it is down. Are we able to switch servers so we can fold? Or is waiting the only option?


Most likely this is due to the sheer number of new donors as a result of both Intel and Nvidia tweeting about the PC Master Race on Reddit. :) It's a "good thing". Sort of like when the kid who had no friends just wanted a greetings card for his birthday, and people in the community responded and sent him a card - or rather enough cards to swim in. :D

If you still want to contribute, just leave your machine running. Eventually it should be able to pick up new WUs and start folding, provided that it has been configured correctly (which in most cases is likely just the defaults).
0xbirb
 
Posts: 4
Joined: Sat Mar 14, 2020 5:56 pm

Re: Send Errors - 155.247.164.213 & .214

Postby TiO2 » Mon Mar 16, 2020 5:06 am

Does the completed work eventually get deleted if it can't be sent to 155.247.164.213, or does the client keep trying to send it until the server is back up?
TiO2
 
Posts: 4
Joined: Sun Feb 17, 2019 8:01 pm

Re: Send Errors - 155.247.164.213 & .214

Postby TiO2 » Mon Mar 16, 2020 5:08 am

Also I noticed that 155.247.164.214's status is set to "Assign", while it's the collection server for my WU that's been trying to send for a couple of hours. It appears to be up but not accepting work units.
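
"Up on the status page" and "accepting uploads" are separate things, and a plain TCP connect can at least tell an unreachable host apart from one whose port is open. A minimal Python sketch of such a check; `can_connect` is a hypothetical helper written for this example, and port 8080 is simply the port shown in the logs in this thread. A successful connect only shows that the port is open; it says nothing about whether the server will accept a result upload.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This only tests reachability of the port. It does not speak the
    Folding@home protocol, so it cannot tell whether an upload would work.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example call (not executed here, since it needs network access):
#   can_connect("155.247.164.214", 8080)
```

If this returns True while the client still logs "Transfer failed", the problem is more likely on the server side (overload or a service not accepting results) than a local network or firewall issue.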
TiO2
 
Posts: 4
Joined: Sun Feb 17, 2019 8:01 pm

Re: Send Errors - 155.247.164.213 & .214

Postby alxbelu » Mon Mar 16, 2020 9:43 am

One of my machines has been trying to submit a WU for 11758 for over 24hrs now (72 attempts); during Sunday I noted that the servers (213 & 214) were mostly down, but as of this morning they seem to be up according to the server status page, yet I am still getting this (UTC time):
Code:
07:48:52:WU00:FS01:Uploading 55.24MiB to 155.247.164.213
07:48:52:WU00:FS01:Connecting to 155.247.164.213:8080
07:48:52:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
07:48:52:WU00:FS01:Trying to send results to collection server
07:48:52:WU00:FS01:Uploading 55.24MiB to 155.247.164.214
07:48:52:WU00:FS01:Connecting to 155.247.164.214:8080
07:48:56:ERROR:WU00:FS01:Exception: Transfer failed
07:53:06:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:11758 run:0 clone:1756 gen:0 core:0x22 unit:0x000000009bf7a4d55e6d771303ec7ef7
07:53:06:WU00:FS01:Uploading 55.24MiB to 155.247.164.213
07:53:06:WU00:FS01:Connecting to 155.247.164.213:8080
07:53:07:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
07:53:07:WU00:FS01:Trying to send results to collection server
07:53:07:WU00:FS01:Uploading 55.24MiB to 155.247.164.214
07:53:07:WU00:FS01:Connecting to 155.247.164.214:8080
07:53:07:ERROR:WU00:FS01:Exception: Transfer failed
07:59:58:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:11758 run:0 clone:1756 gen:0 core:0x22 unit:0x000000009bf7a4d55e6d771303ec7ef7
07:59:58:WU00:FS01:Uploading 55.24MiB to 155.247.164.213
07:59:58:WU00:FS01:Connecting to 155.247.164.213:8080
07:59:58:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
07:59:58:WU00:FS01:Trying to send results to collection server
07:59:58:WU00:FS01:Uploading 55.24MiB to 155.247.164.214
07:59:58:WU00:FS01:Connecting to 155.247.164.214:8080
07:59:59:ERROR:WU00:FS01:Exception: Transfer failed
08:11:03:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:11758 run:0 clone:1756 gen:0 core:0x22 unit:0x000000009bf7a4d55e6d771303ec7ef7
08:11:03:WU00:FS01:Uploading 55.24MiB to 155.247.164.213
08:11:03:WU00:FS01:Connecting to 155.247.164.213:8080
08:11:04:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
08:11:04:WU00:FS01:Trying to send results to collection server
08:11:04:WU00:FS01:Uploading 55.24MiB to 155.247.164.214
08:11:04:WU00:FS01:Connecting to 155.247.164.214:8080
08:11:04:ERROR:WU00:FS01:Exception: Transfer failed
08:29:00:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:11758 run:0 clone:1756 gen:0 core:0x22 unit:0x000000009bf7a4d55e6d771303ec7ef7
08:29:00:WU00:FS01:Uploading 55.24MiB to 155.247.164.213
08:29:00:WU00:FS01:Connecting to 155.247.164.213:8080
08:29:01:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
08:29:01:WU00:FS01:Trying to send results to collection server
08:29:01:WU00:FS01:Uploading 55.24MiB to 155.247.164.214
08:29:01:WU00:FS01:Connecting to 155.247.164.214:8080
08:29:01:ERROR:WU00:FS01:Exception: Transfer failed


I've reset the retry timer multiple times as it has extended well beyond 1hr (log indicates it's been up over 2hrs).

The same machine (and folding slot) has kept working and has completed several other WUs while trying to send this one, so nothing is blocked. I'm just trying to figure out why it still fails to submit the work now that the servers are reported to be up.
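
The growing retry interval can be read straight off the log above. A minimal Python sketch, assuming the log line format shown in this thread (HH:MM:SS timestamps, all within one day); `retry_gaps` and `CONNECT_RE` are names made up for this example and are not part of the FAH client.

```python
import re
from datetime import datetime

# "Connecting to" lines copied from the log excerpt in this post.
LOG = """\
07:48:52:WU00:FS01:Connecting to 155.247.164.213:8080
07:48:52:WU00:FS01:Connecting to 155.247.164.214:8080
07:53:06:WU00:FS01:Connecting to 155.247.164.213:8080
07:53:07:WU00:FS01:Connecting to 155.247.164.214:8080
07:59:58:WU00:FS01:Connecting to 155.247.164.213:8080
07:59:58:WU00:FS01:Connecting to 155.247.164.214:8080
08:11:03:WU00:FS01:Connecting to 155.247.164.213:8080
08:11:04:WU00:FS01:Connecting to 155.247.164.214:8080
08:29:00:WU00:FS01:Connecting to 155.247.164.213:8080
08:29:01:WU00:FS01:Connecting to 155.247.164.214:8080
"""

CONNECT_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}):WU\d+:FS\d+:Connecting to (\d+\.\d+\.\d+\.\d+):\d+$"
)


def retry_gaps(log_text: str, server: str) -> list:
    """Seconds between successive connection attempts to `server`.

    Assumes all timestamps fall on the same day (no midnight rollover).
    """
    times = []
    for line in log_text.splitlines():
        match = CONNECT_RE.match(line.strip())
        if match and match.group(2) == server:
            times.append(datetime.strptime(match.group(1), "%H:%M:%S"))
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


print(retry_gaps(LOG, "155.247.164.213"))  # [254, 412, 665, 1077]
```

For this excerpt each wait is roughly 1.6 times the previous one, which is consistent with the retry timer eventually extending beyond an hour after enough failed attempts.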
Official F@H Twitter (frequently updated): https://twitter.com/foldingathome
Official F@H Facebook: https://www.facebook.com/Foldinghome-136059519794607/

(I'm not affiliated with the F@H Team, just promoting these channels for official updates)
alxbelu
 
Posts: 109
Joined: Sat Mar 14, 2020 7:28 pm

Re: Send Errors - 155.247.164.213 & .214

Postby Scraig » Mon Mar 16, 2020 10:05 am

But why is the Estimated Credit constantly being reduced? The work was done at full effort, and now the credit is shrinking minute by minute while the Collection Server 155.247.164.214 is not receiving any data. That's not OK.
Scraig
 
Posts: 12
Joined: Mon Mar 16, 2020 9:58 am

Re: Send Errors - 155.247.164.213 & .214

Postby alxbelu » Mon Mar 16, 2020 11:18 am

I actually just got a new WU (11753) from 213, right after failing to upload the 11758 WU mentioned above (and right after restarting the FAH client):
Code:
10:13:31:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:11758 run:0 clone:1756 gen:0 core:0x22 unit:0x000000009bf7a4d55e6d771303ec7ef7
10:13:31:WU00:FS01:Uploading 55.24MiB to 155.247.164.213
10:13:31:WU00:FS01:Connecting to 155.247.164.213:8080
10:13:31:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
10:13:31:WU00:FS01:Trying to send results to collection server
10:13:31:WU00:FS01:Uploading 55.24MiB to 155.247.164.214
10:13:31:WU00:FS01:Connecting to 155.247.164.214:8080
10:13:32:ERROR:WU00:FS01:Exception: Transfer failed
10:13:37:WARNING:WU02:FS01:WorkServer connection failed on port 8080 trying 80
10:13:37:WU02:FS01:Connecting to 128.252.203.10:80
10:14:02:ERROR:WU02:FS01:Exception: 10002: Received short response, expected 512 bytes, got 0
10:14:16:WU02:FS01:Connecting to 65.254.110.245:8080
10:14:16:WU02:FS01:Assigned to work server 155.247.164.213
10:14:16:WU02:FS01:Requesting new work unit for slot 01: READY gpu:0:TU106M [GeForce RTX 2060 Mobile] from 155.247.164.213
10:14:16:WU02:FS01:Connecting to 155.247.164.213:8080
10:14:27:WU02:FS01:Downloading 11.98MiB
10:14:33:WU02:FS01:Download 84.00%
10:14:34:WU02:FS01:Download complete
10:14:34:WU02:FS01:Received Unit: id:02 state:DOWNLOAD error:NO_ERROR project:11753 run:0 clone:3574 gen:1 core:0x22 unit:0x000000029bf7a4d55e6d76caa76041b9
10:14:34:WU02:FS01:Starting
alxbelu
 
Posts: 109
Joined: Sat Mar 14, 2020 7:28 pm

Re: Send Errors - 155.247.164.213 & .214

Postby NathanJanssens » Mon Mar 16, 2020 1:35 pm

suchamoneypit wrote:My clients can't connect to .214, so it is down. Are we able to switch servers so we can fold? Or is waiting the only option?


I would be interested in this as well. Instead of trying again and again to connect to a certain server, with each retry coming after a longer interval, is it possible to leave that server be for a bit (let's say after 3 or 4 failed attempts) and find another one?
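
The policy being asked for (give up on a server after a few failed attempts and move to the next candidate) can be sketched in Python as follows. This is an illustration of the idea only, not how the FAH client behaves: the logs in this thread only show the client trying the work server and then its collection server. `upload_with_failover`, `fake_send`, the delay values and the third address (203.0.113.10, a documentation-only IP) are all assumptions made for the example.

```python
import time
from typing import Callable, List, Optional, Sequence


def upload_with_failover(
    servers: Sequence[str],
    send: Callable[[str], bool],
    max_attempts: int = 3,
    base_delay: float = 60.0,
    wait: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Try each server in order; move on after `max_attempts` failures.

    Returns the server that accepted the upload, or None if all failed.
    """
    for server in servers:
        delay = base_delay
        for attempt in range(max_attempts):
            if send(server):
                return server
            # Back off before retrying the same server, but do not wait
            # before switching to the next candidate.
            if attempt < max_attempts - 1:
                wait(delay)
                delay *= 2
    return None


# Simulated run: the first two servers always fail, the third accepts.
attempts: List[str] = []


def fake_send(server: str) -> bool:
    attempts.append(server)
    return server == "203.0.113.10"


chosen = upload_with_failover(
    ["155.247.164.213", "155.247.164.214", "203.0.113.10"],
    fake_send,
    wait=lambda seconds: None,  # skip real sleeping in the demo
)
print(chosen, len(attempts))  # 203.0.113.10 7
```

Whether a different server could actually accept a finished WU is a separate question that this sketch does not answer; it only shows the retry-then-switch control flow.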
NathanJanssens
 
Posts: 17
Joined: Fri Mar 13, 2020 11:52 am

Re: Send Errors - 155.247.164.213 & .214

Postby Scraig » Mon Mar 16, 2020 1:53 pm

bruce wrote:Well, down means DOWN.

It's the middle of the night anywhere in the USA right now and volunteers should be sleeping ... preparing their immune systems for another day of exposure to some random viruses.

I don't know who will be responsible for figuring out why the servers are down and fixing them, but it won't be until tomorrow. Without more info, I don't know if it's the responsibility of the FAH team at temple.edu or the campus network support folks.


The Estimated Credit is now dropping every second. That's not fair, and FAH should change it. The donors are very patient and want to support FAH. They should not be punished for a server failure.
Scraig
 
Posts: 12
Joined: Mon Mar 16, 2020 9:58 am

Re: Send Errors - 155.247.164.213 & .214

Postby GeriCom76 » Mon Mar 16, 2020 4:20 pm

Will these servers be back again?
I have been trying to upload a finished project for almost 2 days but keep getting an error message in the log file. I've been running a 24/7 system.
Project number: 11753
Last edited by GeriCom76 on Mon Mar 16, 2020 4:45 pm, edited 1 time in total.
GeriCom76
 
Posts: 1
Joined: Sun Mar 15, 2020 9:49 pm

Re: Send Errors - 155.247.164.213 & .214

Postby TiO2 » Mon Mar 16, 2020 4:33 pm

My GPU slot has also been attempting to send a completed WU to .214 for the last 12 hours. The WU is about to hit timeout.
TiO2
 
Posts: 4
Joined: Sun Feb 17, 2019 8:01 pm
