Send Errors - 155.247.164.213 & .214

Moderators: Site Moderators, FAHC Science Team

Re: Send Errors - 155.247.164.213 & .214

Postby Klutz » Fri Mar 20, 2020 11:16 am

As a new user, where can I read about the status of the server? A link would be very helpful. I have found https://apps.foldingathome.org/serverstats, but I cannot find any status there (or I am unable to understand it).


You're on the right page. Find the entry for 155.247.164.214, then hover your cursor over the column "Has CS" (where it says "Yes" in green). A small popup will appear to display the errors that plague this server. Being unable to connect to 155.247.164.213 is one of them.
Klutz
 
Posts: 7
Joined: Tue Mar 17, 2020 11:35 am

Re: Send Errors - 155.247.164.213 & .214

Postby roffvald » Fri Mar 20, 2020 12:55 pm

Same issue here with .213 and .214

11:53:17:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:11758 run:0 clone:1380 gen:0 core:0x22 unit:0x000000059bf7a4d55e6d771272f2b9e7
11:53:17:WU00:FS01:Uploading 55.24MiB to 155.247.164.213
11:53:17:WU00:FS01:Connecting to 155.247.164.213:8080
11:53:17:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
11:53:17:WU00:FS01:Trying to send results to collection server
11:53:17:WU00:FS01:Uploading 55.24MiB to 155.247.164.214
11:53:17:WU00:FS01:Connecting to 155.247.164.214:8080
11:53:19:ERROR:WU00:FS01:Exception: Transfer failed
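
For anyone with a longer log, here is a rough sketch for tallying which server each "Transfer failed" belongs to. The regular expressions are only a guess at the line format based on the lines quoted above, and `failed_uploads` is a hypothetical helper, not something that ships with the client.

```python
import re
from collections import Counter

# Sample lines copied from the log excerpt above.
LOG = """\
11:53:17:WU00:FS01:Uploading 55.24MiB to 155.247.164.213
11:53:17:WU00:FS01:Connecting to 155.247.164.213:8080
11:53:17:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
11:53:17:WU00:FS01:Trying to send results to collection server
11:53:17:WU00:FS01:Uploading 55.24MiB to 155.247.164.214
11:53:17:WU00:FS01:Connecting to 155.247.164.214:8080
11:53:19:ERROR:WU00:FS01:Exception: Transfer failed
"""

# Assumed line shapes, inferred only from the excerpt.
UPLOAD = re.compile(r"Uploading \S+ to (\d+\.\d+\.\d+\.\d+)")
FAILED = re.compile(r"Transfer failed")


def failed_uploads(log_text):
    """Count 'Transfer failed' lines, attributed to the most recent upload target."""
    failures = Counter()
    target = None
    for line in log_text.splitlines():
        match = UPLOAD.search(line)
        if match:
            target = match.group(1)
        elif FAILED.search(line) and target is not None:
            failures[target] += 1
            target = None  # wait for the next "Uploading" line
    return failures


for server, count in sorted(failed_uploads(LOG).items()):
    print(server, count)
```

Run against a full log file, it gives a quick per-server failure count to post here instead of pasting the whole log.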
roffvald
 
Posts: 3
Joined: Mon Mar 16, 2020 11:44 am

Re: Send Errors - 155.247.164.213 & .214

Postby Darth_Peter_dualxeon » Fri Mar 20, 2020 1:04 pm

At https://apps.foldingathome.org/serverstats, what does the acronym "Has CS" mean? Why do "public" and "beta" jobs show the same number? What's the difference?
By the way, on that page both of these servers show 0 errors and >90TiB of space (is that the free space on the hard disks where the results get uploaded?).
And I can ping them. So why can't I upload results?

Sometimes I saw that there were jobs on the server but I did not receive a work unit... Is this the total number of jobs, including the ones that are already assigned to someone else?
Darth_Peter_dualxeon
 
Posts: 46
Joined: Fri Mar 20, 2020 4:13 am

Re: Send Errors - 155.247.164.213 & .214

Postby Qwarkman » Fri Mar 20, 2020 4:21 pm

I now have a third WU with the exact same problem. The first two have timed out and are now worthless. I expect nothing different for this one.
Either shut the servers down or fix them. Letting them assign WUs is a serious waste of resources if they keep dishing out WUs with no ability to collect the results.
I'm considering shutting down my GPU folding, because for now it's a waste of time and money unless I babysit the work queue and remove all WUs assigned from these servers.
Qwarkman
 
Posts: 3
Joined: Wed Mar 18, 2020 1:54 pm

Re: Send Errors - 155.247.164.213 & .214

Postby vangli » Fri Mar 20, 2020 5:29 pm

I did see that they tried to restart the 214 server twice. No effect. Not knowing the F@H software: do the servers communicate through an encrypted channel with certificates, and has this changed? It seems that several more servers have failures connecting to 213. Anyway, I agree with Qwarkman: stop sending WUs that can't be collected. It is a waste of time and computing power.
Regards
Bent Vangli, Oslo, Norway
vangli
 
Posts: 12
Joined: Thu Mar 19, 2020 11:35 am

Re: Send Errors - 155.247.164.213 & .214

Postby Jesse_V » Fri Mar 20, 2020 5:50 pm

Darth_Peter_dualxeon wrote:At https://apps.foldingathome.org/serverstats, what does the acronym "Has CS" mean? Why do "public" and "beta" jobs show the same number? What's the difference?
By the way, on that page both of these servers show 0 errors and >90TiB of space (is that the free space on the hard disks where the results get uploaded?).
And I can ping them. So why can't I upload results?

Sometimes I saw that there were jobs on the server but I did not receive a work unit... Is this the total number of jobs, including the ones that are already assigned to someone else?


CS means Collection Server. It's where the workunits go after they are completed. New projects start in "beta" because they might be unstable or cause errors; the people who opt into beta have stable hardware and watch the log a little more closely. That way the teams can fix any issues before pushing the projects to the larger "public" group.

I'm not sure why you can't upload the results, but they are currently working on the servers to keep up with the overwhelming demand, so hopefully that gets fixed soon.

Qwarkman wrote:I now have a third WU with the exact same problem. The first two have timed out and are now worthless. I expect nothing different for this one.
Either shut the servers down or fix them. Letting them assign WUs is a serious waste of resources if they keep dishing out WUs with no ability to collect the results.
I'm considering shutting down my GPU folding, because for now it's a waste of time and money unless I babysit the work queue and remove all WUs assigned from these servers.


I get it. The WUs should upload in time once they fix some of the errors with the server. There's just so much demand at the moment. I'd recommend just keeping everything running as I expect it to clear up soon.

vangli wrote:I did see that they tried to restart the 214 server twice. No effect. Not knowing the F@H software: do the servers communicate through an encrypted channel with certificates, and has this changed? It seems that several more servers have failures connecting to 213. Anyway, I agree with Qwarkman: stop sending WUs that can't be collected. It is a waste of time and computing power.


The clients talk to the server over HTTP. There is a hash signature and sanity checks to ensure that the workunits are validated. Technically, it's an HTTP request to a raw IP address. The servers just may be overwhelmed at the moment. In 10 years I've never seen this many people jumping into the network at once and it's causing substantial load on the servers.
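
On the "I can ping them" point: ping uses ICMP, while the upload is an HTTP request over TCP port 8080 (the port shown in the logs above), so a successful ping says nothing about whether that port accepts connections. Here is a minimal sketch for checking the TCP side only. `tcp_port_open` is a hypothetical helper name, and a successful connect still does not prove that an upload would be accepted.

```python
import socket


def tcp_port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example call (not executed here, since it needs network access):
# tcp_port_open("155.247.164.213", 8080)
```

If this returns True while uploads still fail, the problem is above the TCP layer, in whatever the server does with the request.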
F@h is now the top computing platform on the planet and nothing unites people like a dedicated fight against a common enemy. This virus affects all of us. Let's end it together.
Jesse_V
Site Moderator
 
Posts: 2846
Joined: Mon Jul 18, 2011 5:44 am
Location: Western Washington

Re: Send Errors - 155.247.164.213 & .214

Postby Joe_H » Fri Mar 20, 2020 7:37 pm

Klutz wrote:
Moderators here have been in contact with the people running the server, more than once. They are aware and working on it among the other issues.

It would be helpful to get an update from someone who actually has oversight over these two servers. The server status page has clearly stated the exact nature of this error for a number of days, yet we've seen no indication that any attempt has been made to rectify it. This is a waste of valuable resources and goodwill.

Some of the folding team supporting the project servers are posting on Twitter and Facebook.

As for the Server Status page showing anything about the "exact nature of this problem", where is that and what are you interpreting to be that? About the only thing I can see is that they have greatly reduced the rate of assignments from those two servers while they try to sort this out.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Joe_H
Site Admin
 
Posts: 6452
Joined: Tue Apr 21, 2009 5:41 pm
Location: W. MA

Re: Send Errors - 155.247.164.213 & .214

Postby Klutz » Fri Mar 20, 2020 8:40 pm

As for the Server Status page showing anything about the "exact nature of this problem", where is that and what are you interpreting to be that?

I already answered that above: https://foldingforum.org/viewtopic.php?f=18&t=32492&start=60#p315744
Klutz
 
Posts: 7
Joined: Tue Mar 17, 2020 11:35 am

Re: Send Errors - 155.247.164.213 & .214

Postby TitanXp » Sat Mar 21, 2020 4:18 am

Still down? These 2 servers are giving me the same error as OP.
Project 11777
TitanXp
 
Posts: 11
Joined: Wed Apr 12, 2017 12:44 am

Re: Send Errors - 155.247.164.213 & .214

Postby sixty4bitdiablo » Sat Mar 21, 2020 6:56 am

Same issues here.

05:50:39:WU01:FS01:Trying to send results to collection server
05:50:39:WU01:FS01:Uploading 55.24MiB to 155.247.164.214
05:50:39:WU01:FS01:Connecting to 155.247.164.214:8080
05:50:39:ERROR:WU01:FS01:Exception: Transfer failed

Any ideas? It's been like this for a day or so now.
sixty4bitdiablo
 
Posts: 1
Joined: Sat Mar 21, 2020 6:55 am

Re: Send Errors - 155.247.164.213 & .214

Postby vangli » Sat Mar 21, 2020 8:30 am

In fact, several days. I have 3 WUs waiting for upload, the first one with 80, yes eighty, retries. I accept that technical problems occur. However, the total lack of information from the administrators, and the fact that new WUs which cannot be collected are still being sent, is bad.
vangli
 
Posts: 12
Joined: Thu Mar 19, 2020 11:35 am

Re: Send Errors - 155.247.164.213 & .214

Postby jima13 » Sat Mar 21, 2020 9:08 am

At this point I'd just be piling on, but what the heck:
Code:
 *********************** Log Started 2020-03-21T02:17:20Z ***********************
02:17:22:WU03:FS02:Sending unit results: id:03 state:SEND error:NO_ERROR project:11758 run:0 clone:918 gen:0 core:0x22 unit:0x000000069bf7a4d55e6d7711337cc5c2
02:17:23:WU03:FS02:Uploading 55.24MiB to 155.247.164.213
02:17:23:WU03:FS02:Connecting to 155.247.164.213:8080
02:19:50:WARNING:WU03:FS02:Exception: Failed to send results to work server: Transfer failed
02:19:50:WU03:FS02:Trying to send results to collection server
02:19:50:WU03:FS02:Uploading 55.24MiB to 155.247.164.214
02:19:50:WU03:FS02:Connecting to 155.247.164.214:8080
02:19:51:ERROR:WU03:FS02:Exception: Transfer failed
02:19:51:WU03:FS02:Sending unit results: id:03 state:SEND error:NO_ERROR project:11758 run:0 clone:918 gen:0 core:0x22 unit:0x000000069bf7a4d55e6d7711337cc5c2
02:19:52:WU03:FS02:Uploading 55.24MiB to 155.247.164.213
02:19:52:WU03:FS02:Connecting to 155.247.164.213:8080
02:19:53:WARNING:WU03:FS02:Exception: Failed to send results to work server: Transfer failed
02:19:53:WU03:FS02:Trying to send results to collection server
02:19:53:WU03:FS02:Uploading 55.24MiB to 155.247.164.214
02:19:53:WU03:FS02:Connecting to 155.247.164.214:8080
02:19:55:ERROR:WU03:FS02:Exception: Transfer failed
02:20:51:WU03:FS02:Sending unit results: id:03 state:SEND error:NO_ERROR project:11758 run:0 clone:918 gen:0 core:0x22 unit:0x000000069bf7a4d55e6d7711337cc5c2
02:20:51:WU03:FS02:Uploading 55.24MiB to 155.247.164.213
02:20:51:WU03:FS02:Connecting to 155.247.164.213:8080
02:20:52:WARNING:WU03:FS02:Exception: Failed to send results to work server: Transfer failed
02:20:52:WU03:FS02:Trying to send results to collection server
02:20:52:WU03:FS02:Uploading 55.24MiB to 155.247.164.214
02:20:52:WU03:FS02:Connecting to 155.247.164.214:8080
02:20:53:ERROR:WU03:FS02:Exception: Transfer failed
02:22:28:WU03:FS02:Sending unit results: id:03 state:SEND error:NO_ERROR project:11758 run:0 clone:918 gen:0 core:0x22 unit:0x000000069bf7a4d55e6d7711337cc5c2
02:22:29:WU03:FS02:Uploading 55.24MiB to 155.247.164.213
02:22:29:WU03:FS02:Connecting to 155.247.164.213:8080
02:22:29:WARNING:WU03:FS02:Exception: Failed to send results to work server: Transfer failed
02:22:29:WU03:FS02:Trying to send results to collection server
02:22:29:WU03:FS02:Uploading 55.24MiB to 155.247.164.214
02:22:29:WU03:FS02:Connecting to 155.247.164.214:8080
02:22:29:ERROR:WU03:FS02:Exception: Transfer failed
02:25:06:WU03:FS02:Sending unit results: id:03 state:SEND error:NO_ERROR project:11758 run:0 clone:918 gen:0 core:0x22 unit:0x000000069bf7a4d55e6d7711337cc5c2
02:25:06:WU03:FS02:Uploading 55.24MiB to 155.247.164.213
02:25:06:WU03:FS02:Connecting to 155.247.164.213:8080
02:25:06:WARNING:WU03:FS02:Exception: Failed to send results to work server: Transfer failed
02:25:06:WU03:FS02:Trying to send results to collection server
02:25:06:WU03:FS02:Uploading 55.24MiB to 155.247.164.214
02:25:06:WU03:FS02:Connecting to 155.247.164.214:8080
02:25:06:ERROR:WU03:FS02:Exception: Transfer failed
02:29:20:WU03:FS02:Sending unit results: id:03 state:SEND error:NO_ERROR project:11758 run:0 clone:918 gen:0 core:0x22 unit:0x000000069bf7a4d55e6d7711337cc5c2
02:29:20:WU03:FS02:Uploading 55.24MiB to 155.247.164.213
02:29:20:WU03:FS02:Connecting to 155.247.164.213:8080
02:29:21:WARNING:WU03:FS02:Exception: Failed to send results to work server: Transfer failed
02:29:21:WU03:FS02:Trying to send results to collection server
02:29:21:WU03:FS02:Uploading 55.24MiB to 155.247.164.214
02:29:21:WU03:FS02:Connecting to 155.247.164.214:8080
02:29:21:ERROR:WU03:FS02:Exception: Transfer failed
02:36:11:WU03:FS02:Sending unit results: id:03 state:SEND error:NO_ERROR project:11758 run:0 clone:918 gen:0 core:0x22 unit:0x000000069bf7a4d55e6d7711337cc5c2
02:36:11:WU03:FS02:Uploading 55.24MiB to 155.247.164.213
02:36:11:WU03:FS02:Connecting to 155.247.164.213:8080
02:36:12:WARNING:WU03:FS02:Exception: Failed to send results to work server: Transfer failed
02:36:12:WU03:FS02:Trying to send results to collection server
02:36:12:WU03:FS02:Uploading 55.24MiB to 155.247.164.214
02:36:12:WU03:FS02:Connecting to 155.247.164.214:8080
02:36:12:ERROR:WU03:FS02:Exception: Transfer failed
02:47:17:WU03:FS02:Sending unit results: id:03 state:SEND error:NO_ERROR project:11758 run:0 clone:918 gen:0 core:0x22 unit:0x000000069bf7a4d55e6d7711337cc5c2
02:47:17:WU03:FS02:Uploading 55.24MiB to 155.247.164.213
02:47:17:WU03:FS02:Connecting to 155.247.164.213:8080
02:47:17:WARNING:WU03:FS02:Exception: Failed to send results to work server: Transfer failed
02:47:17:WU03:FS02:Trying to send results to collection server
02:47:17:WU03:FS02:Uploading 55.24MiB to 155.247.164.214
02:47:17:WU03:FS02:Connecting to 155.247.164.214:8080
02:47:18:ERROR:WU03:FS02:Exception: Transfer failed
03:05:14:WU03:FS02:Sending unit results: id:03 state:SEND error:NO_ERROR project:11758 run:0 clone:918 gen:0 core:0x22 unit:0x000000069bf7a4d55e6d7711337cc5c2
03:05:14:WU03:FS02:Uploading 55.24MiB to 155.247.164.213
03:05:14:WU03:FS02:Connecting to 155.247.164.213:8080
03:05:15:WARNING:WU03:FS02:Exception: Failed to send results to work server: Transfer failed
03:05:15:WU03:FS02:Trying to send results to collection server
03:05:15:WU03:FS02:Uploading 55.24MiB to 155.247.164.214
03:05:15:WU03:FS02:Connecting to 155.247.164.214:8080
03:05:15:ERROR:WU03:FS02:Exception: Transfer failed
03:34:16:WU03:FS02:Sending unit results: id:03 state:SEND error:NO_ERROR project:11758 run:0 clone:918 gen:0 core:0x22 unit:0x000000069bf7a4d55e6d7711337cc5c2
03:34:16:WU03:FS02:Uploading 55.24MiB to 155.247.164.213
03:34:16:WU03:FS02:Connecting to 155.247.164.213:8080
03:34:16:WARNING:WU03:FS02:Exception: Failed to send results to work server: Transfer failed
03:34:16:WU03:FS02:Trying to send results to collection server
03:34:16:WU03:FS02:Uploading 55.24MiB to 155.247.164.214
03:34:16:WU03:FS02:Connecting to 155.247.164.214:8080
03:34:17:ERROR:WU03:FS02:Exception: Transfer failed
04:21:15:WU03:FS02:Sending unit results: id:03 state:SEND error:NO_ERROR project:11758 run:0 clone:918 gen:0 core:0x22 unit:0x000000069bf7a4d55e6d7711337cc5c2
04:21:15:WU03:FS02:Uploading 55.24MiB to 155.247.164.213
04:21:15:WU03:FS02:Connecting to 155.247.164.213:8080
04:21:15:WARNING:WU03:FS02:Exception: Failed to send results to work server: Transfer failed
04:21:15:WU03:FS02:Trying to send results to collection server
04:21:15:WU03:FS02:Uploading 55.24MiB to 155.247.164.214
04:21:15:WU03:FS02:Connecting to 155.247.164.214:8080
04:21:16:ERROR:WU03:FS02:Exception: Transfer failed
05:37:15:WU03:FS02:Sending unit results: id:03 state:SEND error:NO_ERROR project:11758 run:0 clone:918 gen:0 core:0x22 unit:0x000000069bf7a4d55e6d7711337cc5c2
05:37:16:WU03:FS02:Uploading 55.24MiB to 155.247.164.213
05:37:16:WU03:FS02:Connecting to 155.247.164.213:8080
05:37:16:WARNING:WU03:FS02:Exception: Failed to send results to work server: Transfer failed
05:37:16:WU03:FS02:Trying to send results to collection server
05:37:16:WU03:FS02:Uploading 55.24MiB to 155.247.164.214
05:37:16:WU03:FS02:Connecting to 155.247.164.214:8080
05:37:16:ERROR:WU03:FS02:Exception: Transfer failed
07:40:15:WU03:FS02:Sending unit results: id:03 state:SEND error:NO_ERROR project:11758 run:0 clone:918 gen:0 core:0x22 unit:0x000000069bf7a4d55e6d7711337cc5c2
07:40:15:WU03:FS02:Uploading 55.24MiB to 155.247.164.213
07:40:15:WU03:FS02:Connecting to 155.247.164.213:8080
07:40:16:WARNING:WU03:FS02:Exception: Failed to send results to work server: Transfer failed
07:40:16:WU03:FS02:Trying to send results to collection server
07:40:16:WU03:FS02:Uploading 55.24MiB to 155.247.164.214
07:40:16:WU03:FS02:Connecting to 155.247.164.214:8080
07:40:17:ERROR:WU03:FS02:Exception: Transfer failed


What really bugs me is the time between tries keeps expanding, so after 12 tries the next try is in 3 hours. Is there a reason for this, or can it be coded down to an hour or less?
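
To put numbers on that, here is a quick sketch that measures the gaps between the "Sending unit results" lines in the log above. The timestamps are copied from that log, starting at 02:19:51 (the earlier 02:17:22 attempt spent over two minutes waiting on the connection, so it is left out). `gaps_seconds` is just a helper name for this post.

```python
from datetime import datetime

# Timestamps of the "Sending unit results" lines in the log above.
SEND_TIMES = [
    "02:19:51", "02:20:51", "02:22:28", "02:25:06",
    "02:29:20", "02:36:11", "02:47:17", "03:05:14",
    "03:34:16", "04:21:15", "05:37:15", "07:40:15",
]


def gaps_seconds(times):
    """Return the number of seconds between consecutive HH:MM:SS timestamps."""
    parsed = [datetime.strptime(t, "%H:%M:%S") for t in times]
    return [int((b - a).total_seconds()) for a, b in zip(parsed, parsed[1:])]


gaps = gaps_seconds(SEND_TIMES)
# Ratio of each gap to the one before it.
ratios = [round(b / a, 2) for a, b in zip(gaps, gaps[1:])]

print("gaps (s):", gaps)
print("growth  :", ratios)
```

Each gap comes out roughly 1.6 times the previous one, which is how a one-minute wait turns into about two hours by the last retry shown.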
jima13
 
Posts: 29
Joined: Fri Dec 07, 2007 6:27 am
Location: La Grande, OR

Re: Send Errors - 155.247.164.213 & .214

Postby Whittle » Sat Mar 21, 2020 9:09 am

What happens for me is it tries to upload to 155.247.164.213:

Code:
07:59:09:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:11758 run:0 clone:41 gen:0 core:0x22 unit:0x000000079bf7a4d55e6d770eee5f6026
07:59:09:WU01:FS01:Uploading 55.24MiB to 155.247.164.213
07:59:09:WU01:FS01:Connecting to 155.247.164.213:8080
07:59:11:WARNING:WU01:FS01:Exception: Failed to send results to work server: Transfer failed

In a Wireshark capture I see the following response from 155.247.164.213, before it closes the connection with a TCP reset:

Code:
HTTP/1.0 413 HTTP_REQUEST_ENTITY_TOO_LARGE

When I follow the TCP stream for 155.247.164.213 in Wireshark it looks to be sending the results and then the server cuts it off part way through:

Code:
      [...]
      <Position x="1.8908989287272897" y="-2.547739371460885" z="5.77243400391751"/>
      <Position x="1.8994014240536208" y="-2.618247308183257" z="5.674706601713439"/>
      <Position x="2.0814601488350846" y="-2.46088035887HTTP/1.0 413 HTTP_REQUEST_ENTITY_TOO_LARGE
Content-Type: text/html
Connection: close

<html><head><title>413 HTTP_REQUEST_ENTITY_TOO_LARGE</title></head><body><h1>413 HTTP_REQUEST_ENTITY_TOO_LARGE</h1></body></html>

It then immediately retries the upload, this time to 155.247.164.214, but I just see a TCP reset in Wireshark, which shows as "Transfer failed" in the logs:

Code:
07:59:11:WU01:FS01:Trying to send results to collection server
07:59:11:WU01:FS01:Uploading 55.24MiB to 155.247.164.214
07:59:11:WU01:FS01:Connecting to 155.247.164.214:8080
07:59:12:ERROR:WU01:FS01:Exception: Transfer failed

Sometimes I get that "HTTP/1.0 413 HTTP_REQUEST_ENTITY_TOO_LARGE" error in Wireshark for 155.247.164.214 too.

Hope that's of some help.
Last edited by Whittle on Sat Mar 21, 2020 9:24 am, edited 1 time in total.
Whittle
 
Posts: 5
Joined: Sun Mar 15, 2020 4:33 pm

Re: Send Errors - 155.247.164.213 & .214

Postby jonault » Sat Mar 21, 2020 9:15 am

jima13 wrote:What really bugs me is the time between tries keeps expanding, so after 12 tries the next try is in 3 hours. Is there a reason for this, or can it be coded down to an hour or less?


If the reason the client can't get a connection is because too many clients are trying to talk to the server, then having the failed clients slow down their attempts to connect eases the burden on the server so they can start getting through. If they didn't slow down, they could wind up effectively DDOSing the server. I doubt that they'll want to change that behavior.
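
A rough sketch of that kind of backoff, for illustration only. The 60-second base and the 1.618 growth factor are assumptions read off the timestamps in jima13's log, where each gap is roughly 1.6 times the previous one; they are not taken from the client's source or documentation, and `backoff_delays` is a made-up name.

```python
def backoff_delays(base_seconds=60.0, factor=1.618, retries=12):
    """Return the wait before each retry under geometric (exponential) backoff."""
    return [base_seconds * factor ** n for n in range(retries)]


delays = backoff_delays()
for attempt, delay in enumerate(delays, start=1):
    print(f"retry {attempt:2d}: wait {delay / 60:6.1f} min")

# For comparison: how many requests a client retrying every 60 seconds
# would have sent over the same total time span.
total_seconds = sum(delays)
fixed_attempts = int(total_seconds // 60)
print(f"backoff: {len(delays)} retries, fixed 60 s interval: {fixed_attempts} retries")
```

Under these assumptions the twelfth wait lands a little over three hours, which matches what jima13 describes, and each stuck client sends a dozen requests where a fixed one-minute retry would send several hundred.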
jonault
 
Posts: 176
Joined: Fri Dec 14, 2007 10:53 pm

Re: Send Errors - 155.247.164.213 & .214

Postby vangli » Sat Mar 21, 2020 5:04 pm

Found just the same as Whittle. Analyzing with Wireshark, the transmission ends with HTTP/1.0 413 HTTP_REQUEST_ENTITY_TOO_LARGE. One possibility seems to be that the collector isn't able to receive a chunk of 55 MB; such a limit is not uncommon in a web server setup. Could a reconfiguration to allow chunks of 64 MB or so help? The Wireshark analysis shows that a connection is established but then dropped partway through the transmission:

Code:
4769   1213.944078814   192.168.123.21   155.247.164.214   TCP   66   44488 → 8080 [ACK] Seq=15948 Ack=220 Win=30336 Len=0 TSval=102210099 TSecr=3240794470
4770   1213.944189552   155.247.164.214   192.168.123.21   HTTP   66   HTTP/1.0 413 HTTP_REQUEST_ENTITY_TOO_LARGE  (text/html)


Edited:

I have done some further analysis. After connecting to the server, the following packet is sent to it:

Code:
119   36.152390113   192.168.123.21   155.247.164.213   TCP   173   57632 → 8080 [PSH, ACK] Seq=1 Ack=1 Win=29312 Len=107 TSval=1078346373 TSecr=3159245236 [TCP segment of a reassembled PDU]


The payload of this packet is:

Code:
POST http://155.247.164.213/ HTTP/1.0
Content-Length: 57925632
Content-Type: application/octet-stream


As you can see, it announces the size of the transmission: 57925632 bytes. After a few more packets are exchanged, the connection collapses with HTTP_REQUEST_ENTITY_TOO_LARGE. The real transmission never seems to start. I would be happy if this helps.
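
To make the size argument concrete, here is a small sketch of how a server could reject an upload from the Content-Length header alone. The 50 MiB and 64 MiB limits are purely hypothetical, since the real limit on these servers is unknown, and `status_for_upload` only mimics the check; it is not the actual server code.

```python
MIB = 1024 * 1024

# Value taken from the captured POST header above.
CONTENT_LENGTH = 57925632


def to_mib(n_bytes):
    """Convert a byte count to mebibytes."""
    return n_bytes / MIB


def status_for_upload(content_length, max_body_bytes):
    """Return 413 if the announced body exceeds the limit, otherwise 200."""
    return 413 if content_length > max_body_bytes else 200


print(f"announced size: {to_mib(CONTENT_LENGTH):.2f} MiB")

# Hypothetical body-size limits, chosen only for illustration.
for limit_mib in (50, 64):
    status = status_for_upload(CONTENT_LENGTH, limit_mib * MIB)
    print(f"limit {limit_mib} MiB -> HTTP {status}")
```

The announced size works out to the same 55.24MiB the client log reports, so any body limit below that would give a 413 before the payload is read.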
Last edited by vangli on Sat Mar 21, 2020 8:30 pm, edited 1 time in total.
vangli
 
Posts: 12
Joined: Thu Mar 19, 2020 11:35 am
