Failing to SEND completed work unit

Moderators: Site Moderators, FAHC Science Team


Postby paulmd199 » Fri Apr 03, 2020 10:01 pm

I am aware that your servers are overloaded when assigning new work units. Are they likewise overloaded when receiving completed work units?

I have a completed WU that I have been unable to send despite trying all night. Will completed work units keep accumulating until they are received? If so, is there a limit? Will the client eventually give up trying to send?

Filtered log excerpt follows:

20:23:58:WU01:FS03:0x22:Completed 40000 out of 2000000 steps (2%)
20:24:31:WU03:FS03:Sending unit results: id:03 state:SEND error:NO_ERROR project:11749 run:0 clone:5822 gen:14 core:0x22 unit:0x000000208ca304e75e6bb52dc4161efa
20:24:31:WU03:FS03:Uploading 12.56MiB to 140.163.4.231
20:24:31:WU03:FS03:Connecting to 140.163.4.231:8080
20:24:53:WARNING:WU03:FS03:WorkServer connection failed on port 8080 trying 80
20:24:53:WU03:FS03:Connecting to 140.163.4.231:80
20:25:14:WARNING:WU03:FS03:Exception: Failed to send results to work server: Failed to connect to 140.163.4.231:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
20:27:11:WU01:FS03:0x22:Completed 60000 out of 2000000 steps (3%)
20:30:19:WU01:FS03:0x22:Completed 80000 out of 2000000 steps (4%)
20:31:22:WU03:FS03:Sending unit results: id:03 state:SEND error:NO_ERROR project:11749 run:0 clone:5822 gen:14 core:0x22 unit:0x000000208ca304e75e6bb52dc4161efa
20:31:23:WU03:FS03:Uploading 12.56MiB to 140.163.4.231
20:31:23:WU03:FS03:Connecting to 140.163.4.231:8080
20:31:44:WARNING:WU03:FS03:WorkServer connection failed on port 8080 trying 80
20:31:44:WU03:FS03:Connecting to 140.163.4.231:80
20:32:05:WARNING:WU03:FS03:Exception: Failed to send results to work server: Failed to connect to 140.163.4.231:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
20:33:31:WU01:FS03:0x22:Completed 100000 out of 2000000 steps (5%)
20:36:39:WU01:FS03:0x22:Completed 120000 out of 2000000 steps (6%)
20:39:46:WU01:FS03:0x22:Completed 140000 out of 2000000 steps (7%)
20:42:28:WU03:FS03:Sending unit results: id:03 state:SEND error:NO_ERROR project:11749 run:0 clone:5822 gen:14 core:0x22 unit:0x000000208ca304e75e6bb52dc4161efa
20:42:28:WU03:FS03:Uploading 12.56MiB to 140.163.4.231
20:42:28:WU03:FS03:Connecting to 140.163.4.231:8080
20:42:50:WARNING:WU03:FS03:WorkServer connection failed on port 8080 trying 80
20:42:50:WU03:FS03:Connecting to 140.163.4.231:80
20:42:56:WU01:FS03:0x22:Completed 160000 out of 2000000 steps (8%)
20:42:57:WU03:FS03:Upload 0.50%
20:44:05:WU03:FS03:Upload 0.99%
20:44:05:WARNING:WU03:FS03:Exception: Failed to send results to work server: Transfer failed
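
For anyone who wants to check how often the client is retrying a stuck upload, here is a minimal sketch that summarizes send attempts and failures per WU. It assumes the FAHClient v7 line layout seen in the excerpt above (`HH:MM:SS:[WARNING:]WUxx:FSxx:message`); the function names are hypothetical and this is not part of FAHClient itself.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches lines like "20:24:31:WU03:FS03:..." or "20:24:53:WARNING:WU03:FS03:..."
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}):(?:WARNING:)?(WU\d+):(FS\d+):(.*)$")


def summarize_uploads(log_text):
    """Return {wu: {"attempts": [datetime, ...], "failures": int}} for upload activity."""
    summary = defaultdict(lambda: {"attempts": [], "failures": 0})
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        stamp, wu, _slot, message = m.groups()
        if message.startswith("Sending unit results"):
            summary[wu]["attempts"].append(datetime.strptime(stamp, "%H:%M:%S"))
        elif "Failed to send results" in message:
            summary[wu]["failures"] += 1
    return dict(summary)


def retry_gaps_seconds(attempts):
    """Seconds between consecutive send attempts (assumes no midnight rollover)."""
    return [int((b - a).total_seconds()) for a, b in zip(attempts, attempts[1:])]
```

Run against the excerpt above, WU03 shows three attempts and three failures, with the gap between attempts growing from 411 s to 666 s, which suggests the client waits longer after each failed round.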


Mod Edit: Added Code Tags - PantherX
paulmd199
Posts: 37
Joined: Wed Apr 01, 2020 5:41 am

Re: Failing to SEND completed work unit

Postby PantherX » Fri Apr 03, 2020 10:06 pm

Welcome to the F@H Forum paulmd199,

In a nutshell, some servers only assign WUs, some only receive WUs, and others both send and receive WUs. Add in bandwidth limitations and you can end up in a situation where sending a completed WU takes a while.

The client will keep trying to send a completed WU to the server until the WU's expiration date is reached. The expiration date varies by project.
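
The behaviour described here (keep retrying, stop at expiration), combined with the port 8080 then port 80 fallback visible in the log, can be modelled as a simulated retry loop. This is a minimal sketch under stated assumptions, not FAHClient's actual implementation: `RetryPlan`, `send_until_expiry`, and all of the backoff constants are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class RetryPlan:
    ports: Tuple[int, ...] = (8080, 80)  # order seen in the log: 8080 first, then 80
    first_delay: float = 60.0            # hypothetical wait after the first failed round (s)
    factor: float = 2.0                  # hypothetical growth of the wait per failed round
    max_delay: float = 3600.0            # hypothetical cap on the wait (s)


def send_until_expiry(
    try_upload: Callable[[int], bool],
    now: float,
    expires_at: float,
    plan: RetryPlan = RetryPlan(),
) -> Tuple[bool, List[float]]:
    """Simulate retrying an upload until it succeeds or the WU expires.

    try_upload(port) returns True when the transfer completes. Time is simulated:
    `now` is advanced by the backoff delay instead of actually sleeping.
    Returns (succeeded, start times of each attempt round).
    """
    delay = plan.first_delay
    rounds: List[float] = []
    while now < expires_at:
        rounds.append(now)
        for port in plan.ports:  # fall back to the next port if this one fails
            if try_upload(port):
                return True, rounds
        now += delay
        delay = min(delay * plan.factor, plan.max_delay)
    return False, rounds  # expiration reached: give up
```

Injecting `try_upload` and simulating the clock keeps the sketch testable. With these made-up constants, an upload that never succeeds on a WU expiring 500 s from now is attempted at t = 0, 60, 180 and 420 s, and then the loop gives up.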

On my system, I have seen a completed WU take 24 hours to be successfully returned :)
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
PantherX
Site Moderator
Posts: 6605
Joined: Wed Dec 23, 2009 10:33 am
Location: Land Of The Long White Cloud

Re: Failing to SEND completed work unit

Postby X-Wing » Fri Apr 03, 2020 10:10 pm

I have read that their main issue right now is indeed receiving completed work units (note: I'm not affiliated with FAH; I'm a community member just like you). I'm in the same boat, with the same error message. Hopefully some of the new servers coming online will relieve the pressure a little. By my mental math, once all currently listed servers come fully online (going from 10 to 26), FAH will have increased its server capacity about 2.5x over the last few months.
Rig: i3-8350K, GTX 1660Ti, GTX 750Ti, 16GB DDR4-3000MHz.
X-Wing
Posts: 56
Joined: Sun Apr 28, 2019 12:43 am

Re: Failing to SEND completed work unit

Postby paulmd199 » Fri Apr 03, 2020 10:24 pm

Thanks to you both. I will just wait this out and try not to obsess over it too much.
paulmd199
Posts: 37
Joined: Wed Apr 01, 2020 5:41 am

Re: Failing to SEND completed work unit

Postby paulmd199 » Fri Apr 03, 2020 11:17 pm

Hold on, maybe there is an issue. According to my logs, I'm trying to connect to 140.163.4.231, which, according to https://apps.foldingathome.org/serverstats, is set to Assign, not Accept. I don't know enough about how FaH works to say whether this is actually the problem, but I thought it worth pointing out.

140.163.4.231 plfah1-1.mskcc.org WS 9.6.1 rafal.wiewiora 3,600.00/hr 0 1 No Assign 69,801 69,801 OPENMM_22 5.10TiB 2 days 2020-04-03T22:09:07Z
paulmd199
Posts: 37
Joined: Wed Apr 01, 2020 5:41 am

Re: Failing to SEND completed work unit

Postby X-Wing » Fri Apr 03, 2020 11:46 pm

According to a thread about the servers that I read recently, "Assign" means both assign and accept, while "Accept" means accept only.
X-Wing
Posts: 56
Joined: Sun Apr 28, 2019 12:43 am

Re: Failing to SEND completed work unit

Postby Joe_H » Sat Apr 04, 2020 4:44 pm

That is correct. Servers in Accept mode either have no WUs to send out but can still take returns, are nearly full and waiting on some returns before transferring data off to other storage, or are in that mode for some other reason or combination of reasons.

A Work Server in Assign status is both sending and receiving WUs.
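
The Assign/Accept distinction above can be captured as a small lookup and applied to the serverstats row quoted earlier in the thread. This is a hypothetical sketch: it only looks for the literal tokens `Assign` and `Accept` in a whitespace-separated row and does not know the real column layout of the serverstats page.

```python
# Work Server modes as described in this thread; other modes are not covered here.
MODE_ROLES = {
    "Assign": {"assigns": True, "accepts": True},   # hands out new WUs and takes returns
    "Accept": {"assigns": False, "accepts": True},  # takes returns only
}


def parse_mode(serverstats_row: str) -> str:
    """Pick the mode token out of a whitespace-separated serverstats row.

    Assumes the row contains exactly one of the tokens listed in MODE_ROLES.
    """
    modes = [t for t in serverstats_row.split() if t in MODE_ROLES]
    if len(modes) != 1:
        raise ValueError(f"could not find a unique mode in: {serverstats_row!r}")
    return modes[0]


def accepts_results(serverstats_row: str) -> bool:
    """True if a server in this mode should take completed WUs."""
    return MODE_ROLES[parse_mode(serverstats_row)]["accepts"]
```

For the 140.163.4.231 row this reports `Assign`, so by mode alone the server should accept returns. The mode says nothing about whether the server is actually healthy, which is why uploads to it could still fail.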

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Joe_H
Site Admin
Posts: 6547
Joined: Tue Apr 21, 2009 5:41 pm
Location: W. MA

Re: Failing to SEND completed work unit

Postby astrorob » Sat Apr 04, 2020 5:58 pm

That particular server (140.163.4.231) seems to have had some problem for a couple of days now. I think there was another thread where Joe_H or another moderator said they were aware of the problem. I've still got one WU that won't upload to that server: it gets to about 1% (very, very slowly) and then fails.
astrorob
Posts: 37
Joined: Sun Mar 15, 2020 8:59 pm

Re: Failing to SEND completed work unit

Postby paulmd199 » Sat Apr 04, 2020 9:18 pm

My issue follows exactly the pattern astrorob laid out, only now I have two completed WUs in this state. Hope it's fixed before they expire.
paulmd199
Posts: 37
Joined: Wed Apr 01, 2020 5:41 am

