Send Errors - 155.247.164.213 & .214

Moderators: Site Moderators, FAHC Science Team

Re: Unable to upload WU - upload failed

Postby alxbelu » Tue Mar 24, 2020 4:50 pm

There's a known issue with project 11758 and servers 213/214; the project/server managers are aware: viewtopic.php?f=18&t=32492&start=90
Official F@H Twitter (frequently updated): https://twitter.com/foldingathome
Official F@H Facebook: https://www.facebook.com/Foldinghome-136059519794607/

(I'm not affiliated with the F@H Team, just promoting these channels for official updates)
alxbelu
 
Posts: 109
Joined: Sat Mar 14, 2020 7:28 pm

Re: Send Errors - 155.247.164.213 & .214

Postby cs278 » Wed Mar 25, 2020 12:10 am

davidcoton wrote:I believe the server code on these two servers has been updated.


Updated in what way? To fix uploading of the results, or to stop handing out WUs with such large results?

I'm still seeing HTTP/1.0 413 HTTP_REQUEST_ENTITY_TOO_LARGE responses when trying to upload results. I've been trying to submit a WU for the last 4 days.

I appreciate people have many pressures on their time, especially given the situation right now, but this should be a really quick fix to allow these results to be submitted. :(
cs278
 
Posts: 1
Joined: Tue Mar 24, 2020 11:54 pm

Re: Send Errors - 155.247.164.213 & .214

Postby davidcoton » Wed Mar 25, 2020 1:16 am

Sorry, I don't have any further details; server code/bug tracking is not available outside the team.
There is an attempt to prioritise people's time to concentrate on the most critical/highest impact fixes. I would be reasonably confident that this issue (if not fixed already) is on the list and prioritised according to the best guesstimate of the effort involved.
davidcoton
 
Posts: 1102
Joined: Wed Nov 05, 2008 4:19 pm
Location: Cambridge, UK

Re: Send Errors - 155.247.164.213 & .214

Postby pachydermus » Wed Mar 25, 2020 9:23 am

davidcoton wrote:I would be reasonably confident that this issue (if not fixed already) is on the list and prioritised according to the best guesstimate of the effort involved.

It's been over a week since this particular issue was reported. I, along with everyone else who's posted on this topic, have wasted work units. That's not good. At the very least this particular project needs to be stopped until someone deems it important enough to take 2 minutes of their valuable time to fix it.
pachydermus
 
Posts: 17
Joined: Tue Mar 24, 2020 12:06 pm

Re: Send Errors - 155.247.164.213 & .214

Postby bowman » Wed Mar 25, 2020 9:56 am

Happening for the first time to me.

08:54:13:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:11758 run:0 clone:2354 gen:0 core:0x22 unit:0x0000000a9bf7a4d55e6d7715c7032479
08:54:13:WU01:FS01:Uploading 55.24MiB to 155.247.164.213
08:54:13:WU01:FS01:Connecting to 155.247.164.213:8080
08:54:14:WARNING:WU01:FS01:Exception: Failed to send results to work server: Transfer failed
08:54:14:WU01:FS01:Trying to send results to collection server
08:54:14:WU01:FS01:Uploading 55.24MiB to 155.247.164.214
08:54:14:WU01:FS01:Connecting to 155.247.164.214:8080
08:54:14:ERROR:WU01:FS01:Exception: Transfer failed
FYI: not Professor Greg Bowman.
bowman
 
Posts: 20
Joined: Wed Jun 11, 2008 10:06 am

Re: Send Errors - 155.247.164.213 & .214

Postby sc00p » Wed Mar 25, 2020 11:17 am

CS "might" just be under very heavy load... seconds ago I got one of my WUs sent back in... (to 155.247.164.214) :)
sc00p
 
Posts: 4
Joined: Sat Jan 26, 2008 10:04 pm

Re: Send Errors - 155.247.164.213 & .214

Postby alxbelu » Wed Mar 25, 2020 11:38 am

sc00p wrote:CS "might" just be under very heavy load... seconds ago I got one of my WUs sent back in... (to 155.247.164.214) :)


I think many of us have had WUs uploaded to 213/214, just not this particular one (11758), e.g.:
alxbelu wrote:Yep, I have now not been able to upload this specific WU for over 48hrs. And to be clear, the machines have completed multiple other WUs meanwhile, including receiving from and uploading to this specific server (213), e.g.:

Code:
10:14:16:WU02:FS01:Connecting to 65.254.110.245:8080
10:14:16:WU02:FS01:Assigned to work server 155.247.164.213
10:14:16:WU02:FS01:Requesting new work unit for slot 01: READY gpu:0:TU106M [GeForce RTX 2060 Mobile] from 155.247.164.213
10:14:16:WU02:FS01:Connecting to 155.247.164.213:8080
10:14:27:WU02:FS01:Downloading 11.98MiB
10:14:33:WU02:FS01:Download 84.00%
10:14:34:WU02:FS01:Download complete
10:14:34:WU02:FS01:Received Unit: id:02 state:DOWNLOAD error:NO_ERROR project:11753 run:0 clone:3574 gen:1 core:0x22 unit:0x000000029bf7a4d55e6d76caa76041b9
10:14:34:WU02:FS01:Starting
10:14:34:WU02:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\alxbelu\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe -dir 02 -suffix 01 -version 705 -lifeline 11980 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
10:14:34:WU02:FS01:Started FahCore on PID 18220
10:14:34:WU02:FS01:Core PID:9752
10:14:34:WU02:FS01:FahCore 0x22 started
10:14:35:WU02:FS01:0x22:*********************** Log Started 2020-03-16T10:14:34Z ***********************
10:14:35:WU02:FS01:0x22:*************************** Core22 Folding@home Core ***************************
10:14:35:WU02:FS01:0x22:       Type: 0x22
10:14:35:WU02:FS01:0x22:       Core: Core22
10:14:35:WU02:FS01:0x22:    Website: https://foldingathome.org/
10:14:35:WU02:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
10:14:35:WU02:FS01:0x22:     Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
10:14:35:WU02:FS01:0x22:             <rafal.wiewiora@choderalab.org>
10:14:35:WU02:FS01:0x22:       Args: -dir 02 -suffix 01 -version 705 -lifeline 18220 -checkpoint 15
10:14:35:WU02:FS01:0x22:             -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device
10:14:35:WU02:FS01:0x22:             0 -gpu 0
10:14:35:WU02:FS01:0x22:     Config: <none>
10:14:35:WU02:FS01:0x22:************************************ Build *************************************
10:14:35:WU02:FS01:0x22:    Version: 0.0.2
10:14:35:WU02:FS01:0x22:       Date: Dec 6 2019
10:14:35:WU02:FS01:0x22:       Time: 21:30:31
10:14:35:WU02:FS01:0x22: Repository: Git
10:14:35:WU02:FS01:0x22:   Revision: abeb39247cc72df5af0f63723edafadb23d5dfbe
10:14:35:WU02:FS01:0x22:     Branch: HEAD
10:14:35:WU02:FS01:0x22:   Compiler: Visual C++ 2008
10:14:35:WU02:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
10:14:35:WU02:FS01:0x22:   Platform: win32 10
10:14:35:WU02:FS01:0x22:       Bits: 64
10:14:35:WU02:FS01:0x22:       Mode: Release
10:14:35:WU02:FS01:0x22:************************************ System ************************************
10:14:35:WU02:FS01:0x22:        CPU: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
10:14:35:WU02:FS01:0x22:     CPU ID: GenuineIntel Family 6 Model 158 Stepping 10
10:14:35:WU02:FS01:0x22:       CPUs: 12
10:14:35:WU02:FS01:0x22:     Memory: 15.85GiB
10:14:35:WU02:FS01:0x22:Free Memory: 8.79GiB
10:14:35:WU02:FS01:0x22:    Threads: WINDOWS_THREADS
10:14:35:WU02:FS01:0x22: OS Version: 6.2
10:14:35:WU02:FS01:0x22:Has Battery: true
10:14:35:WU02:FS01:0x22: On Battery: false
10:14:35:WU02:FS01:0x22: UTC Offset: 1
10:14:35:WU02:FS01:0x22:        PID: 9752
10:14:35:WU02:FS01:0x22:        CWD: C:\Users\alxbelu\AppData\Roaming\FAHClient\work
10:14:35:WU02:FS01:0x22:         OS: Windows 10 Home
10:14:35:WU02:FS01:0x22:    OS Arch: AMD64
10:14:35:WU02:FS01:0x22:********************************************************************************
10:14:35:WU02:FS01:0x22:Project: 11753 (Run 0, Clone 3574, Gen 1)
10:14:35:WU02:FS01:0x22:Unit: 0x000000029bf7a4d55e6d76caa76041b9
10:14:35:WU02:FS01:0x22:Reading tar file core.xml
10:14:35:WU02:FS01:0x22:Reading tar file integrator.xml
10:14:35:WU02:FS01:0x22:Reading tar file state.xml
10:14:38:WU02:FS01:0x22:Reading tar file system.xml
10:14:39:WU02:FS01:0x22:Digital signatures verified
10:14:39:WU02:FS01:0x22:Folding@home GPU Core22 Folding@home Core
10:14:39:WU02:FS01:0x22:Version 0.0.2
10:15:08:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:11758 run:0 clone:1756 gen:0 core:0x22 unit:0x000000009bf7a4d55e6d771303ec7ef7
10:15:08:WU00:FS01:Uploading 55.24MiB to 155.247.164.213
10:15:08:WU00:FS01:Connecting to 155.247.164.213:8080
10:15:08:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
10:15:08:WU00:FS01:Trying to send results to collection server
10:15:08:WU00:FS01:Uploading 55.24MiB to 155.247.164.214
10:15:08:WU00:FS01:Connecting to 155.247.164.214:8080
10:15:09:ERROR:WU00:FS01:Exception: Transfer failed
10:15:09:WU02:FS01:0x22:Completed 0 out of 1000000 steps (0%)
10:15:09:WU02:FS01:0x22:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
10:17:00:WU02:FS01:0x22:Completed 10000 out of 1000000 steps (1%)
10:17:45:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:11758 run:0 clone:1756 gen:0 core:0x22 unit:0x000000009bf7a4d55e6d771303ec7ef7
10:17:45:WU00:FS01:Uploading 55.24MiB to 155.247.164.213
10:17:45:WU00:FS01:Connecting to 155.247.164.213:8080
10:17:47:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
10:17:47:WU00:FS01:Trying to send results to collection server
10:17:47:WU00:FS01:Uploading 55.24MiB to 155.247.164.214
10:17:47:WU00:FS01:Connecting to 155.247.164.214:8080
10:17:47:ERROR:WU00:FS01:Exception: Transfer failed
10:18:51:WU02:FS01:0x22:Completed 20000 out of 1000000 steps (2%)
10:20:41:WU02:FS01:0x22:Completed 30000 out of 1000000 steps (3%)
10:21:59:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:11758 run:0 clone:1756 gen:0 core:0x22 unit:0x000000009bf7a4d55e6d771303ec7ef7
10:21:59:WU00:FS01:Uploading 55.24MiB to 155.247.164.213
10:21:59:WU00:FS01:Connecting to 155.247.164.213:8080
10:22:01:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
10:22:01:WU00:FS01:Trying to send results to collection server
10:22:01:WU00:FS01:Uploading 55.24MiB to 155.247.164.214
10:22:01:WU00:FS01:Connecting to 155.247.164.214:8080
10:22:01:ERROR:WU00:FS01:Exception: Transfer failed
10:22:32:WU02:FS01:0x22:Completed 40000 out of 1000000 steps (4%)
10:24:23:WU02:FS01:0x22:Completed 50000 out of 1000000 steps (5%)
10:26:25:WU02:FS01:0x22:Completed 60000 out of 1000000 steps (6%)
10:28:16:WU02:FS01:0x22:Completed 70000 out of 1000000 steps (7%)
10:28:51:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:11758 run:0 clone:1756 gen:0 core:0x22 unit:0x000000009bf7a4d55e6d771303ec7ef7
10:28:51:WU00:FS01:Uploading 55.24MiB to 155.247.164.213
10:28:51:WU00:FS01:Connecting to 155.247.164.213:8080
10:28:51:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
10:28:51:WU00:FS01:Trying to send results to collection server
10:28:51:WU00:FS01:Uploading 55.24MiB to 155.247.164.214
10:28:51:WU00:FS01:Connecting to 155.247.164.214:8080
10:28:52:ERROR:WU00:FS01:Exception: Transfer failed
10:30:06:WU02:FS01:0x22:Completed 80000 out of 1000000 steps (8%)
10:31:57:WU02:FS01:0x22:Completed 90000 out of 1000000 steps (9%)
10:33:47:WU02:FS01:0x22:Completed 100000 out of 1000000 steps (10%)
10:35:49:WU02:FS01:0x22:Completed 110000 out of 1000000 steps (11%)
10:37:40:WU02:FS01:0x22:Completed 120000 out of 1000000 steps (12%)
10:39:30:WU02:FS01:0x22:Completed 130000 out of 1000000 steps (13%)
10:39:56:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:11758 run:0 clone:1756 gen:0 core:0x22 unit:0x000000009bf7a4d55e6d771303ec7ef7
10:39:56:WU00:FS01:Uploading 55.24MiB to 155.247.164.213
10:39:56:WU00:FS01:Connecting to 155.247.164.213:8080
10:39:57:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
10:39:57:WU00:FS01:Trying to send results to collection server
10:39:57:WU00:FS01:Uploading 55.24MiB to 155.247.164.214
10:39:57:WU00:FS01:Connecting to 155.247.164.214:8080
10:39:57:ERROR:WU00:FS01:Exception: Transfer failed
10:41:21:WU02:FS01:0x22:Completed 140000 out of 1000000 steps (14%)
10:43:12:WU02:FS01:0x22:Completed 150000 out of 1000000 steps (15%)
10:45:13:WU02:FS01:0x22:Completed 160000 out of 1000000 steps (16%)
10:47:03:WU02:FS01:0x22:Completed 170000 out of 1000000 steps (17%)
10:48:54:WU02:FS01:0x22:Completed 180000 out of 1000000 steps (18%)
10:50:45:WU02:FS01:0x22:Completed 190000 out of 1000000 steps (19%)
10:52:35:WU02:FS01:0x22:Completed 200000 out of 1000000 steps (20%)
10:54:37:WU02:FS01:0x22:Completed 210000 out of 1000000 steps (21%)
10:56:27:WU02:FS01:0x22:Completed 220000 out of 1000000 steps (22%)
10:57:53:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:11758 run:0 clone:1756 gen:0 core:0x22 unit:0x000000009bf7a4d55e6d771303ec7ef7
10:57:53:WU00:FS01:Uploading 55.24MiB to 155.247.164.213
10:57:53:WU00:FS01:Connecting to 155.247.164.213:8080
10:57:54:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
10:57:54:WU00:FS01:Trying to send results to collection server
10:57:54:WU00:FS01:Uploading 55.24MiB to 155.247.164.214
10:57:54:WU00:FS01:Connecting to 155.247.164.214:8080
10:57:54:ERROR:WU00:FS01:Exception: Transfer failed
10:58:18:WU02:FS01:0x22:Completed 230000 out of 1000000 steps (23%)
11:00:08:WU02:FS01:0x22:Completed 240000 out of 1000000 steps (24%)
11:01:59:WU02:FS01:0x22:Completed 250000 out of 1000000 steps (25%)
11:04:01:WU02:FS01:0x22:Completed 260000 out of 1000000 steps (26%)
11:05:51:WU02:FS01:0x22:Completed 270000 out of 1000000 steps (27%)
11:07:42:WU02:FS01:0x22:Completed 280000 out of 1000000 steps (28%)
11:09:33:WU02:FS01:0x22:Completed 290000 out of 1000000 steps (29%)
11:11:24:WU02:FS01:0x22:Completed 300000 out of 1000000 steps (30%)
11:13:25:WU02:FS01:0x22:Completed 310000 out of 1000000 steps (31%)
11:15:16:WU02:FS01:0x22:Completed 320000 out of 1000000 steps (32%)
11:17:07:WU02:FS01:0x22:Completed 330000 out of 1000000 steps (33%)
11:18:57:WU02:FS01:0x22:Completed 340000 out of 1000000 steps (34%)
11:20:48:WU02:FS01:0x22:Completed 350000 out of 1000000 steps (35%)
11:22:50:WU02:FS01:0x22:Completed 360000 out of 1000000 steps (36%)
11:24:41:WU02:FS01:0x22:Completed 370000 out of 1000000 steps (37%)
11:26:31:WU02:FS01:0x22:Completed 380000 out of 1000000 steps (38%)
11:26:55:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:11758 run:0 clone:1756 gen:0 core:0x22 unit:0x000000009bf7a4d55e6d771303ec7ef7
11:26:55:WU00:FS01:Uploading 55.24MiB to 155.247.164.213
11:26:55:WU00:FS01:Connecting to 155.247.164.213:8080
11:26:56:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
11:26:56:WU00:FS01:Trying to send results to collection server
11:26:56:WU00:FS01:Uploading 55.24MiB to 155.247.164.214
11:26:56:WU00:FS01:Connecting to 155.247.164.214:8080
11:26:57:ERROR:WU00:FS01:Exception: Transfer failed
11:28:19:WU02:FS01:0x22:Completed 390000 out of 1000000 steps (39%)
11:30:08:WU02:FS01:0x22:Completed 400000 out of 1000000 steps (40%)
11:32:00:WU02:FS01:0x22:Completed 410000 out of 1000000 steps (41%)
11:33:48:WU02:FS01:0x22:Completed 420000 out of 1000000 steps (42%)
11:35:37:WU02:FS01:0x22:Completed 430000 out of 1000000 steps (43%)
11:37:25:WU02:FS01:0x22:Completed 440000 out of 1000000 steps (44%)
11:39:13:WU02:FS01:0x22:Completed 450000 out of 1000000 steps (45%)
11:41:06:WU02:FS01:0x22:Completed 460000 out of 1000000 steps (46%)
11:42:54:WU02:FS01:0x22:Completed 470000 out of 1000000 steps (47%)
11:44:42:WU02:FS01:0x22:Completed 480000 out of 1000000 steps (48%)
11:46:31:WU02:FS01:0x22:Completed 490000 out of 1000000 steps (49%)
11:48:21:WU02:FS01:0x22:Completed 500000 out of 1000000 steps (50%)
11:50:17:WU02:FS01:0x22:Completed 510000 out of 1000000 steps (51%)
11:52:10:WU02:FS01:0x22:Completed 520000 out of 1000000 steps (52%)
11:54:02:WU02:FS01:0x22:Completed 530000 out of 1000000 steps (53%)
11:55:54:WU02:FS01:0x22:Completed 540000 out of 1000000 steps (54%)
11:57:46:WU02:FS01:0x22:Completed 550000 out of 1000000 steps (55%)
11:59:43:WU02:FS01:0x22:Completed 560000 out of 1000000 steps (56%)
12:01:35:WU02:FS01:0x22:Completed 570000 out of 1000000 steps (57%)
12:03:27:WU02:FS01:0x22:Completed 580000 out of 1000000 steps (58%)
12:05:20:WU02:FS01:0x22:Completed 590000 out of 1000000 steps (59%)
12:07:12:WU02:FS01:0x22:Completed 600000 out of 1000000 steps (60%)
12:09:08:WU02:FS01:0x22:Completed 610000 out of 1000000 steps (61%)
12:11:01:WU02:FS01:0x22:Completed 620000 out of 1000000 steps (62%)
12:12:53:WU02:FS01:0x22:Completed 630000 out of 1000000 steps (63%)
12:13:54:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:11758 run:0 clone:1756 gen:0 core:0x22 unit:0x000000009bf7a4d55e6d771303ec7ef7
12:13:54:WU00:FS01:Uploading 55.24MiB to 155.247.164.213
12:13:54:WU00:FS01:Connecting to 155.247.164.213:8080
12:13:55:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
12:13:55:WU00:FS01:Trying to send results to collection server
12:13:55:WU00:FS01:Uploading 55.24MiB to 155.247.164.214
12:13:55:WU00:FS01:Connecting to 155.247.164.214:8080
12:13:55:ERROR:WU00:FS01:Exception: Transfer failed
12:14:46:WU02:FS01:0x22:Completed 640000 out of 1000000 steps (64%)
12:16:40:WU02:FS01:0x22:Completed 650000 out of 1000000 steps (65%)
12:18:43:WU02:FS01:0x22:Completed 660000 out of 1000000 steps (66%)
12:20:37:WU02:FS01:0x22:Completed 670000 out of 1000000 steps (67%)
12:22:31:WU02:FS01:0x22:Completed 680000 out of 1000000 steps (68%)
12:24:25:WU02:FS01:0x22:Completed 690000 out of 1000000 steps (69%)
12:26:19:WU02:FS01:0x22:Completed 700000 out of 1000000 steps (70%)
12:28:22:WU02:FS01:0x22:Completed 710000 out of 1000000 steps (71%)
12:30:16:WU02:FS01:0x22:Completed 720000 out of 1000000 steps (72%)
12:32:10:WU02:FS01:0x22:Completed 730000 out of 1000000 steps (73%)
12:34:04:WU02:FS01:0x22:Completed 740000 out of 1000000 steps (74%)
12:35:58:WU02:FS01:0x22:Completed 750000 out of 1000000 steps (75%)
12:38:01:WU02:FS01:0x22:Completed 760000 out of 1000000 steps (76%)
12:39:55:WU02:FS01:0x22:Completed 770000 out of 1000000 steps (77%)
12:41:49:WU02:FS01:0x22:Completed 780000 out of 1000000 steps (78%)
12:43:43:WU02:FS01:0x22:Completed 790000 out of 1000000 steps (79%)
12:45:37:WU02:FS01:0x22:Completed 800000 out of 1000000 steps (80%)
12:47:41:WU02:FS01:0x22:Completed 810000 out of 1000000 steps (81%)
12:49:35:WU02:FS01:0x22:Completed 820000 out of 1000000 steps (82%)
12:51:29:WU02:FS01:0x22:Completed 830000 out of 1000000 steps (83%)
12:53:22:WU02:FS01:0x22:Completed 840000 out of 1000000 steps (84%)
12:55:16:WU02:FS01:0x22:Completed 850000 out of 1000000 steps (85%)
12:57:20:WU02:FS01:0x22:Completed 860000 out of 1000000 steps (86%)
12:59:14:WU02:FS01:0x22:Completed 870000 out of 1000000 steps (87%)
13:01:08:WU02:FS01:0x22:Completed 880000 out of 1000000 steps (88%)
13:03:02:WU02:FS01:0x22:Completed 890000 out of 1000000 steps (89%)
13:04:55:WU02:FS01:0x22:Completed 900000 out of 1000000 steps (90%)
13:06:59:WU02:FS01:0x22:Completed 910000 out of 1000000 steps (91%)
13:08:53:WU02:FS01:0x22:Completed 920000 out of 1000000 steps (92%)
13:10:46:WU02:FS01:0x22:Completed 930000 out of 1000000 steps (93%)
13:12:40:WU02:FS01:0x22:Completed 940000 out of 1000000 steps (94%)
13:14:34:WU02:FS01:0x22:Completed 950000 out of 1000000 steps (95%)
13:16:37:WU02:FS01:0x22:Completed 960000 out of 1000000 steps (96%)
13:18:31:WU02:FS01:0x22:Completed 970000 out of 1000000 steps (97%)
13:20:25:WU02:FS01:0x22:Completed 980000 out of 1000000 steps (98%)
13:22:19:WU02:FS01:0x22:Completed 990000 out of 1000000 steps (99%)
13:22:19:WU01:FS01:Connecting to 65.254.110.245:8080
13:22:19:WARNING:WU01:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
13:22:19:WU01:FS01:Connecting to 18.218.241.186:80
13:22:20:WARNING:WU01:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
13:22:20:ERROR:WU01:FS01:Exception: Could not get an assignment
13:22:20:WU01:FS01:Connecting to 65.254.110.245:8080
13:22:21:WARNING:WU01:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
13:22:21:WU01:FS01:Connecting to 18.218.241.186:80
13:22:21:WARNING:WU01:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
13:22:21:ERROR:WU01:FS01:Exception: Could not get an assignment
13:23:20:WU01:FS01:Connecting to 65.254.110.245:8080
13:23:21:WARNING:WU01:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
13:23:21:WU01:FS01:Connecting to 18.218.241.186:80
13:23:21:WARNING:WU01:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
13:23:21:ERROR:WU01:FS01:Exception: Could not get an assignment
13:24:12:WU02:FS01:0x22:Completed 1000000 out of 1000000 steps (100%)
13:24:22:WU02:FS01:0x22:Saving result file ..\logfile_01.txt
13:24:22:WU02:FS01:0x22:Saving result file checkpointState.xml
13:24:30:WU02:FS01:0x22:Saving result file checkpt.crc
13:24:30:WU02:FS01:0x22:Saving result file positions.xtc
13:24:34:WU02:FS01:0x22:Saving result file science.log
13:24:34:WU02:FS01:0x22:Folding@home Core Shutdown: FINISHED_UNIT
13:24:35:WU02:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
13:24:35:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:11753 run:0 clone:3574 gen:1 core:0x22 unit:0x000000029bf7a4d55e6d76caa76041b9
13:24:35:WU02:FS01:Uploading 21.92MiB to 155.247.164.213
13:24:35:WU02:FS01:Connecting to 155.247.164.213:8080
13:24:41:WU02:FS01:Upload complete
13:24:41:WU02:FS01:Server responded WORK_ACK (400)
13:24:41:WU02:FS01:Final credit estimate, 105000.00 points
13:24:41:WU02:FS01:Cleaning up


Note how between receiving the WU (11753) from 213 and successfully uploading the result, a previous WU (11758) fails to upload to the very same server (213) 8 times.
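The count above can be reproduced from a client log. Here is a minimal sketch, assuming the log format pasted in this thread; the name `count_failed_uploads` is made up for illustration and is not part of FAHClient. It counts the `ERROR ... Transfer failed` line that ends each failed attempt, i.e. after both the work server and the collection server have refused the upload.

```python
import re
from collections import Counter

# Matches the line that ends a failed attempt, for example:
# 10:15:09:ERROR:WU00:FS01:Exception: Transfer failed
FAILED = re.compile(
    r"^\d{2}:\d{2}:\d{2}:ERROR:(WU\d+):FS\d+:Exception: Transfer failed$"
)

def count_failed_uploads(log_text):
    """Return {work unit id: number of failed upload attempts}."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FAILED.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return dict(counts)

# A few lines copied from the log above (two failed attempts for WU00,
# one unrelated assignment error, one successful upload for WU02).
sample = """\
10:15:08:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
10:15:09:ERROR:WU00:FS01:Exception: Transfer failed
10:17:47:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
10:17:47:ERROR:WU00:FS01:Exception: Transfer failed
13:22:20:ERROR:WU01:FS01:Exception: Could not get an assignment
13:24:41:WU02:FS01:Upload complete
"""

print(count_failed_uploads(sample))
```

Run against the full log instead of the sample, this should give the per-WU failure count directly, without counting by hand.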

It seems unlikely that this is a simple overload issue; rather, I suspect the servers were rebooted/reconfigured after sending out these earlier WUs, and now no longer accept any submissions for them.

edit: it seems many of us are also specifically having issues with WU 11758 and 213/214?


I've also been monitoring the WUs that I've failed to upload, and I realize that the first one I got (that has by now expired for me), was assigned around the 14th of March (expired 23rd, and I made the quoted post below on 16th), but according to the WU page it was assigned to at least four other donors well after I had already been assigned to it: https://apps.foldingathome.org/wu#proje ... 1756&gen=0

Besides a possible issue with upload size, this seems to indicate that the assignments weren't recorded/saved properly by the WS? Or are they allowed to assign WUs to more than one donor at a time?

alxbelu wrote:One of my machines has been trying to submit a WU for 11758 for over 24hrs now (72 attempts); during Sunday I noted that the servers (213 & 214) were mostly down, but as of this morning they seem to be up according to the server status page, yet I am still getting this (UTC time):
Code:
07:48:52:WU00:FS01:Uploading 55.24MiB to 155.247.164.213
07:48:52:WU00:FS01:Connecting to 155.247.164.213:8080
07:48:52:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
07:48:52:WU00:FS01:Trying to send results to collection server
07:48:52:WU00:FS01:Uploading 55.24MiB to 155.247.164.214
07:48:52:WU00:FS01:Connecting to 155.247.164.214:8080
07:48:56:ERROR:WU00:FS01:Exception: Transfer failed
07:53:06:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:11758 run:0 clone:1756 gen:0 core:0x22 unit:0x000000009bf7a4d55e6d771303ec7ef7
07:53:06:WU00:FS01:Uploading 55.24MiB to 155.247.164.213
07:53:06:WU00:FS01:Connecting to 155.247.164.213:8080
07:53:07:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
07:53:07:WU00:FS01:Trying to send results to collection server
07:53:07:WU00:FS01:Uploading 55.24MiB to 155.247.164.214
07:53:07:WU00:FS01:Connecting to 155.247.164.214:8080
07:53:07:ERROR:WU00:FS01:Exception: Transfer failed
07:59:58:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:11758 run:0 clone:1756 gen:0 core:0x22 unit:0x000000009bf7a4d55e6d771303ec7ef7
07:59:58:WU00:FS01:Uploading 55.24MiB to 155.247.164.213
07:59:58:WU00:FS01:Connecting to 155.247.164.213:8080
07:59:58:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
07:59:58:WU00:FS01:Trying to send results to collection server
07:59:58:WU00:FS01:Uploading 55.24MiB to 155.247.164.214
07:59:58:WU00:FS01:Connecting to 155.247.164.214:8080
07:59:59:ERROR:WU00:FS01:Exception: Transfer failed
08:11:03:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:11758 run:0 clone:1756 gen:0 core:0x22 unit:0x000000009bf7a4d55e6d771303ec7ef7
08:11:03:WU00:FS01:Uploading 55.24MiB to 155.247.164.213
08:11:03:WU00:FS01:Connecting to 155.247.164.213:8080
08:11:04:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
08:11:04:WU00:FS01:Trying to send results to collection server
08:11:04:WU00:FS01:Uploading 55.24MiB to 155.247.164.214
08:11:04:WU00:FS01:Connecting to 155.247.164.214:8080
08:11:04:ERROR:WU00:FS01:Exception: Transfer failed
08:29:00:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:11758 run:0 clone:1756 gen:0 core:0x22 unit:0x000000009bf7a4d55e6d771303ec7ef7
08:29:00:WU00:FS01:Uploading 55.24MiB to 155.247.164.213
08:29:00:WU00:FS01:Connecting to 155.247.164.213:8080
08:29:01:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
08:29:01:WU00:FS01:Trying to send results to collection server
08:29:01:WU00:FS01:Uploading 55.24MiB to 155.247.164.214
08:29:01:WU00:FS01:Connecting to 155.247.164.214:8080
08:29:01:ERROR:WU00:FS01:Exception: Transfer failed


I've reset the retry timer multiple times as it has extended well beyond 1hr (log indicates it's been up over 2hrs).
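The growing retry delay can be read straight off the timestamps in the log above. This is a rough sketch that only measures the gaps between the five attempts shown; it makes no claim about how the client actually schedules retries, and `retry_gaps` is a made-up helper name. It assumes all timestamps fall on the same day.

```python
from datetime import datetime

def retry_gaps(timestamps):
    """Return the delay in seconds between consecutive HH:MM:SS timestamps."""
    times = [datetime.strptime(t, "%H:%M:%S") for t in timestamps]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]

# Start time of each upload attempt, taken from the log above.
attempts = ["07:48:52", "07:53:06", "07:59:58", "08:11:03", "08:29:00"]

gaps = retry_gaps(attempts)
# How much longer each wait is than the one before it.
ratios = [round(later / earlier, 2) for earlier, later in zip(gaps, gaps[1:])]

print(gaps)
print(ratios)
```

In this excerpt each wait is roughly 1.6 times the previous one, which is consistent with the delay passing the one-hour mark after a handful of further failures.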

The same machine (and folding slot) has kept going and completed multiple other WUs while trying to send this one, so it's not blocking anything. I'm just trying to figure out why it fails to submit the work even now, when the servers are reported to be up.

edit: corrected WU-link
Last edited by alxbelu on Wed Mar 25, 2020 1:32 pm, edited 1 time in total.
alxbelu
 
Posts: 109
Joined: Sat Mar 14, 2020 7:28 pm

Re: Send Errors - 155.247.164.213 & .214

Postby davidcoton » Wed Mar 25, 2020 12:29 pm

WUs should only be re-issued when the earlier issue is not returned by the deadline. At that point the client that is unable to upload should delete the failed upload.
I don't think this issue has been fully understood yet. It does not affect the whole project, or other projects on the same server.
I'm not going to speculate about what may be wrong. I don't know the server code at all, so it would be pointless.
It is reasonable to think that one possibility is that server data was incomplete at some point when it went down, but the system design does not stop uploads of unknown WUs, it just rejects the data after it is correctly received. My guess is that such a system is intended to prevent corrupt data from blocking good uploads.
davidcoton
 
Posts: 1102
Joined: Wed Nov 05, 2008 4:19 pm
Location: Cambridge, UK

Re: Send Errors - 155.247.164.213 & .214

Postby vnicolici » Wed Mar 25, 2020 2:17 pm

As to why the issue is apparently so hard to fix, I'm guessing the logging on the servers might be inadequate. That wouldn't surprise me at all after seeing the poor logging on the clients, which don't even log the error information that the servers send them.

So probably, just like the clients, the servers don't log enough details when this particular issue occurs. And I assume the issue is, as Wireshark captures indicated, that the results are too large and exceed a configured limit.

So, I asked previously if the people responsible for those servers/projects investigated the potential issue of a configured size limit for the results.

Seeing the lack of progress in fixing this, I have to ask, have they even been made aware of this potential cause? That it could be a result size limit causing the problem?

Honestly, if they don't have enough information, because their logs do not contain relevant information and/or the information from this forum about the potential result size limit was not forwarded to them, then I can understand why the issue has still not been fixed.

So far I had 2 units with the issue that were lost because they exceeded the deadline.

Whatever the reason this is taking so long to fix, if it happens a third time I'll have to block those servers in my firewall until the issue is confirmed fixed, so that I don't get more units from those servers. Sorry, but I don't see any other way to prevent wasting time and increasing my electric bill doing useless work.
vnicolici
 
Posts: 15
Joined: Sun Mar 15, 2020 1:10 am

Re: Send Errors - 155.247.164.213 & .214

Postby vangli » Wed Mar 25, 2020 2:21 pm

Well, the Wireshark analysis shows that you get a 413 HTTP_REQUEST_ENTITY_TOO_LARGE in response to the client's HTTP POST message, which announces the size of the file it is about to send (those 55 MBytes). Some clients have time to send some data before they get the denial from the server and the disconnect. If this denial is a false message hiding some other error, it's very bad programming or configuration, leaving users in the dark. According to the RFC, the meaning of 413 is "413 Payload Too Large (RFC 7231) - The request is larger than the server is willing or able to process. Previously called 'Request Entity Too Large'."
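To make the size-limit theory concrete, here is a small sketch, not anything from the F@H code. It parses the status line reported earlier in this thread and brackets where a limit on .213 would have to sit, using the two upload sizes seen in the logs posted here (21.92MiB accepted, 55.24MiB rejected). Both function names are hypothetical, and the bracket only means something if the 413 really is a plain size limit.

```python
import re

STATUS_LINE = re.compile(r"^HTTP/(\d\.\d) (\d{3})(?: (.*))?$")

def parse_status_line(line):
    """Split an HTTP status line into (version, code, reason)."""
    match = STATUS_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not an HTTP status line: {line!r}")
    version, code, reason = match.groups()
    return version, int(code), reason or ""

def bracket_size_limit(accepted_mib, rejected_mib):
    """Return (largest accepted, smallest rejected) size in MiB.

    Both lists must be non-empty. Returns None when an accepted upload
    is at least as large as a rejected one, which would contradict a
    pure size limit.
    """
    low, high = max(accepted_mib), min(rejected_mib)
    return (low, high) if low < high else None

# Status line as quoted earlier in this thread.
version, code, reason = parse_status_line(
    "HTTP/1.0 413 HTTP_REQUEST_ENTITY_TOO_LARGE"
)
print(code == 413)

# Upload sizes to 155.247.164.213 seen in the logs in this thread.
print(bracket_size_limit(accepted_mib=[21.92], rejected_mib=[55.24]))
```

If the theory holds, any limit on that server would lie somewhere between those two sizes; more data points from other donors would narrow it down.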

On normal web servers, this is an easy reconfiguration, often not even requiring a restart, only a reload of the configuration file. However, not knowing the software or backend of the folding servers, this may be more complicated due to other limitations in the processes behind them. AFAIK the moderators are in contact with the team, and I am certain news will come as soon as they get better information. Information that ideally should already have been given.

Happy folding for the other WU's. :)

Edited: Just got a big file uploaded to another CS:
13:51:25:WU02:FS01:Uploading 55.24MiB to 13.90.152.57
13:51:25:WU02:FS01:Connecting to 13.90.152.57:8080
.
.
13:58:26:WU02:FS01:Upload 99.68%
13:58:29:WU02:FS01:Upload complete
13:58:29:WU02:FS01:Server responded WORK_ACK (400)
Regards
Bent Vangli, Oslo, Norway
vangli
 
Posts: 12
Joined: Thu Mar 19, 2020 11:35 am

Re: Send Errors - 155.247.164.213 & .214

Postby frest1 » Wed Mar 25, 2020 3:02 pm

Just came here to confirm some issues with these servers ongoing at the moment. They have persisted for at least 24 hours now:

Code:
13:53:30:WU00:FS00:Uploading 55.24MiB to 155.247.164.213
13:53:30:WU00:FS00:Connecting to 155.247.164.213:8080
13:53:51:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
13:53:51:WU00:FS00:Connecting to 155.247.164.213:80
13:53:51:WU00:FS00:Upload 0.11%
13:53:51:WARNING:WU00:FS00:Exception: Failed to send results to work server: Transfer failed
13:53:51:WU00:FS00:Trying to send results to collection server
13:53:51:WU00:FS00:Uploading 55.24MiB to 155.247.164.214
13:53:51:WU00:FS00:Connecting to 155.247.164.214:8080
13:54:12:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
13:54:12:WU00:FS00:Connecting to 155.247.164.214:80
13:54:12:WU00:FS00:Upload 0.11%
13:54:13:ERROR:WU00:FS00:Exception: Transfer failed
13:56:07:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:11758 run:0 clone:248 gen:0 core:0x22 unit:0x0000000b9bf7a4d55e6d770fce597dbe
13:56:07:WU00:FS00:Uploading 55.24MiB to 155.247.164.213
13:56:07:WU00:FS00:Connecting to 155.247.164.213:8080
13:56:28:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
13:56:28:WU00:FS00:Connecting to 155.247.164.213:80
13:56:28:WU00:FS00:Upload 0.11%
13:56:29:WARNING:WU00:FS00:Exception: Failed to send results to work server: Transfer failed
13:56:29:WU00:FS00:Trying to send results to collection server
13:56:29:WU00:FS00:Uploading 55.24MiB to 155.247.164.214
13:56:29:WU00:FS00:Connecting to 155.247.164.214:8080
13:56:50:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
13:56:50:WU00:FS00:Connecting to 155.247.164.214:80
13:56:50:WU00:FS00:Upload 0.11%
13:56:50:ERROR:WU00:FS00:Exception: Transfer failed
frest1
 
Posts: 15
Joined: Wed Mar 25, 2020 2:57 pm

Re: Send Errors - 155.247.164.213 & .214

Postby qk7b » Wed Mar 25, 2020 5:20 pm

Same error here, just finished a WU for 11758 and got exactly the same output.
155.247.164.214 seems to be in real pain, good luck guys.

As alxbelu said earlier, it looks like more than one of us has been assigned this specific PRCG (sorry, I don't quite understand how that works yet).

Mine is PRCG 11758 (0, 526, 0)

Looks like 2 Faulty WUs have already been completed for it, both assigned on 2020-03-23, whereas mine was assigned on 2020-03-20.
At this point I don't know if I should let it go, or if I should clear it in some way.

The expiration date is on 2020-03-28 anyway, so I can still wait till this date.


EDIT: Added info on PRCG
qk7b
 
Posts: 4
Joined: Sat Mar 21, 2020 1:12 pm
Location: Lille, France

Re: Send Errors - 155.247.164.213 & .214

Postby davidcoton » Wed Mar 25, 2020 6:08 pm

Unreturned WUs should not be reassigned until they expire. Faulty WUs that are returned are reassigned, up to (IIRC) three failures.
If there is clear evidence that unreturned WUs are being reassigned before they expire, we need to collect enough info to determine the extent/pattern of this problem and then get the team further involved.
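For anyone collecting that info, the check itself is simple. Below is a minimal sketch, assuming you copy the assignment dates and the deadline by hand from the WU status page; `premature_reassignments` is a hypothetical helper, not an existing tool. The example uses the dates qk7b posted above. It cannot tell whether a later assignment followed a Faulty return or not, so a hit is only a lead, not proof.

```python
from datetime import date

def premature_reassignments(assigned, deadline, later_assignments):
    """Return later assignment dates that fall before the first donor's deadline."""
    return [d for d in later_assignments if assigned <= d < deadline]

# Dates from qk7b's post for PRCG 11758 (0, 526, 0).
first_assigned = date(2020, 3, 20)
expires = date(2020, 3, 28)
others = [date(2020, 3, 23), date(2020, 3, 23)]

overlaps = premature_reassignments(first_assigned, expires, others)
print(overlaps)
```

A non-empty result means the same PRCG was handed out again while the first copy was still inside its deadline, which is the pattern worth reporting along with the PRCG numbers.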
davidcoton
 
Posts: 1102
Joined: Wed Nov 05, 2008 4:19 pm
Location: Cambridge, UK

Re: Send Errors - 155.247.164.213 & .214

Postby sswilson » Wed Mar 25, 2020 7:57 pm

My .214 unit cleared... haven't gone back through the log yet to see if it uploaded or timed out.....

edit: Pretty sure I tracked it down in the log.... it took a total of 36 minutes to upload the results in .15 - .50% increments which would suggest the server was being absolutely hammered with uploads..... :)

Took up a total of 276 individual log entries to complete.

High praise to the techs who finally got this resolved. :)
sswilson
 
Posts: 90
Joined: Mon Dec 17, 2007 1:34 am
Location: Moncton, New Brunswick, Canada

Re: Send Errors - 155.247.164.213 & .214

Postby alxbelu » Wed Mar 25, 2020 8:40 pm

sswilson wrote:My .214 unit cleared... haven't gone back through the log yet to see if it uploaded or timed out.....

edit: Pretty sure I tracked it down in the log.... it took a total of 36 minutes to upload the results in .15 - .50% increments which would suggest the server was being absolutely hammered with uploads..... :)

Took up a total of 276 individual log entries to complete.

High praise to the techs who finally got this resolved. :)


What was the PRCG? I checked the one you mentioned in this thread (11758 (0, 2135, 0)), which seems to still not have been successfully uploaded: https://apps.foldingathome.org/wu#proje ... 2135&gen=0
alxbelu
 
Posts: 109
Joined: Sat Mar 14, 2020 7:28 pm
