Rejection due to transmission failure?

Postby AJMSmith » Fri Feb 10, 2017 2:34 pm

Recently I got this (restricted to WU00:FS01)
10:47:38:WU00:FS01:0x21:Completed 6250000 out of 6250000 steps (100%)
10:47:40:WU00:FS01:0x21:Saving result file logfile_01.txt
10:47:40:WU00:FS01:0x21:Saving result file checkpointState.xml
10:47:40:WU00:FS01:0x21:Saving result file checkpt.crc
10:47:40:WU00:FS01:0x21:Saving result file log.txt
10:47:40:WU00:FS01:0x21:Saving result file positions.xtc
10:47:40:WU00:FS01:0x21:Folding@home Core Shutdown: FINISHED_UNIT
10:47:40:WU00:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
10:47:40:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:9431 run:684 clone:0 gen:13 core:0x21 unit:0x00000014ab436c9d586fdd39dc8c25ac
10:47:40:WU00:FS01:Uploading 13.61MiB to 171.67.108.157
10:47:40:WU00:FS01:Connecting to 171.67.108.157:8080
10:47:46:WU00:FS01:Upload 3.67%
10:47:52:WU00:FS01:Upload 8.26%
10:47:58:WU00:FS01:Upload 12.85%
10:48:04:WU00:FS01:Upload 17.45%
10:48:10:WU00:FS01:Upload 22.04%
10:48:16:WU00:FS01:Upload 26.17%
10:48:48:WU00:FS01:Upload 30.30%
10:48:48:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
10:48:48:WU00:FS01:Trying to send results to collection server
10:48:48:WU00:FS01:Uploading 13.61MiB to 171.67.108.46
10:48:48:WU00:FS01:Connecting to 171.67.108.46:8080
10:48:54:WU00:FS01:Upload 3.67%
10:49:00:WU00:FS01:Upload 8.26%
10:49:06:WU00:FS01:Upload 12.40%
10:49:12:WU00:FS01:Upload 16.07%
10:49:18:WU00:FS01:Upload 20.20%
10:49:24:WU00:FS01:Upload 24.79%
10:49:30:WU00:FS01:Upload 29.38%
10:49:36:WU00:FS01:Upload 33.97%
10:49:42:WU00:FS01:Upload 38.56%
10:49:48:WU00:FS01:Upload 43.15%
10:49:54:WU00:FS01:Upload 47.29%
10:50:00:WU00:FS01:Upload 51.88%
10:50:06:WU00:FS01:Upload 56.47%
10:50:12:WU00:FS01:Upload 61.06%
10:50:18:WU00:FS01:Upload 65.65%
10:50:24:WU00:FS01:Upload 70.24%
10:50:30:WU00:FS01:Upload 74.83%
10:50:36:WU00:FS01:Upload 78.96%
10:50:42:WU00:FS01:Upload 83.56%
10:50:48:WU00:FS01:Upload 88.15%
10:50:54:WU00:FS01:Upload 92.74%
10:51:00:WU00:FS01:Upload 97.33%
10:51:05:WU00:FS01:Upload complete
10:51:05:WU00:FS01:Server responded WORK_QUIT (404)
10:51:05:WARNING:WU00:FS01:Server did not like results, dumping
10:51:05:WU00:FS01:Cleaning up

I know I have occasional glitches in my intranet, but is that the cause of the unit being rejected? The WORK_QUIT response looks suspicious, given that the core had already reported FINISHED_UNIT locally.
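For anyone who wants to check their own log for this pattern, here is a minimal Python sketch. It is a hypothetical helper, not part of FAHClient, and it assumes the log layout seen in the excerpts in this thread: a time stamp, an optional WARNING tag, the WUxx and FSyy slot tags, then the message. It flags uploads that fell back to the collection server and were then answered with WORK_QUIT.

```python
import re
from typing import Iterable, List, Tuple

# Assumed line layout: "HH:MM:SS:[WARNING:]WUxx:FSyy:message"
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}):(?:WARNING:)?(WU\d+):(FS\d+):(.*)$")


def find_cs_rejections(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (time, WU, FS) for every upload that fell back to the
    collection server and was then answered with WORK_QUIT."""
    fell_back = set()  # (WU, FS) slots currently retrying via a collection server
    hits: List[Tuple[str, str, str]] = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # e.g. the "**** Date: ... ****" banner lines
        stamp, wu, fs, msg = m.groups()
        key = (wu, fs)
        if msg.startswith("Trying to send results to collection server"):
            fell_back.add(key)
        elif msg.startswith("Server responded"):
            if key in fell_back and "WORK_QUIT" in msg:
                hits.append((stamp, wu, fs))
            fell_back.discard(key)  # this upload attempt is over either way
    return hits
```

You could feed it the lines of the client's log.txt (its location depends on how the client was installed). A direct WORK_QUIT from the work server is deliberately not flagged, since that is a different situation from the one described here.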
AJMSmith
 
Posts: 63
Joined: Tue Jul 01, 2008 1:17 am
Location: Greater London, UK

Re: Rejection due to transmission failure?

Postby bruce » Fri Feb 10, 2017 7:07 pm

First, the FINISHED_UNIT status cannot guarantee that the WU will arrive intact and be accepted by the server.

Second, you have encountered a very unusual bug. I've been around long enough to have seen it a (very) few times, but I've never been able to convince Development of what happens or why. The log you posted makes it very clear.
Thank you!

I've opened a new bug ticket and maybe this time they can fix it.
https://github.com/FoldingAtHome/fah-issues/issues/1182

I'm sorry you lost the WU. It has been reissued and somebody else is working on it.
bruce
 
Posts: 22739
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Rejection due to transmission failure?

Postby AJMSmith » Sat Feb 11, 2017 12:46 am

That's exactly why I posted ... it looked just basically wrong and I could not make sense of it.

I note there has been a comment on your ticket.

A
AJMSmith

Re: Rejection due to transmission failure?

Postby werty316 » Sat Feb 11, 2017 5:54 pm

I can only fold on weekends, and I've had this exact same problem maybe 2 or 3 times now. Every time it happens, the client just sits there: FAHControl says it is downloading a new WU, but the log window doesn't show that it is. My log file is below. I'm not sure whether this is related to the new core that was recently released. I have changed nothing on my computer, software- or hardware-wise.
I have "next-unit-percentage" set to "100", but as you can see, a new WU isn't being downloaded.

09:27:11:WU00:FS00:Assigned to work server 171.67.108.157
09:27:11:WU01:FS00:Connecting to 140.163.4.232:8080
09:27:11:WU00:FS00:Requesting new work unit for slot 00: READY gpu:1:"GP104 [GeForce GTX 1070]" from 171.67.108.157
09:27:11:WU00:FS00:Connecting to 171.67.108.157:8080
09:27:11:WU00:FS00:Downloading 5.19MiB
09:27:17:WU00:FS00:Download 57.83%
09:27:19:WU00:FS00:Download complete
09:27:19:WU00:FS00:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:9415 run:26 clone:0 gen:77 core:0x21 unit:0x0000005fab436c9d585e06c9c9c73d41
09:27:19:WU00:FS00:Starting
09:27:19:WU00:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" E:/FoldingAtHome/FAHClient/cores/fahwebx.stanford.edu/cores/Win32/AMD64/NVIDIA/Fermi/beta/Core_21.fah/FahCore_21.exe -dir 00 -suffix 01 -version 702 -lifeline 760 -checkpoint 3 -gpu 0
09:27:19:WU00:FS00:Started FahCore on PID 4436
09:27:19:WU00:FS00:Core PID:3100
09:27:19:WU00:FS00:FahCore 0x21 started
09:27:20:WU00:FS00:0x21:*********************** Log Started 2017-02-11T09:27:19Z ***********************
09:27:20:WU00:FS00:0x21:Project: 9415 (Run 26, Clone 0, Gen 77)
09:27:20:WU00:FS00:0x21:Unit: 0x0000005fab436c9d585e06c9c9c73d41
09:27:20:WU00:FS00:0x21:CPU: 0x00000000000000000000000000000000
09:27:20:WU00:FS00:0x21:Machine: 0
09:27:20:WU00:FS00:0x21:Reading tar file core.xml
09:27:20:WU00:FS00:0x21:Reading tar file integrator.xml
09:27:20:WU00:FS00:0x21:Reading tar file state.xml
09:27:20:WU00:FS00:0x21:Reading tar file system.xml
09:27:20:WU00:FS00:0x21:Digital signatures verified
09:27:20:WU00:FS00:0x21:Folding@home GPU Core21 Folding@home Core
09:27:20:WU00:FS00:0x21:Version 0.0.18
09:27:21:WU00:FS00:0x21:Completed 0 out of 6250000 steps (0%)
09:27:22:WU00:FS00:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
09:28:45:WU00:FS00:0x21:Completed 62500 out of 6250000 steps (1%)
09:30:07:WU00:FS00:0x21:Completed 125000 out of 6250000 steps (2%)
09:31:04:WU01:FS00:Upload 2.40%
09:31:10:WU01:FS00:Upload 6.40%
09:31:17:WU01:FS00:Upload 10.40%
09:31:23:WU01:FS00:Upload 15.20%
09:31:29:WU01:FS00:Upload 19.20%
09:31:29:WU00:FS00:0x21:Completed 187500 out of 6250000 steps (3%)
09:31:35:WU01:FS00:Upload 23.20%
09:31:41:WU01:FS00:Upload 27.21%
09:31:50:WU01:FS00:Upload 29.61%
09:32:03:WU01:FS00:Upload 30.41%
09:32:09:WU01:FS00:Upload 34.41%
09:32:15:WU01:FS00:Upload 39.21%
09:32:21:WU01:FS00:Upload 43.21%
09:32:27:WU01:FS00:Upload 47.21%
09:32:33:WU01:FS00:Upload 51.21%
09:32:39:WU01:FS00:Upload 55.21%
09:32:46:WU01:FS00:Upload 60.01%
09:32:52:WU01:FS00:Upload 64.01%
09:32:54:WU00:FS00:0x21:Completed 250000 out of 6250000 steps (4%)
09:32:59:WU01:FS00:Upload 68.01%
09:33:05:WU01:FS00:Upload 72.81%
09:33:12:WU01:FS00:Upload 77.61%
09:33:18:WU01:FS00:Upload 82.42%
09:33:27:WU01:FS00:Upload 85.62%
09:33:33:WU01:FS00:Upload 90.42%
09:33:40:WU01:FS00:Upload 95.22%
09:33:46:WU01:FS00:Upload 99.22%
09:33:49:WU01:FS00:Upload complete
09:33:49:WU01:FS00:Server responded WORK_ACK (400)
09:33:49:WU01:FS00:Final credit estimate, 22864.00 points
09:33:49:WU01:FS00:Cleaning up
09:34:14:WU00:FS00:0x21:Completed 312500 out of 6250000 steps (5%)
09:35:35:WU00:FS00:0x21:Completed 375000 out of 6250000 steps (6%)
09:36:56:WU00:FS00:0x21:Completed 437500 out of 6250000 steps (7%)
09:38:16:WU00:FS00:0x21:Completed 500000 out of 6250000 steps (8%)
09:39:37:WU00:FS00:0x21:Completed 562500 out of 6250000 steps (9%)
09:40:37:WU00:FS00:0x21:Completed 625000 out of 6250000 steps (10%)
09:41:37:WU00:FS00:0x21:Completed 687500 out of 6250000 steps (11%)
09:42:36:WU00:FS00:0x21:Completed 750000 out of 6250000 steps (12%)
09:43:35:WU00:FS00:0x21:Completed 812500 out of 6250000 steps (13%)
09:44:35:WU00:FS00:0x21:Completed 875000 out of 6250000 steps (14%)
09:45:34:WU00:FS00:0x21:Completed 937500 out of 6250000 steps (15%)
09:46:33:WU00:FS00:0x21:Completed 1000000 out of 6250000 steps (16%)
09:47:32:WU00:FS00:0x21:Completed 1062500 out of 6250000 steps (17%)
09:48:31:WU00:FS00:0x21:Completed 1125000 out of 6250000 steps (18%)
09:49:30:WU00:FS00:0x21:Completed 1187500 out of 6250000 steps (19%)
09:50:29:WU00:FS00:0x21:Completed 1250000 out of 6250000 steps (20%)
09:51:29:WU00:FS00:0x21:Completed 1312500 out of 6250000 steps (21%)
09:52:28:WU00:FS00:0x21:Completed 1375000 out of 6250000 steps (22%)
09:53:27:WU00:FS00:0x21:Completed 1437500 out of 6250000 steps (23%)
09:54:26:WU00:FS00:0x21:Completed 1500000 out of 6250000 steps (24%)
09:55:25:WU00:FS00:0x21:Completed 1562500 out of 6250000 steps (25%)
09:56:24:WU00:FS00:0x21:Completed 1625000 out of 6250000 steps (26%)
09:57:23:WU00:FS00:0x21:Completed 1687500 out of 6250000 steps (27%)
09:58:22:WU00:FS00:0x21:Completed 1750000 out of 6250000 steps (28%)
09:59:22:WU00:FS00:0x21:Completed 1812500 out of 6250000 steps (29%)
10:00:21:WU00:FS00:0x21:Completed 1875000 out of 6250000 steps (30%)
10:01:20:WU00:FS00:0x21:Completed 1937500 out of 6250000 steps (31%)
10:02:19:WU00:FS00:0x21:Completed 2000000 out of 6250000 steps (32%)
10:03:18:WU00:FS00:0x21:Completed 2062500 out of 6250000 steps (33%)
10:04:17:WU00:FS00:0x21:Completed 2125000 out of 6250000 steps (34%)
10:05:16:WU00:FS00:0x21:Completed 2187500 out of 6250000 steps (35%)
******************************** Date: 11/02/17 ********************************
10:06:15:WU00:FS00:0x21:Completed 2250000 out of 6250000 steps (36%)
10:07:15:WU00:FS00:0x21:Completed 2312500 out of 6250000 steps (37%)
10:08:14:WU00:FS00:0x21:Completed 2375000 out of 6250000 steps (38%)
10:09:13:WU00:FS00:0x21:Completed 2437500 out of 6250000 steps (39%)
10:10:12:WU00:FS00:0x21:Completed 2500000 out of 6250000 steps (40%)
10:11:11:WU00:FS00:0x21:Completed 2562500 out of 6250000 steps (41%)
10:12:10:WU00:FS00:0x21:Completed 2625000 out of 6250000 steps (42%)
10:13:09:WU00:FS00:0x21:Completed 2687500 out of 6250000 steps (43%)
10:14:08:WU00:FS00:0x21:Completed 2750000 out of 6250000 steps (44%)
10:15:07:WU00:FS00:0x21:Completed 2812500 out of 6250000 steps (45%)
10:16:06:WU00:FS00:0x21:Completed 2875000 out of 6250000 steps (46%)
10:17:05:WU00:FS00:0x21:Completed 2937500 out of 6250000 steps (47%)
10:18:05:WU00:FS00:0x21:Completed 3000000 out of 6250000 steps (48%)
10:19:04:WU00:FS00:0x21:Completed 3062500 out of 6250000 steps (49%)
10:20:03:WU00:FS00:0x21:Completed 3125000 out of 6250000 steps (50%)
10:21:02:WU00:FS00:0x21:Completed 3187500 out of 6250000 steps (51%)
10:22:01:WU00:FS00:0x21:Completed 3250000 out of 6250000 steps (52%)
10:23:00:WU00:FS00:0x21:Completed 3312500 out of 6250000 steps (53%)
10:24:00:WU00:FS00:0x21:Completed 3375000 out of 6250000 steps (54%)
10:24:59:WU00:FS00:0x21:Completed 3437500 out of 6250000 steps (55%)
10:25:58:WU00:FS00:0x21:Completed 3500000 out of 6250000 steps (56%)
10:26:57:WU00:FS00:0x21:Completed 3562500 out of 6250000 steps (57%)
10:27:56:WU00:FS00:0x21:Completed 3625000 out of 6250000 steps (58%)
10:28:55:WU00:FS00:0x21:Completed 3687500 out of 6250000 steps (59%)
10:29:54:WU00:FS00:0x21:Completed 3750000 out of 6250000 steps (60%)
10:30:53:WU00:FS00:0x21:Completed 3812500 out of 6250000 steps (61%)
10:31:52:WU00:FS00:0x21:Completed 3875000 out of 6250000 steps (62%)
10:32:51:WU00:FS00:0x21:Completed 3937500 out of 6250000 steps (63%)
10:33:50:WU00:FS00:0x21:Completed 4000000 out of 6250000 steps (64%)
10:34:50:WU00:FS00:0x21:Completed 4062500 out of 6250000 steps (65%)
10:35:49:WU00:FS00:0x21:Completed 4125000 out of 6250000 steps (66%)
10:36:48:WU00:FS00:0x21:Completed 4187500 out of 6250000 steps (67%)
10:37:47:WU00:FS00:0x21:Completed 4250000 out of 6250000 steps (68%)
10:38:46:WU00:FS00:0x21:Completed 4312500 out of 6250000 steps (69%)
10:39:45:WU00:FS00:0x21:Completed 4375000 out of 6250000 steps (70%)
10:40:44:WU00:FS00:0x21:Completed 4437500 out of 6250000 steps (71%)
10:41:43:WU00:FS00:0x21:Completed 4500000 out of 6250000 steps (72%)
10:42:43:WU00:FS00:0x21:Completed 4562500 out of 6250000 steps (73%)
10:43:42:WU00:FS00:0x21:Completed 4625000 out of 6250000 steps (74%)
10:44:41:WU00:FS00:0x21:Completed 4687500 out of 6250000 steps (75%)
10:45:40:WU00:FS00:0x21:Completed 4750000 out of 6250000 steps (76%)
10:46:39:WU00:FS00:0x21:Completed 4812500 out of 6250000 steps (77%)
10:47:38:WU00:FS00:0x21:Completed 4875000 out of 6250000 steps (78%)
10:48:37:WU00:FS00:0x21:Completed 4937500 out of 6250000 steps (79%)
10:49:36:WU00:FS00:0x21:Completed 5000000 out of 6250000 steps (80%)
10:50:35:WU00:FS00:0x21:Completed 5062500 out of 6250000 steps (81%)
10:51:34:WU00:FS00:0x21:Completed 5125000 out of 6250000 steps (82%)
10:52:33:WU00:FS00:0x21:Completed 5187500 out of 6250000 steps (83%)
10:53:32:WU00:FS00:0x21:Completed 5250000 out of 6250000 steps (84%)
10:54:31:WU00:FS00:0x21:Completed 5312500 out of 6250000 steps (85%)
10:55:31:WU00:FS00:0x21:Completed 5375000 out of 6250000 steps (86%)
10:56:30:WU00:FS00:0x21:Completed 5437500 out of 6250000 steps (87%)
10:57:29:WU00:FS00:0x21:Completed 5500000 out of 6250000 steps (88%)
10:58:28:WU00:FS00:0x21:Completed 5562500 out of 6250000 steps (89%)
10:59:27:WU00:FS00:0x21:Completed 5625000 out of 6250000 steps (90%)
11:00:26:WU00:FS00:0x21:Completed 5687500 out of 6250000 steps (91%)
11:01:25:WU00:FS00:0x21:Completed 5750000 out of 6250000 steps (92%)
11:02:24:WU00:FS00:0x21:Completed 5812500 out of 6250000 steps (93%)
11:03:24:WU00:FS00:0x21:Completed 5875000 out of 6250000 steps (94%)
11:04:23:WU00:FS00:0x21:Completed 5937500 out of 6250000 steps (95%)
11:05:22:WU00:FS00:0x21:Completed 6000000 out of 6250000 steps (96%)
11:06:21:WU00:FS00:0x21:Completed 6062500 out of 6250000 steps (97%)
11:07:20:WU00:FS00:0x21:Completed 6125000 out of 6250000 steps (98%)
11:08:19:WU00:FS00:0x21:Completed 6187500 out of 6250000 steps (99%)
11:09:18:WU00:FS00:0x21:Completed 6250000 out of 6250000 steps (100%)
11:09:18:WU00:FS00:0x21:Saving result file logfile_01.txt
11:09:18:WU00:FS00:0x21:Saving result file checkpointState.xml
11:09:18:WU00:FS00:0x21:Saving result file checkpt.crc
11:09:18:WU00:FS00:0x21:Saving result file log.txt
11:09:18:WU00:FS00:0x21:Saving result file positions.xtc
11:09:19:WU00:FS00:0x21:Folding@home Core Shutdown: FINISHED_UNIT
11:09:19:WU00:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
11:09:19:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:9415 run:26 clone:0 gen:77 core:0x21 unit:0x0000005fab436c9d585e06c9c9c73d41
11:09:19:WU00:FS00:Uploading 7.83MiB to 171.67.108.157
11:09:19:WU01:FS00:Connecting to assign-GPU.stanford.edu:80
11:09:19:WU00:FS00:Connecting to 171.67.108.157:8080
11:09:20:WU01:FS00:News:
11:09:20:WU01:FS00:Assigned to work server 171.67.108.157
11:09:20:WU01:FS00:Requesting new work unit for slot 00: READY gpu:1:"GP104 [GeForce GTX 1070]" from 171.67.108.157
11:09:20:WU01:FS00:Connecting to 171.67.108.157:8080
11:09:22:WU01:FS00:Downloading 5.16MiB
11:09:26:WU00:FS00:Upload 4.79%
11:09:28:WU01:FS00:Download 21.82%
11:09:34:WU01:FS00:Download 40.01%
11:09:40:WU01:FS00:Download 54.56%
11:09:47:WU01:FS00:Download 56.98%
11:15:43:WU00:FS00:Upload 5.59%
11:15:43:WARNING:WU00:FS00:Exception: Failed to send results to work server: Transfer failed
11:15:43:WU00:FS00:Trying to send results to collection server
11:15:43:WU00:FS00:Uploading 7.83MiB to 171.67.108.46
11:15:43:WU00:FS00:Connecting to 171.67.108.46:8080
11:15:51:WU00:FS00:Upload 5.59%
11:16:01:WU00:FS00:Upload 8.78%
11:16:07:WU00:FS00:Upload 13.57%
11:16:13:WU00:FS00:Upload 17.56%
11:16:19:WU00:FS00:Upload 22.34%
11:16:26:WU00:FS00:Upload 27.13%
11:16:32:WU00:FS00:Upload 31.92%
11:16:38:WU00:FS00:Upload 35.91%
11:16:44:WU00:FS00:Upload 39.90%
11:16:50:WU00:FS00:Upload 44.69%
11:16:56:WU00:FS00:Upload 48.68%
11:17:02:WU00:FS00:Upload 53.47%
11:17:09:WU00:FS00:Upload 58.25%
11:17:15:WU00:FS00:Upload 62.24%
11:17:21:WU00:FS00:Upload 66.23%
11:17:27:WU00:FS00:Upload 71.02%
11:17:33:WU00:FS00:Upload 75.81%
11:17:39:WU00:FS00:Upload 79.80%
11:17:45:WU00:FS00:Upload 84.59%
11:17:51:WU00:FS00:Upload 88.58%
11:17:57:WU00:FS00:Upload 93.37%
11:18:03:WU00:FS00:Upload 98.15%
11:18:07:WU00:FS00:Upload complete
11:18:07:WU00:FS00:Server responded WORK_QUIT (404)
11:18:07:WARNING:WU00:FS00:Server did not like results, dumping
11:18:07:WU00:FS00:Cleaning up


Is "11:15:43:WARNING:WU00:FS00:Exception: Failed to send results to work server: Transfer failed" the cause? What causes this?
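One thing worth checking in a log like this is where a transfer simply stops making progress. Above, the upload to 171.67.108.157 sits at 4.79% from 11:09:26 until 11:15:43 before failing, and the download for WU01 never gets past 56.98%. A minimal Python sketch that reports both cases follows. It assumes the same log layout as the excerpts here; the function name and the 60-second threshold are arbitrary choices for illustration.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Assumed line layout: "HH:MM:SS:[WARNING:]WUxx:FSyy:message"
LINE_RE = re.compile(r"^(\d{2}):(\d{2}):(\d{2}):(?:WARNING:)?(WU\d+):(FS\d+):(.*)$")
PROGRESS_RE = re.compile(r"^(Upload|Download) ([\d.]+)%$")


def find_stalls(lines: Iterable[str], max_gap: int = 60):
    """Return (stalls, unfinished).

    stalls:     (slot, direction, percent, gap_s) for progress lines logged more
                than max_gap seconds after the previous one for that transfer.
    unfinished: (slot, direction, last_percent) for transfers that never logged
                'Upload complete' / 'Download complete'.
    """
    last: Dict[Tuple[str, str, str], Tuple[int, str]] = {}
    stalls: List[Tuple[str, str, str, int]] = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        h, mi, s, wu, fs, msg = m.groups()
        now = int(h) * 3600 + int(mi) * 60 + int(s)
        p = PROGRESS_RE.match(msg)
        if p:
            direction, pct = p.groups()
            key = (wu, fs, direction)
            if key in last:
                gap = (now - last[key][0]) % 86400  # tolerate midnight rollover
                if gap > max_gap:
                    stalls.append((f"{wu}:{fs}", direction, pct, gap))
            last[key] = (now, pct)
        elif msg.startswith(("Uploading ", "Downloading ", "Upload complete", "Download complete")):
            # A new attempt (e.g. the retry to the collection server) or a
            # finished transfer resets tracking for this slot and direction.
            direction = "Upload" if msg.startswith("Upload") else "Download"
            last.pop((wu, fs, direction), None)
    unfinished = [(f"{wu}:{fs}", d, pct) for (wu, fs, d), (_, pct) in last.items()]
    return stalls, unfinished
```

This only shows where the connection stalled; it cannot say why, so it complements rather than replaces the client-restart test suggested in this thread.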
werty316
 
Posts: 145
Joined: Tue Feb 19, 2008 6:29 pm

Re: Rejection due to transmission failure?

Postby bruce » Sun Feb 12, 2017 3:12 am

There is a known bug in FAHClient which permanently (until it is restarted) disables one of the interfaces it uses to upload/download WUs if certain types of internet errors happen at the "wrong time." Does restarting FAHClient re-establish a fully functioning internet connection for FAH?
bruce

Re: Rejection due to transmission failure?

Postby AJMSmith » Mon Feb 13, 2017 1:15 am

I've not restarted, but units are getting sent, so it can't be that problem. The rejected units (and I have had more than one recently) are all rejected after a failed send.
AJMSmith

Re: Rejection due to transmission failure?

Postby bruce » Mon Feb 13, 2017 3:03 am

FAHClient uses at least two upload connections. If one has been hung (for a long time) and the other is working, wouldn't you see exactly the symptoms you're describing? Humor me and restart the client.
bruce

Re: Rejection due to transmission failure?

Postby AJMSmith » Tue Feb 14, 2017 4:03 am

I have restarted, and still had a rejection after a failed upload. It's odd, though, that only the GPU units seem to have the problem.
AJMSmith

Re: Rejection due to transmission failure?

Postby bruce » Tue Feb 14, 2017 5:54 am

Post the segment of your log showing the rejection.
bruce

Re: Rejection due to transmission failure?

Postby AJMSmith » Wed Feb 15, 2017 1:38 am

The scenario is essentially the same ... transmission failure followed by rejection on send to alternate.

02:45:31:WU02:FS01:0x21:Completed 2500000 out of 2500000 steps (100%)
02:45:33:WU02:FS01:0x21:Saving result file logfile_01.txt
02:45:33:WU02:FS01:0x21:Saving result file checkpointState.xml
02:45:33:WU02:FS01:0x21:Saving result file checkpt.crc
02:45:33:WU02:FS01:0x21:Saving result file log.txt
02:45:33:WU02:FS01:0x21:Saving result file positions.xtc
02:45:33:WU02:FS01:0x21:Folding@home Core Shutdown: FINISHED_UNIT
02:45:34:WU02:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
02:45:34:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:9177 run:0 clone:6 gen:346 core:0x21 unit:0x0000020aab436c6957b24c29241bc372
02:45:34:WU02:FS01:Uploading 12.16MiB to 171.67.108.105
02:45:34:WU02:FS01:Connecting to 171.67.108.105:8080
02:45:40:WU02:FS01:Upload 4.63%
02:46:06:WU02:FS01:Upload 5.14%
02:46:06:WARNING:WU02:FS01:Exception: Failed to send results to work server: Transfer failed
02:46:06:WU02:FS01:Trying to send results to collection server
02:46:06:WU02:FS01:Uploading 12.16MiB to 171.67.108.46
02:46:06:WU02:FS01:Connecting to 171.67.108.46:8080
02:46:12:WU02:FS01:Upload 4.11%
02:46:18:WU02:FS01:Upload 9.25%
02:46:24:WU02:FS01:Upload 14.40%
02:46:30:WU02:FS01:Upload 19.02%
02:46:36:WU02:FS01:Upload 23.65%
02:46:42:WU02:FS01:Upload 28.28%
02:46:48:WU02:FS01:Upload 33.93%
02:46:54:WU02:FS01:Upload 38.56%
02:47:00:WU02:FS01:Upload 43.19%
02:47:06:WU02:FS01:Upload 47.81%
02:47:12:WU02:FS01:Upload 51.93%
02:47:18:WU02:FS01:Upload 56.56%
02:47:24:WU02:FS01:Upload 61.70%
02:47:30:WU02:FS01:Upload 66.84%
02:47:36:WU02:FS01:Upload 71.98%
02:47:42:WU02:FS01:Upload 77.12%
02:47:48:WU02:FS01:Upload 81.75%
02:47:54:WU02:FS01:Upload 86.89%
02:48:00:WU02:FS01:Upload 92.03%
02:48:06:WU02:FS01:Upload 97.17%
02:48:11:WU02:FS01:Upload complete
02:48:11:WU02:FS01:Server responded WORK_QUIT (404)
02:48:11:WARNING:WU02:FS01:Server did not like results, dumping
02:48:11:WU02:FS01:Cleaning up
AJMSmith

Re: Rejection due to transmission failure?

Postby bruce » Wed Feb 15, 2017 8:36 am

Make a copy of the work folder for a WU AFTER the FahCore reports FINISHED_UNIT (work\02\wuresults_01.dat in that case). You may need to temporarily disable your router to preserve that file.
Re-establish your internet connection and let the WU try to upload. If you get the same result as shown above, send me the results file. (Contact me first.)
If the WU uploads successfully, move on until you capture one that fails the way that one did.
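The copy has to be made after the core finishes but before the client uploads and cleans up the WU, which is why cutting the connection first helps. Once the client can't upload, a small Python helper like the one below can make the copy. This is a sketch under assumptions: the location of the work folder depends on your installation, and the function name and backup naming scheme are made up for illustration.

```python
import shutil
import time
from pathlib import Path


def backup_results(work_dir: Path, wu_id: str, backup_dir: Path) -> Path:
    """Copy <work_dir>/<wu_id>/wuresults_01.dat into backup_dir.

    The copy is tagged with the WU folder id and a timestamp so that later
    captures don't overwrite earlier ones.
    """
    src = Path(work_dir) / wu_id / "wuresults_01.dat"
    if not src.is_file():
        raise FileNotFoundError(f"no results file yet: {src}")
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dst = backup_dir / f"wu{wu_id}-{stamp}-wuresults_01.dat"
    shutil.copy2(src, dst)  # copy2 also preserves the file's modification time
    return dst
```

For the case above that would be something like `backup_results(Path("work"), "02", Path("captured"))`, run from the client's data folder while the network is down.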

Is this still an accurate description of your configuration?
bruce

Re: Rejection due to transmission failure?

Postby soa_rru » Fri Mar 03, 2017 7:36 pm

This is exactly what I'm experiencing again at the moment (I had this problem back in November).

Looking at AJMSmith's logs, it's the same server that's dumping mine as well: 171.67.108.46.

Lost a shedload of points over the last week because of this :(
soa_rru
 
Posts: 44
Joined: Sun Dec 16, 2007 11:21 am
Location: UK

Re: Rejection due to transmission failure?

Postby jay-mitchell » Sat Sep 30, 2017 1:12 pm

This has been happening to me on a few of my machines, so I've tried to reproduce it. First, on a hunch that this is not even related to the connection interruptions, I blocked the work server with iptables so that FAHClient would upload the result to the collection server, which responded WORK_QUIT. It seems to me that collection server 171.67.108.46 is rejecting most, if not all, results from WUs assigned by work server 171.67.108.157. I've managed to collect a wuresults_01.dat by bringing my network interface down until after the WU finished, so I could make a copy. I have uploaded it at:

https://www.dropbox.com/s/vufsr5j5qorenab/wuresults_01.dat?dl=0

Here are the relevant log entries:

12:41:09:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:9414 run:1627 clone:0 gen:91 core:0x21 unit:0x0000006dab436c9d585e069c073aef0e
12:41:09:WU01:FS00:Uploading 7.77MiB to 171.67.108.157
12:41:09:WU01:FS00:Connecting to 171.67.108.157:8080
12:41:10:WARNING:WU01:FS00:WorkServer connection failed on port 8080 trying 80
12:41:10:WU01:FS00:Connecting to 171.67.108.157:80
12:41:11:WARNING:WU01:FS00:Exception: Failed to send results to work server: Failed to connect to 171.67.108.157:80: Connection refused
12:41:11:WU01:FS00:Trying to send results to collection server
12:41:11:WU01:FS00:Uploading 7.77MiB to 171.67.108.46
12:41:11:WU01:FS00:Connecting to 171.67.108.46:8080
12:41:17:WU01:FS00:Upload 21.72%
12:41:23:WU01:FS00:Upload 41.03%
12:41:29:WU01:FS00:Upload 85.27%
12:41:31:WU01:FS00:Upload complete
12:41:31:WU01:FS00:Server responded WORK_QUIT (404)
12:41:31:WARNING:WU01:FS00:Server did not like results, dumping
12:41:31:WU01:FS00:Cleaning up


I've only seen this happen to WUs from work servers 171.67.108.157 and 140.163.4.23*. I have seen WUs from work servers 171.67.108.160 and 171.64.65.84 always return successfully to their collection servers when the work server was offline or the connection was interrupted.
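This work server / collection server observation can be checked against your own logs by counting upload outcomes per server pair. A minimal Python sketch, assuming the log lines look like the excerpts in this thread; `tally_outcomes` and `Upload` are hypothetical names, not part of FAHClient.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

# Assumed line layout: "HH:MM:SS:[WARNING:]WUxx:FSyy:message"
LINE_RE = re.compile(r"^\d{2}:\d{2}:\d{2}:(?:WARNING:)?(WU\d+):(FS\d+):(.*)$")
UPLOAD_RE = re.compile(r"^Uploading \S+ to (\S+)$")
RESPONSE_RE = re.compile(r"^Server responded (\w+) \(\d+\)$")


@dataclass
class Upload:
    ws: Optional[str] = None  # work server tried first
    cs: Optional[str] = None  # collection server, if the client fell back
    fallback: bool = False


def tally_outcomes(lines: Iterable[str]) -> Counter:
    """Count result uploads keyed by (work server, collection server, response).

    The collection server is None when the work server answered directly.
    Only uploads whose 'Sending unit results' line is in the log are counted.
    """
    uploads: Dict[Tuple[str, str], Upload] = {}
    counts: Counter = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        wu, fs, msg = m.groups()
        key = (wu, fs)
        if msg.startswith("Sending unit results"):
            uploads[key] = Upload()  # slot ids are reused, so start fresh
            continue
        up = uploads.get(key)
        if up is None:
            continue  # not inside a result upload we saw begin
        if msg.startswith("Trying to send results to collection server"):
            up.fallback = True
        elif (u := UPLOAD_RE.match(msg)):
            if up.fallback:
                up.cs = u.group(1)
            else:
                up.ws = u.group(1)
        elif (r := RESPONSE_RE.match(msg)):
            counts[(up.ws, up.cs, r.group(1))] += 1
            del uploads[key]
    return counts
```

With an open log file `f`, `tally_outcomes(f)` would show at a glance whether WORK_QUIT clusters on one (work server, collection server) pair.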

If you would like any other wuresults_01.dat files captured, or even a Wireshark capture, just let me know.

In the event that any of the affected work servers went down for an extended period of time, it could result in a huge amount of lost work.

I have also posted this on the relevant bug ticket.
https://github.com/FoldingAtHome/fah-issues/issues/1182
jay-mitchell
 
Posts: 2
Joined: Fri Sep 29, 2017 9:07 pm

Re: Rejection due to transmission failure?

Postby jay-mitchell » Wed Oct 04, 2017 11:52 pm

Updating the CS appears to have solved this issue, as I am no longer able to reproduce it. Below is a log of a WU from WS 171.67.108.157 being accepted by CS 171.67.108.46:

23:44:16:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:9414 run:2024 clone:0 gen:247 core:0x21 unit:0x00000121ab436c9d585e069f681e8622
23:44:16:WU00:FS00:Uploading 7.75MiB to 171.67.108.157
23:44:16:WU00:FS00:Connecting to 171.67.108.157:8080
23:44:17:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
23:44:17:WU00:FS00:Connecting to 171.67.108.157:80
23:44:19:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 171.67.108.157:80: Connection refused
23:44:19:WU00:FS00:Trying to send results to collection server
23:44:19:WU00:FS00:Uploading 7.75MiB to 171.67.108.46
23:44:19:WU00:FS00:Connecting to 171.67.108.46:8080
23:44:25:WU00:FS00:Upload 4.84%
23:44:31:WU00:FS00:Upload 23.39%
23:44:37:WU00:FS00:Upload 83.88%
23:44:39:WU00:FS00:Upload complete
23:44:39:WU00:FS00:Server responded WORK_ACK (400)
23:44:39:WU00:FS00:Final credit estimate, 35037.00 points
23:44:39:WU00:FS00:Cleaning up
jay-mitchell

