Work fails to download more WU after finishing uploading.

Moderators: Site Moderators, FAHC Science Team

Sandman192
Posts: 62
Joined: Fri Mar 07, 2008 12:40 am
Hardware configuration: SW: Windows 10 Professional 64-bit - Video Drivers v516.01
HW: ASUS - 1 EVGA GeForce 1080Ti 11GB RAM - 1 GeForce EVGA 980 4GB RAM - CPU i7-5930K 3.5GHz boost to 4GHz, 32GB of RAM

FAH v7.6.21

Work fails to download more WU after finishing uploading.

Post by Sandman192 »

The problem goes away after every restart of my computer.

I have not seen this problem on older versions of F@H (F@H v7.6.13).
Nothing has downloaded for 2 days, with no work running. Sometimes it happens for CPU work, sometimes for GPU work. This has happened on 3 of my computers, 1 of which I've stopped using altogether.
There are over a hundred of these in a row, all saying the same thing: "10053: An established connection was aborted by the software in your host machine."

Code:

10:03:23:ERROR:Receive error: 10053: An established connection was aborted by the software in your host machine.
10:03:48:ERROR:Receive error: 10053: An established connection was aborted by the software in your host machine.
10:04:13:ERROR:Receive error: 10053: An established connection was aborted by the software in your host machine.
10:04:15:WU01:FS01:0x22:Watchdog shutdown failed, hard shutdown triggered
10:04:38:WARNING:WU01:FS01:FahCore returned an unknown error code which probably indicates that it crashed
10:04:38:WARNING:WU01:FS01:FahCore returned: WU_STALLED (127 = 0x7f)

Code:

********************************************************************************
10:04:40:WU01:FS01:0x22:Project: 16918 (Run 112, Clone 12, Gen 13)
10:04:40:WU01:FS01:0x22:Unit: 0x000000160002894c5f17618a4e2d8fe9
10:04:40:WU01:FS01:0x22:Digital signatures verified
10:04:40:WU01:FS01:0x22:Folding@home GPU Core22 Folding@home Core
10:04:40:WU01:FS01:0x22:Version 0.0.11
10:04:40:WU01:FS01:0x22:  Checkpoint write interval: 100000 steps (2%) [50 total]
10:04:40:WU01:FS01:0x22:  JSON viewer frame write interval: 50000 steps (1%) [100 total]
10:04:40:WU01:FS01:0x22:  XTC frame write interval: 250000 steps (5%) [20 total]
10:04:40:WU01:FS01:0x22:  Global context and integrator variables write interval: disabled
10:05:15:ERROR:Send error: 10060: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
10:14:38:WU00:FS00:0xa7:Completed 195000 out of 250000 steps (78%)
10:29:07:WU00:FS00:0xa7:Completed 197500 out of 250000 steps (79%)
10:44:03:WU00:FS00:0xa7:Completed 200000 out of 250000 steps (80%)
10:58:37:WU00:FS00:0xa7:Completed 202500 out of 250000 steps (81%)
11:06:23:ERROR:Receive error: 10053: An established connection was aborted by the software in your host machine.
11:06:50:ERROR:Receive error: 10053: An established connection was aborted by the software in your host machine.
11:07:49:ERROR:Send error: 10060: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
11:13:33:WU00:FS00:0xa7:Completed 205000 out of 250000 steps (82%)
11:27:47:WU00:FS00:0xa7:Completed 207500 out of 250000 steps (83%)
11:41:51:WU00:FS00:0xa7:Completed 210000 out of 250000 steps (84%)
11:55:56:WU00:FS00:0xa7:Completed 212500 out of 250000 steps (85%)
12:10:08:WU00:FS00:0xa7:Completed 215000 out of 250000 steps (86%)
12:24:21:WU00:FS00:0xa7:Completed 217500 out of 250000 steps (87%)
12:38:26:WU00:FS00:0xa7:Completed 220000 out of 250000 steps (88%)
12:52:38:WU00:FS00:0xa7:Completed 222500 out of 250000 steps (89%)
13:07:43:WU00:FS00:0xa7:Completed 225000 out of 250000 steps (90%)
13:23:11:WU00:FS00:0xa7:Completed 227500 out of 250000 steps (91%)
13:37:28:WU00:FS00:0xa7:Completed 230000 out of 250000 steps (92%)
******************************* Date: 2020-08-05 *******************************
13:51:38:WU00:FS00:0xa7:Completed 232500 out of 250000 steps (93%)
14:05:47:WU00:FS00:0xa7:Completed 235000 out of 250000 steps (94%)
14:19:55:WU00:FS00:0xa7:Completed 237500 out of 250000 steps (95%)
14:34:10:WU00:FS00:0xa7:Completed 240000 out of 250000 steps (96%)
14:48:23:WU00:FS00:0xa7:Completed 242500 out of 250000 steps (97%)
15:02:42:WU00:FS00:0xa7:Completed 245000 out of 250000 steps (98%)
15:17:00:WU00:FS00:0xa7:Completed 247500 out of 250000 steps (99%)
15:17:00:WU02:FS00:Connecting to assign1.foldingathome.org:80
15:17:01:WU02:FS00:Assigned to work server 150.136.14.110
15:17:01:WU02:FS00:Requesting new work unit for slot 00: RUNNING cpu:3 from 150.136.14.110
15:17:01:WU02:FS00:Connecting to 150.136.14.110:8080
15:17:01:WU02:FS00:Downloading 2.34MiB
15:31:22:WU00:FS00:0xa7:Completed 250000 out of 250000 steps (100%)
15:31:29:WU00:FS00:0xa7:Saving result file ..\logfile_01.txt
15:31:29:WU00:FS00:0xa7:Saving result file dhdl.xvg
15:31:29:WU00:FS00:0xa7:Saving result file frame318.trr
15:31:29:WU00:FS00:0xa7:Saving result file md.log
15:31:29:WU00:FS00:0xa7:Saving result file pullf.xvg
15:31:29:WU00:FS00:0xa7:Saving result file pullx.xvg
15:31:29:WU00:FS00:0xa7:Saving result file science.log
15:31:29:WU00:FS00:0xa7:Saving result file traj_comp.xtc
15:31:29:WU00:FS00:0xa7:Folding@home Core Shutdown: FINISHED_UNIT
15:31:30:WU00:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
15:31:30:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:14379 run:2253 clone:1 gen:318 core:0xa7 unit:0x00000164455e42075e932f852c3167a7
15:31:30:WU00:FS00:Uploading 6.47MiB to 69.94.66.7
15:31:30:WU00:FS00:Connecting to 69.94.66.7:8080
15:31:36:WU00:FS00:Upload 5.79%
15:31:42:WU00:FS00:Upload 11.59%
15:31:49:WU00:FS00:Upload 17.38%
15:31:58:WU00:FS00:Upload 23.17%
15:32:04:WU00:FS00:Upload 27.04%
15:32:10:WU00:FS00:Upload 32.83%
15:32:16:WU00:FS00:Upload 37.66%
15:32:22:WU00:FS00:Upload 41.52%
15:32:30:WU00:FS00:Upload 44.42%
15:32:36:WU00:FS00:Upload 47.31%
15:32:42:WU00:FS00:Upload 53.11%
15:32:48:WU00:FS00:Upload 57.94%
15:32:54:WU00:FS00:Upload 63.73%
15:33:00:WU00:FS00:Upload 69.52%
15:33:06:WU00:FS00:Upload 75.32%
15:33:12:WU00:FS00:Upload 81.11%
15:33:18:WU00:FS00:Upload 87.87%
15:33:24:WU00:FS00:Upload 94.63%
15:33:30:WU00:FS00:Upload complete
15:33:30:WU00:FS00:Server responded WORK_ACK (400)
15:33:30:WU00:FS00:Final credit estimate, 1252.00 points
15:33:30:WU00:FS00:Cleaning up
******************************* Date: 2020-08-07 *******************************
10:36:57:ERROR:Send error: 10054: An existing connection was forcibly closed by the remote host.
Last edited by Joe_H on Fri Aug 07, 2020 1:49 pm, edited 2 times in total.
Reason: change Quote tags to Code for log
Joe_H
Site Admin
Posts: 7878
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Work fails to download more WU after finishing uploading

Post by Joe_H »

The 10053 and 10054 network error messages are local and not connected with downloading a new WU. From prior experience those are reporting the local network connections to Web Control and FAHViewer being closed.
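
For anyone wanting to see how often each of these errors appears in a long log, here is a minimal sketch in Python. The code names are the standard Winsock ones; the function name `tally_socket_errors` is made up for illustration, and the pattern assumes the ERROR lines look like the ones posted above.

```python
import re
from collections import Counter

# Standard Winsock names for the codes seen in the log above.
WINSOCK_CODES = {
    10053: "WSAECONNABORTED",  # connection aborted by software on the local host
    10054: "WSAECONNRESET",    # connection forcibly closed by the remote host
    10060: "WSAETIMEDOUT",     # connection attempt or established connection timed out
}

# Matches lines such as "10:03:23:ERROR:Receive error: 10053: ..."
ERROR_LINE = re.compile(r"ERROR:(Receive|Send) error: (\d+):")


def tally_socket_errors(lines):
    """Count (direction, code) pairs among the ERROR lines of a log."""
    counts = Counter()
    for line in lines:
        match = ERROR_LINE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts
```

Calling `tally_socket_errors(open("log.txt"))` on a saved copy of the log would give a quick picture of whether the receive-side 10053 errors or the send-side 10060 timeouts dominate. The file name here is only a placeholder.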

Things to check would be changes to the firewall and anti-malware settings, especially if updates have been applied recently, for instance by Windows Update.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Sandman192
Posts: 62
Joined: Fri Mar 07, 2008 12:40 am
Hardware configuration: SW: Windows 10 Professional 64-bit - Video Drivers v516.01
HW: ASUS - 1 EVGA GeForce 1080Ti 11GB RAM - 1 GeForce EVGA 980 4GB RAM - CPU i7-5930K 3.5GHz boost to 4GHz, 32GB of RAM

FAH v7.6.21

Re: Work fails to download more WU after finishing uploading

Post by Sandman192 »

I said this never happened with older versions of F@H. And the problem always fixes itself after every reboot.
There is no anti-malware; the firewall and anti-virus are the ones from Windows. If they were the cause, then restarting my computer would not allow more downloading.
If it were from a Windows update, then you would be having it too on your Windows machine. Again, it started when I updated to the newest version of F@H.
Joe_H
Site Admin
Posts: 7878
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Work fails to download more WU after finishing uploading

Post by Joe_H »

This has happened to others in the past, both for older versions and the current version of the F@h client. You have also only provided a bare minimum of log information, not a single instance of a WU request failing.

As for updating, you may have to re-identify the FAHClient executable as an exception, since that executable would have changed when you updated. The Windows antivirus counts as anti-malware, and so does the Windows firewall. According to other Windows users, communication done by FAHClient does need to be in the "Private" zone.
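
As a rough first check of whether outbound connections get through at all, something like the sketch below can be run from the affected machine. It is only an illustration: `can_connect` is a made-up helper, and because it runs as a Python process rather than as FAHClient, a firewall rule tied to the FAHClient executable could still behave differently.

```python
import socket


def can_connect(host, port, timeout=10.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, `can_connect("assign1.foldingathome.org", 80)` tries the assignment server and port shown in the log above. A result of True only shows that the TCP handshake completed, not that a download would finish.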

So, post the first 100-200 lines of your current log to show the system, hardware and client setup. And post a section showing an actual WU request failing and we can look at it further.
Neil-B
Posts: 2027
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: Work fails to download more WU after finishing uploading

Post by Neil-B »

@Joe_H ... Fairly sure the 2nd code window of OP shows this ... if you see at the end it looks like it might be one of those "established connection" not doing anything errors ... It connects and "starts" download after which nothing happens.

15:17:01:WU02:FS00:Connecting to 150.136.14.110:8080
15:17:01:WU02:FS00:Downloading 2.34MiB
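
A pattern like the one in those two lines can be searched for automatically. The sketch below is an assumption-laden illustration, not part of the client: `find_stalled_downloads` is a made-up name, and it assumes a healthy download is followed by "Download NN.NN%" or "Download complete" lines for the same WU.

```python
import re

# "15:17:01:WU02:FS00:Downloading 2.34MiB" -> time "15:17:01", unit "WU02"
STARTED = re.compile(r"^(\d\d:\d\d:\d\d):(WU\d+):FS\d+:Downloading ")
# Assumed follow-up lines: "...:WU02:FS00:Download 27.65%" or "...:Download complete"
PROGRESS = re.compile(r"^\d\d:\d\d:\d\d:(WU\d+):FS\d+:Download (complete|[\d.]+%)")


def find_stalled_downloads(lines):
    """Return (time, wu) for each download that logged a start but no progress."""
    pending = {}
    for line in lines:
        started = STARTED.match(line)
        if started:
            pending[started.group(2)] = started.group(1)
            continue
        progress = PROGRESS.match(line)
        if progress:
            # Any progress line clears the unit from the pending set.
            pending.pop(progress.group(1), None)
    return [(time, wu) for wu, time in pending.items()]
```

This only catches downloads that never report any progress, which is the case in the OP's log. A download that stalls partway through would need a comparison of timestamps as well.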

@Sandman192 ... I think what may be happening is that the download connection is hanging for some reason and the client is not spotting or clearing it. If I am right, and you are using Windows, you don't need to restart the client; you can just drop the established connection. Next time this happens, try using TCPView to close the hanging connection. I'm fairly sure someone posted a Linux equivalent tool a while back.
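
To find the hanging connection before closing it, the output of `netstat -ano` on Windows can be filtered for established connections to the work server port. This is a minimal sketch: `established_to_port` is a hypothetical helper, and it only lists candidates. Closing the connection would still be done in TCPView.

```python
def established_to_port(netstat_text, port=8080):
    """Return (local, remote, pid) for ESTABLISHED TCP rows whose remote port matches."""
    rows = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected columns: Proto, Local Address, Foreign Address, State, PID
        if len(fields) != 5 or fields[0] != "TCP" or fields[3] != "ESTABLISHED":
            continue
        local, remote, pid = fields[1], fields[2], fields[4]
        # Compare only the port part after the last colon of the remote address.
        if remote.rsplit(":", 1)[-1] == str(port):
            rows.append((local, remote, int(pid)))
    return rows
```

The text could come from `subprocess.run(["netstat", "-ano"], capture_output=True, text=True).stdout`. The PID in each row can then be matched against FAHClient in Task Manager, and port 8080 is the work server port shown in the log above.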
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070

(Green/Bold = Active)
Mxyzptlk
Posts: 73
Joined: Wed Apr 08, 2020 8:55 pm
Hardware configuration: Lots... Look at my website: www.mxyzptlk.us
Location: California

Re: Work fails to download more WU after finishing uploading

Post by Mxyzptlk »

I have seen the same exact issue on two of my computers.
I fold..... look at my folding setups here: https://mxyzptlk.us/about/
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Work fails to download more WU after finishing uploading

Post by bruce »

What is the reported value for CWD near the top of FAH's log? Are you using the shortcut provided at install time?
Please see below about posting FAH's log.
Sandman192
Posts: 62
Joined: Fri Mar 07, 2008 12:40 am
Hardware configuration: SW: Windows 10 Professional 64-bit - Video Drivers v516.01
HW: ASUS - 1 EVGA GeForce 1080Ti 11GB RAM - 1 GeForce EVGA 980 4GB RAM - CPU i7-5930K 3.5GHz boost to 4GHz, 32GB of RAM

FAH v7.6.21

Re: Work fails to download more WU after finishing uploading

Post by Sandman192 »

Joe_H wrote: And post a section showing an actual WU request failing and we can look at it further.
I have. It's in the first 3 lines of the first quote and the last line of the second quote.
Neil-B wrote: @Joe_H ... Fairly sure the 2nd code window of OP shows this ...
Thank you for understanding.

I'm using WiFi. I found out that, for some reason, my "gaming" router, which is supposed to be good at giving out a strong WiFi signal, seems to perform poorly even with a strong signal (it's 20 ft or so from my router). I'm glad I still have my modem/router; I connected to it and it's working fine now.

As for restarting my computer, which restarts my WiFi: that only lasts long enough to download new work, and the trouble comes back after a day even though it's still connected and has a good signal.
Could it be F@H not liking certain WiFi routers??? That seems very weird and odd, and it would be the first hardware-to-software bug I've ever seen.
Mxyzptlk wrote: I have seen the same exact issue on two of my computers.
Are you using WiFi? And if so, what router are you using?
Mine's a Netgear XR500 and it's up to date: V2.3.2.56.
Joe_H
Site Admin
Posts: 7878
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Work fails to download more WU after finishing uploading

Post by Joe_H »

Sandman192 wrote: Could it be F@H not liking certain WiFi routers??
Not so much certain WiFi routers as not liking network connections that are not reliable and stable. If the connection drops packets, especially ACK packets, the connection can stall with each side waiting on the other. The code to detect that kind of stalled connection has become better over the last few versions, but does not always catch the condition and do a retry.
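
To illustrate the general idea of stall detection with retry, here is a minimal sketch in Python. It is not how FAHClient implements it: `read_exact`, `fetch_with_retry` and the `connect` callback are names invented for this example, and the timeout values are arbitrary.

```python
import socket


def read_exact(sock, nbytes, stall_timeout=5.0):
    """Read nbytes from sock; a recv that waits longer than stall_timeout raises."""
    sock.settimeout(stall_timeout)
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = sock.recv(min(65536, remaining))  # raises socket.timeout on a stall
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def fetch_with_retry(connect, nbytes, attempts=3, stall_timeout=5.0):
    """Open a fresh connection per attempt and retry when a read stalls or breaks."""
    last_error = None
    for _ in range(attempts):
        sock = connect()  # hypothetical callback returning a connected socket
        try:
            return read_exact(sock, nbytes, stall_timeout)
        except (socket.timeout, ConnectionError) as exc:
            last_error = exc
        finally:
            sock.close()  # drop the stalled connection instead of waiting forever
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error
```

The point of the per-read timeout is that a connection which is still "established" but carries no data gets dropped and reopened, rather than sitting idle with each side waiting on the other.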

With a "gaming" router, its default network settings may prioritize sending and receiving packets used by games. Whether that is at the expense of the TCP packets carrying the HTTP data might take digging down into the documentation or hooking up a network analyzer.