[SOLVED] 3.21.157.11 assigned/connects/no download/hangs

Moderators: Site Moderators, FAHC Science Team

Neil-B
Posts: 2027
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Post by Neil-B »

Just noticed that both my slots (on System 1) completed their WUs successfully but were then assigned to this server to download the next WU … both followed the same sequence:

Code:

21:14:22:WU00:FS02:Connecting to assign1.foldingathome.org:80
21:14:23:WU00:FS02:Assigned to work server 3.21.157.11
21:14:23:WU00:FS02:Requesting new work unit for slot 02: RUNNING cpu:32 from 3.21.157.11
21:14:23:WU00:FS02:Connecting to 3.21.157.11:8080
and

Code:

23:16:31:WU01:FS00:Connecting to assign1.foldingathome.org:80
23:16:31:WU01:FS00:Assigned to work server 3.21.157.11
23:16:31:WU01:FS00:Requesting new work unit for slot 00: RUNNING cpu:24 from 3.21.157.11
23:16:31:WU01:FS00:Connecting to 3.21.157.11:8080
Neither slot then progressed any further … no retry timer … no subsequent attempts to reconnect to AS or WS.

Log Header: (log started a couple of days ago - no configuration changes - "bullet proof stable")

Code:

*********************** Log Started 2020-05-10T19:00:02Z ***********************
19:00:02:Trying to access database...
19:00:02:Successfully acquired database lock
19:00:02:Downloading GPUs.txt from assign1.foldingathome.org:80
19:00:02:Connecting to assign1.foldingathome.org:80
19:00:02:Read GPUs.txt
19:00:03:Enabled folding slot 00: READY cpu:24
19:00:03:Enabled folding slot 02: READY cpu:32
19:00:03:****************************** FAHClient ******************************
19:00:03:        Version: 7.6.13
19:00:03:         Author: Joseph Coffland <joseph@cauldrondevelopment.com>
19:00:03:      Copyright: 2020 foldingathome.org
19:00:03:       Homepage: https://foldingathome.org/
19:00:03:           Date: Apr 27 2020
19:00:03:           Time: 21:21:01
19:00:03:       Revision: 5a652817f46116b6e135503af97f18e094414e3b
19:00:03:         Branch: master
19:00:03:       Compiler: Visual C++ 2008
19:00:03:        Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
19:00:03:       Platform: win32 10
19:00:03:           Bits: 32
19:00:03:           Mode: Release
19:00:03:           Args: --open-web-control
19:00:03:         Config: C:\Users\OpDoubleHelix\AppData\Roaming\FAHClient\config.xml
19:00:03:******************************** CBang ********************************
19:00:03:           Date: Apr 24 2020
19:00:03:           Time: 17:07:55
19:00:03:       Revision: ea081a3b3b0f4a37c4d0440b4f1bc184197c7797
19:00:03:         Branch: master
19:00:03:       Compiler: Visual C++ 2008
19:00:03:        Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
19:00:03:       Platform: win32 10
19:00:03:           Bits: 32
19:00:03:           Mode: Release
19:00:03:******************************* System ********************************
19:00:03:            CPU: Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz
19:00:03:         CPU ID: GenuineIntel Family 6 Model 63 Stepping 2
19:00:03:           CPUs: 32
19:00:03:         Memory: 511.75GiB
19:00:03:    Free Memory: 502.00GiB
19:00:03:        Threads: WINDOWS_THREADS
19:00:03:     OS Version: 6.2
19:00:03:    Has Battery: false
19:00:03:     On Battery: false
19:00:03:     UTC Offset: 1
19:00:03:            PID: 6936
19:00:03:            CWD: C:\Users\OpDoubleHelix\AppData\Roaming\FAHClient
19:00:03:  Win32 Service: false
19:00:03:             OS: Windows 10 Enterprise
19:00:03:        OS Arch: AMD64
19:00:03:           GPUs: 1
19:00:03:          GPU 0: Bus:3 Slot:0 Func:0 NVIDIA:3 GK107 [Quadro K420]
19:00:03:  CUDA Device 0: Platform:0 Device:0 Bus:3 Slot:0 Compute:3.0 Driver:10.2
19:00:03:OpenCL Device 0: Platform:0 Device:0 Bus:3 Slot:0 Compute:1.2 Driver:442.74
19:00:03:******************************* libFAH ********************************
19:00:03:           Date: Apr 15 2020
19:00:03:           Time: 14:53:14
19:00:03:       Revision: 216968bc7025029c841ed6e36e81a03a316890d3
19:00:03:         Branch: master
19:00:03:       Compiler: Visual C++ 2008
19:00:03:        Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
19:00:03:       Platform: win32 10
19:00:03:           Bits: 32
19:00:03:           Mode: Release
19:00:03:***********************************************************************
19:00:03:<config>
19:00:03:  <!-- Logging -->
19:00:03:  <log-rotate-max v='99'/>
19:00:03:
19:00:03:  <!-- Network -->
19:00:03:  <proxy v=':8080'/>
19:00:03:
19:00:03:  <!-- Slot Control -->
19:00:03:  <power v='full'/>
19:00:03:
19:00:03:  <!-- User Information -->
19:00:03:  <passkey v='*****'/>
19:00:03:  <team v='39363'/>
19:00:03:  <user v='Neck-W'/>
19:00:03:
19:00:03:  <!-- Folding Slots -->
19:00:03:  <slot id='0' type='CPU'>
19:00:03:    <client-type v='beta'/>
19:00:03:    <cpus v='24'/>
19:00:03:  </slot>
19:00:03:  <slot id='2' type='CPU'>
19:00:03:    <client-type v='beta'/>
19:00:03:    <cpus v='32'/>
19:00:03:  </slot>
19:00:03:</config>
Log Tail showing existing WUs complete and upload successfully but neither WU download commencing.

Code:

21:04:24:WU01:FS02:0xa7:Completed 237500 out of 250000 steps (95%)
21:05:46:WU02:FS00:0xa7:Completed 137500 out of 250000 steps (55%)
21:06:53:WU01:FS02:0xa7:Completed 240000 out of 250000 steps (96%)
21:08:41:WU02:FS00:0xa7:Completed 140000 out of 250000 steps (56%)
21:09:24:WU01:FS02:0xa7:Completed 242500 out of 250000 steps (97%)
21:11:37:WU02:FS00:0xa7:Completed 142500 out of 250000 steps (57%)
21:11:53:WU01:FS02:0xa7:Completed 245000 out of 250000 steps (98%)
21:14:21:WU01:FS02:0xa7:Completed 247500 out of 250000 steps (99%)
21:14:22:WU00:FS02:Connecting to assign1.foldingathome.org:80
21:14:23:WU00:FS02:Assigned to work server 3.21.157.11
21:14:23:WU00:FS02:Requesting new work unit for slot 02: RUNNING cpu:32 from 3.21.157.11
21:14:23:WU00:FS02:Connecting to 3.21.157.11:8080
21:14:31:WU02:FS00:0xa7:Completed 145000 out of 250000 steps (58%)
21:16:51:WU01:FS02:0xa7:Completed 250000 out of 250000 steps (100%)
21:16:58:WU01:FS02:0xa7:Saving result file ..\logfile_01.txt
21:16:58:WU01:FS02:0xa7:Saving result file frame17.trr
21:16:58:WU01:FS02:0xa7:Saving result file frame17.xtc
21:16:58:WU01:FS02:0xa7:Saving result file md.log
21:16:58:WU01:FS02:0xa7:Saving result file science.log
21:16:58:WU01:FS02:0xa7:Folding@home Core Shutdown: FINISHED_UNIT
21:16:59:WU01:FS02:FahCore returned: FINISHED_UNIT (100 = 0x64)
21:16:59:WU01:FS02:Sending unit results: id:01 state:SEND error:NO_ERROR project:16503 run:0 clone:86 gen:17 core:0xa7 unit:0x000000128f59f36f5eae56801d316da6
21:16:59:WU01:FS02:Uploading 6.99MiB to 143.89.243.111
21:16:59:WU01:FS02:Connecting to 143.89.243.111:8080
21:17:05:WU01:FS02:Upload 10.73%
21:17:11:WU01:FS02:Upload 16.09%
21:17:17:WU01:FS02:Upload 21.45%
21:17:23:WU01:FS02:Upload 27.71%
21:17:27:WU02:FS00:0xa7:Completed 147500 out of 250000 steps (59%)
21:17:29:WU01:FS02:Upload 33.97%
21:17:35:WU01:FS02:Upload 39.33%
21:17:41:WU01:FS02:Upload 44.70%
21:17:47:WU01:FS02:Upload 51.85%
21:17:53:WU01:FS02:Upload 56.32%
21:17:59:WU01:FS02:Upload 67.94%
21:18:05:WU01:FS02:Upload 76.88%
21:18:11:WU01:FS02:Upload 85.82%
21:18:15:WU01:FS02:Upload complete
21:18:15:WU01:FS02:Server responded WORK_ACK (400)
21:18:15:WU01:FS02:Final credit estimate, 27360.00 points
21:18:15:WU01:FS02:Cleaning up
21:20:26:WU02:FS00:0xa7:Completed 150000 out of 250000 steps (60%)
21:23:25:WU02:FS00:0xa7:Completed 152500 out of 250000 steps (61%)
21:26:26:WU02:FS00:0xa7:Completed 155000 out of 250000 steps (62%)
21:29:25:WU02:FS00:0xa7:Completed 157500 out of 250000 steps (63%)
21:32:23:WU02:FS00:0xa7:Completed 160000 out of 250000 steps (64%)
21:35:22:WU02:FS00:0xa7:Completed 162500 out of 250000 steps (65%)
21:38:20:WU02:FS00:0xa7:Completed 165000 out of 250000 steps (66%)
21:41:20:WU02:FS00:0xa7:Completed 167500 out of 250000 steps (67%)
21:44:18:WU02:FS00:0xa7:Completed 170000 out of 250000 steps (68%)
21:47:17:WU02:FS00:0xa7:Completed 172500 out of 250000 steps (69%)
21:50:15:WU02:FS00:0xa7:Completed 175000 out of 250000 steps (70%)
21:53:13:WU02:FS00:0xa7:Completed 177500 out of 250000 steps (71%)
21:56:13:WU02:FS00:0xa7:Completed 180000 out of 250000 steps (72%)
21:59:11:WU02:FS00:0xa7:Completed 182500 out of 250000 steps (73%)
22:02:10:WU02:FS00:0xa7:Completed 185000 out of 250000 steps (74%)
22:05:08:WU02:FS00:0xa7:Completed 187500 out of 250000 steps (75%)
22:08:06:WU02:FS00:0xa7:Completed 190000 out of 250000 steps (76%)
22:11:05:WU02:FS00:0xa7:Completed 192500 out of 250000 steps (77%)
22:14:03:WU02:FS00:0xa7:Completed 195000 out of 250000 steps (78%)
22:17:00:WU02:FS00:0xa7:Completed 197500 out of 250000 steps (79%)
22:19:58:WU02:FS00:0xa7:Completed 200000 out of 250000 steps (80%)
22:22:56:WU02:FS00:0xa7:Completed 202500 out of 250000 steps (81%)
22:25:55:WU02:FS00:0xa7:Completed 205000 out of 250000 steps (82%)
22:28:52:WU02:FS00:0xa7:Completed 207500 out of 250000 steps (83%)
22:31:49:WU02:FS00:0xa7:Completed 210000 out of 250000 steps (84%)
22:34:47:WU02:FS00:0xa7:Completed 212500 out of 250000 steps (85%)
22:37:45:WU02:FS00:0xa7:Completed 215000 out of 250000 steps (86%)
22:40:45:WU02:FS00:0xa7:Completed 217500 out of 250000 steps (87%)
22:43:43:WU02:FS00:0xa7:Completed 220000 out of 250000 steps (88%)
22:46:41:WU02:FS00:0xa7:Completed 222500 out of 250000 steps (89%)
22:49:40:WU02:FS00:0xa7:Completed 225000 out of 250000 steps (90%)
22:52:39:WU02:FS00:0xa7:Completed 227500 out of 250000 steps (91%)
22:55:39:WU02:FS00:0xa7:Completed 230000 out of 250000 steps (92%)
22:58:38:WU02:FS00:0xa7:Completed 232500 out of 250000 steps (93%)
23:01:37:WU02:FS00:0xa7:Completed 235000 out of 250000 steps (94%)
23:04:35:WU02:FS00:0xa7:Completed 237500 out of 250000 steps (95%)
23:07:34:WU02:FS00:0xa7:Completed 240000 out of 250000 steps (96%)
23:10:34:WU02:FS00:0xa7:Completed 242500 out of 250000 steps (97%)
23:13:32:WU02:FS00:0xa7:Completed 245000 out of 250000 steps (98%)
23:16:30:WU02:FS00:0xa7:Completed 247500 out of 250000 steps (99%)
23:16:31:WU01:FS00:Connecting to assign1.foldingathome.org:80
23:16:31:WU01:FS00:Assigned to work server 3.21.157.11
23:16:31:WU01:FS00:Requesting new work unit for slot 00: RUNNING cpu:24 from 3.21.157.11
23:16:31:WU01:FS00:Connecting to 3.21.157.11:8080
23:19:28:WU02:FS00:0xa7:Completed 250000 out of 250000 steps (100%)
23:19:35:WU02:FS00:0xa7:Saving result file ..\logfile_01.txt
23:19:35:WU02:FS00:0xa7:Saving result file frame19.trr
23:19:35:WU02:FS00:0xa7:Saving result file frame19.xtc
23:19:35:WU02:FS00:0xa7:Saving result file md.log
23:19:35:WU02:FS00:0xa7:Saving result file science.log
23:19:35:WU02:FS00:0xa7:Folding@home Core Shutdown: FINISHED_UNIT
23:19:35:WU02:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
23:19:35:WU02:FS00:Sending unit results: id:02 state:SEND error:NO_ERROR project:16507 run:0 clone:107 gen:19 core:0xa7 unit:0x000000148f59f36f5eae4e5d57bdae73
23:19:35:WU02:FS00:Uploading 6.98MiB to 143.89.243.111
23:19:35:WU02:FS00:Connecting to 143.89.243.111:8080
23:19:41:WU02:FS00:Upload 13.42%
23:19:47:WU02:FS00:Upload 22.37%
23:19:53:WU02:FS00:Upload 29.53%
23:19:59:WU02:FS00:Upload 35.79%
23:20:05:WU02:FS00:Upload 42.06%
23:20:11:WU02:FS00:Upload 48.32%
23:20:17:WU02:FS00:Upload 55.48%
23:20:23:WU02:FS00:Upload 61.74%
23:20:29:WU02:FS00:Upload 67.11%
23:20:35:WU02:FS00:Upload 79.64%
23:20:42:WU02:FS00:Upload 86.79%
23:20:48:WU02:FS00:Upload 93.06%
23:20:54:WU02:FS00:Upload 99.32%
23:20:55:WU02:FS00:Upload complete
23:20:55:WU02:FS00:Server responded WORK_ACK (400)
23:20:55:WU02:FS00:Final credit estimate, 25157.00 points
23:20:55:WU02:FS00:Cleaning up
Advanced Control shows the client "Online" and both slots "Ready" … The Work Queue has two "greyed out" Download IDs, both with Status "Download", Progress "0%", ETA "Unknown", etc., no "Waiting On", Attempts "0" and Next Attempt "Unknown".

Internet connection is a rock-solid fibre connection: 220Mbps+ download, 21Mbps+ upload.

Run beta flag, but fairly sure that isn't relevant to this issue hence posting here not in beta forum … My guess is this may be one of the "family" of issues that have been plaguing connections and that a restart or reboot may well clear it … Haven't tried Pause/Fold, Client restart or any such yet in case there are any of the other logs that might be useful identifying this issue … It is a bit late here, but I'll hang around for a bit in case someone wants to confirm if known issue or in case any of the other logs can help diagnose/clarify/shed light on the issue.
Last edited by Neil-B on Thu May 14, 2020 8:28 am, edited 4 times in total.
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070

(Green/Bold = Active)
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud

Re: 3.21.157.11 assigned, connects, no download & "hangs"

Post by PantherX »

I can get to the Server and the landing page easily. I believe you have encountered this issue, which is a known bug in FAHClient:
https://github.com/FoldingAtHome/fah-issues/issues/983

A restart of the client is needed to recover it. Pausing/unpausing the slot won't work. To confirm if this is the issue, check the log and see if it just "stops" after connecting to the WS.
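The check described above (does the log simply "stop" after connecting to the WS?) can be sketched in Python. This is a rough illustration rather than an official tool: the function name `find_stalled_requests` and both regular expressions are assumptions based only on the log lines quoted in this thread.

```python
import re

# Matches client log lines such as
# "21:14:23:WU00:FS02:Connecting to 3.21.157.11:8080"
# (optionally prefixed with ERROR: or WARNING: after the timestamp).
LINE_RE = re.compile(
    r"^(\d\d:\d\d:\d\d):(?:(?:ERROR|WARNING):)?(WU\d+):(FS\d+):(.*)$"
)
CONNECT_RE = re.compile(r"^Connecting to (\S+):(\d+)$")


def find_stalled_requests(log_text):
    """Return {(wu, slot): (time, server)} for every work unit whose most
    recent log line is a bare "Connecting to <server>:<port>"."""
    last_line = {}
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # log header, config dump, blank lines, etc.
        time, wu, slot, message = match.groups()
        last_line[(wu, slot)] = (time, message)

    stalled = {}
    for key, (time, message) in last_line.items():
        connect = CONNECT_RE.match(message)
        if connect:
            stalled[key] = (time, connect.group(1))
    return stalled


# Lines taken from the first post in this thread.
sample = """\
21:14:22:WU00:FS02:Connecting to assign1.foldingathome.org:80
21:14:23:WU00:FS02:Assigned to work server 3.21.157.11
21:14:23:WU00:FS02:Connecting to 3.21.157.11:8080
21:16:59:WU01:FS02:Connecting to 143.89.243.111:8080
21:18:15:WU01:FS02:Upload complete
21:18:15:WU01:FS02:Cleaning up
"""
stalled = find_stalled_requests(sample)
print(stalled)
```

A hit is only suspicious when its timestamp is old compared with the current time; a request that started a second ago is reported in the same way.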
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues

Re: 3.21.157.11 assigned, connects, no download & "hangs"

Post by Neil-B »

OK, so closed client down and restarted it … One slot was assigned to, and grabbed a WU successfully from 130.237.11.145 and is folding fine … The other slot was assigned to 3.21.157.11 and has encountered exactly the same issue again - I'm not going to bother grabbing the log snippet off the server for this as it is precisely the same pattern as shown above - this rather implies that this "983 issue" is pretty server dependent?

I normally try to avoid restarting client when I have a slot running - but the WU in the slot that is running is a long one so I'll risk it and see if the "sleeping" slot can pick up a non 3.21.157.11 assignment or if it is just not my night.

I know the "983 issue" may be raised on GitHub - but with three identical issues on the same server (just for me alone) could someone give it a bit of tender loving pointing stick please :)

>>>>> Edited to continue saga :) <<<<<

Make that 4/4 the 3.21.157.11 server "Wins" :) … Exactly same symptoms as above so not bothering with a log cut … TCPView is showing there is an Established (but doing sweet nothing) connection to the server … Trying to work out if I can kill a TCP connection without killing the client so that I can just pause/fold the hanging slot … might as well make an all nighter of it !! … If anyone wants to tell me it isn't possible it might mean I'll get some sleep - but even better if someone can tell me it is and point me in the right direction :)
Last edited by Neil-B on Tue May 12, 2020 1:31 am, edited 1 time in total.

Re: 3.21.157.11 assigned, connects, no download & "hangs"

Post by PantherX »

IME, the GitHub issue isn't Server dependent, I can replicate it on demand by yanking out the LAN cable while downloading/uploading. I can reproduce it easily but it impacts the entire client. In your case, it is a single Slot which is interesting.
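As general background, one common way for a program to sit forever on a connection that looks established is a blocking read with no timeout. Whether that is what happens inside FAHClient is not confirmed anywhere in this thread, so the following Python sketch only illustrates the general idea. The helper name `read_with_deadline` and the 0.2 second deadline are made up for the example.

```python
import socket


def read_with_deadline(sock, nbytes, seconds):
    """Try to read up to nbytes; return None if nothing arrives in time."""
    sock.settimeout(seconds)
    try:
        return sock.recv(nbytes)
    except socket.timeout:
        return None


# A connected pair where the peer stays open but never sends anything,
# loosely similar to a connection that is established but idle.
client, silent_peer = socket.socketpair()
result = read_with_deadline(client, 512, 0.2)
print("timed out" if result is None else f"got {len(result)} bytes")
client.close()
silent_peer.close()
```

Without the `settimeout` call the `recv` would wait indefinitely, because the peer neither sends data nor closes the connection.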

Re: 3.21.157.11 assigned, connects, no download & "hangs"

Post by Neil-B »

It is a single slot repeating itself trying to connect to the same server … the other slot, which had the same issue earlier, connected fine when assigned to a different WS … something about the connection protocol on that server at this point in time feels real iffy to me :) … I'm going to risk using TCPView to close the connection and see what the client does … I'll pause the running slot just in case it isn't "good" !!

>>>>> The Saga ends - at least for now :) <<<<<

Yay … Non Techie Idiot WINS … On closing the connection in TCPView the client instantly reconnected to AS, got assigned to another WS and has downloaded and is running a WU quite happily … other slot now unpaused and continuing on its merry way.

So for anyone on Windows with this issue, I would suggest grabbing TCPView (part of the Microsoft Sysinternals suite) - which even an idiot like me can use/understand - identify the hung connection (look for the FAHClient row with the server name/IP in question) - right click on the row and close the connection.
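For anyone who would rather script the "identify the hung connection" step, here is a small Python sketch that filters the output of the Windows `netstat -ano -p TCP` command. It only lists matching connections and cannot close them, so the TCPView step is still needed. The function names are made up for this example, the sample rows are invented apart from the server IP, and it assumes an English-language Windows where the state is printed as ESTABLISHED.

```python
import subprocess


def established_to(netstat_output, remote_ip):
    """Return (local, remote, pid) for each ESTABLISHED TCP row in
    `netstat -ano` output whose remote address is on remote_ip."""
    rows = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows look like: TCP <local> <remote> <state> <pid>
        if len(fields) != 5 or fields[0] != "TCP":
            continue
        _, local, remote, state, pid = fields
        host = remote.rsplit(":", 1)[0]
        if state == "ESTABLISHED" and host == remote_ip:
            rows.append((local, remote, int(pid)))
    return rows


def list_connections(remote_ip):
    """Run netstat (Windows flags) and filter its output."""
    output = subprocess.run(
        ["netstat", "-ano", "-p", "TCP"],
        capture_output=True, text=True, check=True,
    ).stdout
    return established_to(output, remote_ip)


# Invented sample output; only the server IP comes from this thread.
sample = (
    "Active Connections\n\n"
    "  Proto  Local Address       Foreign Address      State        PID\n"
    "  TCP    192.168.1.10:50123  3.21.157.11:8080     ESTABLISHED  6936\n"
    "  TCP    192.168.1.10:50200  143.89.243.111:8080  TIME_WAIT    0\n"
)
print(established_to(sample, "3.21.157.11"))
```

On a live system, `list_connections("3.21.157.11")` would run the command itself; the PID in each row can be compared with the PID shown in the FAHClient log header.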

*** It might be interesting for someone to look at the server logs for this evening/night to see if there was an abnormal drop in people connecting for assignments or for abnormally high load as there does appear to be a correlation between this issue and this server this evening/night … not saying it doesn't happen on other servers and in other ways but the above repeatable pattern on that server might give clues as to what is awry and may help resolve the issue?

TBH, I rather hope I find my server idle when I get up later this morning (in a little over four hours) and the hanging slots waiting for connection to this WS - It would give me a chance to test my "Solution" :)

… oh yes - If someone with a GitHub account could link this forum thread to the GitHub issue mentioned earlier in the thread, it might provide some insight for that?

Re: [SOLUTION] 3.21.157.11 assigned/connects/no download/han

Post by Neil-B »

So what I hoped for has occurred … both slots completed their WUs and both were assigned to the 3.21.157.11 server again and both had exactly the same issue … used TCPView to close the two "Established" connections and both were assigned to other WS and downloaded WUs immediately, and are working on them now :)

So the issue is repeatable - this server is the common factor in every case for me overnight - This points to the issue not being something random that happens now and again to various servers, but some form of repeatable effect that starts to impact a server under certain conditions (I don't know what these are, but they have held overnight for every connection my kit has made to it).

Below is my log showing the latest pair of issues occurring:

Code:

05:29:31:WU02:FS02:Connecting to assign1.foldingathome.org:80
05:29:31:WU02:FS02:Assigned to work server 3.21.157.11
05:29:31:WU02:FS02:Requesting new work unit for slot 02: RUNNING cpu:32 from 3.21.157.11
05:29:32:WU02:FS02:Connecting to 3.21.157.11:8080

Code:

06:02:32:WU01:FS00:Connecting to assign1.foldingathome.org:80
06:02:32:WU01:FS00:Assigned to work server 3.21.157.11
06:02:32:WU01:FS00:Requesting new work unit for slot 00: RUNNING cpu:24 from 3.21.157.11
06:02:32:WU01:FS00:Connecting to 3.21.157.11:8080
and the points at which I closed the connections to the server which allowed the Client to resume normal functioning:

Code:

07:36:14:ERROR:WU02:FS02:Exception: 10002: Received short response, expected 512 bytes, got 0

Code:

07:36:26:ERROR:WU01:FS00:Exception: 10002: Received short response, expected 512 bytes, got 0
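The wording of that error suggests a reader that expected a fixed 512 bytes and saw the connection end before any arrived. The Python sketch below shows how such a message can come about in general. It is not FAHClient's code: `recv_exact` and `ShortResponse` are hypothetical names, and only the 512 byte figure and the message text are taken from the log above.

```python
import socket


class ShortResponse(Exception):
    """Raised when the peer closes before the expected bytes arrive."""


def recv_exact(sock, expected):
    """Read exactly `expected` bytes, or raise if the connection ends first."""
    chunks = []
    received = 0
    while received < expected:
        chunk = sock.recv(expected - received)
        if not chunk:  # b"" means the connection was closed
            raise ShortResponse(
                f"Received short response, expected {expected} bytes, "
                f"got {received}"
            )
        chunks.append(chunk)
        received += len(chunk)
    return b"".join(chunks)


# A local socket pair stands in for client and server.
client, server = socket.socketpair()
server.close()  # the "server" side goes away without sending anything
message = None
try:
    recv_exact(client, 512)
except ShortResponse as err:
    message = str(err)
finally:
    client.close()
print(message)
```

In the sketch the read returns as soon as the connection is closed, which matches the behaviour seen here: the error is logged at the moment the connection was closed in TCPView, and the slot then carries on with a fresh assignment.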
So Log Tail showing sequence: (I have added blank lines to assist spotting the above in the tail)

Code:

05:14:31:WU01:FS02:0xa7:Completed 232500 out of 250000 steps (93%)
05:16:57:WU00:FS00:0xa7:Completed 425000 out of 500000 steps (85%)
05:17:00:WU01:FS02:0xa7:Completed 235000 out of 250000 steps (94%)
05:19:30:WU01:FS02:0xa7:Completed 237500 out of 250000 steps (95%)
05:20:12:WU00:FS00:0xa7:Completed 430000 out of 500000 steps (86%)
05:22:00:WU01:FS02:0xa7:Completed 240000 out of 250000 steps (96%)
05:23:27:WU00:FS00:0xa7:Completed 435000 out of 500000 steps (87%)
05:24:31:WU01:FS02:0xa7:Completed 242500 out of 250000 steps (97%)
05:26:41:WU00:FS00:0xa7:Completed 440000 out of 500000 steps (88%)
05:27:01:WU01:FS02:0xa7:Completed 245000 out of 250000 steps (98%)
05:29:31:WU01:FS02:0xa7:Completed 247500 out of 250000 steps (99%)

05:29:31:WU02:FS02:Connecting to assign1.foldingathome.org:80 
05:29:31:WU02:FS02:Assigned to work server 3.21.157.11
05:29:31:WU02:FS02:Requesting new work unit for slot 02: RUNNING cpu:32 from 3.21.157.11
05:29:32:WU02:FS02:Connecting to 3.21.157.11:8080

05:29:56:WU00:FS00:0xa7:Completed 445000 out of 500000 steps (89%)
05:31:59:WU01:FS02:0xa7:Completed 250000 out of 250000 steps (100%)
05:32:08:WU01:FS02:0xa7:Saving result file ..\logfile_01.txt
05:32:08:WU01:FS02:0xa7:Saving result file frame28.trr
05:32:08:WU01:FS02:0xa7:Saving result file frame28.xtc
05:32:08:WU01:FS02:0xa7:Saving result file md.log
05:32:08:WU01:FS02:0xa7:Saving result file science.log
05:32:08:WU01:FS02:0xa7:Folding@home Core Shutdown: FINISHED_UNIT
05:32:09:WU01:FS02:FahCore returned: FINISHED_UNIT (100 = 0x64)
05:32:09:WU01:FS02:Sending unit results: id:01 state:SEND error:NO_ERROR project:16507 run:0 clone:58 gen:28 core:0xa7 unit:0x0000001f8f59f36f5eae4e6b813c6810
05:32:09:WU01:FS02:Uploading 6.99MiB to 143.89.243.111
05:32:09:WU01:FS02:Connecting to 143.89.243.111:8080
05:32:15:WU01:FS02:Upload 10.74%
05:32:21:WU01:FS02:Upload 17.89%
05:32:27:WU01:FS02:Upload 23.26%
05:32:33:WU01:FS02:Upload 28.63%
05:32:39:WU01:FS02:Upload 35.78%
05:32:45:WU01:FS02:Upload 42.05%
05:32:51:WU01:FS02:Upload 47.41%
05:32:57:WU01:FS02:Upload 53.68%
05:33:03:WU01:FS02:Upload 59.94%
05:33:09:WU01:FS02:Upload 67.99%
05:33:09:WU00:FS00:0xa7:Completed 450000 out of 500000 steps (90%)
05:33:15:WU01:FS02:Upload 74.25%
05:33:21:WU01:FS02:Upload 80.51%
05:33:28:WU01:FS02:Upload 86.78%
05:33:34:WU01:FS02:Upload 93.04%
05:33:40:WU01:FS02:Upload 98.41%
05:33:43:WU01:FS02:Upload complete
05:33:43:WU01:FS02:Server responded WORK_ACK (400)
05:33:43:WU01:FS02:Final credit estimate, 27392.00 points
05:33:43:WU01:FS02:Cleaning up
05:36:25:WU00:FS00:0xa7:Completed 455000 out of 500000 steps (91%)
05:39:42:WU00:FS00:0xa7:Completed 460000 out of 500000 steps (92%)
05:42:58:WU00:FS00:0xa7:Completed 465000 out of 500000 steps (93%)
05:46:13:WU00:FS00:0xa7:Completed 470000 out of 500000 steps (94%)
05:49:28:WU00:FS00:0xa7:Completed 475000 out of 500000 steps (95%)
05:52:43:WU00:FS00:0xa7:Completed 480000 out of 500000 steps (96%)
05:56:01:WU00:FS00:0xa7:Completed 485000 out of 500000 steps (97%)
05:59:16:WU00:FS00:0xa7:Completed 490000 out of 500000 steps (98%)
06:02:31:WU00:FS00:0xa7:Completed 495000 out of 500000 steps (99%)

06:02:32:WU01:FS00:Connecting to assign1.foldingathome.org:80
06:02:32:WU01:FS00:Assigned to work server 3.21.157.11
06:02:32:WU01:FS00:Requesting new work unit for slot 00: RUNNING cpu:24 from 3.21.157.11
06:02:32:WU01:FS00:Connecting to 3.21.157.11:8080

06:05:46:WU00:FS00:0xa7:Completed 500000 out of 500000 steps (100%)
06:05:48:WU00:FS00:0xa7:Saving result file ..\logfile_01.txt
06:05:48:WU00:FS00:0xa7:Saving result file ener.edr
06:05:48:WU00:FS00:0xa7:Saving result file frame0.trr
06:05:48:WU00:FS00:0xa7:Saving result file md.log
06:05:48:WU00:FS00:0xa7:Saving result file science.log
06:05:48:WU00:FS00:0xa7:Folding@home Core Shutdown: FINISHED_UNIT
06:05:49:WU00:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
06:05:49:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:16805 run:12 clone:2 gen:0 core:0xa7 unit:0x0000000082ed0b915eb41ff36621e515
06:05:49:WU00:FS00:Uploading 13.91MiB to 130.237.11.145
06:05:49:WU00:FS00:Connecting to 130.237.11.145:8080
06:05:55:WU00:FS00:Upload 44.47%
06:06:01:WU00:FS00:Upload 95.68%
06:06:01:WU00:FS00:Upload complete
06:06:01:WU00:FS00:Server responded WORK_ACK (400)
06:06:01:WU00:FS00:Final credit estimate, 19880.00 points
06:06:01:WU00:FS00:Cleaning up
******************************* Date: 2020-05-12 *******************************

07:36:14:ERROR:WU02:FS02:Exception: 10002: Received short response, expected 512 bytes, got 0

07:36:15:WU02:FS02:Connecting to assign1.foldingathome.org:80
07:36:15:WU02:FS02:Assigned to work server 40.121.152.108
07:36:15:WU02:FS02:Requesting new work unit for slot 02: READY cpu:32 from 40.121.152.108
07:36:15:WU02:FS02:Connecting to 40.121.152.108:8080
07:36:16:WU02:FS02:Downloading 2.82MiB
07:36:17:WU02:FS02:Download complete
07:36:17:WU02:FS02:Received Unit: id:02 state:DOWNLOAD error:NO_ERROR project:14645 run:617 clone:1 gen:12 core:0xa7 unit:0x0000000d2879986c5e8c80459af67546
07:36:17:WU02:FS02:Starting
07:36:17:WU02:FS02:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\OpDoubleHelix\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/avx/Core_a7.fah/FahCore_a7.exe -dir 02 -suffix 01 -version 706 -lifeline 13716 -checkpoint 15 -np 32
07:36:17:WU02:FS02:Started FahCore on PID 4352
07:36:17:WU02:FS02:Core PID:15096
07:36:17:WU02:FS02:FahCore 0xa7 started
07:36:18:WU02:FS02:0xa7:*********************** Log Started 2020-05-12T07:36:17Z ***********************
07:36:18:WU02:FS02:0xa7:************************** Gromacs Folding@home Core ***************************
07:36:18:WU02:FS02:0xa7:       Type: 0xa7
07:36:18:WU02:FS02:0xa7:       Core: Gromacs
07:36:18:WU02:FS02:0xa7:       Args: -dir 02 -suffix 01 -version 706 -lifeline 4352 -checkpoint 15 -np
07:36:18:WU02:FS02:0xa7:             32
07:36:18:WU02:FS02:0xa7:************************************ CBang *************************************
07:36:18:WU02:FS02:0xa7:       Date: Oct 26 2019
07:36:18:WU02:FS02:0xa7:       Time: 01:38:25
07:36:18:WU02:FS02:0xa7:   Revision: c46a1a011a24143739ac7218c5a435f66777f62f
07:36:18:WU02:FS02:0xa7:     Branch: master
07:36:18:WU02:FS02:0xa7:   Compiler: Visual C++ 2008
07:36:18:WU02:FS02:0xa7:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
07:36:18:WU02:FS02:0xa7:   Platform: win32 10
07:36:18:WU02:FS02:0xa7:       Bits: 64
07:36:18:WU02:FS02:0xa7:       Mode: Release
07:36:18:WU02:FS02:0xa7:************************************ System ************************************
07:36:18:WU02:FS02:0xa7:        CPU: Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz
07:36:18:WU02:FS02:0xa7:     CPU ID: GenuineIntel Family 6 Model 63 Stepping 2
07:36:18:WU02:FS02:0xa7:       CPUs: 56
07:36:18:WU02:FS02:0xa7:     Memory: 511.75GiB
07:36:18:WU02:FS02:0xa7:Free Memory: 501.75GiB
07:36:18:WU02:FS02:0xa7:    Threads: WINDOWS_THREADS
07:36:18:WU02:FS02:0xa7: OS Version: 6.2
07:36:18:WU02:FS02:0xa7:Has Battery: false
07:36:18:WU02:FS02:0xa7: On Battery: false
07:36:18:WU02:FS02:0xa7: UTC Offset: 1
07:36:18:WU02:FS02:0xa7:        PID: 15096
07:36:18:WU02:FS02:0xa7:        CWD: C:\Users\OpDoubleHelix\AppData\Roaming\FAHClient\work
07:36:18:WU02:FS02:0xa7:******************************** Build - libFAH ********************************
07:36:18:WU02:FS02:0xa7:    Version: 0.0.18
07:36:18:WU02:FS02:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
07:36:18:WU02:FS02:0xa7:  Copyright: 2019 foldingathome.org
07:36:18:WU02:FS02:0xa7:   Homepage: https://foldingathome.org/
07:36:18:WU02:FS02:0xa7:       Date: Oct 26 2019
07:36:18:WU02:FS02:0xa7:       Time: 01:52:30
07:36:18:WU02:FS02:0xa7:   Revision: c1e3513b1bc0c16013668f2173ee969e5995b38e
07:36:18:WU02:FS02:0xa7:     Branch: master
07:36:18:WU02:FS02:0xa7:   Compiler: Visual C++ 2008
07:36:18:WU02:FS02:0xa7:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
07:36:18:WU02:FS02:0xa7:   Platform: win32 10
07:36:18:WU02:FS02:0xa7:       Bits: 64
07:36:18:WU02:FS02:0xa7:       Mode: Release
07:36:18:WU02:FS02:0xa7:************************************ Build *************************************
07:36:18:WU02:FS02:0xa7:       SIMD: avx_256
07:36:18:WU02:FS02:0xa7:********************************************************************************
07:36:18:WU02:FS02:0xa7:Project: 14645 (Run 617, Clone 1, Gen 12)
07:36:18:WU02:FS02:0xa7:Unit: 0x0000000d2879986c5e8c80459af67546
07:36:18:WU02:FS02:0xa7:Reading tar file core.xml
07:36:18:WU02:FS02:0xa7:Reading tar file frame12.tpr
07:36:18:WU02:FS02:0xa7:Digital signatures verified
07:36:18:WU02:FS02:0xa7:Calling: mdrun -s frame12.tpr -o frame12.trr -cpt 15 -nt 32
07:36:18:WU02:FS02:0xa7:Steps: first=0 total=250000
07:36:19:WU02:FS02:0xa7:Completed 1 out of 250000 steps (0%)

07:36:26:ERROR:WU01:FS00:Exception: 10002: Received short response, expected 512 bytes, got 0

07:36:26:WU01:FS00:Connecting to assign1.foldingathome.org:80
07:36:26:WU01:FS00:Assigned to work server 40.121.152.108
07:36:26:WU01:FS00:Requesting new work unit for slot 00: READY cpu:24 from 40.121.152.108
07:36:26:WU01:FS00:Connecting to 40.121.152.108:8080
07:36:27:WU01:FS00:Downloading 2.82MiB
07:36:28:WU01:FS00:Download complete
07:36:28:WU01:FS00:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:14830 run:11 clone:0 gen:12 core:0xa7 unit:0x0000000d2879986c5eb0c6e5f5147701
07:36:28:WU01:FS00:Starting
07:36:28:WU01:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\OpDoubleHelix\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/avx/Core_a7.fah/FahCore_a7.exe -dir 01 -suffix 01 -version 706 -lifeline 13716 -checkpoint 15 -np 24
07:36:28:WU01:FS00:Started FahCore on PID 12468
07:36:28:WU01:FS00:Core PID:5008
07:36:28:WU01:FS00:FahCore 0xa7 started
07:36:29:WU01:FS00:0xa7:*********************** Log Started 2020-05-12T07:36:28Z ***********************
07:36:29:WU01:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
07:36:29:WU01:FS00:0xa7:       Type: 0xa7
07:36:29:WU01:FS00:0xa7:       Core: Gromacs
07:36:29:WU01:FS00:0xa7:       Args: -dir 01 -suffix 01 -version 706 -lifeline 12468 -checkpoint 15 -np
07:36:29:WU01:FS00:0xa7:             24
07:36:29:WU01:FS00:0xa7:************************************ CBang *************************************
07:36:29:WU01:FS00:0xa7:       Date: Oct 26 2019
07:36:29:WU01:FS00:0xa7:       Time: 01:38:25
07:36:29:WU01:FS00:0xa7:   Revision: c46a1a011a24143739ac7218c5a435f66777f62f
07:36:29:WU01:FS00:0xa7:     Branch: master
07:36:29:WU01:FS00:0xa7:   Compiler: Visual C++ 2008
07:36:29:WU01:FS00:0xa7:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
07:36:29:WU01:FS00:0xa7:   Platform: win32 10
07:36:29:WU01:FS00:0xa7:       Bits: 64
07:36:29:WU01:FS00:0xa7:       Mode: Release
07:36:29:WU01:FS00:0xa7:************************************ System ************************************
07:36:29:WU01:FS00:0xa7:        CPU: Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz
07:36:29:WU01:FS00:0xa7:     CPU ID: GenuineIntel Family 6 Model 63 Stepping 2
07:36:29:WU01:FS00:0xa7:       CPUs: 56
07:36:29:WU01:FS00:0xa7:     Memory: 511.75GiB
07:36:29:WU01:FS00:0xa7:Free Memory: 501.51GiB
07:36:29:WU01:FS00:0xa7:    Threads: WINDOWS_THREADS
07:36:29:WU01:FS00:0xa7: OS Version: 6.2
07:36:29:WU01:FS00:0xa7:Has Battery: false
07:36:29:WU01:FS00:0xa7: On Battery: false
07:36:29:WU01:FS00:0xa7: UTC Offset: 1
07:36:29:WU01:FS00:0xa7:        PID: 5008
07:36:29:WU01:FS00:0xa7:        CWD: C:\Users\OpDoubleHelix\AppData\Roaming\FAHClient\work
07:36:29:WU01:FS00:0xa7:******************************** Build - libFAH ********************************
07:36:29:WU01:FS00:0xa7:    Version: 0.0.18
07:36:29:WU01:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
07:36:29:WU01:FS00:0xa7:  Copyright: 2019 foldingathome.org
07:36:29:WU01:FS00:0xa7:   Homepage: https://foldingathome.org/
07:36:29:WU01:FS00:0xa7:       Date: Oct 26 2019
07:36:29:WU01:FS00:0xa7:       Time: 01:52:30
07:36:29:WU01:FS00:0xa7:   Revision: c1e3513b1bc0c16013668f2173ee969e5995b38e
07:36:29:WU01:FS00:0xa7:     Branch: master
07:36:29:WU01:FS00:0xa7:   Compiler: Visual C++ 2008
07:36:29:WU01:FS00:0xa7:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
07:36:29:WU01:FS00:0xa7:   Platform: win32 10
07:36:29:WU01:FS00:0xa7:       Bits: 64
07:36:29:WU01:FS00:0xa7:       Mode: Release
07:36:29:WU01:FS00:0xa7:************************************ Build *************************************
07:36:29:WU01:FS00:0xa7:       SIMD: avx_256
07:36:29:WU01:FS00:0xa7:********************************************************************************
07:36:29:WU01:FS00:0xa7:Project: 14830 (Run 11, Clone 0, Gen 12)
07:36:29:WU01:FS00:0xa7:Unit: 0x0000000d2879986c5eb0c6e5f5147701
07:36:29:WU01:FS00:0xa7:Reading tar file core.xml
07:36:29:WU01:FS00:0xa7:Reading tar file frame12.tpr
07:36:29:WU01:FS00:0xa7:Digital signatures verified
07:36:29:WU01:FS00:0xa7:Calling: mdrun -s frame12.tpr -o frame12.trr -cpt 15 -nt 24
07:36:29:WU01:FS00:0xa7:Steps: first=0 total=250000
07:36:30:WU01:FS00:0xa7:Completed 1 out of 250000 steps (0%)
07:36:59:WU02:FS02:0xa7:Completed 2500 out of 250000 steps (1%)
07:37:20:WU01:FS00:0xa7:Completed 2500 out of 250000 steps (1%)
07:37:39:WU02:FS02:0xa7:Completed 5000 out of 250000 steps (2%)
07:38:05:WU01:FS00:0xa7:Completed 5000 out of 250000 steps (2%)
07:38:18:WU02:FS02:0xa7:Completed 7500 out of 250000 steps (3%)
07:38:51:WU01:FS00:0xa7:Completed 7500 out of 250000 steps (3%)
I will continue to update this topic as/when more of these issues occur :)
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070

(Green/Bold = Active)

Re: [SOLUTION] 3.21.157.11 assigned/connects/no download/han

Post by Neil-B »

Yay - another one :)

Same server, same issue … but this time the AS reassigned the slot back to 3.21.157.11 and the issue recurred straight away - this then repeated again … at this point, when the connection was closed, the Advanced Control showed a retry timer was now in force, after which the AS offered a different WS and a WU was promptly downloaded and started to fold ... I am prepared to bet that had the AS offered 3.21.157.11 again it would have "hung" on an Established Connection and needed Closing Down again.

I'll grab the log tail and edit this post to add the evidence … Log Tail below … I have added blank rows and labels to make the issues and the closed connections more obvious:

Code: Select all

08:49:19:WU00:FS02:0xa7:Completed 25000 out of 500000 steps (5%)
08:49:50:WU01:FS00:0xa7:Completed 242500 out of 250000 steps (97%)
08:50:35:WU01:FS00:0xa7:Completed 245000 out of 250000 steps (98%)
08:50:57:WU00:FS02:0xa7:Completed 30000 out of 500000 steps (6%)
08:51:19:WU01:FS00:0xa7:Completed 247500 out of 250000 steps (99%)

--- 1st "Hung" Established Connection ---

08:51:20:WU02:FS00:Connecting to assign1.foldingathome.org:80
08:51:20:WU02:FS00:Assigned to work server 3.21.157.11
08:51:20:WU02:FS00:Requesting new work unit for slot 00: RUNNING cpu:24 from 3.21.157.11
08:51:20:WU02:FS00:Connecting to 3.21.157.11:8080

08:52:04:WU01:FS00:0xa7:Completed 250000 out of 250000 steps (100%)
08:52:07:WU01:FS00:0xa7:Saving result file ..\logfile_01.txt
08:52:07:WU01:FS00:0xa7:Saving result file dhdl.xvg
08:52:07:WU01:FS00:0xa7:Saving result file frame12.trr
08:52:07:WU01:FS00:0xa7:Saving result file md.log
08:52:07:WU01:FS00:0xa7:Saving result file pullf.xvg
08:52:07:WU01:FS00:0xa7:Saving result file pullx.xvg
08:52:07:WU01:FS00:0xa7:Saving result file science.log
08:52:07:WU01:FS00:0xa7:Saving result file traj_comp.xtc
08:52:07:WU01:FS00:0xa7:Folding@home Core Shutdown: FINISHED_UNIT
08:52:07:WU01:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
08:52:07:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:14830 run:11 clone:0 gen:12 core:0xa7 unit:0x0000000d2879986c5eb0c6e5f5147701
08:52:07:WU01:FS00:Uploading 6.80MiB to 40.121.152.108
08:52:07:WU01:FS00:Connecting to 40.121.152.108:8080
08:52:13:WU01:FS00:Upload 45.95%
08:52:18:WU01:FS00:Upload complete
08:52:19:WU01:FS00:Server responded WORK_ACK (400)
08:52:19:WU01:FS00:Final credit estimate, 6096.00 points
08:52:19:WU01:FS00:Cleaning up
08:52:33:WU00:FS02:0xa7:Completed 35000 out of 500000 steps (7%)
08:54:07:WU00:FS02:0xa7:Completed 40000 out of 500000 steps (8%)
08:55:42:WU00:FS02:0xa7:Completed 45000 out of 500000 steps (9%)
08:57:17:WU00:FS02:0xa7:Completed 50000 out of 500000 steps (10%)
08:58:52:WU00:FS02:0xa7:Completed 55000 out of 500000 steps (11%)
09:00:26:WU00:FS02:0xa7:Completed 60000 out of 500000 steps (12%)
09:02:01:WU00:FS02:0xa7:Completed 65000 out of 500000 steps (13%)
09:03:36:WU00:FS02:0xa7:Completed 70000 out of 500000 steps (14%)
09:05:10:WU00:FS02:0xa7:Completed 75000 out of 500000 steps (15%)
09:06:45:WU00:FS02:0xa7:Completed 80000 out of 500000 steps (16%)
09:08:19:WU00:FS02:0xa7:Completed 85000 out of 500000 steps (17%)
09:09:54:WU00:FS02:0xa7:Completed 90000 out of 500000 steps (18%)
09:11:29:WU00:FS02:0xa7:Completed 95000 out of 500000 steps (19%)
09:13:04:WU00:FS02:0xa7:Completed 100000 out of 500000 steps (20%)
09:14:39:WU00:FS02:0xa7:Completed 105000 out of 500000 steps (21%)
09:16:13:WU00:FS02:0xa7:Completed 110000 out of 500000 steps (22%)
09:17:48:WU00:FS02:0xa7:Completed 115000 out of 500000 steps (23%)
09:19:22:WU00:FS02:0xa7:Completed 120000 out of 500000 steps (24%)
09:20:56:WU00:FS02:0xa7:Completed 125000 out of 500000 steps (25%)
09:22:31:WU00:FS02:0xa7:Completed 130000 out of 500000 steps (26%)
09:24:06:WU00:FS02:0xa7:Completed 135000 out of 500000 steps (27%)
09:25:40:WU00:FS02:0xa7:Completed 140000 out of 500000 steps (28%)
09:27:16:WU00:FS02:0xa7:Completed 145000 out of 500000 steps (29%)
09:28:56:WU00:FS02:0xa7:Completed 150000 out of 500000 steps (30%)
09:30:31:WU00:FS02:0xa7:Completed 155000 out of 500000 steps (31%)
09:32:05:WU00:FS02:0xa7:Completed 160000 out of 500000 steps (32%)
09:33:40:WU00:FS02:0xa7:Completed 165000 out of 500000 steps (33%)
09:35:15:WU00:FS02:0xa7:Completed 170000 out of 500000 steps (34%)
09:36:49:WU00:FS02:0xa7:Completed 175000 out of 500000 steps (35%)
09:38:24:WU00:FS02:0xa7:Completed 180000 out of 500000 steps (36%)
09:39:58:WU00:FS02:0xa7:Completed 185000 out of 500000 steps (37%)
09:41:33:WU00:FS02:0xa7:Completed 190000 out of 500000 steps (38%)
09:43:08:WU00:FS02:0xa7:Completed 195000 out of 500000 steps (39%)
09:44:43:WU00:FS02:0xa7:Completed 200000 out of 500000 steps (40%)
09:46:18:WU00:FS02:0xa7:Completed 205000 out of 500000 steps (41%)

--- 1st Connection Closed ---

09:47:11:ERROR:WU02:FS00:Exception: 10002: Received short response, expected 512 bytes, got 0

--- 2nd "Hung" Established Connection ---

09:47:11:WU02:FS00:Connecting to assign1.foldingathome.org:80
09:47:11:WU02:FS00:Assigned to work server 3.21.157.11
09:47:11:WU02:FS00:Requesting new work unit for slot 00: READY cpu:24 from 3.21.157.11
09:47:11:WU02:FS00:Connecting to 3.21.157.11:8080

09:47:52:WU00:FS02:0xa7:Completed 210000 out of 500000 steps (42%)

--- 2nd Connection Closed ---

09:49:06:ERROR:WU02:FS00:Exception: 10002: Received short response, expected 512 bytes, got 0

--- 3rd "Hung" Established Connection ---

09:49:06:WU02:FS00:Connecting to assign1.foldingathome.org:80
09:49:07:WU02:FS00:Assigned to work server 3.21.157.11
09:49:07:WU02:FS00:Requesting new work unit for slot 00: READY cpu:24 from 3.21.157.11
09:49:07:WU02:FS00:Connecting to 3.21.157.11:8080

--- 3rd Connection Closed ---

09:49:27:ERROR:WU02:FS00:Exception: 10002: Received short response, expected 512 bytes, got 0

09:49:27:WU00:FS02:0xa7:Completed 215000 out of 500000 steps (43%)

--- After waiting on Retry Timer now get assignment to a different WS where the connection works as expected --- 

09:50:43:WU02:FS00:Connecting to assign1.foldingathome.org:80
09:50:44:WU02:FS00:Assigned to work server 40.121.152.108
09:50:44:WU02:FS00:Requesting new work unit for slot 00: READY cpu:24 from 40.121.152.108
09:50:44:WU02:FS00:Connecting to 40.121.152.108:8080
09:50:44:WU02:FS00:Downloading 2.82MiB
09:50:45:WU02:FS00:Download complete
09:50:46:WU02:FS00:Received Unit: id:02 state:DOWNLOAD error:NO_ERROR project:14649 run:788 clone:0 gen:50 core:0xa7 unit:0x000000342879986c5e8e0e6cdab69b18
09:50:46:WU02:FS00:Starting
09:50:46:WU02:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\OpDoubleHelix\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/avx/Core_a7.fah/FahCore_a7.exe -dir 02 -suffix 01 -version 706 -lifeline 13716 -checkpoint 15 -np 24
09:50:46:WU02:FS00:Started FahCore on PID 9816
09:50:46:WU02:FS00:Core PID:13804
09:50:46:WU02:FS00:FahCore 0xa7 started
09:50:46:WU02:FS00:0xa7:*********************** Log Started 2020-05-12T09:50:46Z ***********************
09:50:46:WU02:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
09:50:46:WU02:FS00:0xa7:       Type: 0xa7
09:50:46:WU02:FS00:0xa7:       Core: Gromacs
09:50:46:WU02:FS00:0xa7:       Args: -dir 02 -suffix 01 -version 706 -lifeline 9816 -checkpoint 15 -np
09:50:46:WU02:FS00:0xa7:             24
09:50:46:WU02:FS00:0xa7:************************************ CBang *************************************
09:50:46:WU02:FS00:0xa7:       Date: Oct 26 2019
09:50:46:WU02:FS00:0xa7:       Time: 01:38:25
09:50:46:WU02:FS00:0xa7:   Revision: c46a1a011a24143739ac7218c5a435f66777f62f
09:50:46:WU02:FS00:0xa7:     Branch: master
09:50:46:WU02:FS00:0xa7:   Compiler: Visual C++ 2008
09:50:46:WU02:FS00:0xa7:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
09:50:46:WU02:FS00:0xa7:   Platform: win32 10
09:50:46:WU02:FS00:0xa7:       Bits: 64
09:50:46:WU02:FS00:0xa7:       Mode: Release
09:50:46:WU02:FS00:0xa7:************************************ System ************************************
09:50:46:WU02:FS00:0xa7:        CPU: Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz
09:50:46:WU02:FS00:0xa7:     CPU ID: GenuineIntel Family 6 Model 63 Stepping 2
09:50:46:WU02:FS00:0xa7:       CPUs: 56
09:50:46:WU02:FS00:0xa7:     Memory: 511.75GiB
09:50:46:WU02:FS00:0xa7:Free Memory: 501.41GiB
09:50:46:WU02:FS00:0xa7:    Threads: WINDOWS_THREADS
09:50:46:WU02:FS00:0xa7: OS Version: 6.2
09:50:46:WU02:FS00:0xa7:Has Battery: false
09:50:46:WU02:FS00:0xa7: On Battery: false
09:50:46:WU02:FS00:0xa7: UTC Offset: 1
09:50:46:WU02:FS00:0xa7:        PID: 13804
09:50:46:WU02:FS00:0xa7:        CWD: C:\Users\OpDoubleHelix\AppData\Roaming\FAHClient\work
09:50:46:WU02:FS00:0xa7:******************************** Build - libFAH ********************************
09:50:46:WU02:FS00:0xa7:    Version: 0.0.18
09:50:46:WU02:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
09:50:46:WU02:FS00:0xa7:  Copyright: 2019 foldingathome.org
09:50:46:WU02:FS00:0xa7:   Homepage: https://foldingathome.org/
09:50:46:WU02:FS00:0xa7:       Date: Oct 26 2019
09:50:46:WU02:FS00:0xa7:       Time: 01:52:30
09:50:46:WU02:FS00:0xa7:   Revision: c1e3513b1bc0c16013668f2173ee969e5995b38e
09:50:46:WU02:FS00:0xa7:     Branch: master
09:50:46:WU02:FS00:0xa7:   Compiler: Visual C++ 2008
09:50:46:WU02:FS00:0xa7:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
09:50:46:WU02:FS00:0xa7:   Platform: win32 10
09:50:46:WU02:FS00:0xa7:       Bits: 64
09:50:46:WU02:FS00:0xa7:       Mode: Release
09:50:46:WU02:FS00:0xa7:************************************ Build *************************************
09:50:46:WU02:FS00:0xa7:       SIMD: avx_256
09:50:46:WU02:FS00:0xa7:********************************************************************************
09:50:46:WU02:FS00:0xa7:Project: 14649 (Run 788, Clone 0, Gen 50)
09:50:46:WU02:FS00:0xa7:Unit: 0x000000342879986c5e8e0e6cdab69b18
09:50:46:WU02:FS00:0xa7:Reading tar file core.xml
09:50:46:WU02:FS00:0xa7:Reading tar file frame50.tpr
09:50:46:WU02:FS00:0xa7:Digital signatures verified
09:50:46:WU02:FS00:0xa7:Calling: mdrun -s frame50.tpr -o frame50.trr -cpt 15 -nt 24
09:50:46:WU02:FS00:0xa7:Steps: first=0 total=250000
09:50:48:WU02:FS00:0xa7:Completed 1 out of 250000 steps (0%)
09:51:02:WU00:FS02:0xa7:Completed 220000 out of 500000 steps (44%)
09:51:37:WU02:FS00:0xa7:Completed 2500 out of 250000 steps (1%)
09:52:23:WU02:FS00:0xa7:Completed 5000 out of 250000 steps (2%)
09:52:42:WU00:FS02:0xa7:Completed 225000 out of 500000 steps (45%)
09:53:08:WU02:FS00:0xa7:Completed 7500 out of 250000 steps (3%)
09:53:54:WU02:FS00:0xa7:Completed 10000 out of 250000 steps (4%)
09:54:20:WU00:FS02:0xa7:Completed 230000 out of 500000 steps (46%)
09:54:40:WU02:FS00:0xa7:Completed 12500 out of 250000 steps (5%)
09:55:25:WU02:FS00:0xa7:Completed 15000 out of 250000 steps (6%)

Re: [SOLUTION] 3.21.157.11 assigned/connects/no download/han

Post by Neil-B »

And another case of this … So every time either of my slots is assigned to the 3.21.157.11 server it "hangs/stalls" until the connection is closed … This time the hanging slot was assigned immediately to another WS and continued without delay … The other slot had no such issues a few minutes earlier as it was assigned directly to another WS.
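
For anyone who wants to check their own log for the same pattern, here is a minimal Python sketch (not part of the FAH client). It assumes the log line layout seen in the tails in this thread (HH:MM:SS:WUxx:FSxx:message, with ERROR: inserted before the WU field on error lines) and treats a work-server connect on port 8080 that ends in "Exception: 10002", with no "Downloading" line in between, as a stall. The name find_stalls is my own, and the sample lines are copied from the log tail posted above.

```python
import re

# HH:MM:SS, optional "ERROR:", then WUxx, FSxx and the message text.
LINE = re.compile(r"^(\d\d):(\d\d):(\d\d):(?:ERROR:)?(WU\d+):(FS\d+):(.*)$")
CONNECT = re.compile(r"Connecting to (\d+\.\d+\.\d+\.\d+):(\d+)$")


def find_stalls(lines, port=8080):
    """Return (slot, server, seconds_waited) for each connect that ended in error 10002."""
    pending = {}  # (wu, slot) -> (server ip, connect time in seconds since midnight)
    stalls = []
    for raw in lines:
        m = LINE.match(raw.strip())
        if not m:
            continue  # not a per-slot client line
        hh, mm, ss, wu, slot, msg = m.groups()
        now = int(hh) * 3600 + int(mm) * 60 + int(ss)
        key = (wu, slot)
        c = CONNECT.match(msg)
        if c and int(c.group(2)) == port:
            pending[key] = (c.group(1), now)
        elif msg.startswith(("Downloading", "Upload ")):
            pending.pop(key, None)  # the server answered, so no stall
        elif msg.startswith("Exception: 10002") and key in pending:
            server, started = pending.pop(key)
            # The log has no dates, so wrap around midnight.
            stalls.append((slot, server, (now - started) % 86400))
    return stalls


SAMPLE = [
    "08:51:20:WU02:FS00:Connecting to assign1.foldingathome.org:80",
    "08:51:20:WU02:FS00:Assigned to work server 3.21.157.11",
    "08:51:20:WU02:FS00:Connecting to 3.21.157.11:8080",
    "09:47:11:ERROR:WU02:FS00:Exception: 10002: Received short response, expected 512 bytes, got 0",
    "09:47:11:WU02:FS00:Connecting to 3.21.157.11:8080",
    "09:49:06:ERROR:WU02:FS00:Exception: 10002: Received short response, expected 512 bytes, got 0",
    "09:49:07:WU02:FS00:Connecting to 3.21.157.11:8080",
    "09:49:27:ERROR:WU02:FS00:Exception: 10002: Received short response, expected 512 bytes, got 0",
    "09:50:44:WU02:FS00:Connecting to 40.121.152.108:8080",
    "09:50:44:WU02:FS00:Downloading 2.82MiB",
]

for slot, server, waited in find_stalls(SAMPLE):
    print(f"{slot} waited {waited}s on {server}")
```

Note that this only reports stalls after the connection has been closed and the error logged; a slot that is still hanging shows nothing after its "Connecting to" line, so the sketch cannot see it yet.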

Re: [SOLUTION] 3.21.157.11 assigned/connects/no download/han

Post by Neil-B »

And another case of this … same issue, same solution :)

I'm guessing I am being assigned to that WS to pick up the various Beta WUs on it: p14702, p14801 or p14802 WUs (and possibly p14701, p14703, p14704, p14803, p14804 if they have been released to Beta yet) … I have folded some of these in the recent past but not since this issue started last evening/night … I'll post a "heads up" in Beta re potential issue with server impacting those projects.
Kebast
Posts: 386
Joined: Thu Aug 06, 2015 5:21 pm

Re: [SOLUTION] 3.21.157.11 assigned/connects/no download/han

Post by Kebast »

Woke up this morning to see that my CPU slot had been stuck on this for about 7 hours :(
Image
Ryzen 5900x 12T - RTX 4070 TI

Re: [SOLUTION] 3.21.157.11 assigned/connects/no download/han

Post by Neil-B »

If you are using Windows … grab TCPView from the MS site and use it to close the connection … this is the simplest way to get the slot running again without impacting other slots … there will be something similar for Linux but I am really not the best person to advise on that … Have PM'd vvoelz to let him know his projects are being impacted.

Be prepared to check regularly until this is resolved - I am getting assigned to this server maybe 20%+ of the time.
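
If you would rather spot the stuck connection from a script before opening TCPView, a rough Python sketch follows. It only finds the connection, it does not close it. It assumes the column layout that Windows `netstat -ano` prints (Proto, Local Address, Foreign Address, State, PID); the function names are my own, and the local address and PID in the sample text are made up for illustration.

```python
import subprocess


def established_to(netstat_text, remote_ip, remote_port=8080):
    """Return (local, remote, pid) for ESTABLISHED TCP rows to remote_ip:remote_port."""
    target = f"{remote_ip}:{remote_port}"
    hits = []
    for line in netstat_text.splitlines():
        parts = line.split()
        # Skip the banner, the header row and anything that is not a TCP row.
        if len(parts) < 4 or parts[0] != "TCP":
            continue
        if parts[2] == target and parts[3] == "ESTABLISHED":
            pid = parts[4] if len(parts) > 4 else None
            hits.append((parts[1], parts[2], pid))
    return hits


def current_netstat():
    """Run netstat on this machine (Windows column layout assumed)."""
    result = subprocess.run(
        ["netstat", "-ano"], capture_output=True, text=True, check=True
    )
    return result.stdout


# Hypothetical sample output; the addresses on the left and the PIDs are invented.
SAMPLE = """
Active Connections

  Proto  Local Address          Foreign Address        State           PID
  TCP    192.168.1.10:50123     3.21.157.11:8080       ESTABLISHED     13716
  TCP    192.168.1.10:50200     40.121.152.108:8080    TIME_WAIT       0
"""

for local, remote, pid in established_to(SAMPLE, "3.21.157.11"):
    print(f"{local} -> {remote} (PID {pid})")
```

On a live Windows machine you would pass `current_netstat()` instead of `SAMPLE`. A row that stays ESTABLISHED to the problem server while the slot shows no download is the one to close in TCPView.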

Re: [SOLUTION] 3.21.157.11 assigned/connects/no download/han

Post by Neil-B »

And another case of this … same issue, same solution - reassigned to same server, same issue, same solution - on second closed connection AS offered a different server … "YaY" :) … Going to start setting an alarm clock to go off when each slot is due to finish I think :)
HaloJones
Posts: 920
Joined: Thu Jul 24, 2008 10:16 am

Re: [SOLUTION] 3.21.157.11 assigned/connects/no download/han

Post by HaloJones »

Neil-B wrote:And another case of this … same issue, same solution - reassigned to same server, same issue, same solution - on second closed connection AS offered a different server … "YaY" :) … Going to start setting an alarm clock to go off when each slot is due to finish I think :)
It's funny how you keep getting sent there. I'm convinced the AS "knows" the clients and sticks them to particular WS. So one machine might always get 134xx, another 144xx, etc.
single 1070


Re: [SOLUTION] 3.21.157.11 assigned/connects/no download/han

Post by Neil-B »

I have relatively high CPU-count slots and the beta flag set … There are a series of Projects in Beta at the moment hosted off that server so I would hope to be sent there (and was being quite a lot) prior to this novel feature revealing itself last night … depending on the WUs' "size" I return something between 15 and 30 WUs a day (sometimes more) - so even if there are say 6 main groups of CPU WUs in Beta at any one time I'm likely to hit each group (and their server) four or five times a day … At the moment I think there are fewer than that in true Beta so I am probably only trying to hit 3/4 servers for Beta WUs at the moment and this is one of them.
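
As a back-of-envelope check on those numbers, here is a tiny Python sketch. It assumes assignments are spread evenly across the Beta servers, which is my simplification and not something the AS is known to do; the function name is made up for illustration.

```python
def hits_per_server(wus_per_day, servers):
    """Average assignments per server per day if WUs are spread evenly."""
    return wus_per_day / servers


# 15 to 30 WUs a day, spread over 6, 4 or 3 servers.
for servers in (6, 4, 3):
    low = hits_per_server(15, servers)
    high = hits_per_server(30, servers)
    print(f"{servers} servers: {low:.2f} to {high:.2f} assignments per server per day")
```

Under that even-spread assumption, "four or five times a day" sits at the upper end for 6 groups, and the fewer servers there are in Beta, the more often each one gets hit.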

Re: [SOLUTION] 3.21.157.11 assigned/connects/no download/han

Post by Neil-B »

And another case … caught it as it happened before previous WU had completed as had alarm set :) … closed connection and immediately reassigned to same server - closed connection again so into retry timer which ticked off and AS assigned different WS just in time to download as previous WU actually finished !! - Continuous folding with manual error correction on the fly - Result :)

Now to set next alarm … 35 minutes and counting