[Bug] Client got stuck after initiating connection to WS

Moderators: Site Moderators, FAHC Science Team

Frogging101
Posts: 85
Joined: Wed Mar 25, 2020 2:39 am
Location: Canada

Post by Frogging101 »

My FAHClient has been stuck since 21:32:00 UTC yesterday (April 17, 2020).

The last lines in the log were as follows:

Code:

21:32:00:WU00:FS00:Connecting to 18.218.241.186:80
21:32:00:WU00:FS00:Assigned to work server 13.90.152.57
21:32:00:WU00:FS00:Requesting new work unit for slot 00: READY cpu:24 from 13.90.152.57
21:32:00:WU00:FS00:Connecting to 13.90.152.57:8080
Upon inspection, it appears that this connection is still in the ESTABLISHED state, some 15 hours later:

Code:

Netid  State      Recv-Q Send-Q Local Address:Port                 Peer Address:Port                
tcp    ESTAB      0      0      10.176.100.154:38750              13.90.152.57:8080                users:(("FAHClient",pid=30164,fd=12))
My theory is that this connection is no longer actually alive: it died silently (without an RST) while the client was waiting for a response, which it will now never receive. The client apparently has no read timeout on this socket, so it waits indefinitely.
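
To illustrate the theory, here is a minimal Python sketch of the general failure mode and two common guards against it. This is not FAHClient's code, and the function names `enable_keepalive` and `recv_with_deadline` are made up for the example. A blocking read on a socket with no timeout waits forever if the peer vanishes without sending an RST or FIN; a read deadline or TCP keepalive lets the application notice.

```python
import socket

def enable_keepalive(sock, idle_s=60, interval_s=10, probes=5):
    """Ask the kernel to probe an idle connection so a dead peer is eventually detected."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The fine-grained knobs are platform-specific (available on Linux),
    # so only set the ones this Python build exposes.
    for name, value in (
        ("TCP_KEEPIDLE", idle_s),       # seconds of idleness before the first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between probes
        ("TCP_KEEPCNT", probes),        # failed probes before the connection is dropped
    ):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)

def recv_with_deadline(sock, nbytes, timeout_s):
    """Return received bytes, or None if nothing arrives within timeout_s."""
    sock.settimeout(timeout_s)
    try:
        return sock.recv(nbytes)
    except socket.timeout:
        # No data in time: the caller can close the socket and retry elsewhere.
        return None
```

With a deadline in place, a silent peer produces `None` after `timeout_s` seconds instead of a hang, so the caller can close the socket and ask for a new work server.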

I could not clear this condition by pausing/unpausing, or using the request-id or request-ws commands. request-ws did connect to an AS and get an "Assigned to work server 150.136.14.110" message, but nothing else happened after that. The original socket to 13.90.152.57:8080 remained open in the ESTABLISHED state throughout these attempts to jog the client.

The FAHControl UI continued to display 13.90.152.57 as the work server, with no next attempt. Here's the full queue-info output:

Code:

  {"id": "00", "state": "DOWNLOAD", "error": "NO_ERROR", "project": 0, "run": 0, "clone": 0, "gen": 0, "core": "unknown", "unit": "0x00000000000000000000000000000000", "percentdone": "0.00%", "eta": "0.00 secs", "ppd": "0", "creditestimate": "0", "waitingon": "", "nextattempt": "0.00 secs", "timeremaining": "unknown time", "totalframes": 0, "framesdone": 0, "assigned": "<invalid>", "timeout": "<invalid>", "deadline": "<invalid>", "ws": "13.90.152.57", "cs": "0.0.0.0", "attempts": 0, "slot": "00", "tpf": "0.00 secs", "basecredit": "0"}                             
I had to restart to get it folding again. Sending SIGINT once did not close the client; the 13.90.152.57:8080 socket remained open. I had to send it again to force exit.
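
The queue-info entry above suggests a rough way to spot this condition from a script. The sketch below is only a heuristic under stated assumptions: `looks_stuck` is a hypothetical helper, and it assumes that a slot in the `DOWNLOAD` state with zero attempts, no scheduled next attempt, and no valid assignment time is hung. A download that has only just started would look the same for a moment, so a real check would also need to see the state persist over time.

```python
import json

def looks_stuck(entry):
    """Heuristic: slot is downloading but has no attempts, no retry scheduled, no assignment."""
    return (
        entry.get("state") == "DOWNLOAD"
        and entry.get("attempts") == 0
        and entry.get("nextattempt") == "0.00 secs"
        and entry.get("assigned") == "<invalid>"
    )

# Abbreviated copy of the queue-info entry quoted in the post.
sample = json.loads(
    '{"id": "00", "state": "DOWNLOAD", "attempts": 0, '
    '"nextattempt": "0.00 secs", "assigned": "<invalid>", "ws": "13.90.152.57"}'
)
print(looks_stuck(sample))
```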
Jan
Posts: 80
Joined: Tue Mar 31, 2020 6:46 pm

Re: [Bug] Client got stuck after initiating connection to WS

Post by Jan »

I know it sounds like a bit of a weird solution, but here is a report that might help. If you are already on the newest client, 7.6.9, maybe try a reboot. If that doesn't work, here is another approach.
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am
Location: Land Of The Long White Cloud

Re: [Bug] Client got stuck after initiating connection to WS

Post by PantherX »

It seems that you have encountered this known bug: https://github.com/FoldingAtHome/fah-issues/issues/983

I will add this link too.