[Bug] Client got stuck after initiating connection to WS

Postby Frogging101 » Sat Apr 18, 2020 5:30 pm

My FAHClient has been stuck since 21:32:00 UTC, yesterday (April 17, 2020).

The last lines in the log were as follows:
Code:
21:32:00:WU00:FS00:Connecting to 18.218.241.186:80
21:32:00:WU00:FS00:Assigned to work server 13.90.152.57
21:32:00:WU00:FS00:Requesting new work unit for slot 00: READY cpu:24 from 13.90.152.57
21:32:00:WU00:FS00:Connecting to 13.90.152.57:8080


Upon inspection, it appears that this connection is still in the ESTABLISHED state, some 15 hours later:

Code:
Netid  State      Recv-Q Send-Q Local Address:Port                 Peer Address:Port               
tcp    ESTAB      0      0      10.176.100.154:38750              13.90.152.57:8080                users:(("FAHClient",pid=30164,fd=12))


I would venture a theory that this connection is not actually still active; the connection simply died (without an RST) at a specific time when the client was waiting for a response, which it will never receive.
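
To illustrate the theory above, here is a minimal Python sketch of why a read with no timeout can wait forever on a peer that has gone silent, and how a read timeout bounds that wait. This is not FAHClient's code; the names `silent_server`, `read_with_timeout`, and `demo` are hypothetical, and the local test server only stands in for a work server that stops responding.

```python
import socket
import threading

def silent_server(listener, stop):
    """Accept one connection and never reply, mimicking a peer that went quiet."""
    conn, _ = listener.accept()
    stop.wait()   # hold the connection open without sending anything
    conn.close()

def read_with_timeout(host, port, timeout_s):
    """Return the reply bytes, or None if nothing arrives within timeout_s."""
    with socket.create_connection((host, port), timeout=timeout_s) as sock:
        # Also ask the kernel to probe an idle peer (probe timing is OS-defined).
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
        sock.sendall(b"request\n")
        try:
            return sock.recv(4096)   # without a timeout this call could block forever
        except socket.timeout:
            return None

def demo():
    """Run the client against a local server that never answers."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    listener.listen(1)
    stop = threading.Event()
    worker = threading.Thread(target=silent_server, args=(listener, stop), daemon=True)
    worker.start()
    port = listener.getsockname()[1]
    result = read_with_timeout("127.0.0.1", port, 0.5)
    stop.set()
    worker.join(timeout=2)
    listener.close()
    return result
```

With a timeout, the caller gets control back and can retry or pick another server. Keepalive on its own would only detect the dead peer after the operating system's probe interval, which is often hours by default.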

I could not clear this condition by pausing/unpausing, or using the request-id or request-ws commands. request-ws did connect to an AS and get an "Assigned to work server 150.136.14.110" message, but nothing else happened after that. The original socket to 13.90.152.57:8080 remained open in the ESTABLISHED state throughout these attempts to jog the client.

The FAHControl UI continued to display 13.90.152.57 as the work server, with no next attempt. Here's the full queue-info output:
Code:
  {"id": "00", "state": "DOWNLOAD", "error": "NO_ERROR", "project": 0, "run": 0, "clone": 0, "gen": 0, "core": "unknown", "unit": "0x00000000000000000000000000000000", "percentdone": "0.00%", "eta": "0.00 secs", "ppd": "0", "creditestimate": "0", "waitingon": "", "nextattempt": "0.00 secs", "timeremaining": "unknown time", "totalframes": 0, "framesdone": 0, "assigned": "<invalid>", "timeout": "<invalid>", "deadline": "<invalid>", "ws": "13.90.152.57", "cs": "0.0.0.0", "attempts": 0, "slot": "00", "tpf": "0.00 secs", "basecredit": "0"}                             
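
For anyone who wants to spot this condition from a script, here is a rough Python sketch that checks a queue-info entry for the combination of fields shown above. The function name `looks_stalled` and the choice of fields are my own assumptions, not an official diagnostic, and the sample is a shortened copy of the entry in this post.

```python
import json

def looks_stalled(entry):
    """Heuristic: slot is in DOWNLOAD, has made no attempts, and has no retry scheduled."""
    return (
        entry.get("state") == "DOWNLOAD"
        and entry.get("attempts") == 0
        and entry.get("nextattempt") == "0.00 secs"
        and entry.get("assigned") == "<invalid>"
    )

# Shortened copy of the queue-info entry from this report.
sample = (
    '{"id": "00", "state": "DOWNLOAD", "error": "NO_ERROR", '
    '"nextattempt": "0.00 secs", "assigned": "<invalid>", '
    '"ws": "13.90.152.57", "attempts": 0, "slot": "00"}'
)

entry = json.loads(sample)
stalled = looks_stalled(entry)
```

A healthy slot can look like this for a moment while a download is starting, so the check only means something if the same values persist for a long time.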


I had to restart to get it folding again. Sending SIGINT once did not close the client; the 13.90.152.57:8080 socket remained open. I had to send it again to force exit.
Frogging101
 
Posts: 66
Joined: Wed Mar 25, 2020 3:39 am
Location: Canada

Re: [Bug] Client got stuck after initiating connection to WS

Postby Jan » Sat Apr 18, 2020 8:54 pm

I know it sounds like a bit of a weird solution, but here is a report that might help. If you are already on the newest client, 7.6.9, maybe try a reboot. If that doesn't work, here is another approach.
Jan
 
Posts: 80
Joined: Tue Mar 31, 2020 7:46 pm

Re: [Bug] Client got stuck after initiating connection to WS

Postby PantherX » Sat Apr 18, 2020 10:46 pm

It seems that you have encountered this known bug: https://github.com/FoldingAtHome/fah-issues/issues/983

I will add this link too.
PantherX
Site Moderator
 
Posts: 6345
Joined: Wed Dec 23, 2009 10:33 am
Location: Land Of The Long White Cloud

