Odd upload failure

billford
Posts: 1005
Joined: Thu May 02, 2013 8:46 pm
Hardware configuration: Full Time:

2x NVidia GTX 980
1x NVidia GTX 780 Ti
2x 3GHz Core i5 PC (Linux)

Retired:

3.2GHz Core i5 PC (Linux)
3.2GHz Core i5 iMac
2.8GHz Core i5 iMac
2.16GHz Core 2 Duo iMac
2GHz Core 2 Duo MacBook
1.6GHz Core 2 Duo Acer laptop
Location: Near Oxford, United Kingdom

Odd upload failure

Post by billford »

Log extract:


******************************* Date: 2014-11-22 *******************************
.
.
09:28:50:WU02:FS00:0xa4:- Shutting down core
09:28:50:WU02:FS00:0xa4:
09:28:50:WU02:FS00:0xa4:Folding@home Core Shutdown: FINISHED_UNIT
09:29:01:WU02:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
09:29:02:WU02:FS00:Sending unit results: id:02 state:SEND error:NO_ERROR project:9015 run:467 clone:2 gen:51 core:0xa4 unit:0x00000042664f2de453e55e1e332f655a
09:29:02:WU02:FS00:Uploading 1.64MiB to 171.64.65.124
09:29:02:WU02:FS00:Connecting to 171.64.65.124:8080

And there it stayed. If the server responded (and I suspect it did), the client didn't see it, but there was no timeout and the client didn't go into the usual retry-at-increasing-intervals sequence. (It carried on as usual with a newly downloaded WU, of course.)

I only noticed it during this morning's check of the clients. After a reboot it connected to 171.64.65.124 and uploaded the WU with no bother.

I'm fairly sure it was caused by some sort of OS/client/network problem at my end, but I'd be interested in comments from others.

In particular, is this what I would expect from the known bug where FAH doesn't recover gracefully from a loss of internet connection during upload? I haven't had that happen before.

Though I should add that if there were such a break it would have been very brief; there's no sign of it in the router log or my monitoring app.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Odd upload failure

Post by bruce »

Odd is a good word for it.
It's not clear from your log whether or not a connection was ever established with that server. In all the other reports of Internet interruptions that I've seen, there were % reports, so it was clear the connection was established and then interrupted, followed by a hang. Maybe this is the same bug and maybe not.

Restarting the client is still the only known recovery. The WU upload should restart.
davidcoton
Posts: 1102
Joined: Wed Nov 05, 2008 3:19 pm
Location: Cambridge, UK

Re: Odd upload failure

Post by davidcoton »

billford wrote: In particular, is this what I would expect from the known bug where FAH doesn't recover gracefully from a loss of internet connection during upload? I haven't had that happen before.
While it is not entirely clear, I think yes. What seems to happen is that the ACK packet (very small) gets lost en route from server to client, and the link just hangs. The logs do not reveal enough to be certain about what happened, but the programmers ought to be able to solve it when it gets enough priority to be looked at. (Disclaimer: I haven't looked at the code. I could be completely wrong :twisted: )
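
For illustration only, here is a minimal Python sketch of the general idea discussed above: put a timeout on the wait for the server's reply so a lost ACK cannot hang the upload forever, then retry at increasing intervals. This is not the FAH client's code, which nobody in this thread has inspected. The function names (`backoff_delays`, `send_with_ack`, `upload_with_retry`), the 30-second timeout, and the delay values are all assumptions made up for the example.

```python
import socket
import time


def backoff_delays(base=1.0, factor=2.0, cap=60.0, attempts=5):
    """Return the wait in seconds before each retry, growing by `factor` up to `cap`."""
    return [min(cap, base * factor ** i) for i in range(attempts)]


def send_with_ack(host, port, payload, timeout=30.0):
    """Send `payload` and wait for a reply; raise instead of hanging if none arrives."""
    # The timeout passed here stays set on the socket, so it also bounds
    # the later sendall() and recv() calls, not just the connect.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
        reply = sock.recv(1024)  # raises socket.timeout if no reply in time
        if not reply:
            raise ConnectionError("server closed the connection without replying")
        return reply


def upload_with_retry(send, delays, sleep=time.sleep):
    """Call send(); on a network error wait the next delay and try again."""
    last_error = None
    for delay in [0.0] + list(delays):
        if delay:
            sleep(delay)
        try:
            return send()
        except OSError as err:  # socket.timeout and ConnectionError are OSError subclasses
            last_error = err
    raise last_error
```

A caller would wrap the two together, for example `upload_with_retry(lambda: send_with_ack(host, port, data), backoff_delays())`, where `host`, `port` and `data` are whatever the application supplies. The point of the design is that a silent link turns into an ordinary error that the retry loop can handle, rather than a wait with no end.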
billford
Posts: 1005
Joined: Thu May 02, 2013 8:46 pm

Re: Odd upload failure

Post by billford »

Thanks both. Looks like I'll have to put it down as another of life's little mysteries.

And remember to check the clients a bit more often. It was so late it only got 530 points; I probably lost more than that with the reboot and the 780 Ti in the same machine having to restart from a checkpoint :shock: