
Re: 140.163.4.200

Posted: Tue Jan 05, 2021 10:30 pm
by TristanChen
Really appreciate your kind words and encouragement!
On this laptop alone, there are currently 3 completed work units waiting to be sent to 140.163.4.210.
Work unit 1: 40 attempts
Work unit 2: 47 attempts
Work unit 3: 18 attempts
And I've got 9 other full-sized rigs...

To me this is worse than what happened in April. Because if I simply cannot get work units (as was the case back then), I can always repurpose my rigs to do other useful science like BOINC/Rosetta. But collection server issues essentially destroy completed work, at which point I feel no better than miners adding to global warming... :(

Re: 140.163.4.200

Posted: Tue Jan 05, 2021 10:54 pm
by Joe_H
Please clarify, since this topic is about 140.163.4.200. Are these WUs from WS 140.163.4.200 that are failing to upload both to the WS and to 140.163.4.210 as its alternate CS? Do WUs to other servers upload without problems? Your two posts in this topic are too ambiguous to help get this fixed.

Re: 140.163.4.200

Posted: Thu Jan 07, 2021 1:04 pm
by TristanChen
Hi Joe,
Thanks for getting back to me. The issue appears to be with both 140.163.4.200 and 140.163.4.210. That is, my clients first try to upload the completed work units to one server (e.g. 140.163.4.200) and fail, and then on the next attempt they try the other server (140.163.4.210) and fail again.

I do not see this issue with any other servers. All of my failed uploads are to these two servers. I successfully uploaded 75 completed work units yesterday and 16 so far today.
https://folding.extremeoverclocking.com ... =&u=724183

Right now (I'm on one of my other computers), this desktop has 4 completed work units that cannot be uploaded, with 47, 39, 25, and 9 retries respectively. All of them were received from work server 140.163.4.200 and all of them list 140.163.4.210 as the collection server.

Hope this helps to get this issue resolved! It is a real pain point for me and I'm sure a lot of other folders. Thank you!

Re: 140.163.4.200

Posted: Thu Jan 07, 2021 1:16 pm
by TristanChen
If I might offer a suggestion: if this server is consistently having trouble receiving completed work, can we at least take it offline and not let it issue new work until the issue is resolved? Thanks!

Re: 140.163.4.200

Posted: Thu Jan 07, 2021 2:02 pm
by Neil-B
If all upload attempts from all folders are failing I am sure this course of action will be considered ... but by taking down the server all work of the projects hosted on it ceases - if it is only a percentage (even quite a large one) who are having issues uploading - or if it is just slow/delayed returns that is still better for the projects concerned than nothing? ... these things are tough calls but I am sure all options are being considered

Re: 140.163.4.200

Posted: Thu Jan 07, 2021 5:40 pm
by bruce
Neil-B wrote: If all upload attempts from all folders are failing I am sure this course of action will be considered ... but by taking down the server all work of the projects hosted on it ceases - if it is only a percentage (even quite a large one) who are having issues uploading - or if it is just slow/delayed returns that is still better for the projects concerned than nothing?
The likelihood is that it's all work issued by either of those servers. It may be a campus security policy issue that needs an exception re-enabled, or it might be a disk-full issue. I've attempted to notify the appropriate admins.

Re: 140.163.4.200

Posted: Sat Jan 09, 2021 3:09 am
by TristanChen
Just providing an update. The issues are still unresolved. On this desktop alone (1 of 9 for me), I currently have 6 completed work units awaiting upload. All are from work server 140.163.4.200 and all list 140.163.4.210 as the collection server. Uploads to both servers are still failing. The oldest of the work units is now 3 days old and about to expire.

Please... Just turn this server off...

Re: 140.163.4.200

Posted: Sat Jan 09, 2021 3:56 am
by Joe_H
Only a few people are having this kind of problem with the server; the vast majority are not having upload problems. The issue is being looked into, but since you have provided no information on which WUs are not being uploaded, both we and they are limited in how much can be checked.

Have you checked any of the WUs with the status app (https://apps.foldingathome.org/wu) to see whether they were uploaded? If they show as uploaded for you, it is possible the ACK packet from the server is not getting back to your system to start the cleanup process after a successful upload. If that is the case, the client will retry until it gets an "Already got" message or the WU reaches the final deadline.
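
To make the retry behaviour described above concrete, here is a minimal sketch in Python. It is not the client's actual code: the function name `retry_upload`, the `send` callback, the "ACK" reply string, and modelling the final deadline as a maximum number of attempts are all assumptions made for illustration. Only the "Already got" message comes from the description above.

```python
from typing import Callable, Optional


def retry_upload(send: Callable[[], Optional[str]],
                 attempts_before_deadline: int) -> str:
    """Hypothetical model of the retry loop (not FAH client code).

    `send` performs one upload attempt and returns the server's reply,
    or None when no reply reaches the client (for example a lost ACK).
    The deadline is modelled as a fixed number of attempts (assumption).
    """
    for attempt in range(1, attempts_before_deadline + 1):
        reply = send()
        if reply == "ACK":
            # Normal case: the acknowledgement arrived, cleanup can start.
            return f"cleaned up after attempt {attempt}"
        if reply == "Already got":
            # The server already holds the result; an earlier ACK was lost.
            return f"server already had the WU (attempt {attempt})"
        # No usable reply: fall through and try again.
    return "deadline reached, WU dumped"
```

For example, if the first two replies are lost and the third is "Already got", the loop ends on attempt 3 without the WU being dumped. If no reply ever arrives, the WU is dropped once the deadline passes, which would match a WU expiring while still awaiting upload.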

Re: 140.163.4.200

Posted: Sat Jan 09, 2021 6:25 am
by TristanChen
Hi Joe,

The 5 remaining completed/stuck work units on this computer are (one expired a few hours ago and was dumped):
17321 (0, 705, 41) - 15 retries, WS: 140.163.4.200, CS: 140.163.4.210
17322 (0, 1573, 58) - 43 retries, WS: 140.163.4.200, CS: 140.163.4.210
17317 (0, 1789, 69) - 43 retries, WS: 140.163.4.200, CS: 140.163.4.210
17315 (0, 989, 66) - 22 retries, WS: 140.163.4.200, CS: 140.163.4.210
17315 (0, 310, 69) - 39 retries, WS: 140.163.4.200, CS: 140.163.4.210

I checked the status app website that you provided, and it appears that four of the five work units above have never been successfully completed (by anyone). Only one of them, 17317 (0, 1789, 69), appears to have been successfully completed.

Hope this helps!

Re: 140.163.4.200

Posted: Sat Jan 09, 2021 6:47 am
by TristanChen
I've jumped onto another desktop and on this machine the stuck/completed units are:
17319 (0, 1687, 61) - 40 attempts, WS: 140.163.4.200, CS: 140.163.4.210
17322 (0, 650, 54) - 44 attempts, WS: 140.163.4.200, CS: 140.163.4.210
17315 (0, 1125, 66) - 33 attempts, WS: 140.163.4.200, CS: 140.163.4.210
17321 (0, 1156, 39) - 21 attempts, WS: 140.163.4.200, CS: 140.163.4.210

Unfortunately, it appears that the status app website could not find any of the four work units. Not sure why...
Let me know if you need me to post more faulty WUs. I've got at least 20 more just like them on other machines.

Just want to reiterate that I'm not really a casual folder... I completed something like 20,000 work units last year... and successfully uploaded 62 work units yesterday to other servers. https://folding.extremeoverclocking.com ... =&u=724183

Re: 140.163.4.200

Posted: Sat Jan 09, 2021 8:19 am
by rafwiewiora
Hey TristanChen,

Just letting you know I'm working with Joseph, the F@h dev, on this. Something's definitely up: I've got download problems of my own on the same network! Will keep you updated.

Re: 140.163.4.200

Posted: Thu Jan 14, 2021 7:59 pm
by rafwiewiora
So my problem turned out to be something completely different. Since it's just you and another person on Discord complaining, and historically we've never been able to work out why a couple of people start seeing problems from time to time, there's really nothing I can do on our side to fix it. So I suggest a simple solution: 'banning' you from these servers so you'll get work from others. If you could DM me your username and IP, I can get that done.

Re: 140.163.4.200

Posted: Sat Jan 16, 2021 6:41 am
by TristanChen
Great! Just sent you the PM. Let me know if you don't see it.

Re: 140.163.4.200

Posted: Tue Jan 19, 2021 2:11 am
by TristanChen
Just to give an update on this issue. The situation seems to have gotten even worse for me. As of this morning, on this desktop, I now have 7 stuck/completed work units.
Why am I still getting work from this work server? Was I just banned from the collection servers?!

17317 (0, 1235, 96) - 11 attempts - WS: 140.163.4.200, CS: 140.163.4.210
17318 (0, 627, 156) - 21 attempts - WS: 140.163.4.200, CS: 140.163.4.210
17319 (0, 1678, 88) - 27 attempts - WS: 140.163.4.200, CS: 140.163.4.210
17323 (0, 205, 9) - 63 attempts - WS: 140.163.4.200, CS: 140.163.4.210
17322 (0, 1062, 92) - 46 attempts - WS: 140.163.4.200, CS: 140.163.4.210
17320 (0, 1992, 33) - 13 attempts - WS: 140.163.4.200, CS: 140.163.4.210
17322 (0, 466, 84) - 39 attempts - WS: 140.163.4.200, CS: 140.163.4.210
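
For anyone collating reports like the list above, here is a minimal Python sketch that turns each line into a structured record. It assumes the numbers in parentheses follow the usual project (run, clone, gen) order; the names `StuckWU` and `parse_stuck_wu` are made up for this example. It accepts both "retries" and "attempts", and either a comma or a dash before "WS:", since both styles appear in this thread.

```python
import re
from typing import NamedTuple, Optional


class StuckWU(NamedTuple):
    project: int
    run: int      # assumed order: (run, clone, gen)
    clone: int
    gen: int
    attempts: int
    ws: str       # work server address
    cs: str       # collection server address


_PATTERN = re.compile(
    r"(\d+)\s*\((\d+),\s*(\d+),\s*(\d+)\)\s*-\s*(\d+)\s+(?:retries|attempts)"
    r"\s*[,-]\s*WS:\s*([\d.]+),\s*CS:\s*([\d.]+)"
)


def parse_stuck_wu(line: str) -> Optional[StuckWU]:
    """Parse one report line; return None if the line does not match."""
    m = _PATTERN.search(line)
    if m is None:
        return None
    p, r, c, g, n, ws, cs = m.groups()
    return StuckWU(int(p), int(r), int(c), int(g), int(n), ws, cs)
```

Feeding it the line "17323 (0, 205, 9) - 63 attempts - WS: 140.163.4.200, CS: 140.163.4.210" gives project 17323, run 0, clone 205, gen 9 and 63 attempts, which makes it easy to sort or group the stuck units by project or server.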

Re: 140.163.4.200

Posted: Tue Jan 19, 2021 3:02 am
by bruce
Choose a couple of representative WUs and post the segment of FAH's log showing a few upload failures and the nearby messages.
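
One way to pull such a segment out of a long log is a small filter like the Python sketch below. It only assumes the log is a plain text file and does not rely on any particular FAH log format: it keeps every line that mentions a search string (a server address, for instance) plus a few lines of context on either side. The function name `log_excerpt` and the file name in the usage comment are placeholders.

```python
from typing import Iterable, List


def log_excerpt(lines: Iterable[str], needle: str, context: int = 3) -> List[str]:
    """Return lines containing `needle` plus `context` lines around each hit."""
    rows = [ln.rstrip("\n") for ln in lines]
    keep = set()
    for i, row in enumerate(rows):
        if needle.lower() in row.lower():
            # Remember the matching line and its neighbours.
            keep.update(range(max(0, i - context),
                              min(len(rows), i + context + 1)))
    # Emit the kept lines in their original order, without duplicates.
    return [rows[i] for i in sorted(keep)]


# Usage (the path is a placeholder for wherever your log lives):
#   with open("log.txt", encoding="utf-8", errors="replace") as fh:
#       print("\n".join(log_excerpt(fh, "140.163.4.200")))
```

Overlapping context windows are merged, so repeated failures close together come out as one continuous excerpt that can be pasted into a post.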