140.163.4.200

Moderators: Site Moderators, FAHC Science Team

TristanChen
Posts: 21
Joined: Tue May 30, 2017 4:55 am

Re: 140.163.4.200

Post by TristanChen »

Really appreciate your kind words and encouragement!
On this laptop alone, there are currently 3 completed work units waiting to be sent to 140.163.4.210.
Work unit 1: 40 attempts
Work unit 2: 47 attempts
Work unit 3: 18 attempts
And I've got 9 other full-sized rigs...

To me this is worse than what happened in April. If I simply cannot get work units (as was the case back then), I can always repurpose my rigs to do other useful science like BOINC/Rosetta. But collection server issues essentially destroy completed work, at which point I feel no better than miners adding to global warming... :(
Joe_H
Site Admin
Posts: 7856
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: 140.163.4.200

Post by Joe_H »

Please clarify, since this topic is about 140.163.4.200: are these WUs from WS 140.163.4.200 that are failing to upload to both the WS and 140.163.4.210 as its alternate CS? Do WUs to other servers upload without problems? Your two posts in this topic are too ambiguous to help us get this fixed.
iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
TristanChen
Posts: 21
Joined: Tue May 30, 2017 4:55 am

Re: 140.163.4.200

Post by TristanChen »

Hi Joe,
Thanks for getting back to me. The issue appears to be with both 140.163.4.200 and 140.163.4.210. That is, my clients first try to upload the completed work units to one server (e.g. 140.163.4.200) and fail, and then on the next attempt they try the other server (140.163.4.210) and fail again.

I do not see this issue with any other servers. All of my failed uploads are to these two servers. I successfully uploaded 75 completed work units yesterday and 16 so far today.
https://folding.extremeoverclocking.com ... =&u=724183

I'm on one of my other computers now, and on this desktop I have 4 completed work units that cannot be uploaded, with 47, 39, 25, and 9 retries respectively. All of them were received from work server 140.163.4.200 and all of them list 140.163.4.210 as the collection server.

Hope this helps to get this issue resolved! It is a real pain point for me and I'm sure a lot of other folders. Thank you!
TristanChen
Posts: 21
Joined: Tue May 30, 2017 4:55 am

Re: 140.163.4.200

Post by TristanChen »

If I might offer a suggestion: if this server is consistently having trouble receiving completed work, can we at least take it offline and not let it issue new work until the issue is resolved? Thanks!
Neil-B
Posts: 2027
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: 140.163.4.200

Post by Neil-B »

If all upload attempts from all folders are failing, I am sure this course of action will be considered. But taking down the server stops all work on the projects hosted on it. If only a percentage of folders (even quite a large one) are having issues uploading, or if returns are just slow or delayed, that is still better for the projects concerned than nothing. These things are tough calls, but I am sure all options are being considered.
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070

bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 140.163.4.200

Post by bruce »

Neil-B wrote:
If all upload attempts from all folders are failing, I am sure this course of action will be considered. But taking down the server stops all work on the projects hosted on it. If only a percentage of folders (even quite a large one) are having issues uploading, or if returns are just slow or delayed, that is still better for the projects concerned than nothing.
The likelihood is that it's all work issued by either of those servers. It may be a campus security policy issue that needs an exception re-enabled, or it might be a disk-full issue. I've attempted to notify the appropriate admins.
TristanChen
Posts: 21
Joined: Tue May 30, 2017 4:55 am

Re: 140.163.4.200

Post by TristanChen »

Just providing an update: the issues are still unresolved. On this desktop alone (1 of 9 for me), I currently have 6 completed work units awaiting upload, all from work server 140.163.4.200 and all assigned to collection server 140.163.4.210. Uploads to both servers are still failing. The oldest of the work units is now 3 days old and about to expire.

Please... Just turn this server off...
Joe_H
Site Admin
Posts: 7856
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: 140.163.4.200

Post by Joe_H »

Only a few people are having this kind of problem with the server. The vast majority are not having upload problems, and the admins are looking into the issue. Since you have provided no information on which WUs are not being uploaded, both we and they are limited in how much can be checked.

Have you checked the status of any of the WUs with the WU status app (https://apps.foldingathome.org/wu) to see if they uploaded? If they show as uploaded for you, it is possible the ACK packet from the server is not getting back to your system to start the cleanup process after a successful upload. If that is the case, the client will retry until it gets an "Already got" message or the WU reaches the final deadline.
TristanChen
Posts: 21
Joined: Tue May 30, 2017 4:55 am

Re: 140.163.4.200

Post by TristanChen »

Hi Joe,

The 5 remaining completed/stuck work units on this computer are (one expired a few hours ago and was dumped):
17321 (0, 705, 41) - 15 retries, WS: 140.163.4.200, CS: 140.163.4.210
17322 (0, 1573, 58) - 43 retries, WS: 140.163.4.200, CS: 140.163.4.210
17317 (0, 1789, 69) - 43 retries, WS: 140.163.4.200, CS: 140.163.4.210
17315 (0, 989, 66) - 22 retries, WS: 140.163.4.200, CS: 140.163.4.210
17315 (0, 310, 69) - 39 retries, WS: 140.163.4.200, CS: 140.163.4.210
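
For anyone collecting the same kind of report across several machines, a list like the one above can be tallied with a short script. This is only a rough sketch: it assumes the lines keep exactly the layout used in this post, and it reads the three numbers in parentheses as run, clone, and gen, which is an assumption on my part rather than something stated here. The names `StuckWU`, `parse_stuck_wus`, and `retries_by_server_pair` are made up for illustration and are not part of any F@h tool.

```python
import re
from collections import Counter
from typing import NamedTuple


class StuckWU(NamedTuple):
    """One stuck work unit as reported in the post (hypothetical helper type)."""
    project: int
    run: int      # assumed meaning of the first number in parentheses
    clone: int    # assumed meaning of the second number
    gen: int      # assumed meaning of the third number
    retries: int
    work_server: str
    collection_server: str


# Matches lines such as:
#   17321 (0, 705, 41) - 15 retries, WS: 140.163.4.200, CS: 140.163.4.210
# and the "attempts - WS:" variant used later in the thread.
WU_LINE = re.compile(
    r"(?P<project>\d+)\s*\((?P<run>\d+),\s*(?P<clone>\d+),\s*(?P<gen>\d+)\)"
    r"\s*-\s*(?P<retries>\d+)\s+(?:retries|attempts)"
    r"\s*[,-]\s*WS:\s*(?P<ws>[\d.]+),\s*CS:\s*(?P<cs>[\d.]+)"
)


def parse_stuck_wus(text: str) -> list[StuckWU]:
    """Return every line of `text` that looks like a stuck-WU report."""
    units = []
    for line in text.splitlines():
        m = WU_LINE.search(line)
        if m is None:
            continue  # skip prose and blank lines
        units.append(StuckWU(
            int(m["project"]), int(m["run"]), int(m["clone"]), int(m["gen"]),
            int(m["retries"]), m["ws"], m["cs"],
        ))
    return units


def retries_by_server_pair(units: list[StuckWU]) -> dict[tuple[str, str], int]:
    """Sum the retry counts per (work server, collection server) pair."""
    totals: Counter = Counter()
    for wu in units:
        totals[(wu.work_server, wu.collection_server)] += wu.retries
    return dict(totals)


# The five lines from this post, copied verbatim.
REPORT = """\
17321 (0, 705, 41) - 15 retries, WS: 140.163.4.200, CS: 140.163.4.210
17322 (0, 1573, 58) - 43 retries, WS: 140.163.4.200, CS: 140.163.4.210
17317 (0, 1789, 69) - 43 retries, WS: 140.163.4.200, CS: 140.163.4.210
17315 (0, 989, 66) - 22 retries, WS: 140.163.4.200, CS: 140.163.4.210
17315 (0, 310, 69) - 39 retries, WS: 140.163.4.200, CS: 140.163.4.210
"""

if __name__ == "__main__":
    stuck = parse_stuck_wus(REPORT)
    for pair, total in retries_by_server_pair(stuck).items():
        print(pair, total)
```

Grouping by server pair makes it easy to confirm that every failure involves the same WS/CS combination.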

I checked the status app that you provided, and it appears that four of the five work units above have never been successfully completed by anyone. Only one of them, 17317 (0, 1789, 69), appears to have been successfully completed by anyone.

Hope this helps!
TristanChen
Posts: 21
Joined: Tue May 30, 2017 4:55 am

Re: 140.163.4.200

Post by TristanChen »

I've jumped onto another desktop and on this machine the stuck/completed units are:
17319 (0, 1687, 61) - 40 attempts, WS: 140.163.4.200, CS: 140.163.4.210
17322 (0, 650, 54) - 44 attempts, WS: 140.163.4.200, CS: 140.163.4.210
17315 (0, 1125, 66) - 33 attempts, WS: 140.163.4.200, CS: 140.163.4.210
17321 (0, 1156, 39) - 21 attempts, WS: 140.163.4.200, CS: 140.163.4.210

Unfortunately, it appears that the status app website could not find any of the four work units. Not sure why...
Let me know if you need me to post more faulty WUs. I've got at least 20 more just like them on other machines.

Just want to reiterate that I'm not really a casual folder... I completed something like 20,000 work units last year... and successfully uploaded 62 work units yesterday to other servers. https://folding.extremeoverclocking.com ... =&u=724183
rafwiewiora
Scientist
Posts: 167
Joined: Mon Aug 03, 2015 8:23 pm
Location: New York

Re: 140.163.4.200

Post by rafwiewiora »

Hey TristanChen,

Just letting you know I'm working with Joseph, the F@h dev, on this. Something's definitely up. I've got download problems of my own on the same network! Will keep you updated.
rafwiewiora
Scientist
Posts: 167
Joined: Mon Aug 03, 2015 8:23 pm
Location: New York

Re: 140.163.4.200

Post by rafwiewiora »

So my problem turned out to be something completely different. Since it's just you and another person on Discord complaining, and historically we've never been able to work out why a couple of people start seeing problems from time to time, there's really nothing I can do on our side to fix it. I suggest a simple solution: 'banning' you from these servers so you'll get work from others. If you could DM me your username and IP, I can get that done.
TristanChen
Posts: 21
Joined: Tue May 30, 2017 4:55 am

Re: 140.163.4.200

Post by TristanChen »

Great! Just sent you the PM. Lemme know if you don't see it?
TristanChen
Posts: 21
Joined: Tue May 30, 2017 4:55 am

Re: 140.163.4.200

Post by TristanChen »

Just to give an update on this issue. The situation seems to have gotten even worse for me. As of this morning, on this desktop, I now have 7 stuck/completed work units.
Why am I still getting work from this work server? Was I just banned from the collection servers?!

17317 (0, 1235, 96) - 11 attempts - WS: 140.163.4.200, CS: 140.163.4.210
17318 (0, 627, 156) - 21 attempts - WS: 140.163.4.200, CS: 140.163.4.210
17319 (0, 1678, 88) - 27 attempts - WS: 140.163.4.200, CS: 140.163.4.210
17323 (0, 205, 9) - 63 attempts - WS: 140.163.4.200, CS: 140.163.4.210
17322 (0, 1062, 92) - 46 attempts - WS: 140.163.4.200, CS: 140.163.4.210
17320 (0, 1992, 33) - 13 attempts - WS: 140.163.4.200, CS: 140.163.4.210
17322 (0, 466, 84) - 39 attempts - WS: 140.163.4.200, CS: 140.163.4.210
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 140.163.4.200

Post by bruce »

Choose a couple of representative WUs and post the segment of FAH's log showing a few upload failures and the nearby messages.
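
If the log is long, a small script can pull out each failure together with its nearby messages. The following is just a sketch under stated assumptions: the search string is whatever text your own log shows for a failed upload (no particular wording is assumed here), the file name in the usage comment is a placeholder, and `failure_segments` is a hypothetical helper name.

```python
def failure_segments(lines, needle, context=3):
    """Return slices of `lines` around every line containing `needle`.

    The match is case-insensitive. Each hit is padded with `context`
    lines before and after; overlapping or touching windows are merged
    so no line is reported twice.
    """
    needle = needle.lower()
    windows = []  # list of [start, end) index pairs
    for i, line in enumerate(lines):
        if needle not in line.lower():
            continue
        start = max(0, i - context)
        end = min(len(lines), i + context + 1)
        if windows and start <= windows[-1][1]:
            # Extend the previous window instead of starting a new one.
            windows[-1][1] = max(windows[-1][1], end)
        else:
            windows.append([start, end])
    return [lines[s:e] for s, e in windows]


# Usage sketch (file name and search text are placeholders):
#
#     with open("log.txt", encoding="utf-8", errors="replace") as fh:
#         log_lines = fh.read().splitlines()
#     for segment in failure_segments(log_lines, "failed", context=5):
#         print("\n".join(segment))
#         print("-" * 40)
```

Merging overlapping windows keeps repeated retries of the same WU in one readable block instead of many near-duplicate excerpts.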