140.163.4.200

Re: 140.163.4.200

Postby TristanChen » Tue Jan 05, 2021 11:30 pm

Really appreciate your kind words and encouragement!
On this laptop alone, there are currently 3 completed work units waiting to send to 140.163.4.210.
Work unit 1: 40 attempts
Work unit 2: 47 attempts
Work unit 3: 18 attempts
And I have 9 other full-sized rigs...

To me this is worse than what happened in April, because if I simply cannot get work units (as was the case back then), I can always repurpose my rigs to do other useful science like BOINC/Rosetta. Collection server issues, on the other hand, essentially destroy completed work, at which point I feel no better than miners adding to global warming... :(
TristanChen
 
Posts: 21
Joined: Tue May 30, 2017 5:55 am

Re: 140.163.4.200

Postby Joe_H » Tue Jan 05, 2021 11:54 pm

Please clarify, since this topic is about 140.163.4.200: are these WUs from WS 140.163.4.200 that are failing to upload to both the WS and to 140.163.4.210 as its alternate CS? Do WUs to other servers upload without problems? Your two posts in this topic are too ambiguous for us to help you get this fixed.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Joe_H
Site Admin
 
Posts: 6906
Joined: Tue Apr 21, 2009 5:41 pm
Location: W. MA

Re: 140.163.4.200

Postby TristanChen » Thu Jan 07, 2021 2:04 pm

Hi Joe,
Thanks for getting back to me. The issue appears to be with both 140.163.4.200 and 140.163.4.210. My clients first try to upload the completed work units to one server (e.g. 140.163.4.200), which fails, and then on the next attempt they try the other server (140.163.4.210) and fail again.

I do not see this issue with any other servers. All of my failed uploads are to these two servers. I successfully uploaded 75 completed work units yesterday and 16 so far today.
https://folding.extremeoverclocking.com ... =&u=724183

Right now, on this desktop (I'm on one of my other computers), I have 4 completed work units that cannot be uploaded, with 47, 39, 25, and 9 retries respectively. All of them were received from work server 140.163.4.200 and all of them list 140.163.4.210 as the collection server.

Hope this helps to get this issue resolved! It is a real pain point for me and I'm sure a lot of other folders. Thank you!
TristanChen
 
Posts: 21
Joined: Tue May 30, 2017 5:55 am

Re: 140.163.4.200

Postby TristanChen » Thu Jan 07, 2021 2:16 pm

If I might offer a suggestion: if this server is consistently having trouble receiving completed work, can we at least take it offline so that it does not issue new work until the issue is resolved? Thanks!
TristanChen
 
Posts: 21
Joined: Tue May 30, 2017 5:55 am

Re: 140.163.4.200

Postby Neil-B » Thu Jan 07, 2021 3:02 pm

If all upload attempts from all folders are failing I am sure this course of action will be considered ... but by taking down the server all work of the projects hosted on it ceases - if it is only a percentage (even quite a large one) who are having issues uploading - or if it is just slow/delayed returns that is still better for the projects concerned than nothing? ... these things are tough calls but I am sure all options are being considered
1: 2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent, Quadro K420
2: Xeon E3-1505Mv5, 32GB DDR4, NVME, Win10 Pro, Quadro M1000M
3: i7-960, 12GB DDR3, SSD, Win10 Pro, GTX 750Ti
4: i9-10850K, 64GB DDR4, NVME, Win 10 Pro, RTX3070
Neil-B
 
Posts: 1754
Joined: Sun Mar 22, 2020 6:52 pm
Location: UK

Re: 140.163.4.200

Postby bruce » Thu Jan 07, 2021 6:40 pm

Neil-B wrote:If all upload attempts from all folders are failing I am sure this course of action will be considered ... but by taking down the server all work of the projects hosted on it ceases - if it is only a percentage (even quite a large one) who are having issues uploading - or if it is just slow/delayed returns that is still better for the projects concerned than nothing?


The likelihood is that it's all work issued by either of those servers. It may be a campus security policy issue that needs an exception re-enabled, or it might be a disk-full issue. I've attempted to notify the appropriate admins.
bruce
 
Posts: 20571
Joined: Thu Nov 29, 2007 11:13 pm
Location: So. Cal.

Re: 140.163.4.200

Postby TristanChen » Sat Jan 09, 2021 4:09 am

Just providing an update: the issues are still unresolved. On this desktop alone (1 of 9 for me), I currently have 6 completed work units awaiting upload. All are from work server 140.163.4.200 and all list 140.163.4.210 as the collection server. Uploads to both servers are still failing. The oldest of the work units is now 3 days old and about to expire.

Please... Just turn this server off...
TristanChen
 
Posts: 21
Joined: Tue May 30, 2017 5:55 am

Re: 140.163.4.200

Postby Joe_H » Sat Jan 09, 2021 4:56 am

Only a few people are having this kind of problem with the server. The vast majority are not having upload problems, and the admins are looking into the issue. Since you have provided no information on which WUs are not being uploaded, both we and they are limited in how much can be checked.

Have you checked any of the WUs with the status app (https://apps.foldingathome.org/wu) to see if they uploaded? If they are shown as uploaded for you, it is possible the ACK packet from the server is not getting back to your system to start the cleanup process after a successful upload. If that is the case, the client will retry until it gets an "Already got" message or the WU reaches the final deadline.
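
A rough sketch of the retry behavior described above, in Python. This is not the client's actual code: `upload_until_acked`, the `send` callable, and the `"ack"` / `"already got"` return values are hypothetical names chosen only to illustrate the idea that a lost acknowledgement leads to repeated retries until either the server confirms it has the results or the deadline passes.

```python
import time
from typing import Callable, Optional


def upload_until_acked(
    send: Callable[[], Optional[str]],
    deadline: float,
    retry_delay: float = 0.0,
    now: Callable[[], float] = time.time,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Retry an upload until the server confirms it or the deadline passes.

    `send` is a hypothetical callable. It returns "ack" when the server's
    acknowledgement arrives, "already got" when the server reports that it
    already holds the results, and None when no reply reaches the client.
    """
    attempts = 0
    while now() < deadline:
        attempts += 1
        reply = send()
        if reply in ("ack", "already got"):
            # Either reply means the server has the results, so the
            # client can stop retrying and clean up locally.
            return f"done after {attempts} attempt(s): {reply}"
        # No reply seen (for example, the ACK was lost): wait and retry.
        sleep(retry_delay)
    return f"expired after {attempts} attempt(s)"
```

In this sketch a WU whose upload actually succeeded but whose ACK was lost would show a growing attempt count on the client until an "already got" style reply finally gets through.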
Joe_H
Site Admin
 
Posts: 6906
Joined: Tue Apr 21, 2009 5:41 pm
Location: W. MA

Re: 140.163.4.200

Postby TristanChen » Sat Jan 09, 2021 7:25 am

Hi Joe,

The 5 remaining completed/stuck work units on this computer are (one expired a few hours ago and was dumped):
17321 (0, 705, 41) - 15 retries, WS: 140.163.4.200, CS: 140.163.4.210
17322 (0, 1573, 58) - 43 retries, WS: 140.163.4.200, CS: 140.163.4.210
17317 (0, 1789, 69) - 43 retries, WS: 140.163.4.200, CS: 140.163.4.210
17315 (0, 989, 66) - 22 retries, WS: 140.163.4.200, CS: 140.163.4.210
17315 (0, 310, 69) - 39 retries, WS: 140.163.4.200, CS: 140.163.4.210

I checked the status app you linked, and it appears that four of the five work units above have never been successfully completed by anyone. Only one of them, 17317 (0, 1789, 69), appears to have been successfully completed.

Hope this helps!
TristanChen
 
Posts: 21
Joined: Tue May 30, 2017 5:55 am

Re: 140.163.4.200

Postby TristanChen » Sat Jan 09, 2021 7:47 am

I've jumped onto another desktop and on this machine the stuck/completed units are:
17319 (0, 1687, 61) - 40 attempts, WS: 140.163.4.200, CS: 140.163.4.210
17322 (0, 650, 54) - 44 attempts, WS: 140.163.4.200, CS: 140.163.4.210
17315 (0, 1125, 66) - 33 attempts, WS: 140.163.4.200, CS: 140.163.4.210
17321 (0, 1156, 39) - 21 attempts, WS: 140.163.4.200, CS: 140.163.4.210

Unfortunately, it appears that the status app website could not find any of the four work units. Not sure why...
Let me know if you need me to post more faulty WUs. I've got at least 20 more just like them on other machines.

Just want to reiterate that I'm not really a casual folder... I completed something like 20,000 work units last year... and successfully uploaded 62 work units yesterday to other servers. https://folding.extremeoverclocking.com ... =&u=724183
TristanChen
 
Posts: 21
Joined: Tue May 30, 2017 5:55 am

Re: 140.163.4.200

Postby rafwiewiora » Sat Jan 09, 2021 9:19 am

Hey TristanChen,

Just letting you know I'm working with Joseph, the F@h dev, on this. Something's definitely up: I've got download problems of my own on the same network! Will keep you updated.
rafwiewiora
Scientist
 
Posts: 167
Joined: Mon Aug 03, 2015 9:23 pm
Location: New York

Re: 140.163.4.200

Postby rafwiewiora » Thu Jan 14, 2021 8:59 pm

So my problem turned out to be something completely different. Since it's just you and another person on Discord complaining, and historically we've never been able to work out why a couple of people start seeing problems from time to time, there's really nothing I can do on our side to fix it. I suggest a simple solution: 'banning' you from these servers so you'll get work from others. If you could DM me your username and IP, I can get that done.
rafwiewiora
Scientist
 
Posts: 167
Joined: Mon Aug 03, 2015 9:23 pm
Location: New York

Re: 140.163.4.200

Postby TristanChen » Sat Jan 16, 2021 7:41 am

Great! Just sent you the PM. Let me know if you don't see it.
TristanChen
 
Posts: 21
Joined: Tue May 30, 2017 5:55 am

Re: 140.163.4.200

Postby TristanChen » Tue Jan 19, 2021 3:11 am

Just to give an update on this issue: the situation seems to have gotten even worse for me. As of this morning, on this desktop, I now have 7 stuck/completed work units.
Why am I still getting work from this work server? Was I just banned from the collection servers?!

17317 (0, 1235, 96) - 11 attempts - WS: 140.163.4.200, CS: 140.163.4.210
17318 (0, 627, 156) - 21 attempts - WS: 140.163.4.200, CS: 140.163.4.210
17319 (0, 1678, 88) - 27 attempts - WS: 140.163.4.200, CS: 140.163.4.210
17323 (0, 205, 9) - 63 attempts - WS: 140.163.4.200, CS: 140.163.4.210
17322 (0, 1062, 92) - 46 attempts - WS: 140.163.4.200, CS: 140.163.4.210
17320 (0, 1992, 33) - 13 attempts - WS: 140.163.4.200, CS: 140.163.4.210
17322 (0, 466, 84) - 39 attempts - WS: 140.163.4.200, CS: 140.163.4.210
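
For anyone collecting lists like this across several machines, a short Python sketch can tally them by project. It assumes each line follows the layout used in this thread (project number, three numbers in parentheses that I take to be run, clone, gen, then an attempt or retry count, then WS and CS). The names `StuckWU` and `parse_stuck_wu` are invented here for illustration.

```python
import re
from collections import Counter
from typing import NamedTuple, Optional


class StuckWU(NamedTuple):
    project: int
    run: int
    clone: int
    gen: int
    attempts: int
    ws: str
    cs: str


# Accepts both "- 15 retries, WS: ..." and "- 11 attempts - WS: ..." layouts.
LINE_RE = re.compile(
    r"(\d+)\s*\((\d+),\s*(\d+),\s*(\d+)\)\s*-\s*(\d+)\s+(?:attempts|retries)"
    r"\s*[-,]\s*WS:\s*([\d.]+),\s*CS:\s*([\d.]+)"
)


def parse_stuck_wu(line: str) -> Optional[StuckWU]:
    """Parse one stuck-WU line; return None if it does not match the layout."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    p, r, c, g, n, ws, cs = m.groups()
    return StuckWU(int(p), int(r), int(c), int(g), int(n), ws, cs)


lines = [
    "17317 (0, 1235, 96) - 11 attempts - WS: 140.163.4.200, CS: 140.163.4.210",
    "17322 (0, 1062, 92) - 46 attempts - WS: 140.163.4.200, CS: 140.163.4.210",
    "17322 (0, 466, 84) - 39 attempts - WS: 140.163.4.200, CS: 140.163.4.210",
]
wus = [wu for wu in map(parse_stuck_wu, lines) if wu is not None]
per_project = Counter(wu.project for wu in wus)  # stuck WUs per project
per_server = Counter((wu.ws, wu.cs) for wu in wus)  # stuck WUs per WS/CS pair
print(per_project)
print(per_server)
```

A summary like this makes it quicker to confirm that every stuck WU shares the same WS/CS pair.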
TristanChen
 
Posts: 21
Joined: Tue May 30, 2017 5:55 am

Re: 140.163.4.200

Postby bruce » Tue Jan 19, 2021 4:02 am

Choose a couple of representative WUs and post the segment of FAH's log showing a few upload failures and the nearby messages.
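
If the log is long, something like this Python sketch can pull out just the relevant segment. It deliberately assumes nothing about FAH's exact log format: it keeps any line containing a piece of text you supply (for example, the label of the stuck WU exactly as it appears in your own log) plus a few lines on either side. The function name `log_segments` is made up for this example.

```python
from typing import Iterable, List


def log_segments(lines: Iterable[str], needle: str, context: int = 3) -> List[str]:
    """Return lines containing `needle` plus `context` lines before and after.

    Overlapping windows are merged, and a "..." line marks each gap
    between non-adjacent windows.
    """
    rows = list(lines)
    keep = set()
    for i, row in enumerate(rows):
        if needle in row:
            start = max(0, i - context)
            stop = min(len(rows), i + context + 1)
            keep.update(range(start, stop))
    out: List[str] = []
    prev = None
    for i in sorted(keep):
        if prev is not None and i != prev + 1:
            out.append("...")
        out.append(rows[i].rstrip("\n"))
        prev = i
    return out
```

Pass it the lines of your log file and the label of one stuck WU, then paste the result here.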
bruce
 
Posts: 20571
Joined: Thu Nov 29, 2007 11:13 pm
Location: So. Cal.
