Is 128.252.203.10 configured correctly?

Moderators: Site Moderators, FAHC Science Team

Postby BobHehmann » Wed May 06, 2020 8:16 pm

Work Server 128.252.203.10 obviously has some ongoing, unresolved technical problems. Understood. However, while waiting for hours to upload a completed GPU Work Unit to this server [WU 11759(0, 5680, 59)], I noticed that my FAH control panel shows no Collection Server for this WU: the IP is 0.0.0.0. (Other WUs show explicit alternate IP addresses for their Collection Servers, as I would expect.) I take this to mean that this completed WU can only be returned to its Work Server, and that there is no fail-over collection point. Yet the Server Stats page does show this Work Server as having a Collection Server configured. Is something misconfigured on the server, or perhaps in the WU family?

Second thing I noticed: the uptime column on the Server Stats page is rather misleading, or at least is not calculated in a way that matches what outside users see. This troubled server has obviously been restarted multiple times today, and after each implied restart, uptime starts incrementing with the wall clock. However, the time-of-last-contact column just sits there, implying that the server "hung" quickly and is going nowhere. Presumably, with the developing backlog, contacts would come nearly continuously, so last contact time should closely track clock time as long as work is flowing. For many of my critical (monitored) datacenter servers, that is one of the first levels of health monitoring: does a monitored server respond to the monitor's regular polling? If not, raise an alert. Lacking such monitoring infrastructure, a last contact time that stops advancing would seem an effective proxy for a failed server (especially with so many users trolling for work!).
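To make the last-contact idea concrete, here is a minimal Python sketch of the kind of check I have in mind. The function name, the 15-minute threshold, and the sample timestamps are all hypothetical, chosen only for illustration; this is not how the Server Stats page actually computes anything.

```python
from datetime import datetime, timedelta


def looks_stalled(last_contact, now, max_silence=timedelta(minutes=15)):
    """Return True when the gap since last contact exceeds max_silence.

    max_silence is an assumed threshold; a real monitor would tune it
    to the expected contact rate of the server being watched.
    """
    return (now - last_contact) > max_silence


# Hypothetical readings: uptime keeps growing, last contact does not.
now = datetime(2020, 5, 6, 20, 16)
restarted_at = datetime(2020, 5, 6, 18, 0)
last_contact = datetime(2020, 5, 6, 18, 5)

uptime = now - restarted_at
if looks_stalled(last_contact, now):
    print(f"ALERT: up for {uptime}, but silent for {now - last_contact}")
```

The point is only that uptime by itself says the process is running, while the gap since last contact says whether it is doing any work.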

Anyway, best wishes to the folks on the front lines, and thanks for providing a way for us to help with the science!

Cheers, Bob
BobHehmann
 
Posts: 2
Joined: Wed May 06, 2020 7:52 pm

Re: Is 128.252.203.10 configured correctly?

Postby Joe_H » Wed May 06, 2020 8:35 pm

A CS may not have been set up for the WS, Project, or WU at the time the WU was downloaded to your system. If a CS was added later, that is not retroactive.

Projects on a single WS may have different CS addresses assigned, though that is less common currently.

Last contact time refers to contact with the server managing the Server Status page; a freshly rebooted WS would "check in" with it. But later contacts might get lost or delayed if the WS is swamped with network requests for downloads and uploads.

Additional information has been posted by the person managing this server in other topics.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Joe_H
Site Admin
 
Posts: 6439
Joined: Tue Apr 21, 2009 5:41 pm
Location: W. MA

