multiple assignment servers

Moderators: Site Moderators, FAHC Science Team

arcturius
Posts: 4
Joined: Thu May 12, 2016 9:36 pm

multiple assignment servers

Post by arcturius »

Hello,

Starting 2-3 days ago, I am unable to get assignments from 171.67.108.45 and 171.64.65.35, and likely others, for hosts on 192.48.154.0/23. Possibly they don't like something about 192.0.0.0/8 hosts?
I can pull index.html from them, but actually getting assignments fails. I think the 'empty work server assignment' is probably inaccurate, and really just a result of the connection failure.
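A quick way to see which of the prefixes mentioned here cover a given host is Python's `ipaddress` module. This is only a minimal sketch: the sample host 192.48.154.10 is a made-up address inside the affected /23, not a real machine from this report.

```python
import ipaddress

# Prefixes mentioned in this thread. The sample host used below is hypothetical.
PREFIXES = ["192.48.154.0/23", "192.0.0.0/8", "68.112.192.0/20"]


def matching_prefixes(host, prefixes=PREFIXES):
    """Return the prefixes (as strings) that contain the given host address."""
    addr = ipaddress.ip_address(host)
    return [p for p in prefixes if addr in ipaddress.ip_network(p)]


if __name__ == "__main__":
    # Hypothetical host inside the affected range.
    print(matching_prefixes("192.48.154.10"))
```

Any filter keyed on the broad 192.0.0.0/8 block would also catch every host in 192.48.154.0/23, which is why both prefixes are listed.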

Strangely, traceroutes from that network fail at the last hop, whereas those from 68.112.192.0/20 complete (see below).

The actual messages are:
18:19:34:ERROR:WU00:FS00:Exception: Could not get an assignment
19:34:31:WU00:FS00:Connecting to 171.67.108.45:8080
19:35:34:WARNING:WU00:FS00:Failed to get assignment from '171.67.108.45:8080': Failed to connect to 171.67.108.45:8080: Connection timed out
19:35:34:WU00:FS00:Connecting to 171.64.65.35:80
19:35:35:WARNING:WU00:FS00:Failed to get assignment from '171.64.65.35:80': Empty work server assignment
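For anyone who wants to summarize these failures from a longer log, here is a rough sketch in Python. The pattern is based only on the WARNING lines quoted above, with one log entry per line; other client versions may word the messages differently, so treat the regular expression as an assumption.

```python
import re

# Pattern derived only from the WARNING lines quoted in this thread.
FAILED = re.compile(
    r"WARNING:WU\d+:FS\d+:Failed to get assignment from "
    r"'(?P<server>[\d.]+:\d+)': (?P<reason>.*)$"
)


def failed_assignments(lines):
    """Return (server, reason) pairs for each failed assignment attempt."""
    results = []
    for line in lines:
        match = FAILED.search(line.strip())
        if match:
            results.append((match.group("server"), match.group("reason")))
    return results


# Sample entries copied from the messages above, one per line.
SAMPLE = [
    "19:34:31:WU00:FS00:Connecting to 171.67.108.45:8080",
    "19:35:34:WARNING:WU00:FS00:Failed to get assignment from "
    "'171.67.108.45:8080': Failed to connect to 171.67.108.45:8080: "
    "Connection timed out",
    "19:35:35:WARNING:WU00:FS00:Failed to get assignment from "
    "'171.64.65.35:80': Empty work server assignment",
]

if __name__ == "__main__":
    for server, reason in failed_assignments(SAMPLE):
        print(server, "->", reason)
```

Separating the server from the reason makes it easier to see that the two addresses fail in different ways: one never connects, the other answers but assigns nothing.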

Bad traceroute:
$ traceroute -w 10 171.67.108.45
traceroute to 171.67.108.45 (171.67.108.45), 30 hops max, 60 byte packets
...
  4  12.249.235.73 (12.249.235.73)  11.402 ms  11.428 ms  11.411 ms
  5  cr81.mpsmn.ip.att.net (12.122.132.202)  20.419 ms  20.435 ms 20.420 ms
  6  * * *
  7  cgr1.cgcil.ip.att.net (12.122.132.157)  27.857 ms  27.850 ms 27.824 ms
  8  v205.core1.chi1.he.net (216.66.78.117)  28.584 ms  17.738 ms 17.694 ms
  9  10ge11-4.core1.pao1.he.net (184.105.222.173)  56.080 ms  55.900 ms  55.893 ms
10  * * *
11  west-rtr-vlan8.SUNet (171.64.255.193)  67.545 ms  67.557 ms 67.541 ms
12  * * *
...
30  * * *
$ traceroute -w 10 171.64.65.35
traceroute to 171.64.65.35 (171.64.65.35), 30 hops max, 60 byte packets
...
  4  12.249.235.73 (12.249.235.73)  11.660 ms  11.669 ms  11.651 ms
  5  cr81.mpsmn.ip.att.net (12.122.132.202)  22.657 ms  22.656 ms 22.639 ms
  6  * * *
  7  cgr1.cgcil.ip.att.net (12.122.132.157)  27.296 ms  27.308 ms 27.300 ms
  8  v205.core1.chi1.he.net (216.66.78.117)  29.034 ms  18.285 ms 29.024 ms
  9  10ge11-4.core1.pao1.he.net (184.105.222.173)  68.307 ms  56.563 ms  68.289 ms
10  stanford-university.10gigabitethernet1-4.core1.pao1.he.net (216.218.209.118)  55.994 ms  56.006 ms  55.992 ms
11  csmx-rtf-rtr.SUNet (171.64.255.215)  66.478 ms  66.326 ms 66.312 ms
12  * * *
...
30  * * *
$


Good traceroute:
$ traceroute -w 10 171.67.108.45
traceroute to 171.67.108.45 (171.67.108.45), 64 hops max, 40 byte packets
...
  4  prr01mplsmn-bue-2.mpls.mn.charter.com (96.34.3.33)  23.17 ms 21.72 ms  22.273 ms
  5  10ge1-2.core1.msp1.he.net (184.105.253.237)  26.409 ms  18.632 ms  39.552 ms
  6  10ge1-5.core1.den1.he.net (184.105.222.41)  50.120 ms  47.826 ms  50.264 ms
  7  10ge4-2.core1.slc1.he.net (184.105.222.154)  68.231 ms 10ge13-5.core1.sjc2.he.net (184.105.213.105)  67.442 ms  65.169 ms
  8  10ge3-3.core1.pao1.he.net (72.52.92.69)  67.168 ms  65.535 ms 10ge5-17.core1.sjc2.he.net (184.105.223.157)  64.12 ms
  9  stanford-university.10gigabitethernet1-4.core1.pao1.he.net (216.218.209.118)  60.547 ms  65.667 ms  64.180 ms
10  west-rtr-vlan8.SUNet (171.64.255.193)  88.27 ms stanford-university.10gigabitethernet1-4.core1.pao1.he.net (216.218.209.118)  65.885 ms west-rtr-vlan8.sunet (171.64.255.193)  78.341 ms
11  VSP11.stanford.edu (171.67.108.45)  79.199 ms !C  75.849 ms !C 86.58 ms !C 
$ traceroute -w 10 171.64.65.35
traceroute to 171.64.65.35 (171.64.65.35), 30 hops max, 38 byte packets
  ...
  4  prr01mplsmn-bue-2.mpls.mn.charter.com (96.34.3.33)  16.189 ms  16.915 ms  18.688 ms
  5  10ge1-2.core1.msp1.he.net (184.105.253.237)  15.612 ms  13.883 ms  21.233 ms
  6  10ge1-5.core1.den1.he.net (184.105.222.41)  43.558 ms  33.004 ms  34.680 ms
  7  10ge4-2.core1.slc1.he.net (184.105.222.154)  63.553 ms 10ge13-5.core1.sjc2.he.net (184.105.213.105)  69.382 ms  60.135 ms
  8  10ge5-17.core1.sjc2.he.net (184.105.223.157)  65.562 ms 10ge3-3.core1.pao1.he.net (72.52.92.69)  57.243 ms  68.844 ms
  9  stanford-university.10gigabitethernet1-4.core1.pao1.he.net (216.218.209.118)  60.857 ms  62.390 ms 10ge3-3.core1.pao1.he.net (72.52.92.69)  57.750 ms
10  csmx-rtf-rtr.SUNet (171.64.255.215)  76.585 ms stanford-university.10gigabitethernet1-4.core1.pao1.he.net (216.218.209.118)  64.860 ms csmx-rtf-rtr-vl8.SUNet (171.64.255.215)  74.814 ms
11  assignx.stanford.edu (171.64.65.35)  73.027 ms !<10> csmx-rtf-rtr.SUNet (171.64.255.215)  73.686 ms assignx.stanford.edu (171.64.65.35)  74.218 ms !<10>
$
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: multiple assignment servers

Post by bruce »

I've reported a Stanford network problem to the appropriate people.
toTOW
Site Moderator
Posts: 6296
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: multiple assignment servers

Post by toTOW »

Is it really a network problem?

In the messages listed by arcturius, I see one error (Connection timed out), but also one message that could point to a problem in the client settings (Empty work server assignment) ...

192.48.154.0/23 is registered to a corporate organization (Silicon Graphics International Corp.), so those hosts are probably on a network with security filters (a proxy or something similar) that could block the client from connecting to the FAH servers ... can you tell us more about that environment?

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
arcturius
Posts: 4
Joined: Thu May 12, 2016 9:36 pm

Re: multiple assignment servers

Post by arcturius »

There is a content filter, though I am able to get work most of the time. 171.67.108.58 appears to be OK.
Can the client output the URL of files it is requesting? That would help troubleshooting content filter problems.
Traceroute inconsistency is still suspicious, though.
toTOW
Site Moderator
Posts: 6296
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: multiple assignment servers

Post by toTOW »

Some security services also don't like direct IP access and deny it ... unfortunately, FAHClient always uses IP addresses to communicate with the servers, and this can't be changed.

Note that if you're running FAHClient on corporate computers, you need to get written permission before doing so to avoid issues ...

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: multiple assignment servers

Post by bruce »

Security services do have a good reason to block IP addresses since there's a reasonable percentage of malware exploits that utilize IP addresses.

Nevertheless, FAH uses DNS names only for the Assignment Servers and uses IPv4 addresses to communicate with the work servers. You can't download a work assignment using a DNS name, and even if you could, you wouldn't be able to fold that assignment.

You'll probably need the help of the administrator of the proxy server. If they can bypass the IP address restriction for a range of addresses, they can find the addresses used by FAH on http://fah-web.stanford.edu/pybeta/serverstat.html
arcturius
Posts: 4
Joined: Thu May 12, 2016 9:36 pm

Re: multiple assignment servers

Post by arcturius »

Thanks.

I asked this before: is there a way to get the client to print the URLs it is trying to access? I'd like to troubleshoot this more, and make it very easy for the admins when I do go make a request.
I'd rather not resort to tcpdump...
toTOW
Site Moderator
Posts: 6296
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: multiple assignment servers

Post by toTOW »

Everything is printed in the log file ...
14:53:34:WU01:FS00:Connecting to 171.67.108.45:80
14:53:35:WU01:FS00:Assigned to work server 140.163.4.241
14:53:35:WU01:FS00:Requesting new work unit for slot 00: RUNNING gpu:0:GM200 [GeForce GTX 980 Ti] from 140.163.4.241
14:53:35:WU01:FS00:Connecting to 140.163.4.241:8080
14:53:35:WU01:FS00:Downloading 7.53MiB
<progress here but truncated to save space>
14:53:40:WU01:FS00:Download complete
Or
14:58:24:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:11416 run:1 clone:3 gen:13 core:0x21 unit:0x0000000d8ca304f156e81e51acb5ebb4
14:58:24:WU00:FS00:Uploading 20.12MiB to 140.163.4.241
14:58:24:WU00:FS00:Connecting to 140.163.4.241:8080
<progress here but truncated to save space>
15:03:00:WU00:FS00:Upload complete
15:03:00:WU00:FS00:Server responded WORK_ACK (400)
15:03:00:WU00:FS00:Final credit estimate, 221484.00 points
15:03:00:WU00:FS00:Cleaning up
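If the goal is to hand the proxy admins a list of addresses and ports, the "Connecting to" lines can be collected with a few lines of Python. This is a sketch that assumes the log keeps the exact wording shown above, and the helper name `endpoints` is made up for illustration.

```python
import re

# Matches the "Connecting to <ip>:<port>" lines shown in the log excerpt above.
CONNECT = re.compile(
    r":Connecting to (?P<host>\d+\.\d+\.\d+\.\d+):(?P<port>\d+)\s*$"
)


def endpoints(lines):
    """Return distinct (host, port) pairs in the order they first appear."""
    seen = []
    for line in lines:
        match = CONNECT.search(line)
        if match:
            pair = (match.group("host"), int(match.group("port")))
            if pair not in seen:
                seen.append(pair)
    return seen


# Lines copied from the log excerpt above.
SAMPLE = [
    "14:53:34:WU01:FS00:Connecting to 171.67.108.45:80",
    "14:53:35:WU01:FS00:Assigned to work server 140.163.4.241",
    "14:53:35:WU01:FS00:Connecting to 140.163.4.241:8080",
    "14:58:24:WU00:FS00:Connecting to 140.163.4.241:8080",
]

if __name__ == "__main__":
    for host, port in endpoints(SAMPLE):
        print(f"{host}:{port}")
```

This only recovers the address and port, not a request path, since the log does not print one.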

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
Joe_H
Site Admin
Posts: 7856
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: multiple assignment servers

Post by Joe_H »

As bruce mentioned, the client only uses URLs for the Assignment Servers. They are of the form assign.stanford.edu or assign-gpu.stanford.edu. A variant on the first uses assignx.stanford.edu, where x is an integer from 2 to 4 as I recall. They all should resolve to either 171.67.108.45 or 171.64.65.35.

Depending on where you look in the log, sometimes you will see those URLs listed. The rest of the time the IP address is listed and requested by the client directly. So there is no way to get the client to print URLs for the addresses it uses; it would have to do a reverse name lookup to get them.

The addresses will all be in the ranges shown for servers on the Server Status page - http://fah-web.stanford.edu/pybeta/serverstat.html.
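The reverse name lookup mentioned above can be done outside the client. Here is a minimal sketch using Python's standard `socket.gethostbyaddr`. The `lookup` parameter is an addition for illustration so the function can be exercised without network access, and the names returned for these addresses today may differ from what the thread shows.

```python
import socket


def reverse_name(ip, lookup=socket.gethostbyaddr):
    """Return the reverse-DNS hostname for ip, or None if the lookup fails."""
    try:
        hostname, _aliases, _addresses = lookup(ip)
        return hostname
    except (socket.herror, socket.gaierror):
        return None


if __name__ == "__main__":
    # Assignment server addresses mentioned in this thread.
    for ip in ("171.67.108.45", "171.64.65.35"):
        print(ip, reverse_name(ip))
```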

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
arcturius
Posts: 4
Joined: Thu May 12, 2016 9:36 pm

Re: multiple assignment servers

Post by arcturius »

The client downloads via http, so it does use URLs--even if the host is specified as an IP instead of FQDN. For my purposes (and those of the client), the human-friendly name is inconsequential.

I'm really looking for something like 'http://140.163.4.241:8080/path/to/file/ ... requesting' .
The IP and port aren't enough to re-attempt the download with another HTTP client that would show additional error information.
toTOW
Site Moderator
Posts: 6296
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: multiple assignment servers

Post by toTOW »

Unfortunately, the client and server code is closed source, so I'm afraid no one on the forum will be able to help you. :(

Maybe Vijay or Joe could answer if they are allowed to ...

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: multiple assignment servers

Post by bruce »

As far as I know, there is no path/to/file/the/client/is/requesting. When a WU is downloaded, the client and the server have a conversation, each supplying information to the other. It seems quite likely that the server prepares the WU on request, processing some server data together with data supplied by your client. If the server determines that your client has already been assigned a WU which has not expired, it will reassign the same one, encouraging you to complete the assignment you've already been given. If the WU has already expired or it has already been completed, it will give you a different WU.

FAH is designed to achieve the maximum science so there's a lot of important logic buried in the closed source of both FAHClient and the server code to facilitate the completion of as many WUs as possible as rapidly as possible, no matter who completes them. Retaining your assigned WU at path/to/file/the/client/is/requesting isn't a requirement as long as it can recreate it from the data known either by the server or by your client, especially since re-requesting the same WU is rare.
ScottK
Posts: 6
Joined: Tue Aug 02, 2011 9:40 pm

Re: multiple assignment servers

Post by ScottK »

08:47:45:WARNING:WU00:FS00:Failed to get assignment from '171.67.108.45:80': Empty work server assignment
08:47:46:WARNING:WU00:FS00:Failed to get assignment from '171.67.108.204:80': Failed to connect to 171.67.108.204:80: No connection could be made because the target machine actively refused it.
08:47:46:ERROR:WU00:FS00:Exception: Could not get an assignment
08:51:59:WARNING:WU00:FS00:Failed to get assignment from '171.67.108.45:80': Empty work server assignment
08:52:00:WARNING:WU00:FS00:Failed to get assignment from '171.67.108.204:80': Failed to connect to 171.67.108.204:80: No connection could be made because the target machine actively refused it.
08:52:00:ERROR:WU00:FS00:Exception: Could not get an assignment
08:58:51:WARNING:WU00:FS00:Failed to get assignment from '171.67.108.45:80': Empty work server assignment
08:58:52:WARNING:WU00:FS00:Failed to get assignment from '171.67.108.204:80': Failed to connect to 171.67.108.204:80: No connection could be made because the target machine actively refused it.
08:58:52:ERROR:WU00:FS00:Exception: Could not get an assignment
09:09:56:WARNING:WU00:FS00:Failed to get assignment from '171.67.108.45:80': Empty work server assignment
09:09:58:WARNING:WU00:FS00:Failed to get assignment from '171.67.108.204:80': Failed to connect to 171.67.108.204:80: No connection could be made because the target machine actively refused it.
09:09:58:ERROR:WU00:FS00:Exception: Could not get an assignment

Over and over and over...

If only there were a way to change work assignment servers. Why in the world would an assignment server "actively" refuse a connection? This makes no sense at all, unless the servers shown are not in the list of active work servers. Those of us who wish to *donate* time on our system(s) to F@H should be able to re-point work server assignments when such situations occur.
ScottK
Posts: 6
Joined: Tue Aug 02, 2011 9:40 pm

Re: multiple assignment servers

Post by ScottK »

Well, something definitely appears to be up (or DOWN) with the server farm. There are *NO* GPU servers up - not even showing on the list - and many others in DOWN or REJECT status for Classic and SMP realms.
Joe_H
Site Admin
Posts: 7856
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: multiple assignment servers

Post by Joe_H »

Are you going to the Server Status page linked at the top of the forum page - http://fah-web.stanford.edu/pybeta/serverstat.html? There are quite a few GPU servers listed as up when I just checked. In addition, they are not always listed as GPU in the Client column as some Work Servers are hosts to both GPU and CPU based projects.

Second, you are misunderstanding the difference between Assignment Servers and Work Servers. Your system connects first to an AS, which determines what type of work is being requested and what your configuration is. The AS then hands your connection over to a WS if it can match your request. The message "Empty work server assignment" means it could not make a match among all the available WSs. The client automatically fails over to a different AS if no WS is assigned or if the AS it contacted is down. That is what your short segment of log file is showing.
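To make the failover behavior described above concrete, here is a purely illustrative Python sketch. The client is closed source, so none of these names come from FAHClient: `get_assignment`, `EmptyAssignment`, and `demo_ask` are hypothetical, and the logic only models the description in this post.

```python
class EmptyAssignment(Exception):
    """The AS answered but could not match the request to any WS."""


def get_assignment(assignment_servers, ask):
    """Try each AS in order and return (as_addr, ws_addr) for the first match.

    `ask` is a hypothetical stand-in for the real request. It returns a work
    server address, raises EmptyAssignment when no WS matches, or raises
    ConnectionError when the AS cannot be reached.
    """
    failures = []
    for as_addr in assignment_servers:
        try:
            return as_addr, ask(as_addr)
        except EmptyAssignment:
            failures.append((as_addr, "Empty work server assignment"))
        except ConnectionError as exc:
            failures.append((as_addr, str(exc)))
    raise RuntimeError(f"Could not get an assignment: {failures}")


def demo_ask(as_addr):
    """Made-up behavior: the first AS has no match, the second assigns a WS."""
    if as_addr == "171.67.108.45:80":
        raise EmptyAssignment()
    return "140.163.4.241"


if __name__ == "__main__":
    print(get_assignment(["171.67.108.45:80", "171.64.65.35:80"], demo_ask))
```

When every AS in the list fails, the sketch raises an error carrying the per-server reasons, which mirrors the WARNING lines followed by "Could not get an assignment" in the log.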

The "actively refusing" message can have a variety of causes, for example the AS code being down on that particular server. The troubleshooting topic for connections gives directions on how to test addresses in a browser; have you tried that? I can connect to the first AS, but the second is not reachable.
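As an alternative to testing in a browser, a plain TCP probe can tell a refused connection apart from a timeout. This is a minimal sketch with Python's standard `socket` module. The function name `probe` and the 10-second timeout are arbitrary choices, and an "open" result only means the port accepted a connection, not that the AS software behind it is healthy.

```python
import socket


def probe(host, port, timeout=10.0):
    """Classify a TCP connection attempt as open, refused, timeout, or error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The target answered, but nothing is listening on that port.
        return "refused"
    except socket.timeout:
        # No answer at all, for example packets dropped by a filter.
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"


if __name__ == "__main__":
    # Assignment server addresses taken from the log in this thread.
    for host, port in (("171.67.108.45", 80), ("171.67.108.204", 80)):
        print(f"{host}:{port}", probe(host, port))
```

A "refused" result points at the server side, while a "timeout" is more typical of a firewall or content filter silently dropping traffic somewhere along the path.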

If you could post the first 100 or so lines of your log file that shows the system info as seen by the client, and also shows the folding configuration settings, then we can better determine why your system is not getting an assignment. But without any information as to what type of WU your system is requesting and what it is going to be folded on, it is not possible to tell why your system is not getting an assignment.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3