Do this first

Post by bruce » Wed Dec 05, 2007 8:37 pm

Most communication errors will name the server's IP address, provided you specified -verbosity 9 when you invoked the client. The exceptions are errors involving the Assignment Servers (see below if that's your issue).

Before complaining about a particular server, be sure to check the status page at Stanford ( http://fah-web.stanford.edu/serverstat.html ). If it indicates the server is having trouble, the Pande Group may already be working on the issue (during daylight or evening hours). Then look for a recent post with the server IP in the title; if it's already being discussed, there's probably nothing else that can be done in the short term.

To confirm that you can communicate with a specific server, open it in your browser using the IP address. For example, if FAHlog.txt shows that you are failing to connect to 171.65.103.100:80, open http://171.65.103.100 and, if you're also failing to connect to 171.64.65.20:8080, open http://171.64.65.20:8080 as well. (The proper result is OK.)
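The same check can be scripted. The following is only a minimal sketch, not part of the FAH client: the function names are made up for illustration, and it assumes the server's reply contains the text OK somewhere in the first 200 bytes, as the browser test above suggests.

```python
from http.client import HTTPException
from urllib.error import URLError
from urllib.request import urlopen


def browser_url(address):
    """Turn an 'ip:port' string from FAHlog.txt into the URL to open."""
    host, _, port = address.partition(":")
    # Port 80 is the HTTP default, so it can be left off the URL.
    if port in ("", "80"):
        return f"http://{host}"
    return f"http://{host}:{port}"


def server_says_ok(address, timeout=10):
    """Return True if the server answers and its reply contains 'OK'.

    Assumption: the reply body includes the literal text 'OK'.
    """
    try:
        with urlopen(browser_url(address), timeout=timeout) as response:
            body = response.read(200).decode("ascii", errors="replace")
    except (URLError, HTTPException, OSError):
        # Refused, timed out, unreachable, or an HTTP error status.
        return False
    return "OK" in body
```

For example, server_says_ok("171.64.65.20:8080") performs the same test as opening http://171.64.65.20:8080 in a browser.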

If it's a download problem, you should be redirected to another server shortly. Be sure that you have NOT specified WUs Without Deadlines. If you're being redirected to 0.0.0.0, that means that all of the servers which have WUs for your configuration have run out of work to assign. Modifying the selection criteria may help, but you may also need to wait until more WUs are available. (The PI for the project really doesn't like for this to happen any more than you do.) The selection criteria that you can modify are (A) the size of WUs you'll accept [the "Big" setting will also accept either Normal or Small] and (B) the -advmethods setting [this setting allows the server to send you WUs that are in late-stage beta testing; if there are none of those, you'll get the standard FAH WUs, which are generally more stable].
NOTE: If you're having trouble getting GPU assignments, be sure to specify which type of GPU you have: ATI, G80, or Fermi.

If it's an upload problem and the server appears to be functioning normally, be sure (on Windows only) that Use_IE_Settings is set to NO.

If it's an upload problem and the server appears to be in trouble, the client is designed to deal with this issue. After the first couple of upload attempts, it should try both the Work Server for that project and a Collection Server. If both fail, the client will keep the result in the local queue and will re-attempt the upload periodically without need for you to intervene. If you're running a firewall that is designed to block spyware, be sure that the FAH client can contact the internet (or disable the outgoing firewall briefly to confirm if that's the problem). If the upload attempts have continued to fail for some time, please add the -verbosity 9 flag to your client and post the section of FAHlog.txt that shows the errors.
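The fallback order described above can be pictured roughly as follows. This is an illustration of the idea only, not the client's actual code: try_upload, send and queue are hypothetical names, and the real client's timing and retry counts are not modeled.

```python
def try_upload(result, send, work_server, collection_server, queue):
    """Try the Work Server, then the Collection Server; queue on failure.

    `send(server, result)` is a hypothetical callable that returns True
    when the upload to `server` succeeds.
    """
    for server in (work_server, collection_server):
        if send(server, result):
            return server
    # Both servers failed: keep the result locally so that a later,
    # periodic attempt can retry without user intervention.
    if result not in queue:
        queue.append(result)
    return None
```

Calling try_upload again later with the same queue repeats the attempt without adding the result to the queue a second time.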

If you get the message "Server does not have record of this unit", don't worry about it. Each WU has to be uploaded to a primary Work Server. If that server is busy or down, the client will attempt to upload to a Collection Server. Normally the Collection Server has a list of all or almost all of the WUs that it can accept. If that list was incomplete at the time the Work Server went down, the remaining WUs will need to wait for the Work Server to be online again. Normally that is only a very small percentage of the outstanding WUs.

NOTE: Virtually all upload problems involve two servers. The first one is your primary Work Server for that project and is where the WU will eventually reside. The second one is called a Collection Server (CS). It is a backup server designed to accept uploads when the primary Work Server is overloaded or down. If serverstat indicates that your Collection Server is operating but heavily loaded, that means one or more of the Work Servers is not accepting its share of uploads. The CS can get overloaded quite easily and nothing can really be done about that. Of course, if it's not operating properly, it should be reported, but the main focus of any error report should be on getting the primary Work Server repaired and able to accept the uploads so nobody has to fall back to the CS.

If you decide to report an upload problem, report the problem against the Work Server's IP. It's going to be the primary concern of someone who can fix the problem at Stanford.

Assignment Servers:
Before any download can proceed, your client must first contact an Assignment Server. These servers do NOT issue WU assignments. They keep track of the status of the Work Servers and reassign you to a Work Server that can give you a WU. If you cannot contact any of the Assignment Servers, there is a high probability the problem is in your internet connection, firewall, router or even antivirus program. The error messages might be "+ Could not connect to Assignment Server" or "+ Could not connect to Assignment Server 2", in which case you should try http://assign.stanford.edu:8080 or http://assign2.stanford.edu in your browser. [For the GPU client, that would be http://assign-gpu.stanford.edu/ and/or http://assign-gpu.stanford.edu:8080/ ]
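If you'd rather test the connection without a browser, a plain TCP connection attempt can help separate a local firewall or router problem from a server problem. The following is a rough sketch under that assumption; the helper names are made up for illustration, and a successful connection only shows that the port is reachable, not that the server is healthy.

```python
import socket
from urllib.parse import urlsplit

# URLs taken from the paragraph above (CPU client).
ASSIGNMENT_URLS = [
    "http://assign.stanford.edu:8080",
    "http://assign2.stanford.edu",
]


def endpoint(url):
    """Split a URL into (host, port), defaulting to HTTP port 80."""
    parts = urlsplit(url)
    return parts.hostname, parts.port or 80


def can_connect(url, timeout=10):
    """Return True if a TCP connection to the URL's host and port opens."""
    host, port = endpoint(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS lookup failures.
        return False
```

If can_connect returns False for every entry in ASSIGNMENT_URLS, the cause is more likely on your side (connection, firewall, router or antivirus) than at Stanford.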

(As always, if you don't understand how to do these steps from this brief summary, we're happy to help.)
bruce