
Re: Can't upload results to 155.247.166.220

PostPosted: Thu Jan 18, 2018 9:24 pm
by bruce
Each Work Server (WS) accepts uploads for its own projects. Projects are supposed to designate one or more alternate servers (known as Collection Servers, or CS) which can accept uploads if the WU cannot be uploaded to the WS. If no CS has been designated, it will be reported as 0.0.0.0 and your client will be forced to retry uploading only to the original WS -- and the log will not show additional attempts to upload to a CS.

Many projects at Temple had no CS designated. Apparently when Temple added a second server, they designated their other server as the CS, but that's not the best choice because the path to both of them goes through the same campus network routers. Temple has been working on improving this situation but there's still work to be done.

It's a lack-of-redundancy problem which only shows up when something has failed. (I wish I could examine the project settings at Temple, but donors don't have that capability except by examining their logs.)

Re: Can't upload results to 155.247.166.220

PostPosted: Thu Jan 18, 2018 9:34 pm
by bruce
For each individual WU, you may see upload attempts in groups of 1, 2, or 4, similar to the following:

...Failed to connect to 155.247.166.220:8080:
...Failed to connect to 155.247.166.220:80:
...Failed to connect to 155.247.166.219:8080
...Failed to connect to 155.247.166.219:80

(The actual numbers may be different.)
The objective is to try different addresses until one succeeds, but if there are no more possibilities, the client has to wait and then retry from the beginning of the list.
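
For anyone who prefers to see the retry order written out, here is a minimal sketch in Python. It is not the client's actual code; the function names, the `max_passes` limit, and the `try_connect` callback are all made up for illustration. It only assumes what is described above: each server is tried on port 8080 and then port 80, the CS is skipped when it is reported as 0.0.0.0, and the whole list is retried after a wait.

```python
UNDEFINED_CS = "0.0.0.0"  # how an undesignated CS is reported


def upload_targets(work_server, collection_server=UNDEFINED_CS, ports=(8080, 80)):
    """Return the (host, port) pairs for one pass, in the order they are tried."""
    hosts = [work_server]
    if collection_server != UNDEFINED_CS:
        hosts.append(collection_server)
    return [(host, port) for host in hosts for port in ports]


def upload(work_server, collection_server, try_connect, max_passes=3, wait=lambda: None):
    """Try every target in order; after a failed pass, wait and start over.

    try_connect(host, port) is a hypothetical callback returning True on success.
    Returns the (host, port) that succeeded, or None if every pass failed.
    """
    for _ in range(max_passes):
        for host, port in upload_targets(work_server, collection_server):
            if try_connect(host, port):
                return (host, port)
        wait()  # back off before retrying from the beginning of the list
    return None
```

With a CS defined, one pass gives the four attempts listed above; with the CS at 0.0.0.0, only the two WS attempts remain.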

Re: Can't upload results to 155.247.166.220

PostPosted: Thu Jan 18, 2018 11:45 pm
by Ken_g6
Thanks, Bruce,

Mine was project:13773 run:0 clone:162 gen:64 core:0xa4 unit:0x000000450002894c59bb46a88e27d1df

Re: Can't upload results to 155.247.166.220

PostPosted: Fri Jan 19, 2018 3:23 pm
by JeansOn
Bruce, thank you for your answer, above. I like to read such things. ... background information :)
My question was: where in the logs is the CS documented?

I think there is no such place in the log, or I didn't search enough (Ctrl+F and then 'Server' in more than one log).
The CS is found only in the "Status" tab.
Writing the allocated CS to a place in the log would help.
Maybe a suggestion for improvement could make it easier and quicker in future for you and the donors to find the target that you really don't want to have to look for ;-)

However, the sun is going down now in my homeland, so I wish you a nice weekend.

Re: Can't upload results to 155.247.166.220

PostPosted: Fri Jan 19, 2018 3:52 pm
by bruce
Progress has been made. It's still not working for this project, but it's now trying one more option.

14:00:16:WARNING:WU02:FS00:WorkServer connection failed on port 8080 trying 80
14:00:16:WU02:FS00:Connecting to 155.247.166.220:80
14:00:38:WARNING:WU02:FS00:Exception: Failed to send results to work server: Failed to connect to 155.247.166.220:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
14:00:38:WU02:FS00:Trying to send results to collection server
14:00:38:WU02:FS00:Uploading 4.22MiB to 155.247.166.219
14:00:38:WU02:FS00:Connecting to 155.247.166.219:8080
14:00:39:WARNING:WU02:FS00:WorkServer connection failed on port 8080 trying 80
14:00:39:WU02:FS00:Connecting to 155.247.166.219:80
14:01:00:ERROR:WU02:FS00:Exception: Failed to connect to 155.247.166.219:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
14:01:00:WU02:FS00:Sending unit results: id:02 state:SEND error:NO_ERROR project:14013 run:0 clone:102 gen:20 core:0xa4 unit:0x000000160002894c59e4d2770215e250
14:01:00:WU02:FS00:Uploading 4.22MiB to 155.247.166.220
14:01:00:WU02:FS00:Connecting to 155.247.166.220:8080
14:01:02:WARNING:WU02:FS00:WorkServer connection failed on port 8080 trying 80
14:01:02:WU02:FS00:Connecting to 155.247.166.220:80

Re: Can't upload results to 155.247.166.220

PostPosted: Sat Jan 20, 2018 12:50 pm
by JeansOn
Yes, I understand, thank you.

IF the WS is ok, the collection server is unused.
IF the WS is not ok, the collection server is used, if defined.
IF the CS is not defined, there is a loop trying to use the WS only, and the CS is undocumented in the log. That is why I couldn't give you the expected answer.

Maybe it would be helpful if there were an additional line in every log showing the IP of the WS and CS (near the sending line?). The software knows the fields, but it shows them only when they are used.
That is not enough, and that is why we are writing about this problem. That is what I meant in my post above: tell the software engineers.


But they (the software engineers) have much to do. And the stats are more important for the teams.
In Germany, the PCGH team is just doing a "warm-up":
February 4 is an important day: World Cancer Day. At 00:00:00 Central European Time the folding week starts.
Bad stats / uploads would cancel our competition.

Re: Can't upload results to 155.247.166.220

PostPosted: Sat Jan 20, 2018 2:44 pm
by Joe_H
JeansOn wrote: Maybe it would be helpful if there were an additional line in every log showing the IP of the WS and CS (near the sending line?). The software knows the fields, but it shows them only when they are used.
That is not enough, and that is why we are writing about this problem. That is what I meant in my post above: tell the software engineers.


An extra line displaying that information in the log might be nice, but is both unnecessary and unlikely to be added. This information is already displayed on the Status panel of FAHControl. The WS IP address is always shown; the CS address is shown if defined, and 0.0.0.0 is displayed otherwise.

As for being near the sending line, the IP addresses used are already shown in the log for every upload and download. In the case of sending a WU back, that is the very next line in the log. The log entries even specify when the upload is going to the CS; otherwise you can assume the upload is to the WS. So I am not understanding what more you are looking for there.
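
If you want to pull that information out of a log yourself, something along these lines might do it. This is only a sketch in Python: the line format is inferred from the excerpts posted in this thread, not from any specification of the client's log, and the function name `connection_attempts` is made up. It labels each "Connecting to" line as WS or CS depending on whether a "Trying to send results to collection server" line came before it for that WU.

```python
import re

# Line layout assumed from the excerpts in this thread:
#   HH:MM:SS:[LEVEL:]WUnn:FSnn:message
LINE = re.compile(r"^(\d{1,2}:\d{2}:\d{2}):(?:[A-Z]+:)?(WU\d+):(FS\d+):(.*)$")
CONNECT = re.compile(r"^Connecting to (\d+\.\d+\.\d+\.\d+):(\d+)$")


def connection_attempts(lines):
    """Return (time, wu, role, host, port) for every 'Connecting to' line.

    role is "CS" after a 'Trying to send results to collection server' line
    for that WU, and "WS" otherwise (reset by 'Sending unit results').
    """
    role = {}  # WU id -> "WS" or "CS"
    attempts = []
    for raw in lines:
        m = LINE.match(raw.strip())
        if not m:
            continue
        time, wu, _slot, msg = m.groups()
        if msg.startswith("Sending unit results"):
            role[wu] = "WS"
        elif msg.startswith("Trying to send results to collection server"):
            role[wu] = "CS"
        else:
            c = CONNECT.match(msg)
            if c:
                attempts.append((time, wu, role.get(wu, "WS"), c.group(1), int(c.group(2))))
    return attempts
```

Feeding it the lines of a log file (for example `connection_attempts(open(path))`, where `path` is wherever your log lives) gives one tuple per connection attempt, so a WU that never shows a "CS" entry was only ever sent to its WS.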

Re: Can't upload results to 155.247.166.220

PostPosted: Sat Jan 20, 2018 8:32 pm
by bruce
There's a plan in place to add not just one but multiple CSs. There's clearly a benefit but it's a bit tricky since not only does the config on the Work Server need to be revised, additional ports need to be opened on the campus firewall if the new CS is on some other campus. All I can say is that folks are working on that plan but it's not going to happen overnight.

Re: Can't upload results to 155.247.166.220

PostPosted: Fri Jan 26, 2018 5:35 pm
by kofther
Is there an update on when this issue might be resolved?

Thanks,
Kof

15:24:52:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:14025 run:4 clone:0 gen:35 core:0xa7 unit:0x000000240002894c5a1363ee41cf2420
15:24:52:WU00:FS00:Uploading 6.78MiB to 155.247.166.220
15:24:52:WU00:FS00:Connecting to 155.247.166.220:8080
15:26:59:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
15:26:59:WU00:FS00:Connecting to 155.247.166.220:80
15:29:06:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 155.247.166.220:80: Connection timed out
15:29:06:WU00:FS00:Trying to send results to collection server
15:29:06:WU00:FS00:Uploading 6.78MiB to 155.247.166.219
15:29:06:WU00:FS00:Connecting to 155.247.166.219:8080
15:31:14:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
15:31:14:WU00:FS00:Connecting to 155.247.166.219:80
15:33:21:ERROR:WU00:FS00:Exception: Failed to connect to 155.247.166.219:80: Connection timed out

Re: Can't upload results to 155.247.166.220

PostPosted: Fri Jan 26, 2018 6:32 pm
by Joe_H
kofther wrote: Is there an update on when this issue might be resolved?

Thanks,
Kof

Both of these connected work servers are up and accepting returns, so if your WU has not uploaded yet it is due to something else.

As it was, just the .220 WS was offline for a brief time on Wednesday, and the WU should have uploaded to the .219 WS acting as a CS for 155.247.166.220. Both have been available for the last 48 hours, and one of my systems uploaded a WU to 155.247.166.219 less than 20 minutes ago.

You may need to reset your router, check that anti-malware apps are not blocking the upload connection, or do other steps to check your network connection. As a first step, can you open the addresses http://155.247.166.220:8080 and http://155.247.166.219:8080 in a web browser?
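
If a browser is not handy on that machine, a rough equivalent of the same check can be scripted. The sketch below is Python using only the standard `socket` module; the function names and the 5-second timeout are my own choices for illustration. It only tests whether a TCP connection can be opened to each address and port, which is less than a real upload, but a timeout here would point at the network path rather than at the client.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, unreachable hosts
        return False


def check_servers(hosts, ports=(8080, 80), probe=can_connect):
    """Probe every host on every port; returns {(host, port): reachable}."""
    return {(host, port): probe(host, port) for host in hosts for port in ports}


# Example call (makes real network connections when run):
# for (host, port), ok in check_servers(["155.247.166.220", "155.247.166.219"]).items():
#     print(f"{host}:{port}", "reachable" if ok else "NOT reachable")
```

If every entry comes back unreachable while other donors are uploading fine, the router or anti-malware suggestions above are the place to look next.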