Completed WU 11752 tries uploading to Work Server and fails

Moderators: Site Moderators, FAHC Science Team

FreytagXIII
Posts: 6
Joined: Mon Mar 30, 2020 6:09 am

Post by FreytagXIII »

Not sure if this is the right place to put this, since the work unit itself finished without issues.

My client has been unsuccessfully trying to upload the result of the completed unit to 140.163.4.231 for about 10 hours now. The server itself does not seem to be down, because I finished another WU that uploaded to the same address without problems.
However, I just noticed that the Collection Server for my copy of WU 11752 (0, 4061, 0) is supposed to be 128.252.203.4, so the client seems to be using the wrong upload target for the result.

Code:

21:08:59:WU02:FS01:0x22:Completed 1000000 out of 1000000 steps (100%)
21:09:05:WU02:FS01:0x22:Saving result file ..\logfile_01.txt
21:09:05:WU02:FS01:0x22:Saving result file checkpointState.xml
21:09:09:WU02:FS01:0x22:Saving result file checkpt.crc
21:09:09:WU02:FS01:0x22:Saving result file positions.xtc
21:09:10:WU02:FS01:0x22:Saving result file science.log
21:09:10:WU02:FS01:0x22:Folding@home Core Shutdown: FINISHED_UNIT
21:09:11:WU02:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
21:09:11:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:11752 run:0 clone:4061 gen:0 core:0x22 unit:0x000000068ca304e75e6a8073d8356d2d
21:09:11:WU02:FS01:Uploading 24.33MiB to 140.163.4.231
21:09:11:WU02:FS01:Connecting to 140.163.4.231:8080
21:09:17:WU02:FS01:Upload 6.42%
21:09:23:WU02:FS01:Upload 16.44%
21:09:29:WU02:FS01:Upload 26.20%
21:09:35:WU02:FS01:Upload 36.98%
21:09:41:WU02:FS01:Upload 47.51%
21:09:47:WU02:FS01:Upload 57.79%
21:09:53:WU02:FS01:Upload 68.57%
21:09:59:WU02:FS01:Upload 78.08%
21:10:04:WU00:FS01:Connecting to 65.254.110.245:8080
21:10:05:WU02:FS01:Upload 88.61%
21:10:05:WARNING:WU00:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
21:10:05:WU00:FS01:Connecting to 18.218.241.186:80
21:10:05:WARNING:WU00:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
21:10:05:ERROR:WU00:FS01:Exception: Could not get an assignment
21:10:11:WU02:FS01:Upload 99.14%
21:10:11:WU02:FS01:Upload complete
21:10:11:WU02:FS01:Server responded PLEASE_WAIT (464)
21:10:11:WARNING:WU02:FS01:Failed to send results, will try again later
21:10:11:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:11752 run:0 clone:4061 gen:0 core:0x22 unit:0x000000068ca304e75e6a8073d8356d2d
21:10:11:WU02:FS01:Uploading 24.33MiB to 140.163.4.231
21:10:11:WU02:FS01:Connecting to 140.163.4.231:8080
21:10:17:WU02:FS01:Upload 8.73%
21:10:23:WU02:FS01:Upload 19.78%
21:10:29:WU02:FS01:Upload 29.79%
21:10:35:WU02:FS01:Upload 40.07%
21:10:41:WU02:FS01:Upload 50.85%
21:10:47:WU02:FS01:Upload 61.64%
21:10:54:WU02:FS01:Upload 62.92%
21:11:00:WU02:FS01:Upload 64.21%
21:11:06:WU02:FS01:Upload 70.12%
21:11:12:WU02:FS01:Upload 81.16%
21:11:19:WU02:FS01:Upload 86.81%
21:11:26:WU02:FS01:Upload 95.80%
21:11:29:WU02:FS01:Upload complete
21:11:29:WU02:FS01:Server responded PLEASE_WAIT (464)
21:11:29:WARNING:WU02:FS01:Failed to send results, will try again later
Joe_H
Site Admin
Posts: 7870
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Post by Joe_H »

Uploads always try the WS (Work Server) first; the CS (Collection Server) is the fallback. It appears the PLEASE_WAIT response is causing the client to retry the WS instead of falling back to the CS.

I will try to contact the researcher running that project and server about this.
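
For anyone following along, here is a rough model of the behavior seen in the logs in this thread. It is only a sketch, not the client's actual code: `Outcome`, `send_results` and the `send` callback are made-up names, and the rule "PLEASE_WAIT means retry the WS, a failed transfer means try the CS" is inferred from the log output, not from the client source.

```python
from enum import Enum
from typing import Callable, Optional


class Outcome(Enum):
    """Hypothetical outcomes of a single upload attempt."""
    ACK = "WORK_ACK"             # server accepted the result
    PLEASE_WAIT = "PLEASE_WAIT"  # server answered but asked the client to retry
    FAILED = "TRANSFER_FAILED"   # connection or transfer error


def send_results(
    send: Callable[[str], Outcome],
    work_server: str,
    collection_server: Optional[str],
) -> str:
    """Return the server that accepted the result, or 'retry later'."""
    ws_outcome = send(work_server)
    if ws_outcome is Outcome.ACK:
        return work_server
    if ws_outcome is Outcome.PLEASE_WAIT:
        # The WS did respond, so (as observed) the CS is not tried at all.
        return "retry later"
    # The transfer to the WS failed: fall back to the CS if one is configured.
    if collection_server is not None and send(collection_server) is Outcome.ACK:
        return collection_server
    return "retry later"


# Simulated replies: the WS keeps answering PLEASE_WAIT, the CS would accept.
replies = {"140.163.4.231": Outcome.PLEASE_WAIT, "128.252.203.4": Outcome.ACK}
print(send_results(replies.__getitem__, "140.163.4.231", "128.252.203.4"))
# prints "retry later"
```

With the WS answering PLEASE_WAIT the CS is never reached, which matches the retry loop in the first post. Only a failed transfer to the WS would trigger the fallback.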

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
astrorob
Posts: 43
Joined: Sun Mar 15, 2020 7:59 pm

Post by astrorob »

11751 is also doing this, but given the similarity in WU number I assume this is the same researcher... It has been receiving this response for 4 hours now.
FreytagXIII
Posts: 6
Joined: Mon Mar 30, 2020 6:09 am

Post by FreytagXIII »

I will hit the 24h timeout in about an hour, maybe it will fall back to the CS then...
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Post by bruce »

The 24hr timeout doesn't change anything in the upload logic; it only affects the QRB (Quick Return Bonus). (Sorry about the reduced points.)

This particular WU has been reassigned and uploaded by several people. A couple successfully completed it and a couple reported errors. In this case, yours is another duplicate which is no longer needed. I'm not aware of any code in the server's logic that recognizes this condition, but the right thing to do would be for the server to give you credit and silently delete the WU so it doesn't keep trying for the next week.

That would be a tiny but valuable enhancement that would reduce server bandwidth overload.
FreytagXIII
Posts: 6
Joined: Mon Mar 30, 2020 6:09 am

Post by FreytagXIII »

Oh okay, that clarifies things.

Maybe another consideration on reducing traffic then:
The FAH client increases the time between each send attempt, which is good. But this delay gets reset every time the client is paused/unpaused or restarted (I have about 30 total send attempts for that WU now). It might be helpful to store something like a "last_send_timeout" value along with the WU. Then, if the client is restarted, it could make one immediate send attempt, and if that fails, schedule the next attempt after whatever "last_send_timeout" was following the previous attempt instead of starting over from 0.

Don't know how feasible that would be though.
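
To make the idea a bit more concrete, here is a minimal sketch of a retry delay that survives restarts. Everything in it is an assumption for illustration: the per-WU JSON state file, the `load_delay` and `record_failure` helpers, the 60-second starting delay, the 6-hour cap and the doubling schedule are all made up, since I don't know how the client actually computes or stores its retry interval.

```python
import json
from pathlib import Path

INITIAL_DELAY = 60       # seconds; assumed starting delay
MAX_DELAY = 6 * 60 * 60  # seconds; assumed upper bound


def load_delay(state_file: Path) -> int:
    """Return the delay stored after the previous failed attempt, or 0."""
    try:
        return int(json.loads(state_file.read_text())["last_send_timeout"])
    except (OSError, ValueError, KeyError, TypeError):
        # No usable state yet (missing or malformed file): start from scratch.
        return 0


def record_failure(state_file: Path) -> int:
    """Grow the stored delay after a failed send and return the new wait."""
    previous = load_delay(state_file)
    delay = min(max(previous * 2, INITIAL_DELAY), MAX_DELAY)
    # Persist the value so a restart continues from here instead of from 0.
    state_file.write_text(json.dumps({"last_send_timeout": delay}))
    return delay
```

After a restart the client would try once immediately, and on failure call `record_failure()`, which picks up from the stored value. On success it would simply delete the state file together with the WU.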
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Post by bruce »

I'm aware of that issue but it's probably not going to get any attention during the crisis.

From a personal perspective, I want my machine to be folding and I want it NOW. From FAH's perspective, if it fails to give you work, it will have plenty of opportunities to give the same assignment to someone else.

If my machine is idle because it's in the unused excess client group, so what? If somebody else's machine is folding, that's good enough for the COVID-19 research.
FreytagXIII
Posts: 6
Joined: Mon Mar 30, 2020 6:09 am

Post by FreytagXIII »

bruce wrote: I'm aware of that issue but it's probably not going to get any attention during the crisis.
And that's totally understandable, there are more important things to cope with right now :)

My machine was assigned other WUs and continued folding. That one WU just keeps retrying its upload every now and then. I just wanted to mention the thought in case this happens to more clients and upload loops become a bigger problem over time, which I hope they won't.

Thanks for the detailed answers and keep up the great work!
FreytagXIII
Posts: 6
Joined: Mon Mar 30, 2020 6:09 am

Post by FreytagXIII »

Just for the record: it finally managed to upload to the CS when I started my client today!

Code:

*********************** Log Started 2020-03-31T10:49:30Z ***********************
10:49:30:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:11752 run:0 clone:4061 gen:0 core:0x22 unit:0x000000068ca304e75e6a8073d8356d2d
10:49:30:WU02:FS01:Uploading 24.33MiB to 140.163.4.231
10:49:30:WU02:FS01:Connecting to 140.163.4.231:8080
10:51:08:WU02:FS01:Upload 0.51%
10:51:08:WARNING:WU02:FS01:Exception: Failed to send results to work server: Transfer failed
10:51:08:WU02:FS01:Trying to send results to collection server
10:51:08:WU02:FS01:Uploading 24.33MiB to 128.252.203.4
10:51:08:WU02:FS01:Connecting to 128.252.203.4:8080
10:51:14:WU02:FS01:Upload 26.71%
10:51:20:WU02:FS01:Upload 55.73%
10:51:26:WU02:FS01:Upload 85.78%
10:51:33:WU02:FS01:Upload complete
10:51:33:WU02:FS01:Server responded WORK_ACK (400)
10:51:33:WU02:FS01:Final credit estimate, 16615.00 points
10:51:33:WU02:FS01:Cleaning up
Joe_H
Site Admin
Posts: 7870
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Post by Joe_H »

Good to hear, looks like they found the problem that was causing this.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
FreytagXIII
Posts: 6
Joined: Mon Mar 30, 2020 6:09 am

Post by FreytagXIII »

Unfortunately, I encountered issues with the same WU/WS combination today. I got another copy of 11752, this time (0, 1883, 6), again from 140.163.4.231, but this time without a CS configured for the WU. It's been stuck trying to upload for over 2 hours already, either reaching about 0.5% progress before failing or running into a connection timeout right away. No "464 Please Wait" this time.

Judging by the other forum entries, more people are having trouble with this server lately. I just think it's odd that I am having issues with the same WU/WS combination again.

I guess there's still not much to be done. Sucks to see another 100k points slowly pass away though...