Re: Send Errors - 155.247.164.213 & .214

Posted: Wed Mar 25, 2020 7:51 pm
by sswilson
alxbelu wrote:
sswilson wrote:My .214 unit cleared... haven't gone back through the log yet to see if it uploaded or timed out.....

edit: Pretty sure I tracked it down in the log.... it took a total of 36 minutes to upload the results in .15 - .50% increments which would suggest the server was being absolutely hammered with uploads..... :)

Took up a total of 276 individual log entries to complete.

High praise to the techs who finally got this resolved. :)
What was the PRCG? I checked the one you mentioned in this thread (11758 (0, 2135, 0)), which seems to still not have been successfully uploaded: https://apps.foldingathome.org/wu#proje ... 2135&gen=0
I'll have a closer look after supper (and maybe upload the whole log file).... it didn't seem to be logged in the same fashion as it normally would (stopped listing the attempt to connect to .214 and then only referred to the WU for several attempts before it seemed to upload).

I don't think there's a way for me to grab the actual WU now that it's no longer showing on the client?

Re: Send Errors - 155.247.164.213 & .214

Posted: Wed Mar 25, 2020 8:01 pm
by alxbelu
AFAIK the log should read something like this:

Code:

19:44:09:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:11762 run:0 clone:9927 gen:12 core:0x22 unit:0x0000001080fccb0a5e7113ba81c3f381
19:44:09:WU00:FS01:Uploading 33.02MiB to 128.252.203.10
19:44:09:WU00:FS01:Connecting to 128.252.203.10:8080
19:44:33:WU00:FS01:Upload 1.14%
19:44:40:WU00:FS01:Upload 29.72%
19:44:46:WU00:FS01:Upload complete
19:44:46:WU00:FS01:Server responded WORK_ACK (400)
The PRCG in this case, for folding slot 01 (FS01) and work unit 00 (WU00), is 11762 (0, 9927, 12).
The "WORK_ACK" concerns FS01 and WU00, hence I know which specific project was successfully uploaded to which server.
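For anyone who wants to pull the PRCG out of a long log without scanning it by eye, here is a minimal Python sketch. It assumes the "Sending unit results" line always has the field layout shown in the excerpt above; `SEND_RE` and `parse_prcg` are names made up for this example and are not part of FAHClient.

```python
import re

# Pattern for the "Sending unit results" line. The field layout is taken from
# the single log excerpt in this thread, not from any client documentation.
SEND_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):WU(?P<wu>\d+):FS(?P<fs>\d+):Sending unit results:"
    r".*?project:(?P<project>\d+) run:(?P<run>\d+) clone:(?P<clone>\d+) gen:(?P<gen>\d+)"
)

def parse_prcg(line):
    """Return (wu, fs, (project, run, clone, gen)), or None if the line does not match."""
    m = SEND_RE.match(line)
    if m is None:
        return None
    prcg = tuple(int(m.group(k)) for k in ("project", "run", "clone", "gen"))
    return m.group("wu"), m.group("fs"), prcg

line = ("19:44:09:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR "
        "project:11762 run:0 clone:9927 gen:12 core:0x22 "
        "unit:0x0000001080fccb0a5e7113ba81c3f381")
print(parse_prcg(line))  # ('00', '01', (11762, 0, 9927, 12))
```

Matching the WU and FS numbers from this line against the later "Server responded WORK_ACK" line tells you which PRCG was acknowledged.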

Re: Send Errors - 155.247.164.213 & .214

Posted: Wed Mar 25, 2020 8:45 pm
by sswilson

Code:

15:15:20:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:11758 run:0 clone:2135 gen:0 core:0x22 unit:0x000000019bf7a4d55e6d7715c140c08d
15:15:20:WU01:FS01:Uploading 55.24MiB to 155.247.164.213
15:15:20:WU01:FS01:Connecting to 155.247.164.213:8080
15:15:20:WARNING:WU01:FS01:Exception: Failed to send results to work server: Transfer failed
15:15:20:WU01:FS01:Trying to send results to collection server
15:15:20:WU01:FS01:Uploading 55.24MiB to 155.247.164.214
15:15:20:WU01:FS01:Connecting to 155.247.164.214:8080
15:15:20:ERROR:WU01:FS01:Exception: Transfer failed
If I'm reading this right.... this was one of the early attempts to upload it in this log (kinda confused as to why it's listing .213 first and then switching to .214 before it claims the transfer failed)

Code:

17:15:43:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:11758 run:0 clone:2135 gen:0 core:0x22 unit:0x000000019bf7a4d55e6d7715c140c08d
17:15:43:WU01:FS01:Uploading 55.24MiB to 155.247.164.213
17:15:43:WU01:FS01:Connecting to 155.247.164.213:8080
17:15:43:WARNING:WU01:FS01:Exception: Failed to send results to work server: Transfer failed
17:15:43:WU01:FS01:Trying to send results to collection server
17:15:43:WU01:FS01:Uploading 55.24MiB to 155.247.164.214
17:15:43:WU01:FS01:Connecting to 155.247.164.214:8080
17:15:43:ERROR:WU01:FS01:Exception: Transfer failed
This is the last reference to uploading to .214 or .213 that I'm finding in the log

Oops.... yeah found this....

Code:

18:16:00:WARNING:WU01:FS01:Past final deadline 2020-03-24T18:15:59Z, dumping
18:16:00:WU01:FS01:Cleaning up
I guess that's where it dumped it and I was looking at the wrong WU that took forever to upload thinking it was the orphaned one.

Either way... it's gone. :)
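On the .213 then .214 question: going by the log lines themselves ("Failed to send results to work server", then "Trying to send results to collection server"), the client appears to try the work server (.213) first and only then the collection server (.214). A rough Python sketch of that order follows. It is an illustration of what the log shows, not the actual client code, and `send_results` and `try_upload` are hypothetical names.

```python
def send_results(try_upload, work_server, collection_server=None):
    """Try the work server first, then the collection server.

    try_upload(host) -> bool is a hypothetical callable standing in for
    one upload attempt. Returns the role that accepted the upload, or
    None if every attempt failed (the WU would then stay queued).
    """
    targets = (("work server", work_server),
               ("collection server", collection_server))
    for role, host in targets:
        if host is None:
            continue  # no collection server configured for this WU
        if try_upload(host):
            return role
    return None

# Both servers refusing, as in the two log excerpts above:
print(send_results(lambda host: False, "155.247.164.213", "155.247.164.214"))
# Work server down but collection server accepting:
print(send_results(lambda host: host.endswith(".214"),
                   "155.247.164.213", "155.247.164.214"))
```

The first call prints `None` and the second prints `collection server`, so a "Transfer failed" error that follows a .214 line is the second attempt failing, not the first.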

Re: Send Errors - 155.247.164.213 & .214

Posted: Wed Mar 25, 2020 9:30 pm
by evilgizmo2352
Did a little digging and found that both 155.247.164.213 and .214 are set to ASSIGN, or so it says on the F@H server list page. I have no idea how to let them know. If anybody knows how to contact voelz at Temple edu, please do.

Re: Send Errors - 155.247.164.213 & .214

Posted: Wed Mar 25, 2020 10:52 pm
by EHoops
I am also having this issue. Is there anything to do besides keep retrying the upload? It also seems like this is affecting multiple people. Is there a less busy time for sending? I could pause until then.

Re: Send Errors - 155.247.164.213 & .214

Posted: Wed Mar 25, 2020 11:38 pm
by _r2w_ben
davidcoton wrote:Unreturned WUs should not be reassigned until they expire. Faulty WUs that are returned are reassigned, up to (IIRC) three failures.
If there is clear evidence that unreturned WUs are being reassigned before they expire, we need to collect enough info to determine the extent/pattern of this problem and then get the team further involved.
Work units can be reassigned when the timeout is reached in order to keep the project moving forward. If the first assigned machine finishes after the timeout, but before the expiry, then it will still be rewarded points. The second assigned machine will also be rewarded points upon completion.

Whichever machine returns the result first triggers the creation of the next work unit, with the same PRC but with Gen + 1.
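To make the timeout/expiry rules above concrete, here is a small Python sketch. It only encodes the rules as described in this post. The function names and the example dates are made up for illustration, and the real assignment logic on the servers may differ.

```python
from datetime import datetime

def may_reassign(now, timeout):
    """A WU can be handed to a second machine once the timeout is reached."""
    return now >= timeout

def earns_points(returned_at, expiry):
    """A result still earns points if it comes back before the expiry."""
    return returned_at < expiry

def next_work_unit(prcg):
    """The first returned result leads to the same PRC with Gen + 1."""
    project, run, clone, gen = prcg
    return (project, run, clone, gen + 1)

# Hypothetical deadlines, for illustration only
timeout = datetime(2020, 3, 21, 18, 0, 0)
expiry = datetime(2020, 3, 28, 18, 0, 0)
returned = datetime(2020, 3, 26, 6, 45, 0)

print(may_reassign(returned, timeout))    # True
print(earns_points(returned, expiry))     # True
print(next_work_unit((11758, 0, 526, 0))) # (11758, 0, 526, 1)
```

In this example the result comes back after the timeout but before the expiry, so the WU may already have been reassigned and the late result still earns points.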

Re: Send Errors - 155.247.164.213 & .214

Posted: Thu Mar 26, 2020 7:12 am
by AEM
I just saw mine disappear. I think it finally got sent.

Re: Send Errors - 155.247.164.213 & .214

Posted: Thu Mar 26, 2020 8:01 am
by Darth_Peter_dualxeon
Yes, it was finally fixed. My two work units finally uploaded today at 06:45 (in the FAHClient's time zone).

Re: Send Errors - 155.247.164.213 & .214

Posted: Thu Mar 26, 2020 8:13 am
by qk7b
qk7b wrote:Same error here, just finished a WU for 11758 and got exactly the same output
155.247.164.214 seems to be in real pain, good luck guys.

As alxbelu said earlier, it looks like more than one of us is working on this specific PRCG (sorry, I don't quite understand how that works yet)

Mine is PRCG 11758 (0, 526, 0)

Looks like there are already 2 Faulty WUs completed, assigned on 2020-03-23, whereas mine was assigned on 2020-03-20.
At this point I don't know if I should let it go, or if I should clear it in some way.

The expiration date is 2020-03-28 anyway, so I can still wait until then.


EDIT: Added info on PRCG
Same for me here, everything was sent during the night!

Great work!

Re: Send Errors - 155.247.164.213 & .214

Posted: Thu Mar 26, 2020 8:27 am
by sc00p
Oh, I've had one WU stuck on my client for a while now... I checked the WU status: PRCG 11758 (0,2822,0) = "Not found".

Should I conclude that this specific project has been dropped, or what?

Re: Send Errors - 155.247.164.213 & .214

Posted: Thu Mar 26, 2020 9:29 am
by scerbera
Mine are being sent now, great news!

Re: Send Errors - 155.247.164.213 & .214

Posted: Thu Mar 26, 2020 10:12 am
by vangli
My three are also uploading. I see that the software version on the servers has changed from 9.5.6 to 9.6.

Re: Send Errors - 155.247.164.213 & .214

Posted: Thu Mar 26, 2020 11:31 am
by pachydermus
Uploaded here. Instead, my machines have been waiting 12 hours for a new work unit to process. Fantastic. :x

Re: Send Errors - 155.247.164.213 & .214

Posted: Thu Mar 26, 2020 12:44 pm
by sc00p
vangli wrote:I see that the software version on the servers has changed from 9.5.6 to 9.6.
Nice find there^ :)


And my WU has gone too it seems...

Re: Send Errors - 155.247.164.213 & .214

Posted: Thu Mar 26, 2020 5:07 pm
by alxbelu
sc00p wrote:Oh, I've had one WU stuck on my client for a while now... I checked the WU status: PRCG 11758 (0,2822,0) = "Not found".

Should I conclude that this specific project has been dropped, or what?
Not sure, but it looks the same for the one WU I had left: https://apps.foldingathome.org/wu#proje ... 1460&gen=0

Code:

07:17:58:WU03:FS01:Sending unit results: id:03 state:SEND error:NO_ERROR project:11758 run:0 clone:1460 gen:0 core:0x22 unit:0x000000069bf7a4d55e6d77136df445dd
07:17:58:WU03:FS01:Uploading 55.24MiB to 155.247.164.213
07:17:58:WU03:FS01:Connecting to 155.247.164.213:8080
07:18:20:WARNING:WU03:FS01:WorkServer connection failed on port 8080 trying 80
07:18:20:WU03:FS01:Connecting to 155.247.164.213:80
07:18:41:WARNING:WU03:FS01:Exception: Failed to send results to work server: Failed to connect to 155.247.164.213:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
07:18:41:WU03:FS01:Trying to send results to collection server
07:18:41:WU03:FS01:Uploading 55.24MiB to 155.247.164.214
07:18:41:WU03:FS01:Connecting to 155.247.164.214:8080
07:18:47:WU03:FS01:Upload 4.98%
07:18:53:WU03:FS01:Upload 10.41%
07:18:59:WU03:FS01:Upload 15.95%
07:19:05:WU03:FS01:Upload 21.38%
07:19:11:WU03:FS01:Upload 26.70%
07:19:17:WU03:FS01:Upload 32.13%
07:19:23:WU03:FS01:Upload 37.45%
07:19:29:WU03:FS01:Upload 42.54%
07:19:35:WU03:FS01:Upload 47.86%
07:19:41:WU03:FS01:Upload 52.73%
07:19:47:WU03:FS01:Upload 58.04%
07:19:53:WU03:FS01:Upload 63.36%
07:19:59:WU03:FS01:Upload 68.79%
07:20:05:WU03:FS01:Upload 74.11%
07:20:11:WU03:FS01:Upload 79.54%
07:20:17:WU03:FS01:Upload 84.97%
07:20:23:WU03:FS01:Upload 90.29%
07:20:29:WU03:FS01:Upload 95.72%
07:20:34:WU03:FS01:Upload complete
07:20:34:WU03:FS01:Server responded WORK_ACK (400)
07:20:34:WU03:FS01:Final credit estimate, 16615.00 points
I guess there could be a delay, but other WUs I've submitted since show up correctly (e.g. PRCG 11759 (0,790,16)). Perhaps they allowed submissions that did not match the WS assignment db, to manually handle them at some point? In any case, happy that it somewhat seems to have been resolved :)
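For comparing upload speeds between attempts, the log above has enough information to estimate throughput. Below is a minimal Python sketch that takes the size and start time from the last "Uploading ... MiB to ..." line before "Upload complete". It assumes the log format shown in this thread and an upload that does not cross midnight. `upload_rate` and the regex names are made up for this example.

```python
import re
from datetime import datetime

SIZE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}):.*Uploading ([\d.]+)MiB to (\S+)")
DONE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}):.*Upload complete")

def upload_rate(lines):
    """Return (host, seconds, MiB per second) for the attempt that completed, else None."""
    start = size = host = None
    for line in lines:
        m = SIZE_RE.match(line)
        if m:
            # A new attempt replaces any earlier, failed one
            start = datetime.strptime(m.group(1), "%H:%M:%S")
            size = float(m.group(2))
            host = m.group(3)
            continue
        m = DONE_RE.match(line)
        if m and start is not None:
            done = datetime.strptime(m.group(1), "%H:%M:%S")
            seconds = (done - start).total_seconds()
            if seconds <= 0:
                return None  # same-second or midnight-crossing upload, not handled here
            return host, seconds, size / seconds
    return None

log = [
    "07:17:58:WU03:FS01:Uploading 55.24MiB to 155.247.164.213",
    "07:18:41:WU03:FS01:Trying to send results to collection server",
    "07:18:41:WU03:FS01:Uploading 55.24MiB to 155.247.164.214",
    "07:18:47:WU03:FS01:Upload 4.98%",
    "07:20:34:WU03:FS01:Upload complete",
]
host, seconds, rate = upload_rate(log)
print(host, seconds, round(rate, 2))  # 155.247.164.214 113.0 0.49
```

So the successful upload to .214 ran at roughly 0.49 MiB/s, counting from the moment the client started that attempt.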