
Problem on sending results [Project 14283]

Posted: Fri Nov 01, 2019 1:28 am
by sf8kkn
Hi guys,

I'm currently getting this error on several of my rigs.
The problem seems to affect project 14283 only.

01:22:48:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:14283 run:0 clone:1 gen:46 core:0x21 unit:0x0000003380fccb0a5d9e11688fbd34af
01:22:48:WU00:FS01:Uploading 160.21MiB to 128.252.203.10
01:22:48:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:14283 run:0 clone:79 gen:1 core:0x21 unit:0x0000000280fccb0a5d9e116d639f080f
01:22:48:WU00:FS01:Connecting to 128.252.203.10:8080
01:22:48:WU01:FS01:Uploading 193.10MiB to 128.252.203.10
...
01:22:50:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
01:22:50:WU00:FS01:Trying to send results to collection server
01:22:50:WARNING:WU01:FS01:Exception: Failed to send results to work server: Transfer failed
01:22:50:WU00:FS01:Uploading 160.21MiB to 155.247.166.219
01:22:50:WU01:FS01:Trying to send results to collection server
01:22:50:WU00:FS01:Connecting to 155.247.166.219:8080
01:22:50:WU01:FS01:Uploading 193.10MiB to 155.247.166.219
01:22:50:WU01:FS01:Connecting to 155.247.166.219:8080
01:22:51:ERROR:WU00:FS01:Exception: Transfer failed
01:22:51:ERROR:WU01:FS01:Exception: Transfer failed
01:22:52:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:14283 run:0 clone:1 gen:46 core:0x21 unit:0x0000003380fccb0a5d9e11688fbd34af
01:22:52:WU00:FS01:Uploading 160.21MiB to 128.252.203.10
01:22:52:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:14283 run:0 clone:79 gen:1 core:0x21 unit:0x0000000280fccb0a5d9e116d639f080f
01:22:52:WU00:FS01:Connecting to 128.252.203.10:8080
01:22:52:WU01:FS01:Uploading 193.10MiB to 128.252.203.10
01:22:52:WU01:FS01:Connecting to 128.252.203.10:8080
01:22:53:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
01:22:53:WU00:FS01:Trying to send results to collection server
01:22:53:WARNING:WU01:FS01:Exception: Failed to send results to work server: Transfer failed
01:22:53:WU00:FS01:Uploading 160.21MiB to 155.247.166.219
01:22:53:WU01:FS01:Trying to send results to collection server
01:22:53:WU00:FS01:Connecting to 155.247.166.219:8080
01:22:53:WU01:FS01:Uploading 193.10MiB to 155.247.166.219
01:22:53:WU01:FS01:Connecting to 155.247.166.219:8080
01:22:53:ERROR:WU00:FS01:Exception: Transfer failed
01:22:54:ERROR:WU01:FS01:Exception: Transfer failed
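Two line types matter in a log like this: `Sending unit results` and `Uploading ... MiB to ...`. A short script can tally them per WU. This is a minimal sketch. The regular expressions are written only against the lines quoted in this thread, not against any documented FAHClient log format, and `summarize` is a hypothetical helper name.

```python
import re

# Patterns written only against the FAHClient log lines quoted in this thread;
# other client versions may format these lines differently.
SEND_RE = re.compile(
    r"^\d\d:\d\d:\d\d:(?P<slot>WU\d+):FS\d+:Sending unit results: "
    r"id:\d+ state:\w+ error:(?P<error>\w+) "
    r"project:(?P<project>\d+) run:(?P<run>\d+) clone:(?P<clone>\d+) "
    r"gen:(?P<gen>\d+) core:(?P<core>0x[0-9a-fA-F]+) unit:(?P<unit>0x[0-9a-fA-F]+)"
)
UPLOAD_RE = re.compile(
    r"^\d\d:\d\d:\d\d:(?P<slot>WU\d+):FS\d+:Uploading (?P<mib>[\d.]+)MiB to (?P<host>[\d.]+)"
)


def summarize(lines):
    """Group send attempts, upload size and target hosts by unit ID (hypothetical helper)."""
    units = {}    # unit ID -> summary dict
    current = {}  # slot ("WU00", ...) -> unit ID it most recently tried to send
    for line in lines:
        m = SEND_RE.match(line)
        if m:
            info = units.setdefault(m["unit"], {
                "project": int(m["project"]), "run": int(m["run"]),
                "clone": int(m["clone"]), "gen": int(m["gen"]),
                "core": m["core"], "error": m["error"],
                "attempts": 0, "mib": None, "hosts": [],
            })
            info["attempts"] += 1          # one "Sending unit results" line per attempt
            current[m["slot"]] = m["unit"]
            continue
        m = UPLOAD_RE.match(line)
        if m and m["slot"] in current:
            info = units[current[m["slot"]]]
            info["mib"] = float(m["mib"])  # size as reported by the client, in MiB
            if m["host"] not in info["hosts"]:
                info["hosts"].append(m["host"])  # hosts in the order the client tried them
    return units
```

Run on the log above, it would report each unit's project/run/clone/gen, the `FAULTY` flag, the upload size, and which servers were tried. Lines such as `WARNING:` or `...` simply don't match and are skipped.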

Re: Problem on sending results

Posted: Fri Nov 01, 2019 2:21 am
by bruce
See my explanation here

You've paused those WUs many times while they were processing (most likely you processed them "on idle"). The upload packets are all greater than 100 MiB and are much too big to be valid results from project:14283.

Re: Problem on sending results

Posted: Fri Nov 01, 2019 3:13 am
by sf8kkn
I'm not sure I understand; the rigs are dedicated to folding, and one WU takes about 4 hours to compute.
I see my monitoring tool detected a problem and relaunched the WU several times. Is that enough to lose the WU?
So we can download WUs of various sizes (all my rigs are configured for WUs of 200 MB max), but uploads are limited to 100 MB? Well, that's a lot of time lost...

Re: Problem on sending results

Posted: Fri Nov 01, 2019 2:52 pm
by bruce
As I said in the linked explanation, every time the WU enters or leaves the paused state, extra garbage is added to the upload. If the WU never pauses, the bug in FAHCore_a7 for Windows is not triggered and the results upload stays correct (and concise). The new version of FAHCore_a7 fixes this problem, and the results will be uploadable.

Re: Problem on sending results

Posted: Fri Nov 01, 2019 3:22 pm
by Joe_H
How many times did your tool relaunch the WU? Post that log; perhaps it will indicate where the problem was. Generally it takes more than just a few restarts to blow the WU upload size up to 193 MB. If it took that many restarts, either the WU itself was bad or your system is not stable for GPU folding.

In this case, Bruce missed that the WUs involved were running on the GPU Core_21, so his comments about the Core_A7 issue are not completely relevant. Someone who has processed a Project 14283 WU will have to weigh in with the normal upload size for a WU from that project.

I have looked up both WUs. So far, each has one returned report showing that the WU failed to process successfully. Additional reports would be needed to determine that the WUs are bad; someone may process them successfully when they are reassigned.

Re: Problem on sending results [Project 14283]

Posted: Fri Nov 01, 2019 4:25 pm
by bruce
Oops. It looks like size has nothing to do with it. (So much for spending a week on "vacation".)

In fact, your client flagged the WU as FAULTY, so there's probably more useful information in an earlier part of the log. Scroll back to where those WUs were downloaded.

Re: Problem on sending results [Project 14283]

Posted: Fri Nov 01, 2019 9:23 pm
by sf8kkn
I won't be able to find how many times the WU was relaunched; my logs don't have that level of detail :(

Re: Problem on sending results [Project 14283]

Posted: Fri Nov 01, 2019 9:40 pm
by Joe_H
The logs kept by the client would. Even if your tool is completely relaunching processing, the client keeps the last 16 logs by default.
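Since the client keeps its recent logs on disk, a quick way to see which of them mention a given WU is to search them for its unit ID. This is a minimal sketch. The log directory and the `*.txt` file pattern are assumptions (they vary by install and OS), and `count_mentions` is a hypothetical helper, not part of FAHClient.

```python
from pathlib import Path


def count_mentions(log_dir, needle, pattern="*.txt"):
    """Count lines containing `needle` in each file under `log_dir` matching `pattern`.

    `log_dir` and `pattern` are assumptions: point them at wherever your
    FAHClient install keeps its current and rotated logs.
    """
    counts = {}
    for path in sorted(Path(log_dir).glob(pattern)):
        # errors="replace" keeps the scan going if a log has odd bytes in it
        with path.open(encoding="utf-8", errors="replace") as fh:
            hits = sum(1 for line in fh if needle in line)
        if hits:
            counts[path.name] = hits  # only files that mention the WU are reported
    return counts
```

For example, `count_mentions("/path/to/logs", "0x0000003380fccb0a5d9e11688fbd34af")` returns a dict of file name to matching-line count. The files with hits are the ones to read for restarts or errors around that WU.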

Perhaps you need to rethink how your monitoring tool is handling problems.

Re: Problem on sending results [Project 14283]

Posted: Sat Nov 02, 2019 4:56 pm
by toTOW
The additional data added at each failure (bad state) to an already big WU might exceed the server's maximum upload size, and p14283 is already big when everything is fine (more than 100 MB to upload).

Feel free to dump these WUs; they will never get back to the server (and won't be very useful anyway, since they failed).

Re: Problem on sending results [Project 14283]

Posted: Sun Nov 03, 2019 8:47 pm
by artoar_11
I don't know if it's fair to compare that way. My WU upload from this project/2019-10-13T12:44:41Z:

12:44:40:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:14283 run:0 clone:2 gen:5 core:0x21 unit:0x0000000580fccb0a5d9e11684ba342e0
12:44:40:WU00:FS01:Uploading 115.64MiB to 128.252.203.10
12:44:40:WU00:FS01:Connecting to 128.252.203.10:8080
12:44:46:WU00:FS01:Upload 24.11%
12:44:52:WU00:FS01:Upload 59.24%
12:44:58:WU00:FS01:Upload 91.45%
12:45:01:WU00:FS01:Upload complete
12:45:01:WU00:FS01:Server responded WORK_ACK (400)
12:45:01:WU00:FS01:Final credit estimate, 155440.00 points
12:45:01:WU00:FS01:Cleaning up

Re: Problem on sending results [Project 14283]

Posted: Mon Nov 04, 2019 6:40 pm
by bruce
P14283 is a GPU project. The bug in the CPU core_a7 that adds extra data to the upload has nothing to do with P14283. That bug has been causing congestion on 155.247.166.2xx, whereas P14283 is on a server at a different site: 128.252.203.10.

Re: Problem on sending results [Project 14283]

Posted: Sat Nov 09, 2019 5:59 pm
by toTOW
artoar_11 wrote:
I don't know if it's fair to compare that way. My WU upload from this project/2019-10-13T12:44:41Z:

12:44:40:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:14283 run:0 clone:2 gen:5 core:0x21 unit:0x0000000580fccb0a5d9e11684ba342e0
12:44:40:WU00:FS01:Uploading 115.64MiB to 128.252.203.10
12:44:40:WU00:FS01:Connecting to 128.252.203.10:8080
12:44:46:WU00:FS01:Upload 24.11%
12:44:52:WU00:FS01:Upload 59.24%
12:44:58:WU00:FS01:Upload 91.45%
12:45:01:WU00:FS01:Upload complete
12:45:01:WU00:FS01:Server responded WORK_ACK (400)
12:45:01:WU00:FS01:Final credit estimate, 155440.00 points
12:45:01:WU00:FS01:Cleaning up

This is the normal upload size for this project for a WU completed without bad states.
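The 115.64 MiB upload can serve as a rough baseline for the failing uploads from the first post. This is only back-of-envelope arithmetic. It assumes that different runs/clones/gens of P14283 produce roughly the same amount of legitimate output, which this thread does not establish. `excess` is a hypothetical helper.

```python
# Baseline: the clean P14283 upload quoted above (run 0, clone 2, gen 5).
# Assumption: other WUs from the same project return a similar amount of data.
BASELINE_MIB = 115.64


def excess(observed_mib, baseline_mib=BASELINE_MIB):
    """Return (extra MiB over baseline, ratio to baseline), both rounded to 2 places."""
    extra = observed_mib - baseline_mib
    return round(extra, 2), round(observed_mib / baseline_mib, 2)


# Sizes of the two FAULTY uploads from the first post.
for size in (160.21, 193.10):
    extra, ratio = excess(size)
    print(f"{size:.2f} MiB: +{extra:.2f} MiB ({ratio:.2f}x baseline)")
```

Under that assumption, the two FAULTY WUs would carry roughly 45 and 77 MiB (about 1.39x and 1.67x) more than a clean result. That is consistent with, though not proof of, data accumulating across bad states.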

Re: Problem on sending results [Project 14283]

Posted: Sat Nov 09, 2019 6:30 pm
by bruce
Unknown answer...

P14283 is a project that runs on the GPU. The recent change to the FAHCore was for CPU WUs, so your question isn't applicable. Also, you can't really assume that one project returns a similar amount of data as some other project.