Problem on sending results [Project 14283]

Postby sf8kkn » Fri Nov 01, 2019 1:28 am

Hi guys,

I'm currently seeing this error on several of my rigs.
The problem seems to affect project 14283 only.

01:22:48:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:14283 run:0 clone:1 gen:46 core:0x21 unit:0x0000003380fccb0a5d9e11688fbd34af
01:22:48:WU00:FS01:Uploading 160.21MiB to 128.252.203.10
01:22:48:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:14283 run:0 clone:79 gen:1 core:0x21 unit:0x0000000280fccb0a5d9e116d639f080f
01:22:48:WU00:FS01:Connecting to 128.252.203.10:8080
01:22:48:WU01:FS01:Uploading 193.10MiB to 128.252.203.10
...
01:22:50:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
01:22:50:WU00:FS01:Trying to send results to collection server
01:22:50:WARNING:WU01:FS01:Exception: Failed to send results to work server: Transfer failed
01:22:50:WU00:FS01:Uploading 160.21MiB to 155.247.166.219
01:22:50:WU01:FS01:Trying to send results to collection server
01:22:50:WU00:FS01:Connecting to 155.247.166.219:8080
01:22:50:WU01:FS01:Uploading 193.10MiB to 155.247.166.219
01:22:50:WU01:FS01:Connecting to 155.247.166.219:8080
01:22:51:ERROR:WU00:FS01:Exception: Transfer failed
01:22:51:ERROR:WU01:FS01:Exception: Transfer failed
01:22:52:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:14283 run:0 clone:1 gen:46 core:0x21 unit:0x0000003380fccb0a5d9e11688fbd34af
01:22:52:WU00:FS01:Uploading 160.21MiB to 128.252.203.10
01:22:52:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:14283 run:0 clone:79 gen:1 core:0x21 unit:0x0000000280fccb0a5d9e116d639f080f
01:22:52:WU00:FS01:Connecting to 128.252.203.10:8080
01:22:52:WU01:FS01:Uploading 193.10MiB to 128.252.203.10
01:22:52:WU01:FS01:Connecting to 128.252.203.10:8080
01:22:53:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
01:22:53:WU00:FS01:Trying to send results to collection server
01:22:53:WARNING:WU01:FS01:Exception: Failed to send results to work server: Transfer failed
01:22:53:WU00:FS01:Uploading 160.21MiB to 155.247.166.219
01:22:53:WU01:FS01:Trying to send results to collection server
01:22:53:WU00:FS01:Connecting to 155.247.166.219:8080
01:22:53:WU01:FS01:Uploading 193.10MiB to 155.247.166.219
01:22:53:WU01:FS01:Connecting to 155.247.166.219:8080
01:22:53:ERROR:WU00:FS01:Exception: Transfer failed
01:22:54:ERROR:WU01:FS01:Exception: Transfer failed
sf8kkn
 
Posts: 8
Joined: Sat Oct 19, 2019 7:06 pm

Re: Problem on sending results

Postby bruce » Fri Nov 01, 2019 2:21 am

See my explanation here

You've paused those WUs many times while they were processing (Most likely you processed them "on idle"). The upload packets are all greater than 100 MiB and are much too big to be valid results from project:14283
bruce
 
Posts: 22818
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Problem on sending results

Postby sf8kkn » Fri Nov 01, 2019 3:13 am

I'm not sure I understand: the rigs are dedicated to folding and one WU takes about 4 hours to compute.
I see my monitoring tool detected a problem and relaunched the WU several times. Is that enough to lose the WU?
So we can download WUs of various sizes (all my rigs are configured for WUs of 200MB max), but upload is limited to 100MB? Well, that's a lot of time lost ...
sf8kkn
 
Posts: 8
Joined: Sat Oct 19, 2019 7:06 pm

Re: Problem on sending results

Postby bruce » Fri Nov 01, 2019 2:52 pm

As I said in the linked explanation, every time the WU enters/leaves the paused state, extra garbage is added to the upload. If the WU never pauses, the bug in FAHCore_a7 for Windows is not triggered and the results upload stays correct (and concise). The new version of FAHCore_a7 fixes this problem and the results will be uploadable.
bruce
 
Posts: 22818
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Problem on sending results

Postby Joe_H » Fri Nov 01, 2019 3:22 pm

How many times did your tool relaunch the WU? Post that log and perhaps that will indicate where the problem was. Generally, just a few restarts are not enough to blow up the WU upload size to 193 MB; if it took that many restarts, either the WU itself was bad or your system is not stable for GPU folding.

In this case Bruce missed that the WUs involved were running the GPU Core_21, so his comments about the Core_A7 issue are not completely relevant. Someone who has processed a Project 14283 WU will have to weigh in with the normal upload size for a WU from that project.

I have looked up both WUs. So far each has one report of a return where the WU failed to be processed successfully. Additional reports would be needed to determine that the WUs are bad; someone may successfully process them when reassigned.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Joe_H
Site Admin
 
Posts: 4574
Joined: Tue Apr 21, 2009 4:41 pm
Location: W. MA

Re: Problem on sending results [Project 14283]

Postby bruce » Fri Nov 01, 2019 4:25 pm

Oops. It looks like size has nothing to do with it. (So much for spending a week on "vacation".)

In fact, your client detected the WU as FAULTY, so there's probably more useful information in an earlier part of the log. Scroll back to where those WUs were downloaded.
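
For anyone who would rather not scroll by hand, here is a minimal Python sketch that pulls the FAULTY units out of a log. It assumes only the "Sending unit results" line layout quoted earlier in this thread; the function names are made up for illustration and are not part of the client.

```python
import re

# Matches the "Sending unit results" lines as they appear in this thread.
SEND_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):(?P<wu>WU\d+):(?P<slot>FS\d+):"
    r"Sending unit results: (?P<fields>.+)$"
)


def parse_send_line(line):
    """Return a dict of the key:value fields, or None if the line does not match."""
    match = SEND_RE.match(line.strip())
    if match is None:
        return None
    record = {
        "time": match.group("time"),
        "wu": match.group("wu"),
        "slot": match.group("slot"),
    }
    # The remainder is a space-separated list of key:value tokens.
    for token in match.group("fields").split():
        key, _, value = token.partition(":")
        record[key] = value
    return record


def faulty_units(lines):
    """Collect distinct (wu, project, run, clone, gen) tuples reported as FAULTY."""
    seen = []
    for line in lines:
        record = parse_send_line(line)
        if record and record.get("error") == "FAULTY":
            key = (record["wu"], record["project"], record["run"],
                   record["clone"], record["gen"])
            if key not in seen:
                seen.append(key)
    return seen
```

Feeding it the lines of a log file gives the project/run/clone/gen of each unit flagged FAULTY, which is what you would then search for further up in the log.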
bruce
 
Posts: 22818
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Problem on sending results [Project 14283]

Postby sf8kkn » Fri Nov 01, 2019 9:23 pm

I won't be able to find out how many times the WU was relaunched; I don't have that level of detail in my logs :(
sf8kkn
 
Posts: 8
Joined: Sat Oct 19, 2019 7:06 pm

Re: Problem on sending results [Project 14283]

Postby Joe_H » Fri Nov 01, 2019 9:40 pm

The logs kept by the client would have that detail. Even if your tool is completely relaunching processing, the client keeps the last 16 logs by default.

Perhaps you need to rethink how your monitoring tool is handling problems.
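
If it helps, here is a rough Python sketch for searching a set of saved logs for one WU. It assumes the logs are plain-text files in a directory you point it at; the directory path, the "*.txt" pattern and the function name are assumptions for illustration, so adjust them to wherever your client actually keeps its logs.

```python
from pathlib import Path


def find_unit_mentions(log_dir, needle, pattern="*.txt"):
    """Return (file name, line number, line) for every log line containing needle.

    log_dir and pattern are assumptions: pass whatever directory and
    file-name pattern match your own installation.
    """
    hits = []
    for path in sorted(Path(log_dir).glob(pattern)):
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line:
                    hits.append((path.name, number, line.rstrip("\n")))
    return hits
```

Searching for a string such as "project:14283 run:0 clone:1 gen:46" should list every log in which that WU was reported, including the earlier entries from before it was marked FAULTY.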
Joe_H
Site Admin
 
Posts: 4574
Joined: Tue Apr 21, 2009 4:41 pm
Location: W. MA

Re: Problem on sending results [Project 14283]

Postby toTOW » Sat Nov 02, 2019 4:56 pm

Additional data added at each failure (bad state) on an already big WU might exceed the maximum upload size of the server ... and p14283 is already big when everything is fine (more than 100MB to upload).

Feel free to dump these WUs; they will never upload successfully (and won't be very useful since they failed).
Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.

FAH-Addict : latest news, tests and reviews about Folding@Home project.

toTOW
Site Moderator
 
Posts: 8785
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: Problem on sending results [Project 14283]

Postby artoar_11 » Sun Nov 03, 2019 8:47 pm

I don't know if it's fair to compare that way. My WU upload from this project/2019-10-13T12:44:41Z:

12:44:40:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:14283 run:0 clone:2 gen:5 core:0x21 unit:0x0000000580fccb0a5d9e11684ba342e0
12:44:40:WU00:FS01:Uploading 115.64MiB to 128.252.203.10
12:44:40:WU00:FS01:Connecting to 128.252.203.10:8080
12:44:46:WU00:FS01:Upload 24.11%
12:44:52:WU00:FS01:Upload 59.24%
12:44:58:WU00:FS01:Upload 91.45%
12:45:01:WU00:FS01:Upload complete
12:45:01:WU00:FS01:Server responded WORK_ACK (400)
12:45:01:WU00:FS01:Final credit estimate, 155440.00 points
12:45:01:WU00:FS01:Cleaning up
artoar_11
 
Posts: 687
Joined: Sun Nov 22, 2009 8:42 pm
Location: Bulgaria/Team #224497/artoar11_ALL_....

Re: Problem on sending results [Project 14283]

Postby bruce » Mon Nov 04, 2019 6:40 pm

P14283 is a GPU project. The bug in the CPU core_a7 which adds extra data to the upload has nothing to do with P14283. That bug has been causing congestion on 155.247.166.2xx and 14283 is on a server at a different site: 128.252.203.10.
bruce
 
Posts: 22818
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Problem on sending results [Project 14283]

Postby toTOW » Sat Nov 09, 2019 5:59 pm

artoar_11 wrote:I don't know if it's fair to compare that way. My WU upload from this project/2019-10-13T12:44:41Z:

12:44:40:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:14283 run:0 clone:2 gen:5 core:0x21 unit:0x0000000580fccb0a5d9e11684ba342e0
12:44:40:WU00:FS01:Uploading 115.64MiB to 128.252.203.10
12:44:40:WU00:FS01:Connecting to 128.252.203.10:8080
12:44:46:WU00:FS01:Upload 24.11%
12:44:52:WU00:FS01:Upload 59.24%
12:44:58:WU00:FS01:Upload 91.45%
12:45:01:WU00:FS01:Upload complete
12:45:01:WU00:FS01:Server responded WORK_ACK (400)
12:45:01:WU00:FS01:Final credit estimate, 155440.00 points
12:45:01:WU00:FS01:Cleaning up

This is the normal upload size for this project for a WU completed without Bad States ...
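
To put numbers on the difference, here is a small Python sketch that extracts the "Uploading ... MiB" figures from log lines and compares them with the 115.64 MiB upload quoted above. Treating that single WU as the reference size is an assumption for illustration, and the function names are hypothetical.

```python
import re

# Matches lines such as "Uploading 160.21MiB to 128.252.203.10".
UPLOAD_RE = re.compile(r"Uploading (?P<size>\d+(?:\.\d+)?)MiB to (?P<host>\S+)")


def upload_sizes(lines):
    """Return a list of (size in MiB, destination host) for each upload line."""
    sizes = []
    for line in lines:
        match = UPLOAD_RE.search(line)
        if match:
            sizes.append((float(match.group("size")), match.group("host")))
    return sizes


def ratio_to_reference(size_mib, reference_mib=115.64):
    """Ratio of an upload to the reference size (one clean WU from this thread)."""
    return size_mib / reference_mib
```

With the figures in this thread, the two failed uploads (160.21 MiB and 193.10 MiB) come out at roughly 1.4 and 1.7 times the size of the clean one.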
toTOW
Site Moderator
 
Posts: 8785
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: Problem on sending results [Project 14283]

Postby bruce » Sat Nov 09, 2019 6:30 pm

Unknown answer....

P14283 is a project that runs on the GPU. The recent change to the FAHCore was for CPU WUs so your question isn't applicable. Also, you can't really assume that one project returns a similar amount of data as some other project.
bruce
 
Posts: 22818
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

