Repeated WU

Posted: Sat Feb 18, 2017 5:10 pm
by foldinghomealone
I have a question about whether the same WU is repeated several times on different systems to eliminate calculation errors.
I'm asking because we have a lively discussion about this question in our team's FAH forum.

My assumption on this topic is that GPU WUs are not repeated. They are repeated only if there is some unusual 'problem' with the project.
I thought I had read that GPU WUs are doublechecked by the CPU while being processed. If the WU is OK, it is uploaded and that's it.
If there is a deviation between the GPU and CPU results, then the WU is aborted, the result is uploaded, and the rest of the WU is distributed to another donor.

I hope you can shed some light here. Thanks

Re: Repeated WU

Posted: Sat Feb 18, 2017 5:22 pm
by Joe_H
Your assumption is correct. The general plan for all WUs, both CPU and GPU, is to assign each one to just one machine. Only when a WU fails on one system is it reassigned, usually to two different systems. There is no use of multiple systems to "eliminate calculation errors". Some sanity checks are done during processing, and as you recall reading, they are done on the CPU with the current GPU folding cores. Additional checks are done on returned WUs before they are accepted.
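
As a rough illustration of that policy only, here is a minimal Python sketch. The function name `assign_hosts`, its parameters, and the host names are all made up for this example; it is not how the FAH assignment servers are actually implemented.

```python
def assign_hosts(hosts, failed_on=None, reissue_copies=2):
    """Pick which hosts receive a WU next (hypothetical toy model).

    hosts          -- list of available host names (assumed, illustrative)
    failed_on      -- host that just failed the WU, or None on first issue
    reissue_copies -- how many hosts get the WU after a failure (assumed 2)
    """
    if failed_on is None:
        # Normal case: the WU goes to exactly one machine.
        return [hosts[0]]
    # After a failure: reissue to other machines, skipping the one that failed.
    candidates = [h for h in hosts if h != failed_on]
    return candidates[:reissue_copies]


first = assign_hosts(["alpha", "beta", "gamma"])
retry = assign_hosts(["alpha", "beta", "gamma"], failed_on="alpha")
print(first, retry)
```

The point of the sketch is just that redundancy is the exception (after a failure), not the rule.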

Re: Repeated WU

Posted: Sat Feb 18, 2017 5:25 pm
by bruce
When a WU fails, the two most common reasons are system instability (e.g., overclocking) or inherent mathematical instability (aka a "bad WU"). If it's the former, assigning it to someone else gives FAH a pretty good chance of continuing to extend that trajectory. [Longer trajectories consisting of many sequential WUs are a lot more important to science than short trajectories.] When a bad WU is discovered, it's generally something that needs to be fixed by the scientist who defined the initial conditions of that trajectory -- and the WU will fail again and again if reissued.

Your second question is more subtle. Suppose a WU is doublechecked only after reaching 100%, and if it contains errors, the client refuses to upload the result. Wouldn't you rather have it doublechecked more frequently so that maybe it would be aborted at 22%? Not only would wasting time processing a failed WU make you unhappy, it would waste resources that could otherwise benefit science.

On the other hand, doublechecking, itself, adds overhead, so doublechecking too frequently would waste resources in another way.
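
To put rough numbers on that trade-off, here is a toy Python model. Everything in it is an assumption for illustration: the WU is treated as 100 equal steps, checks happen at fixed intervals, and each check costs a fixed fraction of a step. The real cores do not necessarily work this way.

```python
import math


def wasted_steps(fault_step, check_interval):
    """Steps computed after a fault before the next check catches it."""
    next_check = math.ceil(fault_step / check_interval) * check_interval
    return next_check - fault_step


def check_overhead(total_steps, check_interval, cost_per_check):
    """Total cost of all checks over a full WU, in step-equivalents."""
    return (total_steps // check_interval) * cost_per_check


# Hypothetical WU of 100 steps that goes bad at step 22.
for interval in (100, 25, 5):
    print(
        interval,
        wasted_steps(22, interval),
        check_overhead(100, interval, cost_per_check=0.5),
    )
```

In this model, checking only at the end wastes 78 steps on the bad WU but costs almost nothing in overhead, while checking every 5 steps wastes only 3 steps but pays for 20 checks on every WU, good or bad.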

(Joe types faster than I do.) ;)

Re: Repeated WU

Posted: Sat Feb 18, 2017 5:39 pm
by foldinghomealone
Thanks a lot for your fast help.