Repeated WU

foldinghomealone
Posts: 130
Joined: Wed Feb 01, 2017 7:07 pm

Repeated WU

Post by foldinghomealone »

I have a question about whether the same WU is run several times on different systems to eliminate calculation errors.
The reason I'm asking is that we're having a lively discussion about this in our team's FAH forum.

My assumption is that GPU WUs are not repeated; they are repeated only if there is some unusual 'problem' with the project.
I thought I had read that GPU WUs are doublechecked by the CPU while being processed. If the WU is OK, it is uploaded and that's it.
If the GPU and CPU results deviate, the WU is aborted, the result is uploaded, and the rest of the WU is distributed to another donor.

I hope you can shed some light here. Thanks
Joe_H
Site Admin
Posts: 7856
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Repeated WU

Post by Joe_H »

Your assumption is correct. The general plan for all WUs, both CPU and GPU, is to assign each one to just one machine. Only when a WU fails on one system is it reassigned, usually to two different systems. Multiple systems are not used to "eliminate calculation errors". Some sanity checks are done during processing and, as you recall reading, with current GPU folding cores they are done on the CPU. Additional checks are done on returned WUs before they are accepted.
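To make the policy above concrete, here is a minimal sketch in Python. It is only an illustration of "one machine normally, two different machines after a failure"; the names `WorkUnit` and `assign` are hypothetical and this is not the actual FAH assignment code, which may well pick hosts differently.

```python
from dataclasses import dataclass, field


@dataclass
class WorkUnit:
    # Hypothetical record; the real servers track far more than this.
    wu_id: str
    assigned_to: list = field(default_factory=list)


def assign(wu, available_hosts, failed_before=False):
    """Pick hosts for a WU: one normally, two distinct hosts after a failure.

    Assumption: a host that already had this WU is not given it again.
    """
    copies = 2 if failed_before else 1
    candidates = [h for h in available_hosts if h not in wu.assigned_to]
    chosen = candidates[:copies]
    wu.assigned_to.extend(chosen)
    return chosen


wu = WorkUnit("example-wu")
first = assign(wu, ["host_a", "host_b", "host_c"])
# host_a reports a failure, so the WU goes out again to two other hosts.
retry = assign(wu, ["host_a", "host_b", "host_c"], failed_before=True)
```

Note that the second assignment is a reaction to a failure, not a routine cross-check of a result that already looked fine.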

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Repeated WU

Post by bruce »

When a WU fails, the two most common reasons are system instability (e.g., overclocking) or inherent mathematical instability (aka a "bad WU"). If it's the former, assigning it to someone else gives FAH a pretty good chance of continuing to extend that trajectory. [Longer trajectories consisting of many sequential WUs are a lot more important to science than short trajectories.] When a bad WU is discovered, it's generally something that needs to be fixed by the scientist who defined the initial conditions of that trajectory -- and the WU will fail again and again if reissued.

Your second question is more subtle. Suppose a WU is doublechecked only after reaching 100%, and if it contains errors, the client refuses to upload the result. Wouldn't you rather have it doublechecked more frequently so that maybe it would be aborted at 22%? Not only would wasting time processing a failed WU make you unhappy, it would waste resources that could otherwise benefit science.

On the other hand, doublechecking, itself, adds overhead, so doublechecking too frequently would waste resources in another way.
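The trade-off can be put into rough numbers. The sketch below is a toy model, not anything taken from the FAH cores: it assumes checks are evenly spaced, each check costs a fixed amount of time, and a failure (if one happens) strikes at a uniformly random point and is only noticed at the next check. The function name and all parameters are made up for illustration.

```python
def expected_cost(total_time, n_checks, check_cost, p_fail):
    """Expected time spent on checks plus expected time lost to a failed WU.

    Simplification: the full checking overhead is counted even for a WU
    that is aborted early.
    """
    interval = total_time / n_checks
    overhead = n_checks * check_cost
    # A failure at a uniformly random time is caught at the next check,
    # so on average total_time/2 + interval/2 of work is thrown away.
    expected_waste = p_fail * (total_time / 2 + interval / 2)
    return overhead + expected_waste


# Made-up numbers: a 100-minute WU, 1 minute per check, 10% failure rate.
costs = {n: expected_cost(100, n, 1, 0.1) for n in range(1, 21)}
best_n = min(costs, key=costs.get)
```

In this model, checking more often shrinks the wasted work after a failure but adds overhead to every WU, so the cheapest number of checks sits somewhere in between and shifts with how expensive a check is and how often WUs fail.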

(Joe types faster than I do.) ;)
foldinghomealone
Posts: 130
Joined: Wed Feb 01, 2017 7:07 pm

Re: Repeated WU

Post by foldinghomealone »

Thanks a lot for your fast help.