Users with many failed WUs

Moderators: Site Moderators, FAHC Science Team

rwh202
Posts: 425
Joined: Mon Nov 15, 2010 8:51 pm
Hardware configuration: 8x GTX 1080
3x GTX 1080 Ti
3x GTX 1060
Various other bits and pieces
Location: South Coast, UK

Users with many failed WUs

Post by rwh202 »

I had a few WUs fail on a rig that was getting its knickers in a twist over the various GPU indices.

Out of curiosity I put one into the WU checker to see if someone else had picked it up and finished it.

https://apps.foldingathome.org/wu#proje ... =635&gen=0

Turns out someone else also failed it before someone successfully completed it.

Again, out of curiosity (and boredom - self-isolation does that...) I clicked on the other failed user.

Interesting - a list of 228 slots having returned WUs, so putting in some significant effort.

However, most show a very low credit - suggesting failures. I picked every 3rd slot to sample the low point WUs (<1000):

Code:

https://apps.foldingathome.org/wu#project=14158&run=2&clone=8992&gen=1	failed - 2 failures before completion
https://apps.foldingathome.org/wu#project=14443&run=0&clone=17&gen=3	failed - 4 failures before completion
https://apps.foldingathome.org/wu#project=11763&run=0&clone=6892&gen=47 failed - for some reason issued and completed successfully by 3 others
https://apps.foldingathome.org/wu#project=13879&run=0&clone=1568&gen=81 failed - completed by next person
https://apps.foldingathome.org/wu#project=14437&run=227&clone=2&gen=3	failed - 3 failures before completion
https://apps.foldingathome.org/wu#project=11743&run=0&clone=2872&gen=13 failed - completed by next person
https://apps.foldingathome.org/wu#project=14432&run=0&clone=268&gen=38	failed - completed by next person
https://apps.foldingathome.org/wu#project=11741&run=0&clone=5579&gen=23 failed - for some reason issued and completed successfully by 2 others
https://apps.foldingathome.org/wu#project=16434&run=548&clone=2&gen=4	failed - completed by next person
https://apps.foldingathome.org/wu#project=16433&run=1879&clone=0&gen=7	failed - completed by next person
https://apps.foldingathome.org/wu#project=14538&run=0&clone=1965&gen=90	failed - 2 failures before completion
https://apps.foldingathome.org/wu#project=14436&run=2444&clone=0&gen=16	failed - completed by next person
https://apps.foldingathome.org/wu#project=13878&run=0&clone=1390&gen=56	failed - completed by next person
https://apps.foldingathome.org/wu#project=11746&run=0&clone=324&gen=65	failed - for some reason issued and completed successfully by 2 others
https://apps.foldingathome.org/wu#project=11743&run=0&clone=4735&gen=73	failed - completed by next person
https://apps.foldingathome.org/wu#project=16433&run=53&clone=1&gen=11	failed - for some reason issued and completed successfully by 2 others
https://apps.foldingathome.org/wu#project=11743&run=0&clone=9178&gen=12	failed - completed by next person
https://apps.foldingathome.org/wu#project=11761&run=0&clone=13123&gen=1 failed - 2 failures before completion by 2 others
https://apps.foldingathome.org/wu#project=14251&run=401&clone=0&gen=6	failed - no other return
https://apps.foldingathome.org/wu#project=14436&run=877&clone=0&gen=17	failed - completed by next person
https://apps.foldingathome.org/wu#project=14549&run=0&clone=603&gen=46	failed - completed by next person
https://apps.foldingathome.org/wu#project=14549&run=0&clone=514&gen=54	failed - completed by next person
https://apps.foldingathome.org/wu#project=11746&run=0&clone=6303&gen=43	failed - completed by next person
https://apps.foldingathome.org/wu#project=16434&run=388&clone=2&gen=12	failed - completed by next person
https://apps.foldingathome.org/wu#project=14444&run=0&clone=170&gen=6	failed - completed by next person
https://apps.foldingathome.org/wu#project=14438&run=0&clone=808&gen=3	failed - completed by next person
https://apps.foldingathome.org/wu#project=14436&run=2137&clone=3&gen=8	failed - 2 failures before completion
https://apps.foldingathome.org/wu#project=14549&run=0&clone=1461&gen=39	failed - completed by next person
https://apps.foldingathome.org/wu#project=16444&run=0&clone=526&gen=5	failed - no other return
https://apps.foldingathome.org/wu#project=13876&run=0&clone=1840&gen=60	failed - completed by next person
https://apps.foldingathome.org/wu#project=14436&run=2526&clone=0&gen=26	failed - completed by next person
https://apps.foldingathome.org/wu#project=14561&run=0&clone=1307&gen=18	failed - 4 failures before completion
https://apps.foldingathome.org/wu#project=16434&run=328&clone=0&gen=22	failed - 2 failures before completion
https://apps.foldingathome.org/wu#project=14436&run=772&clone=0&gen=31	failed - completed by next person
Picking some of the last WUs from the other failed users in the above:

Code:

https://apps.foldingathome.org/wu#project=11762&run=0&clone=6151&gen=31	failed - completed by next person
https://apps.foldingathome.org/wu#project=14549&run=0&clone=1217&gen=0	failed - completed by next person
https://apps.foldingathome.org/wu#project=14543&run=0&clone=359&gen=38	failed - completed by next person (me!)
https://apps.foldingathome.org/wu#project=16421&run=0&clone=2459&gen=9	failed - completed by next person
https://apps.foldingathome.org/wu#project=14251&run=102&clone=2&gen=6	failed - no other return
https://apps.foldingathome.org/wu#project=16443&run=0&clone=276&gen=7	failed - no other return
https://apps.foldingathome.org/wu#project=11779&run=0&clone=8875&gen=43 failed - 4 failures before completion
https://apps.foldingathome.org/wu#project=11761&run=0&clone=5160&gen=16	failed - 2 failures before completion by 3 others
https://apps.foldingathome.org/wu#project=16804&run=13&clone=144&gen=0	failed - completed by next person
https://apps.foldingathome.org/wu#project=16804&run=53&clone=408&gen=9	failed - completed by next person
https://apps.foldingathome.org/wu#project=11761&run=0&clone=3118&gen=48	failed - 2 failures before completion by 2 others and 1 dumped
https://apps.foldingathome.org/wu#project=11759&run=0&clone=227&gen=61	failed - 3 failures before completion by 6 others
This rather rudimentary and ad hoc delve into a few users and WUs shows an enormous number of failures. All but 4 were subsequently completed successfully by someone else, and those 4 are still waiting, so they are probably not bad WUs either.
Just following the tree from WU to user to WUs to users, you quickly come across hundreds of WUs failed by a handful of users. I've seen similar patterns in BOINC projects where the WU results are somewhat easier to browse and you're not limited to just the last WU for a client.
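
For anyone wanting to repeat this kind of tally without doing it by eye, here is a minimal sketch in Python. It only parses lines in the format pasted above (WU checker URL, whitespace, then a note) and buckets the notes; it does not query the WU checker. The bucket names and the `classify` rules are my own assumptions for illustration, not anything FAH defines.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qsl

# A few lines copied from the lists above: URL, whitespace, then the note.
SAMPLE = """\
https://apps.foldingathome.org/wu#project=14158&run=2&clone=8992&gen=1 failed - 2 failures before completion
https://apps.foldingathome.org/wu#project=13879&run=0&clone=1568&gen=81 failed - completed by next person
https://apps.foldingathome.org/wu#project=14251&run=401&clone=0&gen=6 failed - no other return
https://apps.foldingathome.org/wu#project=14437&run=227&clone=2&gen=3 failed - 3 failures before completion
"""


def parse_line(line):
    """Split one pasted line into (PRCG ids, note)."""
    url, note = line.split(None, 1)
    # The project/run/clone/gen values live in the URL fragment after '#'.
    ids = {key: int(value) for key, value in parse_qsl(urlsplit(url).fragment)}
    return ids, note.strip()


def classify(note):
    """Hypothetical buckets, chosen only for this illustration."""
    if "no other return" in note:
        return "pending"
    if "failures before completion" in note:
        return "multiple failures"
    return "completed by someone else"


def tally(text):
    """Count how many pasted lines fall into each bucket."""
    return Counter(
        classify(parse_line(line)[1])
        for line in text.splitlines()
        if line.strip()
    )


if __name__ == "__main__":
    for bucket, count in tally(SAMPLE).most_common():
        print(f"{bucket}: {count}")
```

Feeding the full lists through `tally` gives a quick breakdown of how many WUs are still pending versus completed by someone else.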

Obviously, anyone can have a bad spell and produce a few failures, myself included, but the above is shameful.

What can be done to stop this having a huge detrimental effect on the project?

Isn't there a risk that WUs get marked as bad if there are too many bad clients out there? Also, a high proportion get completed by more than 1 person - so there's also a duplication of work and effort.

When WUs are in short supply and assignment servers are under such heavy load, with a surplus of good clients sitting idle, shouldn't some blacklisting be implemented?
HaloJones
Posts: 920
Joined: Thu Jul 24, 2008 10:16 am

Re: Users with many failed WUs

Post by HaloJones »

There are many contributors who treat the project as a fire and forget one. Install, fold anonymously, never check their points (obvs) and assume it's all ok.
single 1070

Joe_H
Site Admin
Posts: 7868
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Users with many failed WUs

Post by Joe_H »

Too many errors in a row, and the client will shut down the folding slot at least until the next reboot.

On the server side, every so often a user requesting work does get blacklisted for too many errors.
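
The client-side behaviour described above (too many errors in a row pauses the slot until a restart) can be sketched roughly as follows. This is not the FAH client's code: the class name, the threshold of 5, and the method names are all hypothetical, chosen only to show the idea of a consecutive-failure counter that resets on success.

```python
class SlotGuard:
    """Illustrative consecutive-failure guard for one folding slot.

    Hypothetical sketch: the real client's threshold and logic are not
    documented in this thread.
    """

    def __init__(self, max_consecutive_failures=5):  # assumed threshold
        self.max_consecutive_failures = max_consecutive_failures
        self.streak = 0
        self.paused = False

    def record(self, succeeded):
        """Record one WU result; pause the slot after too many failures in a row."""
        if self.paused:
            return
        if succeeded:
            self.streak = 0  # any success resets the streak
        else:
            self.streak += 1
            if self.streak >= self.max_consecutive_failures:
                self.paused = True

    def restart(self):
        """Model a reboot or client restart: clear the streak and resume."""
        self.streak = 0
        self.paused = False


if __name__ == "__main__":
    guard = SlotGuard(max_consecutive_failures=3)
    for result in (False, False, True, False, False, False):
        guard.record(result)
    print("paused:", guard.paused)
```

Note that a guard like this only catches unbroken runs of failures; a client that fails most WUs but occasionally completes one would never trip it, which is where server-side checks come in.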

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
rwh202
Posts: 425
Joined: Mon Nov 15, 2010 8:51 pm
Hardware configuration: 8x GTX 1080
3x GTX 1080 Ti
3x GTX 1060
Various other bits and pieces
Location: South Coast, UK

Re: Users with many failed WUs

Post by rwh202 »

Joe_H wrote: Too many errors in a row, and the client will shut down the folding slot at least until the next reboot.

On the server side, every so often a user requesting work does get blacklisted for too many errors.
Cool, good to know that something is in place to limit 'rogue' clients.

Do researchers get to see stats on percentages of WUs returned 'faulty'? I wonder how big the problem is in reality. A 'good' client should have no problem doing 99% successful returns - it wouldn't take many 'bad' ones to bring that down.
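
To put rough numbers on that, here is a back-of-envelope sketch. The 99% figure for good clients is the one above; the share of WUs going to bad clients and their 20% success rate are made-up inputs purely for illustration, not measured values.

```python
def overall_success_rate(bad_share, bad_rate, good_rate=0.99):
    """Weighted average success rate across good and bad clients.

    bad_share: fraction of WUs assigned to 'bad' clients (assumed input)
    bad_rate:  success rate of those clients (assumed input)
    good_rate: success rate of 'good' clients (99% as suggested above)
    """
    if not 0.0 <= bad_share <= 1.0:
        raise ValueError("bad_share must be between 0 and 1")
    return (1.0 - bad_share) * good_rate + bad_share * bad_rate


if __name__ == "__main__":
    # Hypothetical scenarios: bad clients succeed only 20% of the time.
    for share in (0.01, 0.05, 0.10):
        rate = overall_success_rate(share, bad_rate=0.20)
        print(f"bad share {share:.0%}: overall success {rate:.2%}")
```

With just 5% of WUs going to clients that succeed only 20% of the time, the overall rate falls from 99% to about 95%.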

With regard to blacklisting clients - is that manual or automatic? I can see a way in which you could 'game' the system with partial credits for failed WUs that would earn significant PPD - I'm hoping that would quickly be quashed.
Joe_H
Site Admin
Posts: 7868
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Users with many failed WUs

Post by Joe_H »

The researchers do get various stats on the WUs, including how many are returned faulty. Some of the file overhead in the current version of the GPU folding core is to collect information on what types of faults occurred in processing the faulty returns. Normally they try to keep the faulty rate below 1-2%.

I don't know much of the details on the blacklisting of clients on the server side beyond that it includes both manual and automatic methods.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
vtankovich
Posts: 8
Joined: Thu Apr 30, 2020 4:39 pm

Re: Users with many failed WUs

Post by vtankovich »

I've noticed that I have a small chance of getting an error when I use the GPU for something else while it's folding - games, high-res video playback, or a lot of "heavy" browser tabs open at the same time.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Users with many failed WUs

Post by bruce »

It turns out that people who fail WUs like that insist that their overclock is "completely stable" and if challenged probably won't back off. The only answer I know of is to blacklist them, but FAH doesn't like to do that.

It should be noted that people who appear to have lots of clients are probably just reinstalling the client software. Every time you reinstall the software, the servers see it as a new client and have no way of recognizing and removing the defunct client that has been overwritten other than noting that it has been a long time since that client completed any work.