Proposal: Allow to skip or reject work units

Postby Goetz » Tue Mar 10, 2020 5:11 am

Scenario: I noticed that work unit 11737 restarts from the beginning (0%) once I stop and restart it. It's the CORE22 test project, which would require 23h10m for completion on my computer using the GPU (NVIDIA:5 GM108 [MX110]). I use a laptop which sometimes has to run on battery only, and that stops folding. Therefore such a work unit can never be finished on my computer.
Proposal: I propose to add a "skip" or "reject" command (besides "fold", "pause" and "finish") in order to be able to give a work unit back to the community before the time expires within which the computer has to complete the project.
Goetz
 
Posts: 1
Joined: Tue Mar 10, 2020 4:54 am

Re: Proposal: Allow to skip or reject work units

Postby bruce » Tue Mar 10, 2020 5:27 am

All projects should restart from the most recent checkpoint. If you try to restart a project before it reaches the first checkpoint, then it's going to restart from 0%. Checkpoints for CPU projects are based on a specific time interval which you can adjust.

The checkpoint interval for GPU projects (like 11737) is defined by the project owner based on the number of scientific samples needed per WU. That information is not as easy to obtain as it should be, and an enhancement request has been submitted which might be included in a future code revision. I find that information in two places: it's printed in the science log while the WU is being initialized, and it can also be figured out if you peek at the work directory for that project.
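To illustrate why a restart can fall all the way back to 0%, here is a minimal sketch. It assumes checkpoints are written at evenly spaced progress points; the 5% spacing is a made-up value, not the real setting for project 11737, and `progress_after_restart` is a hypothetical helper, not part of FAHClient.

```python
import math


def progress_after_restart(progress_pct: float, checkpoint_every_pct: float) -> float:
    """Return the progress a WU resumes from after a stop/restart.

    Assumption: checkpoints are evenly spaced every `checkpoint_every_pct`
    percent of the WU. Anything done since the last checkpoint is lost.
    """
    if checkpoint_every_pct <= 0:
        raise ValueError("checkpoint_every_pct must be positive")
    if not 0 <= progress_pct <= 100:
        raise ValueError("progress_pct must be between 0 and 100")
    completed_checkpoints = math.floor(progress_pct / checkpoint_every_pct)
    return completed_checkpoints * checkpoint_every_pct


# Stopped before the first (assumed) checkpoint: back to the start.
print(progress_after_restart(4, 5))
# Stopped between the second and third checkpoint: resume from the second.
print(progress_after_restart(12, 5))
```

Under that assumption, a machine that is always interrupted before the first checkpoint never makes any lasting progress, which matches the behaviour described in the first post.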

Note: If you're on Windows and you shut down Windows without first doing a PAUSE, you're likely to corrupt the most recent checkpoint for the active WU.
bruce
 
Posts: 19816
Joined: Thu Nov 29, 2007 11:13 pm
Location: So. Cal.

Re: Proposal: Allow to skip or reject work units

Postby JimboPalmer » Tue Mar 10, 2020 8:55 am

If you cannot complete Core_22 Work Units because you are on a laptop, you can delete the GPU slot. Then you will only get Core_a7 WUs for your CPU.

You can manage a CPU WU to better cope with pauses.

(If you only occasionally do not want GPU WUs, you can pause just that one slot: right-click on it in the Advanced Control program.)
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
JimboPalmer
 
Posts: 2016
Joined: Mon Feb 16, 2009 5:12 am
Location: Greenwood MS USA

Re: Proposal: Allow to skip or reject work units

Postby HaloJones » Tue Mar 10, 2020 12:42 pm

Allowing users to skip/reject units will simply encourage cherry-picking.
1x Titan X, 5x 1070, 1x 970, 1 x Ryzen 3600

HaloJones
 
Posts: 852
Joined: Thu Jul 24, 2008 11:16 am

Re: Proposal: Allow to skip or reject work units

Postby scolphoy » Wed Apr 01, 2020 2:38 pm

Not sure if it is available in all versions, but at least the Linux FAHClient has the --dump (wu|all) option, which will dump the work unit and inform the servers.

I looked into this just a moment ago, when I saw that my server had spent five days working on a WU and done about 40%, with the timeout later today, expiry in 3 days, and an ETA of still about 7 days. I knew it would never make it on time and the client would dump it anyway in a few days once the expiry time had passed. My options, as I saw them, were:
  1. Not touch it, have it waste electricity for a few more days, then dump the expired WU and get new work.
  2. Shut down the box for a couple of days, save the energy, then dump the expired WU and get new work.
  3. Dump the WU now and get new work right away.
The third option seemed the most efficient overall, so I decided to go with that. I got a new WU and its ETA shows just a few hours, so it seems something was wrong with the old WU.
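The reasoning behind that choice can be sketched as a simple linear extrapolation. This is only an illustration using the figures quoted in this post (5 days elapsed, about 40% done, expiry in 3 days); `remaining_time` and `can_finish` are hypothetical helpers, and the client's own ETA may be computed differently.

```python
from datetime import timedelta


def remaining_time(elapsed: timedelta, fraction_done: float) -> timedelta:
    """Estimate remaining run time, assuming progress stays linear."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed * (1.0 - fraction_done) / fraction_done


def can_finish(elapsed: timedelta, fraction_done: float,
               time_to_expiry: timedelta) -> bool:
    """True if the extrapolated remaining time fits before expiry."""
    return remaining_time(elapsed, fraction_done) <= time_to_expiry


eta = remaining_time(timedelta(days=5), 0.40)
print("estimated remaining:", eta)
print("finishes before expiry:",
      can_finish(timedelta(days=5), 0.40, timedelta(days=3)))
```

With those numbers the estimate is roughly a week of further work against three days to expiry, so the WU cannot finish in time at that rate.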

In case someone on the team wants to look into the old WU, here's a log for when I dumped it:
12:57:48:WARNING:Dumping WU00 per user request
12:57:48:WU00:FS00:Sending unit results: id:00 state:SEND error:DUMPED project:13821 run:237 clone:1 gen:83 core:0xa7 unit:0x0000006080fccb095c883992cea0aa06
12:57:49:WU00:FS00:Connecting to 155.247.166.219:8080
12:57:50:WU00:FS00:Server responded WORK_QUIT (404)
12:57:50:WARNING:WU00:FS00:Server did not like results, dumping
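For anyone who wants to report such a WU, the project/run/clone/gen values are the useful part of the "Sending unit results" line. Here is a minimal sketch for pulling them out; it assumes the line has the same `key:value` layout as the one in the log above, and `parse_unit_results` is a hypothetical helper, not something FAHClient provides.

```python
import re

# Log line copied from the post above.
LINE = ("12:57:48:WU00:FS00:Sending unit results: id:00 state:SEND "
        "error:DUMPED project:13821 run:237 clone:1 gen:83 core:0xa7 "
        "unit:0x0000006080fccb095c883992cea0aa06")


def parse_unit_results(line: str) -> dict:
    """Split the key:value pairs that follow 'Sending unit results:'."""
    _, _, tail = line.partition("Sending unit results:")
    if not tail:
        raise ValueError("not a 'Sending unit results' line")
    # Each field is a word, a colon, then a run of non-space characters.
    return dict(re.findall(r"(\w+):(\S+)", tail))


fields = parse_unit_results(LINE)
print("project {project} run {run} clone {clone} gen {gen}".format(**fields))
print("state:", fields["state"], "error:", fields["error"])
```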


I recommend making dumping a WU simple, so that other people with a hopeless WU could also put that time and energy into better use. That way you learn sooner if you need to send that WU to someone else and the science can continue faster. I don't believe cherry picking would actually become a real issue.
scolphoy
 
Posts: 2
Joined: Wed Apr 01, 2020 1:32 pm

Re: Proposal: Allow to skip or reject work units

Postby Joe_H » Wed Apr 01, 2020 8:12 pm

scolphoy wrote:I recommend making dumping a WU simple, so that other people with a hopeless WU could also put that time and energy into better use. That way you learn sooner if you need to send that WU to someone else and the science can continue faster. I don't believe cherry picking would actually become a real issue.


For reasons already given here and elsewhere, that will not happen. Removal of a problem WU is already fairly easy. As for "cherry picking" becoming an issue, it has at times in the past. This project has been running for 20 years, and this has been seen more than a few times. Some cases even caused problems that affected other folders' ability to contribute.

As for the WU that was taking too long, you did not provide enough log data to tell, but it might have been one of a small batch that got created with several times the normal number of steps for that project. If you had noticed and posted earlier, we would have checked the log information to see if that was the problem, and given directions on how to get rid of the WU.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Joe_H
Site Admin
 
Posts: 6523
Joined: Tue Apr 21, 2009 5:41 pm
Location: W. MA

Re: Proposal: Allow to skip or reject work units

Postby scolphoy » Thu Apr 02, 2020 1:14 am

Ok. Cherry picking volunteer computing tasks seems very pointless and vain, but then again, people are known to do a lot of pointless stuff. We don't have to go deeper on this here.
If you say that this has happened before, I believe you.
scolphoy
 
Posts: 2
Joined: Wed Apr 01, 2020 1:32 pm

Re: Proposal: Allow to skip or reject work units

Postby susanreads » Thu Jul 02, 2020 10:16 am

I'm running one CPU slot on my laptop, and I've got a WU that won't finish by deadline (project 16805). I ran it all last night with nothing else running, TPF is just over an hour, timeout is on Saturday and deadline at 23:57Z on Sunday. Is there a way to reject it so that the server assigns it to someone else without waiting till Saturday?

I could just Stop Folding and save the electricity until it expires, but I'd rather someone with a faster machine could get on with it and I could get one that I might finish before timeout (I've completed 44, all before deadline until now, and the majority before timeout, so my slow machine isn't completely useless).

Joe_H says "Removal of a problem WU is already fairly easy" but I don't know how to do it, and I can't find anything useful in the foldingathome FAQ.
susanreads
 
Posts: 19
Joined: Sat Apr 04, 2020 8:57 pm

Re: Proposal: Allow to skip or reject work units

Postby ajm » Thu Jul 02, 2020 10:32 am

To dump the WU, pause the slot (or the whole client, since you have only that one slot), then delete the folder containing the WU in %AppData%\FAHClient\work on Windows, /var/lib/fahclient/work on Linux, or /Library/Application Support/FAHClient/work on macOS. Again, since you have only one slot, you can also delete the whole work folder. Then restart the slot or FAH, and your client will download another WU.
For pausing and restarting, you use Advanced Control (aka FAHControl).
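As a rough aid for finding the right folder, here is a minimal sketch that picks the work directory for the current platform and lists the WU folders in it. The paths are the ones given above (with a leading slash assumed for the Linux path); `fah_work_dir` and `list_wu_folders` are hypothetical helpers, and the script deliberately deletes nothing. Pause the slot first and remove the folder by hand.

```python
import os
import platform
from pathlib import Path


def fah_work_dir(system=None, env=None) -> Path:
    """Return the FAHClient work directory for the given platform.

    `system` defaults to platform.system(); `env` defaults to os.environ.
    Paths are taken from the forum post, not from FAHClient itself.
    """
    system = system or platform.system()
    env = os.environ if env is None else env
    if system == "Windows":
        return Path(env["APPDATA"]) / "FAHClient" / "work"
    if system == "Linux":
        return Path("/var/lib/fahclient/work")
    if system == "Darwin":  # macOS
        return Path("/Library/Application Support/FAHClient/work")
    raise ValueError(f"unsupported platform: {system}")


def list_wu_folders(work_dir: Path) -> list:
    """List sub-folders of the work directory (one per WU), sorted by name."""
    if not work_dir.is_dir():
        return []
    return sorted(p for p in work_dir.iterdir() if p.is_dir())


if __name__ == "__main__":
    work = fah_work_dir()
    print("work directory:", work)
    for folder in list_wu_folders(work):
        print("WU folder:", folder.name)
```

On Linux the directory may only be readable by the account the client runs under, so the listing may need elevated permissions.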
ajm
 
Posts: 552
Joined: Sat Mar 21, 2020 6:22 am
Location: Lucerne, Switzerland

Re: Proposal: Allow to skip or reject work units

Postby susanreads » Thu Jul 02, 2020 4:11 pm

Thanks ajm, that seems to have worked. There wasn't even much of a delay as it downloaded a new WU and now it's folding something of a reasonable size again.
susanreads
 
Posts: 19
Joined: Sat Apr 04, 2020 8:57 pm

