Re: Feature Request: Pause at next checkpoint

Posted: Sat Apr 11, 2020 11:05 pm
by foldinghomealone2
iceman1992 wrote:
foldinghomealone2 wrote:Hence my general recommendation: never ever pause a GPU slot.
That would be (I would guess) the easiest update that can solve this problem.
It would solve a non-existent problem.
If you don't want to fold, then don't fold. If you take a break for several hours, someone else with a fast GPU would have finished the WU in that time.
Pausing a GPU slot slows down research progress (and drops your PPD significantly).

FAH wants you to return a WU as fast as possible, which is why it offers the non-linear QRB (Quick Return Bonus).
Why do you think pausing a GPU slot is a good idea?

As long as FAH doesn't support 'streaming' of WUs, pausing is bad.

Maybe my points are a little exaggerated, but I hope you get my point.
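For readers unfamiliar with the QRB: as the Folding@home points FAQ describes it, the bonus multiplier is roughly max(1, sqrt(k × deadline / elapsed)), where k is a project-specific constant and elapsed is the wall-clock time from assignment to return. The Python sketch below assumes that formula, plus the rule mentioned later in this thread that no bonus is paid once the timeout is missed. The function names and all numbers (base points, k, deadline, timeout) are hypothetical. It shows why a pause hurts: doubling the elapsed time cuts the credit by a factor of √2, and PPD falls even faster.

```python
import math


def qrb_credit(base_points: float, k: float, deadline_days: float,
               elapsed_days: float, timeout_days: float) -> float:
    """Estimate the credit for one WU under an assumed QRB formula.

    Assumption: credit = base * max(1, sqrt(k * deadline / elapsed)),
    and only the base credit is paid if the WU misses its timeout.
    elapsed_days is wall-clock time, so any pause counts against it.
    """
    if elapsed_days <= 0:
        raise ValueError("elapsed_days must be positive")
    if elapsed_days > timeout_days:
        return base_points  # timeout missed: no bonus (assumed)
    return base_points * max(1.0, math.sqrt(k * deadline_days / elapsed_days))


def ppd(credit: float, elapsed_days: float) -> float:
    """Points per day earned by a slot that spent elapsed_days on one WU."""
    return credit / elapsed_days


# Hypothetical WU: 10,000 base points, k = 0.75, 2-day deadline, 1-day timeout.
for elapsed in (0.25, 0.5):  # 6 h straight through vs. the same WU with a 6 h pause
    c = qrb_credit(10_000, 0.75, 2.0, elapsed, 1.0)
    print(f"{elapsed * 24:4.0f} h: {c:8.0f} points, {ppd(c, elapsed):8.0f} PPD")
```

With these made-up numbers, the 6-hour pause costs about 29% of the WU's credit and cuts the slot's PPD by a factor of about 2.8 (2·√2), which is the non-linearity the QRB is designed to create.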

Posted: Sun Apr 12, 2020 12:23 am
by Crawdaddy79
foldinghomealone2 wrote:If you don't want to fold then don't fold.
You seem to be quite passionate about people not using the pause feature. I think throwing the baby out with the bathwater is not a good strategy for F@H's larger mission. Pausing at the next checkpoint would make folding more efficient, because those of us who use the pause feature would not be redoing work we've already done.

...

A logfile entry for each checkpoint might be a useful feature to add, unless it's already there with the verbose option (I will enable verbosity level 5 now to check).

EDIT: It does not log checkpointing. I do notice that at every 5% my GPU cooler spins down for about 10 seconds, so it makes sense that it checkpoints every 5%.
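A minimal sketch of why "pause at next checkpoint" saves work, assuming the 5% checkpoint interval observed above (the real interval may differ by core and project) and assuming a resumed WU restarts from its most recent checkpoint. The helper names and the 8-hour WU are hypothetical. Pausing immediately throws away everything folded since the last checkpoint; waiting for the next one throws away nothing.

```python
def work_lost_if_paused_now(progress_pct: float, interval_pct: float = 5.0) -> float:
    """Percent of the WU that must be recomputed after an immediate pause.

    Assumes a checkpoint is written every interval_pct percent of progress
    and that a resumed WU restarts from the most recent checkpoint.
    """
    last_checkpoint = (progress_pct // interval_pct) * interval_pct
    return progress_pct - last_checkpoint


def wait_until_next_checkpoint(progress_pct: float, interval_pct: float = 5.0) -> float:
    """Percent of further progress needed before a pause loses nothing."""
    lost = work_lost_if_paused_now(progress_pct, interval_pct)
    return 0.0 if lost == 0 else interval_pct - lost


# Hypothetical WU that takes 8 hours on this GPU, paused at 49% progress.
wu_hours = 8.0
lost = work_lost_if_paused_now(49.0)
print(f"pause now: recompute {lost:.1f}% (~{lost / 100 * wu_hours * 60:.0f} min)")
print(f"pause at next checkpoint: fold {wait_until_next_checkpoint(49.0):.1f}% more, lose nothing")
```

Under this assumption, a pause at a random moment loses half an interval on average (2.5% of the WU), and that loss is repeated on every pause.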

Posted: Sun Apr 12, 2020 12:54 am
by PantherX
Crawdaddy79 wrote:...I do notice that at every 5% my GPU cooler spins down for about 10 seconds, so it makes sense that it checkpoints every 5%.
Before the checkpoint for a GPU WU is written, it has to be verified, and that verification is done by the CPU. Thus, you may see a drop in GPU usage and a spike in CPU usage.

Posted: Sun Apr 12, 2020 2:05 am
by Crawdaddy79
Except my CPU is pegged at 100% because it's folding too. :)

Posted: Sun Apr 12, 2020 5:29 am
by iceman1992
foldinghomealone2 wrote:If you don't want to fold then don't fold.
If everyone thought like you, we wouldn't be running at 2+ exaflops right now. What a narrow-minded way of thinking.
F@H's original idea was to use spare compute, not dedicated compute. Not everyone who contributed did it with a dedicated rig.
Are you saying their contributions are not valuable?
By that same logic, you should tell all the donors with a Core 2 Duo to stop folding, because a Threadripper can probably finish their WUs 100x faster, and tell those with old GTX500s/HD7000s not to bother at all, because an RTX2080Ti can do it much faster.

Posted: Sun Apr 12, 2020 10:42 am
by foldinghomealone2
Let me put it in a different perspective:
Having 2 exaflops doesn't mean anything. Currently we are putting 0 nanoflops into research on cancer, Alzheimer's and so on.
I'm not quite convinced that protein folding will help much in this Covid-19 crisis. I think that different short-term solutions are needed.
However, I think that protein folding is a very good tool for tackling long-term problems. Cancer will still be killing people when Covid-19 is no longer remembered.

And when this Covid-19 folding hype is over in a few weeks, the dedicated rigs will drive folding forward as they did before.

I think we have to be honest with the folding community. Saying that a Core 2 Duo is a great help is a lie. A few hundred thousand of them might be. But then other factors, like power consumption, have to be considered as well.

Therefore, my opinion is to fold only with power-efficient hardware and to fold dedicated (not to be understood as folding with dedicated rigs).
By dedicated I mean that you take a WU, fold it as fast as possible, and return it.
In a relay race you wouldn't take the baton and then take a break, would you?

Posted: Sun Apr 12, 2020 12:31 pm
by Neil-B
OK... So I CPU-fold with kit that is four years old. Is your opinion that I should stop folding, given that CPUs are less power efficient than GPUs, given that the CPUs, whilst Xeons, are not particularly power efficient as CPUs go, and given that only power-efficient hardware should be allowed to fold?

In the big scheme of things, folding is not just one relay; it is thousands or millions of relays. Feel free to have the 100 fastest people race thousands or millions of relays each, but get the tens of thousands of fun runners to each run a mile in relay format and you will complete a whole lot more relays that way!!

Oh, and while you are at it, why not ban all folding rigs, or even everything that isn't HPC, from folding? Why not just distribute folding across the HPC community? At its core, FAH has been about mass participation. Yes, there are some very keen enthusiasts, which is awesome, but there have always been those who are welcome to give what little they can, as long as it meets a minimum standard of "by the expiration date".

If your opinion is the direction that the FAH team chooses to go in the future, then I will graciously stop contributing my time, electricity and goodwill, and accept that as a decision they have made.

Posted: Sun Apr 12, 2020 1:00 pm
by anandhanju
This enhancement request has been logged at https://github.com/FoldingAtHome/fah-issues/issues/1268 . I've added a link back to this thread for additional rationale and purpose.

Edit: Corrected link to issue. Thanks uyaem

Posted: Sun Apr 12, 2020 1:18 pm
by uyaem
foldinghomealone2 wrote:FAH wants you to return a WU as fast as possible, which is why it offers the non-linear QRB (Quick Return Bonus).
Why do you think pausing a GPU slot is a good idea?

As long as FAH doesn't support 'streaming' of WUs, pausing is bad.

Maybe my points are a little exaggerated, but I hope you get my point.
Who said pausing is a good idea?
Your thinking would be correct if there were more computing power available than the unsolved problems require, which isn't the case (even though sometimes no WUs are available).
And in that case, you'd also want the high-end systems to receive work before anyone else, which also isn't implemented.

Of course you don't want everyone to pause every WU indefinitely, but that is very hypothetical. :)
Slightly delayed work done > no work done.
So, for the sake of efficiency, you want even those who need to pause to lose as little work as possible. For example, a reboot to apply a patch takes less time than it takes to recompute the lost GPU work.
anandhanju wrote:This enhancement request has been logged at viewtopic.php?f=16&t=34239 . I've added a link back to this thread for additional rationale and purpose.
I think you added the wrong link here; it just leads back to page #1 of this thread.

Posted: Sun Apr 12, 2020 1:54 pm
by anandhanju
uyaem wrote:...I think you added the wrong link here; it just leads back to page #1 of this thread.
Thanks, corrected

Posted: Sun Apr 12, 2020 2:03 pm
by foldinghomealone2
I think there are three periods to consider:
- pre-Covid-19-folding hype
- Covid-19-folding hype
- post-Covid-19-folding hype (which I guess will come within 2 months at the latest)

I'm sure that the current folding hype will also boost output and the number of folders after the hype is over, but I assume that the same pattern will apply as before.

See a post I made September 19:
"Top 500 out of 8000 active folders (~>890kPPD) make 62,7% of all points
Top 1000 (~>455kPPD) make 77,9%
Top 1500 (~>259kPPD) make 86,2%"
viewtopic.php?f=16&t=31812&p=308889&hilit=quo+vadis#p308601

It shows that, while everyone certainly contributes, only a very small number of folders drive folding massively.
Currently, it is different, I agree.

Now, with enough WUs available but the assignment servers being the bottleneck, folding with 'slow' HW doesn't do anything bad.
But there have been times, and there will be again, when WU generation is the bottleneck. Then folding with 'slow' hardware leaves 'fast' HW idle, which slows down the whole system. Just be aware of that.

I'm not stating what FAH has to do. All of this is just my opinion.

My opinion is biased by the following (in no particular order):
- high electricity costs (0.30€/kWh)
- awareness of the effects on the environment
- that FAH uses a quick return bonus
- that there are current projects with timeouts of 1 day (like p1387x)
- that there is no such pause function as requested, although FAH has existed for quite a while
The last 3 clearly indicate that FAH is highly interested in quick returns. Therefore, I follow that principle.

Posted: Sun Apr 12, 2020 6:36 pm
by Crawdaddy79
You clearly aren't counting "Anonymous" in your stats, which far and away has completed more WUs than any active user. The username for the inefficient masses.

https://folding.extremeoverclocking.com ... p?s=&srt=4
anandhanju wrote:This enhancement request has been logged at https://github.com/FoldingAtHome/fah-issues/issues/1268 . I've added a link back to this thread for additional rationale and purpose.

Edit: Corrected link to issue. Thanks uyaem
Woohoo - I finally matter! :mrgreen: Thanks.

Posted: Sun Apr 12, 2020 6:49 pm
by iceman1992
foldinghomealone2 wrote:I think we have to be honest with the folding community. Saying that a Core 2 Duo is a great help is a lie. A few hundred thousand of them might be. But then other factors, like power consumption, have to be considered as well.
That is for the scientists to decide. If a machine returns a WU before the timeout, then it is good work. If they feel things aren't fast enough, then they can shorten the timeout. That's a target for us folders to meet. As long as we meet the timeout, what's the problem?

As Neil-B so nicely summed it up:
Neil-B wrote:get the tens of thousands of fun runners to each run a mile in relay format and you will complete a whole lot more relays that way
Which is why we should be encouraging people to keep folding, and appreciating people who have the will to contribute, no matter how old their hardware is (as long as they meet the timeout), not putting them down for not having better resources.
foldinghomealone2 wrote:Therefore, my opinion is to fold only with power-efficient hardware and to fold dedicated (not to be understood as folding with dedicated rigs).
By dedicated I mean that you take a WU, fold it as fast as possible, and return it.
Okay, but that's a bit of a paradox. By your definition, folding dedicated does not mean using dedicated rigs, but if someone uses the machine while it's folding, it will slow progress down; depending on what they're doing, it can almost stop progress completely. So nobody should use the machine while it's folding. That makes it a sort of dedicated rig.

Posted: Sun Apr 12, 2020 6:57 pm
by foldinghomealone2
Crawdaddy79 wrote:You clearly aren't counting "Anonymous" in your stats, which far and away has completed more WUs than any active user. The username for the inefficient masses.
Look at the monthly data and you will see that 'Anonymous' wasn't as big in the past as it is now.
https://folding.extremeoverclocking.com ... =&u=811139

Posted: Sun Apr 12, 2020 7:11 pm
by foldinghomealone2
iceman1992 wrote:
foldinghomealone2 wrote:I think we have to be honest with the folding community. Saying that a Core 2 Duo is a great help is a lie. A few hundred thousand of them might be. But then other factors, like power consumption, have to be considered as well.
That is for the scientists to decide. If a machine returns a WU before the timeout, then it is good work. If they feel things aren't fast enough, then they can shorten the timeout. That's a target for us folders to meet. As long as we meet the timeout, what's the problem?
As I stated before, it is not an issue as long as there are enough WUs available. But WU shortages have happened before, and they will happen again after the hype.

"That's a target for us folders to meet": How do you want to influence that as a user?
Let's assume you have a slow GPU, like a 1050 Ti, and you start folding. Then you pause because you want to shut down overnight, and sometime the next day you start folding again. And then you realize that the timeout is 1 day, as it is for p13876, and you're ... not happy, because you won't receive any bonus points.

Projects with short timeouts shouldn't be distributed to slow GPUs.

Even slow GPUs can return a WU within the timeout. As long as you don't press the pause button...
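The overnight-pause scenario above comes down to wall-clock arithmetic: the timeout clock keeps running while the slot is paused, and any work lost since the last checkpoint has to be folded again. The sketch below assumes that a WU only earns its bonus if it is returned within its timeout. The function name, the 10-hour compute time, and the pause length are hypothetical, not measured figures for a 1050 Ti or p13876.

```python
from datetime import timedelta


def meets_timeout(compute: timedelta, paused: timedelta,
                  recomputed: timedelta, timeout: timedelta) -> bool:
    """True if a WU started now is returned before its timeout.

    compute:    uninterrupted folding time the GPU needs for the WU
    paused:     total wall-clock time the slot spent paused
    recomputed: extra folding time spent redoing work lost on pause
    """
    wall_clock = compute + paused + recomputed
    return wall_clock <= timeout


timeout = timedelta(days=1)
compute = timedelta(hours=10)  # hypothetical WU on a slow GPU

print(meets_timeout(compute, timedelta(0), timedelta(0), timeout))  # True: folded straight through
print(meets_timeout(compute, timedelta(hours=14), timedelta(minutes=30), timeout))  # False: paused overnight
```

The same WU that fits comfortably within a 1-day timeout when folded straight through misses it once an overnight pause and a little recomputation are added.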