Feature Request: Pause at next checkpoint

Moderators: Site Moderators, FAHC Science Team

foldinghomealone2
Posts: 148
Joined: Sun Jul 30, 2017 8:40 pm

Re: Feature Request: Pause at next checkpoint

Post by foldinghomealone2 »

iceman1992 wrote:
foldinghomealone2 wrote:Hence my general recommendation: never ever pause a GPU slot.
:::
That would be (I would guess) the easiest update that can solve this problem
It would solve a non-existing problem.
If you don't want to fold then don't fold. If you take a break for several hours, someone else with a fast GPU would have finished it in that time.
Pausing a GPU slot slows down progress in research (and drops your PPD significantly).

FAH wants you to return a WU as fast as possible and therefore they offer the non-linear QRB (quick return bonus)
Why do you think pausing a GPU slot is a good idea?
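
For readers unfamiliar with the QRB, here is a minimal sketch of why a pause costs points. It assumes the square-root form of the bonus that is commonly cited for FAH, final = base * max(1, sqrt(k * deadline / elapsed)). The function name, base points, k factor, and deadline below are made-up illustration values, not those of any real project.

```python
import math

def qrb_points(base_points, k, deadline_days, elapsed_days):
    """Estimated credit under the assumed square-root quick return bonus."""
    if elapsed_days <= 0:
        raise ValueError("elapsed_days must be positive")
    bonus = math.sqrt(k * deadline_days / elapsed_days)
    # The bonus never reduces credit below the base points.
    return base_points * max(1.0, bonus)

# Hypothetical WU: 10,000 base points, k = 0.75, 3-day deadline.
fast = qrb_points(10_000, 0.75, 3.0, 0.25)    # returned after 6 hours
paused = qrb_points(10_000, 0.75, 3.0, 0.75)  # same WU, 12 hours spent paused
print(round(fast), round(paused))
```

Under these assumed values, tripling the elapsed time divides the credit by sqrt(3) rather than by 3, which is the sense in which the bonus is non-linear.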

As long as FAH doesn't support 'streaming' of WUs, pausing is bad.

Maybe my points are a little bit exaggerated but you get my point, I hope.
Crawdaddy79
Posts: 73
Joined: Sat Mar 21, 2020 3:56 pm

Re: Feature Request: Pause at next checkpoint

Post by Crawdaddy79 »

foldinghomealone2 wrote:If you don't want to fold then don't fold.
You seem to be quite passionate about people not using the pause feature. I think throwing the baby out with the bathwater is not a good strategy for F@H's larger mission. Pausing at next checkpoint would make folding more efficient because those of us that use the pause feature would not be re-doing work we've already done.

...

A logfile update might be an okay feature to add - unless it's already there with the verbose option (I will enable verbosity level 5 now to see).

EDIT: It does not log checkpointing. I do notice that at every 5% my GPU cooler spins down for about 10 seconds, so it makes sense that it checkpoints every 5%.
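
To put a rough number on the work a mid-interval pause throws away, here is a small sketch. It assumes the 5% checkpoint spacing guessed at above, and it assumes that a paused WU resumes from its last checkpoint. Both function names are hypothetical; this is not client code.

```python
def lost_progress(progress_pct, checkpoint_every_pct=5.0):
    """Percent of the WU recomputed if paused now and resumed from the last checkpoint."""
    if not 0 <= progress_pct <= 100:
        raise ValueError("progress_pct must be between 0 and 100")
    return progress_pct % checkpoint_every_pct

def next_checkpoint(progress_pct, checkpoint_every_pct=5.0):
    """Progress at which a 'pause at next checkpoint' request would take effect."""
    if progress_pct % checkpoint_every_pct == 0:
        return progress_pct  # already sitting on a checkpoint
    done = progress_pct // checkpoint_every_pct
    return min(100.0, (done + 1) * checkpoint_every_pct)

print(lost_progress(43.7))    # work redone after an immediate pause
print(next_checkpoint(43.7))  # where a deferred pause would stop instead
```

With these assumptions, pausing at 43.7% redoes roughly 3.7% of the WU, while waiting until 45% redoes nothing.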
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud
Contact:

Re: Feature Request: Pause at next checkpoint

Post by PantherX »

Crawdaddy79 wrote:...I do notice that at every 5% my GPU cooler spins down for about 10 seconds, so it makes sense that it checkpoints every 5%.
Before writing the checkpoint for the GPU WU, verification needs to happen which is done by the CPU. Thus, you may see a drop in GPU usage and a spike in CPU Usage.
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Crawdaddy79
Posts: 73
Joined: Sat Mar 21, 2020 3:56 pm

Re: Feature Request: Pause at next checkpoint

Post by Crawdaddy79 »

Except my CPU is pegged at 100% because it's folding too. :)
iceman1992
Posts: 527
Joined: Fri Mar 23, 2012 5:16 pm

Re: Feature Request: Pause at next checkpoint

Post by iceman1992 »

foldinghomealone2 wrote:If you don't want to fold then don't fold.
If everyone thought like you, we wouldn't be running at 2+ exaflops right now. What a narrow-minded way of thinking.
F@H's original idea was to use spare compute, not dedicated compute. Not everyone who contributed did it with a dedicated rig.
Are you saying their contributions are not valuable?
By that same thinking, you should tell all the donors with Core 2 Duo to stop folding, because a Threadripper can finish their WUs probably 100x faster,
and those with old GTX500s/HD7000s not to bother at all because an RTX2080Ti can do it much faster.
foldinghomealone2
Posts: 148
Joined: Sun Jul 30, 2017 8:40 pm

Re: Feature Request: Pause at next checkpoint

Post by foldinghomealone2 »

Let me put it in a different perspective:
Having 2 ExaFlops doesn't mean anything. Currently we make 0 nanoflops on research for cancer, Alzheimer's and so on.
I'm not quite convinced that protein folding will help much in this Covid-19 crisis. I think that different short-term solutions are needed.
However, I think that protein folding is a very good tool to tackle long-term problems. Cancer will still kill people when Covid-19 won't be remembered.

And when this Covid-19 folding hype is over in a few weeks, the dedicated rigs will drive folding forward as they did before.

I think we have to be honest to the folding community. Saying that a Core 2 Duo does great help is a lie. Maybe a few hundred-thousand would be. But then other factors like power consumption etc have to be considered as well.

Therefore my opinion is to fold with power efficient hardware only and to fold dedicated. (not to be understood as to fold with dedicated rigs).
With dedicated I mean that you take a WU and you fold it as fast as possible and return it.
In a relay race you wouldn't take the baton and then take a break, would you?
Neil-B
Posts: 2027
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: Feature Request: Pause at next checkpoint

Post by Neil-B »

OK .. So I CPU fold with kit that is four years old .. is your opinion that I should stop folding? - given that CPUs are less power efficient than GPUs - given that my CPUs, being Xeons, are not particularly power efficient as CPUs go - given that only power efficient hardware should be allowed to fold.

In the big scheme of things folding is not just one relay - it is thousands/millions of relays .. feel free to have the fastest 100 people race thousands/millions of relays each .. but getting the tens of thousands of fun runners to each run a mile in relay format and you will complete a whole lot more relays that way !!

Oh, and while you are at it, why not ban all folding rigs or even all non-HPC from folding? … why not just distribute folding across the HPC community? … FAH has been at its core about mass participation - yes there are some very keen enthusiasts which is awesome - but there have always been those who have been welcome to give what little they can as long as it reaches a minimum standard of "by the expiration date".

If your opinion is the direction that FAH team chooses to go in the future then I will graciously stop contributing my time and electricity and goodwill and accept that as a decision they have made.
Last edited by Neil-B on Sun Apr 12, 2020 2:05 pm, edited 1 time in total.
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070

anandhanju
Posts: 526
Joined: Mon Dec 03, 2007 4:33 am
Location: Australia

Re: Feature Request: Pause at next checkpoint

Post by anandhanju »

This enhancement request has been logged at https://github.com/FoldingAtHome/fah-issues/issues/1268 . I've added a link back to this thread for additional rationale and purpose.

Edit: Corrected link to issue. Thanks uyaem
Last edited by anandhanju on Sun Apr 12, 2020 1:26 pm, edited 1 time in total.
uyaem
Posts: 222
Joined: Sat Mar 21, 2020 7:35 pm
Location: Esslingen, Germany

Re: Feature Request: Pause at next checkpoint

Post by uyaem »

foldinghomealone2 wrote:FAH wants you to return a WU as fast as possible and therefore they offer the non-linear QRB (quick return bonus)
Why do you think pausing a GPU slot is a good idea?

As long as FAH doesn't support 'streaming' of WUs, pausing is bad.

Maybe my points are a little bit exaggerated but you get my point, I hope.
Who said pausing is a good idea?
Your thinking would be correct if there was more computing power available than unsolved problems required, which isn't the case (despite sometimes no WUs being available).
And in that case, you'd also want the high-end systems to receive work before anyone else, which also isn't implemented.

Of course you don't want everyone to pause every WU indefinitely, but that is very hypothetical. :)
Slightly delayed work done > no work done.
So, for the sake of efficiency, you even want those who need to pause to lose as little work as possible. E.g. a reboot to apply a patch will take less time than it takes to re-compute lost GPU work.
anandhanju wrote:This enhancement request has been logged at viewtopic.php?f=16&t=34239 . I've added a link back to this thread for additional rationale and purpose.
I think you added the wrong link here, it just leads back to page #1 of this thread.
CPU: Ryzen 9 3900X (1x21 CPUs) ~ GPU: nVidia GeForce GTX 1660 Super (Asus)
anandhanju
Posts: 526
Joined: Mon Dec 03, 2007 4:33 am
Location: Australia

Re: Feature Request: Pause at next checkpoint

Post by anandhanju »

uyaem wrote:...I think you added the wrong link here, it just leads back to page #1 of this thread.
Thanks, corrected
foldinghomealone2
Posts: 148
Joined: Sun Jul 30, 2017 8:40 pm

Re: Feature Request: Pause at next checkpoint

Post by foldinghomealone2 »

I think there are three periods to be thought of:
- pre-Covid-19-folding hype
- Covid-19-folding hype
- post-Covid-19-folding hype (which I guess will start in 2 months at the latest)

I'm sure that the current folding hype will boost output and folders also post-Covid-19-folding-hype but I assume that the same will apply as before.

See a post I made September 19:
"Top 500 out of 8000 active folders (~>890kPPD) make 62,7% of all points
Top 1000 (~>455kPPD) make 77,9%
Top 1500 (~>259kPPD) make 86,2%"
viewtopic.php?f=16&t=31812&p=308889&hilit=quo+vadis#p308601

It shows that, for sure, everyone contributes but only a very small number of folders drive folding massively.
Currently, it is different, I agree.

Now, with enough WUs available but the assignment servers being the bottleneck, folding with 'slow' HW doesn't do anything bad.
But there were times, and there will be again, when WU generation is the bottleneck. Then folding with 'slow' hardware will leave 'fast' HW idle and slow down the system. Just be aware of that.

I don't state what FAH has to do. All that is just my opinion.

My opinion is biased by the following (not in any particular order)
- high electricity costs (0.30€/kWh)
- awareness of effects on environment
- that FAH uses a quick return bonus
- that there are current projects with timeouts of 1 day (like p1387x)
- that there is no such pause function as requested, although FAH exists for quite a while
The last 3 clearly indicate that FAH is highly interested in quick returns. Therefore I follow that principle.
Crawdaddy79
Posts: 73
Joined: Sat Mar 21, 2020 3:56 pm

Re: Feature Request: Pause at next checkpoint

Post by Crawdaddy79 »

You clearly aren't counting "Anonymous" in your stats, which far and away has completed more WUs than any active user. The username for the inefficient masses.

https://folding.extremeoverclocking.com ... p?s=&srt=4
anandhanju wrote:This enhancement request has been logged at https://github.com/FoldingAtHome/fah-issues/issues/1268 . I've added a link back to this thread for additional rationale and purpose.

Edit: Corrected link to issue. Thanks uyaem
Woohoo - I finally matter! :mrgreen: Thanks.
iceman1992
Posts: 527
Joined: Fri Mar 23, 2012 5:16 pm

Re: Feature Request: Pause at next checkpoint

Post by iceman1992 »

foldinghomealone2 wrote:I think we have to be honest to the folding community. Saying that a Core 2 Duo does great help is a lie. Maybe a few hundred-thousand would be. But then other factors like power consumption etc have to be considered as well.
That is for the scientists to decide, if a machine returns a WU before the timeout then it is good work. If they feel things aren't fast enough then shorten the timeout. That's a target for us folders to meet. As long as we meet the timeout, what's the problem?

As Neil-B so nicely summed it up:
Neil-B wrote:getting the tens of thousands of fun runners to each run a mile in relay format and you will complete a whole lot more relays that way
Which is why we should be encouraging people to keep folding, and appreciating people who have the will to contribute, no matter how old their hardware is (as long as they meet the timeout), not putting them down for not having better resources.
foldinghomealone2 wrote:Therefore my opinion is to fold with power efficient hardware only and to fold dedicated. (not to be understood as to fold with dedicated rigs).
With dedicated I mean that you take a WU and you fold it as fast as possible and return it.
Okay, but that's a bit of a paradox. By your definition, folding dedicated does not mean dedicated rigs, but if someone uses the machine while it's folding, it will slow the progress down. Depending on what they're doing, it can almost stop the progress completely. So nobody should use the machine while it's folding. That makes it a sort-of dedicated rig.
Last edited by iceman1992 on Sun Apr 12, 2020 7:02 pm, edited 1 time in total.
foldinghomealone2
Posts: 148
Joined: Sun Jul 30, 2017 8:40 pm

Re: Feature Request: Pause at next checkpoint

Post by foldinghomealone2 »

Crawdaddy79 wrote:You clearly aren't counting "Anonymous" in your stats, which far and away has completed more WUs than any active user. The username for the inefficient masses.
Look at the monthly data and you will realize that 'Anonymous' wasn't as big as it is now.
https://folding.extremeoverclocking.com ... =&u=811139
foldinghomealone2
Posts: 148
Joined: Sun Jul 30, 2017 8:40 pm

Re: Feature Request: Pause at next checkpoint

Post by foldinghomealone2 »

iceman1992 wrote:
foldinghomealone2 wrote:I think we have to be honest to the folding community. Saying that a Core 2 Duo does great help is a lie. Maybe a few hundred-thousand would be. But then other factors like power consumption etc have to be considered as well.
That is for the scientists to decide, if a machine returns a WU before the timeout then it is good work. If they feel things aren't fast enough then shorten the timeout. That's a target for us folders to meet. As long as we meet the timeout, what's the problem?
Like I stated before, it is not an issue as long as there are enough WUs available. But WU shortages have happened before and will happen again after the hype.

"That's a target for us folders to meet": How do you want to influence that as a user?
Let's assume you have a slow GPU, like a 1050 Ti, and you start folding. Then you pause because you want to shut down overnight, and sometime the next day you start folding again. And then you realize that the timeout is 1 day, like for p13876, and you're ... not happy because you won't receive any bonus points.
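
The scenario above can be checked with simple arithmetic. A minimal sketch, assuming the timeout clock keeps running while the slot is paused. The 14 hours of compute and the 10.5-hour overnight shutdown are invented numbers for a slow GPU; only the 1-day timeout comes from the post, and the function name is hypothetical.

```python
from datetime import timedelta

def meets_timeout(compute_time, paused_time, timeout):
    """True if the WU comes back before the timeout, counting paused wall-clock time."""
    return compute_time + paused_time <= timeout

timeout = timedelta(days=1)                  # 1-day timeout, as for p13876
compute = timedelta(hours=14)                # hypothetical folding time on a slow GPU
overnight = timedelta(hours=10, minutes=30)  # hypothetical overnight shutdown

print(meets_timeout(compute, timedelta(0), timeout))  # no pause
print(meets_timeout(compute, overnight, timeout))     # with the overnight pause
```

With these made-up numbers the same card makes the timeout when left running and misses it once the overnight pause is added.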

Projects with short timeout shouldn't be distributed to slow GPUs.

Even slow GPUs can return a WU within the timeout. As long as you don't press the pause button...