Workaround for WU starvation on Manjaro Linux

FAH provides a V7 client installer for Debian / Mint / Ubuntu / RedHat / CentOS / Fedora. Installation on other distros may or may not be easy, but if you can offer help to others, they would appreciate it.


pcwolf
Posts: 36
Joined: Fri Apr 03, 2020 4:49 pm


Post by pcwolf »

This is my first post; I could not find reports of similar behavior when searching the F@H forum.

F@H Client 7.6.9, GPU NVidia RTX 2070, kernel 5.6.8

I fold 24/7, and when I wake in the morning the F@H client is still chugging away and the overnight stats show credits.

When I am at the machine, I keep an idle eye on progress and check in when a WU reaches 99% and the client requests a new one.
I am well aware that press coverage has hugely increased the number of active Folders, so demand for WUs is understandably immense these days and considerable patience is called for.

QUESTION:
When the log shows "No WU available for this configuration", I check the GPU slot for the time of its "Next attempt." The interval between requests seems to *increase* as the unsuccessful attempts pile up: 1 minute, 2 minutes, 5 minutes, 10 minutes, and so on. Is this working as designed, to protect the F@H servers by spacing out requests that cannot be filled?

OBSERVATION:
The "No WU available for this configuration" replies alternate between F@H servers 18.218.x.x and 65.254.x.x; then, when a request finally succeeds, the client downloads from Work Server 3.188.x.x. I am at a loss to understand why this hierarchy is used, but it doesn't really matter whether I understand it or not.

WORKAROUND:
When I become impatient waiting for WU downloads (i.e. many minutes or hours pass without Folding), I have found that if I open Manjaro System Settings and go to the SystemD tab, I can restart the "foldingathome.service", and when both the service and the F@H Client come back ... *BOOM*, I immediately receive a new WU. :D This behavior is consistent and repeatable. I have two GPUs Folding, and the previously engaged slot goes straight back to its checkpoint and resumes flawlessly.

I do not understand what is happening, but I do know it is happening. By the way, on Manjaro the installation works flawlessly without a hiccup; it took me about half an hour to install and bring up when I started Folding last month. The Arch wiki has detailed step-by-step instructions, and the AUR updates foldingathome regularly.
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud

Re: Workaround for WU starvation on Manjaro Linux

Post by PantherX »

The next attempt uses an exponential timer. It was originally designed for situations where a server had a physical fault that needed people and many hours to fix. However, the latest version, 7.6.13, caps the interval at 1 hour, so you can give that a go if you want.
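A minimal sketch of a capped exponential retry timer, to show what the 1-hour limit does. The 60-second base interval and the doubling factor are assumptions for illustration only (the client's actual values are not given in this thread, and the intervals pcwolf observed do not double exactly); only the 1-hour cap comes from the post above.

```python
from typing import List


def retry_delays(failures: int, base: float = 60.0, factor: float = 2.0,
                 cap: float = 3600.0) -> List[float]:
    """Wait (seconds) before each retry after `failures` consecutive failures.

    Assumed schedule: 60 s base, doubling after every failure, capped at
    3600 s (1 hour). These numbers are illustrative, not the client's own.
    """
    return [min(cap, base * factor ** n) for n in range(failures)]


print(retry_delays(8))
# [60.0, 120.0, 240.0, 480.0, 960.0, 1920.0, 3600.0, 3600.0]
```

Without the cap, the seventh wait would already exceed an hour (3840 s) and each later wait would keep doubling; with it, a starved slot never sits idle for more than an hour between requests.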

18.218.x.x and 65.254.x.x are the Assignment Servers (AS), and they are always the first point of contact for clients. The AS then directs your client to the best available Work Server (WS) to get a WU.
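A toy model of the two-step flow described above, to show why the AS addresses appear in the log before the Work Server does. Everything here is hypothetical: the server names, the configuration labels, and the "first matching server" rule stand in for the Assignment Server's real selection logic, which is not described in this thread.

```python
from typing import Dict, Optional, Set

# Hypothetical inventory: Work Server name -> hardware configs it has WUs for.
Inventory = Dict[str, Set[str]]


def assign(inventory: Inventory, config: str) -> Optional[str]:
    """Stand-in for the Assignment Server: name a Work Server, or None.

    The real AS picks the best available WS; this sketch simply takes the
    first server, in name order, that has work for this configuration.
    """
    for ws in sorted(inventory):
        if config in inventory[ws]:
            return ws
    return None


def request_wu(inventory: Inventory, config: str) -> str:
    ws = assign(inventory, config)  # step 1: the client always asks the AS first
    if ws is None:
        return "No WU available for this configuration"
    return f"Downloading WU from {ws}"  # step 2: fetch from the WS the AS named


servers = {"ws-b.example": {"cpu"}, "ws-a.example": {"nvidia-gpu", "cpu"}}
print(request_wu(servers, "nvidia-gpu"))  # Downloading WU from ws-a.example
print(request_wu(servers, "amd-gpu"))     # No WU available for this configuration
```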

Restarting the client resets the timer, which means more attempts to get a WU and therefore a higher probability of getting one. However, if the servers are under load, this adds to the problem; that is why 7.6.13 caps the timer at 1 hour, removing the need to reset it manually.
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
Neil-B
Posts: 2027
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: Workaround for WU starvation on Manjaro Linux

Post by Neil-B »

... and the current FAHClient 7.6.13 (possibly still beta, though I thought it had been released) https://foldingathome.org/beta/ has, I believe, a changed profile for the retry timer, with the maximum wait now one hour rather than six. I may be wrong on that, but I seem to recall the release notes saying something to this effect.
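To put the cap change in numbers, here is a rough sketch that counts how many requests a starved slot would make in 24 hours under a 1-hour versus a 6-hour cap. It uses an assumed schedule (60 s base, doubling after each failure) that is illustrative rather than the client's documented behaviour, and the 6-hour figure is the recollection above, not a confirmed value.

```python
def attempts_within(window: float, base: float = 60.0, factor: float = 2.0,
                    cap: float = 3600.0) -> int:
    """Count WU requests made within `window` seconds if every request fails.

    The first request is at t = 0; retry number n (counting from 0) waits
    min(cap, base * factor**n). Assumed schedule, for illustration only.
    """
    elapsed, attempts, n = 0.0, 1, 0
    while True:
        delay = min(cap, base * factor ** n)
        if elapsed + delay > window:
            return attempts
        elapsed += delay
        attempts += 1
        n += 1


DAY = 24 * 3600
print(attempts_within(DAY, cap=3600))      # 1-hour cap -> 29
print(attempts_within(DAY, cap=6 * 3600))  # 6-hour cap -> 12
```

Under these assumptions the shorter cap more than doubles the number of requests a starved slot makes per day, without anyone having to restart the service.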
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070

pcwolf
Posts: 36
Joined: Fri Apr 03, 2020 4:49 pm

Re: Workaround for WU starvation on Manjaro Linux

Post by pcwolf »

Thank you once again for the very sound advice, PantherX!

As I mentioned, the AUR updates F@HClient as soon as it is released; I got 7.6.13 as a regular update a few days ago.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Workaround for WU starvation on Manjaro Linux

Post by bruce »

pcwolf wrote:When I become impatient waiting for WU downloads (i.e. considerable minutes/hours passing not Folding) I have found if I go to Manjaro System Settings and go to the SystemD tab, I can restart the "foldingathome.service" and when both the service and F@H Client return ... *BOOM* I immediately receive a new WU. :D This behavior is consistent and repeatable. I have two GPUs Folding and the previously engaged slot goes immediately back to a checkpoint and resumes flawlessly.
You may (or may not) be guilty of biased perception. Restarting the service does initiate a fresh attempt to get work rather than waiting up to an hour for the next automatic attempt, but I know of no reason why the restart would be any more likely to succeed than if the next attempt was initiated by the timer. It would seem most likely that the client simply says to the server "I'm asking for a new work unit for my hardware ( ... description)" rather than the request being equivalent to "I'm asking again for a new work unit for my hardware ( ... description)". Why would the "again" message (if it's there) actually reduce your chances of getting a new assignment?
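A small probability sketch that reconciles the two views in this thread: each individual request has the same chance whether the timer or a restart triggered it, yet more requests in the same period raise the chance that at least one succeeds. It assumes every request succeeds independently with a fixed, hypothetical probability `p`; real WU availability changes over time, and the extra requests also add load on the servers.

```python
def chance_within(p: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent requests succeeds.

    Assumes every request has the same, independent success probability `p`,
    whether it was triggered by the retry timer or by a restart.
    """
    if not 0.0 <= p <= 1.0 or attempts < 0:
        raise ValueError("p must be in [0, 1] and attempts must be >= 0")
    return 1.0 - (1.0 - p) ** attempts


p = 0.1  # hypothetical per-request success probability
print(round(chance_within(p, 1), 4))  # one request in the period: 0.1
print(round(chance_within(p, 3), 4))  # three requests in the period: 0.271
```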