The annoying "restart" incident.

Ibringapples
Posts: 42
Joined: Fri Apr 10, 2020 3:53 am

The annoying "restart" incident.

Post by Ibringapples »

Hello all,

Due to the shortage of GPU WUs, I have to restart the Linux FAH client.

When I do, it is quite frustrating because the service just doesn't obey me.

I have to kill the process to try to restart it. Or, when I finally manage to stop it, the service comes back up by itself. So I don't feel I have control over this service.

Does anyone have a good way to do it?

Thanks a lot.:)
JimboPalmer
Posts: 2573
Joined: Mon Feb 16, 2009 4:12 am
Location: Greenwood MS USA

Re: The annoying "restart" incident.

Post by JimboPalmer »

I have not noticed a shortage of GPU WUs since about May 12. I run Windows boxes, no disease specified, 3 Nvidia GPUs: two Pascals and a Turing. No Beta, no Advanced.

Is there a chance you are restricting your WUs in some way?

Here is how to post your log:

viewtopic.php?f=24&t=26036
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
NRT_AntiKytherA
Posts: 111
Joined: Sun May 10, 2020 11:50 pm

Re: The annoying "restart" incident.

Post by NRT_AntiKytherA »

Ibringapples wrote:
I have to kill the process to try to restart it. Or, when I finally manage to stop it, the service comes back up by itself. So I don't feel I have control over this service.

Does anyone have a good way to do it?
Simplest would be to restart your machine, which will signal the client to terminate gracefully, preserving any running CPU work unit to the last save point.

More complex: restart the service and FAHClient using systemd. Anyhow, pay attention to bruce's thoughts on this subject:
bruce wrote:
pcwolf wrote:When I become impatient waiting for WU downloads (i.e. considerable minutes/hours passing not Folding) I have found if I go to Manjaro System Settings and go to the SystemD tab, I can restart the "foldingathome.service" and when both the service and F@H Client return ... *BOOM* I immediately receive a new WU. :D This behavior is consistent and repeatable. I have two GPUs Folding and the previously engaged slot goes immediately back to a checkpoint and resumes flawlessly.
You may (or may not) be guilty of biased perception. Restarting the service does initiate a fresh attempt to get work rather than waiting up to an hour for the next automatic attempt, but I know of no reason why the restart would be any more likely to succeed than if the next attempt was initiated by the timer. It would seem most likely that the client simply says to the server "I'm asking for a new work unit for my hardware ( ... description)" rather than the request being equivalent to "I'm asking again for a new work unit for my hardware ( ... description)". Why would the "again" message (if it's there) actually reduce your chances of getting a new assignment?
Ibringapples
Posts: 42
Joined: Fri Apr 10, 2020 3:53 am

Re: The annoying "restart" incident.

Post by Ibringapples »

Hello,

You're right...

The WUs are simply not running properly.

Here are the logs:

https://pastebin.com/cq0qB5Fd


Can you help me?

Thanks a lot. :)

---update--- 01

Now the only one that is not running is the CPU WU :!: :?:
But it's unstable. Suddenly 2 days :?:

---update--- 02

Now all are running, but I've lost 2 of my 4 CPUs.

Code: Select all

~$ nproc --all
4
:?:
Last edited by Ibringapples on Wed May 27, 2020 2:01 pm, edited 2 times in total.
Ibringapples
Posts: 42
Joined: Fri Apr 10, 2020 3:53 am

Re: The annoying "restart" incident.

Post by Ibringapples »

NRT_AntiKytherA wrote:
I have to kill the process to try to restart it. Or, when I finally manage to stop it, the service comes back up by itself. So I don't feel I have control over this service.

Does anyone have a good way to do it?
Simplest would be to restart your machine, which will signal the client to terminate gracefully, preserving any running CPU work unit to the last save point.

But rebooting the machine could be a problem because I have other services running on it.

Then, maybe with systemd? I have OpenRC and systemd but I prefer OpenRC...

Thanks. :)
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: The annoying "restart" incident.

Post by bruce »

Ibringapples wrote:Now the only one that is not running is the CPU WU :!: :?:
But it's unstable. Suddenly 2 days :?:
Now all are running, but I've lost 2 of my 4 CPUs.
Each GPU requires one CPU thread to send and receive data between main RAM and the GPU. With 2 GPUs and 4 CPUs you can fold with the remaining two.
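As a rough illustration of that arithmetic, here is a minimal sketch. The function name free_cpu_threads is made up for this example and is not part of FAH; it only subtracts one thread per GPU from the total, as described above.

```shell
#!/bin/sh
# Threads left for CPU folding when each GPU slot reserves one CPU thread.
# Usage: free_cpu_threads TOTAL_THREADS GPU_COUNT
# (hypothetical helper for illustration, not an FAH command)
free_cpu_threads() {
    total=$1
    gpus=$2
    free=$((total - gpus))
    # Never report a negative count.
    if [ "$free" -lt 0 ]; then
        free=0
    fi
    echo "$free"
}

# The numbers from this thread: 4 threads, 2 GPUs.
free_cpu_threads 4 2   # prints 2
```

On a live machine you could feed it the real thread count, for example free_cpu_threads "$(nproc --all)" 2.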
MeeLee
Posts: 1375
Joined: Tue Feb 19, 2019 10:16 pm

Re: The annoying "restart" incident.

Post by MeeLee »

You can safely run a script to use systemd to restart the service.
You can also use ssh to start it remotely.
Supposedly fahcontrol has a way to connect to a remote client.
Ibringapples
Posts: 42
Joined: Fri Apr 10, 2020 3:53 am

Re: The annoying "restart" incident.

Post by Ibringapples »

MeeLee wrote:You can safely run a script to use systemd to restart the service.
You can also use ssh to start it remotely.
Supposedly fahcontrol has a way to connect to a remote client.
Actually no...

Code: Select all

~$ sudo /etc/init.d/FAHClient restart
Stopping fahclient ... OK
Starting fahclient ... FAIL
That's the awful issue here ...

:?:
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: The annoying "restart" incident.

Post by bruce »

If you're running one or more CPU-based slots (FAHCore_a7), that's not true.
MeeLee wrote:You can safely run a script to use systemd to restart the service.
Unfortunately, there's a bug in FAHCore_a7 which fails to sync its open files before shutting down. You have to pause all CPU slots and give them time to close their files.
MeeLee
Posts: 1375
Joined: Tue Feb 19, 2019 10:16 pm

Re: The annoying "restart" incident.

Post by MeeLee »

I was under the assumption that you needed to use systemd for restarts.

Or perhaps try FAHClient stop, and on another line FAHClient start.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: The annoying "restart" incident.

Post by bruce »

When I PAUSE a FAHCore_a7 WU, I can watch it process for a bit before it reports that it has completed the stopping process. I have not evaluated whether that time varies with the project, but I'd guess that it might. You need to allow at least that long before restarting, whether or not you use systemd. I have not heard if the bug will be fixed in the next version of the FAHCore, but I sure hope so.
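One way to "allow at least that long" without guessing a fixed delay is to poll until the core processes are gone. The following is only a sketch under assumptions that are not confirmed in this thread: that the client accepts FAHClient --send-pause, that the cores show up under a process name containing FahCore, and that the init script is /etc/init.d/FAHClient as in the earlier post. The helper names wait_until_gone and safe_restart are made up for this example.

```shell
#!/bin/sh
# Sketch: pause folding, wait for the core processes to exit, then restart.
# Assumptions: "FAHClient --send-pause" is accepted by your client version,
# and "pgrep FahCore" finds the running cores. Adjust both if needed.

# wait_until_gone TIMEOUT CHECK_COMMAND...
# Runs CHECK_COMMAND once per second until it fails (nothing left running)
# or TIMEOUT seconds have passed. Returns 0 when gone, 1 on timeout.
wait_until_gone() {
    timeout=$1
    shift
    waited=0
    while "$@" >/dev/null 2>&1; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# safe_restart [TIMEOUT]: run as root. Default timeout is 120 seconds.
safe_restart() {
    # Ask the client to pause all slots (assumed option, see above).
    FAHClient --send-pause || return 1
    if ! wait_until_gone "${1:-120}" pgrep FahCore; then
        echo "Cores still running; not restarting." >&2
        return 1
    fi
    /etc/init.d/FAHClient stop && /etc/init.d/FAHClient start
}
```

Calling safe_restart 300 as root would allow up to five minutes for the cores to close their files, and it refuses to restart if they are still running after that.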
MeeLee
Posts: 1375
Joined: Tue Feb 19, 2019 10:16 pm

Re: The annoying "restart" incident.

Post by MeeLee »

I never had any issues on my system using the 'restart' function in the terminal.
However, you could use the 'sleep' command to pause the script for a set number of seconds before going on to the next command.
E.g.:

Code: Select all

# Stop the client, wait a few seconds for it to shut down, then start it again.
sudo /etc/init.d/FAHClient stop
sleep 5
sudo /etc/init.d/FAHClient start