Default using all CPUs seems flawed

Moderators: Site Moderators, FAHC Science Team

Alex_Atkin
Posts: 37
Joined: Mon Oct 24, 2022 4:32 am

Default using all CPUs seems flawed

Post by Alex_Atkin »

How v8 handles resource allocation seems rather flawed. Leaving it at the default, I had a 3-core CPU job reach 66%, then it got a GPU job (probably Alzheimer's) which took over all resources, leaving the CPU-only job potentially never finishing.

I realise I can reduce the CPUs to 1 and add a CPU-only virtual peer, but then I lose the ability to use all CPU cores for a GPU job that CAN make use of them.

I'm not really sure what the solution to this could be, though. Presumably the long-term goal is to have all GPU jobs utilise all CPU cores rather than differentiating between the two. But can this even be done effectively, given a GPU's performance is so many times greater?

Is there an explanation somewhere for why this is better than the old system?
calxalot
Site Moderator
Posts: 871
Joined: Sat Dec 08, 2007 1:33 am
Location: San Francisco, CA

Re: Default using all CPUs seems flawed

Post by calxalot »

I believe there are currently no hybrid CPU + GPU cores.
The resource system was designed to use them in the future.

Yes, if you create separate resource groups for each GPU, you will have to reduce the available CPUs in the default group.
It may be best to just have the default group with all resources.

If you wish to change allocation among groups, it can be helpful to set everything to finishing and reconfigure after groups are paused.

To only do GPU folding, you might be able to set CPUs to 1 in each GPU resource group.
Set CPUs to zero in the default group.
I have not tried this because I'm just running macOS right now.
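
A rough sketch of the arithmetic only, not client code: whatever CPUs you hand to GPU resource groups have to come out of the default group's share. The function name and the 8-CPU machine are made up for illustration.

```python
def default_group_cpus(total_cpus: int, gpu_group_cpus: list[int]) -> int:
    """CPUs left for the default group after the GPU groups take their share."""
    reserved = sum(gpu_group_cpus)
    if reserved > total_cpus:
        raise ValueError("resource groups ask for more CPUs than the machine has")
    return total_cpus - reserved


# Hypothetical 8-CPU machine with two GPU groups at 1 CPU each.
print(default_group_cpus(8, [1, 1]))
```

For GPU-only folding you would ignore the remainder and set the default group to zero, as described above.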
Alex_Atkin
Posts: 37
Joined: Mon Oct 24, 2022 4:32 am

Re: Default using all CPUs seems flawed

Post by Alex_Atkin »

I understand that. What seemed to go wrong is that, for some reason, it picked up a CPU job, then some time later a GPU one, and it paused the CPU job indefinitely, claiming there were insufficient resources.

Thinking back, maybe this was because, like past versions, it disables the GPU by default, and when I enabled it the GPU job took priority, leaving the CPU job no way to finish? Perhaps there needs to be some failsafe for this first-configuration scenario so the CPU job finishes before it pulls down a GPU job? Or ask if you want to enable the GPU before even looking for jobs at all?
calxalot
Site Moderator
Posts: 871
Joined: Sat Dec 08, 2007 1:33 am
Location: San Francisco, CA

Re: Default using all CPUs seems flawed

Post by calxalot »

The 8.1.11 client is supposed to allow the cpus for a WU to be changed without claiming insufficient resources.

It's possible the assignment had a minimum cpus that was no longer met.

This sounds like a bug you might want to report.

https://github.com/FoldingAtHome/fah-cl ... tet/issues
calxalot
Site Moderator
Posts: 871
Joined: Sat Dec 08, 2007 1:33 am
Location: San Francisco, CA

Re: Default using all CPUs seems flawed

Post by calxalot »

Meanwhile, you might need to set everything to finishing for the situation to clear up. If so, that would be a separate bug.

It would be good to know if another GPU WU is started without running the stalled CPU WU.
Alex_Atkin
Posts: 37
Joined: Mon Oct 24, 2022 4:32 am

Re: Default using all CPUs seems flawed

Post by Alex_Atkin »

Since they were effectively in the same slot, I couldn't figure out how to pause the GPU unit to let the CPU unit finish, so I ended up clearing it. It never occurred to me that Finishing might do it. Worth noting for the future, as I hate dropping WUs.
Alex_Atkin
Posts: 37
Joined: Mon Oct 24, 2022 4:32 am

Re: Default using all CPUs seems flawed

Post by Alex_Atkin »

It's happened again, just letting F@H manage its own usage.
calxalot
Site Moderator
Posts: 871
Joined: Sat Dec 08, 2007 1:33 am
Location: San Francisco, CA

Re: Default using all CPUs seems flawed

Post by calxalot »

I think a GPU WU assignment taking more than 1 CPU is a bug. It could be an assignment server bug or misconfiguration.

Please report it as a client issue.
Hou5e
Posts: 16
Joined: Tue Apr 21, 2020 10:29 am

Re: Default using all CPUs seems flawed

Post by Hou5e »

Until it gets fixed, you could set up Resource Groups to get separate 'Slot'-like behavior for your CPU and GPU. See GitHub Issue #52 for more info about enabling it.
Alex_Atkin
Posts: 37
Joined: Mon Oct 24, 2022 4:32 am

Re: Default using all CPUs seems flawed

Post by Alex_Atkin »

I tried that and yes, it fixes that problem, but then the / peers are not reported in the initial JSON from the websocket connection, which breaks my ability to monitor my folding boxes, as I have no idea how to request the data for that virtual peer.

Obviously, long-term I need to learn how to use the websockets properly, but of course there will be no documentation until after the beta. Right now, with a dirty hack (constantly closing/opening the websocket), I was able to include v8 alongside my v7 monitoring. Without that, I can't see if something is wrong.
calxalot
Site Moderator
Posts: 871
Joined: Sat Dec 08, 2007 1:33 am
Location: San Francisco, CA

Re: Default using all CPUs seems flawed

Post by calxalot »

You access the resource group peer with a separate websocket. Append the group name to the ws: URL. Example:

ws://127.0.0.1:7396/api/websocket/myrg
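
For a monitoring script, building one URL per group could look something like the sketch below. It only assembles the URL strings following the pattern in the example above; the helper name is hypothetical, and the percent-encoding and slash-stripping of group names are my assumptions, not documented client behaviour.

```python
from urllib.parse import quote

# Base URL taken from the example above (default group, local client).
BASE_URL = "ws://127.0.0.1:7396/api/websocket"


def group_ws_url(group: str = "", base: str = BASE_URL) -> str:
    """Return the websocket URL for a resource group; empty name means the default group."""
    name = group.strip("/")  # assumption: tolerate names written as "/myrg"
    if not name:
        return base
    # Assumption: percent-encode anything that is not URL-safe in the group name.
    return f"{base}/{quote(name, safe='')}"


for g in ["", "myrg"]:
    print(group_ws_url(g))
```

A monitor would then open one websocket per URL with whatever websocket library it already uses, instead of relying on the default connection to report every peer.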
Alex_Atkin
Posts: 37
Joined: Mon Oct 24, 2022 4:32 am

Re: Default using all CPUs seems flawed

Post by Alex_Atkin »

Thanks. It seems the problem will be fixed in the next release, so I will probably just leave it alone. It seems somewhat random, and as long as the CPU WU doesn't reach the timeout, it's just a slight delay (relative to the CPU WU timeouts) in getting the WU finished.