Initial impressions for fast ARM hardware

Posted: Sun Jan 10, 2021 3:01 am
by ky0ko
I recently became the proud owner of a Mac Mini based on the Apple M1 ARM chip, which is remarkably speedy.

I found out a few days ago that Folding@home has support for ARM platforms now, and I have an Ubuntu aarch64 VM set up on that machine, so I went ahead and installed it and got folding. It seems to pretty steadily maintain an estimate of 40,000 ppd, plus or minus about 2000.

After a couple of days folding on that, I then ran the version for macOS under Rosetta2 emulation for comparison. Over 24 hours it has been steadily hovering at 55,000 ppd, plus or minus about 1000.

It seems to me that if folding under dynamic translation from x86_64 returns about 37% more ppd than the native ARM client under a hypervisor, then there must be quite a bit of room for improvement! Hopefully this margin decreases in the future as FahCore and its dependencies continue to receive work. I fully intend to continue testing both versions against each other on this machine with future updates.
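For anyone who wants to check the 37% figure, here is a quick sketch in Python using only the two estimates and their rough spreads quoted above. The function and variable names are just illustrative, and the "range" is nothing more than the worst and best case of the stated plus-or-minus values, not a proper error analysis.

```python
def relative_gain(baseline_ppd: float, other_ppd: float) -> float:
    """Return how much higher other_ppd is than baseline_ppd, in percent."""
    return (other_ppd - baseline_ppd) / baseline_ppd * 100.0

native_arm_vm_ppd = 40_000   # native ARM client in the Ubuntu aarch64 VM
rosetta2_ppd = 55_000        # macOS x86_64 client under Rosetta2

gain = relative_gain(native_arm_vm_ppd, rosetta2_ppd)

# Worst and best case using the stated spreads (about 2000 and about 1000 ppd).
low = relative_gain(native_arm_vm_ppd + 2_000, rosetta2_ppd - 1_000)
high = relative_gain(native_arm_vm_ppd - 2_000, rosetta2_ppd + 1_000)

print(f"Rosetta2 vs native ARM VM: {gain:.1f}% more ppd")
print(f"Range given the stated spreads: {low:.1f}% to {high:.1f}%")
```

The central estimate comes out at 37.5%, which matches the "about 37%" above.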

Nonetheless, I am very happy that ARM support is here, as I have quite a bit of ARM hardware lying around that can dedicate free CPU time to the project.

Re: Initial impressions for fast ARM hardware

Posted: Sun Jan 10, 2021 5:37 pm
by bruce
A primary goal for FAH is to maximize the total science being produced. The existing ARM support will allow many new systems to come on-line without modification. If ARM support can be improved as you suggest, I'm sure a project will be started immediately to do that.

Re: Initial impressions for fast ARM hardware

Posted: Mon Jan 25, 2021 9:45 pm
by xnupanic
My first impressions on ARM hardware are pretty good. I am running on an Nvidia AGX Xavier, and on its first WU it estimates 34k PPD on its 8-core CPU. However, I would be REALLY interested to see what this can do if we can figure out how to get FAHClient to use its in-SoC 512-core Volta GPU. When FAHClient runs, it detects the CUDA device, but it can't use it, probably for a couple of reasons. First, the GPU does not have OpenCL support right now, and as I understand it, this is required even if you're going to just do CUDA jobs. Second, I am not sure how the GPU is actually interfaced with the system. "lshw" doesn't show it, and "lspci" doesn't show it, so there's no way it's in gpus.txt. I do know it's a GV11B GPU, I think derived from a GV110, so it should be technically capable of running WUs. When I am less tired I will start a new thread for supporting this board and its GPU and include logs and system info, and happily help however I can to get it running. But for now, it's super cool to have the Xavier crunching ARM64 WUs.

Re: Initial impressions for fast ARM hardware

Posted: Tue Feb 02, 2021 9:48 am
by sptn.
ky0ko wrote:[...]
After a couple of days folding on that, I then ran the version for macOS under Rosetta2 emulation for comparison. Over 24 hours it has been steadily hovering at 55,000 ppd, plus or minus about 1000.[...]
Can you say something about the power draw? I guess it is way lower than any Windows machine running these scores.

Re: Initial impressions for fast ARM hardware

Posted: Tue Feb 02, 2021 9:34 pm
by bruce
xnupanic wrote:... However, I would be REALLY interested to see what this can do if we can figure out how to get FAHClient to use its in-SoC 512-core Volta GPU.
This would be a pretty big development project ... and there's quite a backlog of good ideas competing for a limited development budget.

Personally, I don't have any functional ARM hardware so testing out new ideas is impossible for me ... and there's no assurance it's a workable idea. FAH Development works first on projects that increase scientific production, so let's start with an estimate of how many new clients would be able to provide the resources of the 512-core Volta.

Re: Initial impressions for fast ARM hardware

Posted: Fri Feb 05, 2021 10:28 pm
by hs42
My Mac Mini M1 256GB 8GB measures 24 watts with folding power set at 'FULL'. Estimated PPD is 66,775. Measured with Emporia smart outlet.

Re: Initial impressions for fast ARM hardware

Posted: Sat Feb 06, 2021 2:56 am
by Joe_H
hs42 wrote:My Mac Mini M1 256GB 8GB measures 24 watts with folding power set at 'FULL'. Estimated PPD is 66,775. Measured with Emporia smart outlet.
You may get a bit higher PPD if you either set the client to Light, or set the CPU thread count to 4. On Full the folding uses both the high power and lower power cores, and the low power cores can slow down the overall rate of calculation.
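As a rough illustration of why mixing in slower cores can hurt, here is a toy model in Python. It assumes the work in each step is split evenly across all threads and that a step only finishes when the slowest thread is done. Those assumptions, the relative core speeds, and the function name are all made up for illustration; this is not how the folding core is known to schedule work, and the speeds are not measurements of the M1.

```python
from typing import Sequence

def step_time(work_units: float, core_speeds: Sequence[float]) -> float:
    """Time for one synchronized step with work split evenly across threads.

    Every thread gets the same share, and the step ends only when the
    slowest thread finishes, so the slowest core sets the pace.
    """
    share = work_units / len(core_speeds)
    return max(share / speed for speed in core_speeds)

WORK = 8.0              # arbitrary amount of work per step (assumed)
FAST, SLOW = 1.0, 0.4   # assumed relative core speeds, not measured

four_fast = step_time(WORK, [FAST] * 4)
all_eight = step_time(WORK, [FAST] * 4 + [SLOW] * 4)

print(f"4 performance cores only: {four_fast:.2f} time units per step")
print(f"4 performance + 4 efficiency cores: {all_eight:.2f} time units per step")
```

In this model the 8-thread run only wins if the slow cores are at least half as fast as the fast ones, which is why trying both settings on the real machine is the only reliable test.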

Folding on your Mac Mini may get more efficient sometime in the future. Right now the client and folding core use Intel code that is run through Rosetta 2. Work is being done to provide a native core and client which should run a bit faster.

Re: Initial impressions for fast ARM hardware

Posted: Sat Feb 06, 2021 5:55 am
by belloq
I was checking in on a thread about the M1 systems from Apple. Back in November, a user was unable to get FAH to run on their system, as it seems the downloaded core was the wrong one. Did FAH make changes to the Mac client to allow it to run under Rosetta2?

Re: Initial impressions for fast ARM hardware

Posted: Sat Feb 06, 2021 6:23 am
by Joe_H
I don't know of any specific changes, but one of the developers did comment on that incorrect core download. Possibly that led to adjustments on the server side to get the correct core to download. A few have reported success folding since then.

Re: Initial impressions for fast ARM hardware

Posted: Fri Oct 22, 2021 3:23 am
by MeeLee
Will be interested in the M1 Pro and M1 Max results.
If the theory is true of it being up to 3x faster than the M1, we might see PPD values equal to those of budget GPUs (think 200-500k ppd, for power consumption values of about 65W).
Best efficiency of any cpu (partially thanks to the 5nm design), but still not beating gpus with hundreds or thousands of shaders.

Re: Initial impressions for fast ARM hardware

Posted: Fri Dec 10, 2021 3:33 pm
by belloq
MeeLee wrote:Will be interested in the M1 Pro and M1 Max results.
If the theory is true of it being up to 3x faster than the M1, we might see PPD values equal to those of budget GPUs (think 200-500k ppd, for power consumption values of about 65W).
Best efficiency of any cpu (partially thanks to the 5nm design), but still not beating gpus with hundreds or thousands of shaders.
Using the CPU cores though, right? Unless someone is compiling a GPU core for the M1... which sounded like it would never happen.

Re: Initial impressions for fast ARM hardware

Posted: Thu Dec 23, 2021 1:02 pm
by eljonco
Running M1 Max maxed out (10 core), I get some 23800 ppd with Rosetta (I presume).
Cores are set to 10 when Full, utilising the 2 efficiency cores fully.
The remaining cores are used to 80% (4 of them) and some 5,10,15 and 20% (the last 4).
No significant heating of the laptop. No fans audible.

The GPUs are, obviously, not used at all.


Power draw (laptop) is 35W.
FAH Control is launching but "Client:local Connecting" is as far as it gets.
No way to set the controls, only using control via web browser is possible.
Of course, in there only the start/stop, light/medium/full and when(not)idle are present.

I'd be more than happy to do some guided experiments to increase the processing speed.

Re: Initial impressions for fast ARM hardware

Posted: Thu Dec 23, 2021 6:44 pm
by Joe_H
eljonco wrote:FAH Control is launching but "Client:local Connecting" is as far as it gets.
See this topic about FAHControl on macOS - viewtopic.php?f=108&t=37513&p=353629#p353625. A solution was posted by the developer who assists in the port to OS X. The connection problem is related to a change in the OS since the installer and app were last updated.

Re: Initial impressions for fast ARM hardware

Posted: Thu Dec 23, 2021 10:26 pm
by calxalot
Your best performance on M1 Max will probably be with 8 threads, which you can set via FAHControl.

You also need a passkey to get bonus points.

Re: Initial impressions for fast ARM hardware

Posted: Fri Dec 24, 2021 7:50 pm
by eljonco
13 threads on FahCore_a8. Currently 6 cores maxed out, 4 performance cores slightly over 60%.
Currently 35,000 ppd at 35 W, i.e. 1000 ppd/W.
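
For comparison, the ppd-per-watt figures reported in this thread can be put side by side with a few lines of Python. This sketch uses only the numbers posted above; the labels and function name are illustrative. Keep in mind that the readings are estimated PPD on different work units, and that the Mac Mini figure was measured at a smart outlet while the thread does not say how the laptop draw was measured, so treat the comparison as approximate.

```python
# (label, estimated PPD, reported watts) as posted in this thread
reports = [
    ("Mac Mini M1, Full (hs42)", 66_775, 24),
    ("M1 Max, Full (eljonco, Dec 23)", 23_800, 35),
    ("M1 Max (eljonco, Dec 24)", 35_000, 35),
]

def ppd_per_watt(ppd: float, watts: float) -> float:
    """Points per day delivered for each watt of reported power draw."""
    return ppd / watts

efficiency = {label: ppd_per_watt(ppd, watts) for label, ppd, watts in reports}

for label, value in efficiency.items():
    print(f"{label}: {value:.0f} ppd/W")
```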