Initial impressions for fast ARM hardware

Moderators: Site Moderators, FAHC Science Team

ky0ko
Posts: 3
Joined: Sat Apr 11, 2020 9:53 pm

Initial impressions for fast ARM hardware

Post by ky0ko »

As of recently, I am the proud owner of a Mac Mini based on the Apple M1 ARM chip, which is remarkably speedy.

I found out a few days ago that Folding@home has support for ARM platforms now, and I have an Ubuntu aarch64 VM set up on that machine, so I went ahead and installed it and got folding. It seems to pretty steadily maintain an estimate of 40,000 ppd, plus or minus about 2000.

After a couple of days folding on that, I then ran the version for macOS under Rosetta2 emulation for comparison. Over 24 hours it has been steadily hovering at 55,000 ppd, plus or minus about 1000.

It seems to me that if folding under dynamic translation from x86_64 returns about 37% more ppd than the native ARM client under a hypervisor, then there must be quite a bit of room for improvement! Hopefully this margin decreases in the future as FahCore and its dependencies continue to receive work. I fully intend to continue testing both versions against each other on this machine with future updates.
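For reference, the 37% figure can be reproduced from the two PPD estimates quoted above. This is only a minimal sketch of the arithmetic; the function name `relative_gain` is a hypothetical helper for illustration, not part of any FAH tool, and the low/high cases simply combine the quoted plus-or-minus spreads.

```python
def relative_gain(baseline_ppd: float, other_ppd: float) -> float:
    """Return how much higher other_ppd is than baseline_ppd, in percent."""
    return (other_ppd - baseline_ppd) / baseline_ppd * 100.0


# Estimates taken from the post above
native_arm_ppd = 40_000   # Ubuntu aarch64 VM, plus or minus about 2000
rosetta_ppd = 55_000      # macOS client under Rosetta 2, plus or minus about 1000

central = relative_gain(native_arm_ppd, rosetta_ppd)
# Smallest gap: native at its high end, Rosetta at its low end
low = relative_gain(native_arm_ppd + 2_000, rosetta_ppd - 1_000)
# Largest gap: native at its low end, Rosetta at its high end
high = relative_gain(native_arm_ppd - 2_000, rosetta_ppd + 1_000)

print(f"central estimate: {central:.1f}%")
print(f"range from quoted spreads: {low:.1f}% to {high:.1f}%")
```

The central estimate works out to 37.5%, in line with the "about 37%" above. The range is wide because these are client estimates that move with each work unit.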

Nonetheless, I am very happy that ARM support is here, as I have quite a bit of ARM hardware lying around that can dedicate free CPU time to the project.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Initial impressions for fast ARM hardware

Post by bruce »

A primary goal for FAH is to maximize the total science being produced. The existing ARM support will allow many new systems to come on-line without modification. If ARM support can be improved as you suggest, I'm sure a project will be started immediately to do that.
xnupanic
Posts: 3
Joined: Mon Jan 25, 2021 8:30 pm

Re: Initial impressions for fast ARM hardware

Post by xnupanic »

My first impressions on ARM hardware are pretty good. I am running on an Nvidia AGX Xavier, and on its first WU it estimates 34k PPD on its 8-core CPU. However, I would be REALLY interested to see what this can do if we can figure out how to get FAHClient to use its in-SoC 512-core Volta GPU.

When FAHClient runs, it detects the CUDA device, but it can't use it, probably for a couple of reasons. First, the GPU does not have OpenCL support right now, and as I understand it, this is required even if you're going to just do CUDA jobs. Second, I am not sure how the GPU is actually interfaced with the system. "lshw" doesn't show it, and "lspci" doesn't show it, so there's no way it's in gpus.txt. I do know it's a GV11B GPU, I think derived from a GV110, so it should be technically capable of running WUs.

When I am less tired I will start a new thread for supporting this board and its GPU and include logs and system info, and happily help however I can to get it running. But for now, it's super cool to have the Xavier crunching ARM64 WUs.
//xnupanic
sptn.
Posts: 51
Joined: Wed Sep 09, 2020 10:05 am

Re: Initial impressions for fast ARM hardware

Post by sptn. »

ky0ko wrote:[...]
After a couple of days folding on that, I then ran the version for macOS under Rosetta2 emulation for comparison. Over 24 hours it has been steadily hovering at 55,000 ppd, plus or minus about 1000.[...]
Can you say something about the power draw? I guess it is way lower than any Windows machine running these scores.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Initial impressions for fast ARM hardware

Post by bruce »

xnupanic wrote:... However, I would be REALLY interested to see what this can do if we can figure out how to get FAHClient to use its in-SoC 512-core Volta GPU.
This would be a pretty big development project ... and there's quite a backlog of good ideas competing for a limited development budget.

Personally, I don't have any functional ARM hardware so testing out new ideas is impossible for me ... and there's no assurance it's a workable idea. FAH Development works first on projects that increase scientific production, so let's start with an estimate of how many new clients would be able to provide the resources of the 512-core Volta.
hs42
Posts: 2
Joined: Fri Feb 05, 2021 4:02 pm

Re: Initial impressions for fast ARM hardware

Post by hs42 »

My Mac Mini M1 256GB 8GB measures 24 watts with folding power set at 'FULL'. Estimated PPD is 66,775. Measured with Emporia smart outlet.
Joe_H
Site Admin
Posts: 7856
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Initial impressions for fast ARM hardware

Post by Joe_H »

hs42 wrote:My Mac Mini M1 256GB 8GB measures 24 watts with folding power set at 'FULL'. Estimated PPD is 66,775. Measured with Emporia smart outlet.
You may get a bit higher PPD if you either set the client to Light, or set the CPU thread count to 4. On Full the folding uses both the high power and lower power cores, and the low power cores can slow down the overall rate of calculation.

Folding on your Mac Mini may get more efficient sometime in the future. Right now the client and folding core use Intel code that is run through Rosetta 2. Work is being done to provide a native core and client which should run a bit faster.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
belloq
Posts: 40
Joined: Thu Sep 24, 2020 12:58 pm

Re: Initial impressions for fast ARM hardware

Post by belloq »

I was checking in on a thread about the M1 systems from Apple. Back in November, a user was unable to get FAH to run on their system, as it seems the downloaded core was the wrong one. Did FAH make changes to the Mac client to allow it to run under Rosetta2?
Joe_H
Site Admin
Posts: 7856
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Initial impressions for fast ARM hardware

Post by Joe_H »

I don't know of any specific changes, but one of the developers did comment on that incorrect core download. Possibly that led to adjustments on the server side to get the correct core to download. A few have reported success folding since then.
MeeLee
Posts: 1375
Joined: Tue Feb 19, 2019 10:16 pm

Re: Initial impressions for fast ARM hardware

Post by MeeLee »

Will be interested in the M1 Pro and M1 Max results.
If the theory is true that they are up to 3x faster than the M1, we might see PPD values equal to those of budget GPUs (think 200-500k ppd, for power consumption values of about 65W).
Best efficiency of any CPU (partially thanks to the 5nm design), but still not beating GPUs with hundreds or thousands of shaders.
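As a rough illustration of that projection, the sketch below scales the M1 estimates reported earlier in this thread by the "up to 3x" factor. It assumes PPD scales linearly with speed, which may not hold since FAH bonus points depend on how quickly work units are returned. The helper `projected_ppd` and the constants are hypothetical names chosen for this example.

```python
def projected_ppd(base_ppd: float, speedup: float) -> float:
    """Scale a measured PPD by an assumed speedup factor (linear scaling is an assumption)."""
    return base_ppd * speedup


ASSUMED_SPEEDUP = 3.0   # "up to 3x" from the post, taken at face value
ASSUMED_WATTS = 65.0    # power figure quoted in the post

# M1 PPD estimates reported earlier in this thread
m1_reports = {"ky0ko (Rosetta 2)": 55_000, "hs42 (Full)": 66_775}

for who, ppd in m1_reports.items():
    scaled = projected_ppd(ppd, ASSUMED_SPEEDUP)
    print(f"{who}: {scaled:,.0f} ppd, {scaled / ASSUMED_WATTS:,.0f} ppd/W at 65 W")
```

Under these assumptions the projection comes out at roughly 165k to 200k ppd, which is the low end of the range mentioned above.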
belloq
Posts: 40
Joined: Thu Sep 24, 2020 12:58 pm

Re: Initial impressions for fast ARM hardware

Post by belloq »

MeeLee wrote:Will be interested in the M1 Pro and M1 Max results.
If the theory is true that they are up to 3x faster than the M1, we might see PPD values equal to those of budget GPUs (think 200-500k ppd, for power consumption values of about 65W).
Best efficiency of any CPU (partially thanks to the 5nm design), but still not beating GPUs with hundreds or thousands of shaders.
Using the CPU cores though, right? Unless someone is compiling a GPU core for the M1... which sounded like it would never happen.
eljonco
Posts: 5
Joined: Thu Dec 23, 2021 12:48 pm

Re: Initial impressions for fast ARM hardware

Post by eljonco »

Running an M1 Max maxed out (10 cores), I get some 23800 ppd with Rosetta (I presume).
Cores are set to 10 when on Full, fully utilising the 2 efficiency cores.
The remaining cores are used to 80% (4 of them) and some 5, 10, 15 and 20% (the last 4).
No significant heating of the laptop. No fans audible.

The GPUs are, obviously, not used at all.

Power draw (laptop) is 35W.
FAH Control is launching but "Client:local Connecting" is as far as it gets.
No way to set the controls; only control via the web browser is possible.
Of course, in there only the start/stop, light/medium/full and when (not) idle options are present.

I'd be more than happy to do some guided experiments to increase the processing speed.
Joe_H
Site Admin
Posts: 7856
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Initial impressions for fast ARM hardware

Post by Joe_H »

eljonco wrote:FAH Control is launching but "Client:local Connecting" is as far as it gets.
See this topic about FAHControl on macOS - viewtopic.php?f=108&t=37513&p=353629#p353625. A solution was posted by the developer who assists in the port to OS X. The connection problem is related to a change in the OS since the installer and app were last updated.
calxalot
Site Moderator
Posts: 878
Joined: Sat Dec 08, 2007 1:33 am
Location: San Francisco, CA

Re: Initial impressions for fast ARM hardware

Post by calxalot »

Your best performance on the M1 Max will probably be with 8 threads, which you can set via FAHControl.

You also need a passkey to get bonus points.
eljonco
Posts: 5
Joined: Thu Dec 23, 2021 12:48 pm

Re: Initial impressions for fast ARM hardware

Post by eljonco »

13 threads on FahCore_a8. Currently 6 cores maxed out, 4 performance cores slightly over 60%.
Currently 35000 ppd at 35 W, i.e. 1000 ppd/W.
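To put the efficiency figures from this thread side by side, here is a small sketch that computes ppd per watt for each report. All numbers are copied from the posts above; the helper name `ppd_per_watt` is made up for this example. The measurements are not strictly comparable (a smart outlet on a Mac Mini versus a laptop power reading, different work units, and bonus points that depend on a passkey), so treat the output as a rough comparison only.

```python
def ppd_per_watt(ppd: float, watts: float) -> float:
    """Points per day delivered per watt of measured power draw."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return ppd / watts


# (label, estimated PPD, measured watts), all taken from posts in this thread
reports = [
    ("Mac Mini M1, Full (hs42)", 66_775, 24),
    ("M1 Max, first report (eljonco)", 23_800, 35),
    ("M1 Max, later report (eljonco)", 35_000, 35),
]

for label, ppd, watts in reports:
    print(f"{label}: {ppd_per_watt(ppd, watts):.0f} ppd/W")
```

This gives roughly 2782, 680 and 1000 ppd/W respectively.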