Jetson Nano GPU based computing

SubBass100 · Post by **SubBass100** » Tue Nov 03, 2020 12:29 am

Hello All!

I have some NVIDIA Jetson Nanos sitting around and would love to offer them as workers for the project. I've read some previous posts that mock the speed of the Jetson's resources and say that it's a waste. However, I also see that there is support for Raspberry Pi, so I'm not sure why it's such a big deal. Every little bit helps, right? I was able to successfully install the arm64 client software on the Nano and get it up and running for "CPU" work. However, I cannot seem to get the GPU portion working, as it just says "Folding Slot Disabled". The Nano supports CUDA, so I'm not sure what to try. Perhaps there is an attribute or flag needed? I'm not looking for a "NO! CANNOT BE DONE" answer; however, if there is an actual limitation of the device which disallows it from working, I will be understanding. I've invested some time into this already, and if there is a chance of it working without much more effort, this could be a nice little unit to support F@H. Thank you very much for your help, and I look forward to helping more!

Team: 237206 LAN Jam Industries

JimboPalmer · Post by **JimboPalmer** » Tue Nov 03, 2020 12:59 am

GPU: NVIDIA Maxwell™ architecture with 128 CUDA cores.
You are going to find folks with 384 CUDA cores who can't complete a WU on time.

My understanding is that the company Neocortix has produced a CPU client for ARM; they would be the obvious source for a client that could also use GPU cores.

I used to fold on a client from Sony on Android on ARM. Those devices did not count in Points, as they made so few. (As I understand it, Points are integers and the Sony client's output was very fractional.)

SubBass100 · Post by **SubBass100** » Tue Nov 03, 2020 1:09 am

Thank you for that explanation. It makes perfect sense now. I'll find other ways to contribute!

JimboPalmer · Post by **JimboPalmer** » Tue Nov 03, 2020 1:43 am

As ARM improves in computing power, it will be a factor in F@H. Meanwhile, we find uses for ARM in display and control.

Post by **PantherX** » Tue Nov 03, 2020 7:38 am

Welcome to the F@H Forum SubBass100,

For F@H to support GPU, it needs to have:
OpenCL 1.2 support
Double Precision support

After that, next steps can be taken.

JimboPalmer · Post by **JimboPalmer** » Tue Nov 03, 2020 10:03 am

No version of OpenCL is supported (or hinted at).
https://forums.developer.nvidia.com/t/o ... port/74071

It does feature 64 Bit floating point math. (Double Precision)
https://www.techpowerup.com/gpu-specs/j ... -gpu.c3643

It looks like OpenCL is supported in hardware, if Nvidia ever released a driver that supported it. (Again, they are clear above that that will not happen.)
As you look at the comparable GPUs in speed, notice that none of them are supported for F@H either. It would be about 4.5 times slower than a GT 1030.
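The two requirements PantherX listed can be written down as a small check. This is only a sketch: `GpuInfo` and `meets_fah_gpu_requirements` are hypothetical names made up for illustration, not part of the F@H client, and the values for the Nano are simply the ones stated in this thread (no OpenCL driver, double precision present).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class GpuInfo:
    """Hypothetical description of a GPU; not an F@H data structure."""
    name: str
    opencl_version: Optional[Tuple[int, int]]  # None means no OpenCL driver
    has_fp64: bool


def meets_fah_gpu_requirements(gpu: GpuInfo) -> Tuple[bool, List[str]]:
    """Check the two requirements named in this thread: OpenCL 1.2 and FP64."""
    problems = []
    if gpu.opencl_version is None:
        problems.append("no OpenCL driver")
    elif gpu.opencl_version < (1, 2):
        problems.append("OpenCL older than 1.2")
    if not gpu.has_fp64:
        problems.append("no double precision")
    return (not problems, problems)


# Values as described in this thread for the Jetson Nano.
nano = GpuInfo("Jetson Nano", opencl_version=None, has_fp64=True)
ok, problems = meets_fah_gpu_requirements(nano)
print(ok, problems)
```

With those inputs the check fails on the OpenCL side only, which matches the conclusion above: the blocker is the missing driver, not the floating point hardware.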

Post by **bruce** » Tue Nov 03, 2020 7:27 pm

SubBass100 wrote: I have some NVIDIA Jetson Nanos sitting around and would love to offer them as workers for the project. I've read some previous posts that mock the speed of the Jetson's resources and say that it's a waste. However, I also see that there is support for Raspberry Pi, so I'm not sure why it's such a big deal. Every little bit helps, right? I was able to successfully install the arm64 client software on the Nano and get it up and running for "CPU" work. However, I cannot seem to get the GPU portion working, as it just says "Folding Slot Disabled". The Nano supports CUDA, so I'm not sure what to try. Perhaps there is an attribute or flag needed? I'm not looking for a "NO! CANNOT BE DONE" answer; however, if there is an actual limitation of the device which disallows it from working, I will be understanding. I've invested some time into this already, and if there is a chance of it working without much more effort, this could be a nice little unit to support F@H. Thank you very much for your help, and I look forward to helping more!

We appreciate your intentions. As has already been said, it seems highly unlikely. Let me fill in a few other facts.

1) FAH is currently developing CPU support for ARM devices. Formal support has NOT been announced and we're all waiting for a formal announcement. That announcement SHOULD include information about the extent of that support.

2) This site is supported by volunteers. While we may express personal opinions, formal announcements are generally found on the main website foldingathome.org and perhaps referenced here.

3) RPi probably will not be supported. More powerful ARM devices (generally supported by Linux) probably will be supported for CPU computations. For GPUs, OpenCL support is expected. Special cases might be considered someday for CUDA-only devices but it would make it difficult to assign credits properly (and it has never been done.)

4) Your statement "Every little bit helps right?" sounds logical but it isn't necessarily accurate. Scientific research is time-critical. If a researcher plans to publish results in M months, (s)he has to allocate a portion of that time to performing the actual calculations. That, together with some estimates of the length of atomic paths required, leads back to a deadline for every assigned Work Unit. We try to avoid wasting your time assigning WUs that cannot be completed by the assigned Deadline or assigning work that will be duplicated unnecessarily.

5) I have not researched your suggested hardware but ASSUMING that it is several orders of magnitude slower than other (supported) hardware, assigning a WU to your hardware might, in fact, delay the actual scientific progress of that project since nobody else should be processing the same assignment while the servers wait for you to complete it. Points are heavily weighted based on the speed at which the work is completed, and expired work is discarded.

6) I'd be happy to be wrong about that assumption, but it does offer a potential reason to disqualify your quote in (4) above.
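The deadline argument in (4) and (5) comes down to simple arithmetic, sketched below. All numbers are invented for illustration (a WU that takes half a day on reference hardware, a 3-day deadline), and `completion_days` and `meets_deadline` are hypothetical helper names, not anything the F@H servers actually run.

```python
def completion_days(reference_days: float, slowdown: float) -> float:
    """Days a device needs if it is `slowdown` times slower than the reference."""
    return reference_days * slowdown


def meets_deadline(reference_days: float, slowdown: float,
                   deadline_days: float) -> bool:
    """True if the WU would be returned before it expires."""
    return completion_days(reference_days, slowdown) <= deadline_days


# Illustrative assumptions only: 0.5 days on reference hardware, 3-day deadline.
REFERENCE_DAYS = 0.5
DEADLINE_DAYS = 3.0

for slowdown in (1, 4.5, 10, 100):
    days = completion_days(REFERENCE_DAYS, slowdown)
    verdict = "ok" if meets_deadline(REFERENCE_DAYS, slowdown, DEADLINE_DAYS) else "expires"
    print(f"{slowdown:>5}x slower: {days:5.1f} days -> {verdict}")
```

Under these made-up numbers a device a few times slower still returns the WU in time, while one that is two orders of magnitude slower holds the assignment for weeks and the result is discarded anyway.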

MeeLee · Post by **MeeLee** » Wed Nov 04, 2020 6:55 pm

Current support for ARM is on Cortex A70-series cores. The Jetson Nano uses A50-series cores. They're more than 5x slower than A70-series cores. The higher-end A70-series cores and even Neoverse should have similar power to an Intel Atom / Celeron processor.

The second question is this:
We know FAH on ARM might start working in the near future.
And Nvidia GPUs on x86/x64 CPUs work as well.
Whether Nvidia GPUs on ARM processors would work is another question.

That being said, there aren't a lot of devices supporting computations on A70-series cores.
And if they did, they would still be very slow. Amazon Graviton, Ampere, AMD Epyc, and a few other chips using multi-core ARM CPUs might be a lot faster, though they'd still be competing with x86/x64 CPUs, not GPUs.

If anything could come from this folding on ARM, it's perhaps compatibility with the Mali (or later) IGPs and, further down the line, ARM CPUs feeding Nvidia GPUs (now that Nvidia has announced it is purchasing Arm, we're kind of expecting more GPU compatibility with Nvidia ARM devices in the near future).

Post by **bruce** » Sun Nov 08, 2020 9:15 pm

I'm not familiar with the details of ARM hardware but I do understand there's a lot of difference between high-end and low-end ARM devices. I do read about upcoming ARM software support, though, and it sounds like the low-end devices will be unsupported (or at least unable to keep up with FAH's deadlines).

NVidia's potential support of ARM GPUs will be an interesting development. Hopefully it will include both OpenCL and CUDA support.

MeeLee · Post by **MeeLee** » Sun Nov 08, 2020 11:52 pm

The problem with ARM is that only A70-series cores are fast enough for folding on CPU.
But currently all Nvidia development boards are powered by A50-series cores.
Also, most cellphones that have a big.LITTLE core configuration use A70-series cores for the big cores, and A50- or A30-series cores for the little cores.
That means at best they'll be able to use 2x A70-series cores without overheating.

For Nvidia GPUs on ARM, there also aren't any drivers for larger (RTX) GPUs for ARM (other than their Xavier/Jetson series, which use A50-series cores).
Those CPU cores are sufficient for probably a GT710 to a GT1030 (being optimistic), if Nvidia can't find a way to use more than 1 CPU core per GPU.
They will most certainly not be powerful enough to feed any GTX-series GPU.
The third issue is that most ARM hardware that we have is limited to a single PCIe slot (PCIe 3.0 x1 at best), which isn't going to cut it for pretty much any GPU out there except those that are ancient by today's standards.

But even if they were able to make a PCIe 3.0 x4 slot work on the upper range of the A70-series cores (running at 2 to 3 GHz), those cores won't be able to feed any modern RTX GPU anytime soon, unless more than 1 core is used.
Maybe Neoverse will have sufficient processing power to feed an RTX 2000 GPU.

I think currently ARM makes sense for the larger servers, and cloud instances (like Graviton, Epyc, and Ampere PCs).
Even then it's questionable whether these CPUs (while consuming much less power than x86 with an equal number of cores) will be as efficient as x86 in terms of performance per watt.

Perhaps the reason we don't see many high power, ARM devices in the industry, is because ARM CPUs are mainly focused on getting simple jobs done at a low power consumption.
And their A70 series CPUs aren't very power efficient compared to x86? (some A70 CPU cores can only be turned on or off, and I know on Ampere some are always 'ON', meaning they use like 80W at idle, and 100-120W at full load, unlike X86 CPUs that have very good sleep modes built in).

There is a future in ARM though, but only in the higher-end CPUs, not in the power-efficient ones.
A single quad-core A50-series CPU running at 1.5 GHz could be several times slower than a single Atom CPU at 1.66 GHz for some workloads.

SilvioMartin · Post by **SilvioMartin** » Fri Nov 13, 2020 12:05 pm

I read about FAH running on Raspberry Pi on heise.de, downloaded the client etc here https://foldingathome.org/alternative-downloads/ and got it running on my RPi B4. So far, so good. But then I wanted to ask something and checked in this forum, whether the question came up before. Now I find this

bruce wrote: Formal support has NOT been announced and we're all waiting for a formal announcement.
RPi probably will not be supported.

and especially this: viewtopic.php?f=24&t=35998

On the download page the ARM / Raspberry Pi client is presented the same way as the others, which makes it look like a release. Is it still beta? Should I finish the current WU and then set my project FAH@RPi aside for now?

foldy · Post by **foldy** » Fri Nov 13, 2020 1:32 pm

If it is working without issues, then keep it running, I would say.

Is it fast enough to complete the work units in time?

Benchmarks say Pi 4 can do 14 Gflops linpack while an Intel 6700k can do 250 Gflops. So I guess it needs very small work units to hold the deadline.

You could overclock the Pi 4 to 2 GHz, which is roughly a 33% boost compared to the default 1.5 GHz clock. But then it needs better cooling and it must be stable, or else FAH work units will fail.
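A quick back-of-the-envelope version of those numbers, as a sketch only. It uses the Linpack figures quoted above and assumes throughput scales linearly with clock speed, which is optimistic for a real workload.

```python
PI4_GFLOPS = 14.0         # Linpack figure quoted above for the Pi 4
I7_6700K_GFLOPS = 250.0   # Linpack figure quoted above for the Intel 6700K

# How many times slower the Pi 4 is at stock clocks.
slowdown = I7_6700K_GFLOPS / PI4_GFLOPS

DEFAULT_GHZ = 1.5    # Pi 4 stock clock
OVERCLOCK_GHZ = 2.0  # overclock suggested above
boost = OVERCLOCK_GHZ / DEFAULT_GHZ - 1.0

# Assumption: performance scales linearly with clock speed.
slowdown_overclocked = slowdown / (1.0 + boost)

print(f"stock: {slowdown:.1f}x slower than the 6700K")
print(f"overclock boost: {boost:.0%}")
print(f"overclocked: {slowdown_overclocked:.1f}x slower than the 6700K")
```

So the overclock narrows the gap from roughly 18x to roughly 13x at best; it does not change which work units the Pi can realistically finish on time.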

SilvioMartin · Post by **SilvioMartin** » Fri Nov 13, 2020 1:58 pm

foldy wrote:Is it fast enough to complete the work units in time?

Yes. It is even fast enough to get some bonus points. The current WU (project 16929) has a base credit of 487 points and an estimated credit of 1080 points. The last one (project 16932) had the same base credit (I think) and earned 1086 points.

Though compared to my iMac it is really slow. My iMac (3.4 GHz quad core i5) runs FAH on 3 CPU cores. It always predicts roughly 40000 points per day, so it gains about 4000 points / (day * GHz * CPU core). The Raspberry Pi runs on all 4 CPU cores at 1.5 GHz (it is not slowed down, temperature always around 65 °C, nothing else running, "top" reports between 350 % and > 390 % CPU usage of FAH core). It predicts around 1700 points per day, so it gains just around 300 points / (day * GHz * CPU core).
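The normalisation above can be reproduced in a few lines. This is just the arithmetic from this post; `ppd_per_ghz_core` is a name chosen here for illustration, and points per GHz per core is a rough yardstick that ignores architectural differences and the non-linear bonus.

```python
def ppd_per_ghz_core(ppd: float, ghz: float, cores: int) -> float:
    """Points per day, normalised by clock speed and number of cores used."""
    return ppd / (ghz * cores)


# Figures reported in this post.
imac = ppd_per_ghz_core(ppd=40000, ghz=3.4, cores=3)
rpi = ppd_per_ghz_core(ppd=1700, ghz=1.5, cores=4)

print(f"iMac: {imac:.0f} points / (day * GHz * core)")
print(f"RPi:  {rpi:.0f} points / (day * GHz * core)")
print(f"ratio: {imac / rpi:.1f}x")
```

The exact values come out slightly under the rounded 4000 and 300 quoted above, and the per-GHz, per-core gap between the two machines is a bit under 14x.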

This leads to the question I wanted to ask. On my first try I saw a PPD of 11000, but then I noticed that the CPU would soon get too hot. So I shut it down and connected the fan to the 5 V pin instead of 3.3 V. But after starting again, I only saw a PPD of around 1700. Was the 11000 just a temporary phantom, or is there some issue with my current setup? What PPD range would one expect?

Neil-B · Post by **Neil-B** » Fri Nov 13, 2020 9:30 pm

At the start of any WU, PPD estimates can be wildly wrong ... normally after 5% they have stabilised.

MeeLee · Post by **MeeLee** » Sat Nov 14, 2020 5:24 am

The Pi 4 has a Cortex A72 CPU.
They currently have Cortex A73, 75, 76, 77, all of which are more powerful than the former, yet all *should in theory* have backwards compatibility with older A-series CPUs (like the A72 from the Pi).
After that is Neoverse.
I don't know how Neoverse compares to the M1 CPU Apple is developing.
But apparently the M1 is also a performance CPU (4 'Neoverse-like' cores and 4 low-performance cores), without the high idle consumption drawback that's associated with most ARM performance processors.

The fact that the M1 CPU is built on 5nm means that it will automatically be a lot more efficient than any older CPU, and some reports have said it topped modern laptop CPUs by a wide margin (usually 10th or 11th gen Core i5 and i7 CPUs, with 4 cores and 8 threads).
What doesn't make any sense, though, is that most of those x86 CPUs run at a boost of nearly 5 GHz, vs the 3.1 GHz on the Apple M1.
So how ARM is getting more performance out of 3 GHz than x86 gets out of 5 GHz beats me.
Unless they're measuring at the rated clock speed of the Intel x86 CPUs (~2.10 GHz), probably in very thermally limited cases, which would have made more sense...
Plus, half of the M1 CPU is low power oriented (probably A50-series cores), that are known to be much slower than X86 CPUs.

Anyway, that Apple is investing in the M1 CPU is good news. It means more high-powered Chinese knockoffs will be created soon (for TV boxes, and Developer/single board computers).
