Azure - GPU Series - Tech Specs on diff series

Post by **homeshark** » Mon Sep 21, 2020 12:54 pm

Hi Everyone,

Azure VM compute expert here. Quite a few of the guides I've read on this and other sites involve many manual steps (like installing the GPU drivers) and fail to take low- or no-cost component options into account while still taking advantage of a GPU-series VM.

I plan to automate stateless VMs via templates, automating their deployment and uptime under a user-defined cost ceiling while minimizing unneeded cost (like storage, which is free with ephemeral disks).

My goal is for the user to click a link, fill in their user/key/passkey on the Azure website, and off they go folding.
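As a rough illustration of the "fill in user/key/passkey and go" step, a deployment template could render the client's config.xml from the form values. This is a minimal sketch assuming the FAHClient v7 layout (options as elements with a `v` attribute, one `<slot>` element per GPU); `build_fah_config` and its parameters are hypothetical names, not part of any FAH tooling.

```python
import xml.etree.ElementTree as ET


def build_fah_config(user: str, team: str, passkey: str, gpu_slots: int = 1) -> str:
    """Return a minimal FAHClient v7-style config.xml as a string.

    Assumption: the v7 layout, where each option is an element carrying a
    'v' attribute (e.g. <user v='name'/>) and each folding slot is a
    <slot id='N' type='GPU'/> element.
    """
    root = ET.Element("config")
    # Identity values the user would type into the deployment form.
    ET.SubElement(root, "user", v=user)
    ET.SubElement(root, "team", v=team)
    ET.SubElement(root, "passkey", v=passkey)
    # A dedicated cloud VM can fold at full power.
    ET.SubElement(root, "power", v="full")
    # One GPU slot per GPU attached to the VM.
    for slot_id in range(gpu_slots):
        ET.SubElement(root, "slot", id=str(slot_id), type="GPU")
    return ET.tostring(root, encoding="unicode")


# Placeholder values for illustration only.
print(build_fah_config("homeshark", "0", "YOUR_PASSKEY"))
```

The template would write this string to the client's data directory before starting the service; the exact location depends on the OS and install method.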

I don't know which of the GPUs offered on Azure provide the most folding value, i.e. gaming GPU? compute GPU? machine learning GPU? AI GPU? etc.

QUESTIONS

What is the folding 'value' of each GPU offered for the different series?
VM Series / GPU / CPU
NC Series - Nvidia Tesla K80 / Intel Xeon E5-2690 v3 (Haswell)
NCv2 Series - Nvidia Tesla P100 / Intel Xeon E5-2690 v4 (Broadwell)
NCv3 Series - Nvidia Tesla V100 / Intel Xeon E5-2690 v4 (Broadwell)
NCasT4_v3 Series - Nvidia Tesla T4 / AMD EPYC 7V12 (Rome)
ND Series - Nvidia Tesla P40 / Intel Xeon E5-2690 v4 (Broadwell)
NDv2 Series - Nvidia Tesla V100 NVLINK
NV Series - Nvidia Tesla M60 / Intel Xeon E5-2690 v3 (Haswell)
NVv3 Series - Nvidia Tesla M60 / Intel Xeon E5-2690 v4 (Broadwell)
NVv4 Series - AMD Radeon Instinct MI25 / AMD EPYC 7V12 (Rome)

Post by **Knish** » Mon Sep 21, 2020 3:27 pm

Hi! I wrote the manual-steps post on GitHub, if you found that one from this forum last month. After taking cost into account, only one choice really looked viable to me (though I didn't do any math like cost per PPD; just these criteria: 1) GPU, because of points, and 2) what is affordable for 30 days of folding using the free credits), and I'm quite certain it is the most cost-effective method because of spot VMs.

I'm not quite sure what you mean by "folding values". If you mean PPD, I can't really answer for most of them as I don't have that data, but you can google the same hardware model for its PPD.

For PPD, GPUs provide WAY more than CPUs, and a P100 (NC6s_v2) costs roughly the same as a 32-core compute VM (F32s_v2).
However, if we disregard points and treat a CPU WU as equally beneficial for science as a GPU WU, then the 32-core VM goes through WUs in just over an hour each, while the P100 takes anywhere from 2-6 hours for non-Moonshot WUs. (Moonshot WUs only take ~30 minutes.)
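To put the per-WU times above on a common footing, they can be converted to work units per day. This sketch uses only the figures in the post; the 1.1 h value standing in for "just over an hour" is an assumption, and WUs differ in size, so this counts WUs, not points or science done.

```python
# WU times quoted in the post (hours per WU, as a (fastest, slowest) range).
# 1.1 h is an assumed stand-in for "just over an hour".
SCENARIOS = {
    "F32s_v2 (32-core CPU)": (1.1, 1.1),
    "NC6s_v2 (P100), non-Moonshot": (2.0, 6.0),
    "NC6s_v2 (P100), Moonshot": (0.5, 0.5),
}


def wus_per_day(hours_per_wu: float) -> float:
    """Work units finished per day when each WU takes `hours_per_wu` hours."""
    return 24.0 / hours_per_wu


for name, (fastest, slowest) in SCENARIOS.items():
    low, high = wus_per_day(slowest), wus_per_day(fastest)
    print(f"{name}: {low:.1f} to {high:.1f} WUs/day")
```

By WU count the 32-core VM comes out ahead (about 22 per day versus 4 to 12 for non-Moonshot WUs on the P100), while by points the P100 wins.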

I don't see the merit of GPU folding with anything less than an NCv2, because while the K80 is cheaper than the P100, it's obviously slower.
The NDv2 / NCv3 V100s are really, really good, but their costs are far higher than the reduced folding time and extra points they bring in. I have only used the V100 for a few days to "soak up" the extra credit that was going to be left over as my Azure trial's expiration date approached; mostly I now just spin up a second P100 for a few more days instead.

Don't try to use the MI25. I believe those are their custom GPUs, which require their custom drivers, so FAH is not going to work on them.

I also haven't had any luck with a multi-GPU VM such as the NC12s_v2, as I can only get one GPU to fold for some reason.

If I remember right, a single P100 brought in about 1.6 million PPD (non-Moonshot WUs) and the V100 over 2 million PPD.

Post by **Beberg** » Mon Sep 21, 2020 7:56 pm

Folding@home is NOT stateless! You need persistent storage, or you will be putting more load on the servers and causing some other problems (which I will not detail here) as well.

There is more about persistent storage written in the container README - https://github.com/foldingathome/containers/

Post by **homeshark** » Mon Sep 21, 2020 9:39 pm

Beberg wrote:Folding@home is NOT stateless! You need to have persistent storage, or you will be putting more load on the servers, and cause some other problems (that I will not detail here) as well.

There is more about persistent storage written in the container README - https://github.com/foldingathome/containers/

By stateless I didn't mean Docker containers, just full VMs for the moment.

I'll take what you mention into consideration in the design. Is Folding@Home disk-intensive (as in IOPS / disk bandwidth)? I don't believe so...

Regarding config, I see a 'Config Rotation/Backup' section (is this config.xml?) and a 'Logging' section, but didn't see anything about setting the WU folder location.

Knish wrote:Hi! I wrote the manual steps post on github if u found that one from this forum last month. After taking in cost, only 1 choice really looked available to me (though I didn't do any math like a cost per ppd; just this criteria: 1) gpu because points, 2)what is affordable for 30 days of folding to use the free credits ) I'm quite certain is the most cost effective method b/c of spot VMs.

I'm not quite sure what you mean by "folding values" --if ppd then I can't really answer most of them as I don't have that data but you can google the same hardware model for its respective ppd

[...]

Gotcha. Looking further into this, I'm no longer referring to points or WUs. I'd imagine there is a GPU architecture that can 'fold' more efficiently than other GPUs; the keyword 'efficiently' is meant to move away from simply having a more powerful GPU. I hope that answers your question @Knish, and yes, I originally looked at your guide as well for ideas.

Post by **PantherX** » Tue Sep 22, 2020 1:29 am

Welcome to the F@H Forum homeshark,

homeshark wrote:...Is Folding@Home disk intense (as in IOPS / Disk bandwidth)? I don't believe so...

It isn't an intensive application. It does need to write checkpoints and compress/decompress WUs but is fairly lightweight.

homeshark wrote:...With regards to config, I see a 'Config Rotation/Backup' (is this the config.xml?) and a 'Logging' area but didn't see anything with regards to setting the WU folder location...

I don't have Docker experience, but that would make sense given that config.xml is updated as and when needed. On Windows, this is the default location for WU processing, FahCore downloads, the configuration store, etc.:
%AppData%\FAHClient (which expands to ...\AppData\Roaming\FAHClient)

Efficiency of the GPU is a tricky question since different WUs behave differently, and the only metric we have is CUDA cores. The more there are, the better the GPU performs on a WU, until the WU no longer fills up all the CUDA cores. While there are plans to resolve that, there's no ETA. Have a look here to see if you can find any useful comparison, since you have professional GPUs while I am familiar with consumer GPUs: https://folding.lar.systems/

Post by **homeshark** » Tue Sep 22, 2020 2:03 am

@PantherX thank you. I think I have a good starting point, especially with the reference to CUDA cores. I also came across an efficiency breakdown on popular GPUs - https://docs.google.com/spreadsheets/d/ ... edit#gid=0

Post by **PantherX** » Tue Sep 22, 2020 3:56 am

Please note that we currently have FahCore_22 version 0.0.13 in beta testing, which uses CUDA and can significantly alter the PPD and the time taken to fold WUs (viewtopic.php?f=66&t=36129). It will be released when it is ready; we don't have an ETA since it depends on a few different factors.

Post by **bruce** » Tue Sep 22, 2020 3:01 pm

FAH also needs read/write storage where it can download semi-permanent code. Though the FAHCores change rarely, a container should not need to download them every time it needs to use them. Where do you plan to store the "cores" directories?

Inasmuch as FAH is now providing containerized code, what added value are you providing?

Post by **gunnarre** » Tue Sep 22, 2020 4:11 pm

vast.ai lists 2080 Ti cards for 6 USD/day and 1080 Ti cards at 3.6 USD/day right now. Will you be able to come near that performance/price level with Azure?

Post by **MeeLee** » Tue Sep 22, 2020 4:41 pm

Having just done Azure, Google Cloud, and Amazon AWS compute loads, I would say the best GPU for the job is a compute GPU.
It's different from AI (AI usually uses 8-bit), gaming (gaming is bursty rather than a continuous load, and usually doesn't use OpenCL), and machine learning (which usually uses 16-bit shaders).
Compute is what you're doing when you're folding (on either CPU or GPU).
It usually costs a lot more, but the CPU/GPU combination works better and (usually) doesn't get throttled.
With gaming, you get throttled if you use too much data. There's something in the fine print stating there's a certain amount of computation you are allowed to do per month; once you exceed it, your processing power will be lowered or cut off completely.

Depending on how long you're willing to fold on Azure, buying a cheap PC (Intel Core i3, 120GB SSD, cheap motherboard, and an RTX 3080) might cost a bit more initially (~$750), but will more than pay you back in PPD/$ within one year!
If you just want to use up your free credit (a week or month or whatever), it's only going to make little PPD in that time.
Usually the free tiers are also compute limited.

Post by **homeshark** » Tue Sep 22, 2020 5:16 pm

bruce wrote:FAH also needs read/write storage where it can download semi-permanent code. Though the FAHCores change rarely, a container should not need to download it every time it needs to use it. Where do you plan to store the "cores" directories?

Inasmuch as FAH is now providing containerized code, what added value are you providing?

I never mentioned I was doing containers such as Docker. Keep in mind this is also a learning/research/exploration project for me, which is why I reached out to the forum. There are already guides using some of the underlying ideas I planned to incorporate, but they aren't as accessible depending on someone's IT skill set, and why not shave off manual effort and time commitment?

gunnarre wrote:vast.ai lists 2080 Ti cards for 6 USD/day and 1080 Ti cards at 3.6USD/day right now. Will you be able to come near to that performance/price level with Azure?

Azure sells unused compute capacity as 'Spot' VMs, which are substantially discounted off the retail rate. Example: NC6s_v3 VM with 1x Tesla V100 GPU & 6-core Xeon E5-2690 v4 (Broadwell) - Retail: $3.06/hr, Spot: $0.4817/hr (84% savings).
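The quoted discount can be checked with a quick calculation. The 30-day period is an assumption for illustration, and the figures cover the VM compute charge only (no disk or bandwidth).

```python
def spot_savings(retail_per_hr: float, spot_per_hr: float) -> float:
    """Fractional discount of the Spot price relative to the retail price."""
    return 1.0 - spot_per_hr / retail_per_hr


def cost_for_days(per_hr: float, days: float = 30.0) -> float:
    """Compute-only cost of running one VM continuously for `days` days."""
    return per_hr * 24.0 * days


RETAIL, SPOT = 3.06, 0.4817  # NC6s_v3 prices from the post, in $/hr
print(f"Spot discount: {spot_savings(RETAIL, SPOT):.1%}")
print(f"30 days retail: ${cost_for_days(RETAIL):,.2f}")
print(f"30 days spot:   ${cost_for_days(SPOT):,.2f}")
```

This gives a discount of about 84.3%, consistent with the 84% quoted, or roughly $347 versus $2,203 for 30 days of continuous compute.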

The downside with Spot VMs is that you're only given 30 seconds' notice before your VM is turned off when a higher-paying customer comes along. When that happens, you may want to deploy to a different Azure region that can provide similarly low pricing, hence the 'stateless' part of spinning up and spinning down.

To help with WU expiration, I'll try to incorporate a 'dump' when a VM eviction occurs.
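One way to catch the 30-second eviction notice from inside the VM is Azure's Scheduled Events endpoint in the Instance Metadata Service, which reports Spot evictions as `Preempt` events. This is a minimal polling sketch, not a tested production watcher: the `api-version` value is one published version (check the current Azure docs), and `on_preempt` is a hypothetical hook where the 'dump' logic would go.

```python
import json
import time
import urllib.request

# Azure IMDS Scheduled Events endpoint; only reachable from inside the VM.
# The api-version is an assumption; use a version listed in the current docs.
SCHEDULED_EVENTS_URL = (
    "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"
)


def fetch_scheduled_events(timeout: float = 5.0) -> dict:
    """Query IMDS; it requires the 'Metadata: true' request header."""
    req = urllib.request.Request(SCHEDULED_EVENTS_URL, headers={"Metadata": "true"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def preempt_pending(events_doc: dict) -> bool:
    """True if any scheduled event is a Spot eviction ('Preempt')."""
    return any(e.get("EventType") == "Preempt" for e in events_doc.get("Events", []))


def watch_for_eviction(on_preempt, poll_seconds: float = 5.0) -> None:
    """Poll until a Preempt event appears, then call the hypothetical hook once.

    A short poll interval matters because the notice can be as little as 30 s.
    """
    while True:
        if preempt_pending(fetch_scheduled_events()):
            on_preempt()
            return
        time.sleep(poll_seconds)
```

For example, `watch_for_eviction(lambda: print("eviction notice received"))` run as a background service on the VM; whatever the hook does has to finish well within the notice window.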

Persistent storage could be incorporated, but it may not be as cost-effective, especially if that storage needs to transit to a different region wherever the new compute VM is stood up, hence the idea of dumping the WU when evicted.

So far I've been up for days without being kicked out.

@Gunnarre I'm not necessarily trying to compete on price, just to offer a higher level of convenience by automating the rollout of compute VMs. Azure gives out $100s in free credit with monthly subscriptions, so why not make the most effective use of it? With Azure you also need to account for storage costs, bandwidth usage, etc. Companies may also have volunteer or 'community service' initiatives with a set budget, so again, the aim is to make that as effective as possible for those with the means to do so.

I hope this clears things up for everyone. As much as I would love my guide to go 'viral', I wouldn't count on anything I make placing a substantial burden on F@H.

MeeLee wrote:With just having done Azure, Google Cloud, and Amazon AWS compute loads, I would say the best GPU for the job is a GPU compute.
It's different to AI (AI usually is 8 bit), gaming (gaming is burst, but not continuous loads, and don't include OpenCL usually), Machine learning, usually uses 16 bit shaders.
Compute is what you're doing when you're folding (on either CPU and GPU).

Awesome! That's what I needed to know.

Post by **bruce** » Thu Oct 08, 2020 11:03 pm

MeeLee wrote:With just having done Azure, Google Cloud, and Amazon AWS compute loads, I would say the best GPU for the job is a GPU compute.
It's different to AI (AI usually is 8 bit), gaming (gaming is burst, but not continuous loads, and don't include OpenCL usually), Machine learning, usually uses 16 bit shaders.

This is almost correct. The gaming code does a lot of 8- and 16-bit ops because they run faster on the narrower floating-point instructions, where the generated images need speed but not high precision. The tensor cores used by AI and DLSS use FP8 and FP16, depending on the generation of the hardware and the needs of the software.

The FAHCores use mixed precision, which is mostly FP32 but also needs a limited amount of FP64 code.
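A small illustration of why a limited amount of FP64 matters even when most of the math is FP32: adding many tiny increments to a running total. This is a generic numerical sketch using Python's `struct` module to round values to single precision; it is not how the FAHCores are implemented, and the step size and count are arbitrary.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (FP64) to the nearest IEEE-754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(step: float, n: int, step_fp32: bool, acc_fp32: bool) -> float:
    """Add `step` to a total starting at 1.0, `n` times, at the chosen precisions."""
    s = to_fp32(step) if step_fp32 else step
    total = 1.0
    for _ in range(n):
        total += s
        if acc_fp32:
            total = to_fp32(total)  # keep the running total in FP32
    return total


N, STEP = 100_000, 1e-8
fp32_only = accumulate(STEP, N, step_fp32=True, acc_fp32=True)
mixed = accumulate(STEP, N, step_fp32=True, acc_fp32=False)
fp64_only = accumulate(STEP, N, step_fp32=False, acc_fp32=False)
print(f"FP32 only: {fp32_only:.9f}")
print(f"Mixed:     {mixed:.9f}")
print(f"FP64 only: {fp64_only:.9f}")
```

In the FP32-only case each 1e-8 increment is below half the FP32 spacing near 1.0, so the total never moves off 1.0; keeping just the accumulator in FP64 recovers the expected result of about 1.001.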

Post by **Knish** » Tue Oct 20, 2020 9:22 pm

homeshark, I hope you see this, but I have an update/correction to make. The recent price for the P100 (NC6s_v2) has risen above 29 cents/hr, yet the V100 (NC6s_v3) has fallen to 30 cents/hr. I am seeing 4+ and even 5+ million PPD on the V100, which is about 3x the PPD of the P100. At least for the time being, at $0.30/hr the free credits will last just under 27 days, which still fits the 30-day trial limit that Azure offers.
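Comparing options by points per dollar makes this trade-off concrete. This sketch uses only figures from this thread (about 1.6M PPD for the P100 from the earlier post, 4-5M PPD for the V100, and the ~$0.29 and $0.30/hr Spot prices) and counts the VM compute charge only; `points_per_dollar` and `days_of_credit` are hypothetical helper names.

```python
def points_per_dollar(ppd: float, usd_per_hour: float) -> float:
    """Points earned per US dollar of VM time (compute charge only)."""
    return ppd / (usd_per_hour * 24.0)


def days_of_credit(credit_usd: float, usd_per_hour: float) -> float:
    """Days a given credit lasts at a fixed hourly rate (ignores disk/bandwidth)."""
    return credit_usd / (usd_per_hour * 24.0)


# (label, PPD, $/hr) taken from posts in this thread.
OPTIONS = [
    ("NC6s_v2 / P100", 1.6e6, 0.29),
    ("NC6s_v3 / V100 (4M PPD)", 4.0e6, 0.30),
    ("NC6s_v3 / V100 (5M PPD)", 5.0e6, 0.30),
]

for label, ppd, rate in OPTIONS:
    print(f"{label}: {points_per_dollar(ppd, rate):,.0f} points per dollar")
```

At these prices the V100 earns roughly 2.4 to 3 times as many points per dollar as the P100. `days_of_credit` takes your own subscription's credit balance; storage, IP, and bandwidth charges would shorten the result.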

Folding Forum
