ASUS R904 G34

Markus$Cologne
Posts: 2
Joined: Thu Oct 14, 2021 4:40 am

ASUS R904 G34

Post by Markus$Cologne »

Hi,

I obtained the aforementioned server (4 processors, 64 cores in total, 128 GB RAM), which was part of a former supercomputer that consisted of 535 of these machines (287 TFLOPS peak).
The operating system is Ubuntu 20.04.3 LTS with all patches etc. applied.
The F@H software installed is "fahclient_7.6.21_amd64.deb".

I can't send images here, but I can tell you that all processors are well above 80% or 85% load - and this must be true based on the noise of the fans in the machine, as well as the dissipated heat.

What puzzles me is that this machine still takes 4 hours to complete a work unit, with all 64 cores reported "in use" by the FAHControl program...

Are there any hints to speed things up - other than getting a faster machine? My idea was that with 64 cores it would take an hour or so to handle one work unit. Or is this a wrong idea on my part?
My 4-core Desk-PC processes a work-unit in about 5 hours.

Many thanks for any useful hint or advice from Cologne / Germany
JimboPalmer
Posts: 2573
Joined: Mon Feb 16, 2009 4:12 am
Location: Greenwood MS USA

Re: ASUS R904 G34

Post by JimboPalmer »

Welcome to Folding@Home!

F@H sizes the Work Unit based on the number of CPUs devoted to folding, so while both machines may be taking the same amount of time, the one with more CPUs should be getting more Points Per Day, as it is working on more challenging proteins.

The different ages of CPUs have different capabilities, so older CPUs may be slower per CPU.

If you used Windows, there would be tricks needed to use more than 32 CPUs; Linux should be fine.
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
PaulTV
Posts: 179
Joined: Mon Jan 25, 2021 4:53 pm
Location: Netherlands

Re: ASUS R904 G34

Post by PaulTV »

If you happened to get one of the monster CPU jobs, 4 hours isn't bad at all. I've had jobs that took like 36 hours on 14 threads (AMD 5800X) - and the next one may be done within an hour. Jobs for different projects will have different sizes, depending on the number of atoms and the number of steps.

Ryzen 5800X / RTX 4090 / Windows 11
Ryzen 5600X / RTX 3070 Ti / Ubuntu 20.04
Ryzen 5600 / RTX 3060 Ti / Windows 11
jchang6
Posts: 52
Joined: Sat May 09, 2020 2:13 pm
Hardware configuration: Intel Xeon E3/E5, various generations from Westmere to Skylake. AMD Radeon RX5x00 and nVidia RTX 2080 Super.
Location: Boston

Re: ASUS R904 G34

Post by jchang6 »

Are the CPUs Intel or AMD? 4 sockets, 64 cores: is that 32 cores and 64 threads, or 64 physical cores?
If AMD, it could be Interlagos (2011) or Abu Dhabi (2012).
If Intel, it would have to be the Haswell generation of Xeon E7 v3 (2015) or more recent.
The AMD cores of that era (prior to Zen) were weaker; the Intel Haswell should be halfway decent. What was the PPD? FaH does seem to assign big jobs to high core count systems.
Markus$Cologne
Posts: 2
Joined: Thu Oct 14, 2021 4:40 am

Re: ASUS R904 G34

Post by Markus$Cologne »

Hi, thanks for the quick reply and all the detailed information.
There are 4 AMD processors in the machine with 16 cores each. They are of the "Interlagos" type.

You mentioned that FaH seems to have problems with assigning jobs to high core count systems - Ubuntu's system monitor reports 64 processors, all with loads of 80% or higher - and this should be true, since the speed of the fans ramps up considerably as soon as FaH starts up automatically after system boot.

So despite the number of cores, the performance per core seems to be the issue, and my expectations were slightly wrong. I am testing some BIOS settings; perhaps I can squeeze some more performance out of the system. If anyone has a tip for speeding things up - comments are very welcome!

Regards
Neil-B
Posts: 2027
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: ASUS R904 G34

Post by Neil-B »

There are some projects that will really use large thread counts well - others are far less scalable and you will not get optimal throughput/ppd from them ... you need to run a fair few projects to get a feel for what your highs/lows in throughput/ppd are.

Make sure you are monitoring temps/cpu boost speeds - it is perfectly possible to have a situation where you are running all threads/cores at max but the thermals are reducing the clock rates by a significant amount - halving the core/thread count can cool off the system and increase clock speeds, giving little if any drop in throughput/ppd.
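
A rough way to run the clock-speed check described above on Linux is to read the "cpu MHz" lines that x86 kernels expose in /proc/cpuinfo. The sketch below is only an illustration: the function names are made up for this example, it does not read temperatures, and it assumes your kernel reports that field at all.

```python
import statistics


def parse_cpu_mhz(cpuinfo_text):
    """Return the per-logical-CPU clock speeds (MHz) found in /proc/cpuinfo text."""
    speeds = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "cpu MHz":
            speeds.append(float(value))
    return speeds


def summarize(speeds):
    """Return (min, mean, max) of the clock speeds, or None if none were found."""
    if not speeds:
        return None
    return min(speeds), statistics.mean(speeds), max(speeds)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            summary = summarize(parse_cpu_mhz(f.read()))
    except OSError:
        summary = None
    if summary is None:
        print("no 'cpu MHz' lines found (not x86 Linux?)")
    else:
        lo, mean, hi = summary
        print(f"min {lo:.0f} MHz, mean {mean:.0f} MHz, max {hi:.0f} MHz")
```

Run it a few times while folding: if the mean sits well below the CPU's rated clock under full load, throttling is a plausible suspect.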

Server grade kit can tend to be loud ... and you need to make sure it is configured properly or it can be more so ... with Intel kit, checking the FRU/SDR is important, as otherwise the server may not actually know what configuration it is and may not be managing itself properly (including clocks/thermals) - I guess that AMD kit has something similar that needs to be configured for the server to run optimally - it is not just a case of BIOS settings with some servers, as they will have their own management suite as well.
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070
gunnarre
Posts: 567
Joined: Sun May 24, 2020 7:23 pm
Location: Norway

Re: ASUS R904 G34

Post by gunnarre »

I'm wondering if making one CPU slot for each of the processors might be a good idea (4x 16 threads), if inter-CPU communication has to be done through slow RAM, or if the hypervisor is moving threads between CPUs. One 64-thread slot should in theory be better, but with four 16-core CPUs instead of one 64-core Threadripper/Xeon (with shared fast cache), I'm not so sure that running one slot is the optimal configuration.

If indeed this is the problem, the high CPU load might be mainly comprised of actively waiting for RAM/bus to access data from a thread on a different CPU, rather than active processing.

Edit: Or perhaps some other kind of NUMA-related CPU affinity can be done on the OS level.
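
To see how the OS groups the cores before trying any affinity tricks, something like the following could list the CPUs per NUMA node from sysfs. It is a minimal sketch, assuming a Linux kernel that exposes /sys/devices/system/node; the helper names are invented for this example. Note that a 4-socket machine of this generation may well report more than four nodes.

```python
import glob
import os


def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-7,16-23' into a list of CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # empty list, e.g. a memory-only node
        first, _, last = part.partition("-")
        cpus.extend(range(int(first), int(last or first) + 1))
    return cpus


def numa_nodes(sysfs_root="/sys/devices/system/node"):
    """Map NUMA node number -> list of CPU ids (empty dict if sysfs has no nodes)."""
    nodes = {}
    for path in glob.glob(os.path.join(sysfs_root, "node[0-9]*", "cpulist")):
        # Directory is named e.g. 'node3'; strip the 'node' prefix.
        node_id = int(os.path.basename(os.path.dirname(path))[4:])
        with open(path) as f:
            nodes[node_id] = parse_cpulist(f.read())
    return nodes


if __name__ == "__main__":
    for node_id, cpus in sorted(numa_nodes().items()):
        print(f"node {node_id}: {len(cpus)} CPUs -> {cpus}")
```

With that mapping, os.sched_setaffinity(pid, cpus) can pin a running process to one node's CPUs on Linux. Whether the folding cores keep such a pin from one work unit to the next is something you would have to test.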
Online: GTX 1660 Super, GTX 1080, GTX 1050 Ti 4G OC, RX580 + occasional CPU folding in the cold.
Offline: Radeon HD 7770, GTX 960, GTX 950
MeeLee
Posts: 1375
Joined: Tue Feb 19, 2019 10:16 pm

Re: ASUS R904 G34

Post by MeeLee »

gunnarre wrote:I'm wondering if making one CPU slot for each of the processors might be a good idea (4x 16 threads), if inter-CPU communication has to be done through slow RAM, or if the hypervisor is moving threads between CPUs. One 64-thread slot should in theory be better, but with four 16-core CPUs instead of one 64-core Threadripper/Xeon (with shared fast cache), I'm not so sure that running one slot is the optimal configuration.

If indeed this is the problem, the high CPU load might be mainly comprised of actively waiting for RAM/bus to access data from a thread on a different CPU, rather than active processing.

Edit: Or perhaps some other kind of NUMA-related CPU affinity can be done on the OS level.
Exactly what I was going to suggest.
The main problem with assigning 1 WU to all cores is inter-CPU traffic. Data that's written to the cache of a core on one CPU has to travel over the significantly slower inter-socket link (HyperTransport on these Opterons) to be read by a thread on another CPU.
This is extremely inefficient.
Hence the suggestion to allocate 4 CPU slots in the program, each driving its own CPU.
Also, leave about 1 thread of the CPU for background data processing, unless all it does is fold. Even then, 15 threads per CPU or WU are plenty and PPD will not be affected much over 16 threads.
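
For illustration, the 4 x 15 layout suggested above could be written out as slot entries like this. The element names follow the v7 client's config.xml format as I understand it (slot / cpus); treat them as an assumption and compare against your own config.xml before pasting anything. The helper function is purely hypothetical.

```python
def slot_config(n_slots, threads_per_slot):
    """Build config.xml slot entries: n_slots CPU slots of threads_per_slot each.

    Assumption: element and attribute names mirror what a v7 client
    writes to config.xml; verify against your own file.
    """
    lines = []
    for slot_id in range(n_slots):
        lines.append(f"<slot id='{slot_id}' type='CPU'>")
        lines.append(f"  <cpus v='{threads_per_slot}'/>")
        lines.append("</slot>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Four slots, one per physical CPU, leaving one thread per CPU free.
    print(slot_config(n_slots=4, threads_per_slot=15))
```

The same layout can be set up through FAHControl's slot settings instead of editing the file by hand. Either way, the client on its own does not bind each slot to one physical CPU, so the OS may still spread the threads across sockets.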
WhitehawkEQ
Posts: 17
Joined: Sat May 14, 2011 11:50 pm

Re: ASUS R904 G34

Post by WhitehawkEQ »

I have 2 Opteron 6276 systems. I used to run Ubuntu, but I now run Windows 10 Pro for Workstations.

I have these pics over on Overclockers.com, if you can post your pics to a forum, you can then link them here.

[4 images]
gunnarre
Posts: 567
Joined: Sun May 24, 2020 7:23 pm
Location: Norway

Re: ASUS R904 G34

Post by gunnarre »

Have you tried the suggestion to make more CPU slots?
b_comly
Posts: 4
Joined: Thu Mar 19, 2020 2:44 am

Re: ASUS R904 G34

Post by b_comly »

gunnarre wrote: Sun Oct 24, 2021 9:51 am Have you tried the suggestion to make more CPU slots?
I have actually tried this. It isn't sustainable currently in Windows. Multiple CPU slots can both end up on the same NUMA node on a Threadripper. I've also tried manually setting affinity, only to find it back on node0 on the next WU.

Currently this only really works when running in Linux, with each CPU slot configured as threads/4-2 and only 2 slots in use at a time for the fastest result.
JimboPalmer
Posts: 2573
Joined: Mon Feb 16, 2009 4:12 am
Location: Greenwood MS USA

Re: ASUS R904 G34

Post by JimboPalmer »

https://www.amd.com/en/product/1546
https://www.cpu-world.com/CPUs/Bulldoze ... TGGGU.html

This may be your CPU.

I am guessing AVX is the fastest floating point math it knows.
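
Rather than guessing, the instruction sets the kernel reports can be read from the "flags" line of /proc/cpuinfo on x86 Linux. A small sketch follows; the function names are invented for this example, and the list of flags checked is just my pick of the ones relevant here.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from the first 'flags' line, if any."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def simd_support(flags, wanted=("sse2", "sse4_1", "avx", "fma4", "fma", "avx2")):
    """Map each wanted flag name to True/False depending on whether it is present."""
    return {name: name in flags for name in wanted}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            flags = cpu_flags(f.read())
    except OSError:
        flags = set()
    for name, present in simd_support(flags).items():
        print(f"{name:8s} {'yes' if present else 'no'}")
```

On an Interlagos Opteron I would expect avx and fma4 to show up and avx2 not to, which would fit the guess above.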