A few clarifications on the A7 GROMACS core please

Moderators: Site Moderators, FAHC Science Team

Breach
Posts: 205
Joined: Sat Mar 09, 2013 8:07 pm
Location: Brussels, Belgium

Post by Breach »

I incidentally checked the science.log today, and saw the following:

Code:

Compiled SIMD instructions: AVX_256 (Gromacs could use AVX2_256 on this machine, which is better)

The current CPU can measure timings more accurately than the code in
GROMACS was configured to use. This might affect your simulation
speed as accurate timings are needed for load-balancing.
Please consider rebuilding GROMACS with the GMX_USE_RDTSCP=OFF CMake option.


NOTE: The number of threads is not equal to the number of (logical) cores
      and the -pin option is set to auto: will not pin thread to cores.
      This can lead to significant performance degradation.
      Consider using -pin on (and -pinoffset in case you run multiple jobs).

1. On not using AVX2 instructions, I see this has been discussed here already viewtopic.php?f=105&t=31382
2. On the timing measurements, I guess it's compiled with GMX_USE_RDTSCP=OFF to cater for CPUs which don't support the RDTSCP instruction, correct? Pity that it doesn't support runtime auto-detection.

But at least according to this: https://github.com/gromacs/gromacs/blob ... eLists.txt there is a correlation between AVX and RDTSCP support, so for AVX compiles, RDTSCP support could probably be on...?

3. For the third one, I'm running the core on 6 out of 8 threads (as the other two are running my GPU slot). That's already passed to mdrun as -nt 6. But GROMACS complains that "This can lead to significant performance degradation.", apparently due to "OS switching threads across physical cores, which may result in performance loss.". So, I guess my question is why the client is not using -pin on and -pinoffset?
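To make the -pin on / -pinoffset question concrete, here is a minimal Python sketch of what pinning with an offset amounts to. It assumes a deliberately simplified model in which thread i lands on logical CPU offset + i * stride. That is an illustration only, not GROMACS's actual placement logic, and the function names are hypothetical. The affinity call is Linux-only, so it is guarded.

```python
import os


def pinned_cpus(num_threads, pin_offset=0, pin_stride=1, logical_cpus=None):
    """Return the logical CPU index each thread would be pinned to.

    Simplified model (an assumption, not GROMACS's real layout code):
    thread i goes to pin_offset + i * pin_stride.
    """
    if logical_cpus is None:
        logical_cpus = os.cpu_count() or 1
    if pin_stride < 1:
        raise ValueError("pin_stride must be >= 1 in this simplified model")
    cpus = [pin_offset + i * pin_stride for i in range(num_threads)]
    if cpus and cpus[-1] >= logical_cpus:
        raise ValueError("thread layout does not fit on this machine")
    return cpus


def pin_current_process(cpus):
    """Restrict the current process to `cpus` where the OS supports it."""
    if hasattr(os, "sched_setaffinity"):  # available on Linux only
        os.sched_setaffinity(0, set(cpus))
        return True
    return False


# 6 CPU threads on an 8-thread machine, leaving logical CPUs 6 and 7 free
print(pinned_cpus(6, pin_offset=0, pin_stride=1, logical_cpus=8))
```

Under this model a second job on the same machine would use a non-zero offset, for example pinned_cpus(2, pin_offset=6, logical_cpus=8), so the two jobs never share a logical CPU.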

I know almost nothing about GROMACS, so please excuse my ignorance in advance ;-) Just curious. Thanks.
Last edited by Breach on Sun Apr 19, 2020 6:37 pm, edited 1 time in total.
Windows 11 x64 / 5800X@5Ghz / 32GB DDR4 3800 CL14 / 4090 FE / Creative Titanium HD / Sennheiser 650 / PSU Corsair AX1200i
JimboPalmer
Posts: 2573
Joined: Mon Feb 16, 2009 4:12 am
Location: Greenwood MS USA

Re: A few clarifications on the A7 GROMACS core please

Post by JimboPalmer »

Breach wrote:1. On not using AVX2 instructions, I see this has been discussed here already viewtopic.php?f=105&t=31382
2. On the timing measurements, I guess it's compiled with GMX_USE_RDTSCP=OFF to cater for CPUs which don't support the RDTSCP instruction, correct? Pity that it doesn't support runtime auto-detection.

But at least according to this: https://github.com/gromacs/gromacs/blob ... eLists.txt there is a correlation between AVX and RDTSCP support, so for AVX compiles, RDTSCP support could probably be on...?
They initially used RDTSCP in Core_a7 and discovered that it failed for several folders running the core in a virtual machine. While the underlying hardware can support RDTSCP, not all virtual machines expose it.

https://en.wikipedia.org/wiki/Time_Stamp_Counter
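
As an illustration of the VM point, a runtime check has to look at the CPU features the guest actually sees rather than what the host hardware supports. The sketch below assumes an x86 Linux system, where /proc/cpuinfo lists features on a "flags" line. The function names are hypothetical and this is not how the FAH client or GROMACS performs detection.

```python
def cpu_flags(cpuinfo_text):
    """Collect the CPU feature flags listed in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only the plain "flags" line is used; other lines are ignored.
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def read_cpu_flags(path="/proc/cpuinfo"):
    """Read flags from the running system; empty set if unavailable."""
    try:
        with open(path, encoding="ascii", errors="replace") as handle:
            return cpu_flags(handle.read())
    except OSError:
        return set()


# Made-up sample resembling a guest that exposes AVX and RDTSCP but not AVX2
sample = "processor\t: 0\nflags\t\t: fpu sse2 avx rdtscp\n"
flags = cpu_flags(sample)
print("rdtscp" in flags, "avx2" in flags)
```

Inside a VM that hides RDTSCP, the same check would report it as missing even on hardware that supports it, which is the failure mode described above.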
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
Joe_H
Site Admin
Posts: 7856
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: A few clarifications on the A7 GROMACS core please

Post by Joe_H »

1. Yes, at the time of development, requiring AVX2 would have excluded most processors then available, or would have required the client testing the hardware to request one of 3, instead of 2, A7 cores. The Gromacs version available and used did not support creating an executable that would select SSE2 or AVX code at runtime.

As I recall, an AVX2 version was tested, but did not give a significant boost in processing as compared to the AVX version running on the same hardware. Newer processors might have more significant speedups for AVX2 than those tested at the time.

Implementing this would require new development and validation of a new folding core. Currently the A7 core meets the scientific needs of the researchers; if they found a need for new features supported in a later Gromacs version, then that might happen.

2. No idea if the "Auto" option was available when the A7 core was in development. I do recall some issues with early releases of the core when people wanted to run folding within a VM; the choice made for this setting may have been intended to avoid problems in that and other setups.

3. Can depend on your processor, OS and other factors. For instance, on my MacBook Pro using the extra HT threads gives little extra throughput as that probably pushes the CPU into some thermal throttling. On a desktop I do see a bit more processing speed using the HT, but that is a different hardware setup.

The "switching threads" bit may be more of an issue in multi-CPU servers where NUMA considerations come into play. There may be reasons the developer did not use those switches; that is beyond my knowledge.

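
Point 1 above describes the client testing the hardware and then requesting one of two A7 builds. A minimal sketch of that kind of selection follows. The build names are hypothetical placeholders, and the logic is an assumption for illustration, not the actual FAH client code. A real AVX check also has to confirm operating system support, which is omitted here.

```python
def choose_core_build(flags, builds=(("avx", "A7-AVX"), ("sse2", "A7-SSE2"))):
    """Pick the first build whose required CPU flag is present.

    `builds` is ordered from most to least capable; the names are
    hypothetical placeholders, not real FAH core identifiers.
    """
    for required_flag, build_name in builds:
        if required_flag in flags:
            return build_name
    return None  # no supported build for this CPU


print(choose_core_build({"sse2", "avx", "avx2"}))
print(choose_core_build({"sse2"}))
```

Supporting AVX2 in this scheme would mean adding a third entry at the front of the list, which corresponds to the "one of 3, instead of 2" case mentioned above, with a third core to build and validate.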
iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Breach
Posts: 205
Joined: Sat Mar 09, 2013 8:07 pm
Location: Brussels, Belgium

Re: A few clarifications on the A7 GROMACS core please

Post by Breach »

Okay, many thanks guys! (For the 'Auto' RDTSCP support, I've edited my post - apparently it's build-time detection only).
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: A few clarifications on the A7 GROMACS core please

Post by bruce »

The information in the "science" log is generally directed at those who create projects, while the information in FAH's public log is generally aimed at you, the Donor. While this isn't a 100% accurate division, we pretty much assume that messages like this can be adjusted by the project owner. Nevertheless, there's nothing wrong with you asking questions like this.

Joe_H has given excellent answers. The only thing I would add is that we are running an "old" version of GROMACS in this FAHCore and they've made a number of improvements since. As Joe_H has suggested, opening new development and validation of a new folding core is an expensive proposition, especially during the COVID crisis. The value of new features supported in a later Gromacs version must be traded off against the value of spending limited resources on other (higher?) priority needs. Personally, I'd like several of the new features but I also know of a lot of other development efforts that would be valuable, too.

As far as the settings go (like setting thread affinities), an individual scientist might be able to make those settings, but then they'd have to look for benefits and drawbacks across the wide variety of systems that the project might be assigned to. Such a study makes sense if you're a scientist running on a specific cluster where NUMA is likely significant, but it's a much bigger study across all donor-owned hardware. Personally, I'd rather have them spend their time devising new biochemical studies/projects that might lead to a cure.

And as far as RDTSCP support is concerned, nobody is worried about whether the time spent on an individual Donor's machine is accurately reported. FAH Scientists are more interested in accurately evaluating the mean time to complete each WU in their project. That's why the QuickTimeBonus was developed.

Bottom line: GROMACS development is aimed at the individual scientist running on a specific computer, and FAH adapted it to run on a crowd-sourced set of home computers. The environments are very different.