GCC compiler flags and version


Dr. Merkwürdigliebe
Posts: 32
Joined: Tue Nov 08, 2016 7:52 pm
Hardware configuration: Xeon 1230v3 + Geforce RTX 2080
Location: Germany

GCC compiler flags and version

Post by Dr. Merkwürdigliebe »

Hi,

I have a question about the compiler flags and the compiler version being used.
  • Are there any plans to upgrade the compiler? The version used is GCC 4.8.5, released in 2015. I figure that newer versions bring improvements, especially in their support for AVX/AVX2, e.g. auto-vectorization.
  • I'm surprised that the log says "avx_256" while only the "-msse2" compiler flag is present. Why not -mavx?
08:10:57:WU00:FS00:0xa7:************************************ Build *************************************
08:10:57:WU00:FS00:0xa7: Version: 0.0.11
08:10:57:WU00:FS00:0xa7: Date: Sep 20 2016
08:10:57:WU00:FS00:0xa7: Time: 06:40:11
08:10:57:WU00:FS00:0xa7: Repository: Git
08:10:57:WU00:FS00:0xa7: Revision: 957bd90e68d95ddcf1594dc15ff6c64cc4555146
08:10:57:WU00:FS00:0xa7: Branch: master
08:10:57:WU00:FS00:0xa7: Compiler: GNU 4.8.5
08:10:57:WU00:FS00:0xa7: Options: -std=gnu++98 -O3 -funroll-loops -ffast-math -mfpmath=sse
08:10:57:WU00:FS00:0xa7: -fno-unsafe-math-optimizations -msse2

08:10:57:WU00:FS00:0xa7: Platform: linux2 4.6.0-1-amd64
08:10:57:WU00:FS00:0xa7: Bits: 64
08:10:57:WU00:FS00:0xa7: Mode: Release
08:10:57:WU00:FS00:0xa7: SIMD: avx_256
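To compare the two fields in a Build block like the one above, here is a minimal sketch that pulls each "Key: value" pair out of the log. It assumes the layout shown (timestamp, WU and FS slot, 0xa7 core tag, then the payload) and treats a line without a key as a continuation of the previous value, as with the wrapped Options line. The function name parse_build_block is hypothetical, not part of any FAH tool.

```python
import re

# Matches "HH:MM:SS:WUxx:FSxx:0xNN:<payload>" as in the log above (assumed layout).
LINE_RE = re.compile(r"^\d{2}:\d{2}:\d{2}:WU\d+:FS\d+:0x[0-9a-f]+:(.*)$")


def parse_build_block(lines):
    """Return {key: value} for the fields of a FAHCore Build block."""
    fields = {}
    last_key = None
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # blank lines or text that is not a core log line
        payload = m.group(1).strip()
        if payload.startswith("*"):
            continue  # the "**** Build ****" banner
        key, sep, value = payload.partition(": ")
        if sep and " " not in key:
            last_key = key
            fields[key] = value.strip()
        elif last_key is not None:
            # Wrapped line: append it to the previous field (e.g. Options).
            fields[last_key] += " " + payload
    return fields
```

Splitting info["Options"] on whitespace and checking for "-mavx" shows that the Options line and the SIMD line are reported independently of each other.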
ChristianVirtual
Posts: 1596
Joined: Tue May 28, 2013 12:14 pm
Location: Tokyo

Re: GCC compiler flags and version

Post by ChristianVirtual »

Going a version higher breaks CentOS (not sure about Red Hat and Fedora), and I'm not sure how easy an update for those OSes will be. I tried a newer version of GROMACS for something else and ran into GLIBCXX errors that I have not yet seen a fix for (not even in CentOS 7.4). It would scare me if a7 couldn't run on stock Linux anymore. Maybe static linking would help (if the license allows).
Please contribute your logs to http://ppd.fahmm.net
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: GCC compiler flags and version

Post by bruce »

Autovectorization doesn't benefit FAH. The GROMACS code used for AVX is extremely efficient at vectorization. Even the old versions of GROMACS that were originally written for SSE were able to push enough vectors through the SSE registers to exceed what Pentium CPUs were able to achieve with tightly written "benchmark" code (and the GROMACS code has improved since then).

As a general rule, FAH uses the oldest compiler (and other supporting software) that contains the features that FAH needs. This way FAH does not force everyone to update to the latest libraries, etc, etc.

Given that some Donors still run old but stable OSs while other Donors run the latest OS, it's always easier to support the back-rev of a library on an updated system than to support a future-rev of the library that wasn't available when the older stable system was released. That way, FAH supports as many donor systems as possible.

BTW, there are two versions of FAHCore_a7. One runs on systems with AVX and one runs on systems without AVX, using SSE. I'm not sure which set of flags you're looking at. FAH needs to support both classes of CPU.
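As an illustration of choosing between the two builds, here is a minimal sketch assuming Linux, where the kernel lists CPU features on the "flags" line of /proc/cpuinfo. The function names and the variant labels "avx" and "sse" are hypothetical; this is not how the FAH client actually selects a core.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no flags line (e.g. non-x86 kernels use other field names)


def pick_a7_build(flags):
    """Hypothetical choice between the AVX build and the SSE build of FAHCore_a7."""
    if "avx" in flags:
        return "avx"
    if "sse2" in flags:
        return "sse"
    raise RuntimeError("CPU lacks SSE2; neither build applies")
```

On a Linux machine, pick_a7_build(cpu_flags(open("/proc/cpuinfo").read())) returns the label for that CPU. Both builds must still be shipped, because the choice is made per machine at install or run time.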
Dr. Merkwürdigliebe

Re: GCC compiler flags and version

Post by Dr. Merkwürdigliebe »

bruce wrote:Autovectorization doesn't benefit FAH. The GROMACS code used for AVX is extremely efficient at vectorization. Even the old versions of GROMACS that were originally written for SSE were able to push enough vectors through the SSE registers to exceed what Pentium CPUs were able to achieve with tightly written "benchmark" code (and the GROMACS code has improved since then).

As a general rule, FAH uses the oldest compiler (and other supporting software) that contains the features that FAH needs. This way FAH does not force everyone to update to the latest libraries, etc, etc.

[...]
OK, understood. It's practically the same problem over at Rosetta@home. Every Pentium 4 machine needs to be supported...
bruce wrote: BTW, there are two versions of FAHCore_a7. One runs on systems with AVX and one runs on systems without AVX, using SSE. I'm not sure which set of flags you're looking at. FAH needs to support both classes of CPU.
I was expecting to see the AVX-capable cores to be compiled using the "-mavx" compiler flag.

From the gcc manual page:
These switches enable or disable the use of instructions in the MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, AVX, AES, PCLMUL, SSE4A, FMA4, XOP, LWP, ABM or 3DNow! extended instruction sets. These extensions are also available as built-in functions: see X86 Built-in Functions, for details of the functions enabled and disabled by these switches.
To have SSE/SSE2 instructions generated automatically from floating-point code (as opposed to 387 instructions), see -mfpmath=sse.

GCC depresses SSEx instructions when -mavx is used. Instead, it generates new AVX instructions or AVX equivalence for all SSEx instructions when needed.
bruce

Re: GCC compiler flags and version

Post by bruce »

GCC depresses SSEx instructions when -mavx is used. Instead, it generates new AVX instructions or AVX equivalence for all SSEx instructions when needed.
So what happens if your CPU only supports SSE? As I understand it, this is a compile-time option, not a run-time option, so it would break support for older CPUs. There's really no JIT support, so they'd have to create a separate download for every combination.

I'll see if I can find somebody in development that really DOES know the answer.
bruce

Re: GCC compiler flags and version

Post by bruce »

development wrote:The flags that you see in the core's info block are only those used to compile the core wrapper. The Gromacs build used a different set of compiler flags, which included AVX256 support and custom AVX256 assembly for the innermost simulation loops. The Gromacs libraries are then linked with the core wrapper.
It should be noted that hand-written assembly code is MUCH more efficient than anything general-purpose auto-vectorizing code can ever produce, which is what I was saying above. I have not re-read their website recently, but you should be able to find more information about that at gromacs.org. For a protein with N atoms, the code needs to repeatedly multiply an NxN symmetric (triangular) matrix by an N-vector in each of 3 directions. The matrix contains a lot of zeros, which can be eliminated (by not packing them, and then unpacking the results correctly), avoiding an awful lot of unneeded calculations, though it's still a lot of computation.
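A minimal sketch of the two savings described above, under simplifying assumptions: plain (x, y, z) coordinates and a fixed distance cutoff. Visiting only pairs with i < j covers the symmetric half once, and skipping pairs beyond the cutoff drops the entries that would be (near) zero. The name close_pairs and its cutoff parameter are hypothetical; this is not GROMACS code, which uses pair lists and the hand-tuned SIMD kernels mentioned above.

```python
def close_pairs(coords, cutoff):
    """Yield (i, j, dx, dy, dz) for atom pairs with i < j closer than `cutoff`.

    Only the upper triangle (i < j) is visited, so each symmetric pair is
    computed once; pairs farther apart than `cutoff` are skipped entirely.
    """
    cutoff_sq = cutoff * cutoff  # compare squared distances, avoiding sqrt
    n = len(coords)
    for i in range(n):
        xi, yi, zi = coords[i]
        for j in range(i + 1, n):
            xj, yj, zj = coords[j]
            dx, dy, dz = xj - xi, yj - yi, zj - zi  # one component per direction
            if dx * dx + dy * dy + dz * dz < cutoff_sq:
                yield i, j, dx, dy, dz
```

With three atoms on a line at x = 0, 1 and 10 and a cutoff of 2, only the pair (0, 1) survives; the two long-range pairs are never processed further.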

For current CPU projects, 5000 < N < 175000, but the upper limit increases from time to time.
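For a sense of scale at those bounds, the number of distinct atom pairs is N(N-1)/2 before any cutoff is applied. This is a straightforward count, not a measured workload:

```python
def pair_count(n_atoms):
    """Distinct unordered atom pairs (i < j): N * (N - 1) / 2."""
    return n_atoms * (n_atoms - 1) // 2


for n in (5_000, 175_000):
    print(f"N = {n:>7,}: {pair_count(n):>14,} pairs")
```

That gives 12,497,500 pairs for N = 5,000 and 15,312,412,500 for N = 175,000, which is why skipping distant pairs matters so much at the upper end.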
Dr. Merkwürdigliebe

Re: GCC compiler flags and version

Post by Dr. Merkwürdigliebe »

OK,

thanks for the clarification!
Dr. Merkwürdigliebe

Re: GCC compiler flags and version

Post by Dr. Merkwürdigliebe »

One more question concerning the AVX-enabled builds:

Does Core a7 (or a potential new version of it) support AVX-512? Would it make sense to buy a new Xeon W or Core i9, for example?
JimboPalmer
Posts: 2573
Joined: Mon Feb 16, 2009 4:12 am
Location: Greenwood MS USA

Re: GCC compiler flags and version

Post by JimboPalmer »

https://en.wikipedia.org/wiki/AVX-512

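AVX-512 uses the same registers as AVX2, just extended by another 256 bits (512-bit ZMM registers), and it doubles the number of vector registers from 16 to 32 in 64-bit mode. AVX uses the same registers as SSE, just extended by 128 bits (from 128-bit XMM to 256-bit YMM); AVX2 adds more instructions on those same 256-bit registers.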

The vast majority of floating-point operations done by F@H are single precision, but some steps require double precision, which AVX2 supports.

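The original SSE only supports single precision; packed double precision arrived with SSE2 (the -msse2 flag), at two doubles per 128-bit register versus four per 256-bit AVX register. So there is a good chance that the AVX2 code is noticeably faster than the SSE code for the steps that require double precision. There is no hint that F@H would benefit from more precision.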
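The register widths translate directly into how many values a single instruction can process. A small table, assuming the usual widths (128-bit XMM for SSE, 256-bit YMM for AVX/AVX2, 512-bit ZMM for AVX-512) and IEEE element sizes (32-bit float, 64-bit double):

```python
REGISTER_BITS = {"SSE (xmm)": 128, "AVX/AVX2 (ymm)": 256, "AVX-512 (zmm)": 512}
ELEMENT_BITS = {"float32": 32, "float64": 64}


def lanes(register_bits, element_bits):
    """How many elements of `element_bits` fit in one vector register."""
    return register_bits // element_bits


for name, reg in REGISTER_BITS.items():
    row = ", ".join(f"{lanes(reg, bits)} x {ty}" for ty, bits in ELEMENT_BITS.items())
    print(f"{name:<15} {row}")
```

Each doubling of register width doubles the lanes for both precisions; it does not change the precision of any individual value.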

Since AVX-512 also offers more registers than AVX2, you should get a speedup from fewer memory accesses, if F@H benefits from more registers (which it should).

Step 1) GROMACS programmers decide to pursue the speedups they see in the AVX-512 architecture (more registers, more interesting filters, etc.) and build a beta/working/stable version of GROMACS.
Step 2) Once it is stable, Cauldron programmers need to evaluate whether F@H operations use the enhanced features of GROMACS. (They don't always.)
Step 3) If so, we would either get an enhanced a7 that automatically detects which code base to use (SSE, AVX2, or AVX-512), or an all-new core (a9?) that only runs on Intel CPUs from 2017 onward with an OS that supports AVX-512. (Cauldron had issues where machines whose CPUs supported AVX2, but whose OS did not, were sent down the AVX2 code branch rather than the SSE code branch. Hopefully we do not repeat that.)
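The OS-support pitfall in Step 3 arises because a CPU advertising AVX2 or AVX-512 is not enough: the OS must also save and restore the wider register state on context switches. On x86 this is reported by the OSXSAVE bit and the XCR0 register (read with XGETBV). A minimal sketch of a dispatcher that checks both, with the CPU flags, OSXSAVE and XCR0 passed in as plain values, since native code would obtain them from CPUID and XGETBV. The function name and the labels "avx512", "avx2" and "sse" are hypothetical; this is not the FAH client's actual logic.

```python
# XCR0 state-component bits: 1 = SSE (XMM), 2 = AVX (upper YMM halves),
# 5 = AVX-512 opmask, 6 = ZMM_Hi256, 7 = Hi16_ZMM.
XCR0_AVX = (1 << 1) | (1 << 2)                           # 0x06
XCR0_AVX512 = XCR0_AVX | (1 << 5) | (1 << 6) | (1 << 7)  # 0xE6


def pick_code_path(cpu_flags, osxsave, xcr0):
    """Return a (hypothetical) code-path label for this CPU + OS combination.

    cpu_flags: set of CPUID feature names, e.g. {"sse2", "avx", "avx2", "avx512f"}
    osxsave:   True if the OS has enabled XSAVE (CPUID leaf 1, ECX.OSXSAVE)
    xcr0:      XCR0 value from XGETBV; only meaningful when osxsave is True
    """
    def os_enabled(mask):
        # The OS must have enabled every state component in `mask`.
        return osxsave and (xcr0 & mask) == mask

    if "avx512f" in cpu_flags and os_enabled(XCR0_AVX512):
        return "avx512"
    if {"avx", "avx2"} <= cpu_flags and os_enabled(XCR0_AVX):
        return "avx2"
    return "sse"  # baseline: SSE2 is part of every x86-64 CPU
```

A CPU that reports avx2 while the OS leaves the YMM bit of XCR0 clear falls back to "sse" here, which is the case Step 3 describes going wrong.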

F@H never promises ahead of the facts, nor posts future schedules, so no pre-announcements can be expected.
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
bruce

Re: GCC compiler flags and version

Post by bruce »

Again, I have not inspected the GROMACS code, so this is a SWAG.

Suppose the protein has N atoms at x1,y1,z1 and x2,y2,z2 and x3,y3,z3 and .... This requires N*N/2 inter-atomic vectors, or 3*N*N/2 single-precision computations. These operations can be packed into registers that hold 2, 4, 8 or 16 values per register (depending on the length of the register), unless the atoms happen to be too far from each other to be worth calculating. This sort of packing/unpacking minimizes the number of registers used, speeding up the calculations in hand-coded assembly language. Unfortunately, the logic needs to be adjusted (reprogrammed) based on the number of single-precision values that fit in each register, and the last operation probably won't fill its register. This code change is done at COMPILE time, not RUN TIME, hence there needs to be a separate version of the code for each register length.
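Following that SWAG, a back-of-the-envelope sketch of how the 3·N(N-1)/2 single-precision components (roughly 3*N*N/2) pack into registers of different widths, and how full the last register ends up. It counts every pair with no cutoff and assumes 32-bit floats; the function name is hypothetical and real kernels are organized quite differently. In compiled SIMD code the lane count is fixed when the kernel is built; here it is just a parameter so the widths can be compared.

```python
def pack_into_registers(n_atoms, lanes):
    """Return (registers_needed, lanes_used_in_last_register).

    Packs the x/y/z components of all N(N-1)/2 atom pairs into registers
    holding `lanes` single-precision values each (4 = SSE, 8 = AVX, 16 = AVX-512).
    """
    values = 3 * (n_atoms * (n_atoms - 1) // 2)
    registers, rest = divmod(values, lanes)
    if rest:
        return registers + 1, rest  # last register is only partly filled
    return registers, (lanes if registers else 0)


for lanes in (4, 8, 16):
    regs, last = pack_into_registers(8, lanes)
    print(f"{lanes:>2} lanes: {regs} registers, last one uses {last}/{lanes}")
```

For N = 8 there are 28 pairs and 84 values: that is 21 full SSE registers, while AVX and AVX-512 each leave a last register with only 4 lanes in use.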