GROMACS 2019?

Post by **foldy** » Sun Dec 22, 2019 9:21 am

I read that GROMACS 2019.4 has optimizations for AMD Zen 2 CPUs, can use AVX2, and can offload PME tasks to an Intel iGPU.
http://manual.gromacs.org/documentation ... mance.html

Would this be useful to FAH as the basis for a new CPU FAHCore?

Post by **toTOW** » Sun Dec 22, 2019 2:25 pm

It's been a while since GROMACS gained the ability to use the GPU, but it has always been quite inefficient ... way less efficient than having two separate cores for CPU and GPU. Some BOINC projects tried this approach, and the GPU load was ridiculously low in this situation, because too much time was lost on the CPU trying to keep the GPU part busy enough.

The current release (used in the A7 core) is already capable of using more advanced instructions than AVX. If you look closely at science.log on a compatible CPU, you'll see a message that AVX is used but that the CPU could do better, with AVX2 for instance. However, to maximize compatibility and to avoid compiling multiple versions of the same core, the developer forces AVX instructions at the compilation step.
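As a rough illustration of that kind of check (this is not the FAHCore or GROMACS code; the helper names are hypothetical and the flag names follow Linux /proc/cpuinfo), a core built for one SIMD level could compare it with what the CPU reports:

```python
# Hypothetical sketch: compare the SIMD level a core was built for with the
# strongest level found in a /proc/cpuinfo-style flags string.
SIMD_ORDER = ["none", "sse2", "sse4_1", "avx", "avx2"]  # weakest to strongest


def best_simd(cpu_flags: str) -> str:
    """Return the strongest SIMD level listed in the flags string."""
    present = set(cpu_flags.lower().split())
    for level in reversed(SIMD_ORDER):
        if level in present:
            return level
    return "none"


def simd_note(compiled_for: str, cpu_flags: str) -> str:
    """Describe how the build's SIMD level relates to the CPU's best level."""
    best = best_simd(cpu_flags)
    gap = SIMD_ORDER.index(best) - SIMD_ORDER.index(compiled_for)
    if gap > 0:
        return f"core uses {compiled_for}, but this CPU could do better with {best}"
    if gap == 0:
        return f"core uses {compiled_for}, the best this CPU supports"
    return f"core needs {compiled_for}, which this CPU lacks"


print(simd_note("avx", "fpu sse2 sse4_1 avx avx2 fma"))
```

A single AVX build trades the last step of that ladder for one binary that runs on every AVX-capable machine.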

Post by **foldy** » Mon Dec 23, 2019 11:03 am

toTOW wrote: ... way less efficient than having two separate cores for CPU and GPU.

That is true, but we do not have a separate GPU core for the Intel iGPU. So the Intel iGPU could be used as an accelerator for the FAH CPU core.

Post by **foldy** » Mon Dec 23, 2019 11:04 am

JimboPalmer said in PM:

"Implemented update groups": This sounds like you can use fewer CPUs on a task, but F@H does not usually have more than one slot to utilize those unused CPUs. This would work best if we could create slots on the fly.

"PME long-ranged interaction GPU offload now available with OpenCL": This would require a slot for BOTH CPU and GPU, a new idea in F@H.

"Intel integrated GPUs are now supported for GPU offload with OpenCL": If the above were implemented, this would be nice.

"Bonded interactions are now supported for CUDA GPU offload": F@H has decided not to support CUDA as it is limited to Nvidia.

"Added code generation support for NVIDIA Turing GPUs": F@H has decided not to support CUDA as it is limited to Nvidia.

So most of these would involve an upheaval of the Slot concept. It would not be a simple recompile. Deciding to support CUDA means software development that not all devices can benefit from. Given that F@H has a one-person development team, dividing his focus is iffy. I bet if Nvidia threw developers at it, it would happen.

Post by **foldy** » Mon Dec 23, 2019 11:15 am

JimboPalmer wrote: So most of these would involve an upheaval of the Slot concept. It would not be a simple recompile.

I think the slot concept can stay as it is. Only the CPU slot needs an option to use the Intel iGPU as an accelerator. For example, I would set a FAH CPU slot option useIntelGPU=true, and the FAH CPU slot would then offload some work to the Intel iGPU.

I don't know whether there are multi-CPU mainboards supporting several CPUs, each with an Intel iGPU. If so, the FAH CPU slot option would be useIntelGPU=1 for the first Intel iGPU, and useIntelGPU=2 for a second FAH CPU slot.
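To make the proposal concrete, here is a minimal sketch of how such an option could be interpreted. useIntelGPU is only the name suggested in this post, not an existing FAH client option, and resolve_igpu is a hypothetical helper:

```python
from typing import Optional


def resolve_igpu(option: str, igpu_count: int) -> Optional[int]:
    """Map a hypothetical useIntelGPU value to a zero-based iGPU index.

    "false", "0" or "" -> None (no offload)
    "true"             -> the first iGPU
    "1", "2", ...      -> that iGPU, counted from one as proposed in the post
    """
    value = option.strip().lower()
    if value in ("false", "0", ""):
        return None
    wanted = 1 if value == "true" else int(value)
    if not 1 <= wanted <= igpu_count:
        raise ValueError(
            f"useIntelGPU={option} but only {igpu_count} iGPU(s) found"
        )
    return wanted - 1


print(resolve_igpu("true", 1))  # first (and only) iGPU
print(resolve_igpu("2", 2))     # second iGPU on a hypothetical dual-CPU board
```

On the common single-CPU machine only the true/false form would ever matter; the numeric form exists purely for the multi-CPU case raised above.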

Post by **Frisa** » Mon Dec 23, 2019 4:47 pm

iGPU offloading seems to be the most promising new feature. Since OpenMM does not utilize the iGPU, the iGPU inside an Intel CPU is going to sit idle anyway, so why not find some way to squeeze performance out of it?
Plus, it would be nice if there were an AVX2 binary.

Post by **bruce** » Mon Dec 23, 2019 5:59 pm

Allowing CPU slots to utilize an Intel GPU would certainly benefit those who run CPU slots. There's a similar concept in OpenMM for GPU slots which gets some assistance checking for errors by the CPU, but one CPU is already allocated to the GPU slot. As foldy suggests, since the Intel GPU isn't used by FAH, I don't see any difficulty allocating it to the CPU slot(s) -- except for FAH's shortage of developmental resources (which may or may not be the deciding factor).

By default, FAH creates a single CPU slot with all the remaining CPUs -- but of course many people choose to change their configuration. In most cases, there may be one CPU slot and one (or zero) Intel iGPU. In that case, there's no need for a useIntelGPU=1,2,etc. option (thereby simplifying development). Letting any CPU slot use any iGPU seems like a no-brainer. Even if that constrains resources on specific systems, it would be on a complex system being configured by a guru, and they'll be deciding what makes sense anyway. Adding iGPU support would always be a benefit.

I have not searched gromacs.org for the status of this feature. Have they added it in GROMACS Vx.x? If it's a simple matter of replacing Vx.x with Vy.y and calling it FAHCore_a8, FAH's overall throughput would increase by a relatively small factor, but if the developmental costs are small, maybe we can convince them to fund the effort.

Is Intel OpenCL required?

Would the CPU slot(s) attempt to steal resources from dGPUs? (I think we'd want to prohibit that.)

Post by **foldy** » Mon Dec 23, 2019 10:23 pm

1) Have they added it in GROMACS Vx.x? => Yes, since GROMACS 2019; the latest patch release is GROMACS 2019.5.
2) If it's a simple matter of replacing Vx.x with Vy.y and calling it FAHCore_a8? => I don't know whether offloading the PME and non-bonded tasks to the Intel iGPU would help FAH simulations; the scientists must answer that. http://manual.gromacs.org/documentation ... -with-gpus

The mdrun command has a -gpu_id parameter.

These are the compile flags needed when building GROMACS:

-DGMX_GPU=on to build using nvcc to run using NVIDIA CUDA GPU acceleration or an OpenCL GPU
-DGMX_USE_OPENCL=on to build with OpenCL support enabled. GMX_GPU must also be set.

http://manual.gromacs.org/documentation ... index.html

It is not possible to configure both CUDA and OpenCL support in the same build of GROMACS, nor to support both Intel and other vendors’ GPUs with OpenCL

As I understand it, if Intel OpenCL is configured in the build, then the build will not run on Nvidia/AMD GPUs, which is what we want.
3) Is Intel OpenCL required? => 64-bit OpenCL 1.2 is required, which the Intel graphics driver supports. http://manual.gromacs.org/documentation ... ility.html
4) Would the CPU slot(s) attempt to steal resources from dGPUs? => The core must be configured, in code or by a runtime flag, to use only Intel iGPUs.

To build with support for Intel integrated GPUs, it is required to add -DGMX_OPENCL_NB_CLUSTER_SIZE=4 to the cmake command line, so that the GPU kernels match the characteristics of the hardware. The Neo driver is recommended.
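Putting the flags quoted in this post together, a configure command for an Intel-iGPU OpenCL build might be assembled roughly as follows. This is only a sketch: the helper name and the source path are hypothetical, and the optional GMX_SIMD setting is not mentioned in this thread and is included as an assumption about how an AVX2 build would be requested.

```python
import shlex
from typing import List, Optional


def gromacs_cmake_args(source_dir: str,
                       intel_igpu: bool = True,
                       simd: Optional[str] = None) -> List[str]:
    """Assemble a cmake command line from the flags quoted in this thread."""
    args = ["cmake", source_dir,
            "-DGMX_GPU=on",          # required whenever OpenCL is enabled
            "-DGMX_USE_OPENCL=on"]   # OpenCL instead of CUDA
    if intel_igpu:
        # Needed so the GPU kernels match Intel integrated GPU hardware.
        args.append("-DGMX_OPENCL_NB_CLUSTER_SIZE=4")
    if simd:
        # Assumption: SIMD level chosen via GMX_SIMD, e.g. "AVX2_256".
        args.append(f"-DGMX_SIMD={simd}")
    return args


# "../gromacs-2019.5" is a placeholder path, not a path from this thread.
print(shlex.join(gromacs_cmake_args("../gromacs-2019.5", simd="AVX2_256")))
```

Because a build configured this way targets Intel GPUs only, it would have to ship as its own binary, separate from any Nvidia/AMD build.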

Intel iGPU with OpenCL 1.2 support is available since Intel 3rd gen CPU Ivy Bridge on Windows. But on Linux it is supported since Intel 5th gen CPU only. If you set minimum version to Intel 4th gen CPU Haswell then these also offer AVX2. So the FAHCore_a8 could provide AVX2 and Intel iGPU support together.

Post by **Frisa** » Tue Dec 24, 2019 4:26 am

foldy wrote: Intel iGPU with OpenCL 1.2 support is available since Intel 3rd gen CPU Ivy Bridge on Windows. But on Linux it is supported since Intel 5th gen CPU only. If you set minimum version to Intel 4th gen CPU Haswell then these also offer AVX2. So the FAHCore_a8 could provide AVX2 and Intel iGPU support together.

It's more a driver problem than a hardware problem. The NEO driver only supports Broadwell or later iGPUs; to run OpenCL on an Ivy Bridge or Haswell iGPU you need to install the old Beignet driver. Only OpenCL 1.2 is supported there, but that's enough.
https://github.com/intel/beignet
https://stackoverflow.com/questions/573 ... e-and-inte

Post by **foldy** » Tue Dec 24, 2019 11:03 am

Since Broadwell, Intel now also has the Intel(R) Graphics Compute Runtime for OpenCL on Linux:
https://github.com/intel/compute-runtime

Post by **bruce** » Tue Dec 24, 2019 4:29 pm

Good.

In the past, some have reported difficulties with NV/AMD GPUs when Intel's OpenCL was installed. I don't know if that's still true, nor do I know if more than one runtime can coexist without conflict. The MS Registry does allow multiple links.

Post by **foldy** » Tue Dec 24, 2019 5:40 pm

In theory there is only one OpenCL interface and several OpenCL runtimes for Intel/Nvidia/AMD. If I load OpenCL device 0, 1 or 2 on a system with several GPUs through the common OpenCL interface, each device loads its own OpenCL runtime through its graphics driver. But on Windows 10, FAH sometimes gets confused when the registry also has Intel OpenCL enabled. Nvidia says the fix is to just remove the Windows registry key. For FAH users it was easier to just uninstall the Intel iGPU drivers. So if FAH uses Intel iGPU acceleration for the CPU slot, the Intel iGPU driver needs to stay installed, but the Windows registry key needs to be removed.
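A minimal sketch of the device selection this implies, so that a CPU slot would only ever pick Intel iGPUs and never a dGPU. The device records here are made-up example data; a real core would obtain them from the OpenCL runtime (clGetPlatformIDs / clGetDeviceIDs), and the class and function names are hypothetical:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClDevice:
    index: int    # position in the enumeration order
    vendor: str   # vendor string as reported by the runtime
    name: str
    is_gpu: bool  # False for CPU-type OpenCL devices


def intel_igpu_indices(devices: List[ClDevice]) -> List[int]:
    """Keep only GPU-type devices whose vendor string mentions Intel."""
    return [d.index for d in devices
            if d.is_gpu and "intel" in d.vendor.lower()]


# Example data only; names and ordering are assumptions.
devices = [
    ClDevice(0, "NVIDIA Corporation", "example dGPU", True),
    ClDevice(1, "Intel(R) Corporation", "example iGPU", True),
    ClDevice(2, "Intel(R) Corporation", "example CPU device", False),
]
print(intel_igpu_indices(devices))
```

Filtering by vendor and device type, not by a fixed device number, avoids depending on the enumeration order, which can differ between systems.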

Scientists, or whoever can, need to benchmark whether AVX2 and/or the Intel iGPU is useful for FAH.

Post by **bruce** » Tue Dec 24, 2019 8:52 pm

foldy wrote: Scientists, or whoever can, need to benchmark whether AVX2 and/or the Intel iGPU is useful for FAH.

Development won't enable a technology unless there's a throughput benefit to FAH that's worth the development costs.

It's my understanding that the AVX registers are independent hardware from the iGPU shaders so they can work in parallel. That means we can probably assume that performance of a CPU-only slot will benefit from the added parallelism available by also using an iGPU, though it may not be a huge difference.

Post by **MeeLee** » Wed Dec 25, 2019 8:48 am

Judging from BOINC projects whose CPU/GPU load is similar to FAH's, the CPU load for pre-Iris Intel IGPs is very low (below 10% GPU), so they don't really need a dedicated core for folding.
A core with hyperthreading is good enough to be shared between the Intel IGP and a dGPU, and for CPU folding there's really no need to set aside a dedicated core, as these iGPUs use very few CPU resources.

I don't know about Intel's Iris iGPUs, but if I were to estimate based on shader count, a 12/24-shader 8th-9th gen iGPU uses up <10-15% of a CPU thread, and their best 64-shader iGPUs will probably use around 40% of a CPU thread.
This can easily be shared between cores when CPU folding.

The only con I see is if the Intel iGPU starts stealing CPU time from other dGPUs, in a scenario where a dedicated graphics card would lose more PPD than the iGPU would gain.

nVidia already locks CPUs to each GPU, so in the case of nVidia I don't think there will be any issue, as long as the CPU has enough speed to feed the Nvidia GPUs and there are enough threads available on the CPU.

Post by **MeeLee** » Wed Dec 25, 2019 9:26 am

bruce wrote: Good.

In the past, some have reported difficulties with NV/AMD GPUs when Intel's OpenCL was installed. I don't know if that's still true, nor do I know if more than one runtime can coexist without conflict. The MS Registry does allow multiple links.

On Linux, if BOINC is any indication (it also uses OpenCL), this is possible on post-7th-gen CPUs; oddly, it works on the Intel Atom N2000 series, but not on the N3000 series.
In some cases, the system first needs to be properly set up to work with its dedicated GPUs, with the Intel IGP, which goes through the motherboard HDMI connector, disabled in the BIOS.

Once the system is running fine, the Intel IGP can be enabled, its drivers and Intel OpenCL installed, and, at least for BOINC, both Intel and Nvidia GPUs are then recognized and can be used.
However, there are scenarios in which (the first time at least) a monitor or dummy plug needs to be connected to the Intel HDMI port for BOINC to recognize the IGP.

I would presume that FAH would respond very similarly to BOINC in this respect.

----

As far as Intel iGPU performance goes, it may have been good 5 years ago.
The average iGPU numbers, based on their flops rating, are pretty low compared to similar CPUs or GPUs.

On an Intel Celeron 4900, the CPU does 100 Gflops, and its 12-EU Intel iGPU (using 10-15 W) does an additional 200 Gflops at full precision.
The Celeron runs at 3.1 GHz, and the IGP runs at a 1050 MHz sustained boost frequency with a stock Intel cooler.
That means the CPU and IGP in tandem will be running around 15-20k PPD at best (below 10k PPD for the CPU only).
Not really worth investing research in, I'd say, though the nice thing is that the IGP runs at double the speed of the CPU while using half the power of the CPU.

An Intel 10th gen Iris GPU with 64 shaders (rated at 1 Tflops FPP) would most likely be able to get about 60-75 kPPD, were it to work on FAH, which is in line with a 4C/8T CPU running at ~4 GHz.

With the time bonus FAH offers for quick returns, if you compare these numbers with any modern GPU running in excess of 1M PPD (8 Tflops), you'll need A LOT of them to get the same PPD score (I would estimate around 100), but only a good 10 of the best Intel iGPUs to complete the same amount of work in the same amount of time, or to get a similar PPD rating without the QRB.
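For anyone who wants to redo the comparison, it reduces to plain division. The sketch below uses only the estimates given in this post; it treats PPD and flops as simply additive across devices, which is an assumption, so the results are rough counts and not a correction of the estimates above.

```python
import math


def units_needed(target: float, per_unit: float) -> int:
    """How many devices of a given rating it takes to reach the target."""
    return math.ceil(target / per_unit)


# Figures taken from the post above (all of them estimates).
GPU_PPD = 1_000_000                   # modern dGPU, in excess of 1M PPD
GPU_TFLOPS = 8.0
IRIS_PPD_RANGE = (60_000, 75_000)     # 64-shader 10th gen Iris iGPU
IRIS_TFLOPS = 1.0
CELERON_PPD_RANGE = (15_000, 20_000)  # Celeron CPU and IGP in tandem

# By raw flops (no bonus involved): Iris iGPUs per modern dGPU.
by_flops = units_needed(GPU_TFLOPS, IRIS_TFLOPS)

# By PPD (bonus included in the per-device estimates).
iris_by_ppd = [units_needed(GPU_PPD, ppd) for ppd in IRIS_PPD_RANGE]
celeron_by_ppd = [units_needed(GPU_PPD, ppd) for ppd in CELERON_PPD_RANGE]

print("Iris iGPUs by flops:", by_flops)
print("Iris iGPUs by PPD (low, high estimate):", iris_by_ppd)
print("Celeron-class systems by PPD (low, high estimate):", celeron_by_ppd)
```

The gap between the flops-based count and the PPD-based counts is the effect of the quick return bonus on slower devices.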
