Thorough overview of HPC domain, including programmable GPUs

Post by **#64-Lux** » Fri Jun 18, 2010 7:47 am

List of documents: http://www.prace-project.eu/documents/p ... erables-1/
There's everything: networking, I/O, "approaches" (CPUs, GPUs, FPGAs...), 15 different programming languages/platforms from higher level down to VHDL, benchmarks, productivity ("time to program", "lines of code"), etc. The documents were written at the end of 2009, so some details may have changed, but overall it's a truly great collection of knowledge. Some quotes to whet the appetite, from "Report on petascale software libraries and programming models":

GPUs are currently the prevalent accelerator although they have not been used very often for production runs. Several languages exist to make use of GPUs. The language on which most people pinned their hopes is probably OpenCL. Testing the performance of the first available OpenCL compilers has revealed that it is currently insufficient. The language specification is not even a year old, so it might not be a surprise that the performance is very poor. More severe is the fact that the design of the language seems to be insufficient as well. The HPC community hoped that OpenCL could be the language that allows running code on different accelerators which would solve the problem of code maintainability and portability across accelerator devices, but it turned out that to program OpenCL many choices depend on the underlying hardware which prevents a seamless use of other devices. The language is very similar to CUDA and makes some implicit assumptions which are only true for graphic cards. So it seems that the language will stay a GPU language for another couple of years. An HPC compiler expert claimed that it is only feasible to use OpenCL as an intermediate language; higher level languages should be used on top of it and ensure at least portability across GPUs. At the current time it is not advisable to use OpenCL for scientific projects.

The second language that can currently be recommended for those interested in accelerators is CUDA. Even if there is a small danger that its vendor NVIDIA at some point might get out of business or lose interest in CUDA, the language is quite similar to OpenCL. CUDA code could be easily ported to OpenCL if necessary. It is clear that using CUDA for scientific codes is somewhat comparable to using assembler code for the most important kernel routines; with each new hardware or software release small changes in the code are usually necessary. Codes that have delivered good performance on older hardware might suffer on newer hardware and vice versa. The code will definitely grow in size and become harder to maintain. That means that everyone should be clearly aware, whether the code will benefit from the use of accelerators or if there are easier modifications; e.g., optimizing for SSE instructions.

Regarding the poor state of the OpenCL compiler(s), there was also another study in the first quarter of 2010 that came to a similar conclusion:

However, when nontrivial device kernels are used, OpenCL begins to trail CUDA performance significantly. This demonstrates the immaturity of the OpenCL compiler and has dramatic consequences for potential application developers.

Other things I found interesting:
*) the GPUs (accelerators) are not compared to a single CPU, but to an "8-core, 2-socket Nehalem-EP node at 2.53 GHz";
*) there are developments (incl. with DARPA backing, which also gave us ReiserFS) towards higher level HPC languages (Chapel). Compilers are crucial, but raw;
*) there are 3D and 6D network topologies (I've yet to imagine the latter);
*) partitioned global address space: locality (NUMA) in network domain and whatnot;
*) transactional memory in hardware would have debuted in Sun's Rock processor, but from Wikipedia it seems that the project was canceled after the Oracle merger.
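
On the 6D topologies from the list above: one way to picture them is to stop thinking geometrically and just count links. The sketch below assumes the networks are tori (rings in every dimension, with wrap-around), which is my assumption and not something I'm quoting from the reports. The function names and the side lengths are made up for illustration.

```python
def torus_neighbors(coord, dims):
    """Direct neighbours of `coord` in a torus with side lengths `dims`.

    Each dimension contributes two links (one step down, one step up),
    so a d-dimensional torus gives every node 2*d neighbours.
    Assumes every side length is at least 3; with length 2 the two
    steps land on the same node.
    """
    neighbors = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % size  # wrap around at the edge
            neighbors.append(tuple(n))
    return neighbors


def torus_distance(a, b, dims):
    """Minimum hop count between a and b, going the short way round each ring."""
    return sum(min((x - y) % s, (y - x) % s) for x, y, s in zip(a, b, dims))


dims3 = (4, 4, 4)            # hypothetical 3D torus, 64 nodes
dims6 = (3, 3, 3, 3, 3, 3)   # hypothetical 6D torus, 729 nodes

print(len(torus_neighbors((0, 0, 0), dims3)))   # 6
print(len(torus_neighbors((0,) * 6, dims6)))    # 12
print(torus_distance((0, 0, 0), (3, 0, 0), dims3))  # 1, thanks to wrap-around
```

So "6D" just means twelve links per node instead of six, and more nodes reachable within a few hops for the same machine size.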