Can we overcome the performance/portability tradeoff on GPU pipelines?

The tremendous improvement of Graphics Processing Unit (GPU) hardware and software makes GPUs a good fit for real-time Adaptive Optics (AO) computation pipelines, although challenges remain. Modern GPU architectures, combined with some non-conventional GPU programming techniques, bring the sub-millisecond requirement within reach.

Unfortunately, achieving this kind of performance requires a deep understanding of GPU mechanisms. GPU programming also raises concerns about the overall maintainability of such systems.

This talk will dive into the GPU mechanisms that facilitate the integration of discrete accelerators into time-sensitive systems and that are in use within the real-time AO control platform COSMIC. By taking advantage of the GPU's asynchronous nature, we show how most latencies and overheads can effectively be moved off the critical path. We will then discuss possible ways to improve the productivity and maintainability of such systems.

Because reaching this level of performance demands deep GPU expertise, a high-level programming model such as OpenMP that retains the performance of a hardware-specific model such as CUDA would strike a good balance. So far, however, OpenMP lacks the tools needed to reach real-time performance. The Barcelona Supercomputing Center proposes a novel approach: a compiler transformation technique that turns OpenMP directives into CUDA code, allowing a significant performance increase. This talk will cover the basics of enabling GPU computations for AO real-time control with the COSMIC platform. It will then present how compiler transformation techniques can turn an OpenMP pipeline into a high-performance GPU graph without sacrificing performance. Together with our contribution to the Clang compiler, this provides all the tools needed to implement AO pipelines in such an environment and to better understand GPU behavior.
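To make the idea of an "OpenMP pipeline" more concrete, here is a minimal sketch of a pipeline written as OpenMP tasks whose `depend` clauses form a dependency graph. It is an illustration only, not the Barcelona Supercomputing Center transformation or COSMIC code: the stage names (`calibrate`, `slopes`, `control`), the helper `run_frame`, and the arithmetic are hypothetical placeholders, and the tasks run on the host. A task graph of this shape is the kind of structure a compiler could, in principle, map onto a GPU graph.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stage 1: subtract a constant dark level from raw sensor pixels.
static void calibrate(const float* raw, float* cal, std::size_t n, float dark) {
    for (std::size_t i = 0; i < n; ++i) cal[i] = raw[i] - dark;
}

// Hypothetical stage 2: reduce each pair of pixels to one "slope" value.
static void slopes(const float* cal, float* s, std::size_t m) {
    for (std::size_t i = 0; i < m; ++i) s[i] = cal[2 * i + 1] - cal[2 * i];
}

// Hypothetical stage 3: scalar gain standing in for a matrix-vector multiply.
static void control(const float* s, float* cmd, std::size_t m, float gain) {
    for (std::size_t i = 0; i < m; ++i) cmd[i] = gain * s[i];
}

// Runs one frame through the three stages. The depend clauses describe the
// data flow between stages, so the runtime sees a graph rather than a fixed
// call sequence. Without OpenMP enabled, the pragmas are ignored and the
// stages simply run in program order with the same result.
std::vector<float> run_frame(const std::vector<float>& frame, float dark, float gain) {
    assert(!frame.empty() && frame.size() % 2 == 0);
    const std::size_t n = frame.size();
    const std::size_t m = n / 2;
    std::vector<float> cal(n), s(m), cmd(m);
    const float* raw = frame.data();
    float* pc = cal.data();
    float* ps = s.data();
    float* pcmd = cmd.data();

    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task depend(in: raw[0:n]) depend(out: pc[0:n])
        calibrate(raw, pc, n, dark);

        #pragma omp task depend(in: pc[0:n]) depend(out: ps[0:m])
        slopes(pc, ps, m);

        #pragma omp task depend(in: ps[0:m]) depend(out: pcmd[0:m])
        control(ps, pcmd, m, gain);

        #pragma omp taskwait  // all stages are complete before cmd is returned
    }
    return cmd;
}
```

For example, `run_frame({1, 2, 3, 5}, 1.0f, 2.0f)` returns the two commands `{2, 4}`. The point of the sketch is that the programmer states only the data dependencies; how the resulting graph is scheduled, and on which device, is left to the compiler and runtime.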

 

Cyril Cetre, Thales Research & Technology