Research line leader: Henk Corporaal (TUe)

ZERO requires heterogeneous processing platforms, which are extremely difficult to program efficiently. We have observed that the energy efficiency and performance of a straightforward mapping can be improved by at least a factor of 10 (and often much more) by heavily tuning the application and its mapping. However, this tuning makes the code highly complex, error-prone, and difficult to maintain, and every platform or application change requires redoing the process. Factors that complicate efficient code generation include different cache and scratchpad configurations, advanced DMA support, different instruction sets, the various types of parallelism to be exploited (SIMD, MIMD, threads, ILP), parametric accelerators, and big.LITTLE code migration. In addition, IoT devices often have real-time requirements and, most importantly, need special attention with respect to low power, exploiting all available energy knobs. Currently, most code generation methods ignore timing and power; these constraints are checked only afterwards, which leads to many long design cycles. R4 therefore covers many challenges, among others:

  • Investigate code optimization opportunities for low power and measure their potential.
  • Implement advanced code transformations that perform the above optimizations automatically within an existing compiler flow. A novel aspect is that the automatic optimization will be driven by power-estimation models.
  • Support automatic function recognition and offloading to more power-efficient, function-specific hardware accelerators.
  • Target various heterogeneous processing back-ends, including accelerators.
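
To make the idea of model-driven optimization concrete, the following is a minimal sketch, not the project's actual method. It assumes each candidate mapping comes with an estimated cycle count, clock frequency, and average power (all hypothetical inputs that a power-estimation model would supply), and it picks the lowest-energy candidate that still meets a deadline. The type `mapping_t` and the function `select_mapping` are illustrative names introduced here. The point is that timing and energy enter the selection itself instead of being checked afterwards.

```c
#include <stddef.h>

/* Hypothetical description of one candidate mapping of a kernel.
 * All fields are assumed to come from an estimation model. */
typedef struct {
    const char *name;   /* illustrative label, e.g. "scalar" */
    double cycles;      /* estimated execution cycles */
    double freq_hz;     /* clock frequency of the target core */
    double power_w;     /* estimated average power while running */
} mapping_t;

/* Estimated execution time in seconds. */
static double est_time_s(const mapping_t *m) {
    return m->cycles / m->freq_hz;
}

/* Estimated energy in joules: average power times execution time. */
static double est_energy_j(const mapping_t *m) {
    return m->power_w * est_time_s(m);
}

/* Return the index of the lowest-energy candidate whose estimated
 * time meets the deadline, or -1 if no candidate is feasible. */
int select_mapping(const mapping_t *cands, size_t n, double deadline_s) {
    int best = -1;
    double best_energy = 0.0;
    for (size_t i = 0; i < n; ++i) {
        if (est_time_s(&cands[i]) > deadline_s) {
            continue; /* violates the real-time constraint */
        }
        double e = est_energy_j(&cands[i]);
        if (best < 0 || e < best_energy) {
            best = (int)i;
            best_energy = e;
        }
    }
    return best;
}
```

A real compiler flow would search a far larger space of transformations and use calibrated models; the sketch only shows the selection criterion.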

As our point of departure, we will take LLVM tooling with OpenMP 4 (which supports accelerators), including state-of-the-art polyhedral optimizations (Polly), auto-vectorization, and existing energy-reducing optimizations such as improved data locality and reuse.
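
As an illustration of the OpenMP 4 accelerator support mentioned above, the sketch below offloads a simple loop with the combined `target teams distribute parallel for simd` construct. The kernel `saxpy` is only an illustrative example chosen here, not code from the project. The `map` clauses state which arrays are copied to and from the device. If the compiler is invoked without OpenMP support, or no device is available, the loop simply runs on the host.

```c
#include <stddef.h>

/* Computes y[i] = a * x[i] + y[i] for i in [0, n).
 * With OpenMP 4 offloading enabled, the loop is distributed over
 * teams and threads on the target device and vectorized (simd). */
void saxpy(size_t n, float a, const float *x, float *y) {
    /* x is read-only on the device; y is copied in and back out. */
    #pragma omp target teams distribute parallel for simd \
            map(to: x[0:n]) map(tofrom: y[0:n])
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

With Clang, OpenMP is typically enabled with `-fopenmp`; the exact options for selecting an offload target depend on the toolchain and the device.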