The major challenge for MAX is to prepare the flagship codes for the transition to the exascale. In doing so, we must face the technological challenges posed by hardware architectures that are becoming more complex (many cores, wide vector units) and more heterogeneous (i.e. equipped with GPUs or FPGAs). This requires a modernisation of the codes and the adoption of new programming models.
In this context, OpenMP has long been the de facto standard for intra-node parallelism. However, its traditional fork-join model has recently shown limitations. To overcome them, we are exploring task-based parallelism within OpenMP, in particular the taskloop construct and its extensions in the OpenMP 5 standard. Besides OpenMP, we are interested in investigating other possibilities, always keeping the focus on asynchronous approaches in order to reduce latency. In particular, we look at approaches targeting FPGAs, one-sided communication techniques, and frameworks such as HPX.
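As a minimal illustration of the tasking approach (a sketch only, not taken from any MAX code; array sizes and grainsize are arbitrary), the following C++ fragment replaces a conventional worksharing loop with the OpenMP taskloop construct, where a single thread creates tasks and the runtime schedules them asynchronously across the team:

    // Compile with: g++ -fopenmp taskloop_sketch.cpp
    #include <cstdio>
    #include <vector>

    int main() {
        const int n = 1 << 20;
        std::vector<double> x(n, 1.0), y(n, 2.0);
        const double a = 0.5;

        #pragma omp parallel
        #pragma omp single          // one thread creates the tasks ...
        {
            #pragma omp taskloop grainsize(1024)  // ... the runtime schedules them
            for (int i = 0; i < n; ++i)
                y[i] += a * x[i];
        }                           // implicit taskwait at the end of the taskloop

        std::printf("y[0] = %f\n", y[0]);
        return 0;
    }

Compared with a plain "parallel for", the task-based form lets independent chunks of work overlap with other tasks generated in the same region, which is the kind of asynchrony we are after.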
In order to address the heterogeneity of the systems (e.g. GPUs), we are considering both open standards (OpenACC and the target offload constructs of OpenMP) and vendor-specific solutions, in particular CUDA and CUDA Fortran, as well as Intel oneAPI and AMD ROCm/HIP.
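As an example of the open-standard route (again a generic sketch under assumed array sizes, not code from any MAX application), the following C++ fragment offloads a simple loop to an accelerator with the OpenMP target construct; when no device is available, the same code runs on the host:

    // Compile e.g. with: g++ -fopenmp -foffload=nvptx-none target_sketch.cpp
    #include <cstdio>
    #include <vector>

    int main() {
        const int n = 1 << 20;
        std::vector<double> x(n, 1.0), y(n, 2.0);
        const double a = 0.5;
        double *px = x.data(), *py = y.data();

        // Copy x to the device, copy y to and from the device, and distribute
        // the loop iterations over the device teams and threads.
        #pragma omp target teams distribute parallel for \
                map(to: px[0:n]) map(tofrom: py[0:n])
        for (int i = 0; i < n; ++i)
            py[i] += a * px[i];

        std::printf("y[0] = %f\n", py[0]);
        return 0;
    }

The directive-based form keeps a single source tree portable across vendors, while the proprietary solutions (CUDA, HIP, oneAPI) expose lower-level control when finer tuning is required.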