* `01.heat_mpi.bin`: A straightforward '''MPI-only''' implementation using '''blocking MPI primitives''' (`MPI_Send` and `MPI_Recv`) to send and receive the halo rows. The computation of blocks and the exchange of halos inside each rank are completely sequential (a minimal sketch of this exchange is shown after this list).

* `02.heat_itampi_ompss2_tasks.bin`: A hybrid '''MPI+!OmpSs-2''' version leveraging '''TAMPI''' that performs both computation and communication using '''tasks''' with '''data dependencies'''. It instantiates a task to compute each of the blocks inside each rank, for each of the timesteps. It also creates sending and receiving tasks to exchange the block halo rows of each boundary block. The execution of tasks follows a '''data-flow model''' because tasks declare dependencies on the data they read/modify. Moreover, communication tasks call '''non-blocking MPI primitives''' and leverage the '''non-blocking mechanism of TAMPI''' (`TAMPI_Iwait`), so communications are fully non-blocking and '''asynchronous''' from the user's point of view. Communication tasks issue non-blocking communications that are transparently managed and periodically checked by TAMPI. These tasks do not explicitly wait for their communications; instead, they delay their completion (asynchronously) until their MPI communications finish (see the task sketch after this list).

* `03.heat_tampirma_ompss2_tasks.bin`: An implementation similar to `02.heat_itampi_ompss2_tasks.bin` but using '''MPI RMA operations''' (`MPI_Put`) to exchange the block halo rows. This program leverages MPI active-target RMA communication, using '''MPI window fences''' to open/close RMA access epochs. It uses the '''TAMPI''' library and its new integration with the `MPI_Win_ifence` synchronization function. In this way, `TAMPI_Iwait` binds the completion of a communication task to the finalization of an `MPI_Win_ifence`, so the opening/closing of RMA access epochs is completely non-blocking and asynchronous from the user's point of view. The calls to `MPI_Put` are assumed to be non-blocking. Finally, as an optimization, we register '''multiple MPI RMA windows''' for each rank to allow '''concurrent''' communications through the different windows. Each RMA window holds a part of the halo row that may belong to multiple logical blocks, and each communication task exchanges the part of the halo row assigned to a single window (see the RMA sketch after this list).

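The following is a minimal sketch of the blocking halo exchange used by `01.heat_mpi.bin`. The data layout, variable names (`u`, `rows`, `cols`, `up`, `down`) and message tag are illustrative assumptions rather than the actual code of the benchmark; the sends and receives are ordered here so that neighboring ranks do not deadlock.

{{{
#!c
#include <mpi.h>

/* Exchange the two halo rows of a (rows+2) x cols block with the upper and
 * lower neighbors using blocking point-to-point primitives. Row 0 and row
 * rows+1 are the halo rows; rows 1..rows hold the local data. */
static void exchange_halos(double *u, int rows, int cols,
                           int up, int down, MPI_Comm comm)
{
    if (up != MPI_PROC_NULL) {
        /* Send our first data row upwards and receive the upper halo. */
        MPI_Send(&u[1 * cols], cols, MPI_DOUBLE, up, 0, comm);
        MPI_Recv(&u[0 * cols], cols, MPI_DOUBLE, up, 0, comm, MPI_STATUS_IGNORE);
    }
    if (down != MPI_PROC_NULL) {
        /* Receive the lower halo and send our last data row downwards. */
        MPI_Recv(&u[(rows + 1) * cols], cols, MPI_DOUBLE, down, 0, comm, MPI_STATUS_IGNORE);
        MPI_Send(&u[rows * cols], cols, MPI_DOUBLE, down, 0, comm);
    }
}
}}}
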
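A minimal sketch of the communication and computation tasks in `02.heat_itampi_ompss2_tasks.bin` could look as follows. The function names, block layout and dependency regions are assumptions for illustration; the key point is that the communication task issues a non-blocking receive and hands the request to `TAMPI_Iwait`, so the task returns immediately and only completes once the message has arrived, releasing its successor compute tasks.

{{{
#!c
#include <mpi.h>
#include <TAMPI.h>

/* Receive the upper halo row of a boundary block inside a task. The 'out'
 * dependency makes the compute task of that block wait until this task
 * (and thus the communication) has completed. */
static void task_recv_upper_halo(double *halo, int cols, int up, MPI_Comm comm)
{
    #pragma oss task out(halo[0;cols]) label("recv upper halo")
    {
        MPI_Request request;
        MPI_Irecv(halo, cols, MPI_DOUBLE, up, 0, comm, &request);

        /* Non-blocking TAMPI mechanism: returns immediately and delays the
         * completion of this task until the request finishes. TAMPI checks
         * the pending requests periodically in the background. */
        TAMPI_Iwait(&request, MPI_STATUS_IGNORE);
    }
}

/* Compute one block of the stencil update as a task with data dependencies. */
static void task_compute_block(double *block, int rows, int cols, const double *upper_halo)
{
    #pragma oss task inout(block[0;rows*cols]) in(upper_halo[0;cols]) label("compute block")
    {
        /* ... stencil update of the block using the halo rows ... */
    }
}
}}}
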
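For `03.heat_tampirma_ompss2_tasks.bin`, a communication task could follow the pattern sketched below. The signature of `MPI_Win_ifence` is assumed here to mirror `MPI_Win_fence` with an additional request argument, as provided by the ParaStationMPI extension mentioned above; the window layout, displacements and names are again illustrative.

{{{
#!c
#include <mpi.h>
#include <TAMPI.h>

/* Put the local boundary row into the neighbor's halo through one of the RMA
 * windows and close the access epoch asynchronously. Each task works on a
 * single window, so several of these tasks can progress concurrently. */
static void task_put_halo(const double *boundary, int count, int neighbor,
                          MPI_Aint target_disp, MPI_Win win)
{
    #pragma oss task in(boundary[0;count]) label("put halo")
    {
        MPI_Request request;

        /* MPI_Put is non-blocking; it is completed by the closing fence. */
        MPI_Put(boundary, count, MPI_DOUBLE,
                neighbor, target_disp, count, MPI_DOUBLE, win);

        /* Non-blocking fence (assumed extension) closing the RMA access
         * epoch; TAMPI delays the completion of this task until the fence
         * has finished, without blocking the underlying core. */
        MPI_Win_ifence(0, win, &request);
        TAMPI_Iwait(&request, MPI_STATUS_IGNORE);
    }
}
}}}
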
=== Requirements ===
The requirements of this application are listed below. The main requirements are:

* The '''GNU''' Compiler Collection or the '''Intel®''' compilers.

* A '''ParaStationMPI''' installation supporting '''multi-threading''' (i.e., the `MPI_THREAD_MULTIPLE` threading level) and featuring the '''libNAM''' extension that allows access to NAM memory regions through MPI RMA windows (see the initialization sketch after this list).

* The '''Task-Aware MPI (TAMPI)''' library which defines a clean '''interoperability''' mechanism for MPI and OpenMP/!OmpSs-2 tasks. It supports both blocking and non-blocking MPI operations by providing two different interoperability mechanisms. Downloads and more information at [https://github.com/bsc-pm/tampi].

* The '''!OmpSs-2''' model, which is the second generation of the '''!OmpSs''' programming model. It is a '''task-based''' programming model that originates from the ideas of the OpenMP and !StarSs programming models. The specification and user guide are available at [https://pm.bsc.es/ompss-2-docs/spec/] and [https://pm.bsc.es/ompss-2-docs/user-guide/], respectively. !OmpSs-2 requires both the '''Mercurium''' and '''Nanos6''' tools. Mercurium is a source-to-source compiler which provides the necessary support for transforming the high-level directives into a parallelized version of the application. The Nanos6 runtime system provides the services to manage all the parallelism in the application (e.g., task creation, synchronization, scheduling, etc.). Downloads at [https://github.com/bsc-pm].

* The NAM software allowing access to NAM memory.
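As a usage note, the hybrid versions rely on the MPI library being initialized with full multi-threading support so that TAMPI can issue and check communications from concurrently running tasks. A minimal initialization sketch follows; the error handling is illustrative, not taken from the benchmark sources.

{{{
#!c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int provided;

    /* Request the MPI_THREAD_MULTIPLE threading level required to call MPI
     * (through TAMPI) from several tasks running in parallel. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        fprintf(stderr, "Error: MPI_THREAD_MULTIPLE is not supported\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* ... allocate the blocks, create the tasks of the timestep loop ... */

    MPI_Finalize();
    return 0;
}
}}}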