The most common practice in hybrid applications is to combine MPI and OpenMP in a fork-join approach, where computation phases are parallelized with OpenMP while communication phases are serialized.
This approach is simple, but it usually does not scale as well as MPI-only parallelizations.
A more advanced technique is the taskification of both computation and communication phases, using task data dependencies to connect both task types and guarantee a correct execution order.
This strategy allows applications to parallelize their workloads following the data-flow paradigm, where communications are integrated into the flow of tasks.
However, MPI and OpenMP were not designed to be combined efficiently, since MPI only defines threading levels to interact safely with other parallel programming models.
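
The fork-join pattern described above can be sketched as follows; the 1-D stencil kernel and halo exchange are illustrative assumptions, not code from any particular application.

```c
// Fork-join hybrid sketch: an OpenMP-parallel computation phase
// followed by a serialized MPI communication phase.
#include <mpi.h>
#include <omp.h>

void timestep(const double *u, double *u_new, int n, int rank, int nranks) {
    // Computation phase: all threads cooperate via OpenMP.
    #pragma omp parallel for
    for (int i = 1; i < n - 1; i++)
        u_new[i] = 0.5 * (u[i - 1] + u[i + 1]);

    // Communication phase: a single thread exchanges halos while the
    // remaining threads sit idle, which limits scalability.
    if (rank > 0)
        MPI_Sendrecv(&u_new[1], 1, MPI_DOUBLE, rank - 1, 0,
                     &u_new[0], 1, MPI_DOUBLE, rank - 1, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    if (rank < nranks - 1)
        MPI_Sendrecv(&u_new[n - 2], 1, MPI_DOUBLE, rank + 1, 0,
                     &u_new[n - 1], 1, MPI_DOUBLE, rank + 1, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}
```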

The Task-Aware MPI (TAMPI) library was recently proposed to overcome these limitations.
TAMPI allows the safe and efficient taskification of MPI communications, including all blocking, non-blocking, point-to-point, and collective operations, by allowing tasks to call MPI communication operations directly.
On the one hand, blocking MPI operations pause the calling task until the operation completes; in the meantime, the tasking runtime system may execute other tasks.
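
A minimal sketch of this blocking mode is shown below; it assumes TAMPI's task-aware threading level (`MPI_TASK_MULTIPLE`, a TAMPI extension requested at `MPI_Init_thread`) and the `TAMPI.h` header, while buffer names, sizes, and tags are illustrative.

```c
#include <mpi.h>
#include <TAMPI.h>

// Sketch: blocking receives issued from inside OpenMP tasks. With
// TAMPI enabled, each MPI_Recv pauses only its calling task; the
// runtime keeps executing other ready tasks on the same threads.
void receive_blocks(double **bufs, int nblocks, int count, int src) {
    for (int b = 0; b < nblocks; b++) {
        #pragma omp task depend(out: bufs[b][0])
        {
            // Blocks the task, not the underlying thread.
            MPI_Recv(bufs[b], count, MPI_DOUBLE, src, /*tag=*/b,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
    }
}
```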

On the other hand, non-blocking operations can be bound to a task through two API functions, `TAMPI_Iwait` and `TAMPI_Iwaitall`, which take the same parameters as the standard `MPI_Wait` and `MPI_Waitall`, respectively.
These two TAMPI functions are non-blocking and asynchronous: they bind the completion of the calling task to the finalization of the MPI requests passed as parameters.
The calling task does not release its dependencies until (1) the task finishes its execution and (2) all bound MPI operations complete.
Its successor tasks become ready to execute only once those dependencies are released.
We must highlight that a task can call these TAMPI functions several times, binding multiple MPI requests during its execution.
Moreover, TAMPI offers a wrapper function for each non-blocking MPI operation, which performs the standard non-blocking operation and then automatically calls `TAMPI_Iwait` with the resulting MPI request.
`TAMPI_Isend` and `TAMPI_Irecv` are examples of these wrappers.
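
The wrapper mechanism can be sketched as follows, using OpenMP `depend` syntax; `buf`, `src`, `tag`, and `consume` are illustrative names, and the sketch assumes the wrapper takes an `MPI_Status *` in place of the request argument, which TAMPI manages internally.

```c
// Sketch: TAMPI_Irecv issues the non-blocking receive and internally
// binds the resulting request to the calling task, as if TAMPI_Iwait
// had been called. The task finishes immediately, but its dependency
// on buf is released only once the receive has completed.
#pragma omp task depend(out: buf[0:count])
{
    TAMPI_Irecv(buf, count, MPI_DOUBLE, src, tag, MPI_COMM_WORLD,
                MPI_STATUS_IGNORE);
    // No explicit wait: completion is tied to the task's dependencies.
}

// Becomes ready only after the receive completes and the dependency
// on buf is released.
#pragma omp task depend(in: buf[0:count])
consume(buf);
```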

More information about the TAMPI library and its code is available at https://github.com/bsc-pm/tampi.