This page explains how to access Network Attached Memory (NAM) from tasks in hybrid task-based MPI+!OmpSs-2 applications.
NAM memory regions can be accessed through MPI RMA windows thanks to the extensions to the MPI RMA interface implemented by !ParaStation MPI.
Applications can allocate MPI windows that represent NAM memory regions and perform RMA operations, such as `MPI_Put` and `MPI_Get`, on those NAM windows.
Following the standard MPI RMA fence synchronization mode, applications can open an access epoch on those windows by calling `MPI_Win_fence`, then perform the desired RMA operations on the NAM data, and finally close the epoch with another call to `MPI_Win_fence`.
This NAM support has been integrated into the !OmpSs-2 tasking model, so that NAM regions can be accessed from !OmpSs-2 tasks efficiently and safely through the Task-Aware MPI (TAMPI) library.
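As an illustrative sketch of this fence pattern, the following code allocates a window and writes a local buffer into it with `MPI_Put`. Note that the `psnam_manifestation` info hint is only a hypothetical placeholder; the actual hints for allocating NAM-backed windows are defined by the !ParaStation MPI extensions.

{{{
#!c
#include <mpi.h>

#define N 1024

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Hypothetical info hint requesting a NAM-backed window; the real
       key/value pairs are defined by the ParaStation MPI extensions */
    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "psnam_manifestation", "libnam");

    double *base;
    MPI_Win win;
    MPI_Win_allocate(N * sizeof(double), sizeof(double), info,
                     MPI_COMM_WORLD, &base, &win);

    double snapshot[N];
    for (int i = 0; i < N; i++)
        snapshot[i] = (double) rank;

    MPI_Win_fence(0, win);                /* open the access epoch       */
    MPI_Put(snapshot, N, MPI_DOUBLE,      /* write the local buffer into */
            rank, 0, N, MPI_DOUBLE, win); /* this rank's window region   */
    MPI_Win_fence(0, win);                /* close the epoch; the put
                                             is now complete             */

    MPI_Win_free(&win);
    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}
}}}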
In the following sections, we describe the TAMPI library and how hybrid task-based applications can access NAM memory regions.
Finally, we show the Heat equation benchmark as an example of using this support to save snapshots of the computed matrix in a NAM allocation.
A more advanced technique is the taskification of both computation and communication phases, using task data dependencies to connect both task types and guarantee a correct execution order.
This strategy allows applications to parallelize their workloads following the data-flow paradigm principles, where communications are integrated into the execution flow of tasks.
However, MPI and OpenMP were not designed to be combined efficiently or safely; MPI only defines threading levels to safely call MPI functions from multiple threads.
For instance, calling blocking MPI operations from different tasks concurrently could end up in a communication deadlock.
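The following hypothetical sketch taskifies one step of a halo exchange following this data-flow approach; the buffer names, sizes, and the `computeBlock` helper are illustrative assumptions. Without further runtime support, many such tasks calling blocking MPI operations concurrently could block all the available threads and deadlock.

{{{
#!c
#include <mpi.h>

/* Hypothetical helper that updates a block using the received halo */
void computeBlock(double *block, const double *upperHalo, int rows, int cols);

void exchangeAndCompute(double *block, double *upperRow, double *upperHalo,
                        int rows, int cols, int up, int tag)
{
    /* Communication tasks declare dependencies on their buffers */
    #pragma oss task in(upperRow[0;cols]) label("send upper row")
    MPI_Send(upperRow, cols, MPI_DOUBLE, up, tag, MPI_COMM_WORLD);

    #pragma oss task out(upperHalo[0;cols]) label("recv upper halo")
    MPI_Recv(upperHalo, cols, MPI_DOUBLE, up, tag, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);

    /* The computation task runs only after the receive completes */
    #pragma oss task in(upperHalo[0;cols]) inout(block[0;rows*cols]) label("compute block")
    computeBlock(block, upperHalo, rows, cols);
}
}}}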
The TAMPI library allows freely calling MPI communication operations from within tasks.
On the one hand, blocking MPI operations (e.g. `MPI_Recv`) pause the calling task until the operation completes; in the meantime, the tasking runtime system may use the CPU to execute other ready tasks.
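The sketch below shows this blocking mode, assuming MPI is initialized with the `MPI_TASK_MULTIPLE` threading level that TAMPI provides; the buffer name and message size are illustrative.

{{{
#!c
#include <mpi.h>
#include <TAMPI.h>

#define N 1024

int main(int argc, char **argv)
{
    /* Request TAMPI's blocking mode through the MPI_TASK_MULTIPLE level */
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_TASK_MULTIPLE, &provided);
    if (provided != MPI_TASK_MULTIPLE)
        MPI_Abort(MPI_COMM_WORLD, 1);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double buffer[N];  /* hypothetical communication buffer */

    if (rank == 0) {
        for (int i = 0; i < N; i++)
            buffer[i] = (double) i;

        #pragma oss task in(buffer[0;N]) label("send")
        MPI_Send(buffer, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* The task pauses inside MPI_Recv until the message arrives,
           while the runtime may execute other ready tasks on this CPU */
        #pragma oss task out(buffer[0;N]) label("recv")
        MPI_Recv(buffer, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }
    #pragma oss taskwait

    MPI_Finalize();
    return 0;
}
}}}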
On the other hand, non-blocking operations (e.g. `MPI_Irecv`) can be bound to a task through two API functions named `TAMPI_Iwait` and `TAMPI_Iwaitall`, which take the same parameters as the standard `MPI_Wait` and `MPI_Waitall`, respectively.
The calling task continues its execution immediately without blocking, transferring the management of the MPI requests to the TAMPI library, and may finish its execution freely.
However, the task will not fully complete until (1) it finishes its execution and (2) all its bound MPI operations complete.
Upon the task's completion, its data dependencies are released and its successor tasks become ready to execute.
Usually, the successor tasks are the ones that consume the received data or reuse a send buffer.
Notice that this approach requires tasks to define data dependencies on the communication buffers in order to guarantee a correct execution order between communication tasks and the tasks that consume or reuse those buffers.
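The following hypothetical sketch shows the non-blocking mode, where a communication task binds an `MPI_Irecv` request to itself with `TAMPI_Iwaitall`; the `consumeHalo` helper and the parameter names are illustrative assumptions.

{{{
#!c
#include <mpi.h>
#include <TAMPI.h>

/* Hypothetical helper that consumes the received halo data */
void consumeHalo(const double *halo, int count);

void receiveAndConsume(double *halo, int count, int neighbor)
{
    #pragma oss task out(halo[0;count]) label("recv halo")
    {
        MPI_Request request;
        MPI_Irecv(halo, count, MPI_DOUBLE, neighbor, 0,
                  MPI_COMM_WORLD, &request);
        /* Same parameters as MPI_Waitall, but returns immediately;
           the task finishes now yet only completes (releasing its
           dependency on halo) once the receive has finished */
        TAMPI_Iwaitall(1, &request, MPI_STATUSES_IGNORE);
    }

    /* Becomes ready only when the communication task has fully
       completed, i.e., when the data has actually arrived */
    #pragma oss task in(halo[0;count]) label("consume halo")
    consumeHalo(halo, count);
}
}}}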