Changes between Version 21 and Version 22 of Public/User_Guide/TAMPI_NAM


Timestamp: Apr 26, 2021, 3:49:56 PM
Author: Kevin Sala
== Quick Overview ==

This page explains how to access Network Attached Memory (NAM) from tasks in hybrid task-based MPI+!OmpSs-2 applications.
NAM memory regions can be accessed through MPI RMA windows thanks to the extensions to the MPI RMA interface implemented by !ParaStation MPI.
User applications can allocate MPI windows that represent NAM memory regions and perform RMA operations, such as `MPI_Put` and `MPI_Get`, on those windows.
Following the standard MPI RMA fence synchronization mode, applications can open an access epoch on those windows by calling `MPI_Win_fence`, then perform the desired RMA operations on the NAM data, and finally close the epoch with another call to `MPI_Win_fence`.
This NAM support has been integrated into the !OmpSs-2 tasking model, so that NAM regions can be accessed from !OmpSs-2 tasks efficiently and safely through the Task-Aware MPI (TAMPI) library.

In the following sections, we describe the TAMPI library and how hybrid task-based applications can access NAM memory regions.
Finally, we show the Heat equation benchmark as an example of using this support to save snapshots of the computed matrix in a NAM allocation.
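As a concrete illustration, the sketch below shows this fence-based epoch structure using only standard MPI calls. Allocating a window that is actually backed by a NAM region relies on the !ParaStation MPI extensions presented later, so the plain `MPI_Win_allocate` here is a stand-in, and the function name and parameters are illustrative.

{{{#!c
#include <mpi.h>

/* Fence-synchronized RMA sketch. A real NAM window would be allocated
 * through the ParaStation MPI extensions; MPI_Win_allocate is used here
 * only as a stand-in to show the epoch structure. */
void put_to_window(const double *local, int count, int target)
{
    double *base;
    MPI_Win win;
    MPI_Win_allocate((MPI_Aint)(count * sizeof(double)), sizeof(double),
                     MPI_INFO_NULL, MPI_COMM_WORLD, &base, &win);

    MPI_Win_fence(0, win);                      /* open the access epoch */
    MPI_Put(local, count, MPI_DOUBLE, target,   /* write into the target */
            0, count, MPI_DOUBLE, win);
    MPI_Win_fence(0, win);                      /* close it; transfers complete */

    MPI_Win_free(&win);
}
}}}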

=== Task-Aware MPI (TAMPI) ===

The most common practice in hybrid applications is combining MPI and OpenMP in a fork-join approach, where computation phases are parallelized using OpenMP and communication phases are serialized.
This is a simple approach, but it usually does not scale as well as MPI-only parallelizations.
A more advanced technique is the taskification of both computation and communication phases, using task data dependencies to connect both task types and guarantee a correct execution order.
This strategy allows applications to parallelize their workloads following the data-flow paradigm, where communications are integrated into the execution flow of tasks.
However, MPI and OpenMP were not designed to be combined efficiently or safely; MPI only defines threading levels to safely call MPI functions from multiple threads.
For instance, calling blocking MPI operations from different tasks concurrently can result in a communication deadlock, as illustrated below.
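The following hazard sketch (plain MPI inside !OmpSs-2 tasks, without TAMPI; the buffer size and the `peer` rank are illustrative) shows why: if as many receive tasks as worker threads start running on both ranks, every thread blocks inside `MPI_Recv` and no thread remains to execute the matching send tasks.

{{{#!c
#include <mpi.h>

#define N 1024

/* Hazard sketch: without TAMPI, blocking receives occupy worker threads.
 * If enough receive tasks run concurrently on both ranks, no thread is
 * left for the send tasks that would satisfy them, and the ranks may
 * deadlock. */
void exchange(double *sendbuf, double *recvbuf, int peer)
{
    #pragma oss task in(sendbuf[0;N])
    MPI_Send(sendbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD);

    #pragma oss task out(recvbuf[0;N])
    MPI_Recv(recvbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);
}
}}}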

The Task-Aware MPI library was recently proposed to overcome these limitations.
TAMPI allows the safe and efficient taskification of MPI communications, including all blocking, non-blocking, point-to-point, and collective operations.
The TAMPI library allows calling MPI communication operations from within tasks freely.
On the one hand, blocking MPI operations (e.g., `MPI_Recv`) pause the calling task until the operation completes; in the meantime, the tasking runtime system may use the CPU to execute other ready tasks.
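A minimal sketch of this blocking mode follows. It assumes MPI has been initialized with the `MPI_TASK_MULTIPLE` threading level that TAMPI provides; identifiers such as `peer` and `nregions` are illustrative.

{{{#!c
#include <mpi.h>
#include <TAMPI.h>

#define REGION 1024

/* Each task performs a blocking MPI_Recv on its own region. With TAMPI
 * (MPI initialized with MPI_TASK_MULTIPLE), a pending receive pauses only
 * the task, and the runtime executes other ready tasks on that CPU. */
void receive_regions(double *data, int nregions, int peer)
{
    for (int r = 0; r < nregions; ++r) {
        #pragma oss task out(data[r*REGION;REGION])
        MPI_Recv(&data[r*REGION], REGION, MPI_DOUBLE, peer, r,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    #pragma oss taskwait
}
}}}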

On the other hand, non-blocking operations (e.g., `MPI_Irecv`) can be bound to a task through two API functions named `TAMPI_Iwait` and `TAMPI_Iwaitall`, which have the same parameters as the standard `MPI_Wait` and `MPI_Waitall`, respectively.
These two TAMPI functions are non-blocking and asynchronous, and they bind the completion of the calling task to the finalization of the MPI requests passed as parameters.
The calling task continues its execution immediately without blocking, transferring the management of the MPI requests to the TAMPI library, and it may finish its execution freely.
However, the task will not fully complete until (1) it finishes its execution and (2) all its bound MPI operations complete.
Upon completion, its data dependencies are released and its successor tasks become ready to execute.
Usually, the successor tasks are the ones that consume the received data or reuse a send buffer.
Notice that this approach requires tasks to declare data dependencies on the communication buffers in order to guarantee a correct execution order between communication tasks and the tasks that consume or reuse those buffers.
We must highlight that a task can call those TAMPI functions several times, binding multiple MPI requests during its execution.
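The sketch below illustrates this non-blocking mode under the same assumptions as before; `peer`, `tag`, and the `consume` function are hypothetical. The receive task binds itself to the MPI request with `TAMPI_Iwait` and returns immediately, while its dependency on the buffer is released only when the receive completes.

{{{#!c
#include <mpi.h>
#include <TAMPI.h>

#define REGION 1024

void consume(double *buffer);  /* hypothetical consumer of the data */

/* The task starts an MPI_Irecv and binds its completion to the request
 * with TAMPI_Iwait. It returns immediately, but its out() dependency on
 * the buffer is not released until the receive finishes, so the consumer
 * task below runs only once the data has arrived. */
void receive_region_async(double *buffer, int peer, int tag)
{
    #pragma oss task out(buffer[0;REGION])
    {
        MPI_Request request;
        MPI_Irecv(buffer, REGION, MPI_DOUBLE, peer, tag,
                  MPI_COMM_WORLD, &request);
        /* Non-blocking: TAMPI takes over the request management. */
        TAMPI_Iwait(&request, MPI_STATUS_IGNORE);
    }

    #pragma oss task in(buffer[0;REGION])
    consume(buffer);
}
}}}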

More information about the TAMPI library, some examples, and its source code are available at [https://github.com/bsc-pm/tampi].

=== Accessing NAM through !ParaStation MPI and TAMPI ===