Changes between Version 21 and Version 22 of Public/User_Guide/TAMPI_NAM


Timestamp: Apr 26, 2021, 3:49:56 PM
Author: Kevin Sala
== Quick Overview ==

This page explains how to access Network Attached Memory (NAM) from tasks in hybrid task-based MPI+!OmpSs-2 applications.
NAM memory regions can be accessed through MPI RMA windows thanks to the extensions to the MPI RMA interface implemented by !ParaStation MPI.
User applications can allocate MPI windows that represent NAM memory regions and perform RMA operations, such as `MPI_Put` and `MPI_Get`, on those windows.
Following the standard MPI RMA fence synchronization mode, applications can open an access epoch on those windows by calling `MPI_Win_fence`, then perform the desired RMA operations on the NAM data, and finally close the epoch with another call to `MPI_Win_fence`.
This NAM support has been integrated into the !OmpSs-2 tasking model, so that NAM regions can be accessed from !OmpSs-2 tasks efficiently and safely through the Task-Aware MPI (TAMPI) library.

In the following sections, we describe the TAMPI library and how hybrid task-based applications can access NAM memory regions.
Finally, we show the Heat equation benchmark as an example of using this support to save snapshots of the computed matrix in a NAM allocation.
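As a concrete illustration, the sketch below shows this fence-based epoch structure using only standard MPI calls. Allocating a window that is actually backed by a NAM region relies on the !ParaStation MPI extensions presented later, so the plain `MPI_Win_allocate` here is a stand-in, and the function name and parameters are illustrative.

{{{#!c
#include <mpi.h>

/* Fence-synchronized RMA sketch. A real NAM window would be allocated
 * through the ParaStation MPI extensions; MPI_Win_allocate is used here
 * only as a stand-in to show the epoch structure. */
void put_to_window(const double *local, int count, int target)
{
    double *base;
    MPI_Win win;
    MPI_Win_allocate((MPI_Aint)(count * sizeof(double)), sizeof(double),
                     MPI_INFO_NULL, MPI_COMM_WORLD, &base, &win);

    MPI_Win_fence(0, win);                      /* open the access epoch */
    MPI_Put(local, count, MPI_DOUBLE, target,   /* write into the target */
            0, count, MPI_DOUBLE, win);
    MPI_Win_fence(0, win);                      /* close it; transfers complete */

    MPI_Win_free(&win);
}
}}}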

=== Task-Aware MPI (TAMPI) ===

The most common practice in hybrid applications is combining MPI and OpenMP in a fork-join approach, where computation phases are parallelized using OpenMP and communication phases are serialized.
This is a simple approach, but it usually does not scale as well as MPI-only parallelizations.
A more advanced technique is the taskification of both computation and communication phases, using task data dependencies to connect both task types and guarantee a correct execution order.
This strategy allows applications to parallelize their workloads following the data-flow paradigm, where communications are integrated into the execution flow of tasks.
However, MPI and OpenMP were not designed to be combined efficiently or safely; MPI only defines threading levels to safely call MPI functions from multiple threads.
For instance, calling blocking MPI operations from different tasks concurrently can result in a communication deadlock, as illustrated below.
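The following hazard sketch (plain MPI inside !OmpSs-2 tasks, without TAMPI; the buffer size and the `peer` rank are illustrative) shows why: if as many receive tasks as worker threads start running on both ranks, every thread blocks inside `MPI_Recv` and no thread remains to execute the matching send tasks.

{{{#!c
#include <mpi.h>

#define N 1024

/* Hazard sketch: without TAMPI, blocking receives occupy worker threads.
 * If enough receive tasks run concurrently on both ranks, no thread is
 * left for the send tasks that would satisfy them, and the ranks may
 * deadlock. */
void exchange(double *sendbuf, double *recvbuf, int peer)
{
    #pragma oss task in(sendbuf[0;N])
    MPI_Send(sendbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD);

    #pragma oss task out(recvbuf[0;N])
    MPI_Recv(recvbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);
}
}}}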

The Task-Aware MPI library was recently proposed to overcome these limitations.
TAMPI allows the safe and efficient taskification of MPI communications, including all blocking, non-blocking, point-to-point, and collective operations.
The TAMPI library allows calling MPI communication operations from within tasks freely.
On the one hand, blocking MPI operations (e.g., `MPI_Recv`) pause the calling task until the operation completes; in the meantime, the tasking runtime system may use the CPU to execute other ready tasks.
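A minimal sketch of this blocking mode follows. It assumes MPI has been initialized with the `MPI_TASK_MULTIPLE` threading level that TAMPI provides; identifiers such as `peer` and `nregions` are illustrative.

{{{#!c
#include <mpi.h>
#include <TAMPI.h>

#define REGION 1024

/* Each task performs a blocking MPI_Recv on its own region. With TAMPI
 * (MPI initialized with MPI_TASK_MULTIPLE), a pending receive pauses only
 * the task, and the runtime executes other ready tasks on that CPU. */
void receive_regions(double *data, int nregions, int peer)
{
    for (int r = 0; r < nregions; ++r) {
        #pragma oss task out(data[r*REGION;REGION])
        MPI_Recv(&data[r*REGION], REGION, MPI_DOUBLE, peer, r,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    #pragma oss taskwait
}
}}}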

On the other hand, non-blocking operations (e.g., `MPI_Irecv`) can be bound to a task through two API functions named `TAMPI_Iwait` and `TAMPI_Iwaitall`, which have the same parameters as the standard `MPI_Wait` and `MPI_Waitall`, respectively.
These two TAMPI functions are non-blocking and asynchronous, and they bind the completion of the calling task to the finalization of the MPI requests passed as parameters.
The calling task continues its execution immediately without blocking, transferring the management of the MPI requests to the TAMPI library, and it may finish its execution freely.
However, the task will not fully complete until (1) it finishes its execution and (2) all its bound MPI operations complete.
Upon completion, its data dependencies are released and its successor tasks become ready to execute.
Usually, the successor tasks are the ones that consume the received data or reuse a send buffer.
Notice that this approach requires tasks to declare data dependencies on the communication buffers in order to guarantee a correct execution order between communication tasks and the tasks that consume or reuse those buffers.
We must highlight that a task can call those TAMPI functions several times, binding multiple MPI requests during its execution.
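The sketch below illustrates this non-blocking mode under the same assumptions as before; `peer`, `tag`, and the `consume` function are hypothetical. The receive task binds itself to the MPI request with `TAMPI_Iwait` and returns immediately, while its dependency on the buffer is released only when the receive completes.

{{{#!c
#include <mpi.h>
#include <TAMPI.h>

#define REGION 1024

void consume(double *buffer);  /* hypothetical consumer of the data */

/* The task starts an MPI_Irecv and binds its completion to the request
 * with TAMPI_Iwait. It returns immediately, but its out() dependency on
 * the buffer is not released until the receive finishes, so the consumer
 * task below runs only once the data has arrived. */
void receive_region_async(double *buffer, int peer, int tag)
{
    #pragma oss task out(buffer[0;REGION])
    {
        MPI_Request request;
        MPI_Irecv(buffer, REGION, MPI_DOUBLE, peer, tag,
                  MPI_COMM_WORLD, &request);
        /* Non-blocking: TAMPI takes over the request management. */
        TAMPI_Iwait(&request, MPI_STATUS_IGNORE);
    }

    #pragma oss task in(buffer[0;REGION])
    consume(buffer);
}
}}}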

More information about the TAMPI library, some examples, and its source code are available at [https://github.com/bsc-pm/tampi].

=== Accessing NAM through !ParaStation MPI and TAMPI ===