Changes between Version 22 and Version 23 of Public/User_Guide/TAMPI_NAM


Timestamp: Apr 26, 2021, 4:08:26 PM
Author: Kevin Sala

=== Accessing NAM through !ParaStation MPI and TAMPI ===

Our main objective is to allow tasks to access data stored in NAM regions efficiently and potentially in parallel, e.g. using several tasks to put/get data to/from the NAM.
The mechanism to access NAM regions is provided by !ParaStation MPI and is based on the MPI RMA model.
!ParaStation MPI allows allocating NAM regions as MPI RMA windows, so that they can be accessed remotely by the ranks participating in those windows using the standard `MPI_Put` and `MPI_Get` RMA operations.
This support is based on the fence synchronization mode of the MPI RMA model.
This mode works as follows: (1) all ranks participating in a given window open an access epoch on that window by calling `MPI_Win_fence`, (2) perform the desired RMA operations on the NAM window, reading or writing its data, and (3) close the epoch with another call to `MPI_Win_fence`.
This mode of operation can be easily integrated into a hybrid task-based application by instantiating a task that opens the epoch on the NAM window with an MPI fence, followed by multiple concurrent tasks that write or read data to/from the NAM window, and finally another task that closes the access epoch with a fence.
Notice that tasks should define the corresponding dependencies on the window to ensure this strict order of execution.

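For reference, a single fence-synchronized access epoch in plain, non-taskified MPI looks roughly like the sketch below, where `win` is assumed to be an RMA window allocated on a NAM region, and the buffer, count, target rank, and displacement are illustrative placeholders.

{{{#!c
#include <mpi.h>

/* Minimal sketch of one fence-synchronized access epoch on a NAM
 * window. All ranks participating in the window call it collectively. */
void put_to_nam(MPI_Win win, const double *buffer, int count,
                int target, MPI_Aint disp)
{
    MPI_Win_fence(0, win);                      /* (1) open the access epoch */
    MPI_Put(buffer, count, MPI_DOUBLE, target,  /* (2) write into the NAM window */
            disp, count, MPI_DOUBLE, win);
    MPI_Win_fence(0, win);                      /* (3) close the epoch */
}
}}}
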
However, since the MPI fence operations are blocking and synchronous, calling them directly from tasks is not safe.
Having multiple windows and opening/closing epochs on them concurrently from different tasks could end up producing a communication deadlock.
Thus, the taskification of the window fences and RMA operations should be managed by the TAMPI library, so that they can be executed efficiently and safely, and potentially in parallel across different windows.
To that end, we have extended !ParaStation MPI to provide a new non-blocking function called `MPI_Win_ifence`, which starts a fence operation on a specific window and generates an MPI request that can be used to check its completion later on.
This MPI request can then be handled naturally by the TAMPI library using the `TAMPI_Iwait` function, as shown below.
     
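The code below is a minimal sketch of this pattern rather than a verbatim example: it assumes OmpSs-2 task syntax, TAMPI running in non-blocking mode, a NAM window `win` that has already been allocated, and an `MPI_Win_ifence` signature mirroring `MPI_Win_fence` with an additional request parameter; the function name, `BLOCK` size, and data layout are illustrative.

{{{#!c
#include <mpi.h>
#include <TAMPI.h>

#define BLOCK 1024  /* illustrative block size */

/* Sketch: open an epoch on the NAM window, write blocks from concurrent
 * tasks, and close the epoch. The dependencies on `win` enforce the
 * open -> puts -> close order. */
void write_blocks_to_nam(MPI_Win win, double (*data)[BLOCK],
                         int nblocks, int target)
{
    /* First task: open the access epoch with a non-blocking fence */
    #pragma oss task inout(win)
    {
        MPI_Request req;
        MPI_Win_ifence(0, win, &req);          /* start the fence */
        TAMPI_Iwait(&req, MPI_STATUS_IGNORE);  /* task completes when the fence does */
    }

    /* Concurrent tasks: each puts one block on the open window;
     * displacements assume the window's disp_unit is sizeof(double) */
    for (int b = 0; b < nblocks; ++b) {
        #pragma oss task in(win) in(data[b])
        MPI_Put(data[b], BLOCK, MPI_DOUBLE, target,
                (MPI_Aint)b * BLOCK, BLOCK, MPI_DOUBLE, win);
    }

    /* Last task: close the access epoch after all puts have executed */
    #pragma oss task inout(win)
    {
        MPI_Request req;
        MPI_Win_ifence(0, win, &req);
        TAMPI_Iwait(&req, MPI_STATUS_IGNORE);
    }
    #pragma oss taskwait
}
}}}
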
In this example, the first task opens an access epoch on the NAM window using the new `MPI_Win_ifence`.
This function starts a fence operation, generates an MPI request, and returns immediately.
The request is then handled by the TAMPI library through `TAMPI_Iwait`, which binds the completion of the calling task to the finalization of the fence operation.
The task can therefore continue executing without blocking and run to the end of its body.
Once the fence operation finalizes, the task will complete automatically and its successor tasks will become ready.
The successor tasks are the ones that perform `MPI_Put` operations on the NAM window concurrently.
Notice that the dependencies of these tasks allow all the `MPI_Put` operations to execute in parallel, but always after the window epoch has been fully opened.
After all `MPI_Put` tasks have executed, the last task can run and close the access epoch on the NAM window.

The extended version of !ParaStation MPI supporting the new `MPI_Win_ifence` can be found in the `ifence-nam` branch at the [https://gitlab.version.fz-juelich.de/DEEP-EST/psmpi] repository.