Changes between Version 18 and Version 19 of Public/User_Guide/TAMPI_NAM


Timestamp: Apr 21, 2021, 5:29:43 PM
Author: Kevin Sala

  • Public/User_Guide/TAMPI_NAM (v18 → v19)

In the following sections, we describe the TAMPI library and how hybrid task-based applications should access NAM memory regions.
Finally, we show the Heat equation benchmark as an example of using this support to save periodic snapshots of the computed matrix in a NAM allocation.

=== Task-Aware MPI (TAMPI) ===

[…]

=== Accessing NAM through !ParaStation MPI and TAMPI ===

The main idea is to allow tasks to access data stored in NAM regions efficiently and potentially in parallel, e.g., using several tasks to put/get data to/from the NAM.
The mechanism to access NAM regions is provided by !ParaStation MPI and is based on the MPI RMA model.
!ParaStation MPI allows allocating NAM regions as MPI RMA windows, so that they can be accessed remotely by the ranks that participate in those windows using the standard `MPI_Put` and `MPI_Get` RMA operations.
This support is based on the fence synchronization mode of the MPI RMA model.
This mode works as follows: (1) all ranks participating in a given window open an access epoch on that window by calling `MPI_Win_fence`, (2) they perform the desired RMA operations on the NAM window, and (3) they close the epoch with another call to `MPI_Win_fence`.
This mode of operation can be integrated into a hybrid task-based application by instantiating a task that opens the epoch on the NAM window, followed by multiple concurrent tasks that write or read data to/from the NAM window, and finally, another task that closes the access epoch.
Notice that tasks can define the corresponding dependencies on the window to ensure this order of execution, as in the following example:

{{{#!c
// Open RMA access epoch to write the NAM window
#pragma oss task inout(namWindow)
{
    MPI_Request request;
    MPI_Win_ifence(0, namWindow, &request);
    TAMPI_Iwait(&request, MPI_STATUS_IGNORE);
}

// Write to NAM region concurrently
for (...) {
    #pragma oss task concurrent(namWindow)
    MPI_Put(..., namWindow);
}

// Close RMA access epoch to write the NAM window
#pragma oss task inout(namWindow)
{
    MPI_Request request;
    MPI_Win_ifence(0, namWindow, &request);
    TAMPI_Iwait(&request, MPI_STATUS_IGNORE);
}
}}}
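
For reference, the sketch below shows the same access epoch written with the standard blocking fence calls and no tasks; it assumes `namWindow` has already been allocated as a NAM window through !ParaStation MPI, and the variables `localData`, `count`, `targetRank`, and `targetDisp` are illustrative, not part of the original example:

{{{#!c
// Minimal non-task sketch of the fence synchronization mode; the variables
// localData, count, targetRank, and targetDisp are illustrative only.
MPI_Win_fence(0, namWindow);            // (1) open the access epoch on all ranks

MPI_Put(localData, count, MPI_DOUBLE,   // (2) write local data into the NAM
        targetRank, targetDisp, count,  //     subregion exposed by targetRank
        MPI_DOUBLE, namWindow);

MPI_Win_fence(0, namWindow);            // (3) close the epoch; puts are complete
}}}

In the task-based version, each blocking `MPI_Win_fence` is replaced by the non-blocking `MPI_Win_ifence` extension of !ParaStation MPI, and `TAMPI_Iwait` ties the release of the task's dependencies to the completion of the resulting request, so no core is blocked while the fence completes. The `inout(namWindow)` annotations order the two fence tasks with respect to the puts, while `concurrent(namWindow)` lets all put tasks run in parallel with each other.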

== Heat Benchmark ==

[…]

Within the snapshot code of the Heat benchmark, the `MPI_Win_ifence` argument order is corrected and the per-block put tasks now use a `concurrent` dependency on the NAM window instead of `in`:

{{{#!diff
     {
         MPI_Request request;
-        MPI_Win_ifence(namWindow, 0, &request);
+        MPI_Win_ifence(0, namWindow, &request);
         TAMPI_Iwait(&request, MPI_STATUS_IGNORE);
     }

     // Write all blocks from the current rank to NAM subregions concurrently
     for (B : all blocks in current rank) {
-        #pragma oss task in(..block B..) in(namWindow)
+        #pragma oss task in(..block B..) concurrent(namWindow)
         {
             MPI_Put(/* source data */   ..block B..,

     {
         MPI_Request request;
-        MPI_Win_ifence(namWindow, 0, &request);
+        MPI_Win_ifence(0, namWindow, &request);
         TAMPI_Iwait(&request, MPI_STATUS_IGNORE);
     }
}}}
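
Putting these fragments together, one snapshot step follows the same open/put/close pattern as the generic example above. The following condensed sketch assumes illustrative identifiers (`numBlocks`, `blocks`, `blockSize`, `targetRank`) rather than the benchmark's actual ones:

{{{#!c
// Condensed sketch of one snapshot step after the fix. The identifiers
// numBlocks, blocks, blockSize, and targetRank are illustrative only;
// blocks is assumed to be a 2D array where blocks[b] is one block row.

// Open the RMA access epoch on the NAM window
#pragma oss task inout(namWindow)
{
    MPI_Request request;
    MPI_Win_ifence(0, namWindow, &request);
    TAMPI_Iwait(&request, MPI_STATUS_IGNORE);
}

// Put each block of this rank into its NAM subregion; the concurrent
// dependency lets the puts of different blocks run in parallel
for (int b = 0; b < numBlocks; ++b) {
    #pragma oss task in(blocks[b]) concurrent(namWindow)
    MPI_Put(blocks[b], blockSize, MPI_DOUBLE,
            targetRank, (MPI_Aint) b * blockSize, blockSize,
            MPI_DOUBLE, namWindow);
}

// Close the epoch; successor tasks run only after all puts complete
#pragma oss task inout(namWindow)
{
    MPI_Request request;
    MPI_Win_ifence(0, namWindow, &request);
    TAMPI_Iwait(&request, MPI_STATUS_IGNORE);
}
}}}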