=== Task-Aware MPI (TAMPI) ===

=== Accessing NAM through ParaStationMPI and TAMPI ===

== Heat Benchmark ==

=== Using NAM in Heat benchmark ===
In this benchmark, we use the NAM memory to save the computed matrix periodically. The idea is to save different states (snapshots) of the matrix during the execution in a persistent NAM memory region. Then, another program can retrieve all the matrix snapshots, process them, and produce a GIF animation showing the heat's evolution throughout the execution. Notice that we cannot use regular RAM for this purpose: the matrix can be huge, and we may want to store tens of matrix snapshots. We also want the data to persist so that other programs can process it, and the memory should be easily accessible by the multiple MPI ranks or their tasks in parallel. The NAM memory satisfies all these requirements, and, as previously stated, ParaStationMPI allows accessing NAM allocations through standard MPI RMA operations. We implement the NAM snapshots only in the TAMPI variants `02.heat_itampi_ompss2_tasks.bin` and `03.heat_tampirma_ompss2_tasks.bin`.

The Heat benchmark allocates a single MPI window that holds the whole NAM region, which is used by all ranks (via the `MPI_COMM_WORLD` communicator) throughout the execution. Every few timesteps (specified by the user), the benchmark saves the whole matrix into a specific NAM subregion. Each timestep that saves a matrix snapshot employs a distinct NAM subregion. These subregions are placed consecutively, one after the other, without overlapping. Thus, the size of the entire NAM region is the full matrix size multiplied by the number of times the matrix will be saved. Nevertheless, we allocate the NAM memory region using the Managed Contiguous layout (`psnam_structure_managed_contiguous`). This means that rank 0 allocates the whole region, but each rank acquires a consecutive memory subset, where it stores its blocks' data for every snapshot. For instance, the NAM allocation first holds the space for all snapshots of the blocks from rank 0, followed by the space for all snapshots of the blocks from rank 1, and so on. With this layout, NAM subregions are addressed using the rank they belong to, which simplifies saving and retrieving the snapshots.
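Below is a minimal sketch of how a rank could save one snapshot under this layout. It is illustrative, not the benchmark's actual code: the function and variable names are hypothetical, we assume one block of `MPI_DOUBLE` elements per rank, and we use fence-based synchronization as one valid way to complete the RMA operation.

{{{#!c
#include <mpi.h>

/* Hypothetical sketch: save this rank's block as snapshot 'snapshotId'.
 * With the managed-contiguous layout, each rank owns a consecutive
 * subregion holding all of its snapshots back to back, and RMA
 * operations address that subregion through the owning rank. */
void save_snapshot(MPI_Win namWin, int rank, int snapshotId,
                   const double *block, int blockElems)
{
    /* Offset of this snapshot inside the rank's own subregion */
    MPI_Aint offset = (MPI_Aint) snapshotId * blockElems;

    /* Open an RMA access epoch on the NAM window */
    MPI_Win_fence(0, namWin);

    /* Write the local block into the subregion belonging to 'rank' */
    MPI_Put(block, blockElems, MPI_DOUBLE, rank,
            offset, blockElems, MPI_DOUBLE, namWin);

    /* Close the epoch; the snapshot resides in NAM after this returns */
    MPI_Win_fence(0, namWin);
}
}}}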
     
  * The '''GNU''' or '''Intel®''' Compiler Collection.

  * The '''ParaStationMPI''' installation supporting '''multi-threading''', featuring the '''libNAM''' integration that allows accessing NAM memory regions through MPI RMA windows, and providing the new request-based `MPI_Win_ifence` function (see the sketch after this list).

  * The '''Task-Aware MPI (TAMPI)''' library, which defines a clean '''interoperability''' mechanism for MPI and OpenMP/!OmpSs-2 tasks. It supports both blocking and non-blocking MPI operations by providing two different interoperability mechanisms, also sketched below. Downloads and more information at [https://github.com/bsc-pm/tampi].
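As a rough illustration of the request-based fence mentioned above, the following sketch closes an RMA epoch from inside a task without blocking the worker thread. Note that `MPI_Win_ifence` is a ParaStationMPI extension rather than a standard MPI call; we assume it takes the same arguments as `MPI_Win_fence` plus an output `MPI_Request`, which we then hand over to TAMPI.

{{{#!c
#include <mpi.h>
#include <TAMPI.h>

/* Hedged sketch; assumed signature of the ParaStationMPI extension:
 *   int MPI_Win_ifence(int assert, MPI_Win win, MPI_Request *request); */
void fence_from_task(MPI_Win namWin)
{
    MPI_Request request;

    /* Start the fence; it completes through 'request' */
    MPI_Win_ifence(0, namWin, &request);

    /* Bind the request to the calling task: the function returns
     * immediately, and the task's dependencies are released only
     * once the fence has completed */
    TAMPI_Iwait(&request, MPI_STATUS_IGNORE);
}
}}}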
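The two TAMPI mechanisms can be illustrated with the hypothetical receive functions below, assuming MPI was initialized at the `MPI_TASK_MULTIPLE` threading level that TAMPI introduces for its blocking mode, and that both functions are called from within tasks.

{{{#!c
#include <mpi.h>
#include <TAMPI.h>

/* Blocking mechanism: MPI_Recv no longer blocks the worker thread;
 * TAMPI pauses only the calling task until the message arrives, so
 * the thread can execute other ready tasks in the meantime. */
void task_recv_blocking(double *buf, int count, int src, MPI_Comm comm)
{
    MPI_Recv(buf, count, MPI_DOUBLE, src, 0, comm, MPI_STATUS_IGNORE);
}

/* Non-blocking mechanism: the task issues MPI_Irecv and ties the
 * request to its own completion; the task body finishes at once, and
 * its successor tasks become ready when the receive completes. */
void task_recv_nonblocking(double *buf, int count, int src, MPI_Comm comm)
{
    MPI_Request request;
    MPI_Irecv(buf, count, MPI_DOUBLE, src, 0, comm, &request);
    TAMPI_Iwait(&request, MPI_STATUS_IGNORE);
}
}}}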