Changes between Version 20 and Version 21 of Public/User_Guide/TAMPI_NAM


Timestamp: Apr 21, 2021, 6:44:41 PM
Author: Kevin Sala

More information about the TAMPI library and its code is available at [https://github.com/bsc-pm/tampi].

=== Accessing NAM through !ParaStation MPI and TAMPI ===

The main idea is to allow tasks to access data stored in NAM regions efficiently and potentially in parallel, e.g., using several tasks to put/get data to/from the NAM.
     
After all `MPI_Put` tasks have executed, the last task can run and close the window epoch.

The extended version of !ParaStation MPI supporting the new `MPI_Win_ifence` can be found in the `ifence-nam` branch of the [https://gitlab.version.fz-juelich.de/DEEP-EST/psmpi] repository.
The Task-Aware MPI library did not need any extension to support this mode of operation; the latest stable release should be compatible.

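As an illustration, the sketch below shows how the put tasks and the final fence task could be written with OmpSs-2 and TAMPI. The block decomposition, the `put_done` sentinel array used to order the tasks, and the exact signature of the request-based `MPI_Win_ifence` are assumptions made for this example, not code taken from the benchmark.

{{{#!c
#include <mpi.h>
#include <TAMPI.h>

/* Sketch: each task puts one block of the local matrix into the NAM window,
 * and a final task closes the access epoch with the request-based
 * MPI_Win_ifence offered by the ifence-nam branch of ParaStation MPI.
 * It assumes an access epoch on 'win' is already open (e.g., by a previous
 * fence). The decomposition and helper names are hypothetical. */
void save_blocks(const double *matrix, char *put_done, int nblocks,
                 int block_elems, MPI_Aint base_disp, int target, MPI_Win win)
{
	for (int b = 0; b < nblocks; ++b) {
		#pragma oss task in(matrix[b*block_elems;block_elems]) out(put_done[b])
		{
			MPI_Put(&matrix[b*block_elems], block_elems, MPI_DOUBLE,
			        target, base_disp + (MPI_Aint)b*block_elems,
			        block_elems, MPI_DOUBLE, win);
			put_done[b] = 1;
		}
	}

	/* Ordered after every put task through the put_done[] dependencies. */
	#pragma oss task in(put_done[0;nblocks])
	{
		MPI_Request req;
		MPI_Win_ifence(0, win, &req);          /* assumed signature of the new call */
		TAMPI_Iwait(&req, MPI_STATUS_IGNORE);  /* deps released when the fence completes */
	}
}
}}}

Because the last task calls `TAMPI_Iwait`, it does not block a worker thread while the fence is in progress; its dependencies are released once the request completes.
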
== Heat Benchmark ==

     
=== Using NAM in Heat benchmark ===

In this benchmark, we use the NAM memory to save the computed matrix periodically. The idea is to save different states (snapshots) of the matrix during the execution in a persistent NAM memory region. Then, another program can retrieve all the matrix snapshots, process them, and produce a GIF animation showing the heat's evolution throughout the execution. Notice that we cannot use regular RAM for that purpose because the matrix could be huge, and we may want to store tens of matrix snapshots. We also want to keep the data persistent so that other programs can process it. Moreover, the memory should be easily accessible by the multiple MPI ranks or their tasks in parallel. The NAM memory satisfies all these requirements, and as previously stated, !ParaStation MPI allows accessing NAM allocations through standard MPI RMA operations. We only implement the NAM snapshots in the TAMPI variants `02.heat_itampi_ompss2_tasks.bin` and `03.heat_tampirma_ompss2_tasks.bin`.

The Heat benchmark allocates a single MPI window that holds the whole NAM region, which is used by all ranks (via the `MPI_COMM_WORLD` communicator) throughout the execution. Every few timesteps (specified by the user), it saves the whole matrix into a specific NAM subregion. Each timestep that saves a matrix snapshot employs a distinct NAM subregion. These subregions are placed one after the other, consecutively, without overlapping. Thus, the entire NAM region's size is the full matrix size multiplied by the number of times the matrix will be saved. Nevertheless, we allocate the NAM memory region using the Managed Contiguous layout (`psnam_structure_managed_contiguous`). This means that rank 0 allocates the whole region, but each rank acquires a consecutive memory subset, where it stores its blocks' data for every snapshot. For instance, the NAM allocation will first have the space for storing all snapshots of the blocks from rank 0, followed by the space for all snapshots of the blocks from rank 1, and so on. With that layout, each NAM subregion is addressed using the rank it belongs to, which simplifies saving and retrieving snapshots.
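As an illustration of this addressing scheme, the sketch below computes the displacement of a given snapshot inside a rank's consecutive subregion and issues the corresponding `MPI_Put`. Only the rank-addressed, snapshot-after-snapshot layout is taken from the description above; the helper names and argument lists are hypothetical.

{{{#!c
#include <mpi.h>

/* Hypothetical helper: where snapshot 'snap' of this rank's block starts
 * inside the rank's consecutive NAM subregion, following the layout
 * "all snapshots of rank 0, then all snapshots of rank 1, ...".
 * block_elems is the number of doubles per rank and per snapshot. */
static MPI_Aint snapshot_disp(int snap, int block_elems)
{
	return (MPI_Aint)snap * block_elems;
}

/* Save the local block of the matrix as snapshot 'snap'. With the Managed
 * Contiguous layout, a subregion is addressed with the rank that owns it,
 * so each rank targets itself in the RMA operation. */
void save_snapshot(const double *block, int block_elems, int snap,
                   int myrank, MPI_Win nam_win)
{
	MPI_Put(block, block_elems, MPI_DOUBLE,
	        myrank, snapshot_disp(snap, block_elems),
	        block_elems, MPI_DOUBLE, nam_win);
}
}}}
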
     
  * The '''GNU''' or '''Intel®''' Compiler Collection.

  * The '''!ParaStation MPI''' installation supporting '''multi-threading''', featuring the '''libNAM''' integration that allows access to NAM memory regions through MPI RMA windows, and supporting the new request-based `MPI_Win_ifence` function. The `ifence-nam` branch of the [https://gitlab.version.fz-juelich.de/DEEP-EST/psmpi] repository contains a !ParaStation MPI version supporting these features.

  * The '''Task-Aware MPI (TAMPI)''' library, which defines a clean '''interoperability''' mechanism for MPI and OpenMP/!OmpSs-2 tasks. It supports both blocking and non-blocking MPI operations by providing two different interoperability mechanisms. Downloads and more information at [https://github.com/bsc-pm/tampi].
     
* [https://github.com/bsc-pm]
* [https://github.com/bsc-pm/tampi]
* [https://gitlab.version.fz-juelich.de/DEEP-EST/psmpi]
* [https://en.wikipedia.org/wiki/Gauss-Seidel_method]
* [https://gitlab.version.fz-juelich.de/DEEP-EST/ompss-2-benchmarks]