Changes between Version 20 and Version 21 of Public/User_Guide/TAMPI_NAM
Timestamp: Apr 21, 2021, 6:44:41 PM
Legend:
- Unmodified: shown without a marker
- Added: present only in v21, prefixed with "+"
- Removed: present only in v20, prefixed with "-"
- Modified: shown as the removed v20 line ("-") followed by the added v21 line ("+")
Public/User_Guide/TAMPI_NAM
v20 → v21

(v20 line 41 / v21 line 41)

More information about the TAMPI library and its code is available at [https://github.com/bsc-pm/tampi].

- === Accessing NAM through ParaStationMPI and TAMPI ===
+ === Accessing NAM through !ParaStation MPI and TAMPI ===

The main idea is to allow tasks to access data stored in NAM regions efficiently and potentially in parallel, e.g. using several tasks to put/get data to/from the NAM.

… (v20 line 88 / v21 line 88; a task-level sketch of the put/fence pattern described in this hunk appears after the changeset)

After all `MPI_Put` tasks have executed, the last task can run and close the window epoch.

+ The extended version of !ParaStation MPI supporting the new `MPI_Win_ifence` can be found in the `ifence-nam` branch of the [https://gitlab.version.fz-juelich.de/DEEP-EST/psmpi] repository.
+ The Task-Aware MPI library did not need any extension to support the aforementioned mode of operation; the latest stable release should be compatible.

== Heat Benchmark ==

… (v20 line 104 / v21 line 107)

=== Using NAM in Heat benchmark ===

- In this benchmark, we use the NAM memory to save the computed matrix periodically. The idea is to save different states (snapshots) of the matrix during the execution in a persistent NAM memory region. Then, another program could retrieve all the matrix snapshots, process them and produce a GIF animation showing the heat's evolution throughout the execution. Notice that we cannot use regular RAM for that purpose because the matrix could be huge, and we may want to store tens of matrix snapshots. We also want to keep it persistently so that other programs could process the stored data. Moreover, the memory should be easily accessible by the multiple MPI ranks or their tasks in parallel. The NAM memory satisfies all these requirements, and as previously stated, ParaStationMPI allows accessing NAM allocations through standard MPI RMA operations. We only implement the NAM snapshots in the TAMPI variants `02.heat_itampi_ompss2_tasks.bin` and `03.heat_tampirma_ompss2_tasks.bin`.
+ In this benchmark, we use the NAM memory to save the computed matrix periodically. The idea is to save different states (snapshots) of the matrix during the execution in a persistent NAM memory region. Then, another program could retrieve all the matrix snapshots, process them and produce a GIF animation showing the heat's evolution throughout the execution. Notice that we cannot use regular RAM for that purpose because the matrix could be huge, and we may want to store tens of matrix snapshots. We also want to keep it persistently so that other programs could process the stored data. Moreover, the memory should be easily accessible by the multiple MPI ranks or their tasks in parallel. The NAM memory satisfies all these requirements, and as previously stated, !ParaStation MPI allows accessing NAM allocations through standard MPI RMA operations. We only implement the NAM snapshots in the TAMPI variants `02.heat_itampi_ompss2_tasks.bin` and `03.heat_tampirma_ompss2_tasks.bin`.

The Heat benchmark allocates a single MPI window that holds the whole NAM region, which is used by all ranks (via the `MPI_COMM_WORLD` communicator) throughout the execution. Every few timesteps (specified by the user), it saves the whole matrix into a specific NAM subregion. Each timestep that saves a matrix snapshot employs a distinct NAM subregion. These subregions are placed one after the other, consecutively, without overlapping. Thus, the entire NAM region's size is the full matrix size multiplied by the number of times the matrix will be saved.
Even so, we allocate the NAM memory region using the Managed Contiguous layout (`psnam_structure_managed_contiguous`). This means that rank 0 allocates the whole region, but each rank acquires a consecutive memory subset, where it will store its blocks' data for every snapshot. For instance, the NAM allocation will first have the space for storing all snapshots of the blocks from rank 0, followed by the space for all snapshots of the blocks from rank 1, and so on. With that layout, NAM subregions are addressed by the rank they belong to, which simplifies saving and retrieving the snapshots.

… (v20 line 177 / v21 line 180; a sketch of the snapshot-offset computation described above appears after the changeset)

* The '''GNU''' or '''Intel®''' Compiler Collection.

- * The '''ParaStationMPI''' installation supporting '''multi-threading''', featuring the '''libNAM''' integration that allows access to NAM memory regions through MPI RMA windows, and supporting the new request-based `MPI_Win_ifence` function.
+ * The '''!ParaStation MPI''' installation supporting '''multi-threading''', featuring the '''libNAM''' integration that allows access to NAM memory regions through MPI RMA windows, and supporting the new request-based `MPI_Win_ifence` function. The `ifence-nam` branch in the [https://gitlab.version.fz-juelich.de/DEEP-EST/psmpi] repository contains a !ParaStation MPI version supporting these features.

* The '''Task-Aware MPI (TAMPI)''' library, which defines a clean '''interoperability''' mechanism for MPI and OpenMP/!OmpSs-2 tasks. It supports both blocking and non-blocking MPI operations by providing two different interoperability mechanisms. Downloads and more information at [https://github.com/bsc-pm/tampi].

… (v20 line 194 / v21 line 197)

* [https://github.com/bsc-pm]
* [https://github.com/bsc-pm/tampi]
+ * [https://gitlab.version.fz-juelich.de/DEEP-EST/psmpi]
* [https://en.wikipedia.org/wiki/Gauss-Seidel_method]
* [https://gitlab.version.fz-juelich.de/DEEP-EST/ompss-2-benchmarks]
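The changeset above mentions tasks that issue `MPI_Put` operations on the NAM window and a last task that closes the window epoch with the request-based `MPI_Win_ifence`. The C sketch below illustrates how that pattern could look with !OmpSs-2 tasks and TAMPI; it is only an illustration under assumptions, not the benchmark's actual code. The `MPI_Win_ifence(assert, win, request)` signature is assumed from its description as a request-based fence, all function and variable names (including the `put_done` ordering flags) are hypothetical, the window is assumed to use a displacement unit of one byte, an access epoch is assumed to be already open on `nam_win`, and MPI/TAMPI are assumed to be initialized with the required thread and task support. `TAMPI_Iwait` is the TAMPI routine that binds an MPI request to the completion of the calling task.

{{{#!c
#include <stddef.h>
#include <mpi.h>
#include <TAMPI.h>

/* Sketch: each task puts one block of the local matrix into the NAM window;
 * a final task, ordered after all puts through the hypothetical 'put_done'
 * flags, closes the epoch with the request-based fence and lets TAMPI bind
 * the request to its completion. */
void save_snapshot(const double *matrix, int nblocks, int block_elems,
                   MPI_Aint base_disp, int target_rank, MPI_Win nam_win,
                   char *put_done /* one flag per block, used only for task ordering */)
{
    for (int b = 0; b < nblocks; ++b) {
        const double *block = &matrix[(size_t)b * block_elems];

        /* One task per block: the puts may run in parallel inside the epoch. */
        #pragma oss task in(block[0;block_elems]) out(put_done[b])
        MPI_Put(block, block_elems, MPI_DOUBLE, target_rank,
                base_disp + (MPI_Aint)b * block_elems * (MPI_Aint)sizeof(double),
                block_elems, MPI_DOUBLE, nam_win);
    }

    /* The last task depends on every put task and closes the window epoch. */
    #pragma oss task in(put_done[0;nblocks])
    {
        MPI_Request req;
        MPI_Win_ifence(0, nam_win, &req);      /* assumed signature (psmpi ifence-nam branch) */
        TAMPI_Iwait(&req, MPI_STATUS_IGNORE);  /* task completes when the fence completes */
    }
}
}}}

With TAMPI's blocking mode, the last task could call a plain `MPI_Wait` on the request instead; in either case the fence does not block the worker thread that runs the task.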
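The managed-contiguous layout described in the changeset places all snapshots of a rank's blocks consecutively inside that rank's subregion, and subregions are addressed by the owning rank. Under those assumptions (equally sized per-rank blocks, byte displacements relative to the owning rank's subregion, hypothetical helper and parameter names), the displacement arithmetic could look as follows; none of these names belong to the benchmark, libNAM, or !ParaStation MPI.

{{{#!c
#include <stddef.h>
#include <mpi.h>

/* Bytes occupied by one rank's block of the matrix in a single snapshot. */
static inline MPI_Aint block_bytes(size_t rows_per_rank, size_t cols)
{
    return (MPI_Aint)(rows_per_rank * cols * sizeof(double));
}

/* Displacement of snapshot 'snap' inside the owning rank's NAM subregion,
 * assuming displacements are byte offsets relative to that subregion. */
static inline MPI_Aint snapshot_disp(int snap, size_t rows_per_rank, size_t cols)
{
    return (MPI_Aint)snap * block_bytes(rows_per_rank, cols);
}

/* Example: rank 'myrank' saves its block for snapshot 'snap' into its own
 * subregion (an access epoch on 'nam_win' must already be open). */
void put_snapshot(const double *myblock, int myrank, int snap,
                  size_t rows_per_rank, size_t cols, MPI_Win nam_win)
{
    int count = (int)(rows_per_rank * cols);
    MPI_Put(myblock, count, MPI_DOUBLE,
            myrank,                                    /* subregion addressed by owning rank */
            snapshot_disp(snap, rows_per_rank, cols),  /* offset of this snapshot */
            count, MPI_DOUBLE, nam_win);
}
}}}

This matches the sizing rule stated in the page: with `nranks` ranks and `nsnapshots` saved timesteps, the whole NAM region amounts to `nranks * nsnapshots * block_bytes(...)`, i.e. the full matrix size multiplied by the number of snapshots.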