When there is a timestep that requires a snapshot, the application instantiates multiple tasks that save the matrix data into the corresponding NAM subregion. Each MPI rank creates one task per matrix block that saves the block's data into the NAM subregion. These communication tasks have no data dependencies between them, so they can run in parallel, writing data to the NAM region with regular `MPI_Put` calls. Each rank writes only to its own subregion, never to the subregions of other ranks. Even so, all `MPI_Put` calls must be issued inside an RMA access epoch, so for each timestep with a snapshot there must be one fence call before all the `MPI_Put` calls and another one after them to close the epoch. This is where we use the new function `MPI_Win_ifence` together with the TAMPI non-blocking support. In this way, we taskify both the synchronization and the writing of the NAM regions, keeping the data-flow model without having to stall the parallelism (e.g., with a `taskwait`) to perform the snapshots. Thanks to the task data dependencies and TAMPI, the snapshots are cleanly included in the application's data-flow execution like any other regular task.

The following pseudo-code shows how the saving of snapshots works in `02.heat_itampi_ompss2_tasks.bin`:

{{{#!c
void solve() {
    int namSnapshotFreq = ...;
    int namSnapshotId = 0;

    for (int t = 1; t <= timesteps; ++t) {
        // Computation and communication tasks declaring
        // dependencies on the blocks they process
        gaussSeidelSolver(...);

        // Periodically spawn the tasks that save a snapshot to NAM
        if (t % namSnapshotFreq == 0) {
            namSaveMatrix(namSnapshotId, namWindow, ...);
            ++namSnapshotId;
        }
    }
    // Wait for all computation, communication and snapshot tasks
    #pragma oss taskwait
}
}}}

{{{#!c
void namSaveMatrix(int namSnapshotId, MPI_Win namWindow, ...) {
    // Compute the destination offset in the NAM region
    int snapshotOffset = namSnapshotId*sizeof(..all blocks..);

    // Open the RMA access epoch that writes the NAM window for this
    // timestep. TAMPI_Iwait binds the completion of this task to the
    // completion of the request, so the task never blocks inside MPI
    #pragma oss task in(..all blocks..) inout(namWindow)
    {
        MPI_Request request;
        MPI_Win_ifence(namWindow, 0, &request);
        TAMPI_Iwait(&request, MPI_STATUS_IGNORE);
    }

    // Write all blocks of the current rank to its NAM subregion. The
    // in() dependency on the window lets these tasks run concurrently,
    // while the inout() dependencies of the fence tasks keep them
    // inside the epoch
    for (B : all blocks) {
        #pragma oss task in(..block B..) in(namWindow)
        {
            MPI_Put(/* origin */ ..block B..,
                    /* target rank */ currentRank,
                    /* target offset */ snapshotOffset + B,
                    /* target window */ namWindow);
        }
    }

    // Close the RMA access epoch that writes the NAM window for this timestep
    #pragma oss task in(..all blocks..) inout(namWindow)
    {
        MPI_Request request;
        MPI_Win_ifence(namWindow, 0, &request);
        TAMPI_Iwait(&request, MPI_STATUS_IGNORE);
    }
}
}}}

For comparison, a minimal sketch of the same fence/put/fence pattern written with standard blocking MPI calls, without tasks, is included at the end of this section.

=== Requirements ===
The requirements of this application are shown in the following lists. The main requirements are:
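The sketch below shows, for comparison, the same snapshot epoch written with the standard blocking `MPI_Win_fence` instead of `MPI_Win_ifence` plus TAMPI. This is the pattern that the taskified version above replaces: both fences block the calling rank, so the snapshot cannot overlap with computation. It is only an illustration, not code from the application: the function name `namSaveMatrixBlocking`, the buffer layout (`blocks`, `numBlocks`, `blockSize`), and the assumption that the window uses a displacement unit of `sizeof(double)` are all hypothetical.

{{{#!c
#include <mpi.h>

// Hypothetical layout: blocks[] holds the rank's matrix blocks, each with
// blockSize doubles; namWindow is an existing window backed by NAM memory
void namSaveMatrixBlocking(int namSnapshotId, MPI_Win namWindow,
                           double **blocks, int numBlocks, int blockSize,
                           int currentRank) {
    // Destination offset of this snapshot inside the rank's NAM subregion
    MPI_Aint snapshotOffset = (MPI_Aint)namSnapshotId * numBlocks * blockSize;

    // Open the RMA access epoch; unlike MPI_Win_ifence, this blocks the rank
    MPI_Win_fence(0, namWindow);

    // Each rank writes only to its own subregion of the NAM window
    for (int b = 0; b < numBlocks; ++b) {
        MPI_Put(blocks[b], blockSize, MPI_DOUBLE,
                currentRank, snapshotOffset + (MPI_Aint)b * blockSize,
                blockSize, MPI_DOUBLE, namWindow);
    }

    // Close the epoch; all puts are guaranteed complete after this returns
    MPI_Win_fence(0, namWindow);
}
}}}

With this blocking variant, every rank stalls at both fences at every snapshot timestep; the taskified version above instead turns the two fences and the puts into tasks ordered only through their dependencies on the window, so other ready tasks keep running while the snapshot is in flight.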