Changes between Version 10 and Version 11 of Public/User_Guide/TAMPI_NAM


Timestamp: Mar 3, 2021, 9:47:43 PM
Author: Kevin Sala
• Public/User_Guide/TAMPI_NAM (v10 → v11)

Removed (v10):

The function above is the one called periodically from the main function. This function instantiates the tasks that will perform the snapshot of the current rank's blocks into the corresponding NAM memory subregion. The first step is to compute the offset of the current snapshot inside the NAM region using the snapshot identifier. Before writing to the NAM window, the application must ensure that a MPI RMA access epoch has been opened in that window. That is what the first task is doing. After all blocks are ready to be read (see the task dependencies), the task can run and execute an `MPI_Win_ifence` to start the opening of the epoch generating an MPI request, and after that, the task binds its completion to the finalization of the request by calling `TAMPI_Iwait`. This last call is non-blocking and asynchronous, so the fence operation may not be completed after returning. The task can finish its execution but it will not complete until the fence operation finishes. Once it finishes, TAMPI will automatically complete the task and make the successor tasks ready. The successor tasks of the fence task are the ones that perform the actual writing of data to the NAM memory calling `MPI_Put`. All blocks can be saved in the NAM memory in parallel through different tasks. The source of the `MPI_Put` is the block itself (in regular RAM memory) and the destination is the place where the block must be written in the NAM memory. After all writer tasks have finished, the task responsible for closing the MPI RMA access epoch in the NAM window will be able to start. This one will behave similarly to the opening task.

Added (v11):

The function above is the one called periodically from the primary procedure. It instantiates the tasks that will perform the snapshot of the current rank's blocks into their corresponding NAM memory subregions. The first step is to compute the offset of the current snapshot inside the NAM region using the snapshot identifier. Before writing to the NAM window, the application must ensure that an MPI RMA access epoch has been opened on that window. That is what the first task is doing. After all the blocks have been computed in that timestep and are ready to be read (notice the task's dependencies), the first task will run and execute an `MPI_Win_ifence` to start the window epoch's opening. This MPI function generates an MPI request that serves as a parameter to the subsequent call to `TAMPI_Iwait`, which binds the current task's completion to the finalization of the MPI request. This last call is non-blocking and asynchronous, so the fence operation may not be completed after returning. The task can finish its execution, but it will not complete until the fence operation finishes. Once it finishes, TAMPI will automatically complete the task and make its successor tasks ready. The successors of the fence task are the ones that perform the actual writing (copying) of data into the NAM memory by calling `MPI_Put`. All blocks can be saved in the NAM memory in parallel by different tasks. The source of the `MPI_Put` is the block itself (in regular RAM), while the destination is the place where the block should be written inside the NAM region. After all writer tasks finish, it is the turn of the task that closes the MPI RMA access epoch on the NAM window. This one should behave similarly to the one that opened the epoch.
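
For reference, the epoch-opening idiom described in the v11 paragraph could be sketched as follows. This is a minimal sketch, not code from the guide: it assumes OmpSs-2 tasking syntax, hypothetical names (`NB`, `BS`, `blocks`, `nam_win`, `open_snapshot_epoch`), and that the non-blocking `MPI_Win_ifence` extension takes `(assert, win, request)` parameters.

{{{#!c
#include <mpi.h>
#include <TAMPI.h>

#define NB 8      /* number of blocks per rank (hypothetical) */
#define BS 1024   /* elements per block (hypothetical) */

MPI_Win nam_win;        /* window exposing the NAM region */
double blocks[NB][BS];  /* this rank's blocks in regular RAM */

void open_snapshot_epoch(void)
{
    /* Becomes ready only once every block can be read; takes the
     * window inout so the writer tasks wait for the open epoch. */
    #pragma oss task in(blocks) inout(nam_win) label("open epoch")
    {
        MPI_Request request;
        /* Start opening the access epoch on the NAM window; this
         * non-blocking fence returns an MPI request immediately. */
        MPI_Win_ifence(0, nam_win, &request);
        /* Bind the task's completion to the request: the body returns
         * right away, but the task does not complete (nor release its
         * successors) until the fence operation finishes. */
        TAMPI_Iwait(&request, MPI_STATUS_IGNORE);
    }
}
}}}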
Removed (v10):

Notice that all tasks declare the proper dependencies on the matrix blocks and the NAM window to guarantee the correct order of execution. Thanks to these data dependencies and the TAMPI non-blocking feature, we can cleanly add the execution of the snapshots into the task graph, to be executed asynchronously, and being naturally interleaved with the other computation and communication tasks. Finally, it is worth noting that the writing of blocks to the NAM memory is done in parallel, trying to efficiently utilize the CPU and network resources of the machine.

Added (v11):

Notice that all tasks declare the proper dependencies on both the matrix blocks and the NAM window to guarantee their correct execution order. Thanks to these data dependencies and the TAMPI non-blocking feature, we can cleanly add the execution of the snapshots into the task graph, executed asynchronously and naturally interleaved with the other computation and communication tasks. Finally, it is worth noting that the blocks are written into the NAM memory in parallel, utilizing the machine's CPU and network resources efficiently.
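
The writer tasks and the epoch-closing task could then be sketched as below, continuing the assumptions of the previous snippet. The `concurrent` clause and the `disp_unit == sizeof(double)` displacement convention are additional assumptions of this sketch, and `snapshot_offset` and `target_rank` are hypothetical stand-ins for the displacement computed from the snapshot identifier and the target of the put.

{{{#!c
void write_snapshot(MPI_Aint snapshot_offset, int target_rank)
{
    for (int b = 0; b < NB; ++b) {
        /* One writer task per block: concurrent() lets the puts run in
         * parallel with each other while still being ordered after the
         * opening fence and before the closing fence. */
        #pragma oss task in(blocks[b]) concurrent(nam_win) label("put block")
        {
            /* Element displacement, assuming disp_unit == sizeof(double). */
            MPI_Aint disp = snapshot_offset + (MPI_Aint)b * BS;
            /* Copy the block from regular RAM into its NAM subregion. */
            MPI_Put(blocks[b], BS, MPI_DOUBLE, target_rank,
                    disp, BS, MPI_DOUBLE, nam_win);
        }
    }

    /* Becomes ready once every writer task has completed; closes the
     * access epoch with the same ifence + TAMPI_Iwait idiom. */
    #pragma oss task inout(nam_win) label("close epoch")
    {
        MPI_Request request;
        MPI_Win_ifence(0, nam_win, &request);
        TAMPI_Iwait(&request, MPI_STATUS_IGNORE);
    }
}
}}}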