Integrating TAMPI and ParaStationMPI NAM windows

Quick Overview

Heat Benchmark

In this section, we illustrate the use of TAMPI and NAM windows through the Heat benchmark. The benchmark uses an iterative Gauss-Seidel method to solve the heat equation, a parabolic partial differential equation that describes the distribution of heat in a given region over time. It simulates heat diffusion on a 2D matrix of floating-point elements over multiple timesteps. The 2D matrix is logically divided into 2D blocks and may have multiple rows and columns of blocks. The computation of the element at position M[r][c] in timestep t depends on the values of the top and left elements (M[r-1][c] and M[r][c-1]) computed in the current timestep t, and on the right and bottom elements (M[r][c+1] and M[r+1][c]) from the previous timestep t-1. The same logic extends to blocks: a block depends on the computation of its adjacent blocks. Notice that the blocks along a diagonal (blocks whose row and column indices add up to the same value) can be computed fully concurrently because there is no dependency between them.
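The dependency pattern described above can be sketched in a few lines of Python. This is only an illustration, not the benchmark's code: it assumes the usual four-point average stencil, and the function names and the block size parameter bs are hypothetical. The first function sweeps the matrix element by element; the second visits blocks diagonal by diagonal, which is the order in which blocks become ready.

```python
def update_block(m, r0, r1, c0, c1):
    """Update the elements in rows [r0, r1) and columns [c0, c1) in place."""
    for r in range(r0, r1):
        for c in range(c0, c1):
            # Top and left already hold values of the current timestep;
            # right and bottom still hold values of the previous one.
            # The 0.25 average is an assumed stencil, not taken from the source.
            m[r][c] = 0.25 * (m[r - 1][c] + m[r][c - 1]
                              + m[r][c + 1] + m[r + 1][c])


def gauss_seidel_sweep(m):
    """One timestep over the interior of m in plain row-major order."""
    update_block(m, 1, len(m) - 1, 1, len(m[0]) - 1)


def wavefront_sweep(m, bs):
    """One timestep, visiting blocks of size bs diagonal by diagonal."""
    rows, cols = len(m), len(m[0])
    nbr = (rows - 2 + bs - 1) // bs  # rows of blocks in the interior
    nbc = (cols - 2 + bs - 1) // bs  # columns of blocks in the interior
    for d in range(nbr + nbc - 1):
        # All blocks with br + bc == d are independent of each other.
        for br in range(max(0, d - nbc + 1), min(nbr, d + 1)):
            bc = d - br
            r0, c0 = 1 + br * bs, 1 + bc * bs
            update_block(m, r0, min(r0 + bs, rows - 1),
                         c0, min(c0 + bs, cols - 1))
```

Both sweeps produce the same matrix, because every block runs only after its top and left neighbors. In a task-based version, the inner loop over a diagonal is where the parallelism would come from.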

There are three different MPI versions, and all of them distribute the 2D matrix across ranks by assigning consecutive rows of blocks to each MPI rank. Note that the matrix is distributed by blocks vertically but not horizontally. Therefore, an MPI rank has two neighboring ranks: one above and one below. The exceptions are the first and last ranks, which have a single neighbor. This distribution requires neighboring ranks to exchange the external rows (halos) of their boundary blocks in order to compute their local blocks in each timestep.
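As a rough sketch of this distribution, the following Python helper computes which rows of blocks a rank owns and which ranks it exchanges halos with. The function name and the handling of a row count that does not divide evenly are assumptions made for illustration; the benchmark itself may require an even division.

```python
def rank_layout(block_rows, nranks, rank):
    """Return (first_row, end_row, above, below) for one rank.

    Rows of blocks [first_row, end_row) belong to the rank. The neighbors
    above and below are rank numbers, or None for the first and last rank.
    """
    base, extra = divmod(block_rows, nranks)
    # Assumption: the first `extra` ranks take one additional row of blocks.
    first = rank * base + min(rank, extra)
    count = base + (1 if rank < extra else 0)
    above = rank - 1 if rank > 0 else None
    below = rank + 1 if rank < nranks - 1 else None
    return first, first + count, above, below
```

For example, with 8 rows of blocks on 4 ranks, rank_layout(8, 4, 0) returns (0, 2, None, 1): the first rank owns rows 0 and 1 and only exchanges halos with the rank below it.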

This benchmark is publicly available in the https://pm.bsc.es/gitlab/DEEP-EST/apps/Heat repository. The first version is an MPI-only parallelization, while the other two are hybrid MPI+OmpSs-2 versions that leverage tasks and the TAMPI library. We briefly describe each one below:

Requirements

The requirements of this application are shown in the following lists. The main requirements are:

In this benchmark, we use the NAM memory to checkpoint the matrix that we are computing. During the execution of the application, every few timesteps, the benchmark instantiates multiple tasks that save the whole matrix into a NAM region. One task saves the data of each block into that region, and these tasks may run in parallel.
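The checkpointing scheme can be sketched as follows. This is a simplified model under stated assumptions, not the benchmark's implementation: a plain Python list stands in for the NAM region, blocks are assumed to be stored back to back in row-major block order, and every name and the interval parameter are hypothetical. No NAM or TAMPI calls are shown.

```python
def block_offset(br, bc, block_cols, bs):
    """Offset of block (br, bc) in a flat region (assumed layout)."""
    return (br * block_cols + bc) * bs * bs


def save_block(region, blocks, br, bc, bs):
    """Body of one checkpoint task: copy a single block into its own slice."""
    off = block_offset(br, bc, len(blocks[br]), bs)
    flat = [value for row in blocks[br][bc] for value in row]
    region[off:off + bs * bs] = flat


def checkpoint(region, blocks, bs):
    """Save every block; each call writes a disjoint slice of the region."""
    for br, block_row in enumerate(blocks):
        for bc in range(len(block_row)):
            save_block(region, blocks, br, bc, bs)


def run(region, blocks, bs, timesteps, interval):
    """Checkpoint every `interval` timesteps; return the timesteps saved."""
    saved = []
    for t in range(1, timesteps + 1):
        # The computation of timestep t would happen here.
        if t % interval == 0:
            checkpoint(region, blocks, bs)
            saved.append(t)
    return saved
```

Because each block maps to its own slice of the region, the per-block saves do not overlap, which is what allows them to run as independent tasks.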

Building & Executing on DEEP

The instructions to build and execute the Heat benchmark with NAM checkpointing will appear here soon.
