Changes between Version 4 and Version 5 of Public/User_Guide/Offloading_hybrid_apps


Timestamp:
Sep 17, 2019, 12:59:05 PM (5 years ago)
Author:
Kevin Sala
  • Public/User_Guide/Offloading_hybrid_apps

    v4 v5  
    51 51  * '''Task-Aware MPI (TAMPI)''': The Task-Aware MPI library provides the interoperability mechanism
    52 52    for MPI and OpenMP/!OmpSs-2. Downloads and more information at https://github.com/bsc-pm/tampi.
     53
     54=== Versions ===
     55
     56The NBody application has several versions, each compiled into a separate binary
     57by running the `make` command. All of them divide the particle space into smaller
     58blocks. MPI processes are divided into two groups: GPU processes and CPU processes.
     59GPU processes compute the forces between each pair of particle blocks and then
     60send these forces to the CPU processes, where each process
     61updates its particle blocks using the received forces. The particle and force blocks
     62are distributed equally among the MPI processes in each group. Thus, each MPI process
     63is in charge of computing the forces or updating the particles of a consecutive chunk
     64of blocks.
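
The block distribution described above can be sketched in C as follows. This is a minimal illustration, not code taken from the NBody sources: the type `block_range_t` and the function `owned_blocks` are hypothetical names, and the sketch assumes the number of blocks is an exact multiple of the number of processes in the group.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: half-open range [begin, end) of block indices. */
typedef struct {
    size_t begin;
    size_t end;
} block_range_t;

/* Returns the consecutive chunk of blocks assigned to `rank` within a group
 * of `group_size` MPI processes (either the GPU group or the CPU group).
 * Assumption: num_blocks is a multiple of group_size, so every process in
 * the group receives the same number of blocks. */
block_range_t owned_blocks(size_t num_blocks, size_t group_size, size_t rank)
{
    assert(group_size > 0 && rank < group_size);
    assert(num_blocks % group_size == 0);

    size_t per_rank = num_blocks / group_size;
    block_range_t range = { rank * per_rank, (rank + 1) * per_rank };
    return range;
}
```

For instance, with 16 blocks and a group of 4 processes, the process with group rank 2 would own blocks 8 to 11.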
     65
     66The available versions are:
     67
     68  * '''nbody.mpi.${BS}bs.bin''': Parallel version using MPI.
     69  * '''nbody.mpi.ompss2.${BS}bs.bin''': Parallel version using MPI + !OmpSs-2 tasks. Both computation and
     70    communication phases are taskified. However, communication tasks are serialized by declaring an
     71    artificial dependency on a sentinel variable. This prevents deadlocks between processes,
     72    since communication tasks perform blocking MPI calls.
     73  * '''nbody.mpi.ompss2.cuda.${BS}bs.bin''': The same as the previous version but using CUDA tasks to
     74    execute the most compute-intensive parts of the application on the available GPUs.
     75  * '''nbody.tampi.ompss2.${BS}bs.bin''': Parallel version using MPI + !OmpSs-2 tasks + TAMPI library. This
     76    version disables the artificial dependencies on the sentinel variable, so communication tasks can
     77    run in parallel. The TAMPI library manages the blocking MPI calls so that they do not
     78    block the underlying execution resources.
     79  * '''nbody.tampi.ompss2.cuda.${BS}bs.bin''': The same as the previous version but using CUDA tasks to
     80    execute the most compute-intensive parts of the application on the available GPUs.
     81  * '''nbody.mpi.omp.${BS}bs.bin''':
     82  * '''nbody.mpi.omptarget.${BS}bs.bin''':
     83  * '''nbody.tampi.omp.${BS}bs.bin''':
     84  * '''nbody.tampi.omptarget.${BS}bs.bin''':
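
The sentinel technique mentioned for the MPI + !OmpSs-2 versions can be illustrated with the following C sketch. It is not taken from the NBody sources: `blocking_send` is a hypothetical stand-in for a blocking MPI call (it only records the call order), and `send_all_blocks`, `send_order` and `NUM_BLOCKS` are names made up for this example. A compiler without !OmpSs-2 support ignores the pragmas and runs the loop sequentially.

```c
#include <stddef.h>

#define NUM_BLOCKS 4

/* Records the order in which the (simulated) sends were issued. */
int send_order[NUM_BLOCKS];
size_t sends_done = 0;

/* Hypothetical stand-in for a blocking MPI call such as MPI_Send. */
void blocking_send(int block_id)
{
    send_order[sends_done++] = block_id;
}

void send_all_blocks(const double *forces)
{
    /* Artificial dependency target: its value is never used. */
    int sentinel = 0;

    for (int b = 0; b < NUM_BLOCKS; ++b) {
        /* in(forces[b]): the task waits for the block's data to be ready.
         * inout(sentinel): every communication task depends on the previous
         * one, so the tasks execute one at a time, in creation order. */
        #pragma oss task in(forces[b]) inout(sentinel)
        blocking_send(b);
    }

    /* Wait for all tasks before `sentinel` goes out of scope. */
    #pragma oss taskwait
}
```

In the TAMPI versions described above, the artificial dependency on the sentinel is disabled, which in this sketch would correspond to dropping the `inout(sentinel)` clause so that communication tasks may run concurrently.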