| 53 | |
| 54 | === Versions === |
| 55 | |
| 56 | The NBody application has several versions, each compiled into a separate binary |
| 57 | by running the `make` command. All of them divide the particle space into smaller |
| 58 | blocks. MPI processes are divided into two groups: GPU processes and CPU processes. |
| 59 | GPU processes compute the forces between each pair of particle blocks and send |
| 60 | these forces to the CPU processes, where each process updates its particle |
| 61 | blocks using the received forces. Within each group, the particle and force blocks |
| 62 | are distributed equally among the MPI processes. Thus, each MPI process |
| 63 | is in charge of computing the forces or updating the particles of a consecutive chunk |
| 64 | of blocks. |
| 65 | |
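The equal distribution of consecutive blocks described above can be sketched as follows. This is only an illustration, not code taken from the NBody sources: the type `block_range_t`, the function `owned_blocks` and its parameters are hypothetical names, and the sketch assumes that the number of blocks is a multiple of the number of processes in the group.

```c
#include <assert.h>

/* Hypothetical type (not from the NBody sources): the consecutive chunk of
 * blocks owned by one MPI process within its group (GPU or CPU). */
typedef struct {
    int first; /* index of the first owned block */
    int count; /* number of owned blocks */
} block_range_t;

/* Hypothetical helper: returns the chunk of blocks owned by the process with
 * rank group_rank inside a group of group_size processes.
 * Assumption: total_blocks is a multiple of group_size, so every process
 * receives the same number of blocks. */
static block_range_t owned_blocks(int total_blocks, int group_size, int group_rank)
{
    assert(group_size > 0);
    assert(total_blocks % group_size == 0);
    assert(group_rank >= 0 && group_rank < group_size);

    block_range_t range;
    range.count = total_blocks / group_size;  /* equal share per process */
    range.first = group_rank * range.count;   /* chunks are consecutive */
    return range;
}
```

Under these assumptions, with 16 blocks and a group of 4 processes, the process with rank 2 in its group would own blocks 8 through 11.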
| 66 | The available versions are: |
| 67 | |
| 68 | * '''nbody.mpi.${BS}bs.bin''': Parallel version using MPI. |
| 69 | * '''nbody.mpi.ompss2.${BS}bs.bin''': Parallel version using MPI + !OmpSs-2 tasks. Both computation and |
| 70 | communication phases are taskified; however, communication tasks are serialized by declaring an |
| 71 | artificial dependency on a sentinel variable. This prevents deadlocks between processes, |
| 72 | since communication tasks perform blocking MPI calls. |
| 73 | * '''nbody.mpi.ompss2.cuda.${BS}bs.bin''': The same as the previous version but using CUDA tasks to |
| 74 | execute the most compute-intensive parts of the application on the available GPUs. |
| 75 | * '''nbody.tampi.ompss2.${BS}bs.bin''': Parallel version using MPI + !OmpSs-2 tasks + TAMPI library. This |
| 76 | version disables the artificial dependencies on the sentinel variable, so communication tasks can |
| 77 | run in parallel. The TAMPI library manages the blocking MPI calls so that they do not |
| 78 | block the underlying execution resources. |
| 79 | * '''nbody.tampi.ompss2.cuda.${BS}bs.bin''': The same as the previous version but using CUDA tasks to |
| 80 | execute the most compute-intensive parts of the application on the available GPUs. |
| 81 | * '''nbody.mpi.omp.${BS}bs.bin''': |
| 82 | * '''nbody.mpi.omptarget.${BS}bs.bin''': |
| 83 | * '''nbody.tampi.omp.${BS}bs.bin''': |
| 84 | * '''nbody.tampi.omptarget.${BS}bs.bin''': |