Changes between Version 1 and Version 2 of Public/User_Guide/TAMPI


Timestamp: Jun 17, 2019, 2:49:44 PM
Author: Pedro Martinez-Ferror

[[Image(OmpSs2_logo_full.png, 30%)]]

= Usage of TAMPI =

Table of contents:
* [#QuickOverview Quick Overview]
* [#QuickSetuponDEEPSystemforaHybridMPIOmpSs-2Application Quick Setup on DEEP System for a Hybrid MPI+OmpSs-2 Application]
* [#UsingtheRepositories Using the Repositories]
* Examples:
  * [#AStep-By-StepDetailedGuidetoExecutetheMultisaxpyBenchmarkOmpSs-2 A Step-By-Step Detailed Guide to Execute the Multisaxpy Benchmark (OmpSs-2)]
  * [#Dot-productBenchmarkOmpSs-2 Dot-product Benchmark (OmpSs-2)]
  * [#MergesortBenchmarkOmpSs-2 Mergesort Benchmark (OmpSs-2)]
  * [#NqueensBenchmarkOmpSs-2 Nqueens Benchmark (OmpSs-2)]
  * [#MatmulBenchmarkOmpSs-2 Matmul Benchmark (OmpSs-2)]
  * [#CholeskyBenchmarkOmpSs-2MKL Cholesky Benchmark (OmpSs-2+MKL)]
  * [#NbodyBenchmarkMPI+OmpSs-2TAMPI Nbody Benchmark (MPI+OmpSs-2+TAMPI)]
  * [#HeatBenchmarkMPI+OmpSs-2TAMPI Heat Benchmark (MPI+OmpSs-2+TAMPI)]

----

= Quick Overview =

The **Task-Aware MPI** (TAMPI) library ensures a **deadlock-free** execution of hybrid applications by implementing a cooperation mechanism between the MPI library and the parallel task-based runtime system.

TAMPI extends the functionality of standard MPI libraries by providing new mechanisms that improve the interoperability between parallel task-based programming models, such as **OpenMP** or **!OmpSs-2**, and both **blocking** and **non-blocking** MPI operations.

Presently, OpenMP programs (based on a derivative version of LLVM OpenMP, yet to be released) can only make use of the non-blocking mode of TAMPI, whereas !OmpSs-2 programs can leverage both the blocking and non-blocking modes.

TAMPI is compatible with mainstream MPI implementations that support the **MPI_THREAD_MULTIPLE** threading level, which is the minimum requirement to provide its task-aware features.
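
To illustrate what this means in practice, here is a minimal, hypothetical MPI+!OmpSs-2 ring-exchange sketch (the buffer size and communication pattern are made up for illustration): each blocking MPI call is issued from inside a task, and with TAMPI linked in, a task blocked inside MPI can be paused cooperatively so its core keeps executing other ready tasks. The exact initialization needed to enable TAMPI's blocking mode should be checked in the TAMPI repository; the sketch only requests the standard MPI_THREAD_MULTIPLE level.

{{{
#include <mpi.h>

#define N 1024

int main(int argc, char **argv)
{
    /* TAMPI requires an MPI library initialized with (at least) the
     * MPI_THREAD_MULTIPLE threading level. Enabling TAMPI's blocking mode
     * may additionally require a TAMPI-specific initialization; check the
     * TAMPI repository for the exact procedure. */
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double sendbuf[N], recvbuf[N];
    for (int i = 0; i < N; ++i)
        sendbuf[i] = rank;

    /* Each communication is wrapped in a task with data dependencies.
     * With TAMPI linked in, a task that blocks inside MPI_Send/MPI_Recv
     * is paused so the core can run other ready tasks in the meantime. */
    #pragma oss task in(sendbuf)
    MPI_Send(sendbuf, N, MPI_DOUBLE, (rank + 1) % size, 0, MPI_COMM_WORLD);

    #pragma oss task out(recvbuf)
    MPI_Recv(recvbuf, N, MPI_DOUBLE, (rank + size - 1) % size, 0,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* Tasks that consume recvbuf would declare in(recvbuf) here. */
    #pragma oss taskwait

    MPI_Finalize();
    return 0;
}
}}}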

**Additional information** about TAMPI can be found in:
* The TAMPI repository: [https://github.com/bsc-pm/tampi]

----

= Quick Setup on DEEP System for a Hybrid MPI+!OmpSs-2 Application =

We highly recommend logging in interactively to a **cluster module (CM) node** to begin using TAMPI.

**Presently, it seems that system affinity is not correctly set up for hybrid applications using multi-threading; therefore, multi-threading will be ignored from now on.**

A truly hybrid application should simply execute two MPI ranks, one on each NUMA socket, to mitigate suboptimal memory accesses. Such an application will then use all the cores/threads available on each NUMA socket to run a shared-memory parallel application.

The command below requests an entire CM node for an interactive session with 2 MPI ranks (one per NUMA socket), each rank using the 12 **physical** cores available on its socket (multi-threading disabled):

`srun -p dp-cn -N 1 -n 2 -c 12 --pty /bin/bash -i`
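
Once a hybrid binary is available, the same geometry can be used to launch it directly instead of an interactive shell (`./app` below is just a placeholder for your MPI+!OmpSs-2+TAMPI executable):

`srun -p dp-cn -N 1 -n 2 -c 12 ./app`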

Once you have entered a CM node, you can check the system affinity via the **NUMA command** `srun numactl --show`:
{{{
policy: bind
preferred node: 1
physcpubind: 12 13 14 15 16 17 18 19 20 21 22 23
cpubind: 1
nodebind: 1
membind: 1
policy: bind
preferred node: 0
physcpubind: 0 1 2 3 4 5 6 7 8 9 10 11
cpubind: 0
nodebind: 0
membind: 0
}}}

It can be readily seen that each MPI process is bound to a different socket, with no interleaving of processes or memory, thus yielding optimal performance.
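
Alternatively, Slurm itself can report the binding it applies at launch time via the `--cpu-bind=verbose` option of `srun` (depending on the Slurm version installed, the older spelling `--cpu_bind=verbose` may be required):

`srun --cpu-bind=verbose numactl --show`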

TAMPI has already been installed on DEEP and can be used by simply executing the following commands:

`modulepath="/usr/local/software/skylake/Stages/2018b/modules/all/Core:$modulepath"`

`modulepath="/usr/local/software/skylake/Stages/2018b/modules/all/Compiler/mpi/intel/2019.0.117-GCC-7.3.0:$modulepath"`

`modulepath="/usr/local/software/skylake/Stages/2018b/modules/all/MPI/intel/2019.0.117-GCC-7.3.0/psmpi/5.2.1-1-mt:$modulepath"`

`export MODULEPATH="$modulepath:$MODULEPATH"`

`module load TAMPI`

Note that loading the TAMPI module will automatically load the **!OmpSs-2** and **ParaStation MPI** modules (this MPI library has been compiled with multi-threading support enabled).
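
With these modules loaded, a hybrid MPI+!OmpSs-2+TAMPI source file can be compiled. Treat the line below as a sketch rather than the exact DEEP recipe: it assumes the MPI compiler wrapper accepts `-cc=` to select the !OmpSs-2 Mercurium compiler (`mcc`) as its backend, and that the TAMPI library is linked as `-ltampi`; check `module show TAMPI` for the actual paths and library names.

{{{
# Sketch only: the wrapper option and library name are assumptions.
mpicc -cc=mcc --ompss-2 app.c -o app -ltampi
}}}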

You might want to request more MPI ranks per socket, depending on your particular application. See the examples below and the corresponding system affinity reports (note that all of them ignore multi-threading):

`srun -p dp-cn -N 1 -n 4 -c 6 --pty /bin/bash -i`

{{{
$ srun numactl --show
policy: bind
preferred node: 0
physcpubind: 6 7 8 9 10 11
cpubind: 0
nodebind: 0
membind: 0
policy: bind
preferred node: 0
physcpubind: 0 1 2 3 4 5
cpubind: 0
nodebind: 0
membind: 0
policy: bind
preferred node: 1
physcpubind: 18 19 20 21 22 23
cpubind: 1
nodebind: 1
membind: 1
policy: bind
preferred node: 1
physcpubind: 12 13 14 15 16 17
cpubind: 1
nodebind: 1
membind: 1
}}}

`srun -p dp-cn -N 1 -n 12 -c 2 --pty /bin/bash -i`

{{{
$ srun numactl --show
policy: bind
preferred node: 0
physcpubind: 0 1
cpubind: 0
nodebind: 0
membind: 0
policy: bind
preferred node: 0
physcpubind: 4 5
cpubind: 0
nodebind: 0
membind: 0
policy: bind
preferred node: 0
physcpubind: 2 3
cpubind: 0
nodebind: 0
membind: 0
policy: bind
preferred node: 0
physcpubind: 8 9
cpubind: 0
nodebind: 0
membind: 0
policy: bind
preferred node: 0
physcpubind: 6 7
cpubind: 0
nodebind: 0
membind: 0
policy: bind
preferred node: 0
physcpubind: 10 11
cpubind: 0
nodebind: 0
membind: 0
policy: bind
preferred node: 1
physcpubind: 14 15
cpubind: 1
nodebind: 1
membind: 1
policy: bind
preferred node: 1
physcpubind: 16 17
cpubind: 1
nodebind: 1
membind: 1
policy: bind
preferred node: 1
physcpubind: 20 21
cpubind: 1
nodebind: 1
membind: 1
policy: bind
preferred node: 1
physcpubind: 18 19
cpubind: 1
nodebind: 1
membind: 1
policy: bind
preferred node: 1
physcpubind: 22 23
cpubind: 1
nodebind: 1
membind: 1
policy: bind
preferred node: 1
physcpubind: 12 13
cpubind: 1
nodebind: 1
membind: 1
}}}
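
For non-interactive runs, the same setup can be placed in a batch script. The sketch below is only an illustration: the binary name `./app` is a placeholder and the wall time is arbitrary, while the partition, task geometry, and module commands are the ones used throughout this guide.

{{{
#!/bin/bash
#SBATCH --partition=dp-cn
#SBATCH --nodes=1
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=12
#SBATCH --time=00:10:00

# Make the TAMPI module visible (same paths as in the interactive setup above).
modulepath="/usr/local/software/skylake/Stages/2018b/modules/all/Core:$modulepath"
modulepath="/usr/local/software/skylake/Stages/2018b/modules/all/Compiler/mpi/intel/2019.0.117-GCC-7.3.0:$modulepath"
modulepath="/usr/local/software/skylake/Stages/2018b/modules/all/MPI/intel/2019.0.117-GCC-7.3.0/psmpi/5.2.1-1-mt:$modulepath"
export MODULEPATH="$modulepath:$MODULEPATH"
module load TAMPI

# Placeholder: replace with your hybrid MPI+OmpSs-2+TAMPI executable.
srun ./app
}}}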

----