Changes between Version 31 and Version 32 of Public/User_Guide/OmpSs-2


Timestamp:
Jun 12, 2019, 11:46:53 AM (5 years ago)
Author:
Pedro Martinez-Ferror

  • Public/User_Guide/OmpSs-2

    v31 v32  
    11[[Image(OmpSs2_logo_full.png)]]
    22
    3 = Programming with OmpSs-2 =
     3= Programming with !OmpSs-2 =
    44
    55Table of contents:
     
    2020= Quick Overview =
    2121
    22 OmpSs-2 is a programming model composed of a set of directives and library routines that can be used in conjunction with a high-level programming language (such as C, C++ or Fortran) in order to develop concurrent applications. Its name originally comes from two other programming models: **OpenMP** and **StarSs**. The design principles of these two programming models constitute the fundamental ideas used to conceive the OmpSs philosophy.
     22!OmpSs-2 is a programming model composed of a set of directives and library routines that can be used in conjunction with a high-level programming language (such as C, C++ or Fortran) in order to develop concurrent applications. Its name originally comes from two other programming models: **OpenMP** and **!StarSs**. The design principles of these two programming models constitute the fundamental ideas used to conceive the !OmpSs philosophy.
    2323
    2424[[Image(OmpSsOpenMP.png, 30%)]]
    2525
    26 OmpSs-2 **thread-pool** execution model differs from the **fork-join** parallelism implemented in OpenMP.
     26!OmpSs-2 **thread-pool** execution model differs from the **fork-join** parallelism implemented in OpenMP.
    2727
    2828[[Image(pools.png, 30%)]]
     
    3232[[Image(taskGraph.png, 15%)]]
    3333
    34 The reference implementation of OmpSs-2 is based on the **Mercurium** source-to-source compiler and the **Nanos6** runtime library:
     34The reference implementation of !OmpSs-2 is based on the **Mercurium** source-to-source compiler and the **Nanos6** runtime library:
    3535* Mercurium source-to-source compiler provides the necessary support for transforming the high-level directives into a parallelized version of the application.
    3636* Nanos6 runtime library provides services to manage all the parallelism in the user-application, including task creation, synchronization and data movement, as well as support for resource heterogeneity.
     
    3838[[Image(MercuriumNanos.png, 35%)]]
    3939
    40 **Additional information** about the OmpSs-2 programming model can be found at:
    41 * OmpSs-2 official website. [https://pm.bsc.es/ompss-2]
    42 * OmpSs-2 specification. [https://pm.bsc.es/ftp/ompss-2/doc/spec]
    43 * OmpSs-2 user guide. [https://pm.bsc.es/ftp/ompss-2/doc/user-guide]
    44 * OmpSs-2 examples repository. [https://pm.bsc.es/gitlab/ompss-2/examples]
    45 * OmpSs-2 manual with examples and exercises. [https://pm.bsc.es/ftp/ompss-2/doc/examples/index.html]
     40**Additional information** about the !OmpSs-2 programming model can be found at:
     41* !OmpSs-2 official website. [https://pm.bsc.es/ompss-2]
     42* !OmpSs-2 specification. [https://pm.bsc.es/ftp/ompss-2/doc/spec]
     43* !OmpSs-2 user guide. [https://pm.bsc.es/ftp/ompss-2/doc/user-guide]
     44* !OmpSs-2 examples repository. [https://pm.bsc.es/gitlab/ompss-2/examples]
     45* !OmpSs-2 manual with examples and exercises. [https://pm.bsc.es/ftp/ompss-2/doc/examples/index.html]
    4646* Mercurium official website. [https://www.bsc.es/research-and-development/software-and-apps/software-list/mercurium-ccfortran-source-source-compiler Link 1], [https://pm.bsc.es/mcxx Link 2]
    4747* Nanos official website. [https://www.bsc.es/research-and-development/software-and-apps/software-list/nanos-rtl Link 1], [https://pm.bsc.es/nanox Link 2]
     
    5151= Quick Setup on DEEP System =
    5252
    53 We highly recommend logging in to a **cluster module (CM) node** to begin using OmpSs-2.  To request an entire CM node for an interactive session, please execute the following command:
     53We highly recommend logging in to a **cluster module (CM) node** to begin using !OmpSs-2.  To request an entire CM node for an interactive session, please execute the following command:
    5454 `srun --partition=dp-cn --nodes=1 --ntasks=48 --ntasks-per-socket=24  --ntasks-per-node=48 --pty /bin/bash -i`   
    5555
    5656Note that the command above is consistent with the actual hardware configuration of the cluster module with **hyper-threading enabled**.
    5757
    58 OmpSs-2 has already been installed on DEEP and can be used by simply executing the following commands:
     58!OmpSs-2 has already been installed on DEEP and can be used by simply executing the following commands:
    5959* `modulepath="/usr/local/software/skylake/Stages/2018b/modules/all/Core:$modulepath"`
    6060* `modulepath="/usr/local/software/skylake/Stages/2018b/modules/all/Compiler/mpi/intel/2019.0.117-GCC-7.3.0:$modulepath"`
     
    6363* `module load OmpSs-2`
    6464
    65 Remember that OmpSs?-2 uses a **thread-pool** execution model which means that it **permanently uses all the threads** present on the system. Users are strongly encouraged to always check the **system affinity** by running the **NUMA command** `numactl --show`:
     65Remember that !OmpSs-2 uses a **thread-pool** execution model which means that it **permanently uses all the threads** present on the system. Users are strongly encouraged to always check the **system affinity** by running the **NUMA command** `numactl --show`:
    6666{{{
    6767$ numactl --show
     
    8383Notice that both commands return consistent outputs and, even though an entire node with two sockets has been requested, only the first NUMA node (i.e. socket) has been correctly bound.  As a result, only 48 threads of the first socket (0-11, 24-35), of which 24 are physical and 24 logical (hyper-threading enabled), are going to be utilised whilst the other 48 threads available in the second socket will remain idle. Therefore, **the system affinity shown above is not valid since it does not represent the resources requested via SLURM.**
    8484
    85 System affinity can be used to specify, for example, the ratio of MPI and OmpSs-2 processes for a hybrid application and can be modified by user request in different ways:
     85System affinity can be used to specify, for example, the ratio of MPI and !OmpSs-2 processes for a hybrid application and can be modified by user request in different ways:
    8686* Via SLURM. However, if the affinity does not correspond to the resources requested like in the previous example, it should be reported to the system administrators.
    8787* Via the command `numactl`.
     
    9898== System configuration ==
    9999
    100 Please refer to section [#QuickSetuponDEEPSystem Quick Setup on DEEP System] to get a functional version of OmpSs-2 on DEEP. It is also recommended to run OmpSs-2 on a cluster module (CM) node.
     100Please refer to section [#QuickSetuponDEEPSystem Quick Setup on DEEP System] to get a functional version of !OmpSs-2 on DEEP. It is also recommended to run !OmpSs-2 on a cluster module (CM) node.
    101101
    102102== Building and running the examples ==
     
    147147----
    148148
    149 = multisaxpy benchmark (OmpSs-2) =
     149= multisaxpy benchmark (!OmpSs-2) =
    150150
    151151Users must clone/download this example's repository from [https://pm.bsc.es/gitlab/ompss-2/examples/multisaxpy] and transfer it to a DEEP working directory.
     
    221221
    222222
    223 = dot-product benchmark (OmpSs-2) =
     223= dot-product benchmark (!OmpSs-2) =
    224224
    225225Users must clone/download this example's repository from [https://pm.bsc.es/gitlab/ompss-2/examples/dot-product] and transfer it to a DEEP working directory.
     
    247247
    248248
    249 = mergesort benchmark (OmpSs-2) =
     249= mergesort benchmark (!OmpSs-2) =
    250250
    251251Users must clone/download this example's repository from [https://pm.bsc.es/gitlab/ompss-2/examples/mergesort] and transfer it to a DEEP working directory.
     
    274274
    275275
    276 = nqueens benchmark (OmpSs-2) =
     276= nqueens benchmark (!OmpSs-2) =
    277277
    278278Users must clone/download this example's repository from [https://pm.bsc.es/gitlab/ompss-2/examples/nqueens] and transfer it to a DEEP working directory.
     
    305305
    306306
    307 = matmul benchmark (OmpSs-2) =
     307= matmul benchmark (!OmpSs-2) =
    308308
    309309Users must clone/download this example's repository from [https://pm.bsc.es/gitlab/ompss-2/examples/matmul] and transfer it to a DEEP working directory.
     
    336336
    337337
    338 = Cholesky benchmark (OmpSs-2+MKL) =
     338= Cholesky benchmark (!OmpSs-2+MKL) =
    339339
    340340Users must clone/download this example's repository from [https://pm.bsc.es/gitlab/ompss-2/examples/cholesky] and transfer it to a DEEP working directory.
     
    342342== Description ==
    343343
    344 This benchmark is a decomposition of a Hermitian, positive-definite matrix into the product of a lower triangular matrix and its conjugate transpose.  This Cholesky decomposition is carried out with OmpSs-2 using tasks with priorities.
     344This benchmark is a decomposition of a Hermitian, positive-definite matrix into the product of a lower triangular matrix and its conjugate transpose.  This Cholesky decomposition is carried out with !OmpSs-2 using tasks with priorities.
    345345
    346346There are **3 implementations** of this benchmark.
     
    351351The Makefile has three additional rules:
    352352* **run:** runs each version one after the other.
    353 * **run-graph:** runs the OmpSs-2 versions with the graph instrumentation.
    354 * **run-extrae:** runs the OmpSs-2 versions with the extrae instrumentation.
     353* **run-graph:** runs the !OmpSs-2 versions with the graph instrumentation.
     354* **run-extrae:** runs the !OmpSs-2 versions with the extrae instrumentation.
    355355
    356356For the graph instrumentation, it is recommended to view the resulting PDF in single page mode and to advance through the pages. This will show the actual instantiation and execution of the code. For the extrae instrumentation, extrae must be loaded and available at least through the `LD_LIBRARY_PATH` environment variable.
     
    374374
    375375
    376 = nbody benchmark (MPI+OmpSs-2+TAMPI) =
     376= nbody benchmark (MPI+!OmpSs-2+TAMPI) =
    377377
    378378Users must clone/download this example's repository from [https://pm.bsc.es/gitlab/ompss-2/examples/nbody] and transfer it to a DEEP working directory.
     
    387387it is not.
    388388
    389 The interoperability versions (MPI+OmpSs-2+TAMPI) are compiled only if the environment variable `TAMPI_HOME` is set to the Task-Aware MPI (TAMPI) library's installation directory.
     389The interoperability versions (MPI+!OmpSs-2+TAMPI) are compiled only if the environment variable `TAMPI_HOME` is set to the Task-Aware MPI (TAMPI) library's installation directory.
    390390
    391391== Execution instructions ==
     
    397397`mpiexec -n 4 -bind-to hwthread:16 ./nbody -t 100 -p 8192`
    398398
    399 in which the application will perform 100 timesteps in 4 MPI processes with 16 hardware threads in each process (used by the OmpSs-2 runtime). The total number of particles will be 8192 so that each process will have 2048 particles (2 blocks per process).
     399in which the application will perform 100 timesteps in 4 MPI processes with 16 hardware threads in each process (used by the !OmpSs-2 runtime). The total number of particles will be 8192 so that each process will have 2048 particles (2 blocks per process).
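
The per-process particle count quoted above can be checked with a small shell sketch. The helper `particles_per_process` is hypothetical (it is not part of the nbody benchmark) and assumes that the particles are split evenly among the MPI processes:

```shell
#!/bin/sh
# Hypothetical helper, not part of the nbody benchmark.
# Assumption: the total number of particles is divided evenly
# among the MPI processes.
particles_per_process() {
    total=$1   # total number of particles (value passed to -p)
    procs=$2   # number of MPI processes (value passed to mpiexec -n)
    if [ $((total % procs)) -ne 0 ]; then
        echo "error: $total particles do not divide evenly among $procs processes" >&2
        return 1
    fi
    echo $((total / procs))
}

# Values taken from the mpiexec command above: 8192 particles, 4 processes.
particles_per_process 8192 4   # prints 2048
```

With 2048 particles per process and 2 blocks per process, each block would hold 1024 particles; that block size is inferred from the numbers above rather than stated in the text.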
    400400
    401401== References ==
     
    408408
    409409
    410 = heat benchmark (MPI+OmpSs-2+TAMPI) =
     410= heat benchmark (MPI+!OmpSs-2+TAMPI) =
    411411
    412412Users must clone/download this example's repository from [https://pm.bsc.es/gitlab/ompss-2/examples/heat] and transfer it to a DEEP working directory.
     
    421421binaries by executing the command `make`.
    422422
    423 The interoperability versions (MPI+OmpSs-2+TAMPI) are compiled only if the environment variable `TAMPI_HOME` is set to the Task-Aware MPI (TAMPI) library's installation directory.
     423The interoperability versions (MPI+!OmpSs-2+TAMPI) are compiled only if the environment variable `TAMPI_HOME` is set to the Task-Aware MPI (TAMPI) library's installation directory.
    424424
    425425== Execution instructions ==
     
    432432
    433433in which the application will perform 150 timesteps in 4 MPI processes with 16
    434 hardware threads in each process (used by the OmpSs-2 runtime). The size of the
     434hardware threads in each process (used by the !OmpSs-2 runtime). The size of the
    435435matrix in each dimension will be 8192 (8192^2^ elements in total), which means
    436436that each process will have 2048x8192 elements (16 blocks per process).
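
The decomposition described above can be reproduced with a short shell sketch. The helper `heat_local_elements` is hypothetical (it is not part of the heat benchmark) and assumes that the matrix is partitioned by rows, evenly across the MPI processes, which is consistent with the 2048x8192 figure quoted above:

```shell
#!/bin/sh
# Hypothetical helper, not part of the heat benchmark.
# Assumption: the square matrix is split by rows, evenly across
# the MPI processes, so each process keeps full-width rows.
heat_local_elements() {
    size=$1    # matrix size in each dimension
    procs=$2   # number of MPI processes
    rows=$((size / procs))
    # Print the local shape followed by the local element count.
    echo "${rows}x${size} $((rows * size))"
}

# Values taken from the text above: 8192x8192 matrix, 4 processes.
heat_local_elements 8192 4   # prints "2048x8192 16777216"
```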
     
    444444----
    445445
    446 = krist benchmark (OmpSs-2+CUDA) =
     446= krist benchmark (!OmpSs-2+CUDA) =
    447447
    448448Users must clone/download this example's repository from [https://pm.bsc.es/gitlab/ompss-2/examples/krist] and transfer it to a DEEP working directory.