Changes between Version 31 and Version 32 of Public/User_Guide/OmpSs-2
Timestamp: Jun 12, 2019, 11:46:53 AM
Public/User_Guide/OmpSs-2
[[Image(OmpSs2_logo_full.png)]]

= Programming with !OmpSs-2 =

Table of contents:

…

= Quick Overview =

!OmpSs-2 is a programming model composed of a set of directives and library routines that can be used in conjunction with a high-level programming language (such as C, C++ or Fortran) in order to develop concurrent applications. Its name originally comes from two other programming models: **OpenMP** and **!StarSs**. The design principles of these two programming models constitute the fundamental ideas used to conceive the !OmpSs philosophy.

[[Image(OmpSsOpenMP.png, 30%)]]

The !OmpSs-2 **thread-pool** execution model differs from the **fork-join** parallelism implemented in OpenMP.

[[Image(pools.png, 30%)]]

…

[[Image(taskGraph.png, 15%)]]

The reference implementation of !OmpSs-2 is based on the **Mercurium** source-to-source compiler and the **Nanos6** runtime library:
* The Mercurium source-to-source compiler provides the necessary support for transforming the high-level directives into a parallelized version of the application.
* The Nanos6 runtime library provides services to manage all the parallelism in the user application, including task creation, synchronization and data movement, as well as support for resource heterogeneity.

[[Image(MercuriumNanos.png, 35%)]]
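To give a concrete flavour of the directives described above, the following is a minimal, illustrative sketch (it is **not** taken from the official examples repository): each block update becomes a task, and the `in`/`inout` clauses let Nanos6 derive the task dependence graph. The `oss` sentinel, the dependence clauses and the array-section syntax `x[start;size]` follow the !OmpSs-2 specification.

{{{
/* Minimal OmpSs-2 tasking sketch (illustrative, not part of the official examples). */
#include <stdio.h>
#include <stdlib.h>

#define N  (1L << 20)
#define BS (1L << 16)

static void saxpy_block(double a, const double *x, double *y, long n)
{
    for (long i = 0; i < n; i++)
        y[i] += a * x[i];
}

int main(void)
{
    double *x = malloc(N * sizeof(double));
    double *y = malloc(N * sizeof(double));
    for (long i = 0; i < N; i++) { x[i] = 1.0; y[i] = 2.0; }

    /* One task per block; the in()/inout() clauses declare the data each
       task reads/writes so that Nanos6 can order the tasks accordingly. */
    for (long b = 0; b < N; b += BS) {
        #pragma oss task in(x[b;BS]) inout(y[b;BS])
        saxpy_block(0.5, &x[b], &y[b], BS);
    }

    /* Wait for all tasks created above before using the result. */
    #pragma oss taskwait

    printf("y[0] = %f\n", y[0]);
    free(x);
    free(y);
    return 0;
}
}}}

Once the `OmpSs-2` module is loaded, a file like this is typically compiled with the Mercurium driver, for instance `mcc --ompss-2 saxpy.c -o saxpy` (see the !OmpSs-2 user guide linked below for the exact flags).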
**Additional information** about the !OmpSs-2 programming model can be found at:
* !OmpSs-2 official website. [https://pm.bsc.es/ompss-2]
* !OmpSs-2 specification. [https://pm.bsc.es/ftp/ompss-2/doc/spec]
* !OmpSs-2 user guide. [https://pm.bsc.es/ftp/ompss-2/doc/user-guide]
* !OmpSs-2 examples repository. [https://pm.bsc.es/gitlab/ompss-2/examples]
* !OmpSs-2 manual with examples and exercises. [https://pm.bsc.es/ftp/ompss-2/doc/examples/index.html]
* Mercurium official website. [https://www.bsc.es/research-and-development/software-and-apps/software-list/mercurium-ccfortran-source-source-compiler Link 1], [https://pm.bsc.es/mcxx Link 2]
* Nanos official website. [https://www.bsc.es/research-and-development/software-and-apps/software-list/nanos-rtl Link 1], [https://pm.bsc.es/nanox Link 2]

…

= Quick Setup on DEEP System =

We highly recommend logging in to a **cluster module (CM) node** to begin using !OmpSs-2. To request an entire CM node for an interactive session, please execute the following command:

`srun --partition=dp-cn --nodes=1 --ntasks=48 --ntasks-per-socket=24 --ntasks-per-node=48 --pty /bin/bash -i`

Note that the command above is consistent with the actual hardware configuration of the cluster module with **hyper-threading enabled**.

!OmpSs-2 has already been installed on DEEP and can be used by simply executing the following commands:
* `modulepath="/usr/local/software/skylake/Stages/2018b/modules/all/Core:$modulepath"`
* `modulepath="/usr/local/software/skylake/Stages/2018b/modules/all/Compiler/mpi/intel/2019.0.117-GCC-7.3.0:$modulepath"`
…
* `module load OmpSs-2`

Remember that !OmpSs-2 uses a **thread-pool** execution model, which means that it **permanently uses all the threads** present on the system. Users are strongly encouraged to always check the **system affinity** by running the **NUMA command** `numactl --show`:
{{{
$ numactl --show
…
}}}

…

Notice that both commands return consistent outputs and, even though an entire node with two sockets has been requested, only the first NUMA node (i.e. socket) has been correctly bound. As a result, only the 24 threads of the first socket (0-11, 24-35), of which 12 are physical and 12 are logical (hyper-threading enabled), are going to be utilised, whilst the 24 threads available in the second socket will remain idle. Therefore, **the system affinity shown above is not valid since it does not represent the resources requested via SLURM.**

System affinity can be used to specify, for example, the ratio of MPI and !OmpSs-2 processes for a hybrid application, and it can be modified by the user in different ways:
* Via SLURM. However, if the affinity does not correspond to the requested resources, as in the previous example, it should be reported to the system administrators.
* Via the command `numactl`.
…
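Besides `numactl`, the binding that SLURM actually grants can also be double-checked from inside a program. The short C sketch below is only an illustration based on the standard Linux `sched_getaffinity` interface (it is not part of !OmpSs-2); its output should match the `physcpubind` line reported by `numactl --show`.

{{{
/* Print the hardware threads this process may run on (Linux).
   The list should match the "physcpubind" line of `numactl --show`. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t mask;
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0) {
        perror("sched_getaffinity");
        return 1;
    }

    printf("process may run on %d hardware threads:", CPU_COUNT(&mask));
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &mask))
            printf(" %d", cpu);
    printf("\n");
    return 0;
}
}}}

Compiling this with `gcc` and running it within the same `srun` session is a quick way to confirm that the affinity really corresponds to the resources requested via SLURM.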
== System configuration ==

Please refer to section [#QuickSetuponDEEPSystem Quick Setup on DEEP System] to get a functional version of !OmpSs-2 on DEEP. It is also recommended to run !OmpSs-2 on a cluster module (CM) node.

== Building and running the examples ==

…

----

= multisaxpy benchmark (!OmpSs-2) =

Users must clone/download this example's repository from [https://pm.bsc.es/gitlab/ompss-2/examples/multisaxpy] and transfer it to a DEEP working directory.

…

= dot-product benchmark (!OmpSs-2) =

Users must clone/download this example's repository from [https://pm.bsc.es/gitlab/ompss-2/examples/dot-product] and transfer it to a DEEP working directory.

…

= mergesort benchmark (!OmpSs-2) =

Users must clone/download this example's repository from [https://pm.bsc.es/gitlab/ompss-2/examples/mergesort] and transfer it to a DEEP working directory.

…

= nqueens benchmark (!OmpSs-2) =

Users must clone/download this example's repository from [https://pm.bsc.es/gitlab/ompss-2/examples/nqueens] and transfer it to a DEEP working directory.

…

= matmul benchmark (!OmpSs-2) =

Users must clone/download this example's repository from [https://pm.bsc.es/gitlab/ompss-2/examples/matmul] and transfer it to a DEEP working directory.

…

= Cholesky benchmark (!OmpSs-2+MKL) =

Users must clone/download this example's repository from [https://pm.bsc.es/gitlab/ompss-2/examples/cholesky] and transfer it to a DEEP working directory.

== Description ==

This benchmark is a decomposition of a Hermitian, positive-definite matrix into the product of a lower triangular matrix and its conjugate transpose. This Cholesky decomposition is carried out with !OmpSs-2 using tasks with priorities.

There are **3 implementations** of this benchmark.

…

The Makefile has three additional rules:
* **run:** runs each version one after the other.
* **run-graph:** runs the !OmpSs-2 versions with the graph instrumentation.
* **run-extrae:** runs the !OmpSs-2 versions with the extrae instrumentation.

For the graph instrumentation, it is recommended to view the resulting PDF in single page mode and to advance through the pages. This will show the actual instantiation and execution of the code. For the extrae instrumentation, extrae must be loaded and available at least through the `LD_LIBRARY_PATH` environment variable.
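The "tasks with priorities" mentioned in the description correspond to the `priority` clause of the task construct. The sketch below is a simplified, hypothetical illustration of the idea (the clause names follow the !OmpSs-2 specification, but this is not the actual benchmark code): higher priority values hint the Nanos6 scheduler to pick those tasks first among the ones that are ready to run, which factorization codes use to favour tasks on the critical path.

{{{
/* Sketch of OmpSs-2 task priorities (illustrative only, not the Cholesky code). */
#include <stdio.h>

#define NT 16

static void work(int *slot, int value)
{
    *slot = value * value;   /* stand-in for real computation */
}

int main(void)
{
    int data[NT];

    for (int i = 0; i < NT; i++) {
        /* Hypothetical choice: treat lower indices as more urgent. */
        #pragma oss task out(data[i]) priority(NT - i)
        work(&data[i], i);
    }

    /* Reduction task that depends on every element; give it the highest
       priority so it starts as soon as its inputs are ready. */
    #pragma oss task in(data[0;NT]) priority(NT + 1)
    {
        long sum = 0;
        for (int i = 0; i < NT; i++) sum += data[i];
        printf("sum = %ld\n", sum);
    }

    #pragma oss taskwait
    return 0;
}
}}}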
…

= nbody benchmark (MPI+!OmpSs-2+TAMPI) =

Users must clone/download this example's repository from [https://pm.bsc.es/gitlab/ompss-2/examples/nbody] and transfer it to a DEEP working directory.

… it is not.

The interoperability versions (MPI+!OmpSs-2+TAMPI) are compiled only if the environment variable `TAMPI_HOME` is set to the Task-Aware MPI (TAMPI) library's installation directory.

== Execution instructions ==

…

`mpiexec -n 4 -bind-to hwthread:16 ./nbody -t 100 -p 8192`

in which the application will perform 100 timesteps in 4 MPI processes with 16 hardware threads in each process (used by the !OmpSs-2 runtime). The total number of particles will be 8192, so that each process will have 2048 particles (2 blocks per process).

== References ==

…

= heat benchmark (MPI+!OmpSs-2+TAMPI) =

Users must clone/download this example's repository from [https://pm.bsc.es/gitlab/ompss-2/examples/heat] and transfer it to a DEEP working directory.

… binaries by executing the command `make`.

The interoperability versions (MPI+!OmpSs-2+TAMPI) are compiled only if the environment variable `TAMPI_HOME` is set to the Task-Aware MPI (TAMPI) library's installation directory.

== Execution instructions ==

…

in which the application will perform 150 timesteps in 4 MPI processes with 16 hardware threads in each process (used by the !OmpSs-2 runtime). The size of the matrix in each dimension will be 8192 (8192^2^ elements in total), which means that each process will have 2048x8192 elements (16 blocks per process).

…

----

= krist benchmark (!OmpSs-2+CUDA) =

Users must clone/download this example's repository from [https://pm.bsc.es/gitlab/ompss-2/examples/krist] and transfer it to a DEEP working directory.
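Coming back to the hybrid MPI+!OmpSs-2+TAMPI versions above (nbody and heat): the pattern they rely on is to express communication as tasks and let TAMPI make blocking MPI calls issued from inside tasks task-aware. The sketch below is only a schematic illustration of that pattern, assuming TAMPI's `MPI_TASK_MULTIPLE` threading level and its `TAMPI.h` header; the real decomposition and exchange logic live in the examples' sources.

{{{
/* Schematic MPI+OmpSs-2+TAMPI exchange (illustration, not the nbody/heat code).
   Assumption: TAMPI extends MPI_Init_thread with the MPI_TASK_MULTIPLE level,
   so blocking MPI calls inside tasks cooperate with the Nanos6 runtime instead
   of blocking a worker thread. Compile and link against MPI and TAMPI. */
#include <mpi.h>
#include <TAMPI.h>
#include <stdio.h>

#define BS 1024

int main(int argc, char *argv[])
{
    int provided, rank, size;

    /* Request the task-aware threading level added by TAMPI. */
    MPI_Init_thread(&argc, &argv, MPI_TASK_MULTIPLE, &provided);
    if (provided != MPI_TASK_MULTIPLE) {
        fprintf(stderr, "TAMPI task-aware mode not available\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double halo_out[BS] = {0.0}, halo_in[BS] = {0.0};
    int right = (rank + 1) % size;
    int left  = (rank + size - 1) % size;

    /* Communication expressed as tasks; the dependences keep the task that
       consumes halo_in from starting before the receive has completed. */
    #pragma oss task in(halo_out[0;BS])
    MPI_Send(halo_out, BS, MPI_DOUBLE, right, 0, MPI_COMM_WORLD);

    #pragma oss task out(halo_in[0;BS])
    MPI_Recv(halo_in, BS, MPI_DOUBLE, left, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    #pragma oss task in(halo_in[0;BS])
    printf("rank %d received halo, first element %f\n", rank, halo_in[0]);

    #pragma oss taskwait
    MPI_Finalize();
    return 0;
}
}}}

Such a sketch would be run with at least two MPI processes (e.g. `mpiexec -n 2 ./halo`, where `halo` is a hypothetical binary name), in the same way as the nbody and heat commands shown above.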