OmpSs-2

Timestamp:: Jun 12, 2019, 11:21:22 AM (6 years ago)
Author:: Pedro Martinez-Ferror
Comment:: —

Public/User_Guide/OmpSs-2

-                      v28
+                      v29
 * [#matmulbenchmark(OmpSs-2) matmul benchmark (OmpSs-2)]
 * [#Choleskybenchmark(OmpSs-2+MKL) Cholesky benchmark (OmpSs-2+MKL)]
+* [#nbodybenchmark(MPI+OmpSs-2) nbody benchmark (MPI+OmpSs-2)]
+* [#heatbenchmark(MPI+OmpSs-2) heat benchmark (MPI+OmpSs-2)]
+----
 = Quick Overview =
 …
 * Nanos official website. [https://www.bsc.es/research-and-development/software-and-apps/software-list/nanos-rtl Link 1], [https://pm.bsc.es/nanox Link 2]
+----
 = Quick Setup on DEEP System =
 …
+----
 = Repository with Examples =
 …
 Additionally, you will need to change your running script so that it invokes the program through this trace.sh script. Although you could instead add all the instrumentation-related environment variables to your running script, using this extra script makes it easier to switch between instrumented and non-instrumented executions. When you need to instrument an execution, simply put trace.sh before the program invocation. Note that the **extrae.xml** file, which configures the Extrae library to produce a Paraver trace, is also needed.
+----
 = multisaxpy benchmark (OmpSs-2) =
 …
+----
 = dot-product benchmark (OmpSs-2) =
 …
 * [https://en.wikipedia.org/wiki/Dot_product]
+----
 = mergesort benchmark (OmpSs-2) =
 …
+----
 = nqueens benchmark (OmpSs-2) =
 …
 == Description ==
 This benchmark computes, for a N x N chessboard, the number of configurations
+This benchmark computes, for an NxN chessboard, the number of configurations
 of placing N chess queens on the chessboard such that none of them is able to
 attack any other. It is implemented using a branch-and-bound algorithm.
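The counting problem described above can be sketched in plain sequential C. This is a minimal backtracking search that prunes attacked squares, in the spirit of the branch-and-bound approach; it is not the repository's code, and the names `count_nqueens`, `place` and `is_safe` are hypothetical.

```c
#include <stdlib.h>

/* Returns 1 if a queen may be placed at (row, col), given that
 * cols[r] holds the column of the queen already placed in row r < row. */
static int is_safe(const int *cols, int row, int col)
{
    for (int r = 0; r < row; r++) {
        int d = cols[r] - col;
        /* Same column, or same diagonal in either direction. */
        if (d == 0 || d == row - r || d == r - row)
            return 0;
    }
    return 1;
}

/* Counts the complete placements reachable from the partial placement
 * stored in cols[0..row-1]. Attacked squares are pruned immediately. */
static long place(int *cols, int row, int n)
{
    if (row == n)
        return 1;
    long count = 0;
    for (int col = 0; col < n; col++) {
        if (is_safe(cols, row, col)) {
            cols[row] = col;
            count += place(cols, row + 1, n);
        }
    }
    return count;
}

/* Number of ways to place n non-attacking queens on an n x n board.
 * Assumption of this sketch: returns 0 for n < 1 and -1 if the
 * allocation fails. */
long count_nqueens(int n)
{
    if (n < 1)
        return 0;
    int *cols = malloc((size_t)n * sizeof *cols);
    if (cols == NULL)
        return -1;
    long total = place(cols, 0, n);
    free(cols);
    return total;
}
```

For instance, `count_nqueens(8)` yields 92. Each call to `place` for a given first-row column explores an independent subtree, which is the kind of work unit a task-based version could run in parallel.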
 …
+----
 = matmul benchmark (OmpSs-2) =
+Users must clone/download this example's repository from [https://pm.bsc.es/gitlab/ompss-2/examples/matmul] and transfer it to a DEEP working directory.
+== Description ==
+This benchmark runs a matrix multiplication operation C = A✕B, where A has size
+N✕M, B has size M✕P, and the resulting matrix C has size N✕P.
+There are **3 implementations** of this benchmark.
+== Execution instructions ==
+`./matmul N M P BLOCK_SIZE`
+where:
+* `N` is the number of rows of the matrix A.
+* `M` is the number of columns of the matrix A and the number of rows of the matrix B.
+* `P` is the number of columns of the matrix B.
+* The matrix multiplication operation will be applied in blocks that contain `BLOCK_SIZE`✕`BLOCK_SIZE` elements.
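To make the role of the four arguments concrete, here is a minimal sequential sketch of a blocked multiplication in C. It is not taken from the repository: the function name `matmul_blocked`, the row-major layout, and the requirement that `N`, `M` and `P` be multiples of `BLOCK_SIZE` are assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Accumulates C (n x p) += A (n x m) * B (m x p).
 * All matrices are row-major; work is done in bs x bs blocks.
 * Assumption of this sketch: n, m and p are multiples of bs. */
void matmul_blocked(size_t n, size_t m, size_t p, size_t bs,
                    const double *a, const double *b, double *c)
{
    assert(bs > 0 && n % bs == 0 && m % bs == 0 && p % bs == 0);

    for (size_t ii = 0; ii < n; ii += bs)
        for (size_t jj = 0; jj < p; jj += bs)
            for (size_t kk = 0; kk < m; kk += bs)
                /* One block update: block (ii,jj) of C receives the
                 * product of block (ii,kk) of A and block (kk,jj) of B.
                 * A task-based version could treat this as one task. */
                for (size_t i = ii; i < ii + bs; i++)
                    for (size_t k = kk; k < kk + bs; k++) {
                        double aik = a[i * m + k];
                        for (size_t j = jj; j < jj + bs; j++)
                            c[i * p + j] += aik * b[k * p + j];
                    }
}
```

The caller is expected to zero-initialize `c` when a plain product C = A✕B is wanted. Block updates that write to the same block of C must run in order, while updates to different blocks of C are independent.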
+== References ==
+* [https://pm.bsc.es/gitlab/ompss-2/examples/matmul]
+* [https://pm.bsc.es/ftp/ompss-2/doc/examples/local/sphinx/02-examples.html]
+* [https://en.wikipedia.org/wiki/Matrix_multiplication_algorithm]
+----
+= Cholesky benchmark (OmpSs-2+MKL) =
 Users must clone/download this example's repository from [https://pm.bsc.es/gitlab/ompss-2/examples/cholesky] and transfer it to a DEEP working directory.
 …
+== Description ==
+This benchmark computes the Cholesky decomposition of a Hermitian, positive-definite matrix, that is, its factorization into the product of a lower triangular matrix and its conjugate transpose. The decomposition is carried out with OmpSs-2 using tasks with priorities.
+There are **3 implementations** of this benchmark.
+The code uses the CBLAS and LAPACKE interfaces to both BLAS and LAPACK.
+By default, the build tries to find MKL, ATLAS and LAPACKE through the `MKLROOT`, `LIBRARY_PATH` and `C_INCLUDE_PATH` environment variables. If you are using an implementation with other linking requirements, please edit the `LIBS` entry in the Makefile accordingly.
+The Makefile has three additional rules:
+* **run:** runs each version one after the other.
+* **run-graph:** runs the OmpSs-2 versions with the graph instrumentation.
+* **run-extrae:** runs the OmpSs-2 versions with the Extrae instrumentation.
+For the graph instrumentation, it is recommended to view the resulting PDF in single-page mode and to advance through the pages; this shows the actual instantiation and execution of the code. For the Extrae instrumentation, Extrae must be loaded and available at least through the `LD_LIBRARY_PATH` environment variable.
+== Execution instructions ==
+`./cholesky SIZE BLOCK_SIZE`
+where:
+* `SIZE` is the number of elements per side of the matrix.
+* The decomposition is made by blocks of `BLOCK_SIZE` by `BLOCK_SIZE` elements.
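To give a feel for how `SIZE` and `BLOCK_SIZE` determine the amount of block-level work, the sketch below walks the loop nest of a textbook right-looking blocked Cholesky factorization and counts the block kernels it would invoke (named after the LAPACK/BLAS routines potrf, trsm, syrk and gemm). This is an illustration under assumptions: the repository's loop order and kernel choice may differ, the function and struct names are hypothetical, and no numerical work is done here.

```c
#include <stddef.h>

/* Number of invocations of each block kernel. */
typedef struct {
    size_t potrf; /* factorizations of a diagonal block      */
    size_t trsm;  /* triangular solves on a panel block      */
    size_t syrk;  /* symmetric updates of a diagonal block   */
    size_t gemm;  /* general updates of an off-diagonal block */
} kernel_counts;

/* Walks the right-looking blocked Cholesky loop nest for a
 * (size / block_size) x (size / block_size) grid of blocks.
 * Assumption of this sketch: returns all zeros when block_size is 0
 * or does not divide size evenly. */
kernel_counts cholesky_block_kernels(size_t size, size_t block_size)
{
    kernel_counts kc = {0, 0, 0, 0};
    if (block_size == 0 || size % block_size != 0)
        return kc;

    size_t nb = size / block_size;
    for (size_t k = 0; k < nb; k++) {
        kc.potrf++;                          /* diagonal block (k,k)   */
        for (size_t i = k + 1; i < nb; i++)
            kc.trsm++;                       /* panel block (i,k)      */
        for (size_t i = k + 1; i < nb; i++) {
            kc.syrk++;                       /* trailing block (i,i)   */
            for (size_t j = k + 1; j < i; j++)
                kc.gemm++;                   /* trailing block (i,j)   */
        }
    }
    return kc;
}
```

With a 4✕4 grid of blocks (for example `SIZE` 4096 and `BLOCK_SIZE` 1024), this loop nest gives 4 potrf, 6 trsm, 6 syrk and 4 gemm invocations. Smaller blocks expose more such units of work at the cost of less work per unit.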
+== References ==
+* [https://pm.bsc.es/gitlab/ompss-2/examples/cholesky]
+* [https://pm.bsc.es/ftp/ompss-2/doc/examples/02-examples/cholesky-mkl/README.html]
+* [https://en.wikipedia.org/wiki/Cholesky_decomposition]
+----
+= nbody benchmark (MPI+OmpSs-2+TAMPI) =
+Users must clone/download this example's repository from [https://pm.bsc.es/gitlab/ompss-2/examples/nbody] and transfer it to a DEEP working directory.
+== Description ==
+This benchmark implements an N-body simulation that numerically approximates the evolution of a system of bodies in which each body continuously interacts with every other body. A familiar example is an astrophysical simulation in which each body represents a galaxy or an individual star, and the bodies attract each other through the gravitational force.
+There are **7 implementations** of this benchmark, which are compiled into separate
+binaries by executing the command `make`. These versions can be blocking,
+when the particle space is divided into smaller blocks, or non-blocking, when
+it is not.
+== Execution instructions ==
+The binaries accept several options. The most relevant options are the number
+of total particles (`-p`) and the number of timesteps (`-t`). More options
+can be seen with the `-h` option. An example of execution could be:
+`mpiexec -n 4 -bind-to hwthread:16 ./nbody -t 100 -p 8192`
+in which the application will perform 100 timesteps in 4 MPI processes with 16 hardware threads in each process (used by the OmpSs-2 runtime). The total number of particles will be 8192 so that each process will have 2048 particles (2 blocks per process).
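The arithmetic behind the example above can be written down as a small C helper. The block size of 1024 particles is inferred from the figures in the text (2048 particles forming 2 blocks) and is an assumption, as are the names `nbody_layout` and `nbody_split`; the benchmark's own option handling may differ.

```c
/* How the particles are spread over MPI processes and blocks. */
typedef struct {
    long particles_per_process;
    long blocks_per_process;
} nbody_layout;

/* Splits total_particles evenly over the processes, then splits each
 * process's share into blocks of block_size particles.
 * Assumption of this sketch: returns {-1, -1} when an argument is not
 * positive or when either division leaves a remainder. */
nbody_layout nbody_split(long total_particles, long processes, long block_size)
{
    nbody_layout layout = {-1, -1};

    if (total_particles <= 0 || processes <= 0 || block_size <= 0)
        return layout;
    if (total_particles % processes != 0)
        return layout;

    long per_process = total_particles / processes;
    if (per_process % block_size != 0)
        return layout;

    layout.particles_per_process = per_process;
    layout.blocks_per_process = per_process / block_size;
    return layout;
}
```

Calling `nbody_split(8192, 4, 1024)` reproduces the figures quoted above: 2048 particles and 2 blocks per process.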
+== References ==
+* [https://pm.bsc.es/gitlab/ompss-2/examples/nbody]
+* [https://pm.bsc.es/ftp/ompss-2/doc/examples/local/sphinx/02-examples.html]
+* [https://en.wikipedia.org/wiki/N-body_simulation]
+----