Changes between Version 28 and Version 29 of Public/User_Guide/OmpSs-2


Timestamp:
Jun 12, 2019, 11:21:22 AM (5 years ago)
Author:
Pedro Martinez-Ferror
  • Public/User_Guide/OmpSs-2

    v28 v29  
    1313* [#matmulbenchmark(OmpSs-2) matmul benchmark (OmpSs-2)]
    1414* [#Choleskybenchmark(OmpSs-2+MKL) Cholesky benchmark (OmpSs-2+MKL)]
    15 
     15* [#nbodybenchmark(MPI+OmpSs-2) nbody benchmark (MPI+OmpSs-2)]
     16* [#heatbenchmark(MPI+OmpSs-2) heat benchmark (MPI+OmpSs-2)]
     17
     18----
    1619
    1720= Quick Overview =
     
    4447* Nanos official website. [https://www.bsc.es/research-and-development/software-and-apps/software-list/nanos-rtl Link 1], [https://pm.bsc.es/nanox Link 2]
    4548
     49----
    4650
    4751= Quick Setup on DEEP System =
     
    8589
    8690
     91----
     92
     93
    8794= Repository with Examples =
    8895
     
    137144
    138145Additionally, you will need to change your running script so that it invokes the program through this trace.sh script. Although you could instead add all the instrumentation-related environment variables to your running script, using this extra script makes it easier to switch between instrumented and non-instrumented executions. When you need to instrument an execution, simply put trace.sh before the program invocation. Note that the **extrae.xml** file, which configures the Extrae library to produce a Paraver trace, is also needed.
     146
     147----
    139148
    140149= multisaxpy benchmark (OmpSs-2) =
     
    209218
    210219
     220----
     221
     222
    211223= dot-product benchmark (OmpSs-2) =
    212224
     
    232244* [https://en.wikipedia.org/wiki/Dot_product]
    233245
     246----
     247
     248
    234249= mergesort benchmark (OmpSs-2) =
    235250
     
    256271
    257272
     273----
     274
     275
    258276= nqueens benchmark (OmpSs-2) =
    259277
     
    262280== Description ==
    263281
    264 This benchmark computes, for a N x N chessboard, the number of configurations
     282This benchmark computes, for an NxN chessboard, the number of configurations
    265283of placing N chess queens in the chessboard such that none of them is able to
    266284attack any other. It is implemented using a branch-and-bound algorithm.
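
As a rough illustration of the counting problem described above, the following Python sketch counts valid placements with a plain backtracking search that prunes attacked squares. It is not the benchmark's OmpSs-2 code: the function name `count_nqueens` is hypothetical, and the sketch is sequential, whereas the benchmark is described as a branch-and-bound implementation.

```python
def count_nqueens(n):
    """Count the placements of n non-attacking queens on an n x n board."""

    def place(row, cols, diag1, diag2):
        # All rows filled: one complete valid configuration found.
        if row == n:
            return 1
        total = 0
        for col in range(n):
            d1, d2 = row + col, row - col
            # Prune: skip squares attacked by an already placed queen.
            if col in cols or d1 in diag1 or d2 in diag2:
                continue
            total += place(row + 1, cols | {col}, diag1 | {d1}, diag2 | {d2})
        return total

    return place(0, frozenset(), frozenset(), frozenset())


if __name__ == "__main__":
    print(count_nqueens(8))  # the classic 8x8 board has 92 solutions
```

Each recursive call explores one row, so independent subtrees (for example, the different column choices in the first rows) are natural candidates for parallel tasks.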
     
    284302
    285303
     304----
     305
     306
    286307= matmul benchmark (OmpSs-2) =
     308
     309Users must clone/download this example's repository from [https://pm.bsc.es/gitlab/ompss-2/examples/matmul] and transfer it to a DEEP working directory.
     310
     311== Description ==
     312
     313This benchmark runs a matrix multiplication operation C = A✕B, where A has size
     314N✕M, B has size M✕P, and the resulting matrix C has size N✕P.
     315
     316There are **3 implementations** of this benchmark.
     317
     318== Execution instructions ==
     319
     320`./matmul N M P BLOCK_SIZE`
     321
     322where:
     323* `N` is the number of rows of the matrix A.
     324* `M` is the number of columns of the matrix A and the number of rows of the matrix B.
     325* `P` is the number of columns of the matrix B.
     326* The matrix multiplication operation will be applied in blocks that contain `BLOCK_SIZE`✕`BLOCK_SIZE` elements.
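
To make the roles of `N`, `M`, `P` and `BLOCK_SIZE` concrete, here is a minimal sequential Python sketch of a blocked matrix multiplication. It is only an illustration under assumptions: the function name `blocked_matmul` is hypothetical, matrices are plain lists of rows, and partial blocks at the edges are handled with `min`, which may differ from what the benchmark itself requires or does.

```python
def blocked_matmul(A, B, n, m, p, block_size):
    """Compute C = A x B block by block; A is n x m and B is m x p."""
    C = [[0.0] * p for _ in range(n)]
    for i0 in range(0, n, block_size):          # block row of C
        for j0 in range(0, p, block_size):      # block column of C
            for k0 in range(0, m, block_size):  # blocks along the shared dimension
                # Update one block of C with the product of one block of A
                # and one block of B.
                for i in range(i0, min(i0 + block_size, n)):
                    for k in range(k0, min(k0 + block_size, m)):
                        a_ik = A[i][k]
                        for j in range(j0, min(j0 + block_size, p)):
                            C[i][j] += a_ik * B[k][j]
    return C


if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6]]        # N = 2, M = 3
    B = [[7, 8], [9, 10], [11, 12]]   # M = 3, P = 2
    print(blocked_matmul(A, B, 2, 3, 2, block_size=2))
```

The three outer loops enumerate block updates; in a task-based version each such update would be a reasonable unit of work, with updates to the same block of C ordered one after another.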
     327
     328== References ==
     329
     330* [https://pm.bsc.es/gitlab/ompss-2/examples/matmul]
     331* [https://pm.bsc.es/ftp/ompss-2/doc/examples/local/sphinx/02-examples.html]
     332* [https://en.wikipedia.org/wiki/Matrix_multiplication_algorithm]
     333
     334
     335----
     336
     337
     338= Cholesky benchmark (OmpSs-2+MKL) =
    287339
    288340Users must clone/download this example's repository from [https://pm.bsc.es/gitlab/ompss-2/examples/cholesky] and transfer it to a DEEP working directory.
     
    319371
    320372
    321 = Cholesky benchmark (OmpSs-2+MKL) =
    322 
    323 Users must clone/download this example's repository from [https://pm.bsc.es/gitlab/ompss-2/examples/cholesky] and transfer it to a DEEP working directory.
    324 
    325 == Description ==
    326 
    327 This benchmark is a decomposition of a Hermitian, positive-definite matrix into the product of a lower triangular matrix and its conjugate transpose.  This Cholesky decomposition is carried out with OmpSs-2 using tasks with priorities.
    328 
    329 There are **3 implementations** of this benchmark.
    330 
    331 The code uses the CBLAS and LAPACKE interfaces to both BLAS and LAPACK.
    332 By default we try to find MKL, ATLAS and LAPACKE from the MKLROOT, LIBRARY_PATH and C_INCLUDE_PATH environment variables. If you are using an implementation with other linking requirements, please edit the `LIBS` entry in the makefile accordingly.
    333 
    334 The Makefile has three additional rules:
    335 * **run:** runs each version one after the other.
    336 * **run-graph:** runs the OmpSs-2 versions with the graph instrumentation.
    337 * **run-extrae:** runs the OmpSs-2 versions with the extrae instrumentation.
    338 
    339 For the graph instrumentation, it is recommended to view the resulting PDF in single page mode and to advance through the pages. This will show the actual instantiation and execution of the code. For the extrae instrumentation, extrae must be loaded and available at least through the `LD_LIBRARY_PATH` environment variable.
    340 
    341 == Execution instructions ==
    342 
    343 `./cholesky SIZE BLOCK_SIZE`
    344 
    345 where:
    346 * `SIZE` is the number of elements per side of the matrix.
    347 * The decomposition is made by blocks of `BLOCK_SIZE` by `BLOCK_SIZE` elements.
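
For reference, the decomposition itself can be sketched in a few lines of Python. This is an unblocked, sequential version for real symmetric matrices, written only to show what is being computed; the function name `cholesky` and the list-of-rows layout are assumptions, and the benchmark instead works on `BLOCK_SIZE` by `BLOCK_SIZE` blocks through CBLAS and LAPACKE.

```python
import math


def cholesky(A):
    """Return lower-triangular L with L * L^T == A (A symmetric positive-definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: what is left of A[j][j] after removing earlier columns.
        s = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if s <= 0:
            raise ValueError("matrix is not positive-definite")
        L[j][j] = math.sqrt(s)
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            partial = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = (A[i][j] - partial) / L[j][j]
    return L


if __name__ == "__main__":
    A = [[4, 12, -16], [12, 37, -43], [-16, -43, 98]]
    for row in cholesky(A):
        print(row)
```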
    348 
    349 == References ==
    350 
    351 * [https://pm.bsc.es/gitlab/ompss-2/examples/cholesky]
    352 * [https://pm.bsc.es/ftp/ompss-2/doc/examples/02-examples/cholesky-mkl/README.html]
    353 * [https://en.wikipedia.org/wiki/Eight_queens_puzzle]
     373----
     374
     375
     376= nbody benchmark (MPI+OmpSs-2+TAMPI) =
     377
     378Users must clone/download this example's repository from [https://pm.bsc.es/gitlab/ompss-2/examples/nbody] and transfer it to a DEEP working directory.
     379
     380== Description ==
     381
     382This benchmark represents an N-body simulation to numerically approximate the evolution of a system of bodies in which each body continuously interacts with every other body.  A familiar example is an astrophysical simulation in which each body represents a galaxy or an individual star, and the bodies attract each other through the gravitational force.
     383
     384There are **7 implementations** of this benchmark, which are compiled into different
     385binaries by executing the command `make`. These versions can be blocking,
     386when the particle space is divided into smaller blocks, or non-blocking, when
     387it is not.
     388
     389== Execution instructions ==
     390
     391The binaries accept several options. The most relevant options are the number
     392of total particles (`-p`) and the number of timesteps (`-t`). More options
     393can be seen with the `-h` option. An example of execution could be:
     394
     395`mpiexec -n 4 -bind-to hwthread:16 ./nbody -t 100 -p 8192`
     396
     397in which the application will perform 100 timesteps in 4 MPI processes with 16 hardware threads in each process (used by the OmpSs-2 runtime). The total number of particles will be 8192 so that each process will have 2048 particles (2 blocks per process).
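
The arithmetic in this example can be checked with a small Python sketch. The helper `particles_layout` is hypothetical, and the block size of 1024 particles is an assumption inferred from the figures above (2048 particles per process split into 2 blocks); it is not stated as an option of the binaries.

```python
def particles_layout(total_particles, num_processes, block_size):
    """Return (particles per process, blocks per process) for an even split."""
    if total_particles % num_processes != 0:
        raise ValueError("particles do not divide evenly among processes")
    per_process = total_particles // num_processes
    if per_process % block_size != 0:
        raise ValueError("particles per process do not divide evenly into blocks")
    return per_process, per_process // block_size


if __name__ == "__main__":
    # Values from the example: -p 8192 with 4 MPI processes.
    # block_size=1024 is an assumed value, see the note above.
    print(particles_layout(8192, 4, 1024))
```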
     398
     399== References ==
     400
     401* [https://pm.bsc.es/gitlab/ompss-2/examples/nbody]
     402* [https://pm.bsc.es/ftp/ompss-2/doc/examples/local/sphinx/02-examples.html]
     403* [https://en.wikipedia.org/wiki/N-body_simulation]
     404
     405
     406----