| 308 | |
| 309 | Users must clone/download this example's repository from [https://pm.bsc.es/gitlab/ompss-2/examples/matmul] and transfer it to a DEEP working directory. |
| 310 | |
| 311 | == Description == |
| 312 | |
| 313 | This benchmark runs a matrix multiplication operation C = A✕B, where A has size |
| 314 | N✕M, B has size M✕P, and the resulting matrix C has size N✕P. |
| 315 | |
| 316 | There are **3 implementations** of this benchmark. |
| 317 | |
| 318 | == Execution instructions == |
| 319 | |
| 320 | `./matmul N M P BLOCK_SIZE` |
| 321 | |
| 322 | where: |
| 323 | * `N` is the number of rows of the matrix A. |
| 324 | * `M` is the number of columns of the matrix A and the number of rows of the matrix B. |
| 325 | * `P` is the number of columns of the matrix B. |
| 326 | * `BLOCK_SIZE` sets the block dimensions: the multiplication is computed in blocks of `BLOCK_SIZE`✕`BLOCK_SIZE` elements. |
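
The role of `BLOCK_SIZE` can be illustrated with a minimal pure-Python sketch of a blocked multiplication. This is not the benchmark's code: the function name `blocked_matmul` and the handling of dimensions that are not multiples of the block size are assumptions made for illustration only.

```python
def blocked_matmul(a, b, n, m, p, block_size):
    """Multiply the N x M matrix a by the M x P matrix b, one block at a time.

    Illustrative sketch only; matrices are lists of rows.
    """
    c = [[0.0] * p for _ in range(n)]
    # Walk over the block grid. Each (i0, j0, k0) triple is one unit of work
    # that a task-based runtime could execute as a separate task.
    for i0 in range(0, n, block_size):
        for j0 in range(0, p, block_size):
            for k0 in range(0, m, block_size):
                # min() keeps the last block inside the matrix when a
                # dimension is not a multiple of block_size (an assumption;
                # the benchmark may require exact multiples).
                for i in range(i0, min(i0 + block_size, n)):
                    for j in range(j0, min(j0 + block_size, p)):
                        acc = c[i][j]
                        for k in range(k0, min(k0 + block_size, m)):
                            acc += a[i][k] * b[k][j]
                        c[i][j] = acc
    return c
```

For instance, `blocked_matmul([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]], 2, 3, 2, 2)` returns `[[58.0, 64.0], [139.0, 154.0]]`.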
| 327 | |
| 328 | == References == |
| 329 | |
| 330 | * [https://pm.bsc.es/gitlab/ompss-2/examples/matmul] |
| 331 | * [https://pm.bsc.es/ftp/ompss-2/doc/examples/local/sphinx/02-examples.html] |
| 332 | * [https://en.wikipedia.org/wiki/Matrix_multiplication_algorithm] |
| 333 | |
| 334 | |
| 335 | ---- |
| 336 | |
| 337 | |
| 338 | = Cholesky benchmark (OmpSs-2+MKL) = |
322 | | |
323 | | Users must clone/download this example's repository from [https://pm.bsc.es/gitlab/ompss-2/examples/cholesky] and transfer it to a DEEP working directory. |
324 | | |
325 | | == Description == |
326 | | |
327 | | This benchmark is a decomposition of a Hermitian, positive-definite matrix into the product of a lower triangular matrix and its conjugate transpose. This Cholesky decomposition is carried out with OmpSs-2 using tasks with priorities. |
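
As a rough illustration of the decomposition itself, the following sketch factors a real symmetric positive-definite matrix (the real-valued special case of a Hermitian one) into a lower triangular matrix L such that A = L·Lᵀ. It is an unblocked, sequential version written only for clarity; the benchmark instead relies on BLAS/LAPACK kernels and OmpSs-2 tasks, and the name `cholesky` is chosen here for illustration.

```python
import math


def cholesky(a):
    """Return lower-triangular L with a == L * L^T.

    Illustrative sketch for a real symmetric positive-definite matrix
    given as a list of rows; only the lower triangle of a is read.
    """
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: what remains of a[j][j] once the contribution of
        # the already computed columns is subtracted.
        d = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive-definite")
        l[j][j] = math.sqrt(d)
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = (a[i][j] - s) / l[j][j]
    return l
```

For the matrix `[[4, 12, -16], [12, 37, -43], [-16, -43, 98]]` this yields `[[2.0, 0.0, 0.0], [6.0, 1.0, 0.0], [-8.0, 5.0, 3.0]]`.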
328 | | |
329 | | There are **3 implementations** of this benchmark. |
330 | | |
331 | | The code uses the CBLAS and LAPACKE interfaces to both BLAS and LAPACK. |
332 | | By default, the build tries to find MKL, ATLAS and LAPACKE through the `MKLROOT`, `LIBRARY_PATH` and `C_INCLUDE_PATH` environment variables. If your BLAS/LAPACK implementation has other linking requirements, edit the `LIBS` entry in the Makefile accordingly. |
333 | | |
334 | | The Makefile has three additional rules: |
335 | | * **run:** runs each version one after the other. |
336 | | * **run-graph:** runs the OmpSs-2 versions with the graph instrumentation. |
337 | | * **run-extrae:** runs the OmpSs-2 versions with the Extrae instrumentation. |
338 | | |
339 | | For the graph instrumentation, view the resulting PDF in single-page mode and advance through the pages one by one; this shows the order in which tasks are instantiated and executed. For the Extrae instrumentation, Extrae must be loaded and available at least through the `LD_LIBRARY_PATH` environment variable. |
340 | | |
341 | | == Execution instructions == |
342 | | |
343 | | `./cholesky SIZE BLOCK_SIZE` |
344 | | |
345 | | where: |
346 | | * `SIZE` is the number of elements per side of the matrix. |
347 | | * The decomposition is made by blocks of `BLOCK_SIZE` by `BLOCK_SIZE` elements. |
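
To give an idea of how `SIZE` and `BLOCK_SIZE` determine the amount of work, the sketch below enumerates the block kernels of a textbook right-looking blocked Cholesky factorization. The kernel names (`potrf`, `trsm`, `syrk`, `gemm`) are the standard LAPACK/BLAS routine names; the loop order, the function name `cholesky_block_tasks`, and the requirement that `SIZE` be a multiple of `BLOCK_SIZE` are assumptions and may differ from the benchmark's actual code.

```python
def cholesky_block_tasks(size, block_size):
    """List the block kernels of a right-looking blocked Cholesky.

    Illustrative sketch: tuples are (kernel, block indices...), where
    the last index of syrk/gemm is the panel step k they depend on.
    """
    if size % block_size != 0:
        # Assumption made for this sketch only.
        raise ValueError("size must be a multiple of block_size")
    nb = size // block_size  # blocks per side
    tasks = []
    for k in range(nb):
        # Factor the diagonal block.
        tasks.append(("potrf", k, k))
        # Solve for the blocks below the diagonal in column k.
        for i in range(k + 1, nb):
            tasks.append(("trsm", i, k))
        # Update the trailing submatrix (lower triangle only).
        for i in range(k + 1, nb):
            for j in range(k + 1, i):
                tasks.append(("gemm", i, j, k))
            tasks.append(("syrk", i, i, k))
    return tasks
```

With `size=4` and `block_size=2` the list is `[("potrf", 0, 0), ("trsm", 1, 0), ("syrk", 1, 1, 0), ("potrf", 1, 1)]`. In a task-based version, each entry would become one task, and kernels on the critical path such as `potrf` are natural candidates for a higher priority.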
348 | | |
349 | | == References == |
350 | | |
351 | | * [https://pm.bsc.es/gitlab/ompss-2/examples/cholesky] |
352 | | * [https://pm.bsc.es/ftp/ompss-2/doc/examples/02-examples/cholesky-mkl/README.html] |
353 | | * [https://en.wikipedia.org/wiki/Cholesky_decomposition] |
| 373 | ---- |
| 374 | |
| 375 | |
| 376 | = nbody benchmark (MPI+OmpSs-2+TAMPI) = |
| 377 | |
| 378 | Users must clone/download this example's repository from [https://pm.bsc.es/gitlab/ompss-2/examples/nbody] and transfer it to a DEEP working directory. |
| 379 | |
| 380 | == Description == |
| 381 | |
| 382 | This benchmark represents an N-body simulation to numerically approximate the evolution of a system of bodies in which each body continuously interacts with every other body. A familiar example is an astrophysical simulation in which each body represents a galaxy or an individual star, and the bodies attract each other through the gravitational force. |
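
The all-pairs interaction described above can be sketched as follows. This is a minimal illustration, not the benchmark's implementation: the function name `nbody_step`, the simple Euler-style time integration, and the default gravitational constant `g=1.0` (arbitrary units) are all assumptions.

```python
import math


def nbody_step(pos, vel, mass, dt, g=1.0):
    """Advance all bodies by one timestep using all-pairs gravity.

    Illustrative sketch: pos and vel are lists of [x, y, z]; bodies must
    not share the same position (no softening term is applied).
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    # Every body interacts with every other body: O(n^2) pair evaluations.
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = d[0] ** 2 + d[1] ** 2 + d[2] ** 2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                # Acceleration of body i due to body j: G * m_j * d / |d|^3
                acc[i][k] += g * mass[j] * d[k] * inv_r3
    # Update velocities first, then positions with the new velocities.
    new_vel = [[vel[i][k] + acc[i][k] * dt for k in range(3)] for i in range(n)]
    new_pos = [[pos[i][k] + new_vel[i][k] * dt for k in range(3)] for i in range(n)]
    return new_pos, new_vel
```

Two equal masses at rest, for example, accelerate towards each other with equal and opposite velocities, so the total momentum stays zero.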
| 383 | |
| 384 | There are **7 implementations** of this benchmark, which are compiled into separate |
| 385 | binaries by running `make`. A version is blocking when the particle space |
| 386 | is divided into smaller blocks, and non-blocking when |
| 387 | it is not. |
| 388 | |
| 389 | == Execution instructions == |
| 390 | |
| 391 | The binaries accept several options. The most relevant ones are the total number |
| 392 | of particles (`-p`) and the number of timesteps (`-t`). The `-h` option lists |
| 393 | the remaining options. For example: |
| 394 | |
| 395 | `mpiexec -n 4 -bind-to hwthread:16 ./nbody -t 100 -p 8192` |
| 396 | |
| 397 | This command runs 100 timesteps on 4 MPI processes, each bound to 16 hardware threads (used by the OmpSs-2 runtime). The total number of particles is 8192, so each process handles 2048 particles (2 blocks per process). |
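
The arithmetic behind these numbers can be checked with a small sketch. The helper name `particles_per_process` is hypothetical, and the sketch assumes that particles divide evenly among processes and blocks; the benchmark may handle other cases differently.

```python
def particles_per_process(total_particles, num_processes, blocks_per_process):
    """Return (particles per process, particles per block).

    Illustrative sketch assuming an even distribution of particles.
    """
    if total_particles % num_processes != 0:
        raise ValueError("particles do not divide evenly among processes")
    per_process = total_particles // num_processes
    if per_process % blocks_per_process != 0:
        raise ValueError("particles do not divide evenly among blocks")
    return per_process, per_process // blocks_per_process
```

For the example above, `particles_per_process(8192, 4, 2)` returns `(2048, 1024)`: 2048 particles per process, split into 2 blocks of 1024 particles each.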
| 398 | |
| 399 | == References == |
| 400 | |
| 401 | * [https://pm.bsc.es/gitlab/ompss-2/examples/nbody] |
| 402 | * [https://pm.bsc.es/ftp/ompss-2/doc/examples/local/sphinx/02-examples.html] |
| 403 | * [https://en.wikipedia.org/wiki/N-body_simulation] |
| 404 | |
| 405 | |
| 406 | ---- |