Changes between Version 25 and Version 26 of Public/User_Guide/OmpSs-2


Timestamp:
Jun 12, 2019, 9:33:27 AM (5 years ago)
Author:
Pedro Martinez-Ferror
  • Public/User_Guide/OmpSs-2

    v25 v26  
    5454* `module load OmpSs-2`
    5555
    56 Remember that OmpSs-2 uses a **thread-pool** execution model which means that it permanently **uses all the threads** present on the system. Users are strongly encouraged to always check the **system affinity** by running the **NUMA command** `numactl --show`:
     56Remember that OmpSs-2 uses a **thread-pool** execution model which means that it **permanently uses all the threads** present on the system. Users are strongly encouraged to always check the **system affinity** by running the **NUMA command** `numactl --show`:
    5757{{{
    5858$ numactl --show
     
    7272}}}
    7373
    74 Notice that both commands return consistent outputs and, even though an entire node with two sockets has been requested, only the first NUMA node (i.e. socket) has been correctly bound. As a result, only 48 threads of the first socket (0-11, 24-35), of which 24 are physical and 24 logical (hyper-threading enabled), are going to be utilised whilst the other 48 threads available on the second socket will remain idle. Therefore, **the system affinity shown above is not valid since it does not represent the resources requested via SLURM.**
     74Notice that both commands return consistent outputs and, even though an entire node with two sockets has been requested, only the first NUMA node (i.e. socket) has been correctly bound. As a result, only 48 threads of the first socket (0-11, 24-35), of which 24 are physical and 24 logical (hyper-threading enabled), are going to be utilised whilst the other 48 threads available in the second socket will remain idle. Therefore, **the system affinity shown above is not valid since it does not represent the resources requested via SLURM.**
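The affinity that `numactl --show` reports can also be checked from inside a program. The following is a minimal sketch, assuming a Linux system with glibc; the function name `allowed_cpu_count` is hypothetical and is not part of OmpSs-2 or of the examples repository. It counts the hardware threads the calling process is allowed to run on, which can then be compared with the resources requested via SLURM.

```cpp
#include <sched.h>   // sched_getaffinity, cpu_set_t, CPU_ZERO, CPU_COUNT (Linux/glibc)

// Hypothetical helper: returns the number of hardware threads in the
// affinity mask of the calling process, or -1 if the query fails.
int allowed_cpu_count() {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    // pid 0 means "the calling thread"; on failure we report -1.
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0) {
        return -1;
    }
    return CPU_COUNT(&mask);
}
```

If the returned count is lower than the number of threads requested for the job, the affinity probably needs to be corrected before running an OmpSs-2 application.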
    7575
    7676System affinity can be used to specify, for example, the ratio of MPI and OmpSs-2 processes for a hybrid application and can be modified by user request in different ways:
     
    9090== Building and running the examples ==
    9191
    92 All the examples come with a Makefile already configured to build (e.g. `make`) and run (e.g. `make run`) them.
     92All the examples come with a Makefile already configured to build (e.g. `make`) and run (e.g. `make run`) them.  You can clean the directory with the command `make clean`.
    9393
    9494== Controlling available threads ==
     
    137137Users must clone/download this example's repository from [https://pm.bsc.es/gitlab/ompss-2/examples/multisaxpy] and transfer it to a DEEP working directory.
    138138
    139 
    140 
    141 
    142 
    143 
     139== Description ==
     140
     141This benchmark runs several SAXPY operations. SAXPY is a combination of scalar multiplication and vector addition (a common operation in computations with vector processors) and constitutes a level 1 operation in the Basic Linear Algebra Subprograms (BLAS) package.
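As an illustration of the operation described above, a single SAXPY step computes `y = a * x + y` element by element. The sketch below is not taken from the examples repository: the function name `saxpy` and the use of `std::vector<double>` are assumptions made for illustration (the compiler output later on this page reports arrays of `double`, hence the element type).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative SAXPY: y <- a * x + y, applied element by element.
// Both vectors must have the same length.
void saxpy(double a, const std::vector<double>& x, std::vector<double>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] += a * x[i];
    }
}
```

For example, with `a = 2`, `x = {1, 2, 3}` and `y = {10, 20, 30}`, the call leaves `y = {12, 24, 36}`.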
     142
     143There are **7 implementations** of this benchmark.
     144
     145== Execution instructions ==
     146
     147`./multisaxpy SIZE BLOCK_SIZE ITERATIONS`
     148
     149where:
     150* SIZE is the number of elements of the vectors used in the SAXPY operation.
     151* BLOCK_SIZE is the number of elements in each block; the SAXPY operation is applied to the vectors one block at a time.
     152* ITERATIONS is the number of times the SAXPY operation is executed.
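How the three arguments interact can be sketched as a sequential loop nest. This is a minimal illustration under stated assumptions, not the repository's implementation: the names `saxpy_block` and `multisaxpy_blocked` are hypothetical, and the last block is simply shortened when SIZE is not a multiple of BLOCK_SIZE.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel: SAXPY on one contiguous block of n elements.
void saxpy_block(double a, const double* x, double* y, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
}

// Hypothetical driver: x.size() plays the role of SIZE. The vectors are
// traversed in blocks of block_size elements, and the whole sweep is
// repeated `iterations` times.
void multisaxpy_blocked(double a, const std::vector<double>& x,
                        std::vector<double>& y,
                        std::size_t block_size, std::size_t iterations) {
    assert(x.size() == y.size());
    assert(block_size > 0);
    const std::size_t size = x.size();
    for (std::size_t it = 0; it < iterations; ++it) {
        for (std::size_t start = 0; start < size; start += block_size) {
            // The final block may be shorter than block_size.
            const std::size_t len = std::min(block_size, size - start);
            saxpy_block(a, x.data() + start, y.data() + start, len);
        }
    }
}
```

In a task-based variant, each call to the block kernel is the natural unit of work to turn into a task, so BLOCK_SIZE controls the task granularity: smaller blocks create more, shorter tasks.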
     153
     154== Example output ==
     155
     156{{{
     157$ make clean
     158rm -f 01.multisaxpy_seq 02.multisaxpy_task_loop 03.multisaxpy_task 04.multisaxpy_task+dep 05.multisaxpy_task+weakdep 06.multisaxpy_task_loop+weakdep 07.multisaxpy_task+reduction
     159
     160$ make
     161mcxx --ompss-2 01.multisaxpy_seq.cpp main.cpp -o 01.multisaxpy_seq -lrt
     162mcxx --ompss-2 02.multisaxpy_task_loop.cpp main.cpp -o 02.multisaxpy_task_loop -lrt
     163mcxx --ompss-2 03.multisaxpy_task.cpp main.cpp -o 03.multisaxpy_task -lrt
     16403.multisaxpy_task.cpp:3:13: info: adding task function 'axpy_task' for device 'smp'
     16503.multisaxpy_task.cpp:12:3: info: call to task function '::axpy_task'
     16603.multisaxpy_task.cpp:3:13: info: task function declared here
     167mcxx --ompss-2 04.multisaxpy_task+dep.cpp main.cpp -o 04.multisaxpy_task+dep -lrt
     16804.multisaxpy_task+dep.cpp:3:13: info: adding task function 'axpy_task' for device 'smp'
     16904.multisaxpy_task+dep.cpp:12:3: info: call to task function '::axpy_task'
     17004.multisaxpy_task+dep.cpp:3:13: info: task function declared here
     171mcxx --ompss-2 05.multisaxpy_task+weakdep.cpp main.cpp -o 05.multisaxpy_task+weakdep -lrt
     17205.multisaxpy_task+weakdep.cpp:3:13: info: adding task function 'axpy_task' for device 'smp'
     17305.multisaxpy_task+weakdep.cpp:12:3: info: call to task function '::axpy_task'
     17405.multisaxpy_task+weakdep.cpp:3:13: info: task function declared here
     175mcxx --ompss-2 06.multisaxpy_task_loop+weakdep.cpp main.cpp -o 06.multisaxpy_task_loop+weakdep -lrt
     176mcxx --ompss-2 07.multisaxpy_task+reduction.cpp main.cpp -o 07.multisaxpy_task+reduction -lrt
     17707.multisaxpy_task+reduction.cpp:14:13: info: reduction of variable 'yy' of type 'double [elements]' solved to 'operator +'
     178<openmp-builtin-reductions>:1:1: info: reduction declared here
     17907.multisaxpy_task+reduction.cpp:21:13: info: reduction of variable 'y' of type 'double [N]' solved to 'operator +'
     180<openmp-builtin-reductions>:1:1: info: reduction declared here
     181
     182$ make run
     183./01.multisaxpy_seq 16777216 8192 100
     184size: 16777216, bs: 8192, iterations: 100, time: 3.2982, performance: 0.508678
     185NANOS6_SCHEDULER=fifo ./02.multisaxpy_task_loop 16777216 8192 100
     186size: 16777216, bs: 8192, iterations: 100, time: 0.40835, performance: 4.10854
     187./03.multisaxpy_task 16777216 8192 100
     188size: 16777216, bs: 8192, iterations: 100, time: 0.646697, performance: 2.59429
     189./04.multisaxpy_task+dep 16777216 8192 100
     190size: 16777216, bs: 8192, iterations: 100, time: 1.00903, performance: 1.6627
     191./05.multisaxpy_task+weakdep 16777216 8192 100
     192size: 16777216, bs: 8192, iterations: 100, time: 1.17464, performance: 1.42829
     193NANOS6_SCHEDULER=fifo ./06.multisaxpy_task_loop+weakdep 16777216 8192 100
     194size: 16777216, bs: 8192, iterations: 100, time: 3.81836, performance: 0.439382
     195./07.multisaxpy_task+reduction 16777216 8192  100
     196size: 16777216, bs: 8192, iterations: 100, time: 4.26565, performance: 0.39331
     197}}}
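The `performance` values in the output above are numerically consistent with `size * iterations / time / 1e9`, that is, billions of vector elements processed per second. This is an inference from the printed numbers, not a definition taken from the benchmark sources. A minimal sketch for recomputing the figure and the speedup relative to the sequential run (the function names `throughput` and `speedup` are hypothetical):

```cpp
#include <cassert>
#include <cstddef>

// Assumed metric: billions of elements processed per second.
double throughput(std::size_t size, std::size_t iterations, double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(size) * static_cast<double>(iterations)
           / seconds / 1e9;
}

// Speedup of a run that took `seconds` over a reference run that took
// `reference_seconds`.
double speedup(double reference_seconds, double seconds) {
    assert(seconds > 0.0);
    return reference_seconds / seconds;
}
```

With the times listed above, `throughput(16777216, 100, 3.2982)` gives roughly 0.5087, matching the sequential run, and `speedup(3.2982, 0.40835)` shows that the `02.multisaxpy_task_loop` run was about 8 times faster than the sequential one.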
     198
     199== References ==
     200
     201* [https://pm.bsc.es/gitlab/ompss-2/examples/multisaxpy]
     202* [https://pm.bsc.es/ftp/ompss-2/doc/examples/local/sphinx/03-fundamentals.html]
     203* [http://en.wikipedia.org/wiki/AXPY]
     204