Changes between Version 35 and Version 36 of Public/User_Guide/OmpSs-2
Timestamp: Jun 14, 2019, 3:39:05 PM
Table of contents:
 * [#QuickOverview Quick Overview]
 * [#QuickSetuponDEEPSystemforaPureOmpSs-2Application Quick Setup on DEEP System for a Pure OmpSs-2 Application]
 * [#UsingtheRepositories Using the Repositories]
 * [#multisaxpybenchmarkOmpSs-2 multisaxpy benchmark (OmpSs-2)]
 * [#dot-productbenchmarkOmpSs-2 dot-product benchmark (OmpSs-2)]

…

----

= Quick Setup on DEEP System for a Pure !OmpSs-2 Application =

We highly recommend logging in interactively to a **cluster module (CM) node** to begin using !OmpSs-2. To request an entire CM node for an interactive session using all 48 available threads, please execute the following command:

`srun -p dp-cn -N 1 -n 1 -c 48 --pty /bin/bash -i`

Note that the command above is consistent with the actual hardware configuration of the cluster module with **hyper-threading enabled**.
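Once the interactive session starts, a quick sanity check can confirm that the full node was granted. This is only a sketch; the values shown assume a complete CM node with hyper-threading enabled, as described above.

```shell
# Inside the interactive shell on the CM node:
nproc                        # should report 48 on a full CM node
echo "$SLURM_CPUS_PER_TASK"  # should match the -c value (48)
```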
!OmpSs-2 has already been installed on DEEP and can be used by simply executing the following commands:
{{{
modulepath="/usr/local/software/skylake/Stages/2018b/modules/all/Core:$modulepath"
modulepath="/usr/local/software/skylake/Stages/2018b/modules/all/Compiler/mpi/intel/2019.0.117-GCC-7.3.0:$modulepath"
modulepath="/usr/local/software/skylake/Stages/2018b/modules/all/MPI/intel/2019.0.117-GCC-7.3.0/psmpi/5.2.1-1-mt:$modulepath"
export MODULEPATH="$modulepath:$MODULEPATH"
module load OmpSs-2
}}}

Remember that !OmpSs-2 uses a **thread-pool** execution model, which means that it **permanently uses all the threads** present on the system.
Users are strongly encouraged to always check the **system affinity** by running the **NUMA command** `srun numactl --show`:
{{{
$ srun numactl --show
policy: default
preferred node: current
physcpubind: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
cpubind: 0 1
nodebind: 0 1
membind: 0 1
}}}
as well as the **Nanos6 command** `srun nanos6-info --runtime-details | grep List`:
{{{
$ srun nanos6-info --runtime-details | grep List
Initial CPU List 0-47
NUMA Node 0 CPU List 0-11,24-35
NUMA Node 1 CPU List 12-23,36-47
}}}

System affinity can be used to specify, for example, the ratio of MPI and !OmpSs-2 processes for a hybrid application, and can be modified by user request in different ways:
 * Via the command `srun` or `salloc`. However, if the affinity given by SLURM does not correspond to the resources requested, it should be reported to the system administrators.
 * Via the command `numactl`.
 * Via the command `taskset`.
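The three mechanisms above can be sketched as follows. Here `./app` is a hypothetical binary, and the CPU lists assume the CM node layout described earlier (physical cores 0-11 of the first socket with hyper-threads 24-35); adjust them to the actual allocation.

```shell
# numactl: bind the process and its memory to NUMA node 0 only
numactl --cpunodebind=0 --membind=0 ./app

# taskset: pin to an explicit CPU list (first socket plus its hyper-threads)
taskset -c 0-11,24-35 ./app

# srun: request the binding at launch time, e.g. two tasks, one per socket
srun -p dp-cn -N 1 -n 2 -c 24 --cpu-bind=sockets ./app
```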
…

= Using the Repositories =

All the examples shown here are publicly available at [https://pm.bsc.es/gitlab/ompss-2/examples]. Users must clone/download each example's repository and then transfer it to a DEEP working directory.

== System configuration ==

Please refer to section [#QuickSetuponDEEPSystemforaPureOmpSs-2Application Quick Setup on DEEP System for a Pure OmpSs-2 Application] to get a functional version of !OmpSs-2 on DEEP. It is also recommended to run !OmpSs-2 via an interactive session on a cluster module (CM) node.

== Building and running the examples ==
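The overall workflow can be sketched as follows. The repository sub-path, make target, and binary name are assumptions for illustration; check each example's own README for the actual build and run instructions.

```shell
# Clone one example (hypothetical sub-path under the public group)
git clone https://pm.bsc.es/gitlab/ompss-2/examples/multisaxpy.git
cd multisaxpy

# Build with the OmpSs-2 toolchain loaded via the modules above
make

# Run inside the interactive session on the allocated CM node
srun ./multisaxpy
```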