Changes between Version 56 and Version 57 of Public/User_Guide/Batch_system


Timestamp: Jul 8, 2022, 3:44:10 PM
Author: Jochen Kreutz
Comment: small corrections for interactive batch job section

= Information about the batch system (SLURM) =

{{{#!comment outdated ?!
For the old torque documentation, please see [wiki:"Public/User_Guide/Batch_system_torque" the old documentation].

Please confer /etc/slurm/README.
}}}

The DEEP prototype systems are running SLURM for resource management. Documentation of Slurm can be found [https://slurm.schedmd.com/ here].

== Overview ==

Slurm offers interactive and batch jobs (scripts submitted into the system). The relevant commands are `srun` and `sbatch`. The `srun` command can be used to spawn processes ('''please do not use mpiexec'''), both from the frontend and from within a batch script. You can also get a shell on a node to work locally there (e.g. to compile your application natively for a special platform or module).

== Available Partitions ==

Please note that there is no default partition configured. In order to run a job, you have to specify one of the following partitions, using the `--partition=...` switch:

 || '''Name''' || '''Description''' ||
 || dp-cn || dp-cn[01-50], DEEP-EST Cluster nodes (Xeon Skylake) ||
 || dp-dam || dp-dam[01-16], DEEP-EST Dam nodes (Xeon Cascadelake + 1 V100 + 1 Stratix 10) ||
 || dp-esb || dp-esb[01-75], DEEP-EST ESB nodes connected with IB EDR (Xeon Cascadelake + 1 V100) ||
 || dp-sdv-esb || dp-sdv-esb[01-02], DEEP-EST ESB Test nodes (Xeon Cascadelake + 1 V100) ||
 || ml-gpu || ml-gpu[01-03], GPU test nodes for ML applications (4 V100 cards) ||
 || knl || knl[01,04-06], KNL nodes ||
 || knl256 || knl[01,05], KNL nodes with 64 cores ||
 || knl272 || knl[04,06], KNL nodes with 68 cores ||
 || snc4 || knl[05], KNL node in snc4 memory mode ||
 || debug || all compute nodes (no gateways) ||

Anytime, you can list the state of the partitions with the `sinfo` command. The properties of a partition (e.g. the maximum walltime) can be seen using

{{{
scontrol show partition <partition>
}}}

== Remark about environment ==

By default, Slurm passes the environment from your job submission session directly to the execution environment. Please be aware of this when running jobs with `srun` or when submitting scripts with `sbatch`. This behavior can be controlled via the `--export` option. Please refer to the [https://slurm.schedmd.com/ Slurm documentation] to get more information about this.

In particular, when submitting job scripts, **it is recommended to load the necessary modules within the script and submit the script from a clean environment.**

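Since the submission environment is inherited by the job, stray variables in your login shell can silently change your job's behaviour. As a minimal, scheduler-free sketch of the leakage (plain bash, no Slurm involved; the `--export=NONE` mentioned in the comments is the Slurm-side control):

```shell
# A variable exported in the submission shell ...
export MYVAR=leaky
# ... is inherited by child processes, just as a job inherits your session:
bash -c 'echo ${MYVAR:-unset}'          # prints: leaky
# A cleaned environment does not carry it (similar in spirit to
# submitting with sbatch/srun --export=NONE):
env -i bash -c 'echo ${MYVAR:-unset}'   # prints: unset
```

With `sbatch`, the analogous control would be e.g. `sbatch --export=NONE job.sh` (`job.sh` being a hypothetical script name).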
== An introductory example ==

Suppose you have an MPI executable named `hello_mpi`. There are three ways to start the binary.

=== From a shell on a node ===

If you just need one node to run your interactive session on, you can simply use the `srun` command (without `salloc`), e.g.:

{{{
[kreutz1@deepv ~]$ srun -A deep -N 1 -n 8 -p dp-cn -t 00:30:00 --pty --interactive bash
[kreutz1@dp-cn22 ~]$ srun -n 8 hostname
dp-cn22
dp-cn22
dp-cn22
dp-cn22
dp-cn22
dp-cn22
dp-cn22
dp-cn22
}}}

The environment is transported to the remote shell; no `.profile`, `.bashrc`, ... are sourced (especially not the modules default from `/etc/profile.d/modules.sh`). As of March 2020, an account has to be specified using the `--account` (short `-A`) option, which is "deepsea" for DEEP-SEA project members. For people not included in the DEEP-SEA project, please use the "Budget" name you received along with your account creation.

Assume you would like to run an MPI task on 4 cluster nodes with 2 tasks per node. It's necessary to use `salloc` then:

{{{
[kreutz1@deepv Temp]$ salloc -A deep -p dp-cn -N 4 -n 8 -t 00:30:00 srun --pty --interactive /bin/bash
[kreutz1@dp-cn01 Temp]$ srun -N 4 -n 8 ./MPI_HelloWorld
Hello World from rank 3 of 8 on dp-cn02
Hello World from rank 7 of 8 on dp-cn04
...
Hello World from rank 5 of 8 on dp-cn03
}}}

[[BR]]Once you get to the compute node, start your application using `srun`. Note that the number of tasks used is the same as specified in the initial `srun` command above (4 nodes with two tasks each). It's also possible to use fewer nodes in the `srun` command, so the following command would work as well:

{{{
[kreutz1@dp-cn01 Temp]$ srun -N 1 -n 1 ./MPI_HelloWorld
Hello World from rank 0 of 1 on dp-cn01
}}}

=== Running directly from the front ends ===

You can run the application directly from the frontend, bypassing the shell. Do not forget to set the correct environment for running your executable on the login node, as this will be used for execution with `srun`.

{{{
...
Hello World from rank 5 of 8 on dp-cn03
}}}

It can be useful to create an allocation which can be used for several runs of your job:

{{{
...
salloc: Relinquishing job allocation 69263
}}}

Note that in this case the `-N` and `-n` options for the `srun` command can be skipped (they default to the corresponding options given to `salloc`).

=== Batch script ===

As stated above, it is recommended to load the necessary modules within the script and submit the script from a clean environment.

The following script `hello_cluster.sh` will unload all modules and load the modules required for executing the given binary:

{{{
...
srun ./MPI_HelloWorld
}}}
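
The body of the script is elided in this page revision. As a hedged sketch only, a script consistent with the description that follows might look like this (the partition, module, and file names are hypothetical placeholders, not taken from the original):

```shell
#!/bin/bash
#SBATCH --account=deepsea           # your budget/account name
#SBATCH --partition=dp-esb          # ESB module (assumption)
#SBATCH --nodes=4
#SBATCH --ntasks=8
#SBATCH --output=hello_cluster-%j.out
#SBATCH --error=hello_cluster-%j.err
#SBATCH --time=00:10:00

module purge                        # unload all modules
module load Intel ParaStationMPI    # hypothetical: toolchain the binary was built with

srun ./MPI_HelloWorld
```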

This script requests 4 nodes of the ESB module with 8 tasks, specifies the stdout and stderr files, and asks for 10 minutes of walltime. You can submit the job script as follows:

{{{
...
Submitted batch job 69264
}}}

... and check what it's doing:

{{{
...
             69264     dp-cn hello_cl  kreutz1 CG       0:04      4 dp-cn[01-04]
}}}

Once finished, you can check the result (and the error file if needed):

{{{
...
Hello World from rank 0 of 8 on dp-esb34
}}}

{{{#!comment JK: not available anymore in current SLURM version

== Submitting jobs to alternative modules ==

Users can submit batch jobs to multiple modules by using the `--module-list` extension for the Slurm `sbatch` command. This extension accepts two modules: a primary module, submitted with higher priority, and an alternative module that receives a lower priority in the job queue. In the example below the job is submitted to two modules: the primary module is dp-cn, while the secondary module is dp-dam.

`sbatch --module-list=dp-cn,dp-dam job.batch`

The parameters for the alternative module are automatically calculated by using an internal conversion model. Module-list is an alternative to `--partition`, which does not apply any conversion model, and submits the job to multiple partitions but with the same configuration. Available conversion models:

 * CM to DAM: number of requested nodes / 2, number of tasks per node x 2 (CPU cores ratio)
 * ESB to CM: same number of nodes, time limit * 10 (due to GPU vs CPU performance), number of tasks per node / 3 (CPU cores ratio)
 * DAM to CM: number of nodes * 2 (due to available memory), time limit * 10 (due to GPU vs CPU performance), number of tasks per node / 2 (CPU cores ratio)
 * DAM to ESB: number of nodes * 4 (due to available memory), time limit / 4 (due to using more nodes), number of tasks per node / 6 (CPU cores ratio)

At submission time it is recommended to specify the number of nodes, the number of tasks per node, the number of CPUs per task, and the number of GPUs per node.

Module-list is currently not compatible with other dependencies specified with the `--dependency` clause, nor with other partitions specified with `--partition`.
}}}

== Job chains ==

Please refer to the [wiki:Public/User_Guide/Batch_system#FAQ FAQ] for the creation of job chains and implementing job dependencies. If you would like to implement workflows, take a look at the [wiki:Public/User_Guide/Workflows Workflows] section.

== Information on past jobs and accounting ==

The `sacct` command can be used to query the Slurm database about a past job.

{{{
...
69268+1            bash     dp-dam deepest-a+        384  COMPLETED      0:0
}}}

On the Cluster (CM) nodes it is possible to query the consumed energy for a certain job:

{{{
...
       496.70K xlinpack_+ 69326.0        08:10:24          1
}}}

This feature will also be available for the ESB nodes.

== FAQ ==

=== Is there a cheat sheet for all main Slurm commands? ===

Yes, it is available [https://slurm.schedmd.com/pdfs/summary.pdf here].

=== Why's my job not running? ===

You can check the state of your job with

{{{
scontrol show job <job id>
}}}

In the output, look for the `Reason` field.

You can check the existing reservations using

{{{
scontrol show res
}}}

=== How can I check which jobs are running in the machine? ===

Please use the `squeue` command (use the `-u $USER` option to list only the jobs belonging to your user id).

=== How do I do chain jobs with dependencies? ===

Please confer the `sbatch`/`srun` man page, especially the

{{{
-d, --dependency=<dependency_list>
}}}

entry.

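As a sketch of the usual chaining pattern (the script names `job_a.sh`/`job_b.sh` are hypothetical): `sbatch --parsable` prints only the job id, which can then be fed into `--dependency`:

```shell
# On a system with Slurm (not runnable without it):
#   first=$(sbatch --parsable job_a.sh)
#   sbatch --dependency=afterok:${first} job_b.sh
# The dependency specification itself is just "<type>:<jobid>[:<jobid>...]":
first=69264                  # example job id, as printed by sbatch --parsable
dep="afterok:${first}"       # job_b starts only after 69264 completes OK
echo "${dep}"                # prints: afterok:69264
```
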
=== How can I check the status of partitions and nodes? ===

The main command to use is `sinfo`. By default, when called alone, `sinfo` will list the available partitions and the number of nodes in each partition in a given status. For example:

{{{
...
debug           up   20:00:00     69   idle deeper-sdv[06-16],dp-cn[01-08,11-24,26-32,34-48],dp-dam[02,06,11-16],knl[04-06],ml-gpu[01-03]
}}}

Please refer to the man page for `sinfo` for more information.

=== Can I join stderr and stdout like it was done with `-joe` in Torque? ===

Not directly. In your batch script, redirect stdout and stderr to the same file:

{{{#!sh
...
#SBATCH -o /point/to/the/common/logfile-%j.log
#SBATCH -e /point/to/the/common/logfile-%j.log
...
}}}

(The `%j` will place the job id in the output file.) N.B. It might be more efficient to redirect the output of your script's commands to a dedicated file.

{{{#!comment

=== What is the default binding/pinning behaviour on DEEP? ===

DEEP uses a !ParTec-modified version of Slurm called psslurm. In psslurm, the options concerning binding and pinning are different from the ones provided in Vanilla Slurm. By default, psslurm will use a ''by rank'' pinning strategy, assigning each Slurm task to a different physical thread on the node, starting from OS processor 0. For example:

{{{#!sh
[deamicis1@deepv hybridhello]$ OMP_NUM_THREADS=1 srun -N 1 -n 4 -p dp-cn ./HybridHello | sort -k9n -k11n
Hello from node dp-cn50, core 0; AKA rank 0, thread 0
Hello from node dp-cn50, core 1; AKA rank 1, thread 0
Hello from node dp-cn50, core 2; AKA rank 2, thread 0
Hello from node dp-cn50, core 3; AKA rank 3, thread 0
}}}

**Attention:** please be aware that the psslurm affinity settings only affect the tasks spawned by Slurm. When using threaded applications, the thread affinity will be inherited from the task affinity of the process originally spawned by Slurm. For example, for a hybrid MPI-OpenMP application:

{{{#!sh
[deamicis1@deepv hybridhello]$ OMP_NUM_THREADS=4 srun -N 1 -n 4 -c 4 -p dp-dam ./HybridHello | sort -k9n -k11n
Hello from node dp-dam01, core 0-3; AKA rank 0, thread 0
Hello from node dp-dam01, core 0-3; AKA rank 0, thread 1
Hello from node dp-dam01, core 0-3; AKA rank 0, thread 2
Hello from node dp-dam01, core 0-3; AKA rank 0, thread 3
Hello from node dp-dam01, core 4-7; AKA rank 1, thread 0
Hello from node dp-dam01, core 4-7; AKA rank 1, thread 1
Hello from node dp-dam01, core 4-7; AKA rank 1, thread 2
Hello from node dp-dam01, core 4-7; AKA rank 1, thread 3
Hello from node dp-dam01, core 8-11; AKA rank 2, thread 0
Hello from node dp-dam01, core 8-11; AKA rank 2, thread 1
Hello from node dp-dam01, core 8-11; AKA rank 2, thread 2
Hello from node dp-dam01, core 8-11; AKA rank 2, thread 3
Hello from node dp-dam01, core 12-15; AKA rank 3, thread 0
Hello from node dp-dam01, core 12-15; AKA rank 3, thread 1
Hello from node dp-dam01, core 12-15; AKA rank 3, thread 2
Hello from node dp-dam01, core 12-15; AKA rank 3, thread 3
}}}

Be sure to explicitly set the thread affinity settings in your script (e.g. by exporting environment variables) or directly in your code. Taking the previous example:

{{{#!sh
[deamicis1@deepv hybridhello]$ OMP_NUM_THREADS=4 OMP_PROC_BIND=close srun -N 1 -n 4 -c 4 -p dp-dam ./HybridHello | sort -k9n -k11n
Hello from node dp-dam01, core 0; AKA rank 0, thread 0
Hello from node dp-dam01, core 1; AKA rank 0, thread 1
Hello from node dp-dam01, core 2; AKA rank 0, thread 2
Hello from node dp-dam01, core 3; AKA rank 0, thread 3
Hello from node dp-dam01, core 4; AKA rank 1, thread 0
Hello from node dp-dam01, core 5; AKA rank 1, thread 1
Hello from node dp-dam01, core 6; AKA rank 1, thread 2
Hello from node dp-dam01, core 7; AKA rank 1, thread 3
Hello from node dp-dam01, core 8; AKA rank 2, thread 0
Hello from node dp-dam01, core 9; AKA rank 2, thread 1
Hello from node dp-dam01, core 10; AKA rank 2, thread 2
Hello from node dp-dam01, core 11; AKA rank 2, thread 3
Hello from node dp-dam01, core 12; AKA rank 3, thread 0
Hello from node dp-dam01, core 13; AKA rank 3, thread 1
Hello from node dp-dam01, core 14; AKA rank 3, thread 2
Hello from node dp-dam01, core 15; AKA rank 3, thread 3
}}}

Please refer to the [https://apps.fz-juelich.de/jsc/hps/jureca/affinity.html following page] of the JURECA documentation for more information about how to affect affinity on the DEEP system using psslurm options. Please be aware that the partitions on DEEP have different numbers of sockets per node and of cores/threads per socket than JURECA. Please refer to the [wiki:System_overview] or run `lstopo-no-graphics` on the compute nodes to get more information about the hardware configuration on the different modules.

=== How do I use SMT on the DEEP CPUs? ===

On DEEP, SMT is enabled by default on all nodes. Please be aware that on all JSC systems (including DEEP), each hardware thread is exposed by the OS as a separate CPU. For an ''n''-core node with ''m'' hardware threads per core, the OS cores from ''0'' to ''n-1'' will correspond to the first hardware thread of all hardware cores (from all sockets), the OS cores from ''n'' to ''2n-1'' to the second hardware thread of the hardware cores, and so on.

For instance, on a Cluster node (with two sockets of 12 cores each, with 2 hardware threads per core):

{{{
[deamicis1@deepv hybridhello]$ srun -N 1 -n 1 -p dp-cn lstopo-no-graphics --no-caches --no-io --no-bridges --of ascii
...
}}}

To exploit SMT, simply run a job using a number of tasks*threads_per_task higher than the number of physical cores available on a node. Please refer to the [https://apps.fz-juelich.de/jsc/hps/jureca/smt.html relevant page] of the JURECA documentation for more information on how to use SMT on the DEEP nodes.
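
The numbering scheme described above can be sketched with a little shell arithmetic (the helper `os_cpu` is hypothetical; `n=24` assumes the Cluster node above, i.e. 2 sockets x 12 cores with 2 hardware threads per core):

```shell
# OS CPU id of hardware thread t of physical core c on an n-core node: t*n + c
n=24
os_cpu() { echo $(( $2 * n + $1 )); }   # usage: os_cpu <core> <thread>
os_cpu 0 0    # first thread of core 0   -> prints 0
os_cpu 0 1    # second thread of core 0  -> prints 24
os_cpu 5 1    # second thread of core 5  -> prints 29
```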

**Attention**: currently the only way to assign Slurm tasks to hardware threads belonging to the same hardware core is to use the `--cpu-bind` option of psslurm with `mask_cpu` to provide affinity masks for each task. For example:

{{{#!sh
[deamicis1@deepv hybridhello]$ OMP_NUM_THREADS=2 OMP_PROC_BIND=close OMP_PLACES=threads srun -N 1 -n 2 -p dp-dam --cpu-bind=mask_cpu:$(printf '%x' "$((2#1000000000000000000000000000000000000000000000001))"),$(printf '%x' "$((2#10000000000000000000000000000000000000000000000010))") ./HybridHello | sort -k9n -k11n
Hello from node dp-dam01, core 0; AKA rank 0, thread 0
Hello from node dp-dam01, core 48; AKA rank 0, thread 1
Hello from node dp-dam01, core 1; AKA rank 1, thread 0
Hello from node dp-dam01, core 49; AKA rank 1, thread 1
}}}

This can be cumbersome for jobs using a large number of tasks per node. In such cases, a tool like [https://www.open-mpi.org/projects/hwloc/ hwloc] (currently available on the compute nodes, but not on the login node!) can be used to calculate the affinity masks to be passed to psslurm.
}}}

{{{#!comment

== pbs/slurm dictionary ==

|| '''PBS command''' || '''closest slurm equivalent''' ||
...
|| setres -u <user> ALL || scontrol create reservation ReservationName=\<some name> user=\<user> Nodes=ALL startTime=now duration=unlimited FLAGS=maint,ignore_jobs ||
|| releaseres || scontrol delete ReservationName= <reservation> ||

}}}