Changes between Version 19 and Version 20 of Public/User_Guide/Batch_system


Timestamp: Nov 21, 2019, 4:41:51 PM
Author: Jacopo de Amicis
Comment: Added information about heterogeneous jobs, binding/pinning and SMT.

  • Public/User_Guide/Batch_system

    v19 v20  
    136136}}}
    137137
     138== Heterogeneous jobs ==
     139
     140As of version 17.11 of Slurm, heterogeneous jobs are supported. For example, the user can run:
     141
     142{{{
     143srun --partition=dp-cn -N 1 -n 1 hostname : --partition=dp-dam -N 1 -n 1 hostname
     144dp-cn01
     145dp-dam01
     146}}}
     147
     148In order to submit a heterogeneous job, the user needs to set the batch script similarly to the following:
     149
     150{{{#!sh
     151#!/bin/bash
     152
     153#SBATCH --job-name=imb_execute_1
     154#SBATCH --account=deep
     155#SBATCH --mail-user=
     156#SBATCH --mail-type=ALL
     157#SBATCH --output=job.out
     158#SBATCH --error=job.err
     159#SBATCH --time=00:02:00
     160
     161#SBATCH --partition=dp-cn
     162#SBATCH --nodes=1
     163#SBATCH --ntasks=12
     164#SBATCH --ntasks-per-node=12
     165#SBATCH --cpus-per-task=1
     166
     167#SBATCH packjob
     168
     169#SBATCH --partition=dp-dam
     170#SBATCH --constraint=
     171#SBATCH --nodes=1
     172#SBATCH --ntasks=12
     173#SBATCH --ntasks-per-node=12
     174#SBATCH --cpus-per-task=1
     175
     176srun ./app_cn : ./app_dam
     177}}}
     178
     179Here the `packjob` keyword allows the user to define Slurm parameters for each sub-job of the heterogeneous job.
     180
     181When submitting a heterogeneous job with this colon notation using ParaStationMPI, a single `MPI_COMM_WORLD` is created, spanning the two partitions. If this is not desired, one can use the `--pack-group` option to submit independent job steps to the different node groups of a heterogeneous allocation:
     182
     183{{{#!sh
     184srun --pack-group=0 ./app_cn ; srun --pack-group=1 ./app_dam
     185}}}
     186
     187With this configuration, any inter-communication between the job steps must be established manually by the applications at run time, if needed.
     188
     189More information about heterogeneous and cross-module jobs (including how to use gateway nodes) can be found on [https://apps.fz-juelich.de/jsc/hps/jureca/modular-jobs.html this page] of the JURECA documentation. All information available there applies to the DEEP system as well. Please be aware that the DEEP system currently includes 2 gateway nodes between the InfiniBand and EXTOLL fabrics.
     190
     191For more information about heterogeneous jobs, please refer to the [https://slurm.schedmd.com/heterogeneous_jobs.html relevant page] of the Slurm documentation.
     192
     193{{{#!comment
     194If you need to load modules before launching the application, it's suggested to create wrapper scripts around the applications, and submit such scripts with srun, like this:
     195
     196{{{#!sh
     197...
     198srun ./script_sdv.sh : ./script_knl.sh
     199}}}
     200
     201where a script should contain:
     202
     203{{{#!sh
     204#!/bin/bash
     205
     206module load ...
     207./app_sdv
     208}}}
     209
     210This way it will also be possible to load different modules on the different partitions used in the heterogeneous job.
     211}}}
     212
    138213== Available Partitions ==
    139214
     
    160235}}}
    161236
    162 {{{#!comment information included with the examples
    163 == Interactive Jobs ==
    164 
    165 == Batch Jobs ==
    166 }}}
     237== Information on past jobs and accounting ==
     238
     239The `sacct` command can be used to query the Slurm database about past jobs.
     240
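For example (a minimal sketch; the job ID, user and start date below are just an illustration), one could query a finished job, or list all of one's recent jobs, like this:

{{{
sacct -j 12345 --format=JobID,JobName,Partition,NNodes,Elapsed,State,ExitCode
sacct -u $USER --starttime=2019-11-01
}}}

See `man sacct` for the full list of fields accepted by `--format`.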
    167241
    168242== FAQ ==
    169243
     244=== Is there a cheat sheet for all main Slurm commands? ===
     245
     246Yes, it is available [https://slurm.schedmd.com/pdfs/summary.pdf here].
     247
    170248=== Why's my job not running? ===
    171249
     
    198276entry.
    199277
    200 === How can get a list of broken nodes? ===
    201 
    202 The command to use is
    203 
    204 {{{
    205 sinfo -Rl -h -o "%n %12U %19H %6t %E" | sort -u
    206 }}}
    207 
    208 See also the translation table below.
    209 
    210 === Can I join stderr and stdout like it was done with {{{-joe}}} in Torque? ===
     278=== How can I check the status of partitions and nodes? ===
     279
     280The main command to use is `sinfo`. By default, when called without options, `sinfo` lists the available partitions and the number of nodes of each partition in each state. For example:
     281
     282{{{
     283[deamicis1@deepv hybridhello]$ sinfo
     284PARTITION    AVAIL  TIMELIMIT  NODES  STATE NODELIST
     285sdv             up   20:00:00     16   idle deeper-sdv[01-16]
     286knl             up   20:00:00      1  drain knl01
     287knl             up   20:00:00      3   idle knl[04-06]
     288knl256          up   20:00:00      1  drain knl01
     289knl256          up   20:00:00      1   idle knl05
     290knl272          up   20:00:00      2   idle knl[04,06]
     291snc4            up   20:00:00      1   idle knl05
     292dam             up   20:00:00      1  down* protodam01
     293dam             up   20:00:00      3   idle protodam[02-04]
     294extoll          up   20:00:00     16   idle deeper-sdv[01-16]
     295ml-gpu          up   20:00:00      1   idle ml-gpu01
     296dp-cn           up   20:00:00      1  drain dp-cn49
     297dp-cn           up   20:00:00      2  alloc dp-cn[01,50]
     298dp-cn           up   20:00:00     47   idle dp-cn[02-48]
     299dp-dam          up   20:00:00      1 drain* dp-dam01
     300dp-dam          up   20:00:00      1  drain dp-dam02
     301dp-dam          up   20:00:00     14   down dp-dam[03-16]
     302dp-sdv-esb      up   20:00:00      2   idle dp-sdv-esb[01-02]
     303psgw-cluster    up   20:00:00      1  down* nfgw01
     304psgw-booster    up   20:00:00      1  down* nfgw02
     305debug           up   20:00:00      1 drain* dp-dam01
     306debug           up   20:00:00      1  down* protodam01
     307debug           up   20:00:00      3  drain dp-cn49,dp-dam02,knl01
     308debug           up   20:00:00     14   down dp-dam[03-16]
     309debug           up   20:00:00      2  alloc dp-cn[01,50]
     310debug           up   20:00:00     69   idle deeper-sdv[01-16],dp-cn[02-48],knl[04-06],protodam[02-04]
     311}}}
     312
     313Please refer to the man page for `sinfo` for more information.
     314
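To restrict the output to a single partition and obtain a node-oriented, long listing (the partition name below is just an example), one can run:

{{{
sinfo -p dp-cn -N -l
}}}
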
     315=== Can I join stderr and stdout like it was done with {{{-joe}}} in Torque? ===
    211316
    212317Not directly. In your batch script, redirect stdout and stderr to the same file:
     
    221326(The {{{%j}}} will place the job id in the output file). N.B. It might be more efficient to redirect the output of your script's commands to a dedicated file.
    222327
    223 === What's the equivalent of {{{qsub -l nodes=x:ppn=y:cluster+n_b:ppn=p_b:booster}}}? ===
    224 
    225 As of version 17.11 of Slurm, heterogeneous jobs are supported. For example, the user can run:
    226 
    227 {{{
    228 srun --partition=sdv -N 1 -n 1 hostname : --partition=knl -N 1 -n 1 hostname
    229 deeper-sdv01
    230 knl05
    231 }}}
    232 
    233 In order to submit a heterogeneous job, the user needs to set the batch script similarly to the following:
    234 
    235 {{{#!sh
    236 #!/bin/bash
    237 
    238 #SBATCH --job-name=imb_execute_1
    239 #SBATCH --account=deep
    240 #SBATCH --mail-user=
    241 #SBATCH --mail-type=ALL
    242 #SBATCH --output=job.out
    243 #SBATCH --error=job.err
    244 #SBATCH --time=00:02:00
    245 
    246 #SBATCH --partition=sdv
    247 #SBATCH --constraint=
    248 #SBATCH --nodes=1
    249 #SBATCH --ntasks=12
    250 #SBATCH --ntasks-per-node=12
    251 #SBATCH --cpus-per-task=1
    252 
    253 #SBATCH packjob
    254 
    255 #SBATCH --partition=knl
    256 #SBATCH --constraint=
    257 #SBATCH --nodes=1
    258 #SBATCH --ntasks=12
    259 #SBATCH --ntasks-per-node=12
    260 #SBATCH --cpus-per-task=1
    261 
    262 srun ./app_sdv : ./app_knl
    263 }}}
    264 
    265 Here the `packjob` keyword allows to define Slurm parameter for each sub-job of the heterogeneous job.
    266 
    267 If you need to load modules before launching the application, it's suggested to create wrapper scripts around the applications, and submit such scripts with srun, like this:
    268 
    269 {{{#!sh
    270 ...
    271 srun ./script_sdv.sh : ./script_knl.sh
    272 }}}
    273 
    274 where a script should contain:
    275 
    276 {{{#!sh
    277 #!/bin/bash
    278 
    279 module load ...
    280 ./app_sdv
    281 }}}
    282 
    283 This way it will also be possible to load different modules on the different partitions used in the heterogeneous job.
    284 
    285 
     328
     329=== What is the default binding/pinning behaviour on DEEP? ===
     330
     331DEEP uses a !ParTec-modified version of Slurm called psslurm. In psslurm, the options concerning binding and pinning differ from those provided in vanilla Slurm. By default, psslurm uses a ''by rank'' pinning strategy, assigning each Slurm task to a different physical thread on the node, starting from OS processor 0. For example:
     332
     333{{{#!sh
     334[deamicis1@deepv hybridhello]$ OMP_NUM_THREADS=1 srun -N 1 -n 4 -p dp-cn ./HybridHello | sort -k9n -k11n
     335Hello from node dp-cn50, core 0; AKA rank 0, thread 0
     336Hello from node dp-cn50, core 1; AKA rank 1, thread 0
     337Hello from node dp-cn50, core 2; AKA rank 2, thread 0
     338Hello from node dp-cn50, core 3; AKA rank 3, thread 0
     339}}}
     340
     341**Attention:** please be aware that the psslurm affinity settings only affect the tasks spawned by Slurm. When using threaded applications, the thread affinity is inherited from the task affinity of the process originally spawned by Slurm. For example, for a hybrid MPI-OpenMP application:
     342{{{#!sh
     343[deamicis1@deepv hybridhello]$ OMP_NUM_THREADS=4 srun -N 1 -n 4 -c 4 -p dp-dam ./HybridHello | sort -k9n -k11n
     344Hello from node dp-dam01, core 0-3; AKA rank 0, thread 0
     345Hello from node dp-dam01, core 0-3; AKA rank 0, thread 1
     346Hello from node dp-dam01, core 0-3; AKA rank 0, thread 2
     347Hello from node dp-dam01, core 0-3; AKA rank 0, thread 3
     348Hello from node dp-dam01, core 4-7; AKA rank 1, thread 0
     349Hello from node dp-dam01, core 4-7; AKA rank 1, thread 1
     350Hello from node dp-dam01, core 4-7; AKA rank 1, thread 2
     351Hello from node dp-dam01, core 4-7; AKA rank 1, thread 3
     352Hello from node dp-dam01, core 8-11; AKA rank 2, thread 0
     353Hello from node dp-dam01, core 8-11; AKA rank 2, thread 1
     354Hello from node dp-dam01, core 8-11; AKA rank 2, thread 2
     355Hello from node dp-dam01, core 8-11; AKA rank 2, thread 3
     356Hello from node dp-dam01, core 12-15; AKA rank 3, thread 0
     357Hello from node dp-dam01, core 12-15; AKA rank 3, thread 1
     358Hello from node dp-dam01, core 12-15; AKA rank 3, thread 2
     359Hello from node dp-dam01, core 12-15; AKA rank 3, thread 3
     360}}}
     361
     362Be sure to explicitly set the thread affinity in your script (e.g. by exporting environment variables) or directly in your code. Taking the previous example:
     363{{{#!sh
     364[deamicis1@deepv hybridhello]$ OMP_NUM_THREADS=4 OMP_PROC_BIND=close srun -N 1 -n 4 -c 4 -p dp-dam ./HybridHello | sort -k9n -k11n
     365Hello from node dp-dam01, core 0; AKA rank 0, thread 0
     366Hello from node dp-dam01, core 1; AKA rank 0, thread 1
     367Hello from node dp-dam01, core 2; AKA rank 0, thread 2
     368Hello from node dp-dam01, core 3; AKA rank 0, thread 3
     369Hello from node dp-dam01, core 4; AKA rank 1, thread 0
     370Hello from node dp-dam01, core 5; AKA rank 1, thread 1
     371Hello from node dp-dam01, core 6; AKA rank 1, thread 2
     372Hello from node dp-dam01, core 7; AKA rank 1, thread 3
     373Hello from node dp-dam01, core 8; AKA rank 2, thread 0
     374Hello from node dp-dam01, core 9; AKA rank 2, thread 1
     375Hello from node dp-dam01, core 10; AKA rank 2, thread 2
     376Hello from node dp-dam01, core 11; AKA rank 2, thread 3
     377Hello from node dp-dam01, core 12; AKA rank 3, thread 0
     378Hello from node dp-dam01, core 13; AKA rank 3, thread 1
     379Hello from node dp-dam01, core 14; AKA rank 3, thread 2
     380Hello from node dp-dam01, core 15; AKA rank 3, thread 3
     381}}}
     382
     383Please refer to [https://apps.fz-juelich.de/jsc/hps/jureca/affinity.html this page] of the JURECA documentation for more information about how to control affinity on the DEEP system using psslurm options. Please be aware that the different partitions on DEEP have different numbers of sockets per node and cores/threads per socket with respect to JURECA. Refer to the [wiki:System_overview] page or run `lstopo-no-graphics` on the compute nodes to get more information about the hardware configuration of the different modules.
     384 
     385
     386
     387=== How do I use SMT on the DEEP CPUs? ===
     388
     389On DEEP, SMT is enabled by default on all nodes. Please be aware that on all JSC systems (including DEEP), each hardware thread is exposed by the OS as a separate core. For an ''n''-core node with ''m'' hardware threads per core, the OS cores ''0'' to ''n-1'' correspond to the first hardware thread of all physical cores (across all sockets), the OS cores ''n'' to ''2n-1'' to the second hardware thread of the physical cores, and so on.
     390
     391For instance, on a Cluster node (with two sockets with 12 cores each, with 2 hardware threads per core):
     392{{{
     393[deamicis1@deepv hybridhello]$ srun -N 1 -n 1 -p dp-cn lstopo-no-graphics --no-caches --no-io --no-bridges --of ascii
     394┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
     395│ Machine (191GB total)                                                                                                                                  │
     396│                                                                                                                                                        │
     397│ ┌────────────────────────────────────────────────────────────────────────┐  ┌────────────────────────────────────────────────────────────────────────┐ │
     398│ │ ┌────────────────────────────────────────────────────────────────────┐ │  │ ┌────────────────────────────────────────────────────────────────────┐ │ │
     399│ │ │ NUMANode P#0 (95GB)                                                │ │  │ │ NUMANode P#1 (96GB)                                                │ │ │
     400│ │ └────────────────────────────────────────────────────────────────────┘ │  │ └────────────────────────────────────────────────────────────────────┘ │ │
     401│ │                                                                        │  │                                                                        │ │
     402│ │ ┌────────────────────────────────────────────────────────────────────┐ │  │ ┌────────────────────────────────────────────────────────────────────┐ │ │
     403│ │ │ Package P#0                                                        │ │  │ │ Package P#1                                                        │ │ │
     404│ │ │                                                                    │ │  │ │                                                                    │ │ │
     405│ │ │ ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐ │ │  │ │ ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐ │ │ │
     406│ │ │ │ Core P#0    │  │ Core P#1    │  │ Core P#2    │  │ Core P#3    │ │ │  │ │ │ Core P#0    │  │ Core P#3    │  │ Core P#4    │  │ Core P#8    │ │ │ │
     407│ │ │ │             │  │             │  │             │  │             │ │ │  │ │ │             │  │             │  │             │  │             │ │ │ │
     408│ │ │ │ ┌─────────┐ │  │ ┌─────────┐ │  │ ┌─────────┐ │  │ ┌─────────┐ │ │ │  │ │ │ ┌─────────┐ │  │ ┌─────────┐ │  │ ┌─────────┐ │  │ ┌─────────┐ │ │ │ │
     409│ │ │ │ │ PU P#0  │ │  │ │ PU P#1  │ │  │ │ PU P#2  │ │  │ │ PU P#3  │ │ │ │  │ │ │ │ PU P#12 │ │  │ │ PU P#13 │ │  │ │ PU P#14 │ │  │ │ PU P#15 │ │ │ │ │
     410│ │ │ │ └─────────┘ │  │ └─────────┘ │  │ └─────────┘ │  │ └─────────┘ │ │ │  │ │ │ └─────────┘ │  │ └─────────┘ │  │ └─────────┘ │  │ └─────────┘ │ │ │ │
     411│ │ │ │ ┌─────────┐ │  │ ┌─────────┐ │  │ ┌─────────┐ │  │ ┌─────────┐ │ │ │  │ │ │ ┌─────────┐ │  │ ┌─────────┐ │  │ ┌─────────┐ │  │ ┌─────────┐ │ │ │ │
     412│ │ │ │ │ PU P#24 │ │  │ │ PU P#25 │ │  │ │ PU P#26 │ │  │ │ PU P#27 │ │ │ │  │ │ │ │ PU P#36 │ │  │ │ PU P#37 │ │  │ │ PU P#38 │ │  │ │ PU P#39 │ │ │ │ │
     413│ │ │ │ └─────────┘ │  │ └─────────┘ │  │ └─────────┘ │  │ └─────────┘ │ │ │  │ │ │ └─────────┘ │  │ └─────────┘ │  │ └─────────┘ │  │ └─────────┘ │ │ │ │
     414│ │ │ └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘ │ │  │ │ └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘ │ │ │
     415│ │ │                                                                    │ │  │ │                                                                    │ │ │
     416│ │ │ ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐ │ │  │ │ ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐ │ │ │
     417│ │ │ │ Core P#4    │  │ Core P#9    │  │ Core P#10   │  │ Core P#16   │ │ │  │ │ │ Core P#9    │  │ Core P#10   │  │ Core P#11   │  │ Core P#16   │ │ │ │
     418│ │ │ │             │  │             │  │             │  │             │ │ │  │ │ │             │  │             │  │             │  │             │ │ │ │
     419│ │ │ │ ┌─────────┐ │  │ ┌─────────┐ │  │ ┌─────────┐ │  │ ┌─────────┐ │ │ │  │ │ │ ┌─────────┐ │  │ ┌─────────┐ │  │ ┌─────────┐ │  │ ┌─────────┐ │ │ │ │
     420│ │ │ │ │ PU P#4  │ │  │ │ PU P#5  │ │  │ │ PU P#6  │ │  │ │ PU P#7  │ │ │ │  │ │ │ │ PU P#16 │ │  │ │ PU P#17 │ │  │ │ PU P#18 │ │  │ │ PU P#19 │ │ │ │ │
     421│ │ │ │ └─────────┘ │  │ └─────────┘ │  │ └─────────┘ │  │ └─────────┘ │ │ │  │ │ │ └─────────┘ │  │ └─────────┘ │  │ └─────────┘ │  │ └─────────┘ │ │ │ │
     422│ │ │ │ ┌─────────┐ │  │ ┌─────────┐ │  │ ┌─────────┐ │  │ ┌─────────┐ │ │ │  │ │ │ ┌─────────┐ │  │ ┌─────────┐ │  │ ┌─────────┐ │  │ ┌─────────┐ │ │ │ │
     423│ │ │ │ │ PU P#28 │ │  │ │ PU P#29 │ │  │ │ PU P#30 │ │  │ │ PU P#31 │ │ │ │  │ │ │ │ PU P#40 │ │  │ │ PU P#41 │ │  │ │ PU P#42 │ │  │ │ PU P#43 │ │ │ │ │
     424│ │ │ │ └─────────┘ │  │ └─────────┘ │  │ └─────────┘ │  │ └─────────┘ │ │ │  │ │ │ └─────────┘ │  │ └─────────┘ │  │ └─────────┘ │  │ └─────────┘ │ │ │ │
     425│ │ │ └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘ │ │  │ │ └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘ │ │ │
     426│ │ │                                                                    │ │  │ │                                                                    │ │ │
     427│ │ │ ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐ │ │  │ │ ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐ │ │ │
     428│ │ │ │ Core P#18   │  │ Core P#19   │  │ Core P#25   │  │ Core P#26   │ │ │  │ │ │ Core P#17   │  │ Core P#18   │  │ Core P#24   │  │ Core P#26   │ │ │ │
     429│ │ │ │             │  │             │  │             │  │             │ │ │  │ │ │             │  │             │  │             │  │             │ │ │ │
     430│ │ │ │ ┌─────────┐ │  │ ┌─────────┐ │  │ ┌─────────┐ │  │ ┌─────────┐ │ │ │  │ │ │ ┌─────────┐ │  │ ┌─────────┐ │  │ ┌─────────┐ │  │ ┌─────────┐ │ │ │ │
     431│ │ │ │ │ PU P#8  │ │  │ │ PU P#9  │ │  │ │ PU P#10 │ │  │ │ PU P#11 │ │ │ │  │ │ │ │ PU P#20 │ │  │ │ PU P#21 │ │  │ │ PU P#22 │ │  │ │ PU P#23 │ │ │ │ │
     432│ │ │ │ └─────────┘ │  │ └─────────┘ │  │ └─────────┘ │  │ └─────────┘ │ │ │  │ │ │ └─────────┘ │  │ └─────────┘ │  │ └─────────┘ │  │ └─────────┘ │ │ │ │
     433│ │ │ │ ┌─────────┐ │  │ ┌─────────┐ │  │ ┌─────────┐ │  │ ┌─────────┐ │ │ │  │ │ │ ┌─────────┐ │  │ ┌─────────┐ │  │ ┌─────────┐ │  │ ┌─────────┐ │ │ │ │
     434│ │ │ │ │ PU P#32 │ │  │ │ PU P#33 │ │  │ │ PU P#34 │ │  │ │ PU P#35 │ │ │ │  │ │ │ │ PU P#44 │ │  │ │ PU P#45 │ │  │ │ PU P#46 │ │  │ │ PU P#47 │ │ │ │ │
     435│ │ │ │ └─────────┘ │  │ └─────────┘ │  │ └─────────┘ │  │ └─────────┘ │ │ │  │ │ │ └─────────┘ │  │ └─────────┘ │  │ └─────────┘ │  │ └─────────┘ │ │ │ │
     436│ │ │ └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘ │ │  │ │ └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘ │ │ │
     437│ │ └────────────────────────────────────────────────────────────────────┘ │  │ └────────────────────────────────────────────────────────────────────┘ │ │
     438│ └────────────────────────────────────────────────────────────────────────┘  └────────────────────────────────────────────────────────────────────────┘ │
     439└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
     440┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
     441│ Host: dp-cn50                                                                                                                                          │
     442│                                                                                                                                                        │
     443│ Indexes: physical                                                                                                                                      │
     444│                                                                                                                                                        │
     445│ Date: Thu 21 Nov 2019 15:22:31 CET                                                                                                                     │
     446└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
     447}}}
     448The `PU P#X` entries are the processing unit (PU) numbers exposed by the OS.
     449
     450To exploit SMT, simply run a job in which the number of tasks times the number of threads per task is higher than the number of physical cores available on a node. Please refer to the [https://apps.fz-juelich.de/jsc/hps/jureca/smt.html relevant page] of the JURECA documentation for more information on how to use SMT on the DEEP nodes.
     451
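For example, on a Cluster node (24 physical cores, 48 hardware threads) one can start more tasks than physical cores; with the default ''by rank'' pinning, the extra tasks land on the second hardware thread of each core (a minimal sketch reusing the `HybridHello` example from above):

{{{#!sh
OMP_NUM_THREADS=1 srun -N 1 -n 48 -p dp-cn ./HybridHello | sort -k9n -k11n
}}}
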
     452**Attention**: currently the only way to assign Slurm tasks to hardware threads belonging to the same physical core is to use the `--cpu-bind` option of psslurm with `mask_cpu` to provide an affinity mask for each task. For example:
     453{{{#!sh
     454[deamicis1@deepv hybridhello]$ OMP_NUM_THREADS=2 OMP_PROC_BIND=close OMP_PLACES=threads srun -N 1 -n 2 -p dp-dam --cpu-bind=mask_cpu:$(printf '%x' "$((2#1000000000000000000000000000000000000000000000001))"),$(printf '%x' "$((2#10000000000000000000000000000000000000000000000010))") ./HybridHello | sort -k9n -k11n
     455Hello from node dp-dam01, core 0; AKA rank 0, thread 0
     456Hello from node dp-dam01, core 48; AKA rank 0, thread 1
     457Hello from node dp-dam01, core 1; AKA rank 1, thread 0
     458Hello from node dp-dam01, core 49; AKA rank 1, thread 1
     459}}}
     460
     461This can be cumbersome for jobs using a large number of tasks per node. In such cases, a tool like [https://www.open-mpi.org/projects/hwloc/ hwloc] (currently available on the compute nodes, but not on the login node!) can be used to calculate the affinity masks to be passed to psslurm.
     462
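For instance (a sketch, assuming the `hwloc-calc` utility of hwloc is available on the compute node and is run there through `srun`), the hexadecimal mask covering both hardware threads of a given physical core can be computed like this and then passed to `mask_cpu`:

{{{#!sh
# Hexadecimal masks for all hardware threads of physical cores 0 and 1 (one mask per task):
srun -N 1 -n 1 -p dp-cn hwloc-calc --taskset core:0
srun -N 1 -n 1 -p dp-cn hwloc-calc --taskset core:1
}}}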
     463
     464{{{#!comment
    286465== pbs/slurm dictionary ==
    287466|| '''PBS command''' || '''closest slurm equivalent''' ||
     
    301480|| setres -u <user> ALL || scontrol create reservation ReservationName=\<some name> user=\<user> Nodes=ALL startTime=now duration=unlimited FLAGS=maint,ignore_jobs ||
    302481|| releaseres || scontrol delete ReservationName= <reservation> ||
     482}}}