Changes between Version 57 and Version 58 of Public/User_Guide/Batch_system


Timestamp: Jul 8, 2022, 3:49:07 PM
Author: Jochen Kreutz
  • Public/User_Guide/Batch_system

= Information about the batch system (SLURM) =
{{{#!comment outdated ?! For the old Torque documentation, please see [wiki:"Public/User_Guide/Batch_system_torque" the old documentation].

Please refer to /etc/slurm/README. }}}

The DEEP prototype systems are running SLURM for resource management. Documentation of Slurm can be found [https://slurm.schedmd.com/ here].
     
=== Interactive jobs with salloc ===
Assume you would like to run an MPI task on 4 cluster nodes with 2 tasks per node. In this case it is necessary to use `salloc`:

{{{
...
Hello World from rank 5 of 8 on dp-cn03
}}}
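
For reference, below is a minimal sketch of such an interactive session. The partition name (`dp-cn`), the resource options and the executable name are assumptions to be adapted to your setup:

{{{#!sh
# request an allocation of 4 nodes with 2 tasks per node (partition name is an assumption)
salloc --partition=dp-cn --nodes=4 --ntasks-per-node=2

# within the allocation, launch the MPI application on all 8 allocated tasks
srun ./hello_mpi
}}}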

[[BR]]Once you get to the compute node, start your application using `srun`. Note that the number of tasks used is the same as specified in the initial `srun` command above (4 nodes with two tasks each). It is also possible to use fewer nodes in the `srun` command, so the following command would work as well:

{{{
     
Not directly. In your batch script, redirect stdout and stderr to the same file:

{{{#!sh
...
#SBATCH -o /point/to/the/common/logfile-%j.log
#SBATCH -e /point/to/the/common/logfile-%j.log
...
}}}

(The `%j` will place the job ID in the output file name.) N.B.: it might be more efficient to redirect the output of your script's commands to a dedicated file.
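
As an illustration, a minimal batch script along these lines might look as follows; the partition, the resource numbers and the executable name are assumptions to adapt as needed:

{{{#!sh
#!/bin/bash
#SBATCH --partition=dp-cn                       # assumption: pick your target module
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2
#SBATCH -o /point/to/the/common/logfile-%j.log  # stdout and stderr share one file
#SBATCH -e /point/to/the/common/logfile-%j.log

srun ./hello_mpi                                # placeholder executable
}}}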
     
DEEP uses a !ParTec-modified version of Slurm called psslurm. In psslurm, the options concerning binding and pinning are different from the ones provided in Vanilla Slurm. By default, psslurm will use a ''by rank'' pinning strategy, assigning each Slurm task to a different physical thread on the node starting from OS processor 0. For example:

{{{#!sh
[deamicis1@deepv hybridhello]$ OMP_NUM_THREADS=1 srun -N 1 -n 4 -p dp-cn ./HybridHello | sort -k9n -k11n
Hello from node dp-cn50, core 0; AKA rank 0, thread 0
Hello from node dp-cn50, core 1; AKA rank 1, thread 0
Hello from node dp-cn50, core 2; AKA rank 2, thread 0
Hello from node dp-cn50, core 3; AKA rank 3, thread 0
}}}

**Attention:** please be aware that the psslurm affinity settings only affect the tasks spawned by Slurm. When using threaded applications, the thread affinity will be inherited from the task affinity of the process originally spawned by Slurm. For example, for a hybrid MPI-OpenMP application:

{{{#!sh
[deamicis1@deepv hybridhello]$ OMP_NUM_THREADS=4 srun -N 1 -n 4 -c 4 -p dp-dam ./HybridHello | sort -k9n -k11n
Hello from node dp-dam01, core 0-3; AKA rank 0, thread 0
Hello from node dp-dam01, core 0-3; AKA rank 0, thread 1
Hello from node dp-dam01, core 0-3; AKA rank 0, thread 2
Hello from node dp-dam01, core 0-3; AKA rank 0, thread 3
Hello from node dp-dam01, core 4-7; AKA rank 1, thread 0
Hello from node dp-dam01, core 4-7; AKA rank 1, thread 1
Hello from node dp-dam01, core 4-7; AKA rank 1, thread 2
Hello from node dp-dam01, core 4-7; AKA rank 1, thread 3
Hello from node dp-dam01, core 8-11; AKA rank 2, thread 0
Hello from node dp-dam01, core 8-11; AKA rank 2, thread 1
Hello from node dp-dam01, core 8-11; AKA rank 2, thread 2
Hello from node dp-dam01, core 8-11; AKA rank 2, thread 3
Hello from node dp-dam01, core 12-15; AKA rank 3, thread 0
Hello from node dp-dam01, core 12-15; AKA rank 3, thread 1
Hello from node dp-dam01, core 12-15; AKA rank 3, thread 2
Hello from node dp-dam01, core 12-15; AKA rank 3, thread 3
}}}

Be sure to explicitly set the thread affinity settings in your script (e.g. exporting environment variables) or directly in your code. Taking the previous example:

{{{#!sh
[deamicis1@deepv hybridhello]$ OMP_NUM_THREADS=4 OMP_PROC_BIND=close srun -N 1 -n 4 -c 4 -p dp-dam ./HybridHello | sort -k9n -k11n
Hello from node dp-dam01, core 0; AKA rank 0, thread 0
Hello from node dp-dam01, core 1; AKA rank 0, thread 1
Hello from node dp-dam01, core 2; AKA rank 0, thread 2
Hello from node dp-dam01, core 3; AKA rank 0, thread 3
Hello from node dp-dam01, core 4; AKA rank 1, thread 0
Hello from node dp-dam01, core 5; AKA rank 1, thread 1
Hello from node dp-dam01, core 6; AKA rank 1, thread 2
Hello from node dp-dam01, core 7; AKA rank 1, thread 3
Hello from node dp-dam01, core 8; AKA rank 2, thread 0
Hello from node dp-dam01, core 9; AKA rank 2, thread 1
Hello from node dp-dam01, core 10; AKA rank 2, thread 2
Hello from node dp-dam01, core 11; AKA rank 2, thread 3
Hello from node dp-dam01, core 12; AKA rank 3, thread 0
Hello from node dp-dam01, core 13; AKA rank 3, thread 1
Hello from node dp-dam01, core 14; AKA rank 3, thread 2
Hello from node dp-dam01, core 15; AKA rank 3, thread 3
}}}
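
In a batch script, the same thread-affinity settings can be exported before the `srun` call. The sketch below is only an illustration; the partition, the resource numbers and the executable name are assumptions:

{{{#!sh
#!/bin/bash
#SBATCH --partition=dp-dam        # assumption
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=4

# pin the OpenMP threads of each task within the CPUs assigned to it by psslurm
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
export OMP_PROC_BIND=close
export OMP_PLACES=threads

srun ./HybridHello                # placeholder executable
}}}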

Please refer to [https://apps.fz-juelich.de/jsc/hps/jureca/affinity.html this page] of the JURECA documentation for more information about how to control affinity on the DEEP system using psslurm options. Please be aware that the DEEP partitions differ from JURECA in their number of sockets per node and cores/threads per socket. Please refer to the [wiki:System_overview] or run `lstopo-no-graphics` on the compute nodes to get more information about the hardware configuration of the different modules.

=== How do I use SMT on the DEEP CPUs? ===
     
To exploit SMT, simply run a job using a number of tasks × threads-per-task that is higher than the number of physical cores available on a node. Please refer to the [https://apps.fz-juelich.de/jsc/hps/jureca/smt.html relevant page] of the JURECA documentation for more information on how to use SMT on the DEEP nodes.
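
As a simple illustration, assuming a dp-dam node with 48 physical cores and 2 hardware threads per core (96 hardware threads in total, matching the core numbering in the example below), SMT can be engaged by requesting more tasks than physical cores; the executable name is a placeholder:

{{{#!sh
# 96 tasks on one node: twice the number of physical cores, so SMT threads are used
srun -N 1 -n 96 -p dp-dam ./MPIHello
}}}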

**Attention**: currently the only way to assign Slurm tasks to hardware threads belonging to the same hardware core is to use the `--cpu-bind` option of psslurm using `mask_cpu` to provide affinity masks for each task. For example:

{{{#!sh
[deamicis1@deepv hybridhello]$ OMP_NUM_THREADS=2 OMP_PROC_BIND=close OMP_PLACES=threads srun -N 1 -n 2 -p dp-dam --cpu-bind=mask_cpu:$(printf '%x' "$((2#1000000000000000000000000000000000000000000000001))"),$(printf '%x' "$((2#10000000000000000000000000000000000000000000000010))") ./HybridHello | sort -k9n -k11n
Hello from node dp-dam01, core 0; AKA rank 0, thread 0
Hello from node dp-dam01, core 48; AKA rank 0, thread 1
Hello from node dp-dam01, core 1; AKA rank 1, thread 0
Hello from node dp-dam01, core 49; AKA rank 1, thread 1
}}}

This can be cumbersome for jobs using a large number of tasks per node. In such cases, a tool like [https://www.open-mpi.org/projects/hwloc/ hwloc] (currently available on the compute nodes, but not on the login node!) can be used to calculate the affinity masks to be passed to psslurm.
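
Alternatively, such masks can be generated with a small shell loop. The sketch below assumes 48 physical cores per node whose SMT siblings are offset by 48 (as in the example above); the number of tasks is a placeholder:

{{{#!sh
#!/bin/bash
# Build a comma-separated mask_cpu list that binds task i to physical core i
# and its SMT sibling (core i+48), reproducing the masks from the example above.
NTASKS=4                     # placeholder: tasks per node
MASKS=""
for ((i = 0; i < NTASKS; i++)); do
    # set bit i (physical core) and bit i+48 (its hardware-thread sibling)
    mask=$(printf '%x' $(( (1 << i) | (1 << (i + 48)) )))
    MASKS="${MASKS:+${MASKS},}${mask}"
done

echo "${MASKS}"
# e.g.: srun --cpu-bind=mask_cpu:${MASKS} -N 1 -n ${NTASKS} -p dp-dam ./HybridHello
}}}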