Changes between Initial Version and Version 1 of Public/User_Guide/Modular_jobs


Timestamp: Oct 15, 2021, 11:27:12 AM
Author: Jochen Kreutz
Comment: add content from batchsystem page

[[TOC]]

= Information about heterogeneous and modular jobs =

== Heterogeneous jobs ==

As of version 17.11 of Slurm, heterogeneous jobs are supported. For example, the user can run:

{{{
srun --account=deep --partition=dp-cn -N 1 -n 1 hostname : --partition=dp-dam -N 1 -n 1 hostname
dp-cn01
dp-dam01
}}}

Please note the `:` separating the definitions for each sub-job of the heterogeneous job. Also, be aware that it is possible to have more than two sub-jobs in a heterogeneous job.

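More than two sub-jobs are requested in the same way, adding further `:`-separated definitions. A minimal sketch, assuming a third partition (here `dp-esb`) is available to the user:

{{{
srun --account=deep --partition=dp-cn -N 1 -n 1 hostname : --partition=dp-dam -N 1 -n 1 hostname : --partition=dp-esb -N 1 -n 1 hostname
}}}
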
The user can also request several sets of nodes in a heterogeneous allocation using `salloc`. For example:
{{{
salloc --partition=dp-cn -N 2 : --partition=dp-dam -N 4
}}}

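Within such an allocation, a job step spanning both sets of nodes can then be launched with the same colon notation. A minimal sketch, using the placeholder applications `./app_cn` and `./app_dam`:

{{{#!sh
# run inside the heterogeneous allocation obtained with salloc
srun ./app_cn : ./app_dam
}}}
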
In order to submit a heterogeneous job via `sbatch`, the user needs to set up a batch script similar to the following one:

{{{#!sh
#!/bin/bash

#SBATCH --job-name=imb_execute_1
#SBATCH --account=deep
#SBATCH --mail-user=
#SBATCH --mail-type=ALL
#SBATCH --output=job.out
#SBATCH --error=job.err
#SBATCH --time=00:02:00

#SBATCH --partition=dp-cn
#SBATCH --nodes=1
#SBATCH --ntasks=12
#SBATCH --ntasks-per-node=12
#SBATCH --cpus-per-task=1

#SBATCH packjob

#SBATCH --partition=dp-dam
#SBATCH --constraint=
#SBATCH --nodes=1
#SBATCH --ntasks=12
#SBATCH --ntasks-per-node=12
#SBATCH --cpus-per-task=1

srun ./app_cn : ./app_dam
}}}

Here the `packjob` keyword allows defining Slurm parameters separately for each sub-job of the heterogeneous job. Some Slurm options can be defined once at the beginning of the script and are automatically propagated to all sub-jobs, while others (e.g. `--nodes` or `--ntasks`) must be defined for each sub-job. You can find a list of the propagated options in the [https://slurm.schedmd.com/heterogeneous_jobs.html#submitting Slurm documentation].

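After submission, the components of a heterogeneous job are reported by Slurm with a `+<offset>` suffix appended to the job ID. An illustrative sketch of how this might look in `squeue` (column layout abridged, job ID made up):

{{{
squeue -u $USER
   JOBID PARTITION     NAME  ST  NODES
  1234+0     dp-cn imb_exec   R      1
  1234+1    dp-dam imb_exec   R      1
}}}
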
When submitting a heterogeneous job with this colon notation using ParaStationMPI, a single `MPI_COMM_WORLD` is created, spanning the two partitions. If this is not desired, one can use the `--pack-group` option to submit independent job steps to the different node groups of a heterogeneous allocation:

{{{#!sh
srun --pack-group=0 ./app_cn ; srun --pack-group=1 ./app_dam
}}}

Using this configuration implies that inter-communication must be established manually by the applications at run time, if needed.

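Note that with the `;` separator the second job step only starts after the first one has finished. If the two steps should instead run concurrently on their respective node groups, they can be put in the background and waited for:

{{{#!sh
srun --pack-group=0 ./app_cn &
srun --pack-group=1 ./app_dam &
wait
}}}
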
For more information about heterogeneous jobs, please refer to the [https://slurm.schedmd.com/heterogeneous_jobs.html relevant page] of the Slurm documentation.

=== Heterogeneous jobs with MPI communication across modules ===

In order to establish MPI communication across modules using different interconnect technologies, special gateway nodes must be used. On the DEEP-EST system, MPI communication across gateways is needed only between the Infiniband and Extoll interconnects.

**Attention:** Only !ParaStation MPI supports MPI communication across gateway nodes.

This is an example job script for setting up an Intel MPI benchmark between a Cluster node and a DAM node using an IB <-> Extoll gateway for MPI communication:

{{{#!sh
#!/bin/bash

# Script to launch the IMB Uniband benchmark between a DAM and a CN node using 1 gateway
# Use the gateway allocation provided by SLURM
# Use the packjob feature to launch the CM and DAM executables separately

# General configuration of the job
#SBATCH --job-name=modular-imb
#SBATCH --account=deep
#SBATCH --time=00:10:00
#SBATCH --output=modular-imb-%j.out
#SBATCH --error=modular-imb-%j.err

# Configure the gateway daemon
#SBATCH --gw_num=1
#SBATCH --gw_psgwd_per_node=1

# Configure node and process count on the CM
#SBATCH --partition=dp-cn
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1

#SBATCH packjob

# Configure node and process count on the DAM
#SBATCH --partition=dp-dam-ext
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1

# Echo job configuration
echo "DEBUG: SLURM_JOB_NODELIST=$SLURM_JOB_NODELIST"
echo "DEBUG: SLURM_NNODES=$SLURM_NNODES"
echo "DEBUG: SLURM_TASKS_PER_NODE=$SLURM_TASKS_PER_NODE"

# Set the environment to use PS-MPI
module --force purge
module use $OTHERSTAGES
module load Stages/Devel-2019a
module load Intel
module load ParaStationMPI

# Show the hosts we are running on
srun hostname : hostname

# Execute
APP="./IMB-MPI1 Uniband"
srun ${APP} : ${APP}
}}}

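The job script can be submitted as usual with `sbatch`; a brief sketch, assuming it has been saved as `modular-imb.sbatch` (the file name is only an example):

{{{#!sh
sbatch modular-imb.sbatch
squeue -u $USER    # check the job and its components
}}}
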
**Attention:** During the first part of 2020, only the DAM nodes will have the Extoll interconnect (and only the nodes belonging to the `dp-dam-ext` partition will have Extoll active), while the CM and the ESB nodes will be connected via Infiniband. This will change later during the course of the project (expected end of Summer 2020), when the ESB nodes will be equipped with Extoll connectivity (Infiniband will be removed from the ESB and left only for the CM).

A general description of how the user can request and use gateway nodes is provided in [https://apps.fz-juelich.de/jsc/hps/jureca/modular-jobs.html#mpi-traffic-across-modules this section] of the JURECA documentation.

**Attention:** Some of the information provided in the JURECA documentation does not apply to the DEEP system. In particular:
* As of 31/03/2020, the DEEP system has 2 gateway nodes.

* As of 09/01/2020, the gateway nodes are exclusive to the job requesting them. Given the limited number of gateway nodes available on the system, this may change in the future.

* As of 09/04/2020, the `xenv` utility (necessary on JURECA to load modules for different architectures - Haswell and KNL) is no longer needed on DEEP when using the latest version of ParaStationMPI (currently available in the `Devel-2019a` stage and soon available on the default production stage).

{{{#!comment
If you need to load modules before launching the application, it's suggested to create wrapper scripts around the applications, and submit such scripts with srun, like this:

{{{#!sh
...
srun ./script_sdv.sh : ./script_knl.sh
}}}

where a script should contain:

{{{#!sh
#!/bin/bash

module load ...
./app_sdv
}}}

This way it will also be possible to load different modules on the different partitions used in the heterogeneous job.
}}}