Slurm offers interactive and batch jobs (scripts submitted into the system). The relevant commands are `srun` and `sbatch`. The `srun` command can be used to spawn processes ('''please do not use mpiexec'''), both from the frontend and from within a batch script. You can also get a shell on a node to work locally there (e.g. to compile your application natively for a special platform).

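For example, to get an interactive shell on one node (the sdv partition is used here just as an example):

{{{
srun --partition=sdv -N 1 -n 1 --pty /bin/bash -i
}}}
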
== Remark about modules ==

By default, Slurm passes the environment from your job submission session directly to the execution environment. Please be aware of this when running jobs with `srun` or when submitting scripts with `sbatch`. This behavior can be controlled via the `--export` option; please refer to the [https://slurm.schedmd.com/ Slurm documentation] for more information.

In particular, when submitting job scripts, it is recommended to load the necessary modules within the script and submit the script from a clean environment.

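A minimal sketch of this (`jobscript.sh` is a placeholder name): submitting with `--export=NONE` keeps the submission environment out of the job, so the script starts clean and loads its own modules:

{{{
# do not export the submission environment;
# the job script is expected to load its own modules
sbatch --export=NONE jobscript.sh
}}}
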
As of version 17.11 of Slurm, heterogeneous jobs are supported. For example, the user can run:

{{{
srun --partition=sdv -N 1 -n 1 hostname : --partition=knl -N 1 -n 1 hostname
deeper-sdv01
knl05
}}}

In order to submit a heterogeneous job, the user needs to set up the batch script similarly to the following:

{{{#!sh
#!/bin/bash

#SBATCH --job-name=imb_execute_1
#SBATCH --account=deep
#SBATCH --mail-user=
#SBATCH --mail-type=ALL
#SBATCH --output=job.out
#SBATCH --error=job.err
#SBATCH --time=00:02:00

#SBATCH --partition=sdv
#SBATCH --constraint=
#SBATCH --nodes=1
#SBATCH --ntasks=12
#SBATCH --ntasks-per-node=12
#SBATCH --cpus-per-task=1

#SBATCH packjob

#SBATCH --partition=knl
#SBATCH --constraint=
#SBATCH --nodes=1
#SBATCH --ntasks=12
#SBATCH --ntasks-per-node=12
#SBATCH --cpus-per-task=1

srun ./app_sdv : ./app_knl
}}}

Here the `packjob` keyword allows defining separate Slurm parameters for each sub-job of the heterogeneous job.

If you need to load modules before launching the application, it is suggested to create wrapper scripts around the applications and to submit those scripts with `srun`, like this:

{{{#!sh
...
srun ./script_sdv.sh : ./script_knl.sh
}}}

where each script should contain:

{{{#!sh
#!/bin/bash

module load ...
./app_sdv
}}}

This way it is also possible to load different modules on the different partitions used in the heterogeneous job.

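For example (module names are elided here, as in the script above), the corresponding wrapper for the knl part would load its own module set before starting the knl binary:

{{{#!sh
#!/bin/bash

# load the modules needed on the knl partition, then start the knl binary
module load ...
./app_knl
}}}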