| 454 | If a job exits before the time limit requested by the user, the corresponding reservation for this job is deleted automatically 5 minutes after the end of the job, and the resources become available to other jobs. However, users should choose the requested time carefully when submitting workflows, as larger time values can delay the scheduling of the workflows, depending on resource availability. |
| 455 | |
| 456 | Workflows created with the {{{delay}}} switch ensure that the applications overlap. The second method, which uses dependencies among jobs, does not guarantee an overlap, but it spares users from guessing how long a job will take and how long the delay between jobs should be. The process is simple: a user submits a job and then a dependent job with a dependency of type {{{afterok}}}. Inside the first (independent) job, the running application calls a function provided by the {{{slurm_workflow}}} library that changes the dependency type of the dependent job to {{{after}}}. This makes the dependent job eligible for allocation by Slurm immediately, although the actual allocation still depends on the resources available in the system. The following script submits jobs as a chain with a given dependency type. |
| 457 | {{{ |
| 458 | [huda1@deepv scripts]$ cat chain_jobs.sh |
| 459 | #!/usr/bin/env bash |
| 460 | |
| 461 | if [ $# -lt 3 ] |
| 462 | then |
| 463 | echo "$0: ERROR (MISSING ARGUMENTS)" |
| 464 | exit 1 |
| 465 | fi |
| 466 | |
| 467 | LOCKFILE=$1 |
| 468 | DEPENDENCY_TYPE=$2 |
| 469 | shift 2 |
| 470 | SUBMITSCRIPT=$* |
| 471 | |
| 472 | |
| 473 | if [ -f "$LOCKFILE" ] |
| 474 | then |
| 475 | if [[ "$DEPENDENCY_TYPE" =~ ^(after|afterany|afterok|afternotok)$ ]]; then |
| 476 | DEPEND_JOBID=$(head -n 1 "$LOCKFILE") |
| 477 | echo "sbatch --dependency=${DEPENDENCY_TYPE}:${DEPEND_JOBID} $SUBMITSCRIPT" |
| 478 | JOBID=`sbatch --dependency=${DEPENDENCY_TYPE}:${DEPEND_JOBID} $SUBMITSCRIPT` |
| 479 | else |
| 480 | echo "$0: ERROR (WRONG DEPENDENCY TYPE: choose among 'after', 'afterany', 'afterok' or 'afternotok')"; exit 1 # abort so the lockfile is not overwritten with an empty job ID |
| 481 | fi |
| 482 | else |
| 483 | echo "sbatch $SUBMITSCRIPT" |
| 484 | JOBID=`sbatch $SUBMITSCRIPT` |
| 485 | fi |
| 486 | |
| 487 | echo "RETURN: $JOBID" |
| 488 | # the JOBID is the last field of the output line |
| 489 | echo "${JOBID##* }" > "$LOCKFILE" |
| 490 | |
| 491 | exit 0 |
| 492 | }}} |
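The script stores the job ID by stripping everything up to the last space from the line that {{{sbatch}}} prints. The following minimal sketch isolates that one step so it can be tried without a Slurm installation. The variable names {{{sbatch_output}}} and {{{job_id}}} are illustrative and do not appear in the script; the sample string only mimics the usual {{{sbatch}}} message.

```shell
#!/usr/bin/env bash
# Sample text in the format sbatch prints on success (hard-coded, no Slurm needed).
sbatch_output="Submitted batch job 98626"

# "##* " removes the longest prefix that ends in a space,
# which leaves only the last field of the line: the job ID.
job_id="${sbatch_output##* }"

echo "$job_id"
```

If the installed Slurm version supports it, {{{sbatch --parsable}}} prints the job ID directly (followed by a semicolon and the cluster name on multi-cluster setups), which would make this parsing step unnecessary.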
| 493 | |
| 494 | Here is an example submission. |
| 495 | {{{ |
| 496 | [huda1@deepv scripts]$ ./chain_jobs.sh lockfile afterok simple_job.sh |
| 497 | sbatch simple_job.sh |
| 498 | RETURN: Submitted batch job 98626 |
| 499 | [huda1@deepv scripts]$ ./chain_jobs.sh lockfile afterok simple_job.sh |
| 500 | sbatch --dependency=afterok:98626 simple_job.sh |
| 501 | RETURN: Submitted batch job 98627 |
| 502 | [huda1@deepv scripts]$ ./chain_jobs.sh lockfile afterok simple_job.sh |
| 503 | sbatch --dependency=afterok:98627 simple_job.sh |
| 504 | RETURN: Submitted batch job 98628 |
| 505 | [huda1@deepv scripts]$ squeue -u huda1 |
| 506 | JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) |
| 507 | 98627 sdv simple_j huda1 PD 0:00 2 (Dependency) |
| 508 | 98628 sdv simple_j huda1 PD 0:00 2 (Dependency) |
| 509 | 98626 sdv simple_j huda1 R 0:21 2 deeper-sdv[01-02] |
| 510 | [huda1@deepv scripts]$ scontrol show job 98628 | grep Dependency |
| 511 | JobState=PENDING Reason=Dependency Dependency=afterok:98627 |
| 512 | [huda1@deepv scripts]$ cat lockfile |
| 513 | 98628 |
| 514 | }}} |
| 515 | Note that {{{lockfile}}} contains the ID of the last submitted job. |
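To see what relaxing a dependency from {{{afterok}}} to {{{after}}} amounts to, the sketch below builds the equivalent manual {{{scontrol update}}} command for two jobs of the chain. It is only an illustration: {{{relax_dependency_cmd}}} is a hypothetical helper, not part of the {{{slurm_workflow}}} library, and how the library performs the change internally is not shown here. The helper prints the command instead of running it, and it assumes that the site allows users to update the dependencies of their own pending jobs.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not part of slurm_workflow):
# prints, but does not execute, the scontrol command that would change
# the dependency of a pending job to type "after".
relax_dependency_cmd() {
    dependent_job=$1   # pending job that currently waits with afterok
    first_job=$2       # job it depends on
    printf 'scontrol update JobId=%s Dependency=after:%s\n' "$dependent_job" "$first_job"
}

# Job IDs taken from the example session.
relax_dependency_cmd 98627 98626
```

Running the printed command by hand and then checking {{{scontrol show job 98627 | grep Dependency}}} should show whether the change was accepted.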