Changes between Version 41 and Version 42 of Public/User_Guide/Batch_system


Timestamp:
Jul 29, 2020, 12:52:11 AM (4 years ago)
Author:
Jacopo de Amicis
Comment:

Changed wording of workflows and workflow_library section.

  • Public/User_Guide/Batch_system

    v41 v42  
    233233This is an example job script for setting up an Intel MPI benchmark between a Cluster and a DAM node using an IB <-> Extoll gateway for MPI communication:
    234234
    235 {{{
     235{{{#!sh
    236236#!/bin/bash
    237237
     
    327327- submitting separate jobs using an `afterok` dependency and later requesting a change in dependency type from `afterok` to `after` (using our provided shared library), which allows the second job to start if resources are available.
    328328
    329 An example project that uses all the features discussed is provided [https://gitlab.version.fz-juelich.de/deamicis1/mpi_connect_test/-/tree/test_zia_workflows here].
     329An example project that uses all the features discussed is provided [https://gitlab.version.fz-juelich.de/DEEP-EST/mpi_connect_test here].
    330330
    331331The following simple example script illustrates the mechanism of the new {{{delay}}} switch for workflows.
    332332
    333 {{{
     333{{{#!sh
    334334[huda1@deepv scripts]$ cat test.sh
    335335#!/bin/sh
     
    386386
    387387Another feature to note is that if there are multiple jobs in a job pack and any number of consecutive jobs have the same {{{delay}}} value, they are combined into a new heterogeneous job. This makes it possible to have heterogeneous jobs within workflows. Here is an example of such a script:
    388 {{{
     388{{{#!sh
    389389[huda1@deepv scripts]$ cat batch_workflow_complex.sh
    390390#!/bin/bash
     
    468468If a job exits earlier than the allocated time requested by the user, the corresponding reservation for this job is automatically deleted 5 minutes after the end of the job, and the resources become available to other jobs. However, users should be careful with the requested time when submitting workflows, as larger time values can delay the scheduling of the workflows, depending on the availability of resources.
    469469
    470 The workflows created using {{{delay}}} switch ensure overlap between the applications. The second method that includes dependencies among jobs, does not ensure an overlap but avoids users to guess the time a job will take and how much should be the delay between jobs. The process is simple. A user submits a job and later a dependent job with a dependency of type {{{afterok}}}. Inside the first (independent) job, the application running calls the function provided in {{{slurm_workflow}}} library, that changes the dependency type of the dependent job to {{{after}}}. This enables the dependent job to be eligible for allocation by slurm immediately. However, the allocation of resources depends upon the situation of resources available in the system. The following script helps to submit jobs in the form of a chain with a provided dependency type.
    471 {{{
     470The workflows created using the {{{delay}}} switch ensure overlap between the applications. The alternative method, which uses Slurm job dependencies, does not ensure a time overlap between two consecutive jobs of a workflow. In this case, however, users do not need to guess how long a job will take or how long the delay between job start times should be.
     471
     472Jobs can be chained in Slurm with the aid of the following script:
     473{{{#!sh
    472474[huda1@deepv scripts]$ cat chain_jobs.sh
    473475#!/usr/bin/env bash
     
    506508}}}
    507509
    508 Here is the example of submission.
     510This is a modified version of the `chainJobs.sh` script included in JUBE, which allows the desired dependency type between two consecutive jobs to be selected.
     511Here is an example of submission of a workflow with Slurm dependencies using the previous script (here called `chain_jobs.sh`):
    509512{{{
    510513[huda1@deepv scripts]$ ./chain_jobs.sh lockfile afterok simple_job.sh
     
    52753098628
    528531}}}
    529 Note that the {{{lockfile}}} contains the id of last submitted job.
     532Please note that `lockfile` must not exist before the first submission.
     533After the first job submission, that file will contain the id of the last submitted job, which is then used by the subsequent call to the `chain_jobs.sh` script to set the dependency.
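
The lockfile mechanism described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual `chain_jobs.sh`: the helper names `submit_job` and `chain_submit` are hypothetical, and `submit_job` is a stand-in that records the `sbatch --parsable` command line it would run and hands out made-up job ids, so that the sketch can be tried without Slurm.

```sh
# Sketch of lockfile-based job chaining (not the real chain_jobs.sh).
workdir=$(mktemp -d)
submit_log="$workdir/submit.log"
fake_next_id=1001   # made-up job ids, for illustration only

# Hypothetical stand-in for "sbatch --parsable": logs the command line that
# would be executed and stores a fake job id in $submitted_id.
submit_job() {
    echo "sbatch --parsable $*" >> "$submit_log"
    submitted_id=$fake_next_id
    fake_next_id=$((fake_next_id + 1))
}

# chain_submit LOCKFILE DEPENDENCY_TYPE JOB_SCRIPT
# If the lockfile already holds a job id, the new job depends on that job;
# otherwise the job is submitted without a dependency. In both cases the
# lockfile is then overwritten with the id of the job just submitted.
chain_submit() {
    cs_lockfile=$1
    cs_dep_type=$2
    cs_script=$3
    if [ -s "$cs_lockfile" ]; then
        submit_job "--dependency=${cs_dep_type}:$(cat "$cs_lockfile")" "$cs_script"
    else
        submit_job "$cs_script"
    fi
    echo "$submitted_id" > "$cs_lockfile"
}

lockfile="$workdir/lockfile"
chain_submit "$lockfile" afterok simple_job.sh   # first job: no dependency
chain_submit "$lockfile" afterok simple_job.sh   # second job: depends on the first
cat "$submit_log"
```

In this sketch the first call finds no lockfile and submits without a dependency, while the second call reads the stored id and passes it to `--dependency`. This is why a stale lockfile left over from an earlier workflow would attach the new chain to an unrelated job.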
     534
    530535
    531536=== {{{slurm_workflow}}} Library ===
    532537
    533 We have developed a library that developers can use to change the reservation beginning times or dependency type of the dependent jobs in a workflow. This library is called {{{slurm_workflow}}}. The library has two functions.
    534 
    535 The first function moves all the reservations of the remaining workflow jobs to an earlier time when the workflow is created using {{{--delay}}} switch.
     538In order to improve the usability of workflows, a library has been developed and deployed on the system to allow users to interact with the scheduler from within applications involved in a workflow.
     539The library is called `slurm_workflow`.
     540
     541The library has two functions.
     542
     543The first function is relevant to workflows created using the `--delay` switch and moves all the reservations of the remaining workflow jobs.
    536544{{{
    537545/*
     
    541549int slurm_wf_move_all_res(uint32_t t);
    542550}}}
     551The minimum value usable for the parameter is currently 2 (minutes).
    543552
    544553The second function changes the dependency type of all jobs dependent on the current job from {{{afterok:job_id}}} to {{{after:job_id}}}.
     
    551560}}}
    552561
    553 Call the above function to change all {{{afterok:$(SLURM_JOBID)}}} dependencies into {{{after:$(SLURM_JOBID)}} dependencies. This enables the jobs in workflow eligible for allocation by Slurm.
    554 
    555 The header file can be included using {{{#include <slurm/slurm_workflow.h>}}} and should be linked using {{{-lslurm_workflow}}} and {{{-lslurm}}}.
     562This makes the dependent jobs in the workflow eligible for allocation by Slurm.
     563
     564Both functions allow an application to notify the scheduler that it is ready for the start of the subsequent jobs of a workflow.
     565This is particularly relevant in case a network connection must be established between the two applications, but only after a certain time from the start of the first job.
     566
     567When using the library, the header file can be included using `#include <slurm/slurm_workflow.h>` and the application should be linked against the library using `-lslurm_workflow -lslurm`.
    556568
    557569