Changes between Version 27 and Version 28 of Public/ParaStationMPI


Timestamp: May 28, 2021, 4:00:57 PM
Author: Carsten Clauß

  • Public/ParaStationMPI

    v27 v28  
    88
    99== Heterogeneous Jobs using inter-module MPI communication ==
    10 !ParaStation MPI provides support for inter-module communication in federated high-speed networks. Therefore, so-called gateway (GW) daemons bridge the MPI traffic between the modules. This mechanism is transparent to the MPI application, i.e., the MPI ranks see a common `MPI_COMM_WORLD` across all modules within the job. However, the user has to account for these additional GW resources during the job submission. An example SLURM Batch script illustrating the submission of heterogeneous pack jobs including the allocation of GW resources can be found [wiki:User_Guide/Batch_system#HeterogeneousjobswithMPIcommunicationacrossmodules here].
     10!ParaStation MPI provides support for inter-module communication in federated high-speed networks.
     11For this purpose, so-called Gateway (GW) daemons bridge the MPI traffic between the modules.
     12This mechanism is transparent to the MPI application, i.e., the MPI ranks see a common `MPI_COMM_WORLD` across all modules within the job.
     13However, the user has to account for these additional Gateway resources during the job submission.
     14The following `srun` command line with the so-called ''colon notation'' illustrates the submission of a heterogeneous pack job, including the allocation of Gateway resources:
     15
     16{{{
     17srun --gw_num=1 --partition dp-cn -N8 -n64 ./mpi_hello : --partition dp-esb -N16 -n256 ./mpi_hello
     18}}}
     19
     20An MPI job started with this colon notation via `srun` will run in a single `MPI_COMM_WORLD`.
     21
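For illustration, a minimal program in the style of the `mpi_hello` binary used above (a hypothetical sketch, not the actual source) only needs to query `MPI_COMM_WORLD` to see the ranks of both job components:

{{{
/* mpi_hello.c -- hypothetical sketch of an "mpi_hello"-style program:
 * every rank of the pack job reports its position in the common
 * MPI_COMM_WORLD (64 + 256 = 320 ranks in the srun example above). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    printf("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}
}}}
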
     22However, workflows across modules may require multiple `MPI_COMM_WORLD`s that connect (and later disconnect) with each other at runtime.
     23The following simple job script is an example that supports such a case:
     24
     25{{{
     26#!/bin/bash
     27#SBATCH --gw_num=1
     28#SBATCH --nodes=8 --partition=dp-cn
     29#SBATCH hetjob
     30#SBATCH --nodes=16 --partition=dp-esb
     31
     32srun -n64  --het-group 0 ./mpi_hello_accept &
     33srun -n256 --het-group 1 ./mpi_hello_connect &
     34wait
     35}}}
     36
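The two binaries used in this script are not shipped with !ParaStation MPI. The following is a minimal sketch of how such an accept/connect pair can be built on the standard MPI dynamic process functions; combining both sides in one source selected by a command-line argument, and exchanging the port name via a file in a shared directory (here `portname.txt`), are assumptions of this example:

{{{
/* msa_connect_demo.c -- hypothetical sketch of an accept/connect pair:
 * run with "accept" as argument in one het-group and "connect" in the
 * other (or split into two separate binaries as in the script above);
 * the two MPI_COMM_WORLDs are joined via an inter-communicator. */
#include <mpi.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    char port[MPI_MAX_PORT_NAME] = {0};
    MPI_Comm intercomm;
    int rank, accepting;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    accepting = (argc > 1 && strcmp(argv[1], "accept") == 0);

    if (accepting) {
        if (rank == 0) {
            /* Root opens a port and publishes its name via a shared file. */
            FILE *f;
            MPI_Open_port(MPI_INFO_NULL, port);
            f = fopen("portname.txt", "w");
            fprintf(f, "%s\n", port);
            fclose(f);
        }
        /* Collective over the first MPI_COMM_WORLD. */
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &intercomm);
    } else {
        if (rank == 0) {
            /* Root waits for the file and reads the port name. */
            FILE *f;
            while ((f = fopen("portname.txt", "r")) == NULL)
                sleep(1);
            fscanf(f, "%s", port);
            fclose(f);
        }
        /* Collective over the second MPI_COMM_WORLD. */
        MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &intercomm);
    }

    /* ... communication via the inter-communicator goes here ... */

    MPI_Comm_disconnect(&intercomm);
    if (accepting && rank == 0)
        MPI_Close_port(port);

    MPI_Finalize();
    return 0;
}
}}}
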
     37Further examples of Slurm batch scripts illustrating the allocation of heterogeneous resources can be found [wiki:User_Guide/Batch_system#HeterogeneousjobswithMPIcommunicationacrossmodules here].
    1138
    1239=== Application-dependent Tuning ===
     
    189216 * `MPI_Barrier`
    190217
    191 === Feature Usage ===
     218=== Feature usage on the DEEP-EST prototype ===
    192219
    193220For activating/controlling this feature, the following environment variables must/can be used:
     
    205232
    206233
    207 === Usage with older versions ===
     234=== Feature usage in environments without MSA support ===
     235
     236On the DEEP-EST prototype, the Module ID is determined automatically and the environment variable `PSP_MSA_MODULE_ID` is then set accordingly.
     237However, on systems without this support, and/or on systems with a !ParaStation MPI ''before'' version 5.4.6, the user has to set and pass this variable explicitly, for example, via a bash script:
     238
     239{{{
     240> cat msa.sh
     241#!/bin/bash
     242ID=$1   # first argument: the Module ID to be assigned to this part of the job
     243APP=$2  # second argument: the application binary
     244shift
     245shift
     246ARGS=$@
     247PSP_MSA_MODULE_ID=${ID} ${APP} ${ARGS}
     248
     249> srun ./msa.sh 0 ./IMB-MPI1 Bcast : ./msa.sh 1 ./IMB-MPI1 Bcast
     250}}}
     251
     252
    208253
    209254For psmpi versions ''before'' 5.4.6, the Module IDs (`PSP_MSA_MODULE_ID`) were not set automatically!
     
    229274}}}
    230275
    231 Since psmpi-5.4.6, the Module ID is set automatically, which means that you can omit all the script stuff above. [[BR]]
    232 However, you can still use `PSP_MSA_MODULE_ID` and the script approach if you want to set the Module IDs ''explicitly'', e.g. for debugging and/or emulating reasons.
    233 
    234 
    235 {{{#!comment
    236 **Attention:** Please note that a meaningful usage of `PSP_MSA_AWARE_COLLOPS=2` requires `psmpi-5.4.5` or higher.
    237 Currently (effective April 2020), this means that !ParaStation MPI has to be loaded on the DEEP-EST system as a module of the devel-stage:
    238 {{{
    239 # Set the environment to use psmpi-5.4.5-2:
    240 module --force purge
    241 module use $OTHERSTAGES
    242 module load Stages/Devel-2019a
    243 module load Intel
    244 module load ParaStationMPI
    245 }}}
    246 }}}
     276In addition, this script approach can still be useful if one wants to set the Module IDs ''explicitly'', e.g. for debugging or emulation purposes.
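
For such debugging purposes, a small test program (a hypothetical sketch, not part of any benchmark suite) can be used to print the Module ID that each rank actually sees:

{{{
/* msa_check.c -- hypothetical sketch: print the PSP_MSA_MODULE_ID value
 * visible to each rank, e.g. to verify that a wrapper script such as
 * msa.sh passes the Module IDs as intended. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank;
    const char *id;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    id = getenv("PSP_MSA_MODULE_ID");
    printf("rank %d: PSP_MSA_MODULE_ID=%s\n", rank, id ? id : "(not set)");

    MPI_Finalize();
    return 0;
}
}}}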
     277
     278
     279----
    247280
    248281== CUDA Support by !ParaStation MPI ==
     
    256289 * [https://devblogs.nvidia.com/parallelforall/introduction-cuda-aware-mpi/ Introduction to CUDA-Aware MPI] (by NVIDIA)
    257290
    258 === Current status on the DEEP system ===
    259 Currently (effective October 2019), !ParaStation MPI supports CUDA-awareness for Extoll just from the semantic-related point of view: The usage of Device pointers as arguments for send and receive buffers when calling MPI functions is supported but by an explicit ''Staging'' when Extoll is used.
    260 This is because the Extoll runtime up to now does not support GPUDirect, but EXTOLL is currently working on this in the context of DEEP-EST.
    261 As soon as GPUDirect will be supported by Extoll, this will also be integrated and enabled in !ParaStation MPI.
    262 (BTW: For !InfiniBand communication, !ParaStation MPI is already GPUDirect enabled.)
    263 
    264 === Usage on the DEEP system ===
    265 
    266 **Warning:** ''This manual section is currently under development. Therefore, the following usage guidelines may be not flawless and are likely to change in some respects in the near future! ''
    267 
    268 On the DEEP system, the CUDA awareness can be enabled by loading a module that links to a dedicated !ParaStation MPI library providing CUDA support:
     291
     292=== Usage on the DEEP-EST system ===
     293
     294On the DEEP-EST system, CUDA awareness can be enabled by loading a module that links to a dedicated !ParaStation MPI library providing CUDA support:
    269295{{{
    270296module load GCC
    271 module load ParaStationMPI/5.4.0-1-CUDA
    272 }}}
    273 Please note that CUDA-awareness might impact the MPI performance on systems parts where CUDA is not used.
    274 Therefore, it might be useful (and the other way around necessary) to disable/enable the CUDA-awareness.
    275 Furthermore, additional optimisations such as GPUDirect, i.e., direct RMA transfers to/from CUDA device memory, are available with certain pscom plugins depending on the underlying hardware.
     297module load ParaStationMPI/5.4.2-1-CUDA
     298}}}
     299
     300Please note that CUDA awareness might impact the MPI performance on system parts where CUDA is not used.
     301Therefore, it might be useful to disable the CUDA awareness where it is not needed (and, conversely, necessary to enable it where it is).
     302Furthermore, additional optimizations such as GPUDirect, i.e., direct RMA transfers to/from CUDA device memory, are available with certain pscom plugins depending on the underlying hardware.
     303
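As an illustration of what CUDA awareness means for application code, the following minimal sketch (hypothetical; buffer size and message tag are chosen arbitrarily) passes a `cudaMalloc()`-allocated device pointer directly to MPI point-to-point calls:

{{{
/* cuda_aware_demo.c -- hypothetical sketch: with CUDA awareness enabled,
 * device pointers can be passed directly to MPI; the library stages the
 * data or, where GPUDirect is available, transfers it directly. */
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    const int n = 1 << 20;
    double *dev_buf;
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* The message buffer lives in CUDA device memory. */
    cudaMalloc((void **)&dev_buf, n * sizeof(double));
    cudaMemset(dev_buf, 0, n * sizeof(double));

    if (rank == 0)
        MPI_Send(dev_buf, n, MPI_DOUBLE, 1, 42, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(dev_buf, n, MPI_DOUBLE, 0, 42, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);

    cudaFree(dev_buf);
    MPI_Finalize();
    return 0;
}
}}}
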
    276304The following environment variables may be used to influence the CUDA awareness in !ParaStation MPI:
    277305{{{