Changes between Version 7 and Version 8 of Public/User_Guide/DEEP-EST_DAM


Timestamp:
Oct 16, 2019, 10:08:54 PM (5 years ago)
Author:
Jacopo de Amicis
Comment:

Updated description for multi-node jobs, including CUDA-aware psmpi.

  • Public/User_Guide/DEEP-EST_DAM

    v7 v8  
    8383
    8484== Multi-node Jobs ==
    85 Currently, multi-node MPI jobs are possible on the DAM only by modifying the environment in the following way:
     85Multi-node jobs can be launched on the `dp-dam` partition with ParaStationMPI by loading the `pscom` module (currently `pscom/5.3.1-1`) and the `extoll` module. Note that the `extoll` module can be loaded only on nodes with an EXTOLL device, so it cannot be loaded on the login node. Load it in a batch script submitted with `sbatch`, or directly on the compute nodes within an interactive session (see [wiki:Batch_system#Fromashellonanode here] for more information on interactive sessions).
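
As a rough illustration of the batch-script route, the sketch below writes a two-node job script and submits it if `sbatch` is present. Several details are assumptions rather than part of the guide: the file name `dam_job.sh` is hypothetical, the `Intel ParaStationMPI` module line and the `./MPI_HelloWorld` binary are borrowed from the previous revision of this page, and the order of the `module load` lines is a guess. The `#SBATCH` options are standard Slurm directives.

```shell
#!/bin/bash
# Sketch only: generate a two-node job script for dp-dam and submit it.
# dam_job.sh is a hypothetical file name chosen for this example.
job_script="dam_job.sh"

cat > "$job_script" <<'EOF'
#!/bin/bash
#SBATCH --partition=dp-dam
#SBATCH --nodes=2
#SBATCH --ntasks=2

# The modules are loaded inside the job, i.e. on a compute node,
# because the extoll module cannot be loaded on the login node.
# Assumption: Intel + ParaStationMPI as in the earlier revision of this page.
module load Intel ParaStationMPI
module load pscom/5.3.1-1
module load extoll

# ./MPI_HelloWorld is a placeholder for the user's MPI binary.
srun ./MPI_HelloWorld
EOF

# Submit only where Slurm is available; otherwise leave the script on disk.
if command -v sbatch >/dev/null 2>&1; then
    sbatch "$job_script"
else
    echo "sbatch not found; wrote $job_script without submitting"
fi
```

Keeping the `module load` lines in the job script rather than in the login shell is the point of the example: the script body runs on an allocated node, where an EXTOLL device is present.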
    8686
    87 {{{
    88 $ ml Intel ParaStationMPI
    89 $ env LD_LIBRARY_PATH=/opt/parastation/lib64:/opt/extoll/x86_64/lib/:${LD_LIBRARY_PATH} PSP_TCP=0 PSP_OPENIB=0 PSP_EXTOLL=1 srun -p dp-dam -N 2 -n 2 ./MPI_HelloWorld
    90 Hello World from processor dp-dam02, rank 1 out of 2
    91 Hello World from processor dp-dam01, rank 0 out of 2
    92 }}}
    93 **Attention:** This is a temporary workaround.
     87A release-candidate version of ParaStationMPI with CUDA awareness is also available on the system. It is installed under the GCC stack (run `ml spider ParaStationMPI` to find the relevant installation for CUDA). This version also automatically loads a CUDA-aware installation of `pscom`.
    9488
    95 {{{#!comment
    96 **Attention:** Since the Extoll network is not in place yet multi-node MPI Jobs are currently disabled.
    97 }}}
     89**Attention:** As of 16.10.2019, there is no support for GPUDirect over EXTOLL. As a temporary workaround, this version of ParaStationMPI handles the device-to-host, host-to-host and host-to-device copies transparently to the user, so it can be used to test the functionality of applications that require a CUDA-aware MPI implementation. Support for GPUDirect will be provided by EXTOLL in the near future.
    9890
    99 {{{#!comment
    100 
    101 Loading the most recent ParaStation module will be enough to run multi-node MPI jobs over Extoll
    102 
    103 {{{
    104 module load ParaStationMPI
    105 }}}
    106 }}}
     91**Attention:** As of 16.10.2019, a minor bug in the `pscom/.5.4.0-1-CUDA` module requires the user to explicitly export `PSP_CUDA=1` in order to use the CUDA-aware ParaStationMPI. This bug will be fixed in the next few days.
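
A minimal sketch of the `PSP_CUDA` workaround follows, assuming the CUDA-aware ParaStationMPI found via `ml spider ParaStationMPI` has already been loaded (its exact module name is not given here, so it is left out). The binary name `./cuda_app` is hypothetical, and the `srun` options simply mirror the two-node example from the previous revision of this page.

```shell
#!/bin/bash
# Sketch only: enable CUDA awareness explicitly before launching the job.
# Assumes the CUDA-aware ParaStationMPI module is already loaded.
export PSP_CUDA=1

# ./cuda_app is a hypothetical CUDA-aware MPI binary.
if command -v srun >/dev/null 2>&1; then
    srun -p dp-dam -N 2 -n 2 ./cuda_app
else
    echo "srun not found; PSP_CUDA=$PSP_CUDA is set for this shell"
fi
```

Exporting the variable before `srun` should be enough in a default Slurm setup, since `srun` normally propagates the caller's environment to the launched tasks.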