Changes between Version 23 and Version 24 of Public/User_Guide/DEEP-EST_DAM


Timestamp: Jul 14, 2022, 11:33:37 AM
Author: Jochen Kreutz

  • Public/User_Guide/DEEP-EST_DAM

== Multi-node Jobs ==

Removed (v23):

{{{#!comment JK: 2020-04-03 meanwhile loading Intel/GCC and ParaStationMPI modules seems to be sufficient
Multi-node jobs can be launched on the `dp-dam` partition with ParaStationMPI by loading the `pscom` module (currently `pscom/5.3.1-1`) and the `extoll` module. Please be aware that the `extoll` module can only be loaded on nodes with an EXTOLL device, so it cannot be loaded on the login node: load it in a batch script for `sbatch` or directly on the compute nodes within an interactive session (see [wiki:Batch_system#Fromashellonanode here] for more information on interactive sessions).
}}}
Multi-node MPI jobs can be launched on the DAM nodes with !ParaStation MPI by loading the `Intel` (or `GCC`) and `ParaStationMPI` modules.
{{{#!comment JDA: 2020-12-08 I believe this would just confuse the users
There is no need to manually load the `extoll` or `pscom` modules anymore unless you would like to test new features only available in a certain development version of the pscom.
}}}

Added (v24):

The latest `pscom` version used in !ParaStation MPI provides support for the InfiniBand interconnect used in the DEEP-EST Data Analytics Module. Hence, loading the most recent `ParaStationMPI` module is enough to run multi-node MPI jobs over InfiniBand:

{{{
module load ParaStationMPI
}}}

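For illustration only (this script is not part of the wiki page): a minimal `sbatch` script for a multi-node MPI job on the DAM nodes could look like the sketch below. The `dp-dam` partition and the `Intel`/`ParaStationMPI` modules are taken from the text above; the node counts, time limit and the executable `./my_mpi_app` are placeholders.

{{{
#!/bin/bash
#SBATCH --partition=dp-dam        # DAM partition (see above)
#SBATCH --nodes=2                 # example: two DAM nodes
#SBATCH --ntasks-per-node=4       # example task layout
#SBATCH --time=00:30:00           # example time limit

# Load a compiler and ParaStation MPI as described above
module load Intel ParaStationMPI

# Launch the placeholder MPI application on all allocated nodes
srun ./my_mpi_app
}}}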
Removed (v23):

**Extoll:** As of 12.12.2019, the first half of the DAM nodes (`dp-dam[01-08]`) has only GbE connectivity, while the second half (`dp-dam[09-16]`) also has the faster Extoll interconnect active. To run multi-node MPI jobs on the DAM nodes, it is strongly recommended to use the `dp-dam-ext` partition, which includes only the nodes providing EXTOLL connectivity. If necessary, users can also run MPI jobs on the other DAM nodes (using the `dp-dam` partition) by setting the `PSP_TCP=1` environment variable in their scripts. This will cause all MPI communication to go through the slower 40 Gb Ethernet fabric.
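As an illustration of the removed workaround above (not part of the wiki page), such a batch script could have set the variable before launching the application; `./my_mpi_app` is again a placeholder:

{{{
# Workaround from the removed paragraph: force ParaStation MPI (pscom)
# communication over TCP, i.e. the 40 Gb Ethernet fabric
export PSP_TCP=1

srun ./my_mpi_app
}}}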

Added (v24):

For using Cluster nodes in heterogeneous jobs with DAM and ESB nodes, no gateway has to be used anymore, since all three compute modules (as well as the login and file servers) use EDR InfiniBand as interconnect.

Removed (v23):

A release-candidate version of ParaStationMPI with CUDA awareness and GPUDirect support for Extoll is currently being tested. Once released, it will become available on the DAM nodes through the modules environment.
Further information on CUDA awareness can be found in the [https://deeptrac.zam.kfa-juelich.de:8443/trac/wiki/Public/ParaStationMPI#CUDASupportbyParaStationMPI ParaStationMPI] section.
As a temporary workaround, the current version of ParaStationMPI automatically performs device-to-host, host-to-host and host-to-device copies transparently to the user, so it can be used to run applications requiring a CUDA-aware MPI implementation (with limited data transfer performance).

Added (v24):

TBD:
- CUDA-aware MPI on the GPU-equipped DAM nodes

Removed (v23):

For using Cluster nodes in heterogeneous jobs together with CM and/or ESB nodes, please see the information on [https://deeptrac.zam.kfa-juelich.de:8443/trac/wiki/Public/User_Guide/Batch_system#Heterogeneousjobs heterogeneous jobs].

Added (v24):

For using DAM nodes in heterogeneous jobs together with CM and/or ESB nodes, no gateway has to be used anymore, since all three compute modules (as well as the login and file servers) use EDR InfiniBand as interconnect.
For further information, please also take a look at [https://deeptrac.zam.kfa-juelich.de:8443/trac/wiki/Public/User_Guide/Modular_jobs heterogeneous jobs].
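Purely as an illustration (not part of the wiki page): a heterogeneous Slurm job combining a Cluster Module component and a DAM component could be sketched as below. The partition name `dp-cn`, the node/task counts and the executables `./part_cm` and `./part_dam` are assumptions; depending on the installed Slurm version, the separator directive is `#SBATCH hetjob` or `#SBATCH packjob`.

{{{
#!/bin/bash
#SBATCH --job-name=het-example
#SBATCH --time=00:30:00
# First component: Cluster Module nodes (partition name assumed)
#SBATCH --partition=dp-cn --nodes=1 --ntasks-per-node=2
#SBATCH hetjob
# Second component: DAM nodes
#SBATCH --partition=dp-dam --nodes=1 --ntasks-per-node=2

module load Intel ParaStationMPI

# One MPI job step spanning both components; ':' separates the components
srun ./part_cm : ./part_dam
}}}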