Changes between Version 15 and Version 16 of Public/User_Guide/DEEP-EST_DAM


Timestamp: Apr 3, 2020, 3:24:43 PM
Author: Jochen Kreutz
Comment: updated information on pscom and GPU direct

Legend: lines showing only a v15 number were removed, lines showing only a v16 number were added, and lines showing both numbers are unchanged context.
  • Public/User_Guide/DEEP-EST_DAM

v15 | v16 |
  5 |   5 | 
  6 |   6 | {{{
  7 |     | srun -N 4 --tasks-per-node 2 -p dp-dam --time=1:0:0 --pty /bin/bash -i
    |   7 | srun -A deep -N 4 --tasks-per-node 2 -p dp-dam --time=1:0:0 --pty /bin/bash -i
  8 |   8 | [kreutz1@dp-dam01 ~]$ srun -n 8 hostname
  9 |   9 | dp-dam01
     
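For reference, the interactive allocation shown above translates into a batch script like the following minimal sketch (standard Slurm options only; the executable line is a placeholder):

{{{
#!/bin/bash
#SBATCH --account=deep          # same account as in the srun example above
#SBATCH --partition=dp-dam
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=2
#SBATCH --time=01:00:00

srun hostname                   # 8 tasks in total, 2 per node
}}}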
 25 |  25 | The DCPMMs can be driven in different modes. For further information of the operation modes and how to use them, please refer to the following [https://github.com/pmemhackathon/2019-11-08 information]
 26 |  26 | 
 27 |     | DCPMM modes have been added To check which nodes are running in which mode, you can use the `scontrol` command:
    |  27 | DCPMM modes have been added. To check which nodes are running in which mode, you can use the `scontrol` command:
 28 |  28 | 
 29 |  29 | {{{
     
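A minimal sketch of the `scontrol`/`sinfo` queries meant above; the exact feature strings used to tag the DCPMM modes are site-specific and not spelled out here:

{{{
# show the feature tags of a single DAM node (DCPMM mode tags appear here, if configured)
scontrol show node dp-dam01 | grep -i features

# list the active features of all nodes in the dp-dam partition
sinfo -p dp-dam -o "%n %f"
}}}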
 45 |  45 | 
 46 |  46 | Currently Loaded Modules:
 47 |     |   1) GCCcore/.8.3.0 (H)   2) binutils/.2.32 (H)   3) nvidia/.418.40.04 (H,g)   4) CUDA/10.1.105 (g)
    |  47 |   1) GCCcore/.8.3.0 (H)   2) binutils/.2.32 (H)   3) nvidia/.driver (H,g)   4) CUDA/10.1.105 (g)
 48 |  48 | 
 49 |  49 |   Where:
     
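For context, a hedged sketch of how a module list like the one above is typically produced; apart from `CUDA`, the module names are taken from the listing itself and are loaded automatically as hidden dependencies:

{{{
module load CUDA      # pulls in GCCcore, binutils and the nvidia driver as hidden (H) dependencies
module list           # prints the "Currently Loaded Modules" overview shown above
nvidia-smi            # quick check that the driver and the GPUs are visible
}}}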
 52 |  52 | }}}
 53 |  53 | 
 54 |     | **Attention:** As of 23.01.2020 a work around for loading the correct CUDA driver and module has to be use. Please see [https://deeptrac.zam.kfa-juelich.de:8443/trac/wiki/Public/User_Guide/Information_on_software#UsingCuda Using CUDA] section.
 55 |  54 | 
 56 |  55 | == Using FPGAs ==
 57 |  56 | 
 58 |     | Each node is equipped with a Stratix 10 FPGA. For getting started using OpenCL with the FPGAs you can find some hints as well as the slides and exercises from the Intel FPGA workshop held at JSC
    |  57 | Each node is equipped with a Stratix 10 FPGA. For getting started using OpenCL with the FPGAs you can find some hints as well as the slides and exercises from the Intel FPGA workshop held at JSC in:
 59 |  58 | 
 60 |  59 | {{{
     
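As a first sanity check with the Intel FPGA OpenCL tooling, something along these lines can be tried; the module name is an assumption, so check `module avail` for the SDK actually provided on the DAM nodes:

{{{
# load the Intel FPGA OpenCL SDK (module name is an assumption)
module load intelFPGA_pro

# list the boards visible to the OpenCL runtime and run the built-in self test
aocl list-devices
aocl diagnose
}}}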
110 | 109 | It's possible to access the local storage of the DEEP-ER SDV (`/sdv-work`), but you have to keep in mind that the file servers of that storage can just be accessed through 1 GbE ! Hence, it should not be used for performance relevant applications since it is much slower than the DEEP-EST local storages mounted to `/work`.
111 | 110 | 
112 |     | There is node local storage available for the DEEP-EST DAM node (2 x 1.5 TB NVMe SSD), it is mounted to `/nvme/scratch` and `/nvme/scratch2`. Additionally, there is a small (about 380 GB) scratch folder available in `/scratch`. Remember that the three scratch folders are not persistent and **will be cleaned after your job has finished** !
    | 111 | There is node-local storage available on each DEEP-EST DAM node (2 x 1.5 TB NVMe SSD); it is mounted to `/nvme/scratch` and `/nvme/scratch2`. Additionally, there is a small (about 380 GB) scratch folder available in `/scratch`. Remember that the three **scratch folders** are not persistent and **will be cleaned after your job has finished**!
113 | 112 | 
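A minimal usage sketch for the node-local scratch space described above; the application and file names are placeholders:

{{{
# inside a job script: stage input to the node-local NVMe, work there, copy results back
cp $HOME/input.dat /nvme/scratch/
cd /nvme/scratch
./my_app input.dat > output.dat
cp output.dat $HOME/results/     # /nvme/scratch, /nvme/scratch2 and /scratch are cleaned after the job
}}}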
114 | 113 | == Multi-node Jobs ==
    | 114 | {{{#!comment JK: 2020-04-03 meanwhile loading Intel/GCC and ParaStationMPI modules seems to be sufficient
115 | 115 | Multi-node jobs can be launched on the `dp-dam` partition with ParaStationMPI by loading the `pscom` module (currently `pscom/5.3.1-1`) and the `extoll` module. Please beware that the `extoll` module can be loaded only on nodes with an EXTOLL device, therefore it cannot be loaded on the login node: please load it in a batch script for `sbatch` or directly on the compute nodes within an interactive session (see [wiki:Batch_system#Fromashellonanode here] for more information on the interactive sessions).
    | 116 | }}}
    | 117 | Multi-node jobs can be launched on the `dp-dam` partition with !ParaStationMPI by loading the Intel (or GCC) and ParaStationMPI modules. There is no need to manually load the `extoll` or `pscom` modules anymore unless you would like to test new features that are only available in a certain development version of pscom.
116 | 118 | 
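A minimal sketch of a multi-node launch following the module recipe above; module versions are left to the default and the executable is a placeholder:

{{{
module load Intel ParaStationMPI       # or: module load GCC ParaStationMPI

srun -A deep -p dp-dam -N 2 --ntasks-per-node=2 ./hello_mpi
}}}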
117 |     | A release-candidate version of ParaStationMPI with CUDA awareness is also available on the system. It is installed under the GCC stack (run `ml spider ParaStationMPI` to find the relevant installation for CUDA). This version also automatically loads a CUDA-aware installation of `pscom`.
    | 119 | A release-candidate version of ParaStationMPI with CUDA awareness and GPUDirect support for Extoll is currently being tested. Once released, it will become available on the DAM nodes via the modules environment.
118 | 120 | Further information on CUDA awareness can be found in the [https://deeptrac.zam.kfa-juelich.de:8443/trac/wiki/Public/ParaStationMPI#CUDASupportbyParaStationMPI ParaStationMPI] section.
    | 121 | As a temporary workaround, the current version of ParaStationMPI automatically performs device-to-host, host-to-host and host-to-device copies transparently to the user, so it can be used to run applications requiring a CUDA-aware MPI implementation (with limited data transfer performance).
119 | 122 | 
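Because the staging copies described above happen transparently, an application that passes device pointers to MPI can be launched like any other MPI job; a hedged sketch (whether GPUs have to be requested via `--gres` depends on the Slurm configuration of the partition):

{{{
module load GCC ParaStationMPI CUDA

# device-to-device transfers are staged through host memory by the current ParaStationMPI
srun -A deep -p dp-dam -N 2 --ntasks-per-node=1 --gres=gpu:1 ./my_cuda_mpi_app
}}}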
120 |     | **Attention:** As of 16.10.2019, there is no support for GPUDirect over EXTOLL. As a temporary workaround, this version of ParaStationMPI automatically performs device-to-host, host-to-host and host-to-device copies transparently to the user, so it can be used to run applications requiring a CUDA-aware MPI implementation (with limited data transfer performance). Support for GPUDirect will be provided by EXTOLL in the near future.
    | 123 | For using DAM nodes in heterogeneous jobs together with CM and/or ESB nodes, please see the information on [https://deeptrac.zam.kfa-juelich.de:8443/trac/wiki/Public/User_Guide/Batch_system#Heterogeneousjobs heterogeneous jobs].
121 | 124 | 
122 | 125 | **Extoll:** As of 12.12.2019, the first half of the DAM nodes has GbE network (partition=dp-dam, nodeslist=dp-dam[01-16]), the second half has Extoll interconnect (partition=dp-dam-ext, nodeslist=dp-dam[09-16]).
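Based on the partition names above, a run on the Extoll-connected half of the DAM would be requested like this (a sketch following the interactive example from the top of the page):

{{{
srun -A deep -p dp-dam-ext -N 2 --ntasks-per-node=2 --time=0:30:0 --pty /bin/bash -i
}}}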