wiki:Public/User_Guide/DEEP-EST_CM

Version 12 (modified by Jochen Kreutz, 3 years ago)

System usage

The DEEP-EST Cluster Module (CM) can be used through the SLURM-based batch system that also serves the DAM and ESB modules and (most of) the Software Development Vehicles (SDV). You can request CM cluster nodes (dp-cn[01-50]) with an interactive session like this:

srun -A deep --partition=dp-cn -N 4 -n 8 --pty /bin/bash -i
srun ./hello_cluster 
Hello World from processor dp-cn15, rank 2 out of 8 
Hello World from processor dp-cn15, rank 3 out of 8 
Hello World from processor dp-cn17, rank 6 out of 8 
Hello World from processor dp-cn17, rank 7 out of 8 
Hello World from processor dp-cn14, rank 0 out of 8 
Hello World from processor dp-cn16, rank 4 out of 8 
Hello World from processor dp-cn14, rank 1 out of 8 
Hello World from processor dp-cn16, rank 5 out of 8 

When using a batch script, you have to adapt the partition option within your script: --partition=dp-cn
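As a sketch, a minimal batch script for the CM partition could look like the following (the account, node/task counts and runtime are placeholder values to adapt to your project):

```shell
#!/bin/bash
#SBATCH --account=deep
#SBATCH --partition=dp-cn
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=2
#SBATCH --time=00:10:00

srun ./hello_cluster
```

Submit the script with sbatch; the srun step then launches the MPI ranks on the allocated CM nodes.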

Filesystems and local storage

The home filesystem on the DEEP-EST Cluster Module is provided via GPFS/NFS and is hence the same as on (most of) the remaining compute nodes. The all-flash storage system (AFSM) running BeeGFS is available at

/work

In addition, the older SSSM storage system, also running BeeGFS, provides the /usr/local filesystem on the CM compute nodes. A gateway is used to bridge between the Infiniband EDR fabric of the CM and the 40 GbE network the SSSM file servers are connected to.

This is NOT the same storage used on the DEEP-ER SDV system. Both the DEEP-EST prototype system and the DEEP-ER SDV have their own local storage.

There is also node-local storage available on the DEEP-EST Cluster nodes, mounted at /scratch on each node (about 380 GB with XFS). Remember that this scratch space is not persistent and will be cleaned after your job has finished!
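One common pattern is to keep temporary files on the node-local scratch and copy the results back to persistent storage before the job ends. The sketch below assumes a batch-job environment (SLURM sets $SLURM_JOB_ID; the application name, its --tmpdir option and all paths are illustrative):

```shell
# Inside a batch job: create a per-job directory on node-local scratch
SCRATCHDIR=/scratch/$USER/$SLURM_JOB_ID
mkdir -p "$SCRATCHDIR"

# Run the application with its temporary output on scratch
# (my_app and --tmpdir are hypothetical placeholders)
srun ./my_app --tmpdir "$SCRATCHDIR"

# Copy results worth keeping back to persistent storage;
# /scratch is cleaned after the job finishes
cp -r "$SCRATCHDIR/results" /work/$USER/
```

Keeping only the final copy step on /work avoids hammering the shared filesystem with small temporary writes.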

Please refer to the system overview and filesystems pages for further information on the CM hardware, available filesystems and network connections.

Multi-node Jobs

The latest pscom version used in ParaStation MPI provides support for the Infiniband interconnect used in the DEEP-EST Cluster Module. Hence, loading the most recent ParaStationMPI module will be enough to run multi-node MPI jobs over Infiniband:

module load ParaStationMPI
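Putting this together, a multi-node MPI run over Infiniband might look like the following (the node and task counts are examples; hello_cluster stands for your own MPI binary):

```shell
# Load ParaStation MPI with Infiniband support
module load ParaStationMPI

# Launch 8 MPI ranks across 4 CM nodes
srun -A deep --partition=dp-cn -N 4 -n 8 ./hello_cluster
```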

For using Cluster nodes in heterogeneous jobs together with DAM nodes, please see the information about heterogeneous jobs. Currently (as of 2020-04-03), ESB racks 2 and 3 are equipped with IB and directly connected to the CM nodes; hence, no gateway has to be used when running on CM and ESB nodes. The first rack is planned to be modified to use IB as well (instead of the currently installed Extoll Fabri³ solution).