Version 21 (modified by Jochen Kreutz, 22 months ago)

update FPGA info

System usage

The DEEP-EST Data Analytics Module (DAM) can be used through the SLURM-based batch system that is also used for (most of) the Software Development Vehicles (SDV). You can request a DAM node (dp-dam[01-16]) in an interactive session like this:

srun -A deepsea -N 1 --tasks-per-node 4 -p dp-dam --time=1:0:0 --pty --interactive /bin/bash 
[kreutz1@dp-dam01 ~]$ srun -n 4 hostname
dp-dam01
dp-dam01
dp-dam01
dp-dam01

When using a batch script, set the partition option in your script to --partition=dp-dam (short form: -p dp-dam).
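
A batch script for the DAM partition might look like the following sketch. The file name dam_job.sbatch, the output file pattern and the resource values are illustrative assumptions; the account and partition are taken from the interactive example above. The snippet writes the job script and submits it only if sbatch is available:

```shell
# Write a minimal job script for the DAM partition.
# The file name dam_job.sbatch is an arbitrary choice for this sketch.
cat > dam_job.sbatch <<'EOF'
#!/bin/bash
# Account and partition as used in the interactive srun example
#SBATCH --account=deepsea
#SBATCH --partition=dp-dam
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --time=01:00:00
# %j expands to the job ID
#SBATCH --output=dam_job-%j.out

srun hostname
EOF

# Submit only where SLURM is available, e.g. on the login node.
if command -v sbatch >/dev/null 2>&1; then
    sbatch dam_job.sbatch
fi
```

Adjust the number of nodes, tasks and the time limit to your application before submitting.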

Persistent Memory

Each of the DAM nodes is equipped with Intel's Optane DC Persistent Memory Modules (DCPMM). All DAM nodes (dp-dam[01-16]) expose 3 TB of persistent memory each.

The DCPMMs can be operated in different modes. For further information on the operation modes and how to use them, please refer to the following information.

Currently all nodes are running in "App Direct Mode".
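
In App Direct Mode, persistent memory namespaces generally show up on Linux as device nodes named /dev/pmemN (fsdax) or /dev/daxN.M (devdax). The following sketch lists such device nodes from within a session on a DAM node. How the namespaces are actually configured or mounted on the DAM nodes is not stated here, so treat this as an assumption to be checked on the system; the function name list_pmem is made up for this example:

```shell
# List persistent memory device nodes, if any are visible.
# list_pmem is a hypothetical helper, not a system command.
list_pmem() {
    found=0
    for dev in /dev/pmem* /dev/dax*; do
        # An unmatched glob stays literal, so skip non-existent paths.
        [ -e "$dev" ] || continue
        echo "$dev"
        found=1
    done
    [ "$found" -eq 1 ] || echo "no pmem/dax device nodes found"
}

list_pmem
```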

Using CUDA

The first 12 DAM nodes are equipped with GPUs:

  • dp-dam[01-08]: 1 x Nvidia V100
  • dp-dam[09-12]: 2 x Nvidia V100

Please use the gres option with srun if you would like to use GPUs on DAM nodes, e.g. in an interactive session:

srun -A deepsea -p dp-dam --gres=gpu:1 -t 1:0:0 --interactive --pty /bin/bash  # to start an interactive session on a DAM node exposing at least 1 GPU
srun -A deepsea -p dp-dam --gres=gpu:2 -t 1:0:0 --interactive --pty /bin/bash  # to start an interactive session on a DAM node exposing 2 GPUs
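
Once the session has started, it can be useful to confirm which GPUs it actually sees. The sketch below assumes that SLURM exports CUDA_VISIBLE_DEVICES for --gres=gpu allocations (common, but dependent on the site configuration) and that nvidia-smi is installed on the node; show_gpus is a made-up helper name:

```shell
# Show the GPUs visible to the current session (run inside the srun shell).
# show_gpus is a hypothetical helper for this example.
show_gpus() {
    # Assumption: SLURM sets CUDA_VISIBLE_DEVICES for --gres=gpu allocations.
    echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-<unset>}"
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi -L    # prints one line per visible GPU
    else
        echo "nvidia-smi not found (not on a GPU node?)"
    fi
}

show_gpus
```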

To compile and run CUDA applications on the Nvidia V100 cards included in the DAM nodes, it is necessary to load the CUDA module. It is advisable to use the 2022 stage to avoid Nvidia driver mismatch issues.

module --force purge
ml use $OTHERSTAGES
ml Stages/2022
ml CUDA
[kreutz1@deepv ~]$ ml

Currently Loaded Modules:
  1) Stages/2022 (S)   2) nvidia-driver/.default (H,g,u)   3) CUDA/11.5 (g,u)

  Where:
   S:  Module is Sticky, requires --force to unload or purge
   g:  built for GPU
   u:  Built by user
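
With the CUDA module loaded, a build for the V100 could look like the following sketch. The source file name hello.cu is a placeholder for your own code; sm_70 is the compute capability of the V100 (Volta). The build is skipped if nvcc or the source file is not available:

```shell
# Compile and run a CUDA program for the V100 (compute capability 7.0).
SRC=hello.cu          # hypothetical source file, replace with your own
BIN="${SRC%.cu}"      # strip the .cu suffix to get the binary name

if command -v nvcc >/dev/null 2>&1 && [ -f "$SRC" ]; then
    nvcc -arch=sm_70 -o "$BIN" "$SRC"
    ./"$BIN"
else
    echo "skipping build: nvcc or $SRC not available"
fi
```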

Using FPGAs

Nodes dp-dam[13-16] are each equipped with 2 x Stratix 10 FPGAs (Intel PAC D5005).

It is recommended to do the first steps in an interactive session on a DAM node. Since there is (currently) no FPGA resource defined in SLURM for these nodes, please use the --nodelist= option with srun to open a session on a DAM node equipped with FPGAs, for example:

srun -A deepsea -p dp-dam --nodelist=dp-dam13 -t 1:0:0 --interactive --pty /bin/bash

To get started with OpenCL on the FPGAs, you can find some hints, as well as the slides and exercises from the Intel FPGA workshop held at JSC, in:

/usr/local/software/legacy/fpga/

More details to follow.