[[TOC]]

= System usage =

The DEEP-EST Data Analytics Module (DAM) can be used through the SLURM-based batch system that is also used for (most of) the Software Development Vehicles (SDV). You can request a DAM node (`dp-dam[01-16]`) with an interactive session like this:

{{{
srun -A deepsea -N 1 --tasks-per-node 4 -p dp-dam --time=1:0:0 --pty --interactive /bin/bash
[kreutz1@dp-dam01 ~]$ srun -n 4 hostname
dp-dam01
dp-dam01
dp-dam01
dp-dam01
}}}

When using a batch script, you have to adapt the partition option within your script: `--partition=dp-dam` (or the short form: `-p dp-dam`).

== Persistent Memory ==

Each of the DAM nodes is equipped with [https://www.intel.com/content/www/us/en/architecture-and-technology/optane-dc-persistent-memory.html Intel's Optane DC Persistent Memory Modules] (DCPMM). All DAM nodes (`dp-dam[01-16]`) expose 3 TB of persistent memory. The DCPMMs can be operated in different modes. For further information on the operating modes and how to use them, please refer to the following [https://github.com/pmemhackathon/2019-11-08 information]. Currently all nodes are running in "App Direct Mode".

{{{#!comment not working
DCPMM modes have been added as node features. To check which nodes are running in which mode, you can use the `scontrol` command:
{{{
scontrol show node dp-dam01 | grep AvailableFeatures
}}}
To select a node using a certain memory mode, you can use the constraint option within SLURM, e.g.:
{{{
srun -p dp-dam -N 1 -c 24 -t 0:30:0 --constraint=dpcmm_mem --pty /bin/bash
}}}
}}}

== Using CUDA ==

The first 12 DAM nodes are equipped with GPUs:
 * `dp-dam[01-08]`: 1 x Nvidia V100
 * `dp-dam[09-12]`: 2 x Nvidia V100

Please use the `gres` option with `srun` if you would like to use GPUs on DAM nodes, e.g. in an interactive session:

{{{
srun -A deepsea -p dp-dam --gres=gpu:1 -t 1:0:0 --interactive --pty /bin/bash # start an interactive session on a DAM node exposing at least 1 GPU
srun -A deepsea -p dp-dam --gres=gpu:2 -t 1:0:0 --interactive --pty /bin/bash # start an interactive session on a DAM node exposing 2 GPUs
}}}

To compile and run CUDA applications on the Nvidia V100 cards included in the DAM nodes, it is necessary to load the CUDA module. It is advised to use the 2022 Stage to avoid [https://deeptrac.zam.kfa-juelich.de:8443/trac/wiki/Public/User_Guide/PaS#Softwareissues Nvidia driver mismatch] issues.

{{{
module --force purge
ml use $OTHERSTAGES
ml Stages/2022
ml CUDA

[kreutz1@deepv ~]$ ml

Currently Loaded Modules:
  1) Stages/2022 (S)   2) nvidia-driver/.default (H,g,u)   3) CUDA/11.5 (g,u)

  Where:
   S: Module is Sticky, requires --force to unload or purge
   g: built for GPU
   u: Built by user
}}}

== Using FPGAs ==

Nodes `dp-dam[13-16]` are equipped with 2 x Stratix 10 FPGAs each ([https://www.intel.de/content/www/de/de/products/sku/193921/intel-fpga-pac-d5005/specifications.html Intel PAC D5005]). It is recommended to do the first steps in an interactive session on a DAM node. Since there is (currently) no FPGA resource defined in SLURM for these nodes, please use the `--nodelist=` option with `srun` to open a session on a DAM node equipped with FPGAs, for example:

{{{
srun -A deepsea -p dp-dam --nodelist=dp-dam13 -t 1:0:0 --interactive --pty /bin/bash
}}}

For getting started using OpenCL with the FPGAs, you can find some hints as well as the slides and exercises from the Intel FPGA workshop held at JSC in:

{{{
/usr/local/software/legacy/fpga/
}}}

More details to follow.
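If you prefer to submit a regular batch job instead of working interactively, a minimal sketch could look like the following (account, partition and node list are taken from the interactive example above; `./my_fpga_host` is a hypothetical placeholder for your own OpenCL host executable):

{{{
#!/bin/bash
#SBATCH --account=deepsea
#SBATCH --partition=dp-dam
#SBATCH --nodelist=dp-dam13      # one of the FPGA-equipped nodes dp-dam[13-16]
#SBATCH --nodes=1
#SBATCH --time=01:00:00

# run your own OpenCL host binary (placeholder name)
srun -n 1 ./my_fpga_host
}}}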
{{{#!comment currently not working
To set up and check the FPGA environment, do the following:
{{{
/usr/local/software/legacy/intel/oneapi/setvars.sh
lspci | grep -i 'accel'
aocl list-devices
aoc -list-boards
# -- optional for doing the exercises:
# export CL_CONTEXT_EMULATOR_DEVICE_INTELFPGA=1
}}}

You can copy and untar the lab into your home directory to do the exercises step by step. The exercises use the emulator device instead of the actual FPGA device due to the long compilation time for the FPGAs. For using the FPGA device you will have to compile your OpenCL kernels using the `-board=pac_s10_dc` option:

{{{
# compile for the emulator
aoc -march=emulator -fast-emulator kernel-file.cl
# compile for the FPGA device
aoc -board=pac_s10_dc kernel-file.cl
}}}

In addition, you will have to adapt the OpenCL host file to select the correct platform ("Intel(R) FPGA SDK for OpenCL(TM)" or "Intel(R) FPGA Emulation Platform for OpenCL(TM) (preview)").

**Attention:** Compiling kernels for the FPGA device (instead of the emulator) might take several hours.

Although Eclipse is available on the DAM nodes, compiling and running the example applications might not work out, so you may have to fall back to the command line as described in the exercise manual and use the provided `simple_compile.sh` scripts.

=== Vector addition tutorial ===

A short tutorial on using the Intel Stratix 10 FPGAs in the DAM nodes for the vector addition kernel in C++/OpenCL and PyOpenCL can be found in the DEEP-EST Gitlab: https://gitlab.version.fz-juelich.de/DEEP-EST/fpga_usage

end of comment
}}}

== Filesystems and local storage ==

The home filesystem on the DEEP-EST Data Analytics Module is provided via GPFS/NFS and hence is the same as on (most of) the remaining compute nodes.

The DAM is connected to the all-flash storage system (AFSM) via Infiniband. The AFSM runs BeeGFS and provides a fast work filesystem at

{{{
/work
}}}

In addition, the older SSSM storage system, which also runs BeeGFS, provides the `/usr/local` filesystem on the DAM compute nodes.

There is node-local storage available on each DEEP-EST DAM node (2 x 1.5 TB NVMe SSD), mounted at `/nvme/scratch` and `/nvme/scratch2`. Additionally, there is a small (about 380 GB) scratch folder available at `/scratch`. Remember that these three **scratch folders** are not persistent and **will be cleaned after your job has finished**!

Please refer to the [wiki:Public/User_Guide/System_overview system overview] and [wiki:Public/User_Guide/Filesystems filesystems] pages for further information on the DAM hardware, the available filesystems and the network connections.

== Multi-node Jobs ==

The latest `pscom` version used in !ParaStation MPI provides support for the Infiniband interconnect used in the DEEP-EST Data Analytics Module. Hence, loading the most recent ParaStationMPI module is enough to run multi-node MPI jobs over Infiniband:

{{{
module load ParaStationMPI
}}}

TBD:
 - CUDA-aware MPI with GPU DAM nodes

For using DAM nodes in heterogeneous jobs together with CM and/or ESB nodes, no gateway has to be used (anymore), since all three compute modules (as well as the login and file servers) use EDR Infiniband as interconnect.
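Before moving on to heterogeneous jobs, a plain multi-node MPI batch job within the `dp-dam` partition might be sketched as follows (`./my_mpi_app` is a hypothetical placeholder for your own MPI binary; depending on the software stage, a compiler module may have to be loaded before ParaStationMPI):

{{{
#!/bin/bash
#SBATCH --account=deepsea
#SBATCH --partition=dp-dam
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --time=00:30:00

module load ParaStationMPI   # a compiler module (e.g. GCC) may be required first

# launch the MPI application across both nodes (placeholder name)
srun ./my_mpi_app
}}}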
For further information, please also take a look at [https://deeptrac.zam.kfa-juelich.de:8443/trac/wiki/Public/User_Guide/Modular_jobs heterogeneous jobs].
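As an illustration only (the page linked above is the authoritative reference for this system), a heterogeneous job combining Cluster Module and DAM nodes can be built from colon-separated components in a single `srun`; here `dp-cn` is assumed to be the Cluster Module partition and the two binaries are hypothetical placeholders:

{{{
# one heterogeneous job with two components, separated by ':'
srun -A deepsea -p dp-cn -N 1 -n 4 ./cm_part : -A deepsea -p dp-dam -N 1 -n 4 --gres=gpu:1 ./dam_part
}}}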