Latest news on the DEEP-EST prototype system
This is a summary of the latest news concerning the system. For a list of known problems related to the system, please refer to this page.
Last update: 2024-01-16
System software
- ParaStation update (psmgmt) to 5.1.53-1 has been performed
OS
- compute nodes, bxi nodes and login node have been updated to Rocky 8.6
- file servers and master nodes to follow
EasyBuild
- 2023 stage is the default now
- Stage 2023 was relocated to /p/software/deep/stages/2023. If you run into trouble, please check whether you still have the old path hardcoded somewhere.
- Please don't use
module use $OTHERSTAGES
when loading Stage 2023. This won't work anymore.
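One way to look for leftovers is a recursive search through job scripts and shell configuration. The following is only a minimal sketch: `find_stage_refs` is a hypothetical helper name, and the old stage path has to be supplied by you, since the previous location is not stated here.

```shell
# Sketch: list lines below a directory that still mention an old stage path
# or the no longer working "module use $OTHERSTAGES" line.
# find_stage_refs is a hypothetical helper, not part of any system tool.
find_stage_refs() {
    local old_stage="$1"   # old stage path (placeholder, supply your own)
    local dir="${2:-.}"    # directory to search, defaults to the current one
    # -r recursive, -I skip binary files, -n show line numbers,
    # -F treat patterns as fixed strings, -e allows several patterns
    grep -rInF -e "$old_stage" -e 'module use $OTHERSTAGES' "$dir"
}

# Example call (both arguments are placeholders):
# find_stage_refs "/path/to/old/stage" "$HOME/jobscripts"
```

The function prints `file:line:content` for every hit and returns a non-zero exit code if nothing was found.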
System hardware
CM nodes
- the cluster nodes now have direct EDR IB access to the SSSM storage nodes (without using the IB ↔ 40 GbE gateway)
ESB nodes
- all ESB nodes (dp-esb[01-75]) are using EDR InfiniBand interconnect (no Extoll anymore)
- SSSM and AFSM file servers can be directly accessed through IB
DAM nodes
- DAM nodes now use EDR InfiniBand (instead of 40 GbE and Extoll)
- SSSM and AFSM file servers can be directly accessed through IB
- current accelerator layout:
  - dp-dam[01-08]: 1 x Nvidia V100 GPU
  - dp-dam02: 1 x Intel PAC D5005 FPGA (for testing)
  - dp-dam[09-12]: 2 x Nvidia V100 GPU
  - dp-dam[13-16]: 2 x Intel PAC D5005 FPGA
BXI nodes, Network Federation Gateways
- former network federation gateways now used for BXI testing:
dp-nfgw[02,03,05,06]
- can be accessed via Slurm using partition
dp-bxi
SDV
- two Intel test nodes have been added and are available to users via the dp-intelmax partition
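Test partitions such as dp-intelmax or dp-bxi are selected with the standard Slurm `--partition` option. The sketch below only assembles and prints an interactive `srun` command so it can be reviewed first. `interactive_cmd` is a hypothetical helper name, and account or budget options that your project may require are left out on purpose.

```shell
# Sketch: build an interactive srun command line for a given partition.
# interactive_cmd is a hypothetical helper; it only prints the command.
interactive_cmd() {
    local partition="$1"   # e.g. dp-intelmax or dp-bxi
    printf 'srun --partition=%s --nodes=1 --pty /bin/bash -i\n' "$partition"
}

# Print the command for the Intel test nodes:
interactive_cmd dp-intelmax

# To actually start the session on the login node, run the printed command,
# adding any account options your project needs.
```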
File Systems
please also refer to the Filesystems overview
- quota has been added to
/tmp
on deepv
to avoid congestion
- the All Flash Storage Module (AFSM) provides a fast work file system mounted to /afsm (symbolic link to /work) on all compute nodes (CM, DAM, ESB) and the login node (deepv)
- it is managed via project subfolders: after activating a project environment with the jutil command, $WORK will be set accordingly
- the older System Services and Storage Module (SSSM) work file system is obsolete, but still available at /work_old for data migration
- SSSM still serves the /usr/local/software file system, but:
  - starting with the Rocky 8 image, /usr/local is a local file system on the compute nodes
  - /usr/local/software is still shared and provided by the SSSM storage
- in addition to the EasyBuild software stack, the shared /usr/local/software file system contains some manually installed software in a legacy subfolder
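Since $WORK is only set after a project environment has been activated, a quick sanity check before starting jobs can save some confusion. The following is a minimal sketch: `check_work` is a hypothetical helper name, and the `jutil env activate -p` call in the comment is assumed from the general JSC user documentation, with `myproject` as a placeholder.

```shell
# Sketch: verify that $WORK is set and points to an existing directory.
# check_work is a hypothetical helper, not part of jutil.
check_work() {
    if [ -z "${WORK:-}" ]; then
        echo "WORK is not set; activate a project environment first" >&2
        return 1
    fi
    if [ ! -d "$WORK" ]; then
        echo "WORK=$WORK does not exist on this node" >&2
        return 1
    fi
    echo "WORK=$WORK"
}

# Assumed usage on the login node (project name is a placeholder):
# jutil env activate -p myproject
# check_work
```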