Latest news on the DEEP-EST prototype system
This is a summary of the latest news concerning the system. For a list of known problems related to the system, please refer to this page.
Last update: 2022-10-13
System software
- ParaStation (psmgmt) has been updated to 5.1.51-0
OS
- compute nodes and the login node have been updated to Rocky 8.6
- file servers and master nodes will follow
EasyBuild
- the 2022 EasyBuild stage is now the default (see the module commands below)
- dependencies on Rocky 8.6 have been resolved by re-installing EasyBuild core packages such as Python and glibc
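The active stage can be checked with the usual module commands. A minimal sketch, assuming the JSC-style stage modules (exact module names may differ):

    module avail Stages        # the 2022 stage should be marked as the default
    module use $OTHERSTAGES    # make older, non-default stages visible
    module load Stages/2022    # explicit load; redundant now that 2022 is the default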
System hardware
CM nodes
- the cluster nodes now have direct EDR IB access to the SSSM storage nodes (without using the IB ↔ 40 GbE gateway)
ESB nodes
- all ESB nodes (dp-esb[01-75]) are using the EDR Infiniband interconnect (no Extoll anymore)
- SSSM and AFSM file servers can be directly accessed through IB
DAM nodes
- DAM nodes are now using EDR Infiniband (instead of 40 GbE and Extoll)
- SSSM and AFSM file servers can be directly accessed through IB (see the link check after this list)
- the accelerator layout has been revised:
  - dp-dam[01-08]: 1 x Nvidia V100 GPU
  - dp-dam[09-12]: 2 x Nvidia V100 GPU
  - dp-dam[13-16]: 2 x Intel PAC D5005 FPGA
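To verify that a node is actually on the EDR fabric, the standard InfiniBand diagnostics can be used; a minimal sketch, assuming the infiniband-diags tools are installed on the node:

    ibstat                   # port State should be "Active", physical state "LinkUp"
    ibstat | grep -i rate    # EDR links report a rate of 100 (Gb/s)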
Network Federation Gateways
- with the all-IB solution, NFGWs are not needed anymore! This also affects heterogeneous jobs (see the sketch after this list)
- dp-nfgw[01,04] (IB EDR ↔ 40 GbE) are still present, but not in use anymore
- the remaining NFGWs (dp-nfgw[02,03,05,06]) are now being used for BXI testing
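Since all modules now share the same IB fabric, heterogeneous jobs no longer route traffic through a gateway. A minimal Slurm sketch of a job spanning two modules; the partition names dp-cm and dp-esb are assumptions here, check sinfo for the actual ones:

    # two het-job components, one per module, separated by ":"
    srun -N 1 -p dp-cm ./cluster_part : -N 1 -p dp-esb ./booster_part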
SDV
- a 4-node BXI test setup has been installed using the former GW nodes
- FPGA test nodes are available for using FPGAs with oneAPI and OpenCL (see the device listing after this list):
- Arria10: deeper-sdv[09,10]
- Stratix10: dp-sdv-esb[01,02]
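As a first step on these nodes, the FPGAs should show up in the oneAPI device listing; a minimal sketch, with the module name being an assumption (check module avail):

    module load oneAPI     # hypothetical module name
    sycl-ls                # lists available backends/devices, incl. FPGA accelerators
    aoc -list-boards       # OpenCL flow: shows the supported FPGA board variants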
File Systems
Please also refer to the Filesystems overview.
- a quota has been added to /tmp on deepv to avoid congestion
- the All Flash Storage Module (AFSM) provides a fast work file system mounted at /afsm (symbolic link to /work) on all compute nodes (CM, DAM, ESB) and the login node (deepv)
  - it is managed via project subfolders: after activating a project environment with the jutil command, $WORK will be set accordingly (see the example at the end of this list)
- the older System Services and Storage Module (SSSM) work file system is obsolete, but still available at /work_old for data migration
- SSSM still serves the /usr/local/software file system, but:
  - starting from the Rocky 8 image, /usr/local is a local file system on the compute nodes
  - /usr/local/software is still shared and provided by the SSSM storage
  - in addition to the EasyBuild software stack, the shared /usr/local/software file system contains some manually installed software in a legacy subfolder
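A minimal sketch of the project-based workflow on the AFSM work file system; <project> stands for an actual project ID:

    jutil env activate -p <project>   # activate the project environment
    echo $WORK                        # now points to the project folder on /afsm
    cd $WORK                          # run I/O-heavy jobs from the fast file system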