Latest news on the DEEP-EST prototype system
This is a summary of the latest news concerning the system. For a list of known problems related to the system, please refer to this page.
Last update: 2021-09-08
System hardware
ESB nodes
- the first ESB rack (nodes dp-esb[01-25]) is planned to be revised to use IB again (instead of the Extoll interconnect)
- the date for the HW intervention is not yet fixed; due to delivery times it is unlikely to take place this year
DAM nodes
- along with the first ESB rack, the DAM nodes will also move to IB (instead of using 40 GbE and Extoll)
Network Federation Gateways
- with the aim of an "all IB" solution, the NF-GWs will become obsolete
- current status:
- 2x NF-GW EDR/Extoll (1 x Fabri3, 1 x Tourmalet)
- 2x NF-GW 40GbE/Extoll (1 x Fabri3, 1 x Tourmalet)
- 2x NF-GW EDR/40GbE
- the NF-GWs equipped with Fabri3 PCIe cards are not in operation
- for an example of how to use the gateway nodes and for further information, please refer to the batchsystem wiki page.
Global resources
NAM
- a NAM SW implementation is available; a test environment has been set up on the DAM nodes dp-dam[09-16].
- for more information on NAM usage and an example, please refer to the NAM with TAMPI page
File Systems
please refer to the Filesystems overview
- recent changes:
- a new All Flash Storage Module (AFSM) is in place and provides a fast work file system mounted to /work on the compute nodes and the login node (deepv)
- the older System Services and Storage Module (SSSM) still serves the /usr/local file system
- SSSM storage has been rebuilt for performance reasons
- BeeGFS servers and clients have been updated
- BeeGFS (/work) user quotas are in place now (see section "User management")
- It is possible to access the $ARCHIVE file system from the deepv login node under /arch. See the hint in the MOTD for efficient usage of the archive file system.
System software
SW updates
- the transition from CentOS to Rocky Linux is currently being investigated
- a second login node running Rocky Linux 8.4 has been provided for testing:
ssh -l your_judoor_id zam906
- a few CM nodes can be used as a Rocky Linux testbed: dp-cn[47-50]
- please get in contact with niessen(at)par-tec.com if you would like to get access
- a new SLURM version has been installed: 20.11.8
- please use the --interactive flag for interactive sessions now (see MOTD hint)
- the 2021 EasyBuild stage is being set up
- as of 2021-09-08 the default stage is 2020 (was 2019a before)
- latest Intel oneAPI version is available in /usr/local/intel/oneapi
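As a rough illustration of the --interactive hint in the SLURM item above, the sketch below only prints a candidate command line and does not submit anything. It assumes the flag is passed to srun together with --pty. The helper name interactive_cmd and the partition name dp-cn are placeholders chosen for this example and are not taken from this page; the MOTD hint remains the authoritative reference.

```shell
# Hypothetical helper: prints (does not run) an srun command line for an
# interactive shell. Assumption: --interactive is given to srun, as the
# MOTD hint is understood here. The partition name is a placeholder.
interactive_cmd() {
    # $1 = partition (required), $2 = number of nodes (defaults to 1)
    printf 'srun --interactive --pty -N %s -p %s /bin/bash -i\n' "${2:-1}" "$1"
}

# Dry run: inspect the command before pasting it on a login node.
interactive_cmd dp-cn
```

Printing the command first keeps the example harmless on machines without SLURM; on a login node the printed line can be checked against the MOTD hint and then run by hand.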
User management
BeeGFS Quotas
- a quota for the BeeGFS file system (mounted to /work) has been implemented
- no need to activate thresholds yet