[[TOC]]

= Latest news on the DEEP-EST prototype system =

This is a summary of the latest news concerning the system. For a list of known problems related to the system, please refer to [wiki:Public/User_Guide/PaS this page].

''Last update: 2021-02-18''

{{{#!comment
[[span(style=color: #FF0000, System will be in maintenance in CW37 (Monday, 2020-09-07 to Friday 2020-09-11))]]
}}}

== System hardware ==

{{{#!comment
=== CM nodes ===
}}}

=== ESB nodes ===

 - in CW51 the update of the first ESB rack started, moving it to the Extoll Fabri3 network (instead of IB)
 - work is still ongoing

=== DAM nodes ===

 - the first 8 DAM nodes (dp-dam[01-08]) are currently being integrated into the Extoll Fabri3 network
 - persistent memory for nodes dp-dam[03-16] has been extended to 3 TB during the maintenance in CW51

=== Network Federation Gateways ===

 - the gateway nodes have been completed in CW51 to the final layout:
   * 2x NF-GW EDR/Extoll (1 x Fabri3, 1 x Tourmalet)
   * 2x NF-GW 40GbE/Extoll (1 x Fabri3, 1 x Tourmalet)
   * 2x NF-GW EDR/40GbE
 - **due to the ongoing Fabri3 installation, two of the NF-GWs (the ones equipped with Fabri3 PCIe cards) are not in operation yet**
 - for an example of how to use the gateway nodes and for further information, please refer to the [wiki:/Public/User_Guide/Batch_system#HeterogeneousjobswithMPIcommunicationacrossmodules batch system] wiki page.

=== Global resources ===

==== NAM ====

 - a NAM SW implementation has been done, and a test environment has been set up on the DAM nodes dp-dam[09-16].

=== File Systems ===

 - **a new All Flash Storage Module (AFSM) is going to be added to the system on 24/25 February**
 - the DEEP-EST storage has been rebuilt for performance reasons
 - BeeGFS servers and clients have been updated
 - the SDV is now de-coupled, meaning that the SDV nodes do not mount `/work` anymore and the DEEP-EST (CM, DAM, ESB) nodes only mount `/work` (not `/sdv-work`)
 - BeeGFS (`/work`) user quotas are in place now (see section "User management")
 - it is possible to access the `$ARCHIVE` file system from the `deepv` login node under `/arch`. For more information about `$ARCHIVE`, please refer to the [wiki:Filesystems Filesystems page] and see also the hint in the MOTD for efficient usage of the archive file system
 - DCPMM usage within BeeGFS has successfully been tested on `dp-dam03` using a 4.19 kernel

== System software ==

=== SW updates ===

 - BeeOND integration into SLURM is currently being prepared
 - new SLURM features are being integrated and will be rolled out soon:
   - extended logging and improved resource management for jobs within a workflow
   - burst buffer plugin
 - the 2020 EasyBuild stage is being set up
 - the latest Intel oneAPI version is available in `/usr/local/intel/oneapi`

=== User management ===

==== BeeGFS Quotas ====

 - a quota for the BeeGFS file system (mounted at `/work`) has been implemented
 - there has been no need to activate thresholds yet