
Latest news on the DEEP-EST prototype system

This is a summary of the latest news concerning the system. For a list of known problems related to the system, please refer to this page.

Last update: 2022-10-13

System software

  • ParaStation (psmgmt) has been updated to 5.1.51-0

OS

  • compute nodes and login node have been updated to Rocky 8.6
  • file servers and master nodes to follow

EasyBuild

  • 2022 EasyBuild stage is the default now
  • dependencies on Rocky 8.6 have been resolved by re-installing EB core packages such as Python and Glibc

System hardware

CM nodes

  • the cluster nodes now have direct EDR IB access to the SSSM storage nodes (without using the IB ↔ 40 GbE gateway)

ESB nodes

  • all ESB nodes (dp-esb[01-75]) are using EDR Infiniband interconnect (no Extoll anymore)
  • SSSM and AFSM file servers can be directly accessed through IB

DAM nodes

  • DAM nodes are using EDR Infiniband (instead of using 40 GbE and Extoll) now
  • SSSM and AFSM file servers can be directly accessed through IB
  • accelerator layout has been revised:
    • dp-dam[01-08]: 1 x Nvidia V100 GPU
    • dp-dam[09-12]: 2 x Nvidia V100 GPU
    • dp-dam[13-16]: 2 x Intel PAC D5005 FPGA
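
The bracketed node ranges used on this page (for example dp-dam[01-08] or dp-nfgw[01,04]) can be expanded into individual host names with a few lines of Python. The following is only a minimal sketch: the helper names `expand_nodes` and `accelerators_for` are hypothetical and not part of any system tool, and the table simply restates the DAM accelerator layout listed above.

```python
import re

# Accelerator layout as listed above (node range -> accelerators per node).
DAM_ACCELERATORS = {
    "dp-dam[01-08]": "1 x Nvidia V100 GPU",
    "dp-dam[09-12]": "2 x Nvidia V100 GPU",
    "dp-dam[13-16]": "2 x Intel PAC D5005 FPGA",
}

# Matches "<prefix>[<body>]", e.g. "dp-dam[01-08]" or "dp-nfgw[01,04]".
_RANGE = re.compile(r"^(?P<prefix>[^\[\]]+)\[(?P<body>[^\[\]]+)\]$")


def expand_nodes(expr):
    """Expand a bracketed node expression into a list of host names."""
    match = _RANGE.match(expr)
    if match is None:
        return [expr]  # plain host name without a bracket expression
    prefix = match.group("prefix")
    hosts = []
    for part in match.group("body").split(","):
        first, _, last = part.partition("-")
        last = last or first          # single index such as "04"
        width = len(first)            # keep the zero padding of the source
        for index in range(int(first), int(last) + 1):
            hosts.append(f"{prefix}{index:0{width}d}")
    return hosts


def accelerators_for(host):
    """Return the accelerator description for a DAM host, or None."""
    for expr, accel in DAM_ACCELERATORS.items():
        if host in expand_nodes(expr):
            return accel
    return None
```

Calling `accelerators_for("dp-dam10")` looks the host up in the second range and returns its accelerator description; hosts outside the listed ranges yield `None`.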

Network Federation Gateways

  • with the all-IB solution, NFGWs are no longer needed; this also affects heterogeneous jobs
  • dp-nfgw[01,04] (IB EDR ↔ 40GbE) still present, but not in use anymore
  • remaining NFGWs are being used for BXI testing now: dp-nfgw[02,03,05,06]

SDV

  • a 4-node BXI test setup has been installed using the former GW nodes
  • FPGA test nodes are available for using FPGAs with oneAPI and OpenCL:
    • Arria10: deeper-sdv[09,10]
    • Stratix10: dp-sdv-esb[01,02]

File Systems

please also refer to the Filesystems overview

  • quota has been added to /tmp on deepv to avoid congestion
  • the All Flash Storage Module (AFSM) provides a fast work file system mounted to /afsm (symbolic link to /work) on all compute nodes (CM, DAM, ESB) and the login node (deepv)
    • it is managed via project subfolders: after activating a project environment with the jutil command, $WORK is set accordingly
  • the older System Services and Storage Module (SSSM) work file system is obsolete, but still available at /work_old for data migration
  • SSSM still serves the /usr/local/software file system, but
    • starting with the Rocky 8 image, /usr/local is a local file system on the compute nodes
    • /usr/local/software is still shared and provided by the SSSM storage
    • in addition to the EasyBuild software stack, the shared /usr/local/software file system contains some manually installed software in a legacy subfolder
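
Because $WORK is only set once a project environment has been activated with jutil, scripts that write to the AFSM work file system may want to check the variable before using it. The sketch below is one possible way to do that; the function names `project_work_dir` and `run_directory` and the example project path are hypothetical and chosen purely for illustration.

```python
import os
from pathlib import Path


def project_work_dir(environ=None):
    """Return the project work directory taken from $WORK.

    Raises RuntimeError if $WORK is unset or empty, which usually means
    that no project environment has been activated with jutil yet.
    """
    env = os.environ if environ is None else environ
    work = env.get("WORK", "").strip()
    if not work:
        raise RuntimeError(
            "$WORK is not set; activate a project environment with jutil first"
        )
    return Path(work)


def run_directory(run_name, environ=None):
    """Build (but do not create) a per-run directory below $WORK."""
    return project_work_dir(environ) / run_name


if __name__ == "__main__":
    # Example with an explicit, made-up environment instead of os.environ.
    print(run_directory("test-run", {"WORK": "/afsm/myproject"}))
```

Passing the environment as an argument keeps the helper easy to test; in a real job script it would simply be called without arguments so that it reads `os.environ`.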

Last modified on Oct 13, 2022, 9:12:05 AM