[[TOC]]

= Latest news on the DEEP-EST prototype system =

This is a summary of the latest news concerning the system. For a list of known problems related to the system, please refer to [wiki:Public/User_Guide/PaS this page].

''Last update: 2022-10-13''

{{{#!comment
[[span(style=color: #FF0000, System will be in maintenance in CW37 (Monday, 2020-09-07 to Friday 2020-09-11))]]
}}}

== System software ==

 - !ParaStation update (psmgmt) to 5.1.51-0 has been performed

=== OS ===

 - compute nodes and login node have been updated to Rocky 8.6
 - file servers and master nodes to follow

=== !EasyBuild ===

 - the 2022 !EasyBuild stage is now the default
 - dependencies on Rocky 8.6 have been resolved by re-installing EB core packages like Python and Glibc

== System hardware ==

=== CM nodes ===

 - the cluster nodes now have direct EDR IB access to the SSSM storage nodes (without using the IB <-> 40 GbE gateway)

=== ESB nodes ===

 - all ESB nodes (`dp-esb[01-75]`) are using EDR Infiniband interconnect (no Extoll anymore)
 - SSSM and AFSM file servers can be directly accessed through IB

=== DAM nodes ===

 - DAM nodes are now using EDR Infiniband (instead of 40 GbE and Extoll)
 - SSSM and AFSM file servers can be directly accessed through IB
 - accelerator layout has been revised:
   - `dp-dam[01-08]`: 1 x Nvidia V100 GPU
   - `dp-dam[09-12]`: 2 x Nvidia V100 GPU
   - `dp-dam[13-16]`: 2 x Intel PAC D5005 FPGA

=== Network Federation Gateways ===

 - with the all-IB solution, NFGWs are not needed anymore;
   this also affects heterogeneous jobs
 - `dp-nfgw[01,04]` (IB EDR <-> 40 GbE) are still present, but not in use anymore
 - the remaining NFGWs, `dp-nfgw[02,03,05,06]`, are now being used for BXI testing

=== SDV ===

 - a 4-node BXI test setup has been installed using the former GW nodes
 - FPGA test nodes are available for using FPGAs with oneAPI and OpenCL:
   - Arria10: `deeper-sdv[09,10]`
   - Stratix10: `dp-sdv-esb[01,02]`

== File Systems ==

**please also refer to the** [wiki:Filesystems Filesystems] **overview**

 - a quota has been added to `/tmp` on `deepv` to avoid congestion
 - the All Flash Storage Module (AFSM) provides a fast work file system mounted to `/afsm` (symbolic link to `/work`) on all compute nodes (CM, DAM, ESB) and the login node (`deepv`)
   - it is managed via project subfolders: after activating a project environment using the `jutil` command, `$WORK` will be set accordingly
 - the older System Services and Storage Module (SSSM) work file system is obsolete, but still available at `/work_old` for data migration
 - SSSM still serves the `/usr/local/software` file system, but:
   - starting from the Rocky 8 image, `/usr/local` is a local file system on the compute nodes
   - `/usr/local/software` is still shared and provided by the SSSM storage
 - in addition to the !EasyBuild software stack, the shared `/usr/local/software` filesystem contains some manually installed software in a `legacy` subfolder
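
Since `$WORK` is only set after a project environment has been activated with `jutil`, a job or post-processing script may want to check for it before writing to the AFSM work file system. The following is a minimal sketch of such a check, not an official tool of the system: the helper name `project_workdir` and the `results` subfolder are hypothetical and only serve as illustration.

```python
import os
from pathlib import Path


def project_workdir(subdir: str = "") -> Path:
    """Return the project work directory taken from $WORK.

    Hypothetical helper: it assumes that $WORK has been set by activating
    a project environment with jutil beforehand.
    """
    work = os.environ.get("WORK")
    if not work:
        # Fail early instead of silently writing to an unintended location.
        raise RuntimeError(
            "$WORK is not set; activate a project environment with jutil first"
        )
    base = Path(work)
    # Optionally address a subfolder inside the project work directory.
    return base / subdir if subdir else base
```

A script could then call `project_workdir("results")` to build its output path, and it would stop with a clear error message when run outside an activated project environment.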