Changes between Version 25 and Version 26 of Public/User_Guide/News
Timestamp: Jan 21, 2022, 11:36:18 AM
This is a summary of the latest news concerning the system. For a list of known problems related to the system, please refer to [wiki:Public/User_Guide/PaS this page].

''Last update: 2022-01-21''

[[span(style=color: #FF0000, Moving to Rocky Linux 8.5 on compute nodes, please expect limited system access. See also the [wiki:News#SystemSystemsoftware System software] section)]]

{{{#!comment
[[span(style=color: #FF0000, System will be in maintenance in CW37 (Monday, 2020-09-07 to Friday 2020-09-11))]]
}}}

== System hardware ==

=== CM nodes ===

- the cluster nodes now have direct EDR IB access to the SSSM storage nodes (without using the IB <-> 40 GbE gateway)

=== ESB nodes ===

- all ESB nodes (`dp-esb[01-75]`) are using the EDR InfiniBand interconnect (no Extoll anymore)
- the SSSM and AFSM file servers can be accessed directly through IB

=== DAM nodes ===

- the DAM nodes are now using EDR InfiniBand (instead of 40 GbE and Extoll)
- the SSSM and AFSM file servers can be accessed directly through IB

=== Network Federation Gateways ===

- with the all-IB solution the NFGWs are not needed anymore; this also affects heterogeneous jobs
- `dp-nfgw[01,04]` (IB EDR <-> 40 GbE) are still present, but not in use anymore
- the remaining NFGWs are now being used for BXI testing: `dp-nfgw[02,03,05,06]`

=== SDV ===

- a 4-node BXI test setup has been installed using the former GW nodes
- FPGA test nodes are available for using FPGAs with oneAPI and OpenCL:
  - Arria10: `deeper-sdv[09,10]`
  - Stratix10: `dp-sdv-esb[01,02]`

{{{#!comment obsolete with removal of Extoll

==== NAM ====

- a NAM SW implementation has been done; a test environment on the DAM has been set up on `dp-dam[09-16]`
- for more information on NAM usage and an example, please refer to the [wiki:Public/User_Guide/TAMPI_NAM NAM with TAMPI] page
}}}

=== File Systems ===

**please also refer to the** [wiki:Filesystems Filesystems] **overview**

- the All Flash Storage Module (AFSM) provides a fast work file system mounted at `/afsm` (symbolic link to `/work`) on all compute nodes (CM, DAM, ESB) and the login node (`deepv`)
- the older System Services and Storage Module (SSSM) work file system is obsolete, but still available at `/work_old` for data migration
- the SSSM still serves the `/usr/local/software` file system, but:
  - starting from the Rocky 8 image, `/usr/local` will be a local file system
  - `/usr/local/software` will be shared and provided by the SSSM storage
- in addition to the !EasyBuild software stack, the shared `/usr/local/software` file system will contain manually installed software in a `legacy` subfolder

== System software ==

- a ParaStation update (psmgmt) to 5.1.45-3 has been performed
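Moving data from the old SSSM work file system to the new AFSM one, as described in the File Systems section, can be done with standard tools. A minimal sketch, assuming `rsync` is available on the login node; the `migrate_workdir` helper name is illustrative, not an official tool:

```shell
#!/bin/sh
# Hypothetical helper for copying a project directory from the old
# SSSM work file system (/work_old) to the new AFSM one (/work).
# rsync -a preserves permissions and timestamps, -H keeps hard links;
# the trailing slash on "$src" copies the directory contents.
migrate_workdir() {
    src=$1
    dst=$2
    rsync -aH "$src"/ "$dst"/
}

# example (run on the login node deepv):
# migrate_workdir /work_old/my_project /work/my_project
```

Once the copy has been verified, the original directory under `/work_old` can be removed to free space before the old file system disappears.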
=== OS ===

- the decision was taken to use one Rocky Linux image for all compute nodes
- the transition from CentOS to Rocky Linux 8.5 is ongoing (starting on the CM, then continuing with ESB and DAM)
- the login node `deepv` has moved to Rocky Linux 8 as well

=== !EasyBuild ===

- the 2022 !EasyBuild stage is being set up
- expect a first version (basic components like compilers etc.) to be available in February 2022
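While the CentOS to Rocky Linux transition is ongoing, the image a given node is running can be identified with a standard probe of `/etc/os-release` (a generic sketch; nothing here is specific to the system, and the `os_image` function name is illustrative):

```shell
#!/bin/sh
# Minimal sketch: report which OS image the current node runs, handy
# while the CentOS -> Rocky Linux 8.5 transition is ongoing.
# /etc/os-release is standard on both distributions.
os_image() {
    if [ -r /etc/os-release ]; then
        # defines NAME, VERSION_ID, ID, ... as shell variables
        . /etc/os-release
        echo "running on: ${NAME} ${VERSION_ID}"
    else
        echo "no /etc/os-release found"
    fi
}

# prints the distribution name and version of the current node
os_image
```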