Changes between Version 25 and Version 26 of Public/User_Guide/News


Timestamp:
Jan 21, 2022, 11:36:18 AM
Author:
Jochen Kreutz
Comment:

general update to reflect current system status

Legend:

  ` ` unmodified
  `+` added
  `-` removed
  • Public/User_Guide/News

    v25 → v26
 This is a summary of the latest news concerning the system. For a list of known problems related to the system, please refer to [wiki:Public/User_Guide/PaS this page].

-''Last update: 2021-12-10''
+''Last update: 2022-01-21''

-[[span(style=color: #FF0000, System will be in maintenance in CW50 (Tuesday, 2021-12-14 to Thursday 2021-12-16))]]
+[[span(style=color: #FF0000, Moving to Rocky Linux 8.5 on the compute nodes, please expect limited system access. See also the [wiki:News#SystemSystemsoftware System software] section)]]

 {{{#!comment
 [[span(style=color: #FF0000, System will be in maintenance in CW37 (Monday, 2020-09-07 to Friday 2020-09-11))]]
 …
 == System hardware ==

-{{{#!comment
 === CM nodes ===
-}}}

+- the cluster nodes now have direct EDR IB access to the SSSM storage nodes (without using the IB <-> 40 GbE gateway)

 === ESB nodes ===

-- the first ESB rack (nodes `dp-esb[01-25]`) is planned to be revised to use IB again (instead of the Extoll interconnect)
-  - the date for the HW intervention is not yet fixed, but due to delivery times it is unlikely to be performed this year
+- all ESB nodes (`dp-esb[01-75]`) are now using the EDR InfiniBand interconnect (no Extoll anymore)
+- the SSSM and AFSM file servers can be accessed directly through IB

 === DAM nodes ===

-- along with the first ESB rack, the DAM nodes will also move to IB (instead of using 40 GbE and Extoll)
+- the DAM nodes are now using EDR InfiniBand (instead of 40 GbE and Extoll)
+- the SSSM and AFSM file servers can be accessed directly through IB

 === Network Federation Gateways ===

-- aiming for an "all IB" solution, the NFGWs will become obsolete
+- with the all-IB solution the NFGWs are not needed anymore! This also affects heterogeneous jobs

-- current status:
-  - 2x NF-GW EDR/Extoll (1x Fabri3, 1x Tourmalet)
-  - 2x NF-GW 40GbE/Extoll (1x Fabri3, 1x Tourmalet)
-  - 2x NF-GW EDR/40GbE
-  - the NF-GWs equipped with Fabri3 PCIe cards are not in operation
-  - for an example of how to use the gateway nodes and for further information, please refer to the [wiki:/Public/User_Guide/Batch_system#HeterogeneousjobswithMPIcommunicationacrossmodules batch system] wiki page.
+- `dp-nfgw[01,04]` (IB EDR <-> 40GbE) are still present, but not in use anymore
+- the remaining NFGWs are now being used for BXI testing: `dp-nfgw[02,03,05,06]`

-=== Global resources ===
+=== SDV ===

+- a 4-node BXI test setup has been installed using the former GW nodes

+- FPGA test nodes are available for using FPGAs with oneAPI and OpenCL:
+  - Arria10: deeper-sdv[09,10]
+  - Stratix10: dp-sdv-esb[01,02]
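To check which accelerators the oneAPI runtime actually detects on one of these FPGA nodes, a small probe like the following could be run inside a job there. This is only a hedged sketch: `sycl-ls` ships with the Intel oneAPI toolkit, but the exact environment setup on the SDV nodes is an assumption, so the script degrades gracefully on hosts without oneAPI.

```shell
#!/usr/bin/env bash
# Sketch: list the compute devices (CPU, GPU, FPGA) visible to the
# oneAPI/SYCL runtime. sycl-ls is part of the Intel oneAPI toolkit;
# on a host without oneAPI the script only reports that fact.
if command -v sycl-ls >/dev/null 2>&1; then
    sycl-ls
else
    echo "sycl-ls not found - load the oneAPI environment first"
fi
```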

+{{{#!comment obsolete with removal of Extoll

 ==== NAM ====
 …
 - a NAM SW implementation has been done; a test environment on the DAM has been set up on dp-dam[09-16].
 - for more information on NAM usage and an example, please refer to the [wiki:Public/User_Guide/TAMPI_NAM NAM with TAMPI] page
+}}}

 === File Systems ===

-**please refer to the** [wiki:Filesystems Filesystems] **overview**
+**please also refer to the** [wiki:Filesystems Filesystems] **overview**

-- recent changes:
-  - a new All Flash Storage Module (AFSM) is in place and provides a fast work file system mounted to `/work` on the compute nodes and the login node (`deepv`)
-    - the older System Services and Storage Module (SSSM) still serves the /usr/local file system
-    - the SSSM storage has been rebuilt for performance reasons
-  - the BeeGFS servers and clients have been updated
-  - BeeGFS (`/work`) user quotas are in place now (see section "User management")
-  - it is possible to access the `$ARCHIVE` file system from the `deepv` login node under `/arch`; see the hint in the MOTD for efficient usage of the archive file system
+- the All Flash Storage Module (AFSM) provides a fast work file system mounted to `/afsm` (symbolic link to `/work`) on all compute nodes (CM, DAM, ESB) and the login node (`deepv`)
+- the older System Services and Storage Module (SSSM) work file system is obsolete, but still available at `/work_old` for data migration
+- SSSM still serves the /usr/local/software file system, but
+  - starting from the Rocky 8 image, `/usr/local` will be a local file system
+  - `/usr/local/software` will be shared and provided by the SSSM storage
+  - in addition to the !EasyBuild software stack, the shared `/usr/local/software` file system will contain manually installed software in a `legacy` subfolder
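A quick way to see which of the file systems named above are present on the host you are logged in to is a loop over the mount points. This is only a sketch: `/work`, `/work_old` and `/usr/local/software` are the paths from this entry and exist only on the DEEP system, so elsewhere the script simply reports them as unavailable.

```shell
#!/usr/bin/env bash
# Sketch: probe the file system paths named in this News entry.
# df exits non-zero for a path that does not exist, which makes it a
# cheap availability check.
for fs in /work /work_old /usr/local/software; do
    if df "$fs" >/dev/null 2>&1; then
        echo "$fs: available"
    else
        echo "$fs: not available on this host"
    fi
done
```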


 == System software ==

-=== SW updates ===
+- a ParaStation update (psmgmt) to 5.1.45-3 has been performed

-- the transition from CentOS to Rocky Linux is currently being investigated
-  - a second login node running Rocky Linux 8.4 has been provided for testing: `ssh -l your_judoor_id zam906`
-  - a few CM nodes can be used as a Rocky Linux testbed: `dp-cn[47-50]`
-  - please get in contact with `niessen(at)par-tec.com` if you would like to get access
+=== OS ===

+- the decision was taken to use one Rocky Linux image for all compute nodes

+- the transition from CentOS to Rocky Linux 8.5 is ongoing (starting on the CM and then continuing with the ESB and DAM)

+  - the login node `deepv` has moved to Rocky Linux 8 as well

-- a new SLURM version has been installed: 20.11.8
-  - please use the `--interactive` flag for interactive sessions now (see the MOTD hint)
-
-- the 2021 Easybuild stage is being set up
-  - as of 2021-09-08 the default stage is `2020` (was `2019a` before)
-
-- the latest Intel oneAPI version is available in /usr/local/intel/oneapi
+=== !EasyBuild ===


-=== User management ===
-
-==== BeeGFS Quotas ====
-
-- a quota for the BeeGFS file system (mounted to /work) has been implemented
-  - there is no need to activate thresholds yet
+- the 2022 !EasyBuild stage is being set up
+  - expect a first version (basic components like compilers etc.) to be available in February 2022
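Once the 2022 stage is published, switching a shell session over to it should look roughly as follows. This is a sketch under assumptions: the module name `Stages/2022` is inferred from the naming of the earlier `2020` stage and should be verified with `module avail` on the system.

```shell
#!/usr/bin/env bash
# Sketch: move the module environment to the upcoming 2022 stage.
# "Stages/2022" is an assumed name following the earlier stage scheme.
if command -v module >/dev/null 2>&1; then
    module --force purge      # drop modules loaded from the old stage
    module load Stages/2022   # select the new EasyBuild stage
    module avail              # show what the stage provides
else
    echo "module command not available on this host"
fi
```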