Changes between Version 43 and Version 44 of Public/User_Guide/PaS


Timestamp:
Jan 21, 2022, 12:12:14 PM
Author:
Jochen Kreutz
Comment:

general update to reflect current system status

  • Public/User_Guide/PaS

    v43 v44  
    33This page is intended to give a short overview of known issues and to provide potential solutions and workarounds for them.
    44
    5 ''Last update: 2021-12-10''
     5''Last update: 2022-01-21''
    66
    7 [[span(style=color: #FF0000, System maintenance from Tuesday, 2021-12-14 to Thursday, 2021-12-16, limited user access !)]]
     7[[span(style=color: #FF0000, Rocky 8.5 being rolled out to the compute nodes, expect limited access to some of the nodes !)]]
    88
    99
     
    1919=== CM nodes ===
    2020
    21 * dp-cn05: memory issue - node at Megware for repair (#2682)
     21* dp-cn06: MCE Errors found (#2819)
    2222
    23 * dp-cn25: FW issues (#2495)
     23* dp-cn25: SEL Problems, FW issues (#2769)
    2424
    25 * dp-cn42: memory issue (#2675)
    26 
    27 * dp-cn[47-50]: rocky linux testbed
     25* several cluster nodes marked as down in the scope of the Rocky 8.5 roll out
    2826
    2927       
    3028=== DAM nodes ===
    3129
    32 * dp-dam08: memory issues (#2722)
     30* dp-dam02: reserved for FPGA tests
     31* dp-dam[05-08]: reservation "maint-dam-rocky85" in place for Rocky 8.5 tests
     32* dp-dam[09-16]: OS update ongoing
    3333
    3434
    3535=== ESB nodes ===
    3636
    37 {{{#!comment JK: EM client has been fixed
    38 [[span(style=color: #FF0000, Currently facing issues in reading the ESB Energy Meter leading to nodes going offline. A fix is ready for roll-out)]]
    39 }}}
    40 
    41 * dp-esb[01-25]: currently being prepared as rocky linux testbed
    42 
    43 * dp-esb75: node currently reserved for special use case (#2568)
     37* dp-esb[01,02]: pshealthcheck failed for BeeGFS
     38* dp-esb[07,13,16,22]: problems with energy meter
    4439
    4540
     
    4843* deeper-sdv cluster nodes (Haswell) have been taken offline: deeper-sdv[01-16]
    4944  - not included in SLURM anymore 
    50   - deeper-sdv[01-10] will be used for testing
     45  - deeper-sdv[09-10] used for testing (please contact j.kreutz(at)fz-juelich.de if you would like to get access)
    5146
    5247* knl01: NVMe issues (#2011)
     48
     49* dp-sdv-esb[01,02]: Slurm update required
    5350
    5451
    5552== Software issues ==
    5653
    57 === SLURM jobs ===
     54- Moving to Rocky 8.5 and the new EasyBuild stage 2022 (in February) might cause unexpected behavior and problems with the installed software components:
    5855
    59 - due to introduction of accounting there is some re-configuration
    60   of user accounts needed within SLURM to assign the correct QOS levels and priorities for the jobs
    61   * this might lead to (temporary) failing job starts for certain users
    62   * if you cannot start jobs via SLURM, please write an email to the support list: `sup(at)deep-sea.eu`
     56
     57**Please use the support mailing list `sup(at)deep-sea-project.eu` to report any issues.**
     58
     59
    6360
    6461{{{#!comment JK: invalid