Changes between Version 23 and Version 24 of Public/User_Guide/PaS


Timestamp:
Sep 1, 2020, 2:34:27 PM
Author:
Jochen Kreutz
Comment:

HW status updated, SW status commented out (to be updated as well)

  • Public/User_Guide/PaS

    v23 v24  
    33This page is intended to give a short overview of known issues and to provide potential solutions and workarounds to the issues seen.
    44
     5''Last update: 2020-09-01''
     6{{{#!comment highlighted red text
     7[[span(style=color: #FF0000, 2020-05-12: Currently no login possible )]]
     8}}}
     9
    510To stay informed, please refer to the [wiki:Public/User_Guide/News News page]. Also, please pay attention to the information contained in the "Message of the day" displayed when logging onto the system.
    611
    7 [[span(style=color: #FF0000, 2020-05-12: Currently no login possible )]]
    812
    913== Detected HW and node issues ==
     
    1115=== CM nodes ===
    1216
    13 {{{#!comment JK 2020-04-24: node back online
    14 * dp-cn08: node offline after memory issues (#2385) 
    15 }}}
    16 * dp-cn09 - dp-cn16: nodes currently reserved for special use case during working hours
    17 * dp-cn49: node currently reserved for special use case
     17* dp-cn[09,10]: nodes currently reserved for special use case during working hours
     18* dp-cn11: node was not responding (#2426)
     19* dp-cn24: thermal trip asserted (#2443, #2306)
     20* dp-cn41: node not responding (#2477)
    1821
    1922=== DAM nodes ===
    2023
    21 * dp-dam03: node currently reserved for special use case
     24* dp-dam03: node currently reserved for special use case (#2242)
    2225* dp-dam04: showing low streams performance (#2401)
    2326* dp-dam05: node currently reserved for special use case
    2427* dp-dam07: showing problems with its FPGA (#2353)
    25 * dp-dam08: issues with second socket CPU seen (#2304)
     28* dp-dam[09,10]: nodes currently reserved for special use case during working hours
    2629
    2730
    2831=== ESB nodes ===
    2932
    30 * dp-esb08: GPU shows PCIe x8 connection only (#2370)
    31 * dp-esb11: no GPU device detected, under repair (#2358)
     33[[span(style=color: #FF0000, Currently facing issues in reading the ESB Energy Meter leading to nodes going offline. A fix is ready for roll-out)]]
     34
     35* dp-esb02: energy meter reading issues
     36* dp-esb03: energy meter reading issues (#2466)
     37* dp-esb11: wrong GPU Link Speed detected (#2358)
    3238* dp-esb23: MCE problems (#2350)
    33 * dp-esb24: used for oneAPI testing (#2396)
     39* dp-esb24: CentOS8 Testbed (#2396)
     40* dp-esb28: no access to BMC (#2430)
     41* dp-esb33: no access to BMC (#2429)
     42* dp-esb38: no access to BMC
     43* dp-esb39: energy meter reading issues (#2432)
     44* dp-esb52: energy meter reading issues (#2433)
     45* dp-esb71: energy meter reading issues (#2432)
     46* dp-esb73: energy meter reading issues (#2433)
    3447
    3548
    36 === SDV ESB nodes ===
     49=== SDV nodes ===
    3750
    38 * dp-sdv-esb[01,02]: replacement of V100 cards
     51* deeper-sdv01: node reachable via network, but marked as down in SLURM
     52* nfgw[01,02]: nodes reachable via network, but marked as down in SLURM
     53* knl01: NVMe issues (#2011)
    3954
    4055
     56{{{#!comment JK: status to be clarified on Thursday, 2020-09-03
    4157== Software issues ==
    4258
     
    110126slurmtop 2> /dev/null
    111127}}}
     128
     129}}}
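
Several entries in the node lists use bracket notation such as `dp-cn[09,10]` or `nfgw[01,02]` to name more than one node. The following is a minimal sketch of how such an entry could be expanded into individual node names, for example when checking whether a specific node is affected. The function name `expand_nodes` is hypothetical, and support for the range form `[09-16]` is an assumption based on the usual Slurm hostlist convention, not something the page itself uses.

```python
import re


def expand_nodes(spec: str) -> list[str]:
    """Expand 'dp-cn[09,10]' or 'dp-cn[09-16]' into individual node names."""
    match = re.fullmatch(r"([^\[\]]+)\[([^\[\]]+)\]", spec)
    if match is None:
        # Plain node name without brackets, e.g. 'dp-esb23'
        return [spec]
    prefix, body = match.groups()
    names = []
    for part in body.split(","):
        part = part.strip()
        if "-" in part:
            # Range form (assumed convention): keep the zero padding of the first number
            first, last = part.split("-", 1)
            width = len(first)
            for number in range(int(first), int(last) + 1):
                names.append(f"{prefix}{number:0{width}d}")
        else:
            names.append(f"{prefix}{part}")
    return names


print(expand_nodes("dp-cn[09,10]"))  # ['dp-cn09', 'dp-cn10']
```

A plain name such as `dp-esb23` is returned unchanged as a single-element list, so the same call works for every entry in the lists.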