Changes between Version 23 and Version 24 of Public/User_Guide/PaS
Timestamp: Sep 1, 2020, 2:34:27 PM
Public/User_Guide/PaS
--- v23
+++ v24
@@ -3,7 +3,11 @@
 This page is intended to give a short overview on known issues and to provide potential solutions and workarounds to the issues seen.
 
+''Last update: 2020-09-01''
+{{{#!comment highlighted red text
+[[span(style=color: #FF0000, 2020-05-12: Currently no login possible )]]
+}}}
+
 To stay informed, please refer to the [wiki:Public/User_Guide/News News page]. Also, please pay attention to the information contained in the "Message of the day" displayed when logging onto the system.
 
-[[span(style=color: #FF0000, 2020-05-12: Currently no login possible )]]
 
 == Detected HW and node issues ==
@@ -11,32 +15,44 @@
 === CM nodes ===
 
-{{{#!comment JK 2020-04-24: node back online
-* dp-cn08: node offline after memory issues (#2385)
-}}}
-* dp-cn09 - dp-cn16: nodes currently reservedfor special use case during working hours
-* dp-cn49: node currently reserved for special use case
+* dp-cn[09,10]: nodes currently reserved for special use case during working hours
+* dp-cn11: node was not responding (#2426)
+* dp-cn24: thermal trip asserted (#2443, #2306)
+* dp-cn41: node not responding (#2477)
 
 === DAM nodes ===
 
-* dp-dam03: node currently reserved for special use case
+* dp-dam03: node currently reserved for special use case (#2242)
 * dp-dam04: showing low streams performance (#2401)
 * dp-dam05: node currently reserved for special use case
 * dp-dam07: showing problems with its FPGA (#2353)
-* dp-dam08: issues with second socket CPU seen (#2304)
+* dp-dam[09,10]: nodes currently reserved for special use case during working hours
 
 
 === ESB nodes ===
 
-* dp-esb08: GPU shows PCIe x8 connection only (#2370)
-* dp-esb11: no GPU device detected, under repair (#2358)
+[[span(style=color: #FF0000, Currently facing issues in reading the ESB Energy Meter leading to nodes going offline. A fix is ready for roll-out)]]
+
+* dp-esb02: energy meter reading issues
+* dp-esb03: energy meter reading issues (#2466)
+* dp-esb11: wrong GPU Link Speed detected (#2358)
 * dp-esb23: MCE problems (#2350)
-* dp-esb24: used for oneAPI testing (#2396)
+* dp-esb24: CentOS8 Testbed (#2396)
+* dp-esb28: no access to bmc (#2430)
+* dp-esb33: no access to bmc (#2429)
+* dp-esb38: no access to bmc
+* dp-esb39: energy meter reading issues (#2432)
+* dp-esb52: energy meter reading issues (#2433)
+* dp-esb71: energy meter reading issues (#2432)
+* dp-esb73: energy meter reading issues (#2433)
 
 
-=== SDV ESB nodes ===
+=== SDV nodes ===
 
-* dp-sdv-esb[01,02]: replacement of V100 cards
+* deeper-sdv01: node reachable via nework, but marked as down in SLURM
+* nfgw[01,02]: node reachable via nework, but marked as down in SLURM
+* knl01: NVMe issues (#2011)
 
 
+{{{#!comment JK: status to be clarified on Thursday, 2020-09-03
 == Software issues ==
 
@@ -110,2 +126,4 @@
 slurmtop 2> /dev/null
 }}}
+
+}}}