
Version 24 (modified by Jochen Kreutz, 4 years ago)

HW status updated, SW status commented out (to be updated as well)

This page gives a short overview of known issues and provides potential solutions and workarounds for them.

Last update: 2020-09-01

To stay informed, please refer to the News page. Also, please pay attention to the "Message of the day" displayed when you log in to the system.

Detected HW and node issues

CM nodes

  • dp-cn[09,10]: nodes currently reserved for special use case during working hours
  • dp-cn11: node was not responding (#2426)
  • dp-cn24: thermal trip asserted (#2443, #2306)
  • dp-cn41: node not responding (#2477)

DAM nodes

  • dp-dam03: node currently reserved for special use case (#2242)
  • dp-dam04: showing low streams performance (#2401)
  • dp-dam05: node currently reserved for special use case
  • dp-dam07: showing problems with its FPGA (#2353)
  • dp-dam[09,10]: nodes currently reserved for special use case during working hours

ESB nodes

Reading the ESB energy meter currently fails on some nodes, which causes these nodes to go offline. A fix is ready for roll-out.

  • dp-esb02: energy meter reading issues
  • dp-esb03: energy meter reading issues (#2466)
  • dp-esb11: wrong GPU Link Speed detected (#2358)
  • dp-esb23: MCE problems (#2350)
  • dp-esb24: CentOS8 Testbed (#2396)
  • dp-esb28: no access to BMC (#2430)
  • dp-esb33: no access to BMC (#2429)
  • dp-esb38: no access to BMC
  • dp-esb39: energy meter reading issues (#2432)
  • dp-esb52: energy meter reading issues (#2433)
  • dp-esb71: energy meter reading issues (#2432)
  • dp-esb73: energy meter reading issues (#2433)

SDV nodes

  • deeper-sdv01: node reachable via network, but marked as down in SLURM
  • nfgw[01,02]: nodes reachable via network, but marked as down in SLURM
  • knl01: NVMe issues (#2011)
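
Several entries on this page use a compact bracket notation for groups of nodes, for example dp-cn[09,10] or nfgw[01,02]. The following is a minimal sketch of how such an entry could be expanded into individual node names, for instance to check whether a node you plan to use is listed here. The function name expand_nodes is hypothetical and not part of any system tool. The sketch assumes a single bracket pair containing comma-separated numbers or simple ranges such as 08-11; ranges do not appear on this page and are handled only as an assumption.

```python
import re


def expand_nodes(spec: str) -> list[str]:
    """Expand 'prefix[a,b-c]suffix' into individual node names.

    Hypothetical helper: handles one bracket pair with comma-separated
    numbers or simple numeric ranges. Names without brackets are
    returned unchanged as a one-element list.
    """
    match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", spec)
    if match is None:
        return [spec]
    prefix, body, suffix = match.groups()
    names = []
    for part in body.split(","):
        first, _, last = part.strip().partition("-")
        last = last or first          # a single number is a range of one
        width = len(first)            # keep the zero padding, e.g. "09"
        for number in range(int(first), int(last) + 1):
            names.append(f"{prefix}{number:0{width}d}{suffix}")
    return names


print(expand_nodes("dp-cn[09,10]"))  # ['dp-cn09', 'dp-cn10']
```

On a system with SLURM installed, `scontrol show hostnames` performs the same kind of expansion and is the better choice; the sketch above is only meant for use where SLURM is not available.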