[[TOC]]

This page gives a short overview of known issues and provides potential solutions and workarounds for them.

''Last update: 2021-12-10''

[[span(style=color: #FF0000, System maintenance from Tuesday, 2021-12-14 to Thursday, 2021-12-16, limited user access!)]]


{{{#!comment highlighted red text
[[span(style=color: #FF0000, System maintenance from Monday, 2020-09-07 to Friday, 2020-09-11, no user access !)]]
}}}

To stay informed, please refer to the [wiki:Public/User_Guide/News News page]. Also, please pay attention to the "Message of the day" displayed when logging in to the system.


== Detected HW and node issues ==

=== CM nodes ===

* dp-cn05: memory issue - node at Megware for repair (#2682)

* dp-cn25: firmware issues (#2495)

* dp-cn42: memory issue (#2675)

* dp-cn[47-50]: Rocky Linux testbed

       
=== DAM nodes ===

* dp-dam08: memory issues (#2722)


=== ESB nodes ===

{{{#!comment JK: EM client has been fixed
[[span(style=color: #FF0000, Currently facing issues in reading the ESB Energy Meter leading to nodes going offline. A fix is ready for roll-out)]]
}}}

* dp-esb[01-25]: currently being prepared as a Rocky Linux testbed

* dp-esb75: currently reserved for a special use case (#2568)


=== SDV nodes ===

* deeper-sdv cluster nodes (Haswell) have been taken offline: deeper-sdv[01-16]
  - no longer included in SLURM
  - deeper-sdv[01-10] will be used for testing

* knl01: NVMe issues (#2011)


== Software issues ==

=== SLURM jobs ===

- Due to the introduction of accounting, user accounts in SLURM need to be reconfigured
  so that jobs are assigned the correct QOS levels and priorities
  * this may temporarily cause job starts to fail for certain users
  * if you cannot start jobs via SLURM, please write an email to the support list: `sup(at)deep-sea.eu`

{{{#!comment JK: invalid
=== GPU direct usage with Extoll on DAM ===

- new Extoll driver for GPU direct over Extoll still shows low performance on the DAM nodes
- available via Developer stage, for testing load:

{{{
ml --force purge
ml use $OTHERSTAGES
ml load Stages/Devel-2020
ml load Intel
ml load ParaStationMPI
}}}

- expect performance (and maybe also stability) issues
}}}