[[TOC]]

This page gives a short overview of known issues and provides potential solutions and workarounds for them.

''Last update: 2020-10-27''

{{{#!comment
highlighted red text
[[span(style=color: #FF0000, System maintenance from Monday, 2020-09-07 to Friday, 2020-09-11, no user access !)]]
}}}

To stay informed, please refer to the [wiki:Public/User_Guide/News News page]. Also, please pay attention to the information in the "Message of the day" displayed when logging on to the system.

== Detected HW and node issues ==

=== CM nodes ===

 * dp-cn25: node shows Unknown SPS FW Health (#2495)

=== DAM nodes ===

 * dp-dam03: node currently reserved for a special use case (#2242)

=== ESB nodes ===

{{{#!comment
JK: EM client has been fixed
[[span(style=color: #FF0000, Currently facing issues in reading the ESB Energy Meter leading to nodes going offline. A fix is ready for roll-out)]]
}}}

 * dp-esb11: wrong GPU link speed detected (#2358)
 * dp-esb24: CentOS 8 testbed (#2396)
 * dp-esb52: energy meter reading issues (#2433)

=== SDV nodes ===

 * deeper-sdv[01-16]: currently offline after removal of the Extoll network
 * nfgw[01,02]: nodes reachable via network, but marked as down in SLURM
 * knl01: NVMe issues (#2011)

== Software issues ==

=== SLURM jobs ===

 - due to the introduction of accounting at the start of the early access program, some user accounts need to be re-configured within SLURM to assign the correct QOS levels and job priorities
 - this might temporarily cause job starts to fail for certain users
 - if you cannot start jobs via SLURM, please write an email to the support list: `sup(at)deep-est.eu`

=== GPU direct usage with IB on ESB ===

 - currently only available via the Developer stage; for testing, load:
{{{
module --force purge
module use $OTHERSTAGES
module load Stages/Devel-2019a
module load GCC/8.3.0    # or: module load Intel
module load ParaStationMPI
}}}
 - set the environment variables `PSP_CUDA=1` and `PSP_UCP=1`

=== GPU direct usage with Extoll on DAM ===

 - a new Extoll driver for GPU direct over Extoll is currently being tested on the DAM nodes
 - only available via the Developer stage; for testing, load:
{{{
module --force purge
module use $OTHERSTAGES
module load Stages/Devel-2019a
module load GCC/8.3.0    # or: module load Intel
module load ParaStationMPI
}}}
 - expect performance and stability issues
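The module sequence used in the two GPU direct sections above can be wrapped in a small shell helper for job scripts. This is a minimal sketch, assuming Lmod's `module` command and the `$OTHERSTAGES` variable are available in the job environment as on the login nodes; the function names `load_devel_stage` and `enable_gpudirect_ib` are hypothetical and not provided by the system. Only the IB on ESB case is documented to need `PSP_CUDA=1` and `PSP_UCP=1`, so those settings are kept in a separate function.

{{{#!sh
# Hypothetical helpers (not provided by the system); source them in a job script.
# Assumes Lmod's `module` command and $OTHERSTAGES exist, as on the login nodes.

# Load the Developer stage toolchain.
# Argument: compiler module, GCC/8.3.0 (default) or Intel.
load_devel_stage() {
    local compiler="${1:-GCC/8.3.0}"
    module --force purge
    module use "$OTHERSTAGES"
    module load Stages/Devel-2019a
    module load "$compiler"
    module load ParaStationMPI
}

# Environment for GPU direct over InfiniBand on the ESB nodes.
enable_gpudirect_ib() {
    export PSP_CUDA=1
    export PSP_UCP=1
}
}}}

In an ESB batch script this would be used as `load_devel_stage Intel` followed by `enable_gpudirect_ib` before the `srun` call; on the DAM nodes only `load_devel_stage` applies.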
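Before writing to the support list about the job-start problems described under "SLURM jobs", it may help to collect the account and QOS that SLURM currently associates with your user, together with the state and pending reason of your jobs, and include that output in the email. This is a minimal sketch using standard `sacctmgr` and `squeue` options; the function name `slurm_account_info` is hypothetical.

{{{#!sh
# Hypothetical helper: collect SLURM account/QOS information for a support request.
# Argument: user name (defaults to $USER).
slurm_account_info() {
    local user="${1:-$USER}"
    echo "== SLURM associations (account / QOS) for ${user} =="
    # --noheader/--parsable2: '|'-separated output without a header line
    sacctmgr --noheader --parsable2 show associations where user="${user}" format=Account,User,QOS
    echo "== Jobs of ${user} with state and pending reason =="
    # %i job ID, %P partition, %T job state, %r reason
    squeue --user="${user}" --format="%.10i %.9P %.10T %r"
}
}}}

Running `slurm_account_info` on a login node and pasting its output into the email gives the support team the current association and QOS settings directly; an empty or unexpected QOS column is a hint that the re-configuration has not yet reached your account.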