Changes between Version 39 and Version 40 of Public/User_Guide/PaS

Timestamp: Aug 27, 2021, 11:47:07 AM
Author: Jochen Kreutz
Comment: (none)

Public/User_Guide/PaS

-                      v39
+                      v40
 This page is intended to give a short overview of known issues and to provide potential solutions and workarounds.
 ''Last update: 2021-04-09''
+''Last update: 2021-08-27''
 [[span(style=color: #FF0000, System maintenance from Tuesday, 2021-04-13 to Thursday, 2021-04-15, no user access !)]]
+[[span(style=color: #FF0000, 2021-08-27: Filesystem issues, no user access !)]]
 {{{#!comment highlighted red text
 [[span(style=color: #FF0000, System maintenance from Monday, 2020-09-07 to Friday, 2020-09-11, no user access !)]]
 …
 === CM nodes ===
+* dp-cn05: memory issue - node at Megware for repair (#2682)
 * dp-cn25: FW issues (#2495)
+* dp-cn42: memory issue (#2675)
+{{{#!comment
 === DAM nodes ===
 …
 * dp-dam02: node currently reserved for special use case (#2554)
 * dp-dam03: node currently reserved for special use case (#2242)
+}}}
 === ESB nodes ===
 …
 }}}
+* **dp-esb[01-25]: currently not available due to Fabri3 installation**
+* dp-esb[01-25]: currently being prepared as Rocky Linux testbed
 * dp-esb75: node currently reserved for special use case (#2568)
 …
 === SDV nodes ===
 * several nodes have been taken offline:
   - deeper-sdv[11-16]
   - deeper storage system: (deeper-fs[01-03], deeper-raids)
+* deeper-sdv[01-10]: currently not available: configuration change needed (low priority)
+* deeper-sdv cluster nodes (Haswell) have been taken offline: deeper-sdv[11-16]
+  - not included in SLURM anymore
+  - will be used for testing
 * knl01: NVMe issues (#2011)
 …
 === SLURM jobs ===
 - due to introduction of accounting with the start of the early access program there is some re-configuration
+- due to introduction of accounting there is some re-configuration
   of user accounts needed within SLURM to assign the correct QOS levels and priorities for the jobs
+- this might lead to (temporary) failing job starts for certain users
+- if you cannot start jobs via SLURM, please write an email to the support list: `sup(at)deep-est.eu`
+=== GPU direct usage with IB on ESB ===
+- only available via the Developer stage; for testing, load:
+{{{
+ml --force purge
+ml use $OTHERSTAGES
+ml load Stages/Devel-2020
+ml load Intel
+ml load ParaStationMPI
+}}}
+- set the environment variables `PSP_CUDA=1` and `PSP_UCP=1`
+  * this might lead to (temporary) failing job starts for certain users
+  * if you cannot start jobs via SLURM, please write an email to the support list: `sup(at)deep-est.eu`
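
The GPU direct settings above could be combined in a job environment roughly as follows, after loading the modules listed in that section. This is a minimal sketch: only `PSP_CUDA` and `PSP_UCP` come from the notes above, while the partition name `dp-esb`, the node count, and the binary `./my_gpu_app` are placeholders assumed for illustration. The launch line is only printed, not executed, so it can be checked and adapted first.

```shell
# Settings named in the GPU direct notes above (ParaStation MPI environment variables).
export PSP_CUDA=1
export PSP_UCP=1

# Hypothetical launch line: partition, node count and binary are assumptions,
# not taken from the page. Replace them with the values for your job.
launch_cmd="srun --partition=dp-esb --nodes=2 ./my_gpu_app"

# Print the settings and the command instead of running it.
echo "PSP_CUDA=$PSP_CUDA PSP_UCP=$PSP_UCP"
echo "$launch_cmd"
```

Because the variables are exported, they are inherited by any process started from the same shell, so the printed command can be run as-is once the placeholders are replaced.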