Changes between Version 28 and Version 29 of Public/User_Guide/PaS
- Timestamp: Oct 27, 2020, 2:17:11 PM
--- Public/User_Guide/PaS (v28)
+++ Public/User_Guide/PaS (v29)
@@ -3,5 +3,5 @@
  This page is intended to give a short overview on known issues and to provide potential solutions and workarounds to the issues seen.

- ''Last update: 2020-09-23''
+ ''Last update: 2020-10-27''
  {{{#!comment highlighted red text
  [[span(style=color: #FF0000, System maintenance from Monday, 2020-09-07 to Friday, 2020-09-11, no user access !)]]
@@ -15,16 +15,10 @@
  === CM nodes ===

- * dp-cn[01,09,10]: nodes currently reserved for special use case during working hours
- * dp-cn33: memory issues (#2464)
- * dp-cn49: configuration change required (#2291)
- * dp-cn50: node not reachable (#2488)
+ * dp-cn25: node shows Unknown SPS FW Health (#2495)
+

  === DAM nodes ===

  * dp-dam03: node currently reserved for special use case (#2242)
- * dp-dam04: showing low streams performance (#2401)
- * dp-dam05: node currently reserved for special use case
- * dp-dam07: showing problems with its FPGA (#2353)
- * dp-dam[09,10]: nodes currently reserved for special use case during working hours


@@ -35,30 +29,17 @@
  }}}

- * dp-esb05: node hangs in state "Idle+Completing"
  * dp-esb11: wrong GPU Link Speed detected (#2358)
  * dp-esb24: CentOS8 Testbed (#2396)
- * dp-esb39: energy meter reading issues (#2432)
  * dp-esb52: energy meter reading issues (#2433)
- * dp-esb61: node not reachable (#2469)
- * dp-esb71: energy meter reading issues (#2432)
- * dp-esb73: energy meter reading issues (#2433)


  === SDV nodes ===

- * deeper-sdv01: node reachable via nework, but marked as down in SLURM
+ * deeper-sdv[01-16]: currently offline after removing Extoll network
  * nfgw[01,02]: node reachable via nework, but marked as down in SLURM
  * knl01: NVMe issues (#2011)
- * ml-gpu02: memory issues reported with MCE (#2489)
-


  == Software issues ==
-
- === Modular jobs failing ===
-
- - users reported failing jobs that are doing MPI on more than one module using the gateways
- - the problem is being investigated
-

  === SLURM jobs ===