Changes between Version 46 and Version 47 of Public/User_Guide/PaS
- Timestamp: Jul 8, 2022, 8:34:47 AM
v46 v47 7 7 **Please, use the support mailing list `sup(at)deep-sea-project.eu` to report any issues** 8 8 9 {{{#!comment highlighted red text 10 [[span(style=color: #FF0000, System maintenance from Monday, 2020-09-07 to Friday, 2020-09-11, no user access !)]] 11 }}} 9 {{{#!comment highlighted red text [[span(style=color: #FF0000, System maintenance from Monday, 2020-09-07 to Friday, 2020-09-11, no user access !)]] }}} 12 10 13 To stay informed, please refer to the [wiki:Public/User_Guide/News News page]. Also, please pay attention to the information contained in the "Message of the day" displayed when logging onto the system. 14 11 To stay informed, please refer to the [wiki:Public/User_Guide/News News page]. Also, please pay attention to the information contained in the "Message of the day" displayed when logging onto the system. 15 12 16 13 == Detected HW and node issues == 14 === CM nodes === 15 * dp-cn25: SEL ProblemsFW issues (#2769) 17 16 18 === CM nodes === 17 * dp-cn27: MCE Errors found (#2919) 19 18 20 * dp-cn25: SEL ProblemsFW issues (#2769)21 22 * dp-cn27: MCE Errors found (#2919)23 24 25 19 === DAM nodes === 26 27 * dp-dam02: reserved for FPGA tests 28 * dp-dam03: PCI link speed degraded (#2931) 29 * dp-dam10: PMEM module issue (#2875) 30 * dp-dam16: testbed 31 20 * dp-dam02: reserved for FPGA tests 21 * dp-dam03: PCI link speed degraded (#2931) 22 * dp-dam10: PMEM module issue (#2875) 23 * dp-dam16: testbed 32 24 33 25 === ESB nodes === 34 35 * dp-esb[07]: used for Rocky 8.6 tests 36 * dp-esb[11]: memory issues 37 26 * dp-esb[07]: used for Rocky 8.6 tests 27 * dp-esb[11]: memory issues 38 28 39 29 === SDV nodes === 30 * deeper-sdv cluster nodes (Haswell) have been taken offline: deeper-sdv[01-16] 31 * not included in SLURM anymore 32 * deeper-sdv[09-10] used for testing (please contact j.kreutz(at)fz-juelich.de if you would like to get access 40 33 41 * deeper-sdv cluster nodes (Haswell) have been taken offline: deeper-sdv[01-16] 42 - not included in 
SLURM anymore 43 - deeper-sdv[09-10] used for testing (please contact j.kreutz(at)fz-juelich.de if you would like to get access 34 * knl01: serves as golden client for imaging only 44 35 45 * knl01: serves as golden client for imaging only 46 47 * dp-sdv-esb[01,02]: Slurm update required 48 36 * dp-sdv-esb[01,02]: Slurm update required 49 37 50 38 == Software issues == 51 52 39 === nvidia driver mismatch === 53 54 - loading CUDA module and trying to run `nvidia-smi` (or any application trying to use the GPU) leads to 40 * loading CUDA module and trying to run `nvidia-smi` (or any application trying to use the GPU) leads to 55 41 56 42 {{{ 57 43 Failed to initialize NVML: Driver/library version mismatch 58 44 }}} 59 60 - workaround is to unload the unload the driver module: `ml -nvidia-driver/.default` 45 * workaround is to unload the unload the driver module: `ml -nvidia-driver/.default` 46 * for furhter information, please also see [https://gitlab.jsc.fz-juelich.de/hps-public/easybuild-repository/-/wikis/Failed-to-initialize-NVML-Driver-library-version-mismatch-message here][[BR]] 61 47 62 48 === Easybuild === 63 64 - Moving the new Easybuild stage 2022 (in February) might cause unexpected behavior and problems with the installed software components: 65 66 67 68 69 49 * Moving the new Easybuild stage 2022 (in February) might cause unexpected behavior and problems with the installed software components: 70 50 71 51 {{{#!comment JK: invalid 52 72 53 === GPU direct usage with Extoll on DAM === 73 74 - new Extoll driver for GPU direct over Extoll still shows low performance on the DAM nodes 75 - available via Developer stage, for testing load: 54 * new Extoll driver for GPU direct over Extoll still shows low performance on the DAM nodes 55 * available via Developer stage, for testing load: 76 56 77 57 {{{ … … 82 62 ml load ParaStationMPI 83 63 }}} 64 * expect performance (and maybe also stability) issues 84 65 85 - expect performance (and maybe also stability) 
issues86 66 }}}
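The workaround for the NVIDIA driver mismatch ("Failed to initialize NVML: Driver/library version mismatch", fixed by `ml -nvidia-driver/.default`) can be wrapped in a small helper for job scripts. The following is a minimal sketch under stated assumptions, not an officially supported tool. The function names `has_nvml_mismatch` and `check_gpu` are hypothetical. The sketch also assumes that Lmod's `ml` is available in the shell running it (for example, a login shell or a job script in which the module environment has been initialised) and that the CUDA module has already been loaded.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the system's software stack: detects the
# NVML driver/library mismatch and applies the documented workaround.

# Error text as reported on the system.
NVML_MISMATCH_MSG='Failed to initialize NVML: Driver/library version mismatch'

# Succeeds (returns 0) if the text in $1 contains the mismatch message.
has_nvml_mismatch() {
    grep -qF -- "$NVML_MISMATCH_MSG" <<< "$1"
}

# Runs nvidia-smi; on the mismatch error, unloads the driver module via
# Lmod's `ml` and retries once. Prints the final nvidia-smi output, or
# returns 1 if `ml` is not available in this shell.
check_gpu() {
    local out
    out=$(nvidia-smi 2>&1)
    if has_nvml_mismatch "$out"; then
        # `ml` is an Lmod shell function; it is only defined once the
        # module environment has been initialised in this shell.
        if ! type ml >/dev/null 2>&1; then
            echo "NVML mismatch detected, but 'ml' is not available" >&2
            return 1
        fi
        echo "NVML mismatch detected, unloading nvidia-driver module" >&2
        ml -nvidia-driver/.default
        out=$(nvidia-smi 2>&1)
    fi
    printf '%s\n' "$out"
}
```

Call `check_gpu` after loading the CUDA module. It retries only once, so the original error is still visible if unloading the driver module does not resolve the mismatch.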