Changes between Version 66 and Version 67 of Public/User_Guide/PaS
- Timestamp: Mar 16, 2023, 8:50:42 AM
Public/User_Guide/PaS
  This page is intended to give a short overview on known issues and to provide potential solutions and workarounds to the issues seen.

- ''Last update: 2022-12-08''
+ ''Last update: 2023-03-16''

  {{{#!comment
…
  The system status is reported on [https://status.jsc.fz-juelich.de/ JSC status] as well.

+
+ == Login node ==
+
+ * Time limit for user processes enforced on deepv login: **Processes will be killed after 24 hours**. In case of problems, please contact niessen@par-tec.com
+
  == Detected HW and node issues ==

  === Cooling issues ===
- * pump failures for JSC cooling loop have been detected
- * root cause still to be identified
- * considering manual mode to allow for operation of CM and ESB nodes in the meantime
+ * pump in JSC cooling loop is running in manual mode: frequently running HPL jobs (with low priority) to create some load (waste heat)
+ * HPL jobs can be killed on demand: in case of problems (your jobs being blocked by HPL runs), please contact j.kreutz@fz-juelich.de or niessen@par-tec.com

  === CM nodes ===
- * dp-cn25: SEL Problems FW issues (#2769)
+ * dp-cn25: Thermal issues within chassis slot (#2769)

  === DAM nodes ===
  * dp-dam02: reserved for FPGA tests
+ * dp-dam13: failing healthcheck: memory_not_reclaimable
  * dp-dam16: testbed

  === ESB nodes ===
- * dp-esb[11]: memory issues (#2857)
- * dp-esb[31]: GPU issues (#2949)
+ * dp-esb[07]: wrong BIOS settings (#2881)
+ * dp-esb[17]: IB HCA issues (#3140)
+ * dp-esb[75]: Easybuild testbed (#3094)


…

  === MODULEPATH

- * MODULEPATH variable seems to get overwritten though being set correctly in `/etc/profile.d/modules.sh`
+ * MODULEPATH variable might get overwritten when switching stages
  * leads to various modules not being detected / found correctly
- * re-setting the MODULEPATH manually might solve the issue, please try:
+ * re-setting the MODULEPATH manually might solve the issue, e.g. for the 2022 stage, please try:
  {{{
  export MODULEPATH=/usr/local/software/skylake/Stages/2022/modules/all/Compiler/sidecompiler/GCCcore/11.2.0:/usr/local/software/skylake/Stages/2022/modules/all/Compiler/GCCcore/11.2.0:/usr/local/software/skylake/Stages/2022/modules/all/Core:/usr/local/software/skylake/Stages/2022/modules/all/MPI:/usr/local/software/skylake/Stages/2022/modules/all/MPI_settings:/usr/local/software/skylake/Stages/2022/modules/all/comm_settings:/usr/local/software/skylake/Stages/2022/modules/all/pkg_settings:/usr/local/software/skylake/Stages/2022/UI/Defaults:/usr/local/software/skylake/Stages/2022/UI/Tools:/usr/local/software/skylake/Stages/2022/UI/Compilers:/usr/local/software/skylake/userinstallations:/usr/local/software/skylake/OtherStages:/usr/local/software/skylake/Devel
…
  === Cuda and Rocky 8.6

- - New CUDA drivers on the compute nodes. In case of problems, please manually prepend your `LD_LIBRARY_PATH` (first for libcuda, second for libcublas, fft, etc.):
+ - New CUDA drivers on the compute nodes. In case of problems, please manually prepend your `LD_LIBRARY_PATH` (first for libcuda, second for libcublas, fft, etc.):
  {{{
  ln -s /usr/lib64/libcuda.so.1 .