Changes between Version 54 and Version 55 of Public/User_Guide/PaS


Timestamp:
Sep 21, 2022, 1:35:36 PM
Author:
Jochen Kreutz
Comment:

update list of broken nodes and sw issues

  • Public/User_Guide/PaS

    v54 v55  
    33This page is intended to give a short overview of known issues and to provide potential solutions and workarounds for them.
    44
    5 ''Last update: 2022-09-15''
     5''Last update: 2022-09-21''
    66
    77
     
    1717=== CM nodes ===
    1818 * dp-cn25: SEL ProblemsFW issues (#2769)
    19 
    20  * dp-cn27: MCE Errors found (#2919)
     19 * dp-cn30: Image update needed (#2991)
     20 * dp-cn35: Image update needed (#3005)
     21 * dp-cn36: Image update needed (fixed EM issue, see #2992)
     22 * dp-cn37: Image update needed (fixed EM issue, see #2993)
     23 * dp-cn[47-50]: BeeOnd testbed
    2124
    2225=== DAM nodes ===
    2326 * dp-dam02: reserved for FPGA tests
    2427 * dp-dam03: PCI link speed degraded (#2931)
    25  * dp-dam10: PMEM module issue (#2875)
     28 * dp-dam08: no turbo mode (#2974)
    2629 * dp-dam16: testbed
    2730
    2831=== ESB nodes ===
    2932 * dp-esb[07]: used for Rocky 8.6 tests
    30  * dp-esb[11]: memory issues
     33 * dp-esb[11]: memory issues (#2857)
     34 * dp-esb[25]: Image update needed
     35 * dp-esb[31]: GPU issues (#2949)
     36 * dp-esb[47]: SEL Problems (#2998)
     37 * dp-esb[61]: Eth connection issues (#3010)
     38 * dp-esb[65]: Eth connection issues (#2978)
    3139
    3240=== SDV nodes ===
     
    3745 * knl01: serves as golden client for imaging only
    3846
    39  * dp-sdv-esb[01,02]: Slurm update required
     47 * dp-sdv-esb[01,02]: will only be powered on demand
    4048
    4149== Software issues ==
     50
     51{{{#!comment solved with EB 2022 stage
    4252=== nvidia driver mismatch ===
    4353 * loading CUDA module and trying to run `nvidia-smi` (or any application trying to use the GPU) leads to
     
    4858 * workaround is to unload the driver module: `ml -nvidia-driver/.default`
    4959 * for further information, please also see [https://gitlab.jsc.fz-juelich.de/hps-public/easybuild-repository/-/wikis/Failed-to-initialize-NVML-Driver-library-version-mismatch-message here][[BR]]
     60}}}
     61
    5062
    5163=== nvidia profiling tools ===
     
    6072 * you will still see a warning "OpenGL Version check failed. Falling back to Mesa software rendering.", but the profiling tool (e.g. `nsight-sys`) should start up
    6173
    62 === Easybuild ===
    63  * Moving to the new Easybuild stage 2022 (in February) might cause unexpected behavior and problems with the installed software components:
    64 
    65 {{{#!comment JK: invalid
    66 
    67 === GPU direct usage with Extoll on DAM ===
    68  * new Extoll driver for GPU direct over Extoll still shows low performance on the DAM nodes
    69  * available via Developer stage, for testing load:
    70 
    71 {{{
    72 ml --force purge
    73 ml use $OTHERSTAGES
    74 ml load Stages/Devel-2020
    75 ml load Intel
    76 ml load ParaStationMPI
    77 }}}
    78  * expect performance (and maybe also stability) issues
    79 
    80 }}}