Changes between Version 39 and Version 40 of Public/User_Guide/PaS


Timestamp: Aug 27, 2021, 11:47:07 AM
Author: Jochen Kreutz
Comment:

  • Public/User_Guide/PaS

    v39 v40  
      3   3  This page is intended to give a short overview on known issues and to provide potential solutions and workarounds to the issues seen.
      4   4
      5      ''Last update: 2021-04-09''
          5  ''Last update: 2021-08-27''
      6   6
      7      [[span(style=color: #FF0000, System maintenance from Tuesday, 2021-04-13 to Thursday, 2021-04-15, no user access !)]]
          7  [[span(style=color: #FF0000, 2021-08-27: Filesystem issues, no user access !)]]
      8   8  {{{#!comment highlighted red text
      9   9  [[span(style=color: #FF0000, System maintenance from Monday, 2020-09-07 to Friday, 2020-09-11, no user access !)]]
    ...
     17  17  === CM nodes ===
     18  18
         19  * dp-cn05: memory issue - node at Megware for repair (#2682)
         20
     19  21  * dp-cn25: FW issues (#2495)
         22
         23  * dp-cn42: memory issue (#2675)
         24
     20  25
     21
         26  {{{#!comment
     22  27  === DAM nodes ===
     23  28
    ...
     25  30  * dp-dam02: node currently reserved for special use case (#2554)
     26  31  * dp-dam03: node currently reserved for special use case (#2242)
     27
         32  }}}
     28  33
     29  34  === ESB nodes ===
    ...
     33  38  }}}
     34  39
     35      * **dp-esb[01-25]: currently not available due to Fabri3 installation**
         40  * dp-esb[01-25]: currently being prepared as a Rocky Linux testbed
         41
     36  42  * dp-esb75: node currently reserved for special use case (#2568)
     37  43
    ...
     39  45  === SDV nodes ===
     40  46
     41      * several nodes have been taken offline:
     42        - deeper-sdv[11-16]
     43        - deeper storage system: (deeper-fs[01-03], deeper-raids)
     44      * deeper-sdv[01-10]: currently not available: configuration change needed (low priority)
         47  * deeper-sdv cluster nodes (Haswell) have been taken offline: deeper-sdv[11-16]
         48    - not included in SLURM anymore
         49    - will be used for testing
         50
     45  51  * knl01: NVMe issues (#2011)
     46  52
    ...
     50  56  === SLURM jobs ===
     51  57
     52      - due to introduction of accounting with the start of the early access program there is some re-configuration
         58  - due to introduction of accounting there is some re-configuration
     53  59    of user accounts needed within SLURM to assign the correct QOS levels and priorities for the jobs
     54      - this might lead to (temporary) failing job starts for certain users
     55      - if you cannot start jobs via SLURM, please write an email to the support list: `sup(at)deep-est.eu`
     56
     57
     58      === GPU direct usage with IB on ESB ===
     59
     60      - only available via Developer stage, for testing load:
     61
     62      {{{
     63      ml --force purge
     64      ml use $OTHERSTAGES
     65      ml load Stages/Devel-2020
     66      ml load Intel
     67      ml load ParaStationMPI
     68      }}}
     69
     70      - use `PSP_CUDA=1` and `PSP_UCP=1`
         60    * this might lead to (temporary) failing job starts for certain users
         61    * if you cannot start jobs via SLURM, please write an email to the support list: `sup(at)deep-est.eu`
     71  62
     72  63
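The SLURM notes in this change say that user accounts are being re-configured to get the correct QOS levels, and that job starts may fail in the meantime. Before writing to the support list, it can help to look at which accounts and QOS levels SLURM currently associates with your user. The following is a minimal sketch, assuming the standard SLURM client tools `sacctmgr` and `squeue` are installed on the login node. The function name `check_slurm_account` and the chosen output columns are hypothetical illustrations, not something this page prescribes.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper, not part of the system setup):
# print the SLURM accounting associations and the current jobs of the user.
# Assumes the standard SLURM client tools; prints a note if they are missing.

check_slurm_account() {
  local user="${USER:-$(id -un)}"

  if command -v sacctmgr >/dev/null 2>&1; then
    # Accounts, partitions and QOS levels associated with this user.
    sacctmgr show associations where user="$user" \
      format=User,Account,Partition,QOS%40
  else
    echo "sacctmgr not found: run this on a login node of the system"
  fi

  if command -v squeue >/dev/null 2>&1; then
    # Own jobs with job ID, partition, state and the scheduler's reason field.
    squeue -u "$user" -o "%.10i %.12P %.10T %r"
  fi
}

check_slurm_account
```

If an expected account or QOS level does not appear in the association list, that may be the re-configuration issue described above, and the output is useful to include in the email to `sup(at)deep-est.eu`.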