Changes between Version 13 and Version 14 of Public/User_Guide/PaS


Timestamp:
Apr 16, 2020, 1:44:39 PM
Author:
Jochen Kreutz
Comment:

updated info on offline nodes and currently known SW issues

  • Public/User_Guide/PaS

    v13  v14
    9    9    === CM nodes ===
    10   10
    11        * dp-cn33: node still offline after memory issues (#2338)
    12        * dp-cn49 and dp-cn50: nodes currently reserved for special use case
         11   * dp-cn08: node offline after memory issues (#2385)
         12   * dp-cn09: node currently reserved for special use case
         13   * dp-cn10: node currently reserved for special use case
         14   * dp-cn29: node still offline after memory issues (#2395)
         15   * dp-cn49: node currently reserved for special use case
    13   16
    14   17   === DAM nodes ===
    15   18
    16        * dp-dam03: being investigated after unexpected reboot (#2323)
         19   * dp-dam03: node currently reserved for special use case
    17   20   * dp-dam07: showing problems with its FPGA (#2353)
    18   21   * dp-dam08: issues seen with the second-socket CPU (#2304)
         22   * dp-dam09: node currently reserved for special use case
         23   * dp-dam10: node currently reserved for special use case
    19   24
    20   25   === ESB nodes ===

    23   28   * dp-esb11: no GPU device detected, under repair (#2358)
    24   29   * dp-esb23: MCE problems (#2350)
         30   * dp-esb24: offline due to spontaneous reboot
    25   31
    26   32
    27   33   == Software issues ==
         34
         35   === LDAP error message during login ===
         36
         37   - currently, a failover between the two master nodes may cause the following (or a similar) error message to appear during login:
         38
         39   {{{
         40   Error: ldap_search: failed to open connection to LDAP server(s) and search. Exception: socket connection error while opening: [Errno 111] Connection refused
         41   }}}
         42
         43   - the message can usually be ignored
         44   - if you see further issues or cannot log in at all, please write an email to the support list: `sup(at)deep-est.eu`
         45
         46   === SLURM jobs ===
         47
         48   - due to the introduction of accounting at the start of the early access program, user accounts need some
         49     reconfiguration within SLURM to assign the correct QOS levels and job priorities
         50   - this might cause job starts to fail temporarily for certain users
         51   - if you cannot start jobs via SLURM, please write an email to the support list: `sup(at)deep-est.eu`
    28   52
    29   53   === GPU direct usage with IB on ESB ===