Changes between Version 44 and Version 45 of Public/User_Guide/System_overview


Timestamp: Jan 6, 2022, 12:50:09 PM
Author: Jochen Kreutz

Legend: unchanged lines are shown without a prefix, removed lines are prefixed with "-", added lines with "+".
  • Public/User_Guide/System_overview

= System overview =
+
+
+ {{{#!comment
+ [[span(style=color: #FF0000, **Last Update:** )]] 2022-01-06
+ }}}
+
This page gives a short overview of the available systems from a hardware point of view. All hardware can be reached through a login node via SSH: '''!deep@fz-juelich.de'''. The login node is implemented as a virtual machine hosted by the master nodes (in a failover mode). Please also see the information about [wiki:Public/User_Guide/Account getting an account] and using the [wiki:Public/User_Guide/Batch_system batch system].
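As a minimal sketch of the access path described above (it assumes your credentials are already set up as explained on the linked account page, and uses the SSH address exactly as given):

{{{
# Connect to the DEEP login node using the address given above; from there,
# the compute modules are reached through the batch system.
ssh deep@fz-juelich.de
}}}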
     

In addition to the three compute modules, a Scalable Storage Service Module (SSSM) provides shared storage infrastructure for the DEEP-EST prototype (`/usr/local`) and is accompanied by the All Flash Storage Module (AFSM), which provides a fast work filesystem (`/afsm`) on the compute nodes.
- All modules are connected via a 100 Gb/s EDR IB network in a non-blocking tree topology. In addition, the system is connected to the Jülich storage system (JUST) to share home and project file systems with other HPC systems hosted at the Jülich Supercomputing Centre (JSC).
+ All modules are connected via a 100 Gb/s EDR IB network in a non-blocking tree topology, accompanied by a Gigabit Ethernet service network. In addition, the system is connected to the Jülich storage system (JUST) to share home and project file systems with other HPC systems hosted at the Jülich Supercomputing Centre (JSC).

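As a small illustration of how these storage resources appear on a node, the sketch below simply inspects the two mount points named above; the use of the standard `df` tool is an assumption of this sketch, not something stated on this page:

{{{
# Show capacity and usage of the shared SSSM storage and the AFSM work filesystem
df -h /usr/local /afsm
}}}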
=== Cluster Module ===
     
   * 48 GB RAM
   * 1 x 512 GB SSD
-    * network: IB EDR (100 Gb/s) (nodes `dp-esb[01-25]` to be converted from Extoll to IB EDR)
+    * network: IB EDR (100 Gb/s)

}}}
     
}}}

- {{{#!comment
- [[span(style=color: #FF0000, **Attention:** )]] the Extreme Scale Booster will become available in March 2020.
- }}}

=== Data Analytics Module ===
     
 * Data Analytics Module [16 nodes]: `dp-dam[01-16]`
   * 2 x Intel Xeon 'Cascade Lake' Platinum 8260M CPU @ 2.40GHz
-    * 1 x Nvidia V100 Tesla GPU (32 GB HBM2)
-    * 1 x Intel STRATIX10 FPGA (32 GB DDR4)
-    * 384 GB RAM + 3 TB non-volatile memory (14 nodes with 2, 2 nodes with 3)
+    * `dp-dam[01-08]`: 1 x Nvidia V100 Tesla GPU (32 GB HBM2)
+    * `dp-dam[09-12]`: 2 x Nvidia V100 Tesla GPU (32 GB HBM2)
+    * `dp-dam[13-16]`: 2 x Intel STRATIX10 FPGA (32 GB DDR4)
+    * 384 GB RAM + 3 TB non-volatile memory
   * 2 x 1.5 TB Intel Optane SSD (1 for local scratch, 1 for BeeOND)
   * 1 x 240 GB SSD (for boot and OS)
-    * network: EXTOLL (100 Gb/s) + 40 Gb Ethernet (to be converted to IB EDR)
+    * network: IB EDR (100 Gb/s)

}}}
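To make the module layout above concrete, here is a hedged sketch of requesting a Data Analytics Module node through the batch system. It assumes Slurm is used and that the partition is called `dp-dam`; neither name is stated on this page, so see the linked batch system page for the authoritative commands:

{{{
# Interactive shell on one DAM node with a single GPU (partition and GRES
# names are assumptions; adjust according to the Batch_system documentation).
srun --partition=dp-dam --nodes=1 --gres=gpu:1 --time=00:30:00 --pty /bin/bash
}}}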
     

=== Scalable Storage Service Module ===
- It is based on spinning disks. It is composed of 4 volume data server systems, 2 metadata servers and 2 RAID enclosures. The RAID enclosures each host 24 spinning disks with a capacity of 8 TB each. Both RAIDs expose two 16 Gb/s fibre channel connections, each connecting to one of the four file servers. There are 2 volumes per RAID setup. The volumes are driven with a RAID-6 configuration. The BeeGFS global parallel file system is used to make 292 TB of data storage capacity available.
+ It is based on spinning disks and composed of 4 volume data server systems, 2 metadata servers and 2 RAID enclosures. The RAID enclosures each host 24 spinning disks with a capacity of 8 TB each. Both RAIDs expose two 16 Gb/s fibre channel connections, each connecting to one of the four file servers. There are 2 volumes per RAID setup. The volumes are driven with a RAID-6 configuration. The BeeGFS global parallel file system is used to make 292 TB of data storage capacity available.

Here are the specifications of the main hardware components in more detail:
     
   * 2 x 240 GB SSD
   * (additional 2 x 480 GB SSD in `dp-fs[01-02]` for metadata)
-    * network: 40 Gb Ethernet (to be converted to IB EDR)
+    * network: IB EDR (100 Gb/s)
 * SSSM [2 EUROstor ES-6600 RAID enclosures]: `dp-raid[01-02]`:
   * 24 x 8 TB SAS Nearline
-    * 2 x 16 Gbit FC connector
+    * 2 x 16 Gb FC connector
}}}
{{{#!td
     

=== All Flash Storage Module ===
- It is based on PCIe3 NVMe SSD storage devices. It is composed of 6 volume data server systems and 2 metadata servers interconnected with a 100 Gb/s EDR-!InfiniBand fabric. The AFSM is integrated into the DEEP-EST Prototype EDR fabric topology of the CM and ESB EDR partition. The BeeGFS global parallel file system is used to make 1.3 PB of data storage capacity available.
+ It is based on PCIe3 NVMe SSD storage devices. It is composed of 6 volume data server systems and 2 metadata servers interconnected with a 100 Gb/s EDR-!InfiniBand fabric. The BeeGFS global parallel file system is used to make 1.3 PB of data storage capacity available.

Here are the specifications of the main hardware components in more detail:
     
network overview to be updated once the all-IB solution is in place
}}}
- [[Image(DEEP-EST_Prototype_Network_Overview.png, width=850px, align=center)]]
+ [[Image(IB_non-blocking_fat_tree.png, width=850px, align=center)]]
    125129
    126130