Changes between Version 9 and Version 10 of Public/User_Guide/System_overview


Timestamp: Nov 14, 2019, 5:59:02 PM
Author: Jacopo de Amicis
Comment: Restructure page to better highlight the modules and their hardware; moved the rack information below the module information.

= Overview of our systems =

This page gives a short overview of the available systems from a hardware point of view. All hardware can be reached through a login node via SSH: '''!deep@fz-juelich.de'''. The login node is implemented as a virtual machine hosted by the master nodes (in a failover configuration). Please also see the information about [wiki:Public/User_Guide/Account getting an account] and using the [wiki:Public/User_Guide/Batch_system batch system].
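For example, a login from a local machine could look as follows (a minimal sketch; `jdoe` is a hypothetical account name, and the host name assumes the SSH address above resolves as `deep.fz-juelich.de`):

{{{
# Log in to the DEEP login node; replace "jdoe" with your own account name.
ssh jdoe@deep.fz-juelich.de
}}}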

== DEEP-EST Modular Supercomputer ==

[[Insert Figure]]

The DEEP-EST system is a prototype of the Modular Supercomputing Architecture (MSA), consisting of the following modules:

 * Cluster Module
 * Extreme Scale Booster
 * Data Analytics Module

In addition to these compute modules, the Scalable Storage Service Module provides the shared storage infrastructure for the DEEP-EST prototype.

The modules are connected by the Network Federation, which is composed of different types of interconnects and is briefly described below.
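Since jobs can span several modules, heterogeneous Slurm jobs are the natural way to use the MSA. The following is a sketch only: the partition names `dp-cn` and `dp-dam` are assumptions, and the [wiki:Public/User_Guide/Batch_system batch system] page remains the authoritative reference.

{{{
# Hypothetical heterogeneous job: one Cluster Module node plus one
# Data Analytics Module node in a single job (Slurm ":" syntax).
# Partition names are assumptions; check `sinfo` on the login node.
srun --partition=dp-cn --nodes=1 ./cluster_part : \
     --partition=dp-dam --nodes=1 ./dam_part
}}}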
=== Cluster Module ===
It is composed of 50 nodes with the following hardware specifications:

{{{#!td
Cluster [50 nodes]: `dp-cn[01-50]`
 * 2 x Intel Xeon 'Skylake' Gold 6146 (12 cores (24 threads), 3.2 GHz)
 * 192 GB RAM
 * 1 x 400 GB NVMe SSD (for OS only, not exposed to users)
 * network: !InfiniBand EDR (100 Gb/s)
}}}
{{{#!td
[[Insert figure]]
}}}

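A plain MPI job on the Cluster Module could be submitted as in the sketch below, assuming a Slurm partition named `dp-cn` and one task per physical core (2 x 12 cores per node):

{{{
# Run a 2-node MPI job with 24 tasks per node.
# The partition name "dp-cn" is an assumption; verify with `sinfo`.
srun --partition=dp-cn --nodes=2 --ntasks-per-node=24 ./my_mpi_app
}}}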
=== Extreme Scale Booster ===
It is composed of 75 nodes with the following hardware specifications:

{{{#!td
Extreme Scale Booster [75 nodes]: `dp-esb[01-75]`
 * 1 x Intel Xeon 'Cascade Lake' Silver 4215 CPU @ 2.50 GHz
 * 1 x Nvidia V100 Tesla GPU (32 GB HBM2)
 * 48 GB RAM
 * 1 x ?? GB SSD (for boot and OS)
 * network: EXTOLL (100 Gb/s)
}}}
{{{#!td
[[Insert figure]]
}}}

**Attention:** the Extreme Scale Booster will become available in January 2020.

=== Data Analytics Module ===
It is composed of 16 nodes with the following hardware specifications:

{{{#!td
Data Analytics Module [16 nodes]: `dp-dam[01-16]`
 * 2 x Intel Xeon 'Cascade Lake' Platinum 8260M CPU @ 2.40 GHz
 * 1 x Nvidia V100 Tesla GPU (32 GB HBM2)
 * 1 x Intel Stratix 10 FPGA (32 GB DDR4)
 * 384 GB RAM + 2 or 3 TB non-volatile memory (14 nodes with 2 TB, 2 nodes with 3 TB)
 * 2 x 1.5 TB Intel Optane SSD
 * 1 x 240 GB SSD (for boot and OS)
 * network: EXTOLL (100 Gb/s) + 40 Gb Ethernet
}}}
{{{#!td
[[Insert figure]]
}}}

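To verify GPU visibility on a Data Analytics node, an interactive step such as the following can be used (a sketch; the partition name `dp-dam` is an assumption):

{{{
# Allocate one DAM node and query its Tesla V100.
srun --partition=dp-dam --nodes=1 nvidia-smi
}}}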
== Network overview ==
Different types of interconnects are in use along with the Gigabit Ethernet connectivity (used for the administration and service networks) that is available on all nodes. The following sketch gives a rough overview. The network details are of particular interest for storage access; please also refer to the description of the [wiki:Public/User_Guide/Filesystems filesystems].

[[Image(DEEP-EST_Networks_Schematic_Overview.png, 50%, align=center)]]

**Attention:** performance measurements for the Network Federation will be provided in the future.


== Rack plan ==
This is a sketch of the available hardware, including a short description of the components relevant to system users (the nodes you can use for running and testing your jobs).

[[Image(Prototype_plus_SSSM_and_SDV_Rackplan_47U--2019-07.png, 100%, align=center)]]


{{{#!comment

== miclogin: ==
 * knc1:
   * 2 Xeon CPUs
   * 64 GB memory
   * 4 KNCs (named knc1-mic![0-3]) with 61 cores and 16 GB each
 * knc2:
   * 2 Xeon CPUs
   * 64 GB memory
   * 2 KNCs (named knc2-mic![0-1]) with 57 cores and 6 GB each

The DEEP cluster has been removed permanently!

== DEEP: ==
 * Cluster:
   * 2 Xeon CPUs per node
   * 32 GB memory per node
   * 128 nodes
 * Booster:
   * 2 KNCs per BNC (Booster Node Card)
   * 16 GB per KNC
   * 192 BNCs

}}}

=== SSSM rack ===
This rack hosts the master nodes, file servers, and storage, as well as network components for the Gigabit Ethernet administration and service networks. Users can access the login node via '''!deep@fz-juelich.de''' (implemented as a virtual machine running on the master nodes).

=== CM rack ===
This rack contains the hardware of the DEEP-EST Cluster Module, including compute nodes, management nodes, network components, and the liquid cooling unit.

=== DAM rack ===
This rack hosts the nodes of the Data Analytics Module of the DEEP-EST prototype.

=== SDV rack ===
Along with the prototype systems, several test nodes and so-called software development vehicles (SDVs) have been installed in the scope of the DEEP(-ER, -EST) projects. These are located in the SDV rack. The following components can be accessed by users:

 * Prototype DAM [4 nodes]: `protodam[01-04]`
   * 2 x Intel Xeon 'Skylake' (26 cores per socket)
   * 192 GB RAM
   * network: Gigabit Ethernet

 * Old DEEP-ER Cluster Module SDV [16 nodes]: `deeper-sdv[01-16]`
   * 2 x Intel Xeon 'Haswell' E5-2680 v3 (2.5 GHz)
   * 128 GB RAM
   * 1 x NVMe SSD with 400 GB per node (accessible through BeeGFS on demand)
   * network: EXTOLL Tourmalet (100 Gb/s)

 * KNLs [4 nodes]: `knl[01,04-06]`
   * 1 x Intel Xeon Phi (64-68 cores)
   * 1 x NVMe SSD with 400 GB per node (accessible through BeeGFS on demand)
   * 16 GB MCDRAM plus 96 GB RAM per KNL
   * network: Gigabit Ethernet

{{{#!comment have been removed meanwhile

 * KNMs [2 nodes]: `knm[01-02]`
   * 1 x Intel Xeon Phi 'Knights Mill' (72 cores)
   * 16 GB MCDRAM plus 96 GB RAM per KNM
   * network: Gigabit Ethernet

}}}

 * GPU nodes for Machine Learning [3 nodes]: `ml-gpu[01-03]`
   * 2 x Intel Xeon 'Skylake' Silver 4112 (2.6 GHz)
   * 192 GB RAM
   * 4 x Nvidia Tesla V100 GPU (PCIe Gen3, 16 GB HBM2 each)
   * network: 40 GbE connection

 * Old DEEP-ER NAM SDV:
   * size: 2 GB
   * network: EXTOLL
   * details: https://www.deep-projects.eu/hardware/memory-hierarchies/49-nam

{{{#!comment Not available anymore

=== FPGA test server ===
In addition to the seven racks hosting the SDV and prototype hardware, an FPGA workstation is available for testing. Please get in contact with j.kreutz@fz-juelich.de if you would like to get access.

 * FPGA [1 node]: `fpga01`
   * 2 x Intel CPU (8 cores)
   * 64 GB RAM
   * 1 x Intel Arria 10 PAC
}}}

= Further information =
 * [wiki:Public/User_Guide/Batch_system Information about the batch system]
 * [wiki:Public/User_Guide/Filesystems Filesystems]
 * [wiki:Public/User_Guide/Information_on_software Information on available software and tools]

{{{#!comment

 * [wiki:Public/User_Guide/Cluster Use the Cluster] outdated
 * [wiki:Public/User_Guide/Booster Use the Booster] outdated

}}}

 * [wiki:Public/User_Guide/DEEP-EST_CM Use the DEEP-EST Cluster Module]
 * [wiki:Public/User_Guide/DEEP-EST_DAM Use the DEEP-EST Data Analytics Module]
 * [wiki:Public/User_Guide/SDV_Cluster Use the SDV Cluster]
 * [wiki:Public/User_Guide/SDV_KNLs Use the SDV KNLs]