This page gives a short overview of the available systems from a hardware point of view. All hardware can be reached through a login node via SSH: '''!deep@fz-juelich.de'''. The login node is implemented as a virtual machine hosted by the master nodes (in failover mode).

Please see also the information about [wiki:Public/User_Guide/Account getting an account] and using the [wiki:Public/User_Guide/Batch_system batch system].
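For convenience, logging in can be simplified with an `~/.ssh/config` entry. This is a sketch only: the hostname below assumes the login node resolves as `deep.fz-juelich.de`, and the user name is a placeholder for the account you received during registration.

```
# ~/.ssh/config -- sketch; hostname and user name are assumptions,
# use the address and account from your registration confirmation
Host deep
    HostName deep.fz-juelich.de
    User <your-account>
```

With such an entry in place, `ssh deep` opens a session on the login node.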
== DEEP-EST Modular Supercomputer ==

[[Insert Figure]]

The DEEP-EST system is a prototype of the Modular Supercomputing Architecture (MSA), consisting of the following modules:

* Cluster Module
* Extreme Scale Booster
* Data Analytics Module

In addition to these compute modules, the Scalable Storage Service Module provides the shared storage infrastructure for the DEEP-EST prototype.

The modules are connected by the Network Federation, composed of different types of interconnects and briefly described below.

=== Cluster Module ===
It is composed of 50 nodes with the following hardware specifications:

{{{#!td
Cluster [50 nodes]: `dp-cn[01-50]`
* 2 x Intel Xeon 'Skylake' Gold 6146 (12 cores / 24 threads, 3.2 GHz)
* 192 GB RAM
* 1 x 400 GB NVMe SSD (for OS only, not exposed to users)
* network: !InfiniBand EDR (100 Gb/s)
}}}
{{{#!td
[[Insert figure]]
}}}

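Node names on this page use Slurm-style hostlist notation, e.g. `dp-cn[01-50]` for the 50 Cluster Module nodes. The following is a minimal sketch of how that bracket notation expands; `expand_hostlist` is an illustrative helper (handling one bracket group with comma-separated, zero-padded ranges), not a system tool:

```python
import re

def expand_hostlist(spec):
    """Expand notation like 'dp-cn[01-50]' or 'knl[01,04-06]'
    into a list of hostnames (single bracket group only)."""
    m = re.fullmatch(r"([\w-]+)\[([\d,-]+)\]", spec)
    if not m:                     # no bracket group: already a plain hostname
        return [spec]
    prefix, ranges = m.groups()
    hosts = []
    for part in ranges.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            width = len(lo)       # keep the zero-padding of the lower bound
            hosts.extend(f"{prefix}{i:0{width}d}"
                         for i in range(int(lo), int(hi) + 1))
        else:
            hosts.append(prefix + part)
    return hosts

print(expand_hostlist("dp-cn[01-50]")[:3])  # → ['dp-cn01', 'dp-cn02', 'dp-cn03']
print(expand_hostlist("knl[01,04-06]"))     # → ['knl01', 'knl04', 'knl05', 'knl06']
```

On the system itself, Slurm's own tooling performs this expansion (e.g. `scontrol show hostnames`).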
=== Extreme Scale Booster ===
It is composed of 75 nodes with the following hardware specifications:

{{{#!td
Extreme Scale Booster [75 nodes]: `dp-esb[01-75]`
* 1 x Intel Xeon 'Cascade Lake' Silver 4215 CPU @ 2.50 GHz
* 1 x Nvidia V100 Tesla GPU (32 GB HBM2)
* 48 GB RAM
* 1 x ?? GB SSD (for boot and OS)
* network: EXTOLL (100 Gb/s)
}}}
{{{#!td
[[Insert figure]]
}}}

**Attention:** the Extreme Scale Booster will become available in January 2020.

=== Data Analytics Module ===
It is composed of 16 nodes with the following hardware specifications:

{{{#!td
Data Analytics Module [16 nodes]: `dp-dam[01-16]`
* 2 x Intel Xeon 'Cascade Lake' Platinum 8260M CPU @ 2.40 GHz
* 1 x Nvidia V100 Tesla GPU (32 GB HBM2)
* 1 x Intel Stratix 10 FPGA (32 GB DDR4)
* 384 GB RAM + 2 or 3 TB non-volatile memory (14 nodes with 2 TB, 2 nodes with 3 TB)
* 2 x 1.5 TB Intel Optane SSD
* 1 x 240 GB SSD (for boot and OS)
* network: EXTOLL (100 Gb/s) + 40 Gb Ethernet
}}}
{{{#!td
[[Insert figure]]
}}}

== Network overview ==
Different types of interconnects are in use, along with the Gigabit Ethernet connectivity (used for the administration and service networks) that is available on all nodes. The following sketch gives a rough overview. Network details are of particular interest for storage access; please also refer to the description of the [wiki:Public/User_Guide/Filesystems filesystems].

[[Image(DEEP-EST_Networks_Schematic_Overview.png, 50%, align=center)]]

**Attention:** performance measurements for the Network Federation will be provided in the future.
| 74 | |
== SDV hardware ==

* Old DEEP-ER Cluster Module SDV [16 nodes]: `deeper-sdv[01-16]`
  * 2 x Intel Xeon 'Haswell' E5-2680 v3 (2.5 GHz)
  * 128 GB RAM
  * 1 x NVMe with 400 GB per node (accessible through BeeGFS on demand)
  * network: Extoll Tourmalet (100 Gb/s)

* KNLs [4 nodes]: `knl[01,04-06]`
  * 1 x Intel Xeon Phi (64-68 cores)
  * 1 x NVMe with 400 GB per node (accessible through BeeGFS on demand)
  * 16 GB MCDRAM plus 96 GB RAM per KNL
  * network: Gigabit Ethernet

{{{#!comment have been removed meanwhile

* KNMs [2 nodes]: `knm[01-02]`
  * 1 x Intel Xeon Phi 'Knights Mill' (72 cores)
  * 16 GB MCDRAM plus 96 GB RAM per KNM
  * network: Gigabit Ethernet

}}}

* GPU nodes for Machine Learning [3 nodes]: `ml-gpu[01-03]`
  * 2 x Intel Xeon 'Skylake' Silver 4112 (2.6 GHz)
  * 192 GB RAM
  * 4 x Nvidia Tesla V100 GPU (PCIe Gen3, 16 GB HBM2)
  * network: 40 GbE connection

* Old DEEP-ER NAM SDV:
  * size: 2 GB
  * network: Extoll
  * details: https://www.deep-projects.eu/hardware/memory-hierarchies/49-nam

{{{#!comment Not available anymore

=== FPGA test server ===
In addition to the seven racks hosting the SDV and prototype hardware, an FPGA workstation is available for testing. Please contact j.kreutz@fz-juelich.de if you would like to get access.

* FPGA [1 node]: `fpga01`
  * 2 x Intel CPU (8 cores)
  * 64 GB RAM
  * 1 x Intel Arria 10 PAC
}}}
* Prototype DAM [4 nodes]: `protodam[01-04]`
  * 2 x Intel Xeon 'Skylake' (26 cores per socket)
  * 192 GB RAM
  * network: Gigabit Ethernet

= Further information =
* [wiki:Public/User_Guide/Batch_system Information about the batch system]
* [wiki:Public/User_Guide/Filesystems Filesystems]
* [wiki:Public/User_Guide/Information_on_software Information on available software and tools]
{{{#!comment
* [wiki:Public/User_Guide/Cluster Use the Cluster] outdated
* [wiki:Public/User_Guide/Booster Use the Booster] outdated
}}}
* [wiki:Public/User_Guide/DEEP-EST_CM Use the DEEP-EST Cluster Module]
* [wiki:Public/User_Guide/DEEP-EST_DAM Use the DEEP-EST Data Analytics Module]
* [wiki:Public/User_Guide/SDV_Cluster Use the SDV Cluster]
* [wiki:Public/User_Guide/SDV_KNLs Use the SDV KNLs]