Changes between Version 44 and Version 45 of Public/User_Guide/System_overview
- Timestamp: Jan 6, 2022, 12:50:09 PM
Public/User_Guide/System_overview
  = System overview =
+
+ {{{#!comment
+ [[span(style=color: #FF0000, **Last Update:** )]] 2022-01-06
+ }}}
+
  This page is supposed to give a short overview of the available systems from a hardware point of view. All hardware can be reached through a login node via SSH: '''!deep@fz-juelich.de'''. The login node is implemented as a virtual machine hosted by the master nodes (in a failover mode). Please see also the information about [wiki:Public/User_Guide/Account getting an account] and using the [wiki:Public/User_Guide/Batch_system batch system].
…
  In addition to the three compute modules, a Scalable Storage Service Module (SSSM) provides shared storage infrastructure for the DEEP-EST prototype (`/usr/local`) and is accompanied by the All Flash Storage Module (AFSM), leveraging a fast local work filesystem (`/afsm`) on the compute nodes.
- All modules are connected via a 100 Gb/s EDR IB network in a non-blocking tree topology. In addition the system is connected to the Jülich storage system (JUST) to share home and project file systems with other HPC systems hosted at Jülich Supercomputing Centre (JSC).
+ All modules are connected via a 100 Gb/s EDR IB network in a non-blocking tree topology, accompanied by a Gigabit Ethernet service network. In addition the system is connected to the Jülich storage system (JUST) to share home and project file systems with other HPC systems hosted at Jülich Supercomputing Centre (JSC).
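The SSH access described above can be sketched as follows. This is a minimal example using the SSH target exactly as given on the page; depending on your account setup, your own site username may be required instead (an assumption, shown in the comment):

```shell
# Connect to the DEEP login node using the target given on this page.
# Depending on your account, you may instead need your own username,
# e.g. ssh <user>@deep.fz-juelich.de (assumption; <user> is a placeholder).
ssh deep@fz-juelich.de
```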
  === Cluster Module ===
…
  * 48 GB RAM
  * 1 x 512 GB SSD
- * network: IB EDR (100 Gb/s) (nodes `dp-esb[01-25]` to be converted from Extoll to IB EDR)
+ * network: IB EDR (100 Gb/s)

  }}}
…
  }}}

- {{{#!comment
- [[span(style=color: #FF0000, **Attention:** )]] the Extreme Scale Booster will become available in March 2020.
- }}}
-
  === Data Analytics Module ===
…
  * Data Analytics Module [16 nodes]: `dp-dam[01-16]`
  * 2 x Intel Xeon 'Cascade Lake' Platinum 8260M CPU @ 2.40GHz
- * 1 x Nvidia V100 Tesla GPU (32 GB HBM2)
- * 1 x Intel STRATIX10 FPGA (32 GB DDR4)
- * 384 GB RAM + 3 TB non-volatile memory (14 nodes with 2, 2 nodes with 3)
+ * dp-dam[01-08]: 1 x Nvidia V100 Tesla GPU (32 GB HBM2)
+ * dp-dam[09-12]: 2 x Nvidia V100 Tesla GPU (32 GB HBM2)
+ * dp-dam[13-16]: 2 x Intel STRATIX10 FPGA (32 GB DDR4)
+ * 384 GB RAM + 3 TB non-volatile memory
  * 2 x 1.5 TB Intel Optane SSD (1 for local scratch, 1 for BeeOND)
  * 1 x 240 GB SSD (for boot and OS)
- * network: EXTOLL (100 Gb/s) + 40 Gb Ethernet (to be converted to IB EDR)
+ * network: IB EDR (100 Gb/s)

  }}}
…
  === Scalable Storage Service Module ===
- It is based on spinning disks. It is composed of 4 volume data server systems, 2 metadata servers and 2 RAID enclosures. The RAID enclosures each host 24 spinning disks with a capacity of 8 TB each. Both RAIDs expose two 16 Gb/s fibre channel connections, each connecting to one of the four file servers. There are 2 volumes per RAID setup. The volumes are driven with a RAID-6 configuration. The BeeGFS global parallel file system is used to make 292 TB of data storage capacity available.
+ It is based on spinning disks and composed of 4 volume data server systems, 2 metadata servers and 2 RAID enclosures. The RAID enclosures each host 24 spinning disks with a capacity of 8 TB each.
+ Both RAIDs expose two 16 Gb/s fibre channel connections, each connecting to one of the four file servers. There are 2 volumes per RAID setup. The volumes are driven with a RAID-6 configuration. The BeeGFS global parallel file system is used to make 292 TB of data storage capacity available.

  Here are the specifications of the main hardware components in more detail:
…
  * 2 x 240 GB SSD
  * (additional 2 x 480 GB SSD in `dp-fs[01-02]` for metadata)
- * network: 40 Gb Ethernet (to be converted to IB EDR)
+ * network: IB EDR (100 Gb/s)
  * SSSM [2 EUROstor ES-6600 RAID enclosures]: `dp-raid[01-02]`:
  * 24 x 8 TB SAS Nearline
- * 2 x 16 Gbit FC connector
+ * 2 x 16 Gb FC connector
  }}}
  {{{#!td
…
  === All Flash Storage Module ===
- It is based on PCIe3 NVMe SSD storage devices. It is composed of 6 volume data server systems and 2 metadata servers interconnected with a 100 Gbps EDR-!InfiniBand fabric. The AFSM is integrated into the DEEP-EST Prototype EDR fabric topology of the CM and ESB EDR partition. The BeeGFS global parallel file system is used to make 1.3 PB of data storage capacity available.
+ It is based on PCIe3 NVMe SSD storage devices. It is composed of 6 volume data server systems and 2 metadata servers interconnected with a 100 Gbps EDR-!InfiniBand fabric. The BeeGFS global parallel file system is used to make 1.3 PB of data storage capacity available.

  Here are the specifications of the main hardware components in more detail:
…
  network overview to be updated once all IB solution is in place
  }}}
- [[Image(DEEP-EST_Prototype_Network_Overview.png, width=850px, align=center)]]
+ [[Image(IB_non-blocking_fat_tree.png, width=850px, align=center)]]