| 83 | ==== Energy and Power Sensors |
| 84 | |
| 85 | DCDB provides several types of energy and power measurements that can be useful for performance analysis. Depending on the data source, these can have different units and scales; the unit and scale of a given sensor can be shown by querying it with the {{{dcdbconfig sensor show}}} command. The following energy sensors are available for compute nodes in the DEEP-EST prototype: |
| 86 | |
| 87 | * **energy** [Joules]: energy consumption of a compute node as a whole. |
| 88 | * **pkg-energy** [!MicroJoules]: energy consumption of a compute node's CPUs. |
| 89 | * **dram-energy** [!MicroJoules]: energy consumption of a compute node's memory components. |
| 90 | * **gpu0/energy** (ESB, DAM) [!MilliJoules]: energy consumption of Nvidia V100 GPUs. |
| 91 | * **gpu0/sysfs-energy** (ESB) [Joules]: alternative sensor for the energy consumption of Nvidia V100 GPUs, not including the energy drawn via PCIe. |
| 92 | * **sysfs-energy** (ESB) [Joules]: energy consumption of an ESB compute node, excluding the GPU. |
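Because the energy sensors above use three different units, readings usually need to be brought to a common unit before they are compared or summed. The following is a minimal sketch of such a normalization. The lookup table is transcribed from the list above, and the names `SENSOR_UNITS`, `UNIT_TO_JOULES` and `to_joules` are hypothetical helpers, not part of DCDB; the actual unit of a sensor should always be checked with {{{dcdbconfig sensor show}}}.

```python
# Scale factors from the units named on this page to joules.
UNIT_TO_JOULES = {
    "Joules": 1.0,
    "MilliJoules": 1e-3,
    "MicroJoules": 1e-6,
}

# Unit of each compute-node energy sensor, transcribed from the list above.
# This table is an assumption of the sketch: verify it against the output
# of `dcdbconfig sensor show` for the sensors you actually query.
SENSOR_UNITS = {
    "energy": "Joules",
    "pkg-energy": "MicroJoules",
    "dram-energy": "MicroJoules",
    "gpu0/energy": "MilliJoules",
    "gpu0/sysfs-energy": "Joules",
    "sysfs-energy": "Joules",
}


def to_joules(sensor_name, raw_value):
    """Convert a raw energy reading of the given sensor to joules."""
    unit = SENSOR_UNITS[sensor_name]
    return raw_value * UNIT_TO_JOULES[unit]
```

For example, `to_joules("pkg-energy", 2_500_000)` yields approximately 2.5 joules.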
| 93 | |
| 94 | The following sensors are available for estimating power consumption: |
| 95 | |
| 96 | * **power** [!MilliWatts]: power consumption of a compute node as a whole. On DAM nodes, this sensor is reported in Watts. |
| 97 | * **gpu0/power** (ESB, DAM) [!MilliWatts]: power consumption of Nvidia V100 GPUs. |
| 98 | * **gpu0/sysfs-power** (ESB) [!MilliWatts]: alternative sensor for the power consumption of Nvidia V100 GPUs, not including the power drawn via PCIe. |
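Power readings can be normalized to Watts in the same way as energy readings, and an average power can also be estimated from two energy readings as the energy difference divided by the elapsed time. The sketch below illustrates both. It assumes that the energy values have already been converted to joules, that the energy sensor behaves as a monotonically increasing counter, and that timestamps are given in seconds; none of this is stated on this page, and the function names are hypothetical.

```python
# Scale factors from the power units named on this page to watts.
UNIT_TO_WATTS = {
    "Watts": 1.0,
    "MilliWatts": 1e-3,
}


def power_to_watts(raw_value, unit):
    """Convert a raw power reading to watts, given its unit name."""
    return raw_value * UNIT_TO_WATTS[unit]


def average_power_watts(e_start_j, e_end_j, t_start_s, t_end_s):
    """Average power between two energy readings (assumed cumulative).

    Energies are in joules and timestamps in seconds, so the result
    (joules per second) is in watts.
    """
    elapsed = t_end_s - t_start_s
    if elapsed <= 0:
        raise ValueError("end timestamp must be later than start timestamp")
    return (e_end_j - e_start_j) / elapsed
```

Passing the unit explicitly keeps the DAM case (Watts instead of !MilliWatts for the **power** sensor) visible at the call site.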
| 99 | |
| 100 | On multi-socket systems (i.e., CN and DAM), the pkg and dram energy and power sensors are available both per socket (e.g., under {{{socket0/pkg-energy}}}) and for the entire compute node. The same aggregation scheme applies to sensors associated with CPU performance counters. Besides those on compute nodes, many other energy and power sensors are available on the DEEP-EST prototype; these are associated with the system's infrastructure and cooling equipment. |
| 101 | |
103 | | * energy |
104 | | * dram-energy |
105 | | * pkg-energy |
106 | | * MemUsed |
107 | | * instructions |
108 | | * cpu-cycles |
109 | | * cache-misses |
110 | | * cache-misses-l2 |
111 | | * cache-misses-l3 |
112 | | * scalar-double |
113 | | * scalar-double-128 |
114 | | * scalar-double-256 |
115 | | * scalar-double-512 |
116 | | * gpu-energy (DAM, ESB) |
117 | | * ib-portXmitData (CN, ESB) |
118 | | * ib-portRcvData (CN, ESB) |
| 122 | * **energy** [Joules]: energy consumption of nodes, as computed from the respective energy sensors. |
| 123 | * **dram-energy** [!MicroJoules]: energy consumption of the nodes' memory, as computed from the dram-energy sensors. |
| 124 | * **pkg-energy** [!MicroJoules]: energy consumption of the nodes' CPUs, as computed from the pkg-energy sensors. |
| 125 | * **!MemUsed** [!KiloBytes]: amount of used RAM on the compute nodes. |
| 126 | * **instructions**: number of CPU instructions executed on the compute nodes. |
| 127 | * **cpu-cycles**: number of CPU cycles (affected by frequency scaling) on the compute nodes. |
| 128 | * **cache-misses**: total number of CPU cache misses on the compute nodes. |
| 129 | * **cache-misses-l2**: number of CPU L2 cache misses on the compute nodes. |
| 130 | * **cache-misses-l3**: number of CPU L3 cache misses on the compute nodes. |
| 131 | * **scalar-double**: number of double-precision floating-point operations on the compute nodes' CPUs. |
| 132 | * **scalar-double-128**: number of double-precision floating-point operations using 128-bit vectorization. |
| 133 | * **scalar-double-256**: number of double-precision floating-point operations using 256-bit vectorization. |
| 134 | * **scalar-double-512**: number of double-precision floating-point operations using 512-bit vectorization. |
| 135 | * **gpu-energy** (DAM, ESB) [!MilliJoules]: energy consumption of the compute nodes' GPUs, as computed from the gpu0/energy sensors. |
| 136 | * **ib-portXmitData** (CN, ESB) [Bytes]: amount of data transmitted over the !InfiniBand network. |
| 137 | * **ib-portRcvData** (CN, ESB) [Bytes]: amount of data received over the !InfiniBand network. |
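The metrics listed above can be combined into a few commonly used derived indicators, such as instructions per cycle (IPC) or L3 cache misses per thousand instructions. The following is a minimal sketch of such a computation. It assumes the metric values have been collected into a plain dictionary keyed by the names above, with **energy** in joules; how the values are obtained is outside its scope, and `derived_metrics` is a hypothetical helper.

```python
def derived_metrics(m):
    """Compute derived indicators from a dict of metric values.

    Expected keys (named as in the list above): "instructions",
    "cpu-cycles", "cache-misses-l3" and "energy" (in joules).
    Ratios whose denominator is zero are omitted from the result.
    """
    out = {}
    instructions = m["instructions"]
    cycles = m["cpu-cycles"]
    if cycles > 0:
        # Instructions retired per CPU cycle.
        out["ipc"] = instructions / cycles
    if instructions > 0:
        # L3 cache misses per 1000 instructions.
        out["l3_mpki"] = 1000.0 * m["cache-misses-l3"] / instructions
        # Node energy spent per executed instruction, in joules.
        out["joules_per_instruction"] = m["energy"] / instructions
    return out
```

Since **cpu-cycles** is affected by frequency scaling, the resulting IPC should be interpreted together with the CPU frequency in effect during the measurement.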