Changes between Version 5 and Version 6 of Public/Energy

Timestamp: Sep 25, 2020, 4:39:00 PM
Author: Jochen Kreutz
Comment: Update information on energy measurements

Public/Energy

-                      v5
+                      v6
+== Energy Measurement
+For the CM and ESB modules, fine-grained energy measurement is in place using Megware's energy meters attached to the compute nodes of these modules.
+There are several ways to get information about the energy consumption of your jobs (and of the nodes). The preferred methods are:
+=== SLURM sacct command
+This is probably the easiest way to get the energy consumption of your interactive and batch jobs. Once your job has finished, you can use the
+`sacct` command to query the Slurm database about its energy consumption. For further information and an example
+of how to use the command for energy measurements, please see [wiki:/Public/User_Guide/Batch_system#Informationonpastjobsandaccounting accounting].
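As a rough sketch of such a query (the linked accounting page remains the authoritative example), the helper below wraps `sacct` with the standard `ConsumedEnergy` and `ConsumedEnergyRaw` format fields. The function name `job_energy` and the job ID in the usage line are hypothetical, and whether the energy fields are populated depends on the accounting setup of the system.

```shell
# job_energy JOBID   (hypothetical helper name, not a site-provided tool)
# Prints the accounting record of a finished job, including the energy
# fields that Slurm reports when energy accounting is enabled.
job_energy() {
    sacct -j "$1" --format=JobID,JobName,Elapsed,ConsumedEnergy,ConsumedEnergyRaw
}

# Example call (12345 is a placeholder job ID):
#   job_energy 12345
```

`ConsumedEnergyRaw` gives the value as a plain number, which is usually easier to post-process than the scaled `ConsumedEnergy` column.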
+=== DCDB (coming soon)
+The "Datacenter Database" (DCDB) stores measured values from node and infrastructure sensors every 10 seconds, including the power and
+energy consumption of the compute nodes. This allows for a very fine-grained analysis of the energy consumed by your jobs, e.g. by specifying
+precise time stamps / ranges and by providing access to measured data from different components like CPU, GPU, memory etc. instead of making
+only an accumulated value available. It also offers a convenient way to analyse SLURM jobs.
+The DCDB can be queried from the login node using the DCDB client tools. A user guide that
+gives more details on the database and explains how to use it for energy measurements will be attached to this page in the coming days.
+=== Use /sys files
+The energy meters expose their measured values through files in the "/sys" filesystem on the nodes.
+To query the overall energy (in Joules) a node has consumed so far, you can use the `energy_j` file.
+To measure the energy consumed by your commands (applications), read this file in your SLURM job script
+both before and after you `srun` them and take the difference of the two readings:
+Unit=[Joules]
+**CM Module**
+{{{
+srun sh -c 'if [ "$SLURM_LOCALID" = 0 ]; then echo ${SLURM_NODEID}: $(cat /sys/devices/platform/sem/energy_j); fi'
+}}}
+**ESB Module**
+There are two energy meters present for the ESB nodes, one for the CPU blade and one for the GPU part:
+{{{
+srun sh -c 'if [ "$SLURM_LOCALID" = 0 ]; then echo ${SLURM_NODEID}: $(cat /sys/devices/platform/sem.1/energy_j); fi'
+srun sh -c 'if [ "$SLURM_LOCALID" = 0 ]; then echo ${SLURM_NODEID}: $(cat /sys/devices/platform/sem.2/energy_j); fi'
+}}}
+To get the energy consumed by a multi-node job you have to sum the values from all nodes, which is quite cumbersome.
+For runs on single nodes (maybe even in an interactive session), however, reading the current values directly from the files
+can be quite useful.
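The summation over nodes can be scripted. The following is a minimal sketch, assuming the output of the `srun` one-liners above has been redirected into two files (one written before and one after the application run), each holding lines of the form `<node id>: <joules>`. The helper name `energy_delta`, the file names and `./my_app` are hypothetical. The sketch also assumes integer readings and that the counter does not wrap around or reset during the job.

```shell
# energy_delta BEFORE_FILE AFTER_FILE   (hypothetical helper, not a site tool)
# Both files contain lines "<node id>: <joules>". Prints the per-node
# difference (after minus before) and the sum over all nodes.
energy_delta() {
    awk -F': *' '
        # First file: remember the reading taken before the run, per node.
        NR == FNR { before[$1] = $2; next }
        # Second file: subtract the earlier reading of the same node.
        ($1 in before) {
            delta = $2 - before[$1]
            printf "node %s: %.0f J\n", $1, delta
            total += delta
        }
        END { printf "total: %.0f J\n", total }
    ' "$1" "$2"
}

# Possible use inside a job script on the CM module (./my_app is a placeholder):
#   read_cmd='if [ "$SLURM_LOCALID" = 0 ]; then echo ${SLURM_NODEID}: $(cat /sys/devices/platform/sem/energy_j); fi'
#   srun sh -c "$read_cmd" > before.txt
#   srun ./my_app
#   srun sh -c "$read_cmd" > after.txt
#   energy_delta before.txt after.txt
```

On the ESB module the same steps would be repeated for each of the two meters (`sem.1` and `sem.2`), with the two totals added to obtain the energy of the whole node.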
+{{{#!comment jk 2020-09-25 original version
 == Energy Measurement
 To print before and after you srun commands in the SLURM script.
 …
 srun -n 1 energymeter_client -m -d 5
 }}}
+}}}