Changes between Version 5 and Version 6 of Public/Energy


Timestamp:
Sep 25, 2020, 4:39:00 PM (4 years ago)
Author:
Jochen Kreutz
Comment:

Update information on energy measurements

  • Public/Energy

    v5 v6  
     1== Energy Measurement
    12
     3For the CM and ESB modules, fine-grained energy measurement is in place using Megware's energy meters attached to the compute nodes of these modules.
     4There are different ways to get information about energy consumption for your jobs (and the nodes). Preferred methods are:
    25
     6=== SLURM sacct command
     7
     8This is probably the easiest way to get the energy consumption of your interactive and batch jobs. Once your job has finished, you can use the
     9`sacct` command to query the Slurm database about its energy consumption. For further information and an example
     10on how to use the command for energy measurements, please see [wiki:/Public/User_Guide/Batch_system#Informationonpastjobsandaccounting accounting].
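
As a minimal sketch of such a query (not taken from the linked guide), the helper below wraps `sacct` and asks for the energy fields of a finished job. The function name `job_energy` is hypothetical, and the sketch assumes that energy accounting is enabled in the Slurm configuration of the system; `ConsumedEnergy` and `ConsumedEnergyRaw` are standard `sacct` format fields.

```shell
#!/bin/sh
# Hypothetical helper: print the energy accounting fields for one job.
# Assumes energy accounting is enabled in Slurm on this system.
job_energy() {
    # $1 is the Slurm job ID of a finished job.
    sacct -j "$1" --format=JobID,JobName,Elapsed,ConsumedEnergy,ConsumedEnergyRaw
}
```

Calling `job_energy <jobid>` on a login node would then list one line per job step; `ConsumedEnergyRaw` is reported in Joules.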
     11
     12=== DCDB (coming soon)
     13
     14The "Datacenter Database" (DCDB) stores measured values from node and infrastructure sensors every 10 seconds, including the power and
     15energy consumption of the compute nodes. This allows a very fine-grained analysis of the energy consumed by your jobs, e.g. by specifying
     16precise time stamps or ranges and by providing access to data measured for different components such as CPU, GPU and memory instead of
     17only an accumulated value. It also offers a convenient way to analyse SLURM jobs.
     18
     19The DCDB can be queried from the login node using the DCDB client tools. A user guide that
     20gives more details on the database and explains how to use it for energy measurements will be attached to this page in the coming days.
     21
     22=== Use /sys files
     23
     24The energy meters provide their measured values through files in the "/sys" filesystem on the nodes.
     25To query the overall energy (in Joules) a node has consumed so far, you can use the `energy_j` file.
     26To measure the energy consumed by your commands (applications), take readings in your SLURM job script
     27before and after you `srun` them:
     28 
     29Unit=[Joules]
     30
     31**CM Module**
     32
     33{{{
     34srun sh -c 'if [ "$SLURM_LOCALID" = 0 ]; then echo "${SLURM_NODEID}: $(cat /sys/devices/platform/sem/energy_j)"; fi'
     35}}}
     36
     37**ESB Module**
     38
     39There are two energy meters present for the ESB nodes, one for the CPU blade and one for the GPU part:
     40
     41{{{
     42srun sh -c 'if [ "$SLURM_LOCALID" = 0 ]; then echo "${SLURM_NODEID}: $(cat /sys/devices/platform/sem.1/energy_j)"; fi'
     43srun sh -c 'if [ "$SLURM_LOCALID" = 0 ]; then echo "${SLURM_NODEID}: $(cat /sys/devices/platform/sem.2/energy_j)"; fi'
     44}}}
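
The before-and-after readings described above could be wrapped in a small helper for single-node runs. This is only a sketch: the names `read_energy`, `measure_energy` and `ENERGY_FILE` are hypothetical, and it assumes that the `energy_j` file holds a single integer value in Joules.

```shell
#!/bin/sh
# Sketch: measure the energy consumed on one node while a command runs.
# ENERGY_FILE defaults to the CM path; set it to a sem.1 or sem.2 path on ESB.
ENERGY_FILE="${ENERGY_FILE:-/sys/devices/platform/sem/energy_j}"

# Print the current counter value from the given file.
# Assumption: the file contains one integer (Joules).
read_energy() {
    cat "$1"
}

# Run the given command and print the difference of the two readings.
measure_energy() {
    before=$(read_energy "$ENERGY_FILE") || return 1
    "$@"
    after=$(read_energy "$ENERGY_FILE") || return 1
    echo "consumed: $((after - before)) J"
}
```

In a job script this might be used as `measure_energy srun ./my_app`, where `./my_app` stands for your own application. The reading covers the whole node, so other activity on the node during the run is included.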
     45
     46To get the energy consumed by a multi-node job you have to accumulate the values from all nodes, which is quite cumbersome.
     47For runs on a single node (perhaps even in an interactive session), however, reading the current values directly from the files
     48can be quite useful.
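
One possible way to accumulate the per-node values is to pipe the `NODEID: value` lines printed by the `srun` one-liners into a small filter. The helper name `sum_energy` is hypothetical, and the sketch assumes every line has exactly that format.

```shell
#!/bin/sh
# Sketch: add up per-node readings given on stdin as "NODEID: value" lines.
sum_energy() {
    # Split on the colon (plus optional spaces) and total the second field.
    awk -F': *' 'NF == 2 { total += $2 } END { printf "%.0f\n", total }'
}
```

For example, `printf '0: 100\n1: 250\n' | sum_energy` prints `350`. Taking such a sum before and after the measured commands and subtracting the two totals would give the energy consumed across all nodes of the job.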
     49
     50{{{#!comment jk 2020-09-25 original version
    351== Energy Measurement
    452To print before and after you srun commands in the SLURM script.
     
    2674srun -n 1 energymeter_client -m -d 5
    2775}}}
     76
     77}}}