Changes between Version 7 and Version 8 of Public/Energy


Timestamp:
Oct 28, 2020, 9:55:20 AM (4 years ago)
Author:
Alessio Netti
  • Public/Energy

    v7 v8  
    8383==== Using dcdbquery for Job Queries
    8484
    85 DCDB collects a series of aggregated metrics for each job running on the system, every 10s. These can be queried using the same {{{dcdbquery}}} syntax as above. The sensor name, however, should be constructed as follows:
     85DCDB collects a series of aggregated metrics for each job running on the system, every 30s. These can be queried using the same {{{dcdbquery}}} syntax as above. The sensor name, however, should be constructed as follows:
    8686
    8787{{{
     
    9090}}}
    9191
    92 The {{{<JOBID>}}} field identifies the job to be queried. {{{<SENSORNAME>}}} identifies instead the aggregated metric and can be one of the following:
     92The {{{<JOBID>}}} field identifies the job to be queried. It matches the job IDs reported by SLURM and follows SLURM's format for job packs and job arrays. For example, the following are valid sensor names:
     93
     94{{{
     95Job Pack Example:  /job122444+1/pkg-energy.avg
     96Job Array Example: /job122999_15/pkg-energy.avg
     97}}}
     98
     99In the job pack example above, data is queried for a specific sub-job in the pack. However, aggregated sensor data also exists for the pack as a whole, thus including measurements from multiple modules or allocations on the DEEP-EST prototype. This aggregated data can be queried using the base job ID of the pack, omitting the {{{+}}} notation. No high-level aggregation is performed across the different jobs of a job array.
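The job ID formats above can be sketched in Python as follows. The helper `job_sensor_name` and its parameters are hypothetical and not part of DCDB. The sketch assumes the `/job<JOBID>/<SENSORNAME><STATNAME>` layout seen in the examples, and that a job is either a pack sub-job or an array task, never both. The resulting string would be passed to dcdbquery like any other sensor name.

```python
def job_sensor_name(job_id, metric, pack_offset=None, array_index=None):
    """Build a job sensor name such as /job122444+1/pkg-energy.avg.

    job_id       base SLURM job ID
    metric       sensor name plus statistic suffix, e.g. "pkg-energy.avg"
    pack_offset  sub-job of a job pack (SLURM "+" notation), or None
    array_index  task of a job array (SLURM "_" notation), or None
    """
    # Assumption of this sketch: the two notations are mutually exclusive.
    if pack_offset is not None and array_index is not None:
        raise ValueError("use either pack_offset or array_index, not both")

    token = str(job_id)
    if pack_offset is not None:
        token += f"+{pack_offset}"   # one sub-job of a job pack
    elif array_index is not None:
        token += f"_{array_index}"   # one task of a job array
    # With neither argument, the base job ID addresses the job as a whole
    # (for a job pack, the aggregate over all of its sub-jobs).
    return f"/job{token}/{metric}"


print(job_sensor_name(122444, "pkg-energy.avg", pack_offset=1))   # /job122444+1/pkg-energy.avg
print(job_sensor_name(122999, "pkg-energy.avg", array_index=15))  # /job122999_15/pkg-energy.avg
print(job_sensor_name(122444, "pkg-energy.avg"))                  # /job122444/pkg-energy.avg
```

The last call omits the `+` notation and therefore names the aggregated data for the whole pack.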
     100
     101The {{{<SENSORNAME>}}} field instead identifies the aggregated metric and can be one of the following:
    93102
    94103* energy
     
    111120Finally, the {{{<STATNAME>}}} field identifies the actual type of aggregation performed from the readings of the queried sensor, by combining the data of all compute nodes on which the job was running. It can be one of the following:
    112121
    113 * .avg (average)
    114 * .med (median)
    115 * .sum (sum)
    116 
    117 In addition, for jobs that are running on multiple modules of the prototype, per-module aggregated metrics are also available. In this case, the sensor name is built as follows:
    118 
    119 {{{
    120 Prototype: /job<JOBID>/<MODULE>/<SENSORNAME><STATNAME>
    121 Example:   /job1001/esb/instructions.med
    122 }}}
    123 
    124 The {{{<MODULE>}}} field identifies the prototype module to query, and can be one of the following:
    125 
    126 * cn (Cluster Module)
    127 * esb (Extreme-Scale Booster)
    128 * dam (Data Analytics Module)
     122* .sum (sum of all 10s measurements in a certain 30s time window)
     123* .avg (average of all 10s measurements in a certain 30s time window)
     124* .med (median of all 10s measurements in a certain 30s time window)
     125
     126In order to get a cumulative measure of a job's performance (e.g., total energy spent or total amount of instructions computed), {{{.sum}}} should be used. Moreover, for a job to be measured by DCDB, its duration has to be longer than 30s.
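As an illustration of the cumulative measure described above, the following Python sketch adds up the per-window `.sum` readings of a job sensor. It assumes that the job total is the sum over all 30s windows, and that the readings have already been retrieved and parsed into `(timestamp, value)` pairs. The function name, the data layout and the sample values are hypothetical.

```python
def cumulative_total(window_sums):
    """Add up the per-window .sum readings of one job sensor.

    window_sums  iterable of (timestamp, value) pairs, one per 30s window
    """
    return sum(value for _timestamp, value in window_sums)


# Hypothetical readings for a sensor such as /job1001/energy.sum:
# one (timestamp in seconds, value) pair per 30s aggregation window.
readings = [(0, 1500), (30, 1620), (60, 1580)]

print(cumulative_total(readings))  # 4700
```

A job shorter than 30s would yield no readings, in which case the function returns 0.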
    129127
    130128==== Long-term Sub-sampled Sensor Data