Changes between Version 7 and Version 8 of Public/Energy

Timestamp:: Oct 28, 2020, 9:55:20 AM (5 years ago)
Author:: Alessio Netti
Comment:: —

Public/Energy

-                      v7
+                      v8
 ==== Using dcdbquery for Job Queries
 DCDB collects a series of aggregated metrics for each job running on the system, every 10s. These can be queried using the same {{{dcdbquery}}} syntax as above. The sensor name, however, should be constructed as follows:
+DCDB collects a series of aggregated metrics for each job running on the system, every 30s. These can be queried using the same {{{dcdbquery}}} syntax as above. The sensor name, however, should be constructed as follows:
 {{{
 …
 }}}
+The {{{<JOBID>}}} field identifies the job to be queried. {{{<SENSORNAME>}}} identifies instead the aggregated metric and can be one of the following:
+The {{{<JOBID>}}} field identifies the job to be queried. It matches the job IDs reported by SLURM and follows the SLURM format for job packs and job arrays. For example, the following are valid queries:
+{{{
+Job Pack Example:  /job122444+1/pkg-energy.avg
+Job Array Example: /job122999_15/pkg-energy.avg
+}}}
+In the job pack example above, data is queried for a specific sub-job in the pack. However, aggregated sensor data also exists for the pack as a whole, thus including measurements from multiple modules or allocations on the DEEP-EST prototype. This aggregated data can be queried using the base job ID of the pack, omitting the {{{+}}} notation. No high-level aggregation is performed across the different jobs of a job array.
+The {{{<SENSORNAME>}}} field identifies instead the aggregated metric and can be one of the following:
 * energy
 …
 Finally, the {{{<STATNAME>}}} field identifies the actual type of aggregation performed from the readings of the queried sensor, by combining the data of all compute nodes on which the job was running. It can be one of the following:
+* .avg (average)
+* .med (median)
+* .sum (sum)
+In addition, for jobs that are running on multiple modules of the prototype, per-module aggregated metrics are also available. In this case, the sensor name is built as follows:
+{{{
+Prototype: /job<JOBID>/<MODULE>/<SENSORNAME><STATNAME>
+Example:   /job1001/esb/instructions.med
+}}}
+The {{{<MODULE>}}} field identifies the prototype module to query, and can be one of the following:
+* cn (Cluster Module)
+* esb (Extreme-Scale Booster)
+* dam (Data Analytics Module)
+* .sum (sum of all 10s measurements in a certain 30s time window)
+* .avg (average of all 10s measurements in a certain 30s time window)
+* .med (median of all 10s measurements in a certain 30s time window)
+To get a cumulative measure of a job's performance (e.g., total energy spent or total number of instructions executed), {{{.sum}}} should be used. Moreover, for a job to be measured by DCDB, it must run for longer than 30s.
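The naming scheme described in this section can be sketched as a small helper that assembles the sensor name passed to {{{dcdbquery}}}. This is only an illustrative sketch: the function name `job_sensor_name` and its parameters are hypothetical and not part of DCDB. The job ID is passed through verbatim, so job pack (`122444+1`) and job array (`122999_15`) IDs are used exactly as SLURM reports them. The sensor name is not validated, because the full list of metrics is not reproduced here.

```python
# Hypothetical helper (not part of DCDB): builds job sensor names following
# the patterns /job<JOBID>/<SENSORNAME><STATNAME> and
# /job<JOBID>/<MODULE>/<SENSORNAME><STATNAME> described in this section.

VALID_STATS = {"avg", "med", "sum"}      # aggregation types listed above
VALID_MODULES = {"cn", "esb", "dam"}     # prototype modules listed above


def job_sensor_name(job_id, sensor, stat, module=None):
    """Return the sensor name for a job-level or per-module query.

    job_id : SLURM job ID as a string, e.g. "1001", "122444+1", "122999_15"
    sensor : aggregated metric name, e.g. "pkg-energy" (not validated here)
    stat   : "avg", "med" or "sum" (a leading dot is accepted)
    module : optional prototype module, one of "cn", "esb", "dam"
    """
    stat = stat.lstrip(".")
    if stat not in VALID_STATS:
        raise ValueError(f"unknown aggregation type: {stat!r}")
    if module is not None and module not in VALID_MODULES:
        raise ValueError(f"unknown module: {module!r}")

    parts = [f"job{job_id}"]
    if module is not None:
        parts.append(module)
    parts.append(f"{sensor}.{stat}")
    return "/" + "/".join(parts)


if __name__ == "__main__":
    # Sub-job of a job pack, job array element, and per-module query.
    print(job_sensor_name("122444+1", "pkg-energy", "avg"))
    print(job_sensor_name("122999_15", "pkg-energy", "avg"))
    print(job_sensor_name("1001", "instructions", "med", module="esb"))
```

The resulting string would be used as the sensor argument of {{{dcdbquery}}}, with the same syntax as for ordinary sensors. For a cumulative figure such as total energy, `stat="sum"` is the appropriate choice.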
 ==== Long-term Sub-sampled Sensor Data