92 | | The {{{<JOBID>}}} field identifies the job to be queried. {{{<SENSORNAME>}}} identifies instead the aggregated metric and can be one of the following: |
| 92 | The {{{<JOBID>}}} field identifies the job to be queried. It matches the job IDs reported by SLURM and follows SLURM's format for job packs and job arrays. For example, the following are valid queries: |
| 93 | |
| 94 | {{{ |
| 95 | Job Pack Example: /job122444+1/pkg-energy.avg |
| 96 | Job Array Example: /job122999_15/pkg-energy.avg |
| 97 | }}} |
| 98 | |
| 99 | In the job pack example above, data is queried for a specific sub-job in the pack. However, aggregated sensor data also exists for the pack as a whole, thus including measurements from multiple modules or allocations on the DEEP-EST prototype. This aggregated data can be queried using the base job ID of the pack, omitting the {{{+}}} notation. No high-level aggregation is performed across the different jobs of a job array. |
| 100 | |
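The naming scheme in the examples above can be sketched in a few lines of Python. This is only an illustration: the helper names `job_sensor` and `pack_base_id` are hypothetical, and the accepted job ID pattern is inferred solely from the two examples shown (a numeric ID, optionally followed by `+<N>` for a job pack or `_<N>` for a job array), not from SLURM's or DCDB's full specification.

```python
import re

# Assumption: job IDs look like "122444", "122444+1" (job pack) or
# "122999_15" (job array), as in the examples above.
JOB_ID = re.compile(r"^\d+(?:\+\d+|_\d+)?$")


def job_sensor(job_id: str, sensor: str, stat: str) -> str:
    """Build a per-job sensor name such as /job122444+1/pkg-energy.avg."""
    if JOB_ID.match(job_id) is None:
        raise ValueError(f"unexpected job ID format: {job_id!r}")
    return f"/job{job_id}/{sensor}{stat}"


def pack_base_id(job_id: str) -> str:
    """Drop the +<N> suffix to address the job pack as a whole."""
    return job_id.split("+", 1)[0]


# One sub-job of a pack, then the whole pack via its base job ID.
sub_job = job_sensor("122444+1", "pkg-energy", ".avg")
whole_pack = job_sensor(pack_base_id("122444+1"), "pkg-energy", ".avg")
array_job = job_sensor("122999_15", "pkg-energy", ".avg")
```

Because no aggregation is performed across the jobs of a job array, the sketch offers no equivalent of `pack_base_id` for the `_<N>` form.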
| 101 | The {{{<SENSORNAME>}}} field, in turn, identifies the aggregated metric and can be one of the following: |
113 | | * .avg (average) |
114 | | * .med (median) |
115 | | * .sum (sum) |
116 | | |
117 | | In addition, for jobs that are running on multiple modules of the prototype, per-module aggregated metrics are also available. In this case, the sensor name is built as follows: |
118 | | |
119 | | {{{ |
120 | | Prototype: /job<JOBID>/<MODULE>/<SENSORNAME><STATNAME> |
121 | | Example: /job1001/esb/instructions.med |
122 | | }}} |
123 | | |
124 | | The {{{<MODULE>}}} field identifies the prototype module to query, and can be one of the following: |
125 | | |
126 | | * cn (Cluster Module) |
127 | | * esb (Extreme-Scale Booster) |
128 | | * dam (Data Analytics Module) |
| 122 | * .sum (sum of all 10s measurements in a certain 30s time window) |
| 123 | * .avg (average of all 10s measurements in a certain 30s time window) |
| 124 | * .med (median of all 10s measurements in a certain 30s time window) |
| 125 | |
| 126 | To obtain a cumulative measure of a job's performance (e.g., total energy spent or total number of instructions executed), {{{.sum}}} should be used. Moreover, a job must run for longer than 30s to be measured by DCDB. |
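The relation between the three statistics can be illustrated with a short Python sketch. The readings below are made-up values, and the assumption that a job-wide total is obtained by adding up the `.sum` values of successive 30s windows is ours rather than something stated by DCDB; `job_total` is a hypothetical helper.

```python
from statistics import mean, median

# Made-up 10s measurements falling into one 30s window (three readings).
window = [100.0, 120.0, 110.0]

# The three per-window statistics described above.
stats = {
    ".sum": sum(window),
    ".avg": mean(window),
    ".med": median(window),
}


def job_total(window_sums):
    """Assumed cumulative measure: add the .sum value of every window."""
    return sum(window_sums)


# Two consecutive windows of a job; the second .sum value is also made up.
total = job_total([stats[".sum"], 300.0])
```

With a single 30s window holding only three readings, `.avg` and `.med` frequently coincide, which is why `.sum` is the statistic to use for cumulative quantities such as energy.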