
== Reporting of Statistical Information ==

The recently installed !ParaStation MPI version 5.4.7-1 can collect statistical information and print a report on the number of messages and the distribution of their lengths at the end of an MPI run.
This new feature is currently enabled on DEEP-EST in the psmpi installation of the `Devel-2019a` stage:
{{{
> module use $OTHERSTAGES
> module load Stages/Devel-2019a
> module load GCC/8.3.0
> module load ParaStationMPI/5.4.7-1
}}}

To activate this feature for an MPI run, set the environment variable `PSP_HISTOGRAM=1`:
{{{
> PSP_HISTOGRAM=1 srun --gw_num=1 -A deep --partition=dp-cn -N2 -n2 ./IMB-MPI1 Bcast -npmin 4 : --partition=dp-dam-ext -N2 -n2 ./IMB-MPI1 Bcast -npmin 4

srun: psgw: requesting 1 gateway nodes
srun: job 101384 queued and waiting for resources
srun: job 101384 has been allocated resources
#------------------------------------------------------------
# Intel(R) MPI Benchmarks 2019 Update 5, MPI-1 part
#------------------------------------------------------------

...

#----------------------------------------------------------------
# Benchmarking Bcast
# #processes = 4
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.03         0.04         0.04
            1         1000         0.81         6.70         2.81
            2         1000         0.86         6.79         2.69
            4         1000         0.84         6.79         2.69
            8         1000         0.86         6.80         2.72
           16         1000         0.85         6.76         2.68
           32         1000         0.87         6.88         2.67
           64         1000         0.95         7.43         3.38
          128         1000         0.98         7.02         3.18
          256         1000         0.91         8.11         3.68
          512         1000         0.91        10.46         4.80
         1024         1000         1.01        11.13         5.59
         2048         1000         1.07        11.91         6.12
         4096         1000         1.35        12.77         6.78
         8192         1000         1.77        14.81         8.23
        16384         1000         3.24        18.66        11.19
        32768         1000         4.93        25.96        16.14
        65536          640        30.06        38.71        34.03
       131072          320        44.85        60.80        52.53
       262144          160        66.28       100.63        83.20
       524288           80       109.16       180.59       144.57
      1048576           40       199.61       343.00       271.12
      2097152           20       377.66       666.27       521.72
      4194304           10       736.83      1314.28      1025.35


# All processes entering MPI_Finalize

     bin  freq
      64  353913
     128  6303
     256  6303
     512  6303
    1024  6311
    2048  6303
    4096  6303
    8192  6303
   16384  6303
   32768  6303
   65536  4035
  131072  2019
  262144  1011
  524288  507
 1048576  255
 2097152  129
 4194304  66
 8388608  0
16777216  0
33554432  0
67108864  0
}}}

As can be seen, the messages exchanged between all processes of the run are sorted into ''bins'' according to their length.
The number of bins as well as their limits can be adjusted via the following environment variables (a short sketch after the list illustrates how they determine the bin labels):

 * `PSP_HISTOGRAM_MIN` (default: 64 bytes): lower limit of the message sizes covered by the histogram, i.e. the label of its first bin.
 * `PSP_HISTOGRAM_MAX` (default: 64 MByte): upper limit of the message sizes covered by the histogram, i.e. the label of its last bin.
 * `PSP_HISTOGRAM_SHIFT` (default: 1 bit position): bit shift defining the step width between bins and hence the number of bins.

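Judging from the example outputs on this page, the bin labels grow from `PSP_HISTOGRAM_MIN` to `PSP_HISTOGRAM_MAX` by factors of `2^PSP_HISTOGRAM_SHIFT`. The following small shell sketch is only an illustration of this observed behavior (it is not part of psmpi) and prints the bin labels resulting from a given setting:
{{{
# Illustration only: list the bin labels of the histogram, assuming the bins
# grow by a factor of 2^PSP_HISTOGRAM_SHIFT from MIN to MAX (as observed in
# the example outputs on this page).
MIN=${PSP_HISTOGRAM_MIN:-64}         # default: 64 bytes
MAX=${PSP_HISTOGRAM_MAX:-67108864}   # default: 64 MByte
SHIFT=${PSP_HISTOGRAM_SHIFT:-1}      # default: 1 bit position

bin=$MIN
while [ "$bin" -le "$MAX" ]; do
    echo "$bin"
    bin=$(( bin << SHIFT ))
done
}}}
With the defaults this yields the 21 bin labels from 64 up to 67108864 seen in the histogram above; with `PSP_HISTOGRAM_SHIFT=2` and `PSP_HISTOGRAM_MAX=4096` it yields the four bins 64, 256, 1024 and 4096 of the example below.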
Example:
{{{
> PSP_HISTOGRAM=1 PSP_HISTOGRAM_SHIFT=2 PSP_HISTOGRAM_MAX=4096 srun --gw_num=1 -A deep --partition=dp-cn -N2 -n2 ./IMB-MPI1 Barrier -npmin 4 : --partition=dp-dam-ext -N2 -n2 ./IMB-MPI1 Barrier -npmin 4

...

#---------------------------------------------------
# Benchmarking Barrier
# #processes = 4
#---------------------------------------------------
 #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
         1000         5.02         5.02         5.02


# All processes entering MPI_Finalize

 bin  freq
  64  16942
 256  0
1024  8
4096  0
}}}

In this example, 16942 messages had an MPI payload of 64 bytes or less, while 8 messages were larger than 256 bytes but not larger than 1024 bytes.

Please note that messages larger than `PSP_HISTOGRAM_MAX` are counted as well and always fall into the last bin.
Therefore, in this example, no message of the whole run was larger than 1024 bytes: the last bin, which is labeled 4096 but collects all messages larger than 1024 bytes, is empty.
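To make this clamping behavior explicit, here is a similar shell sketch (again only an illustration under the assumptions stated above, not a psmpi utility) that determines the bin in which a message of a given payload size would be counted:
{{{
# Illustration only: find the bin for a message of $size bytes, assuming a
# message is counted in the smallest bin whose label is >= its size and that
# anything larger than PSP_HISTOGRAM_MAX is clamped into the last bin.
size=5000                  # made-up example payload size in bytes
MIN=64; MAX=4096; SHIFT=2  # settings of the example above

bin=$MIN
while [ "$bin" -lt "$MAX" ] && [ "$size" -gt "$bin" ]; do
    bin=$(( bin << SHIFT ))
done
echo "a ${size}-byte message is counted in bin $bin"   # here: bin 4096, the last bin
}}}
With `size=300` the same loop stops at bin 1024, matching the 8 messages above that were larger than 256 bytes but not larger than 1024 bytes.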