Changes between Version 24 and Version 25 of Public/ParaStationMPI


Timestamp: Aug 11, 2020, 12:00:23 PM
Author: Carsten Clauß
  • Public/ParaStationMPI

    v24 v25  
    1616}}}
    1717The optimal chunk size depends strongly on the communication pattern and therefore has to be chosen individually for each application.
     18
     19== Reporting of Statistical Information ==
     20
     21The recently installed !ParaStation MPI version 5.4.7-1 can collect statistics during an MPI run and, at the end of the run, print a report on the number of messages and their distribution over message length.
     22This new feature is currently enabled on DEEP-EST for the psmpi installation in the `Devel-2019a` stage:
     23{{{
     24> module use $OTHERSTAGES
     25> module load Stages/Devel-2019a
     26> module load GCC/8.3.0
     27> module load ParaStationMPI/5.4.7-1
     28}}}
     29
     30To activate this feature for an MPI run, set the environment variable `PSP_HISTOGRAM=1`:
     31{{{
     32> PSP_HISTOGRAM=1 srun --gw_num=1 -A deep --partition=dp-cn -N2 -n2 ./IMB-MPI1 Bcast -npmin 4 : --partition=dp-dam-ext -N2 -n2 ./IMB-MPI1 Bcast -npmin 4
     33
     34srun: psgw: requesting 1 gateway nodes
     35srun: job 101384 queued and waiting for resources
     36srun: job 101384 has been allocated resources
     37#------------------------------------------------------------
     38#    Intel(R) MPI Benchmarks 2019 Update 5, MPI-1 part
     39#------------------------------------------------------------
     40
     41...
     42
     43#----------------------------------------------------------------
     44# Benchmarking Bcast
     45# #processes = 4
     46#----------------------------------------------------------------
     47       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
     48            0         1000         0.03         0.04         0.04
     49            1         1000         0.81         6.70         2.81
     50            2         1000         0.86         6.79         2.69
     51            4         1000         0.84         6.79         2.69
     52            8         1000         0.86         6.80         2.72
     53           16         1000         0.85         6.76         2.68
     54           32         1000         0.87         6.88         2.67
     55           64         1000         0.95         7.43         3.38
     56          128         1000         0.98         7.02         3.18
     57          256         1000         0.91         8.11         3.68
     58          512         1000         0.91        10.46         4.80
     59         1024         1000         1.01        11.13         5.59
     60         2048         1000         1.07        11.91         6.12
     61         4096         1000         1.35        12.77         6.78
     62         8192         1000         1.77        14.81         8.23
     63        16384         1000         3.24        18.66        11.19
     64        32768         1000         4.93        25.96        16.14
     65        65536          640        30.06        38.71        34.03
     66       131072          320        44.85        60.80        52.53
     67       262144          160        66.28       100.63        83.20
     68       524288           80       109.16       180.59       144.57
     69      1048576           40       199.61       343.00       271.12
     70      2097152           20       377.66       666.27       521.72
     71      4194304           10       736.83      1314.28      1025.35
     72
     73
     74# All processes entering MPI_Finalize
     75
     76     bin  freq
     77      64  353913
     78     128  6303
     79     256  6303
     80     512  6303
     81    1024  6311
     82    2048  6303
     83    4096  6303
     84    8192  6303
     85   16384  6303
     86   32768  6303
     87   65536  4035
     88  131072  2019
     89  262144  1011
     90  524288  507
     91 1048576  255
     92 2097152  129
     93 4194304  66
     94 8388608  0
     9516777216  0
     9633554432  0
     9767108864  0
     98}}}
     99
     100As the output shows, the messages exchanged between all processes of the run are sorted into ''bins'' according to their length.
     101The number of bins and their limits can be adjusted with the following variables:
     102
     103 * `PSP_HISTOGRAM_MIN` (default: 64 bytes) Lower limit of the message size range covered by the histogram (the limit of the first bin).
     104 * `PSP_HISTOGRAM_MAX` (default: 64 MByte) Upper limit of the message size range covered by the histogram (the limit of the last bin).
     105 * `PSP_HISTOGRAM_SHIFT` (default: 1 bit position) Bit shift between the limits of consecutive bins, i.e. the step width; a shift of 1 doubles the limit from one bin to the next.
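
The interplay of the three variables can be illustrated with a small Python sketch. It is not taken from the psmpi sources: it assumes that each bin limit is the previous limit shifted left by `PSP_HISTOGRAM_SHIFT` bits, starting at `PSP_HISTOGRAM_MIN` and stopping at `PSP_HISTOGRAM_MAX`, which is consistent with the bin labels in the reports on this page. The function name `histogram_bin_limits` is hypothetical, and how psmpi behaves when the maximum is not reached exactly by shifting is not covered here.

```python
def histogram_bin_limits(min_bytes=64, max_bytes=64 * 1024 * 1024, shift=1):
    """Return the assumed bin limits of the message size histogram.

    Assumption (not confirmed by the psmpi sources): each limit is the
    previous one shifted left by `shift` bits, from min_bytes up to max_bytes.
    """
    if min_bytes <= 0 or shift <= 0 or max_bytes < min_bytes:
        raise ValueError("need min_bytes > 0, shift > 0 and max_bytes >= min_bytes")
    limits = []
    limit = min_bytes
    while limit <= max_bytes:
        limits.append(limit)
        limit <<= shift  # shift=1 doubles the limit, shift=2 quadruples it
    return limits


# Defaults: 64 bytes up to 64 MByte in steps of one bit position
default_limits = histogram_bin_limits()
print(len(default_limits), default_limits[0], default_limits[-1])

# Settings corresponding to PSP_HISTOGRAM_SHIFT=2 and PSP_HISTOGRAM_MAX=4096
print(histogram_bin_limits(shift=2, max_bytes=4096))
```

With the default settings this yields 21 limits from 64 to 67108864; with a shift of 2 and a maximum of 4096 it yields the four limits 64, 256, 1024 and 4096.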
     106
     107Example:
     108{{{
     109> PSP_HISTOGRAM=1 PSP_HISTOGRAM_SHIFT=2 PSP_HISTOGRAM_MAX=4096 srun --gw_num=1 -A deep --partition=dp-cn -N2 -n2 ./IMB-MPI1 Barrier -npmin 4 : --partition=dp-dam-ext -N2 -n2 ./IMB-MPI1 Barrier -npmin 4
     110
     111...
     112
     113#---------------------------------------------------
     114# Benchmarking Barrier
     115# #processes = 4
     116#---------------------------------------------------
     117 #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
     118         1000         5.02         5.02         5.02
     119
     120
     121# All processes entering MPI_Finalize
     122
     123 bin  freq
     124  64  16942
     125 256  0
     1261024  8
     1274096  0
     128}}}
     129
     130In this example, 16942 messages carried at most 64 bytes of MPI payload, while 8 messages were larger than 256 bytes but no larger than 1024 bytes.
     131
     132Note that messages larger than `PSP_HISTOGRAM_MAX` are also counted and always fall into the last bin.
     133Therefore, no message in this run was larger than 1024 bytes: the last bin, labeled 4096, collects all messages larger than 1024 bytes and is empty.
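
The binning rule described above can be sketched in Python as follows. This is only an illustration of the rule as stated on this page, not the psmpi implementation: a message is counted in the first bin whose limit is greater than or equal to its size, and anything beyond the last limit is counted in the last bin. The helper names `bin_index` and `histogram` and the sample message sizes are hypothetical.

```python
from bisect import bisect_left


def bin_index(size, limits):
    """Index of the first bin whose limit is >= size.

    Sizes beyond the last limit are clamped into the last bin, mirroring
    the rule that messages larger than the maximum are still counted.
    """
    return min(bisect_left(limits, size), len(limits) - 1)


def histogram(sizes, limits):
    """Count message sizes per bin and return a {limit: frequency} mapping."""
    freq = [0] * len(limits)
    for size in sizes:
        freq[bin_index(size, limits)] += 1
    return dict(zip(limits, freq))


# Bin limits as labeled in the Barrier example; the sizes are made up
limits = [64, 256, 1024, 4096]
sizes = [0, 64, 65, 256, 257, 1024, 1025, 4096, 10000]
print(histogram(sizes, limits))
```

In this made-up sample, the 10000-byte message exceeds the largest limit and is therefore counted in the bin labeled 4096, together with the 1025-byte and 4096-byte messages.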
    18134
    19135