Changes between Version 25 and Version 26 of Public/ParaStationMPI
Timestamp: Aug 11, 2020, 12:21:55 PM
Public/ParaStationMPI
 v25  v26
  17   17    The optimal chunk size is highly dependent on the communication pattern and therefore has to be chosen for each application individually.
  18   18
  19   19
       20  + == Reporting of Statistical Information ==
  20   21
  21       - The recently installed !ParaStation MPI version 5.4.7-1 offers the possibility to collect statistical information and to print a respective report on the number of messages and the distribution over their length at the end of an MPI run.
       22  + The recently installed **!ParaStation MPI version 5.4.7-1** offers the possibility to collect statistical information and to print a respective report on the number of messages and the distribution over their length at the end of an MPI run. (The so-called psmpi **histogram** feature.)
  22   23    This new feature is currently enabled on DEEP-EST for the psmpi installation in the `Devel-2019a` stage:
  23   24    {{{
   …    …
  38   39    # Intel(R) MPI Benchmarks 2019 Update 5, MPI-1 part
  39   40    #------------------------------------------------------------
  40       -
  41   41    ...
  42   42
   …    …
  99   99
 100  100    As one can see, the messages being exchanged between all processes of the run are sorted into ''bins'' according to their message lengths.
 101       - The number of bins as well as their limits can be adjusted by the following variables:
      101  + The number of bins as well as their limits can be adjusted by the following environment variables:
 102  102
 103  103    * `PSP_HISTOGRAM_MIN` (default: 64 bytes) Set the lower limit regarding the message size for controlling the number of bins of the histogram.
   …    …
 108  108    {{{
 109  109    > PSP_HISTOGRAM=1 PSP_HISTOGRAM_SHIFT=2 PSP_HISTOGRAM_MAX=4096 srun --gw_num=1 -A deep --partition=dp-cn -N2 -n2 ./IMB-MPI1 Barrier -npmin 4 : --partition=dp-dam-ext -N2 -n2 ./IMB-MPI1 Barrier -npmin 4
 110       -
 111  110    ...
 112  111
   …    …
 130  129    In this example, 16942 messages were smaller than or equal to 64 Byte of MPI payload, while 8 messages were greater than 256 Byte but smaller than or equal to 1024 Byte.
 131  130
 132       - Please note at this point that all messages larger than `PSP_HISTOGRAM_MAX` are as well counted and always fall into the last bin.
      131  + Please note at this point that all messages larger than `PSP_HISTOGRAM_MAX` are ''as well counted'' and always fall into the ''last bin''.
 133  132    Therefore, in this example, no message of the whole run was larger than 1024 Byte, because the last bin, labeled with 4096 but collecting all messages larger than 1024, is empty.
      133  +
      134  +
      135  + === Filtering by Connection Type ===
      136  +
      137  + An addition that could make this feature quite useful for statistical analysis in the DEEP-EST project is the fact that the message counters can be filtered by connection types by setting the `PSP_HISTOGRAM_CONTYPE` variable.
      138  + For example, in the following run, only messages that cross the Gateway are recorded:
      139  +
      140  + {{{
      141  + > PSP_HISTOGRAM_CONTYPE=gw PSP_HISTOGRAM=1 PSP_HISTOGRAM_SHIFT=2 PSP_HISTOGRAM_MAX=4096 srun --gw_num=1 -A deep --partition=dp-cn -N2 -n2 ./IMB-MPI1 Barrier -npmin 4 : --partition=dp-dam-ext -N2 -n2 ./IMB-MPI1 Barrier -npmin 4
      142  + ...
      143  +
      144  + #---------------------------------------------------
      145  + # Benchmarking Barrier
      146  + # #processes = 4
      147  + #---------------------------------------------------
      148  + #repetitions t_min[usec] t_max[usec] t_avg[usec]
      149  + 1000 4.96 4.96 4.96
      150  +
      151  +
      152  + # All processes entering MPI_Finalize
      153  +
      154  + bin freq (gw)
      155  + 64 12694
      156  + 256 0
      157  + 1024 4
      158  + 4096 0
      159  + }}}
      160  +
      161  + Connection types for `PSP_HISTOGRAM_CONTYPE` that might be relevant for DEEP-EST are:
      162  + * `gw` for messages routed via a Gateway
      163  + * `openib` for !InfiniBand communication via the pscom4openib plugin
      164  + * `velo` for Extoll communication via the pscom4velo plugin
      165  + * `shm` for node-local communication via shared-memory.
      166  +
      167  + === A note on performance impacts ===
      168  + The collection of statistical data generates a small overhead, which may be reflected in the message latencies in particular.
      169  + It is therefore recommended to set `PSP_HISTOGRAM=0` for performance benchmarking -- or even better to use another psmpi version and/or installation where this feature is already disabled at compile time.
 134  170
 135  171
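
The binning described in this revision can be illustrated with a small shell sketch that derives the bin limits from the three settings and looks up the bin for a given message size. It assumes that each limit is the previous one shifted left by `PSP_HISTOGRAM_SHIFT` bits and that the last bin is always labeled with `PSP_HISTOGRAM_MAX`; this matches the 64/256/1024/4096 example in the page but is not confirmed by it. The function names `bin_limits` and `bin_for` are hypothetical helpers and not part of psmpi.

```shell
# Hypothetical helpers (not part of psmpi) that mimic the histogram binning.
# ASSUMPTION: limits grow by a left shift of SHIFT bits, starting at MIN,
# and the last bin is labeled MAX and also collects all larger messages.

# bin_limits MIN MAX SHIFT: print the upper limits of all bins on one line.
bin_limits() {
    local min=$1 max=$2 shift_bits=$3
    local limit=$min out=""
    if [ "$shift_bits" -lt 1 ]; then
        echo "shift must be >= 1" >&2
        return 1
    fi
    while [ "$limit" -lt "$max" ]; do
        out="$out$limit "
        limit=$(( limit << shift_bits ))
    done
    echo "$out$max"
}

# bin_for SIZE MIN MAX SHIFT: print the label of the bin that a message
# with SIZE bytes of payload would be counted in.
bin_for() {
    local size=$1 limit
    for limit in $(bin_limits "$2" "$3" "$4"); do
        if [ "$size" -le "$limit" ]; then
            echo "$limit"
            return 0
        fi
    done
    # Larger than MAX: still counted, in the last bin.
    echo "$3"
}
```

With the settings of the example run, `bin_limits 64 4096 2` prints `64 256 1024 4096`, and `bin_for 300 64 4096 2` prints `1024`, in line with the statement that messages greater than 256 Byte but at most 1024 Byte are counted in the 1024 bin.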