Changes between Version 11 and Version 12 of Public/User_Guide/TAMPI
- Timestamp: Nov 23, 2020, 9:56:17 AM
Public/User_Guide/TAMPI
v11  v12
58   58     TAMPI has already been installed on DEEP and can be used by simply executing the following commands:
59   59
60        - `modulepath="/usr/local/software/skylake/Stages/2018b/modules/all/Core:$modulepath"`
61        -
62        - `modulepath="/usr/local/software/skylake/Stages/2018b/modules/all/Compiler/mpi/intel/2019.0.117-GCC-7.3.0:$modulepath"`
63        -
64        - `modulepath="/usr/local/software/skylake/Stages/2018b/modules/all/MPI/intel/2019.0.117-GCC-7.3.0/psmpi/5.2.1-1-mt:$modulepath"`
65        -
66        - `export MODULEPATH="$modulepath:$MODULEPATH"`
     60   + `module load Intel/2019.5.281-GCC-8.3.0 ParaStationMPI/5.4.6-1-mt`
     61   +
     62   + `module load OmpSs-2`
67   63
68   64     `module load TAMPI`
69        -
70        - Note that loading the TAMPI module will automatically load the **!OmpSs-2** and **Parastation MPI** modules (this MPI library has been compiled with multi-threading support enabled).
71   65
72   66     You might want to request more MPI ranks per socket depending on your particular application. See the examples below together with the corresponding system affinity report:
…    …
250  244    can be seen with the `-h` option. An example of execution could be:
251  245
252       - ` mpiexec -n 4 -bind-to hwthread:16 ./nbody-t 100 -p 8192`
253       -
254       - in which the application will perform 100 timesteps in 4 MPI processes with 16 hardware threads in each process (used by the !OmpSs-2 runtime). The total number of particles will be 8192 so that each process will have 2048 particles (2 blocks per process).
     246  + `srun -n 4 07.nbody_mpi_ompss_tasks_interop_async.N2.2048bs.bin -t 100 -p 8192`
     247  +
     248  + in which the application will perform 100 timesteps using 4 MPI processes. The total number of particles will be 8192 so that each process will have 2048 particles (2 blocks per process).
255  249
256  250    == References ==
…    …
284  278    could be:
285  279
286       - `mpiexec -n 4 -bind-to hwthread:16 ./heat -t 150 -s 8192`
287       -
288       - in which the application will perform 150 timesteps in 4 MPI processes with 16
289       - hardware threads in each process (used by the !OmpSs-2 runtime). The size of the
290       - matrix in each dimension will be 8192 (8192^2^ elements in total), this means
291       - that each process will have 2048x8192 elements (16 blocks per process).
     280  + `srun -n 4 05.heat_mpi_ompss_tasks.1024x1024bs.bin -t 150 -s 8192`
     281  +
     282  + in which the application will perform 150 timesteps using 4 MPI processes. The size of the matrix in each dimension will be 1024 (1024^2^ elements in total), which means
     283  + that each process will have 256x1024 elements (4 blocks per process).
292  284
293  285    == References ==
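
The per-process figures quoted in the run descriptions follow from dividing the problem size evenly among the MPI ranks. A minimal shell sketch of that arithmetic, assuming an even split across ranks (the `per_rank` helper is hypothetical and is not part of TAMPI or the example codes):

```shell
#!/bin/sh
# Hypothetical helper (not part of TAMPI): prints how many items each
# MPI rank owns when TOTAL items are split evenly across RANKS ranks.
per_rank() {
    total=$1
    ranks=$2
    # Refuse uneven splits; the guide's examples all divide evenly.
    if [ $((total % ranks)) -ne 0 ]; then
        echo "error: $total is not divisible by $ranks" >&2
        return 1
    fi
    echo $((total / ranks))
}

# N-body run: 8192 particles over 4 ranks (srun -n 4 ... -p 8192)
per_rank 8192 4

# Heat run: matrix rows per rank for the dimension of 1024 stated in the v12 text
per_rank 1024 4
```

With these inputs the sketch prints 2048 (particles per rank) and 256 (matrix rows per rank, i.e. 256x1024 elements per process).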