Changes between Version 36 and Version 37 of Public/User_Guide/OmpSs-2
Timestamp: Jun 14, 2019, 3:47:45 PM
 * [#QuickSetuponDEEPSystemforaPureOmpSs-2Application Quick Setup on DEEP System for a Pure OmpSs-2 Application]
 * [#UsingtheRepositories Using the Repositories]
Removed (v36):
 * [#multisaxpybenchmarkOmpSs-2 multisaxpy benchmark (OmpSs-2)]
 * [#dot-productbenchmarkOmpSs-2 dot-product benchmark (OmpSs-2)]
 * [#mergesortbenchmarkOmpSs-2 mergesort benchmark (OmpSs-2)]
 * [#nqueensbenchmarkOmpSs-2 nqueens benchmark (OmpSs-2)]
 * [#matmulbenchmarkOmpSs-2 matmul benchmark (OmpSs-2)]
 * [#CholeskybenchmarkOmpSs-2MKL Cholesky benchmark (OmpSs-2+MKL)]
 * [#nbodybenchmarkMPI+OmpSs-2TAMPI nbody benchmark (MPI+OmpSs-2+TAMPI)]
 * [#heatbenchmarkMPI+OmpSs-2TAMPI heat benchmark (MPI+OmpSs-2+TAMPI)]
Added (v37):
 * Examples:
   * [#AStep-By-StepDetailedGuidetoExecutetheMultisaxpyBenchmark A Step-By-Step Detailed Guide to Execute the Multisaxpy Benchmark (OmpSs-2)]
   * [#Dot-productBenchmarkOmpSs-2 Dot-product Benchmark (OmpSs-2)]
   * [#MergesortBenchmarkOmpSs-2 Mergesort Benchmark (OmpSs-2)]
   * [#NqueensBenchmarkOmpSs-2 Nqueens Benchmark (OmpSs-2)]
   * [#MatmulBenchmarkOmpSs-2 Matmul Benchmark (OmpSs-2)]
   * [#CholeskyBenchmarkOmpSs-2MKL Cholesky Benchmark (OmpSs-2+MKL)]
   * [#NbodyBenchmarkMPI+OmpSs-2TAMPI Nbody Benchmark (MPI+OmpSs-2+TAMPI)]
   * [#HeatBenchmarkMPI+OmpSs-2TAMPI Heat Benchmark (MPI+OmpSs-2+TAMPI)]

----
…
All the examples shown here are publicly available at [https://pm.bsc.es/gitlab/ompss-2/examples]. Users must clone/download each example's repository and then transfer it to a DEEP working directory.

`== System configuration ==` → `== System Configuration ==`

Please refer to section [#QuickSetuponDEEPSystem Quick Setup on DEEP System] to set up a working version of !OmpSs-2 on DEEP. We also recommend running !OmpSs-2 applications in an interactive session on a cluster module (CM) node.
`== Building and running the examples ==` → `== Building and Running the Examples ==`

All the examples come with a preconfigured Makefile to build (`make`) and run (`make run`) them. You can clean the directory with `make clean`.

`== Controlling available threads ==` → `== Controlling the Available Threads ==`

To limit or constrain the threads available to an application, launch it with a given CPU affinity using the Linux **taskset** tool. To use taskset, precede the application's binary with `taskset` followed by a list of CPU IDs specifying the desired affinity:
…
The example above will run **application** with 4 cores: 0, 2, 3, 4.

`== Dependency graphs ==` → `== Creating Dependency Graphs ==`

Nanos6 can extract a graphical representation of an application's data dependencies. To generate this graph, run the application with the **NANOS6** environment variable set to **graph**:
…
The result is a PDF file with several pages, each representing the graph at a certain point in time. For best results, we suggest displaying the PDF in **single page** view, showing one full page at a time, and advancing page by page.

`== Obtaining statistics ==` → `== Obtaining Statistics ==`

Nanos6 can also collect statistics about an execution. To do so, run the application as:
…
----

`= multisaxpy benchmark (!OmpSs-2) =` → `= Multisaxpy Benchmark (!OmpSs-2) =`

Users must clone/download this example's repository from [https://pm.bsc.es/gitlab/ompss-2/examples/multisaxpy] and transfer it to a DEEP working directory.
…
There are **7 implementations** of this benchmark.
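The SAXPY operation this benchmark repeats computes `y = a * x + y`. The following is a minimal sketch, not one of the repository's 7 implementations: it assumes SIZE is a multiple of BLOCK_SIZE, and the names `saxpy_block` and `multisaxpy` and the scalar `a` are hypothetical. Each block becomes an !OmpSs-2 task with an `in` dependency on its slice of `x` and an `inout` dependency on its slice of `y`, so blocks within one sweep can run concurrently while successive sweeps over the same block stay ordered. A compiler without !OmpSs-2 support ignores the `oss` pragmas and runs the loops sequentially.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: one SAXPY on a single block, y[i] = a * x[i] + y[i]. */
static void saxpy_block(size_t bs, float a, const float *x, float *y)
{
    for (size_t i = 0; i < bs; ++i)
        y[i] = a * x[i] + y[i];
}

/* Hypothetical driver: runs `iterations` SAXPY sweeps over vectors of
 * length n, split into blocks of bs elements (n must be a multiple of bs). */
void multisaxpy(size_t n, size_t bs, int iterations,
                float a, const float *x, float *y)
{
    assert(bs > 0 && n % bs == 0);
    for (int it = 0; it < iterations; ++it) {
        for (size_t i = 0; i < n; i += bs) {
            /* One task per block: reads x[i..i+bs), updates y[i..i+bs). */
            #pragma oss task in(x[i;bs]) inout(y[i;bs])
            saxpy_block(bs, a, &x[i], &y[i]);
        }
    }
    /* Wait for all block tasks before the caller reads y. */
    #pragma oss taskwait
}
```

For instance, with SIZE = 8, BLOCK_SIZE = 2, ITERATIONS = 3, `a = 2` and every `x[i]` equal to 1, each `y[i]` that starts at 0 ends up as 6.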
`== Execution instructions ==` → `== Execution Instructions ==`

`./multisaxpy SIZE BLOCK_SIZE ITERATIONS`
…
 * `ITERATIONS` is the number of times the SAXPY operation is executed.

`== Example output ==` → `== Example Output ==`

{{{
…

`= dot-product benchmark (!OmpSs-2) =` → `= Dot-product Benchmark (!OmpSs-2) =`

Users must clone/download this example's repository from [https://pm.bsc.es/gitlab/ompss-2/examples/dot-product] and transfer it to a DEEP working directory.
…
There are **3 implementations** of this benchmark.

`== Execution instructions ==` → `== Execution Instructions ==`

`./dot_product SIZE CHUNK_SIZE`
…

`= mergesort benchmark (!OmpSs-2) =` → `= Mergesort Benchmark (!OmpSs-2) =`

Users must clone/download this example's repository from [https://pm.bsc.es/gitlab/ompss-2/examples/mergesort] and transfer it to a DEEP working directory.
…
There are **6 implementations** of this benchmark.

`== Execution instructions ==` → `== Execution Instructions ==`

`./mergesort N BLOCK_SIZE`
…

`= nqueens benchmark (!OmpSs-2) =` → `= Nqueens Benchmark (!OmpSs-2) =`

Users must clone/download this example's repository from [https://pm.bsc.es/gitlab/ompss-2/examples/nqueens] and transfer it to a DEEP working directory.
…
There are **7 implementations** of this benchmark.

`== Execution instructions ==` → `== Execution Instructions ==`

`./n-queens N [threshold]`
…

`= matmul benchmark (!OmpSs-2) =` → `= Matmul Benchmark (!OmpSs-2) =`

Users must clone/download this example's repository from [https://pm.bsc.es/gitlab/ompss-2/examples/matmul] and transfer it to a DEEP working directory.
…
There are **3 implementations** of this benchmark.
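A blocked matrix multiplication with one task per tile update can be sketched as follows. This is an illustration under stated assumptions, not the repository's code: it assumes A is N×M, B is M×P and C is N×P, all stored row-major with every dimension a multiple of BLOCK_SIZE (the exact meaning of `N M P` is not shown here), and `tile_gemm` and `matmul_blocked` are hypothetical names. The `inout` dependency on the first element of each C tile acts as a representative for the whole tile, which serializes updates to the same tile while letting different tiles proceed in parallel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile kernel: C_tile += A_tile * B_tile for bs x bs tiles
 * embedded in row-major matrices with row lengths lda, ldb and ldc. */
static void tile_gemm(size_t bs, const double *A, size_t lda,
                      const double *B, size_t ldb, double *C, size_t ldc)
{
    for (size_t i = 0; i < bs; ++i)
        for (size_t k = 0; k < bs; ++k)
            for (size_t j = 0; j < bs; ++j)
                C[i * ldc + j] += A[i * lda + k] * B[k * ldb + j];
}

/* Hypothetical driver: C (n x p) += A (n x m) * B (m x p), row-major,
 * with n, m and p all multiples of bs. */
void matmul_blocked(size_t n, size_t m, size_t p, size_t bs,
                    const double *A, const double *B, double *C)
{
    assert(bs > 0 && n % bs == 0 && m % bs == 0 && p % bs == 0);
    for (size_t i = 0; i < n; i += bs)
        for (size_t j = 0; j < p; j += bs)
            for (size_t k = 0; k < m; k += bs) {
                const double *a = &A[i * m + k];
                const double *b = &B[k * p + j];
                double *c = &C[i * p + j];
                /* The first element of the C tile stands in for the whole
                 * tile: tasks updating the same tile run in order, tasks on
                 * different tiles may run concurrently. */
                #pragma oss task inout(c[0])
                tile_gemm(bs, a, m, b, p, c, p);
            }
    /* Wait for all tile tasks before the caller reads C. */
    #pragma oss taskwait
}
```

Calling `matmul_blocked(2, 2, 2, 1, A, B, C)` with `A = {1, 2, 3, 4}`, `B = {5, 6, 7, 8}` and a zeroed `C` yields `{19, 22, 43, 50}`; a block size of 2 gives the same result with a single task. Using a representative element keeps the sketch short; implementations that store matrices as contiguous tiles can instead declare dependencies over whole tiles.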
`== Execution instructions ==` → `== Execution Instructions ==`

`./matmul N M P BLOCK_SIZE`
…

`= Cholesky benchmark (!OmpSs-2+MKL) =` → `= Cholesky Benchmark (!OmpSs-2+MKL) =`

Users must clone/download this example's repository from [https://pm.bsc.es/gitlab/ompss-2/examples/cholesky] and transfer it to a DEEP working directory.
…
For the graph instrumentation, we recommend viewing the resulting PDF in single page mode and advancing through the pages; this shows the actual instantiation and execution of the code. For the Extrae instrumentation, Extrae must be loaded and available, at least through the `LD_LIBRARY_PATH` environment variable.

`== Execution instructions ==` → `== Execution Instructions ==`

`./cholesky SIZE BLOCK_SIZE`
…

`= nbody benchmark (MPI+!OmpSs-2+TAMPI) =` → `= Nbody Benchmark (MPI+!OmpSs-2+TAMPI) =`

Users must clone/download this example's repository from [https://pm.bsc.es/gitlab/ompss-2/examples/nbody] and transfer it to a DEEP working directory.
…
The interoperability versions (MPI+!OmpSs-2+TAMPI) are compiled only if the environment variable `TAMPI_HOME` is set to the installation directory of the Task-Aware MPI (TAMPI) library.

`== Execution instructions ==` → `== Execution Instructions ==`

The binaries accept several options. The most relevant options are the number
…

`= heat benchmark (MPI+!OmpSs-2+TAMPI) =` → `= Heat Benchmark (MPI+!OmpSs-2+TAMPI) =`

Users must clone/download this example's repository from [https://pm.bsc.es/gitlab/ompss-2/examples/heat] and transfer it to a DEEP working directory.
…
The interoperability versions (MPI+!OmpSs-2+TAMPI) are compiled only if the environment variable `TAMPI_HOME` is set to the installation directory of the Task-Aware MPI (TAMPI) library.
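The heat solver itself is not described here. As a hedged illustration only, assuming a Gauss-Seidel relaxation of the 2D heat (Laplace) stencil, one common formulation for this kind of benchmark, a single in-place sweep could look like the sketch below; `gauss_seidel_sweep` is a hypothetical name. The sketch is sequential: task-based and MPI+TAMPI versions would typically split the grid into blocks and exchange boundary rows between ranks, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel: one in-place Gauss-Seidel sweep of the 5-point
 * stencil over the interior of a rows x cols row-major grid. Boundary
 * cells are treated as fixed. Because updates happen in place, each point
 * already sees the new values of its upper and left neighbours. */
void gauss_seidel_sweep(size_t rows, size_t cols, double *u)
{
    assert(rows >= 3 && cols >= 3);
    for (size_t i = 1; i < rows - 1; ++i)
        for (size_t j = 1; j < cols - 1; ++j)
            u[i * cols + j] = 0.25 * (u[(i - 1) * cols + j] +
                                      u[(i + 1) * cols + j] +
                                      u[i * cols + j - 1] +
                                      u[i * cols + j + 1]);
}
```

On a 3×4 grid whose top row is fixed at 4 and whose other cells start at 0, one sweep sets the two interior cells to 1 and 1.25. The second value already uses the freshly updated left neighbour, which is what distinguishes Gauss-Seidel from a Jacobi update (which would give 1 for both).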
`== Execution instructions ==` → `== Execution Instructions ==`

The binaries accept several options. The most relevant options are the size
…
----

`= krist benchmark (!OmpSs-2+CUDA) =` → `= Krist Benchmark (!OmpSs-2+CUDA) =`

Users must clone/download this example's repository from [https://pm.bsc.es/gitlab/ompss-2/examples/krist] and transfer it to a DEEP working directory.
…
There are **2 implementations** of this benchmark, ''krist'' and ''krist-unified'', which use regular and unified CUDA memory, respectively.

`== Execution instructions ==` → `== Execution Instructions ==`

`./krist N_A N_R`