71 | | Notice that both commands return consistent outputs and, even though an entire node with two sockets has been requested, only the first NUMA node (i.e. socket) has been correctly bound. As a result, only 48 threads of the first socket (0-11, 24-35), of which 24 are physical and 24 logical (hyper-threading enabled), are going to be utilised whilst the other 48 threads available on the second socket will remain idle. Therefore, **the system affinity shown above does not represent the resources requested via SLURM.** |
| 73 | Notice that both commands return consistent outputs and, even though an entire node with two sockets has been requested, only the first NUMA node (i.e. socket) has been correctly bound. As a result, only 48 threads of the first socket (0-11, 24-35), of which 24 are physical and 24 logical (hyper-threading enabled), are going to be utilised whilst the other 48 threads available on the second socket will remain idle. Therefore, **the system affinity shown above is not valid, since it does not represent the resources requested via SLURM.** |
79 | | == Examples == |
| 81 | = Repository with examples = |
| 82 | |
| 83 | All the examples shown here are publicly available at [https://pm.bsc.es/gitlab/ompss-2/examples]. Users must clone/download each example's repository and then transfer it to a DEEP working directory. |
| 84 | |
| 85 | == System configuration == |
| 86 | |
| 87 | Please refer to section [#QuickSetuponDEEPSystem Quick Setup on DEEP System] to get a functional version of OmpSs-2 on DEEP. It is also recommended to run OmpSs-2 on a cluster module (CM) node. |
| 88 | |
| 89 | == Building and running the examples == |
| 90 | |
| 91 | All the examples come with a Makefile already configured to build (e.g. `make`) and run (e.g. `make run`) them. |
| 92 | |
| 93 | == Controlling available threads == |
| 94 | |
| 95 | To limit or constrain the threads available to an application, the Unix ''taskset'' tool can be used to launch it with a given thread affinity. To use it, simply precede the application's binary with ''taskset -c'' followed by a list of CPU IDs specifying the desired affinity: |
| 96 | |
| 97 | `taskset -c 0,2-4 ./application` |
| 98 | |
| 99 | The example above will run ''application'' restricted to 4 CPUs: 0, 2, 3 and 4. |
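
To double-check which CPU IDs a list such as `0,2-4` denotes before passing it to ''taskset'', the list can be expanded by hand. The following is only a minimal sketch: `expand_cpu_list` is a hypothetical helper written for this page, not part of ''taskset'', and it assumes the list contains only single IDs and simple `first-last` ranges.

```shell
#!/bin/bash
# Hypothetical helper (not part of taskset): expands a CPU list such as
# "0,2-4" into the individual CPU IDs it denotes, one per line.
# Assumption: only single IDs and "first-last" ranges are handled.
expand_cpu_list() {
    local list=$1 part first last
    local IFS=','                 # split the list on commas only
    for part in $list; do
        if [[ $part == *-* ]]; then
            first=${part%-*}      # text before the dash
            last=${part#*-}       # text after the dash
            seq "$first" "$last"
        else
            echo "$part"
        fi
    done
}

expand_cpu_list "0,2-4"           # prints the CPU IDs, one per line
expand_cpu_list "0,2-4" | wc -l   # prints how many CPUs the list covers
```

The count printed by the last command is the number of CPUs the application will be allowed to use when launched with the same list.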
| 100 | |
| 101 | == Dependency graphs == |
| 102 | |
| 103 | Nanos6 can extract a graphical representation of the data dependencies. To generate this graph, run the application with the ''NANOS6'' environment variable set to ''graph'': |
| 104 | |
| 105 | `NANOS6=graph ./application` |
| 106 | |
| 107 | By default, graph nodes include the full path of the source file. To shorten these paths, set the following environment variable: |
| 108 | |
| 109 | `NANOS6_GRAPH_SHORTEN_FILENAMES=1` |
| 110 | |
| 111 | The result will be a PDF file with several pages, each representing the graph at a certain point in time. For best results, we suggest displaying the PDF in ''single page'' view, showing a full page at a time, and advancing page by page. |
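
The two variables from this section can be set together for a single run. The sketch below assumes a hypothetical wrapper function, `run_with_graph`, which is not provided by Nanos6; it sets both variables only for the command it launches, so the calling shell stays unchanged. A stand-in command is used here so that the snippet runs anywhere.

```shell
#!/bin/bash
# Hypothetical wrapper (not provided by Nanos6): runs the given command with
# both graph-related variables set for that command only.
run_with_graph() {
    NANOS6=graph NANOS6_GRAPH_SHORTEN_FILENAMES=1 "$@"
}

# Stand-in command that prints what the application would see.
# Replace it with the real binary, e.g. `run_with_graph ./application`.
run_with_graph sh -c 'echo "NANOS6=$NANOS6 SHORTEN=$NANOS6_GRAPH_SHORTEN_FILENAMES"'
```

Keeping the variables out of the shell's environment avoids accidentally generating graphs in later, unrelated runs.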
| 112 | |
| 113 | == Obtaining statistics == |
| 114 | |
| 115 | Nanos6 can also collect execution statistics. To do so, simply run the application as: |
| 116 | |
| 117 | `NANOS6=stats ./application` or `NANOS6=stats-papi ./application` |
| 118 | |
| 119 | The first collects timing statistics, while the second also records hardware counters (this requires Nanos6 to have been compiled with PAPI support). By default, the statistics are emitted to standard error when the program ends. |
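
Because the statistics go to standard error, they can be kept apart from the program's regular output with a shell redirection. The following sketch assumes a hypothetical helper, `run_with_stats`, and uses a stand-in command in place of the real application. Note that anything else the application writes to standard error ends up in the same file.

```shell
#!/bin/bash
# Hypothetical helper: runs a command with NANOS6=stats and stores whatever
# it writes to standard error in the file given as the first argument.
run_with_stats() {
    local report=$1
    shift
    NANOS6=stats "$@" 2> "$report"
}

# Stand-in for the application: writes one line to each output stream.
# With a real binary this would be: run_with_stats stats.txt ./application
report=$(mktemp)
run_with_stats "$report" sh -c 'echo "result"; echo "stats for NANOS6=$NANOS6" >&2'
cat "$report"      # shows only what was written to standard error
rm -f "$report"
```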
| 120 | |
| 121 | == Tracing with Extrae == |
| 122 | |
| 123 | A ''trace.sh'' file can be used to include all the environment variables needed to get an instrumentation trace of the execution. The content of this file is as follows: |
| 124 | |
| 125 | {{{ |
| 126 | #!/bin/bash |
| 127 | export EXTRAE_CONFIG_FILE=extrae.xml |
| 128 | export NANOS6="extrae" |
| 129 | "$@" |
| 130 | }}} |
| 131 | |
| 132 | Additionally, you will need to change your running script so that it invokes the program through this ''trace.sh'' script. Although you could instead add all the instrumentation-related environment variables to your running script, using this extra script makes it easy to switch between instrumented and non-instrumented executions. When you need to instrument an execution, simply put ''trace.sh'' before the program invocation. Note that the ''extrae.xml'' file, which configures the Extrae library to produce a Paraver trace, is also needed. |
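
One possible way to switch between instrumented and non-instrumented executions from a running script is sketched below. The `launch` function and the `TRACE` variable are hypothetical names introduced for illustration only; the sketch assumes that ''trace.sh'' is executable and located in the current directory.

```shell
#!/bin/bash
# Hypothetical run-script fragment: TRACE=1 routes the program through
# trace.sh (assumed to be executable and in the current directory);
# any other value, or no value at all, runs the program directly.
launch() {
    if [ "${TRACE:-0}" = "1" ]; then
        ./trace.sh "$@"
    else
        "$@"
    fi
}

# Stand-in command; with a real binary this would be `launch ./application`.
launch echo "application would run here"
```

With this in place, running the script as `TRACE=1 ./run.sh` (where ''run.sh'' is a placeholder for your own running script) produces a trace, while a plain `./run.sh` does not.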
| 133 | |
| 134 | = Example: Multisaxpy = |
| 135 | |
| 136 | The examples shown here are publicly available at [https://pm.bsc.es/gitlab/ompss-2/examples]. |
| 137 | |
| 138 | Users must clone/download this example's repository from [https://pm.bsc.es/gitlab/ompss-2/examples/multisaxpy] and transfer it to a DEEP working directory. |