Changes between Version 8 and Version 9 of Public/User_Guide/Offloading_hybrid_apps


Timestamp:
Sep 17, 2019, 3:12:32 PM (5 years ago)
Author:
Kevin Sala
  • Public/User_Guide/Offloading_hybrid_apps

    v8 v9  
    1313== N-Body Benchmark ==
    1414
    15 Users can clone or download this example from the https://pm.bsc.es/gitlab/DEEP-EST/apps/NBody
     15Users can clone or download this example from the [https://pm.bsc.es/gitlab/DEEP-EST/apps/NBody]
    1616repository and transfer it to a DEEP working directory.
    1717
     
    3636  * '''!OmpSs-2''': !OmpSs-2 is the second generation of the '''!OmpSs''' programming model. It is a task-based
    3737    programming model originated from the ideas of the OpenMP and !StarSs programming models. The
    38     specification and user-guide are available at https://pm.bsc.es/ompss-2-docs/spec/ and
    39     https://pm.bsc.es/ompss-2-docs/user-guide/, respectively. !OmpSs-2 requires both '''Mercurium''' and
     38    specification and user-guide are available at [https://pm.bsc.es/ompss-2-docs/spec/] and
     39    [https://pm.bsc.es/ompss-2-docs/user-guide/], respectively. !OmpSs-2 requires both '''Mercurium''' and
    4040    '''Nanos6''' tools. Mercurium is a source-to-source compiler which provides the necessary support for
    4141    transforming the high-level directives into a parallelized version of the application. The Nanos6
    4242    runtime system library provides the services to manage all the parallelism in the application
    43     (e.g., task creation, synchronization, scheduling, etc.). Downloads at https://github.com/bsc-pm.
     43    (e.g., task creation, synchronization, scheduling, etc.). Downloads at [https://github.com/bsc-pm].
    4444  * '''Clang + LLVM OpenMP''' (derived):
    4545  * '''MPI''': This application requires an MPI library supporting the multi-threading level of
     
    5151    the N-body kernels are executed on the available GPU devices.
    5252  * '''Task-Aware MPI (TAMPI)''': The Task-Aware MPI library provides the interoperability mechanism
    53     for MPI and OpenMP/!OmpSs-2. Downloads and more information at https://github.com/bsc-pm/tampi.
     53    for MPI and OpenMP/!OmpSs-2. Downloads and more information at [https://github.com/bsc-pm/tampi].
    5454
    5555=== Versions ===
     
    9898$ source ./setenv_deep.sh
    9999
    100 # Compile the code
     100# Compile all N-Body variants
    101101$ make
    102102}}}
     
    104104The benchmark versions are built with a specific block size, which is
    105105decided at compilation time (i.e., the binary names contain the block
    106 size). The default block size of the benchmark is `2048`. Optionally,
    107 you can indicate a different block size when compiling by doing:
     106size). The default block size is `2048`. Optionally, you can indicate
     107a different block size when compiling by doing:
    108108
    109109{{{#!bash
     
    114114this benchmark targets the offloading of computational tasks to the Unified
    115115Memory GPU devices, we must execute it in a DEEP partition that features this
    116 kind of device. A good example is the `dp-dam` partition, where each node
    117 features:
     116kind of device. A good example is the [wiki:Public/User_Guide/DEEP-EST_DAM dp-dam]
     117partition, where each node features:
    118118
    119119* 2x Intel® Xeon® Platinum 8260M CPU @ 2.40GHz (24 cores/socket, 2 threads/core), '''96 CPUs/node'''
     
    161161is kept but the CUDA tasks are replaced by regular CPU tasks.
    162162
    163 Finally, the OpenMP variants can be executed similarly, but setting the `OMP_NUM_THREADS`
    164 variable to the corresponding number of CPUs per process. As an example, we could execute the
    165 following command:
     163Similarly, the OpenMP variants can be executed following the same steps but setting
     164the `OMP_NUM_THREADS` variable to the corresponding number of CPUs per process. As an example,
     165we could execute the following command:
    166166
    167167{{{#!bash
     
    169169}}}
    170170
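As a rough illustration of how the `OMP_NUM_THREADS` value could be derived, the sketch below splits the 96 CPUs of a `dp-dam` node evenly among the processes placed on it. The variable names and the choice of two processes per node are assumptions made for this example only; they are not taken from the benchmark scripts.

```bash
# Hardware figures quoted above for a dp-dam node.
sockets_per_node=2
cores_per_socket=24
threads_per_core=2
cpus_per_node=$(( sockets_per_node * cores_per_socket * threads_per_core ))

# Assumption for illustration: two MPI processes share each node.
procs_per_node=2

# Give each process an equal share of the node's CPUs.
export OMP_NUM_THREADS=$(( cpus_per_node / procs_per_node ))
echo "OMP_NUM_THREADS=${OMP_NUM_THREADS}"
```

With a different number of processes per node, only `procs_per_node` needs to change.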
     171Finally, if you want to execute the benchmark without using an interactive session, you
     172can modify the `submit.job` script and submit it to the job queue through the `sbatch`
     173command.
     174
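A minimal sketch of the batch route, assuming the `submit.job` script sits in the current directory and Slurm's `sbatch` is on the `PATH`. The guard and the `submitted` variable exist only for this illustration and are not part of the benchmark.

```bash
# Path to the job script mentioned above (assumed to be in the current directory).
job_script="submit.job"

# Submit only if both the Slurm client and the script are available.
if command -v sbatch >/dev/null 2>&1 && [ -f "$job_script" ]; then
    sbatch "$job_script"
    submitted=yes
else
    echo "sbatch or ${job_script} not available; skipping submission" >&2
    submitted=no
fi
```

The state of a submitted job can then be checked with `squeue -u "$USER"`.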
    171175== References ==
    172176
    173 
     177* [https://pm.bsc.es/ompss-2]
     178* [https://github.com/bsc-pm]
     179* [https://github.com/bsc-pm/tampi]
     180* [https://en.wikipedia.org/wiki/N-body_simulation]
     181* [https://pm.bsc.es/gitlab/DEEP-EST/apps/NBody]