Changes between Version 9 and Version 10 of Public/ParaStationMPI


Timestamp: Oct 24, 2019, 12:18:03 PM
Author: Simon Pickartz

Legend:

  (unmarked)  unmodified
  -           removed (only in v9)
  +           added (only in v10)
  • Public/ParaStationMPI

  10   = CUDA Support by !ParaStation MPI =
  11   
- 12   === What is CUDA-awareness for MPI? ===
- 13   In brief, ''CUDA-awareness'' in an MPI library means that a mixed CUDA + MPI application is allowed to pass pointers to CUDA buffers (these are memory regions located on the GPU, the so-called ''Device'' memory) directly to MPI functions like `MPI_Send` or `MPI_Recv`. A non CUDA-aware MPI library would fail in such a case because the CUDA-memory cannot be accessed directly e.g. via load/store or `memcpy()` but has previously to be transferred via special routines like `cudaMemcpy()` to the Host memory. In contrast to this, a CUDA-aware MPI library recognizes that a pointer is associated with a buffer within the Device memory and can then copy this buffer before communication temporarily into the Host memory -- what is called ''Staging'' of this buffer. In addition, a CUDA-aware MPI library may also apply some kind of optimizations, for example, by means of exploiting so-called ''GPUDirect'' capabilities that allow for direct RDMA transfers from and to Device memory.
+ 12   === What is CUDA awareness for MPI? ===
+ 13   In brief, ''CUDA awareness'' in an MPI library means that a mixed CUDA + MPI application is allowed to pass pointers to CUDA buffers (these are memory regions located on the GPU, the so-called ''device'' memory) directly to MPI functions such as `MPI_Send()` or `MPI_Recv()`. A non-CUDA-aware MPI library would fail in such a case because the CUDA memory cannot be accessed directly, e.g., via load/store or `memcpy()`, but has to be transferred in advance to the host memory via special routines such as `cudaMemcpy()`. In contrast, a CUDA-aware MPI library recognizes that a pointer is associated with a buffer within the device memory and can then copy this buffer prior to the communication into a temporary host buffer -- a step called ''staging'' of this buffer. Additionally, a CUDA-aware MPI library may also apply further optimizations, e.g., by exploiting so-called ''GPUDirect'' capabilities that allow for direct RDMA transfers from and to the device memory.
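
For illustration, here is a minimal sketch of what CUDA awareness permits, assuming a CUDA-aware MPI build and at least two ranks; the buffer size, datatype, and message tag are arbitrary choices made for this example:

{{{
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    int rank;
    const int count = 1024;
    float *d_buf;                   /* device (GPU) buffer */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    cudaMalloc((void **)&d_buf, count * sizeof(float));

    if (rank == 0) {
        /* CUDA-aware MPI: the device pointer is passed directly. */
        MPI_Send(d_buf, count, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(d_buf, count, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    /* A non-CUDA-aware MPI library would instead require explicit
     * staging into host memory before the send (h_buf being a host
     * buffer allocated beforehand), e.g.:
     *     cudaMemcpy(h_buf, d_buf, count * sizeof(float),
     *                cudaMemcpyDeviceToHost);
     *     MPI_Send(h_buf, count, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
     */

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}
}}}

With a CUDA-aware library, the `MPI_Send()`/`MPI_Recv()` calls above operate on the device pointer as-is; whether the library stages the buffer internally or uses GPUDirect transfers is up to its implementation.
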
  14   
  15   === Some external Resources ===
  …
  28   **Warning:** ''This manual section is currently under development. Therefore, the following usage guidelines may not be flawless and are likely to change in some respects in the near future!''
  29   
- 30   On the DEEP system, the CUDA-awareness can be enabled by loading a dedicated module that links to a dedicated !ParaStation MPI library that has been compiled with CUDA support:
+ 30   On the DEEP system, CUDA awareness can be enabled by loading a module that links to a dedicated !ParaStation MPI library providing CUDA support:
  31   {{{
  32   module load GCC