Changes between Version 8 and Version 9 of Public/ParaStationMPI


Timestamp: Oct 24, 2019, 12:10:42 PM
Author: Carsten Clauß
Comment: (none)

Legend: unmarked lines are unchanged; "-" marks lines removed in v9, "+" marks lines added in v9.
  • Public/ParaStationMPI

    v8 → v9
    11
    12     === What is CUDA-awareness for MPI? ===
    13  -  In brief, ''CUDA-awareness'' in an MPI library means that a mixed CUDA + MPI application may pass pointers to CUDA buffers (memory regions located on the GPU, the so-called ''Device'' memory) directly to MPI functions such as `MPI_Send` or `MPI_Recv`. A non-CUDA-aware MPI library would fail in such a case because CUDA memory cannot be accessed directly, e.g. via load/store or `memcpy()`, but first has to be transferred to Host memory via special routines such as `cudaMemcpy()`. In contrast, a CUDA-aware MPI library recognizes that a pointer refers to a buffer in Device memory and can then temporarily copy this buffer into Host memory before the communication -- the so-called ''Staging'' of the buffer. In addition, a CUDA-aware MPI library may apply further optimizations, for example by exploiting so-called ''GPUdirect'' capabilities that allow for direct RDMA transfers from and to Device memory.
    13  +  In brief, ''CUDA-awareness'' in an MPI library means that a mixed CUDA + MPI application may pass pointers to CUDA buffers (memory regions located on the GPU, the so-called ''Device'' memory) directly to MPI functions such as `MPI_Send` or `MPI_Recv`. A non-CUDA-aware MPI library would fail in such a case because CUDA memory cannot be accessed directly, e.g. via load/store or `memcpy()`, but first has to be transferred to Host memory via special routines such as `cudaMemcpy()`. In contrast, a CUDA-aware MPI library recognizes that a pointer refers to a buffer in Device memory and can then temporarily copy this buffer into Host memory before the communication -- the so-called ''Staging'' of the buffer. In addition, a CUDA-aware MPI library may apply further optimizations, for example by exploiting so-called ''GPUDirect'' capabilities that allow for direct RDMA transfers from and to Device memory.
    14
    15     === Some external Resources ===
    …
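For illustration, the following minimal sketch (not part of the wiki page itself; the rank roles, the buffer size, and the name `dev_buf` are free choices for this example) shows what the paragraph at line 13 describes: a Device pointer obtained via `cudaMalloc()` is handed directly to `MPI_Send`/`MPI_Recv`, which only works if the MPI library is CUDA-aware.

{{{#!c
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int count = 1024;
    double *dev_buf;                                   /* Device memory pointer */
    cudaMalloc((void **)&dev_buf, count * sizeof(double));
    cudaMemset(dev_buf, 0, count * sizeof(double));    /* give it some payload  */

    if (rank == 0) {
        /* With a CUDA-aware MPI, the Device pointer is passed directly; the
         * library handles it internally (e.g. by Staging or GPUDirect RDMA). */
        MPI_Send(dev_buf, count, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(dev_buf, count, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    cudaFree(dev_buf);
    MPI_Finalize();
    return 0;
}
}}}

Run with at least two ranks, e.g. `mpiexec -n 2 ./a.out`. With a non-CUDA-aware MPI library the same calls would typically fail, because the library would treat the Device pointer as an ordinary Host address.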
    20     === Current status on the DEEP system ===
    21     Currently (as of October 2019), !ParaStation MPI supports CUDA-awareness for Extoll only from a semantic point of view: Device pointers may be used as send and receive buffer arguments in MPI calls, but when Extoll is used this is realized by explicit ''Staging''.
    22  -  This is because the Extoll runtime does not yet support GPUdirect, but EXTOLL is currently working on this in the context of DEEP-EST.
    23  -  As soon as GPUdirect is supported by Extoll, it will also be integrated and enabled in !ParaStation MPI.
    24  -  (BTW: For !InfiniBand communication, !ParaStation MPI is already GPUdirect-enabled.)
    22  +  This is because the Extoll runtime does not yet support GPUDirect, but EXTOLL is currently working on this in the context of DEEP-EST.
    23  +  As soon as GPUDirect is supported by Extoll, it will also be integrated and enabled in !ParaStation MPI.
    24  +  (BTW: For !InfiniBand communication, !ParaStation MPI is already GPUDirect-enabled.)
    25
    26     === Usage on the DEEP system ===