Changes between Version 28 and Version 29 of Public/ParaStationMPI


Timestamp: May 28, 2021, 5:26:26 PM
Author: Carsten Clauß
  • Public/ParaStationMPI

----

== Modular MPI Jobs ==

=== Inter-module MPI Communication ===

!ParaStation MPI provides support for inter-module communication in federated high-speed networks.
For this purpose, so-called Gateway (GW) daemons bridge the MPI traffic between the modules.

…

An MPI job started with this colon notation via srun will run in a single `MPI_COMM_WORLD`.

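For illustration, such a job could be submitted with srun as sketched below; the partition names, node counts, and the binary are placeholders only and have to be adapted to the respective modules:

{{{
# Hypothetical sketch: adapt partitions, node counts, and the binary to your system.
srun -N2 -p <partition of module A> ./mpi_prog : -N4 -p <partition of module B> ./mpi_prog
}}}
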
However, workflows across modules may demand multiple `MPI_COMM_WORLD` sessions that may connect (and later disconnect) with each other during runtime.
The following simple job script is an example that supports such a case:

…

=== Application-dependent Tuning ===

The Gateway protocol supports the fragmentation of larger messages into smaller chunks of a given length, i.e., the Maximum Transfer Unit (MTU).
This way, the Gateway daemon may benefit from a pipelining effect, resulting in an overlap of the message transfer from the source to the Gateway daemon with the transfer from the Gateway daemon to the destination.
The chunk size can be adjusted by setting the following environment variable:
{{{
PSP_GW_MTU=<chunk size in bytes>
}}}
The optimal chunk size is highly dependent on the communication pattern and therefore has to be chosen for each application individually.
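
For example, the variable could be exported in the job environment before the application is started; the value below is merely an illustrative assumption, not a recommendation:

{{{
export PSP_GW_MTU=65536   # e.g., 64 KiB chunks; tune per application
}}}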


=== API Extensions for MSA awareness ===

Besides transparent MSA support, there is also the possibility for the application to adapt to modularity explicitly.

To do so, on the one hand, !ParaStation MPI provides a portable API addition for retrieving topology information by querying a ''Module ID'' via the `MPI_INFO_ENV` object:

{{{
  int my_module_id;
  int flag;
  char value[MPI_MAX_INFO_VAL];

  MPI_Info_get(MPI_INFO_ENV, "msa_module_id", MPI_MAX_INFO_VAL, value, &flag);

  if (flag) { /* This MPI environment is modularity-aware! */

    my_module_id = atoi(value); /* Determine the module affinity of this process. */

  } else { /* This MPI environment is NOT modularity-aware! */

    my_module_id = 0; /* Assume a flat topology for all processes. */
  }
}}}

On the other hand, a newly added ''split type'' for the standardized `MPI_Comm_split_type()` function can be used to create MPI communicators according to the modular topology of an MSA system:

{{{
  MPI_Comm module_local_comm;
  int my_module_local_rank;

  MPI_Comm_split(MPI_COMM_WORLD, my_module_id, 0, &module_local_comm);

  /* After the split call, module_local_comm contains, from the view of each
   * process, all the other processes that belong to the same local MSA module.
   */

  MPI_Comm_rank(module_local_comm, &my_module_local_rank);

  printf("My module ID is %d and my module-local rank is %d\n", my_module_id, my_module_local_rank);
}}}
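
A corresponding `MPI_Comm_split_type()` call might look as sketched below; note that the name of the module-related split type (assumed here to be `MPIX_COMM_TYPE_MODULE`) should be verified against the !ParaStation MPI release at hand:

{{{
  MPI_Comm module_local_comm;
  int my_module_local_rank;

  /* Assumed split-type constant -- please verify the exact name: */
  MPI_Comm_split_type(MPI_COMM_WORLD, MPIX_COMM_TYPE_MODULE, 0,
                      MPI_INFO_NULL, &module_local_comm);

  MPI_Comm_rank(module_local_comm, &my_module_local_rank);
}}}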


…
PSP_UCP=1    # support GPUDirect via UCX in InfiniBand networks (e.g., this is currently true for the ESB nodes)
}}}
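
For illustration, these settings would typically be exported in the batch environment before launching the application; the following excerpt is only a hypothetical sketch (the binary name is a placeholder):

{{{
export PSP_CUDA=1   # activate CUDA awareness in ParaStation MPI
export PSP_UCP=1    # use UCX, e.g., for GPUDirect in InfiniBand networks
srun ./my_cuda_mpi_prog
}}}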

=== Testing for CUDA awareness ===

!ParaStation MPI features three API extensions for querying whether the MPI library at hand is CUDA-aware.

The first targets compile time:

{{{
#if defined(MPIX_CUDA_AWARE_SUPPORT) && MPIX_CUDA_AWARE_SUPPORT
printf("The MPI library is CUDA-aware\n");
#endif
}}}

...and the other two also target the runtime:

{{{
if (MPIX_Query_cuda_support())
    printf("The CUDA awareness is activated\n");
}}}

or alternatively:

{{{
MPI_Info_get(MPI_INFO_ENV, "cuda_aware", ..., value, &flag);
/*
 * If flag is set, then the library was built with CUDA support.
 * If, in addition, value points to the string "true", then the
 * CUDA awareness is also activated (i.e., PSP_CUDA=1 is set).
 */
}}}

Please note that the first two API extensions are similar to those that Open MPI provides with respect to CUDA awareness, whereas the third one is specific to !ParaStation MPI but is still quite portable due to the use of the generic `MPI_INFO_ENV` object.
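
For reference, a complete runtime check along the lines of the snippets above might look as follows (a minimal, self-contained sketch; error handling is omitted):

{{{
#include <stdio.h>
#include <string.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    char value[MPI_MAX_INFO_VAL + 1];
    int flag;

    MPI_Init(&argc, &argv);

    /* Query the generic MPI_INFO_ENV object for the "cuda_aware" key. */
    MPI_Info_get(MPI_INFO_ENV, "cuda_aware", MPI_MAX_INFO_VAL, value, &flag);

    if (flag && strcmp(value, "true") == 0) {
        printf("The MPI library is CUDA-aware and CUDA awareness is activated\n");
    } else if (flag) {
        printf("The MPI library is CUDA-aware, but CUDA awareness is not activated\n");
    } else {
        printf("The MPI library is not CUDA-aware\n");
    }

    MPI_Finalize();
    return 0;
}
}}}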