

=== Pre-Allocated Memory and Segments ===

Without further info parameters beyond those described so far, `MPI_Win_allocate()` will always try to allocate NAM memory itself and "on demand".
However, a common use case is that the NAM memory required by an application has already been allocated beforehand via the batch system, and the question is how such pre-allocated memory can be handled at the MPI level.
In fact, using an existing NAM allocation during an `MPI_Win_allocate()` call instead of allocating new space is quite straightforward: `psnam_libnam_allocation_id` is applied as a further info key, with the respective NAM allocation ID as the related info value.


==== Usage of Segments ====

However, a NAM-based MPI window may still consist of multiple regions, and it should also still be possible to build multiple MPI windows from the space of a single NAM (pre-)allocation.
Therefore, a means for subdividing NAM allocations needs to be provided, and that is exactly what segments are intended for:
a segment is a "meta-manifestation" that maintains size and offset information for a sub-region within a larger allocation.
This offset can either be set explicitly via `psnam_segment_offset` (e.g., for splitting an allocation among multiple processes), or it can be managed dynamically and implicitly by the PSNAM layer (e.g., for using the allocated memory across multiple MPI windows).


==== Recursive Use of Segments ====

The concept of segments can also be applied recursively.
For doing so, PSNAM windows of the "raw and flat" structure feature the info key `psnam_allocation_id` (plus a respective value), which in turn can be used to pass a reference to an already existing allocation to a subsequent `MPI_Win_allocate()` call with `psnam_manifestation_segment` as the region manifestation.
That way, existing allocations can be divided into segments, which could then be subdivided even further into sub-sections, and so forth.


==== Example ====

{{{
MPI_Info_create(&info_set);
MPI_Info_set(info_set, "psnam_manifestation", "psnam_manifestation_libnam");
MPI_Info_set(info_set, "psnam_libnam_allocation_id", getenv("SLURM_NAM_ALLOC_ID"));
MPI_Info_set(info_set, "psnam_structure", "psnam_structure_raw_and_flat");

MPI_Win_allocate(allocation_size, 1, info_set, MPI_COMM_WORLD, NULL, &raw_nam_win);
MPI_Win_get_info(raw_nam_win, &info_get);
MPI_Info_get(info_get, "psnam_allocation_id", MPI_MAX_INFO_VAL, segment_name, &flag);

MPI_Info_set(info_set, "psnam_manifestation", "psnam_manifestation_segment");
MPI_Info_set(info_set, "psnam_segment_allocation_id", segment_name);
sprintf(offset_value_str, "%d", (allocation_size / num_ranks) * my_rank);
MPI_Info_set(info_set, "psnam_segment_offset", offset_value_str);

MPI_Info_set(info_set, "psnam_structure", "psnam_structure_managed_contiguous");
MPI_Win_allocate(num_int_elements * sizeof(int), sizeof(int), info_set, MPI_COMM_WORLD, NULL, &win);
}}}


=== Accessing Data in NAM Memory ===

Accesses to the NAM memory must always be performed via `MPI_Put()` and `MPI_Get()` calls.
Direct load/store accesses are (of course) not possible, and `MPI_Accumulate()` is currently not supported either, since the NAM is, at least so far, just a passive memory device.
Note that after an epoch of accessing the NAM, the respective origin buffers must not be reused or read until a synchronization has been performed.
Currently, only the `MPI_Win_fence()` mechanism is supported for this purpose.
According to this loosely-synchronous model, computation phases alternate with NAM access phases, each completed by a call to `MPI_Win_fence()`, which acts as a memory barrier and process synchronization point.


==== Example ====

{{{
/* Fill the local origin buffer and issue RMA accesses to the NAM: */
for (pos = 0; pos < region_size; pos++) put_buf[pos] = put_rank + pos;
MPI_Put(put_buf, region_size, MPI_INT, target_region_rank, 0, region_size, MPI_INT, win);
MPI_Get(get_buf, region_size, MPI_INT, target_region_rank, 0, region_size, MPI_INT, win);

/* Complete the access epoch; only now may the buffers be read again: */
MPI_Win_fence(0, win);

for (pos = 0; pos < region_size - WIN_DISP; pos++) {
    if (get_buf[pos] != put_rank + pos) {
        fprintf(stderr, "ERROR at %d: %d vs. %d\n", pos, get_buf[pos], put_rank + pos);
    }
}
}}}


=== Alternative Interface ===

The extensions presented so far are all of a semantic nature, i.e. they do not introduce new API functions.
However, this changed usage of standard MPI functions may be a bit confusing, which is why a set of macros is additionally provided that encapsulates the MPI functions used for the NAM handling.
That way, the readability of application code employing the NAM can be improved.

These encapsulating macros are the following:

* `MPIX_Win_allocate_intercomm(size, disp_unit, info_set, comm, intercomm, win)` ...as an alias for `MPI_Win_allocate()`.

* `MPIX_Win_connect_intercomm(window_name, info, root, comm, intercomm)` ...as an alias for `MPI_Comm_connect()`.

* `MPIX_Win_create_intercomm(info, comm, win)` ...as an alias for `MPI_Win_create_dynamic()`.

* `MPIX_Win_intercomm_query(win, rank, size, disp_unit)` ...as an alias for `MPI_Win_shared_query()`.