Changes between Version 34 and Version 35 of Public/ParaStationMPI


Timestamp: May 31, 2021, 3:17:16 PM
Author: Carsten Clauß

  • Public/ParaStationMPI
v34 → v35

{{{
MPI_Win_get_info(win, &info);
MPI_Info_get(info, "psnam_window_name", INFO_VALUE_LEN, info_value, &flag);
if (flag) {
    strcpy(window_name, info_value);
    printf("The window's name is: %s\n", window_name);
} else {
    printf("No psnam window name found!\n");
}

...

MPI_Win_free(&win);
if (comm_rank == 0) {
    sprintf(service_name, "%s:my-persistent-psnam-window", argv[0]);
    MPI_Publish_name(service_name, MPI_INFO_NULL, window_name);
}
}}}
     
...

}}}
=== Pre-Allocated Memory and Segments ===

Without further info parameters beyond those described so far, `MPI_Win_allocate()` will always try to allocate NAM memory itself and on demand.
However, a common use case is that the NAM memory required by an application has already been allocated beforehand via the batch system, and the question is how such pre-allocated memory can be handled at the MPI level.
In fact, using an existing NAM allocation in an `MPI_Win_allocate()` call instead of allocating new space is quite straightforward: `psnam_libnam_allocation_id` is applied as a further info key, with the respective NAM allocation ID as the related info value.


==== Usage of Segments ====

However, a NAM-based MPI window may still consist of multiple regions, and it should likewise be possible to build multiple MPI windows from the space of a single NAM (pre-)allocation.
Therefore, a means for subdividing NAM allocations needs to be provided, and that is exactly what segments are intended for:
a segment is a "meta-manifestation" that maintains size and offset information for a sub-region within a larger allocation.
This offset can either be set explicitly via `psnam_segment_offset` (e.g., for splitting an allocation among multiple processes), or it can be managed dynamically and implicitly by the PSNAM layer (e.g., for using the allocated memory across multiple MPI windows).


==== Recursive Use of Segments ====

The concept of segments can also be applied recursively.
For doing so, PSNAM windows of the "raw and flat" structure feature the info key `psnam_allocation_id`, whose value can in turn be used to pass a reference to an already existing allocation to a subsequent `MPI_Win_allocate()` call with `psnam_manifestation_segment` as the region manifestation.
That way, existing allocations can be divided into segments, which can then be sub-divided even further, and so forth.
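One level of this recursion might be sketched as follows; this is only an illustration reusing the info keys named above, with variable names chosen freely and error handling omitted:

{{{
/* Query the allocation ID that backs an existing "raw and flat" window: */
MPI_Win_get_info(raw_nam_win, &info_get);
MPI_Info_get(info_get, "psnam_allocation_id", MPI_MAX_INFO_VAL, allocation_id, &flag);

/* Pass it to a subsequent allocation call with the segment manifestation,
   so that the new window becomes a sub-region of the old allocation: */
MPI_Info_set(info_set, "psnam_manifestation", "psnam_manifestation_segment");
MPI_Info_set(info_set, "psnam_segment_allocation_id", allocation_id);
MPI_Win_allocate(sub_segment_size, 1, info_set, MPI_COMM_WORLD, NULL, &sub_win);

/* The new window can in turn be queried for its own "psnam_allocation_id"
   and sub-divided again in the same manner. */
}}}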


==== Example ====

{{{
MPI_Info_create(&info_set);
MPI_Info_set(info_set, "psnam_manifestation", "psnam_manifestation_libnam");
MPI_Info_set(info_set, "psnam_libnam_allocation_id", getenv("SLURM_NAM_ALLOC_ID"));
MPI_Info_set(info_set, "psnam_structure", "psnam_structure_raw_and_flat");

MPI_Win_allocate(allocation_size, 1, info_set, MPI_COMM_WORLD, NULL, &raw_nam_win);
MPI_Win_get_info(raw_nam_win, &info_get);
MPI_Info_get(info_get, "psnam_allocation_id", MPI_MAX_INFO_VAL, segment_name, &flag);

MPI_Info_set(info_set, "psnam_manifestation", "psnam_manifestation_segment");
MPI_Info_set(info_set, "psnam_segment_allocation_id", segment_name);
sprintf(offset_value_str, "%d", (allocation_size / num_ranks) * my_rank);
MPI_Info_set(info_set, "psnam_segment_offset", offset_value_str);

MPI_Info_set(info_set, "psnam_structure", "psnam_structure_managed_contiguous");
MPI_Win_allocate(num_int_elements * sizeof(int), sizeof(int), info_set, MPI_COMM_WORLD, NULL, &win);
}}}


=== Accessing Data in NAM Memory ===

Accesses to the NAM memory must always be made via `MPI_Put()` and `MPI_Get()` calls.
Direct load/store accesses are (of course) not possible, and `MPI_Accumulate()` is currently not supported either, since the NAM is, at least so far, just a passive memory device.
Moreover, after an epoch of accessing the NAM, the respective origin buffers must not be reused or read until a synchronization has been performed.
Currently, only the `MPI_Win_fence()` mechanism is supported for doing so.
According to this loosely synchronous model, computation phases alternate with NAM access phases, each completed by a call to `MPI_Win_fence()`, which acts as a memory barrier and process synchronization point.


==== Example ====

{{{
for (pos = 0; pos < region_size; pos++) put_buf[pos] = put_rank + pos;
MPI_Put(put_buf, region_size, MPI_INT, target_region_rank, 0, region_size, MPI_INT, win);
MPI_Get(get_buf, region_size, MPI_INT, target_region_rank, 0, region_size, MPI_INT, win);

MPI_Win_fence(0, win);

for (pos = 0; pos < region_size - WIN_DISP; pos++) {
    if (get_buf[pos] != put_rank + pos) {
        fprintf(stderr, "ERROR at %d: %d vs. %d\n", pos, get_buf[pos], put_rank + pos);
    }
}
}}}


=== Alternative Interface ===

The extensions presented so far are all of a semantic nature, i.e., they do not introduce new API functions.
However, since they change the usage of standard MPI functions, which may be a bit confusing, a set of macros is also provided that encapsulates the MPI functions used for NAM handling.
That way, the readability of application code employing the NAM can be improved.

These encapsulating macros are the following:

 * `MPIX_Win_allocate_intercomm(size, disp_unit, info_set, comm, intercomm, win)` ...as an alias for `MPI_Win_allocate()`.
 * `MPIX_Win_connect_intercomm(window_name, info, root, comm, intercomm)` ...as an alias for `MPI_Comm_connect()`.
 * `MPIX_Win_create_intercomm(info, comm, win)` ...as an alias for `MPI_Win_create_dynamic()`.
 * `MPIX_Win_intercomm_query(win, rank, size, disp_unit)` ...as an alias for `MPI_Win_shared_query()`.
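For illustration, a client attaching to a published NAM window via these macros might look roughly as follows. This is only a sketch: the argument-passing conventions (e.g., whether handles are taken by pointer) are an assumption here, and error handling is omitted:

{{{
MPI_Comm intercomm;
MPI_Win  win;
MPI_Aint region_size;
int      disp_unit;

/* Connect to the window published under window_name... */
MPIX_Win_connect_intercomm(window_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &intercomm);

/* ...create a local window object for accessing it... */
MPIX_Win_create_intercomm(MPI_INFO_NULL, intercomm, &win);

/* ...and query size and displacement unit of the region exposed by rank 0: */
MPIX_Win_intercomm_query(win, 0, &region_size, &disp_unit);
}}}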