| 540 | |
| 541 | |
| 542 | === Releasing PSNAM Memory === |
| 543 | |
| 544 | According to the standard, an MPI RMA window must be freed by a collective call to `MPI_Win_free()`. |
| 545 | In the case of a PSNAM window, the value of the `psnam_consistency` MPI info key decides whether the corresponding NAM memory regions are freed as well. |
| 546 | Since `MPI_Win_free()` has no info parameter, this selection must either be made when calling `MPI_Win_allocate()` or be made (or changed) later via `MPI_Win_set_info()`. |
| 547 | |
| 548 | A sound MPI application must free all MPI window objects before calling `MPI_Finalize()` -- regardless of whether the corresponding NAM region should be persistent or not. |
| 549 | Accordingly, the lifetime of an MPI window comes in different degrees: |
| 550 | Common MPI windows live only as long as `MPI_Win_free()` has not been called and the related session is still alive. |
| 551 | In contrast, persistent NAM windows exist as long as the assigned NAM space is granted by the NAM manager. |
| 552 | Upon an `MPI_Win_free()` call, such windows are merely freed from the perspective of the current MPI application, not from the view of the NAM manager. |
| 553 | |
| 554 | |
| 555 | === Attaching to Persistent Memory Regions === |
| 556 | |
| 557 | Obviously, there needs to be a way for subsequent MPI sessions to attach to the persistent NAM regions previous MPI sessions have created. |
| 558 | The PSNAM wrapper layer enables this to be done via a call to `MPI_Comm_connect()`, which is normally used for establishing communication between distinct MPI sessions: |
| 559 | |
| 560 | {{{ |
| 561 | MPI_Comm_connect(window_name, info, root, comm, newcomm) |
| 562 | IN window_name // globally unique window name (string, used only on root) |
| 563 | IN info // implementation-dependent information (handle, used only on root) |
| 564 | IN root // rank in comm of root node (integer) |
| 565 | IN comm // intra-communicator over which call is collective (handle) |
| 566 | OUT newcomm // inter-communicator with server as remote group (handle) |
| 567 | }}} |
| 568 | |
| 569 | When passed the valid name of a persistent NAM window plus an info argument with the key `psnam_window_connect` and the value `true`, this function returns an inter-communicator that then serves for accessing the remote NAM memory regions. |
| 570 | However, the returned inter-communicator is only a pseudo communicator: it cannot be used for any point-to-point or collective communication, but rather acts as a handle for RMA operations on a virtual window object embodied by the remote NAM memory. |
| 571 | In doing so, the original structure of the NAM window is retained. |
| 572 | That means that the window is still divided (and thus addressable) by the MPI ranks of the process group that originally created it. |
| 573 | Therefore, a call to `MPI_Comm_remote_size()` on the returned inter-communicator reveals the former number of processes in that group. |
| 574 | To actually create the local representative of the window as an `MPI_Win` handle, the `MPI_Win_create_dynamic()` function can be used with the inter-communicator as the input and the window handle as the output parameter. |
| 575 | |
| 576 | ==== Querying Information about a Remote Window ==== |
| 577 | |
| 578 | After determining the size of the former process group via `MPI_Comm_remote_size()`, it may also be necessary to obtain the remote region sizes as well as the related unit sizes for displacements. |
| 579 | For this purpose, the PSNAM wrapper hooks into the `MPI_Win_shared_query()` function that returns these values according to the passed rank: |
| 580 | |
| 581 | {{{ |
| 582 | MPI_Win_shared_query(win, rank, size, disp_unit, baseptr) |
| 583 | IN win // window object used for communication (handle) |
| 584 | IN rank // remote region rank |
| 585 | OUT size // size of the region at the given rank |
| 586 | OUT disp_unit // local unit size for displacements |
| 587 | OUT baseptr // always NULL in case of PSNAM windows |
| 588 | }}} |
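The queried `size` and `disp_unit` values can be used to sanity-check an RMA access before issuing it. In MPI, a target displacement is scaled by the target's displacement unit, so an access starts at byte offset `target_disp * disp_unit` within the region. The following is only a minimal sketch under stated assumptions: `region_access_fits()` is a hypothetical helper that is not part of MPI or the PSNAM wrapper, and `int64_t` stands in for `MPI_Aint` so that the snippet compiles without an MPI installation.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of MPI or PSNAM): checks whether an access of
 * `nbytes` bytes starting at displacement `target_disp` stays inside a remote
 * region, given the `size` and `disp_unit` values reported for that region.
 * int64_t is used here as a stand-in for MPI_Aint.
 */
static bool region_access_fits(int64_t region_size, int disp_unit,
                               int64_t target_disp, int64_t nbytes)
{
    /* Reject nonsensical inputs up front. */
    if (region_size < 0 || disp_unit <= 0 || target_disp < 0 || nbytes < 0) {
        return false;
    }

    /* Guard the multiplication below against overflow: if this holds,
     * target_disp * disp_unit cannot exceed region_size. */
    if (target_disp > region_size / disp_unit) {
        return false;
    }

    /* Displacements are counted in units of disp_unit bytes. */
    int64_t byte_offset = target_disp * disp_unit;

    /* The access must end at or before the end of the region. */
    return nbytes <= region_size - byte_offset;
}
```

For a region of 1024 bytes with a displacement unit of 8, for instance, an 8-byte access at displacement 127 passes the check, whereas any non-empty access at displacement 128 does not.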
| 589 | |
| 590 | |
| 591 | ==== Example ==== |
| 592 | |
| 593 | {{{ |
| 594 | MPI_Info_create(&info); |
| 595 | MPI_Info_set(info, "psnam_window_connect", "true"); |
| 596 | MPI_Comm_connect(window_name, info, 0, MPI_COMM_WORLD, &inter_comm); |
| 597 | MPI_Info_free(&info); |
| 598 | |
| 599 | printf("Connection to persistent memory region established!\n"); |
| 600 | MPI_Comm_remote_size(inter_comm, &remote_group_size); |
| 601 | printf("Size of the former process group that created the NAM window: %d\n", remote_group_size); |
| 602 | MPI_Win_create_dynamic(MPI_INFO_NULL, inter_comm, &win); |
| 603 | … |
| 604 | for (int region_rank = 0; region_rank < remote_group_size; region_rank++) { |
| 605 |     MPI_Win_shared_query(win, region_rank, &region_size[region_rank], &disp_unit[region_rank], NULL); |
| 606 | } |
| 607 | … |
| 608 | }}} |