Changes between Version 6 and Version 7 of Public/User_Guide/SIONlib


Timestamp:
Dec 3, 2019, 9:43:16 AM (4 years ago)
Author:
Benedikt Steinbusch
Comment:

add description of I/O forwarding

  • Public/User_Guide/SIONlib

    v6 v7  
    8282
    8383SIONlib inspects the pointer and if it points to an on-device buffer performs a block-wise copy of the data into host memory before writing to disk or into device memory after reading from disk.
     84
     85== I/O forwarding ==
     86
     87MSA-aware collective I/O can make more efficient use of the storage system by using a subset of tasks that are well suited for performing I/O operations as collectors.
     88The collective I/O approach, however, imposes additional constraints that make it inapplicable in certain scenarios:
     89- By design, collective I/O operations force application tasks to coordinate in order to all perform the same sequence of operations. This is at odds with SIONlib's world view of separate files per task that can be accessed independently.
     90- Collector tasks in general have to be application tasks, i.e. they have to run the user's application. This can cause conflicts on MSA systems if the nodes that are capable of performing I/O operations efficiently are part of a module that the user application does not map well onto.
     91
     92I/O forwarding can help in both scenarios.
     93It works by relaying calls to low-level I/O functions (e.g. `open`, `write`, `stat`) via a remote procedure call (RPC) mechanism from a client task (running the user's application) to a server task (running a dedicated server program), which then executes the functions on behalf of the client.
     94Because the server tasks are dedicated to performing I/O, they can dynamically respond to individual requests from client tasks rather than imposing coordination constraints.
     95Also, on MSA systems, the server tasks can run on different modules than the user application.
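The relay idea described above can be illustrated with a minimal sketch. It is not SIONfwd's actual protocol: the operation code, the message layout, and all function names (`fwd_pack_write`, `fwd_serve`, `mem_write`) are hypothetical, and the transport between client and server is left out entirely. The client packs a `write` request into a byte buffer; the server unpacks it and executes the operation through a callback that stands in for the real low-level I/O function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical operation code; SIONfwd's real wire format is not shown here. */
enum fwd_op { FWD_OP_WRITE = 1 };

/* Assumed message layout: op (int32), fd (int32), len (uint64), payload. */
#define FWD_HEADER_SIZE (sizeof(int32_t) + sizeof(int32_t) + sizeof(uint64_t))

/* Client side: pack a write request into buf.
 * Returns the number of bytes used, or 0 if buf is too small. */
size_t fwd_pack_write(unsigned char *buf, size_t cap, int32_t fd,
                      const void *data, uint64_t len)
{
    int32_t op = FWD_OP_WRITE;
    size_t off = 0;

    assert(buf != NULL && (data != NULL || len == 0));
    if (cap < FWD_HEADER_SIZE || len > cap - FWD_HEADER_SIZE)
        return 0;
    memcpy(buf + off, &op, sizeof op);   off += sizeof op;
    memcpy(buf + off, &fd, sizeof fd);   off += sizeof fd;
    memcpy(buf + off, &len, sizeof len); off += sizeof len;
    if (len > 0)
        memcpy(buf + off, data, (size_t)len);
    return off + (size_t)len;
}

/* Callback that performs the actual write on the server side. */
typedef int64_t (*fwd_write_fn)(int32_t fd, const void *data, uint64_t len,
                                void *ctx);

/* Server side: unpack one request and execute it on behalf of the client.
 * Returns the callback's result, or -1 for a malformed request. */
int64_t fwd_serve(const unsigned char *buf, size_t n, fwd_write_fn do_write,
                  void *ctx)
{
    int32_t op, fd;
    uint64_t len;

    if (n < FWD_HEADER_SIZE)
        return -1;
    memcpy(&op, buf, sizeof op);
    memcpy(&fd, buf + sizeof op, sizeof fd);
    memcpy(&len, buf + sizeof op + sizeof fd, sizeof len);
    if (op != FWD_OP_WRITE || len > n - FWD_HEADER_SIZE)
        return -1;
    return do_write(fd, buf + FWD_HEADER_SIZE, len, ctx);
}

/* In-memory stand-in for a real write(), so the sketch needs no files. */
struct mem_sink {
    unsigned char bytes[64];
    size_t used;
    int32_t last_fd;
};

int64_t mem_write(int32_t fd, const void *data, uint64_t len, void *ctx)
{
    struct mem_sink *sink = ctx;

    if (len > sizeof sink->bytes - sink->used)
        return -1;
    memcpy(sink->bytes + sink->used, data, (size_t)len);
    sink->used += (size_t)len;
    sink->last_fd = fd;
    return (int64_t)len;
}
```

In a real deployment the packed buffer would travel over the RPC transport, and the callback would invoke the actual I/O function on the server.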
     96
     97I/O forwarding has been implemented in SIONlib through an additional software package, SIONfwd (https://gitlab.version.fz-juelich.de/SIONlib/SIONfwd).
     98It consists of a server program and a corresponding client library that is used by SIONlib to relay the low-level I/O operations that it wants to perform to the server.
     99The implementation uses a custom-made, minimal RPC mechanism based only on MPI's message passing, ports, and pack/unpack mechanisms.
     100In the future we intend to evaluate more general and more optimised third-party RPC solutions as they become available.
     101
     102To use I/O forwarding in SIONlib, the SIONfwd package first has to be installed (it uses a standard CMake-based build system) and SIONlib has to be configured to make use of it:
     103
     104{{{#!sh
     105./configure --enable-sionfwd=/path/to/sionfwd # ... more configure arguments
     106}}}
     107
     108In the user application, as with MSA-aware collectives, I/O forwarding has to be selected when opening a file (it is treated as an additional low-level API alongside POSIX and C standard I/O).
     109This is done by adding the word `sionfwd` to the `file_mode` argument of SIONlib's `open` functions:
     110
     111{{{#!c
     112sion_paropen_mpi("filename", "...,sionfwd,...", ...);
     113}}}
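An application that wants to switch forwarding on or off at run time could build the `file_mode` string programmatically. The following is a minimal sketch, not part of SIONlib: the helper names `file_mode_has` and `file_mode_with_sionfwd` are hypothetical, and it assumes that `file_mode` is a comma-separated list of keywords without surrounding whitespace, with `"bw"` used as an example base mode.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if the comma-separated file_mode contains token as a whole
 * keyword, 0 otherwise. */
int file_mode_has(const char *file_mode, const char *token)
{
    size_t tlen;
    const char *p = file_mode;

    assert(file_mode != NULL && token != NULL);
    tlen = strlen(token);
    while (*p != '\0') {
        size_t n = strcspn(p, ",");   /* length of the current keyword */
        if (n == tlen && strncmp(p, token, tlen) == 0)
            return 1;
        p += n;
        if (*p == ',')
            p++;                      /* skip the separator */
    }
    return 0;
}

/* Writes base plus the "sionfwd" keyword into out, without adding it twice.
 * Returns 0 on success, -1 if out is too small. */
int file_mode_with_sionfwd(const char *base, char *out, size_t cap)
{
    int n;

    if (file_mode_has(base, "sionfwd"))
        n = snprintf(out, cap, "%s", base);
    else if (base[0] == '\0')
        n = snprintf(out, cap, "sionfwd");
    else
        n = snprintf(out, cap, "%s,sionfwd", base);
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}
```

The resulting string would then be passed as the `file_mode` argument of the `open` call in place of a string literal.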
     114
     115Although in principle MPI contains a mechanism for dynamically spawning additional processes, it is not used to spawn the forwarding server processes for two reasons.
     116First, the feature is loosely specified with many of the details left for implementations to decide.
     117This makes it hard to precisely control process placement which is especially important on MSA systems.
     118Second, the resources necessary to run the server tasks (additional compute nodes) in many cases have to be requested at job submission time anyway.
     119Thus, the server tasks have to be launched from the user's job script before the application tasks are launched.
     120A typical job script could look like this:
     121
     122{{{#!sh
     123#!/bin/bash
     124# Slurm's heterogeneous jobs can be used to partition resources
     125# for the user's application and the forwarding server, even
     126# when not running on an MSA system.
     127#SBATCH --nodes=32 --partition=dp-cn
     128#SBATCH packjob
     129#SBATCH --nodes=4 --cpus-per-task=1 --partition=dp-dam
     130
     131module load intel-para SIONlib/1.7.6
     132
     133# Defines a shell function sionfwd-spawn that is used to
     134# facilitate communication of MPI ports connection details
     135# between the server and the client.
     136eval $(sionfwd-server bash-defs)
     137
     138# Spawns the server, captures the connection details and
     139# exports them to the environment to be picked up by the
     140# client library used from the user's application.
     141sionfwd-spawn srun --pack-group 1 sionfwd-server
     142
     143# Spawn the user application.
     144srun --pack-group 0 user_application
     145
     146# Shut down the server.
     147srun --pack-group 0 sionfwd-server shutdown
     148
     149# Wait for all tasks to end.
     150wait
     151}}}