Parallel JSON (#1475)
* Plumbing work for MPI communicator in JSON backend

* Parallel reading

* ... and writing

* Set padding according to MPI rank

* Write README.txt file

* Bug fix: don't double prepend base dir

* Test parallel output in openpmd-pipe test

* Bug fix: use mpi_rank_%i.toml when writing to TOML

* Refactor `if` statement

* Add documentation
franzpoeschel authored Feb 28, 2024
1 parent 5fec415 commit d64dbc2
Showing 8 changed files with 294 additions and 31 deletions.
2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -1370,7 +1370,7 @@ if(openPMD_BUILD_TESTING)
--outfile \
../samples/git-sample/thetaMode/data_%T.bp && \
\
-${Python_EXECUTABLE} \
+${MPI_TEST_EXE} ${Python_EXECUTABLE} \
${openPMD_RUNTIME_OUTPUT_DIRECTORY}/openpmd-pipe \
--infile ../samples/git-sample/thetaMode/data_%T.bp \
--outfile ../samples/git-sample/thetaMode/data%T.json \
37 changes: 35 additions & 2 deletions docs/source/backends/json.rst
@@ -92,7 +92,6 @@ propagate the exception thrown by Niels Lohmann's library.

The names (keys) ``"attributes"``, ``"data"`` and ``"datatype"`` are reserved and must not be used for base/mesh/particles path, records and their components.

-A parallel (i.e. MPI) implementation is *not* available.

TOML Restrictions
-----------------
@@ -106,7 +105,41 @@ TOML does not support null values.

The names (keys) ``"attributes"``, ``"data"`` and ``"datatype"`` are reserved and must not be used for base/mesh/particles path, records and their components.

-A parallel (i.e. MPI) implementation is *not* available.

Using in parallel (MPI)
-----------------------

Parallel I/O is not a first-class citizen in the JSON and TOML backends, and neither backend will "go out of its way" to support parallel workflows.

However, there is a rudimentary form of parallel read and write support:

Parallel reading
................

In order not to overload the parallel filesystem with parallel reads, read access to JSON datasets is done by rank 0 and then broadcast to all other ranks.
Note that there is no granularity whatsoever in reading a JSON file.
A JSON file is always read into memory and broadcast to all other ranks in its entirety.
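The rank-0-reads-then-broadcasts pattern described above can be sketched as follows. This is not the backend's actual C++ implementation, only a Python illustration of the pattern; the function name ``read_json_collectively`` and the mpi4py-style communicator interface (``Get_rank``, ``bcast``) are assumptions for the sketch.

```python
import json

def read_json_collectively(comm, path):
    """Rank 0 parses the whole JSON file; the parsed document is then
    broadcast to every other rank of the communicator."""
    doc = None
    if comm.Get_rank() == 0:
        with open(path) as f:
            doc = json.load(f)
    # No read granularity: every rank receives the complete document.
    return comm.bcast(doc, root=0)
```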

Parallel writing
................

When executed in an MPI context, the JSON/TOML backends will not directly output a single text file, but instead a folder containing one file per MPI rank.
Neither backend will perform any data aggregation at all.

.. note::

The parallel write support of the JSON/TOML backends is intended mainly for debugging and prototyping workflows.

The folder will use the specified Series name, but append the postfix ``.parallel``.
(This is a deliberate indication that this folder cannot directly be opened again by the openPMD-api as a JSON/TOML dataset.)
This folder contains for each MPI rank *i* a file ``mpi_rank_<i>.json`` (resp. ``mpi_rank_<i>.toml``), containing the serial output of that rank.
A ``README.txt`` with basic usage instructions is also written.
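The resulting on-disk layout can be sketched in Python. The folder and file names (``<series>.parallel``, ``mpi_rank_<i>.json``) follow the description above; the function ``write_parallel_series`` and the exact ``README.txt`` wording are illustrative assumptions, not the backend's implementation.

```python
import json
from pathlib import Path

def write_parallel_series(series_path, rank, payload):
    """Sketch: each MPI rank writes its serial output into its own file
    inside '<series>.parallel/'; no data aggregation happens.
    Rank 0 additionally writes a README.txt."""
    folder = Path(str(series_path) + ".parallel")
    folder.mkdir(exist_ok=True)
    (folder / f"mpi_rank_{rank}.json").write_text(json.dumps(payload))
    if rank == 0:
        (folder / "README.txt").write_text(
            "One mpi_rank_<i>.json file per MPI rank; "
            "each file is valid serial openPMD output.\n")
    return folder
```

For a Series named ``data.json`` written by two ranks, this produces ``data.json.parallel/mpi_rank_0.json``, ``data.json.parallel/mpi_rank_1.json`` and ``data.json.parallel/README.txt``.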

.. note::

The openPMD-api has no direct support for reading a JSON/TOML dataset written in this parallel fashion. However, the single files (e.g. ``data.json.parallel/mpi_rank_0.json``) are each valid openPMD files and can be read separately.

Note that the auxiliary function ``json::merge()`` (or in Python ``openpmd_api.merge_json()``) is not adequate for merging the single JSON/TOML files back into one, since it does not merge anything below the array level.
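The limitation can be demonstrated with a stand-in merge function. ``merge_documents`` below is a hypothetical sketch of a key-wise JSON merge that, like the behavior described above, descends into objects only and does not combine anything below the array level; it is not the library's actual ``json::merge()`` implementation.

```python
def merge_documents(a, b):
    """Recursive key-wise merge of two JSON-like documents.
    Objects are merged key by key; at the array (or scalar) level,
    one side simply overwrites the other."""
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for k, v in b.items():
            out[k] = merge_documents(out[k], v) if k in out else v
        return out
    return b  # below object level: overwrite, do not merge

# Two ranks each wrote one half of the same dataset:
rank0 = {"data": [10, 11, None, None]}
rank1 = {"data": [None, None, 12, 13]}
merged = merge_documents(rank0, rank1)
# The second array overwrites the first instead of filling the gaps:
assert merged == {"data": [None, None, 12, 13]}
```

This is why the per-rank files cannot be recombined into one dataset this way: the per-rank array chunks would need element-wise recombination, which an object-level merge cannot provide.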


Example
15 changes: 14 additions & 1 deletion include/openPMD/IO/JSON/JSONIOHandler.hpp
@@ -24,17 +24,30 @@
#include "openPMD/IO/AbstractIOHandler.hpp"
#include "openPMD/IO/JSON/JSONIOHandlerImpl.hpp"

#if openPMD_HAVE_MPI
#include <mpi.h>
#endif

namespace openPMD
{
class JSONIOHandler : public AbstractIOHandler
{
public:
JSONIOHandler(
-std::string const &path,
+std::string path,
Access at,
openPMD::json::TracingJSON config,
JSONIOHandlerImpl::FileFormat,
std::string originalExtension);
#if openPMD_HAVE_MPI
JSONIOHandler(
std::string path,
Access at,
MPI_Comm,
openPMD::json::TracingJSON config,
JSONIOHandlerImpl::FileFormat,
std::string originalExtension);
#endif

~JSONIOHandler() override;

20 changes: 19 additions & 1 deletion include/openPMD/IO/JSON/JSONIOHandlerImpl.hpp
@@ -31,6 +31,9 @@

#include <istream>
#include <nlohmann/json.hpp>
#if openPMD_HAVE_MPI
#include <mpi.h>
#endif

#include <complex>
#include <fstream>
@@ -70,6 +73,7 @@ struct File

std::string name;
bool valid = true;
bool printedReadmeWarningAlready = false;
};

std::shared_ptr<FileState> fileState;
@@ -167,6 +171,15 @@ class JSONIOHandlerImpl : public AbstractIOHandlerImpl
FileFormat,
std::string originalExtension);

#if openPMD_HAVE_MPI
JSONIOHandlerImpl(
AbstractIOHandler *,
MPI_Comm,
openPMD::json::TracingJSON config,
FileFormat,
std::string originalExtension);
#endif

~JSONIOHandlerImpl() override;

void
@@ -230,6 +243,10 @@ class JSONIOHandlerImpl : public AbstractIOHandlerImpl
std::future<void> flush();

private:
#if openPMD_HAVE_MPI
std::optional<MPI_Comm> m_communicator;
#endif

using FILEHANDLE = std::fstream;

// map each Writable to its associated file
@@ -323,7 +340,8 @@ class JSONIOHandlerImpl : public AbstractIOHandlerImpl

// write to disk the json contents associated with the file
// remove from m_dirty if unsetDirty == true
-void putJsonContents(File const &, bool unsetDirty = true);
+auto putJsonContents(File const &, bool unsetDirty = true)
+    -> decltype(m_jsonVals)::iterator;

// figure out the file position of the writable
// (preferring the parent's file position) and extend it
19 changes: 17 additions & 2 deletions src/IO/AbstractIOHandlerHelper.cpp
@@ -125,8 +125,23 @@ std::unique_ptr<AbstractIOHandler> createIOHandler<json::TracingJSON>(
"ssc",
std::move(originalExtension));
case Format::JSON:
-throw error::WrongAPIUsage(
-"JSON backend not available in parallel openPMD.");
+return constructIOHandler<JSONIOHandler, openPMD_HAVE_JSON>(
+"JSON",
+path,
+access,
+comm,
+std::move(options),
+JSONIOHandlerImpl::FileFormat::Json,
+std::move(originalExtension));
+case Format::TOML:
+return constructIOHandler<JSONIOHandler, openPMD_HAVE_JSON>(
+"JSON",
+path,
+access,
+comm,
+std::move(options),
+JSONIOHandlerImpl::FileFormat::Toml,
+std::move(originalExtension));
default:
throw error::WrongAPIUsage(
"Unknown file format! Did you specify a file ending? Specified "
18 changes: 16 additions & 2 deletions src/IO/JSON/JSONIOHandler.cpp
@@ -26,15 +26,29 @@ namespace openPMD
JSONIOHandler::~JSONIOHandler() = default;

JSONIOHandler::JSONIOHandler(
-std::string const &path,
+std::string path,
Access at,
openPMD::json::TracingJSON jsonCfg,
JSONIOHandlerImpl::FileFormat format,
std::string originalExtension)
-: AbstractIOHandler{path, at}
+: AbstractIOHandler{std::move(path), at}
, m_impl{this, std::move(jsonCfg), format, std::move(originalExtension)}
{}

#if openPMD_HAVE_MPI
JSONIOHandler::JSONIOHandler(
std::string path,
Access at,
MPI_Comm comm,
openPMD::json::TracingJSON jsonCfg,
JSONIOHandlerImpl::FileFormat format,
std::string originalExtension)
: AbstractIOHandler{std::move(path), at}
, m_impl{JSONIOHandlerImpl{
this, comm, std::move(jsonCfg), format, std::move(originalExtension)}}
{}
#endif

std::future<void> JSONIOHandler::flush(internal::ParsedFlushParams &)
{
return m_impl.flush();