Dataset-specific JSON/TOML configuration #1646

Open: wants to merge 11 commits into base: dev
1 change: 1 addition & 0 deletions CMakeLists.txt
@@ -402,6 +402,7 @@ set(CORE_SOURCE
src/auxiliary/Date.cpp
src/auxiliary/Filesystem.cpp
src/auxiliary/JSON.cpp
src/auxiliary/JSONMatcher.cpp
src/auxiliary/Mpi.cpp
src/backend/Attributable.cpp
src/backend/BaseRecordComponent.cpp
111 changes: 98 additions & 13 deletions examples/13_write_dynamic_configuration.cpp
@@ -10,13 +10,15 @@ using namespace openPMD;

int main()
{
if (!getVariants()["adios2"])
if (!getVariants()["hdf5"])
{
// Example configuration below selects the HDF5 backend
return 0;
}

using position_t = double;

#if !__NVCOMPILER // see https://github.com/ToruNiina/toml11/issues/205
/*
* This example demonstrates how to use JSON/TOML-based dynamic
* configuration for openPMD.
@@ -34,14 +36,11 @@ int main()
# be passed by adding an at-sign `@` in front of the path
# The format will then be recognized by filename extension, i.e. .json or .toml

backend = "adios2"
backend = "hdf5"
iteration_encoding = "group_based"
# The following is only relevant in read mode
defer_iteration_parsing = true

[adios1.dataset]
transform = "blosc:compressor=zlib,shuffle=bit,lvl=5;nometa"

[adios2.engine]
type = "bp4"

@@ -60,13 +59,104 @@ parameters.clevel = 5
# type = "some other parameter"
# # ...

[hdf5.dataset]
chunks = "auto"
# Sometimes, dataset configurations should not affect all datasets, but only
# specific ones, e.g. only particle data.
# Dataset configurations can be given as a list, shown here using HDF5 as an example.
# In such lists, each entry is an object with two keys:
#
# 1. 'cfg': Mandatory key, this is the actual dataset configuration.
# 2. 'select': A Regex or a list of Regexes to match against the dataset name.
#
# This makes it possible to give dataset-specific configurations.
# The dataset name is the same as returned
# by `Attributable::myPath().openPMDPath()`.
# The regex must match against either the full path (e.g. "/data/1/meshes/E/x")
# or against the path within the iteration (e.g. "meshes/E/x").

# Example:
# Let HDF5 datasets be automatically chunked by default
[[hdf5.dataset]]
cfg.chunks = "auto"

# For particles, we can specify the chunking explicitly
[[hdf5.dataset]]
# Multiple selection regexes can be given as a list.
# They will be fused into a single regex '($^)|(regex1)|(regex2)|(regex3)|...'.
select = ["/data/1/particles/e/.*", "/data/2/particles/e/.*"]
cfg.chunks = [5]

# Matches are selected top-down, so the order of list entries is important.
[[hdf5.dataset]]
# Specifying only a single regex.
# The regex can match against the full dataset path
# or against the path within the Iteration.
# Capitalization is irrelevant.
select = "particles/e/.*"
CFG.CHUNKS = [10]
)END";
#else
/*
* This is the same configuration in JSON. It is needed for deprecated
* NVHPC compilers, which have problems with the ToruNiina/toml11
* library.
*/
std::string const defaults = R"(
{
"backend": "hdf5",
"defer_iteration_parsing": true,
"iteration_encoding": "group_based",

"adios2": {
"engine": {
"type": "bp4"
},
"dataset": {
"operators": [
{
"parameters": {
"clevel": 5
},
"type": "zlib"
}
]
}
},

"hdf5": {
"dataset": [
{
"cfg": {
"chunks": "auto"
}
},
{
"select": [
"/data/1/particles/e/.*",
"/data/2/particles/e/.*"
],
"cfg": {
"chunks": [
5
]
}
},
{
"select": "particles/e/.*",
"CFG": {
"CHUNKS": [
10
]
}
}
]
}
}
)";
#endif

// open file for writing
Series series =
Series("../samples/dynamicConfig.bp", Access::CREATE, defaults);
Series("../samples/dynamicConfig.h5", Access::CREATE, defaults);

Datatype datatype = determineDatatype<position_t>();
constexpr unsigned long length = 10ul;
@@ -103,11 +193,6 @@ chunks = "auto"
std::string const differentCompressionSettings = R"END(
{
"resizable": true,
"adios1": {
"dataset": {
"transform": "blosc:compressor=zlib,shuffle=bit,lvl=1;nometa"
}
},
"adios2": {
"dataset": {
"operators": [
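Note on the selection rules described in the example's TOML comments: a `select` list is fused into a single regex and matched against either the full dataset path or the path within the iteration. The following is a minimal standalone sketch of that logic using only `std::regex`; it is not the code in `src/auxiliary/JSONMatcher.cpp`. The names `Entry`, `fusePatterns`, `pathWithinIteration` and `selectConfig` are hypothetical, and the sketch assumes that the first matching entry wins, that an entry without `select` acts as the fallback, and that iterations live under `/data/<N>/`.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>
#include <vector>

// One entry of a `[[hdf5.dataset]]` list (hypothetical simplification).
struct Entry
{
    std::vector<std::string> select; // empty: the entry has no `select` key
    std::string config;              // stands in for the `cfg` object
};

// Fuse patterns into '($^)|(regex1)|(regex2)|...'. The leading '($^)' can
// only match an empty string, so it never matches a real dataset path.
std::string fusePatterns(std::vector<std::string> const &patterns)
{
    assert(!patterns.empty());
    std::string fused = "($^)";
    for (auto const &pattern : patterns)
        fused += "|(" + pattern + ")";
    return fused;
}

// Strip a leading "/data/<N>/" to obtain the path within the iteration.
// Assumption: iterations are stored below "/data/<N>/". The input is
// returned unchanged if that prefix is absent.
std::string pathWithinIteration(std::string const &fullPath)
{
    static std::regex const prefix("^/data/[0-9]+/");
    return std::regex_replace(fullPath, prefix, "");
}

// Walk the list top-down and return the config of the first entry whose
// fused regex matches the full path or the path within the iteration.
// Assumption: the first entry without `select` is used as the fallback.
std::optional<std::string>
selectConfig(std::vector<Entry> const &entries, std::string const &fullPath)
{
    std::string const shortPath = pathWithinIteration(fullPath);
    std::optional<std::string> fallback;
    for (auto const &entry : entries)
    {
        if (entry.select.empty())
        {
            if (!fallback)
                fallback = entry.config;
            continue;
        }
        std::regex const pattern(fusePatterns(entry.select));
        if (std::regex_match(fullPath, pattern) ||
            std::regex_match(shortPath, pattern))
            return entry.config;
    }
    return fallback;
}
```

Under these assumptions, and with the three entries from the example, a dataset such as `/data/1/particles/e/position/x` would pick `chunks = [5]`, the same dataset in iteration 3 would pick `chunks = [10]` through the within-iteration path, and `/data/1/meshes/E/x` would fall back to `chunks = "auto"`.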
2 changes: 0 additions & 2 deletions include/openPMD/IO/ADIOS/ADIOS2IOHandler.hpp
@@ -120,15 +120,13 @@ class ADIOS2IOHandlerImpl
ADIOS2IOHandlerImpl(
AbstractIOHandler *,
MPI_Comm,
json::TracingJSON config,
std::string engineType,
std::string specifiedExtension);

#endif // openPMD_HAVE_MPI

explicit ADIOS2IOHandlerImpl(
AbstractIOHandler *,
json::TracingJSON config,
std::string engineType,
std::string specifiedExtension);

37 changes: 26 additions & 11 deletions include/openPMD/IO/AbstractIOHandler.hpp
@@ -39,6 +39,11 @@

namespace openPMD
{
namespace json
{
class JsonMatcher;
}

/**
* @brief Determine what items should be flushed upon Series::flush()
*
@@ -186,6 +191,8 @@ class AbstractIOHandler
{
friend class Series;
friend class ADIOS2IOHandlerImpl;
friend class JSONIOHandlerImpl;
friend class HDF5IOHandlerImpl;
friend class detail::ADIOS2File;

private:
@@ -222,22 +229,30 @@ class AbstractIOHandler
m_encoding = encoding;
}

protected:
// Needs to be a pointer due to include structure, this header is
// transitively included in user code, but we don't reexport the JSON
// library
std::unique_ptr<json::JsonMatcher> jsonMatcher;

public:
#if openPMD_HAVE_MPI
AbstractIOHandler(std::string path, Access at, MPI_Comm)
: directory{std::move(path)}, m_backendAccess{at}, m_frontendAccess{at}
{}
template <typename TracingJSON>
AbstractIOHandler(
std::string path, Access at, TracingJSON &&jsonConfig, MPI_Comm);
#endif
AbstractIOHandler(std::string path, Access at)
: directory{std::move(path)}, m_backendAccess{at}, m_frontendAccess{at}
{}
virtual ~AbstractIOHandler() = default;

AbstractIOHandler(AbstractIOHandler const &) = default;
AbstractIOHandler(AbstractIOHandler &&) = default;
template <typename TracingJSON>
AbstractIOHandler(std::string path, Access at, TracingJSON &&jsonConfig);
virtual ~AbstractIOHandler();

AbstractIOHandler(AbstractIOHandler const &) = delete;
// std::queue::queue(queue&&) is not noexcept
// NOLINTNEXTLINE(performance-noexcept-move-constructor)
AbstractIOHandler(AbstractIOHandler &&) noexcept(false);

AbstractIOHandler &operator=(AbstractIOHandler const &) = default;
AbstractIOHandler &operator=(AbstractIOHandler &&) = default;
AbstractIOHandler &operator=(AbstractIOHandler const &) = delete;
AbstractIOHandler &operator=(AbstractIOHandler &&) noexcept;

/** Add provided task to queue according to FIFO.
*
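On the `std::unique_ptr<json::JsonMatcher>` member: holding a forward-declared class through a `std::unique_ptr` keeps the JSON types out of the public header, but it requires the destructor and move operations to be defined where the type is complete. That appears consistent with the destructor no longer being `= default` inside the header and with the copy operations being deleted. A minimal single-file sketch of the pattern follows; `Handler` and `json::Matcher` are hypothetical stand-ins, not the actual openPMD classes.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// ---- what would live in the public header ----
namespace json
{
class Matcher; // forward declaration only, the definition stays private
}

class Handler
{
public:
    explicit Handler(std::string config);
    ~Handler(); // declared here, defined where json::Matcher is complete

    // A unique_ptr member makes the class move-only.
    Handler(Handler const &) = delete;
    Handler &operator=(Handler const &) = delete;
    Handler(Handler &&) noexcept;
    Handler &operator=(Handler &&) noexcept;

    bool hasMatcher() const;
    std::string const &config() const;

private:
    std::unique_ptr<json::Matcher> m_matcher;
};

// ---- what would live in the implementation file ----
namespace json
{
class Matcher
{
public:
    explicit Matcher(std::string cfg) : config{std::move(cfg)}
    {}
    std::string config;
};
} // namespace json

Handler::Handler(std::string config)
    : m_matcher{std::make_unique<json::Matcher>(std::move(config))}
{}

// These can be defaulted, but only here, after json::Matcher is complete.
Handler::~Handler() = default;
Handler::Handler(Handler &&) noexcept = default;
Handler &Handler::operator=(Handler &&) noexcept = default;

bool Handler::hasMatcher() const
{
    return m_matcher != nullptr;
}

std::string const &Handler::config() const
{
    assert(m_matcher); // a moved-from Handler no longer owns a matcher
    return m_matcher->config;
}

static_assert(!std::is_copy_constructible_v<Handler>);
static_assert(std::is_move_constructible_v<Handler>);
```

Defaulting the destructor inline in the header would instead try to instantiate the deleter of `std::unique_ptr` on an incomplete type in every translation unit that destroys a `Handler`, which does not compile.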
5 changes: 1 addition & 4 deletions include/openPMD/IO/HDF5/HDF5IOHandlerImpl.hpp
@@ -42,10 +42,7 @@ class HDF5IOHandlerImpl : public AbstractIOHandlerImpl
friend class ParallelHDF5IOHandler;

public:
HDF5IOHandlerImpl(
AbstractIOHandler *,
json::TracingJSON config,
bool do_warn_unused_params = true);
HDF5IOHandlerImpl(AbstractIOHandler *, bool do_warn_unused_params = true);
~HDF5IOHandlerImpl() override;

void
3 changes: 1 addition & 2 deletions include/openPMD/IO/HDF5/ParallelHDF5IOHandlerImpl.hpp
@@ -37,8 +37,7 @@ namespace openPMD
class ParallelHDF5IOHandlerImpl : public HDF5IOHandlerImpl
{
public:
ParallelHDF5IOHandlerImpl(
AbstractIOHandler *, MPI_Comm, json::TracingJSON config);
ParallelHDF5IOHandlerImpl(AbstractIOHandler *, MPI_Comm);
~ParallelHDF5IOHandlerImpl() override;

MPI_Comm m_mpiComm;
8 changes: 8 additions & 0 deletions include/openPMD/IO/IOTask.hpp
@@ -42,6 +42,10 @@ namespace openPMD
{
class Attributable;
class Writable;
namespace json
{
class JsonMatcher;
}

Writable *getWritable(Attributable *);

@@ -356,6 +360,10 @@ struct OPENPMDAPI_EXPORT Parameter<Operation::CREATE_DATASET>
TracingJSON &,
std::string const &currentBackendName,
std::string const &warningMessage);

template <typename TracingJSON>
TracingJSON
compileJSONConfig(Writable const *writable, json::JsonMatcher &) const;
};

template <>
6 changes: 1 addition & 5 deletions include/openPMD/IO/JSON/JSONIOHandlerImpl.hpp
@@ -166,16 +166,12 @@ class JSONIOHandlerImpl : public AbstractIOHandlerImpl
};

explicit JSONIOHandlerImpl(
AbstractIOHandler *,
openPMD::json::TracingJSON config,
FileFormat,
std::string originalExtension);
AbstractIOHandler *, FileFormat, std::string originalExtension);

#if openPMD_HAVE_MPI
JSONIOHandlerImpl(
AbstractIOHandler *,
MPI_Comm,
openPMD::json::TracingJSON config,
FileFormat,
std::string originalExtension);
#endif
61 changes: 58 additions & 3 deletions include/openPMD/auxiliary/JSON.hpp
@@ -21,6 +21,12 @@

#pragma once

#include "openPMD/config.hpp"

#if openPMD_HAVE_MPI
#include <mpi.h>
#endif

#include <string>

namespace openPMD
@@ -53,13 +59,62 @@ namespace json
* users to overwrite default options, while keeping any other ones.
*
* @param defaultValue A string containing either a JSON or a TOML dataset.
* If the string begins with an `@`, the JSON/TOML dataset will be
* read from the filesystem at the specified path.
* @param overwrite A string containing either a JSON or TOML dataset (does
* not need to be the same as `defaultValue`).
* not need to be the same as `defaultValue`).
* If the string begins with an `@`, the JSON/TOML dataset will be
* read from the filesystem at the specified path.
* @return std::string The merged dataset, according to the above rules. If
* `defaultValue` was a JSON dataset, then as a JSON string, otherwise as a
* TOML string.
* `overwrite` was a JSON dataset, then as a JSON string, otherwise
* as a TOML string.
*/
std::string
merge(std::string const &defaultValue, std::string const &overwrite);

#if openPMD_HAVE_MPI
/**
* @brief Merge two JSON/TOML datasets into one.
*
* Merging rules:
* 1. If both `defaultValue` and `overwrite` are JSON/TOML objects, then the
* resulting JSON/TOML object will contain the union of both objects'
* keys. If a key is specified in both objects, the values corresponding
* to the key are merged recursively. Keys that point to a null value
* after this procedure will be pruned.
* 2. In any other case, the JSON/TOML dataset `defaultValue` is replaced in
* its entirety with the JSON/TOML dataset `overwrite`.
*
* Note that item 2 means that datasets of different type will replace each
* other without error.
* It also means that array types will replace each other without any notion
* of appending or merging.
*
* Possible use case:
* An application uses openPMD-api and wants to do the following:
* 1. Set some default backend options as JSON/TOML parameters.
* 2. Let its users specify custom backend options additionally.
*
* By using the json::merge() function, this application can then allow
* users to overwrite default options, while keeping any other ones.
*
* @param defaultValue A string containing either a JSON or a TOML dataset.
* If the string begins with an `@`, the JSON/TOML dataset will be
* read in parallel (using the MPI Communicator)
* from the filesystem at the specified path.
* @param overwrite A string containing either a JSON or TOML dataset (does
* not need to be the same as `defaultValue`).
* If the string begins with an `@`, the JSON/TOML dataset will be
* read in parallel (using the MPI Communicator)
* from the filesystem at the specified path.
* @return std::string The merged dataset, according to the above rules. If
* `overwrite` was a JSON dataset, then as a JSON string, otherwise
* as a TOML string.
*/
std::string merge(
std::string const &defaultValue,
std::string const &overwrite,
MPI_Comm);
#endif
} // namespace json
} // namespace openPMD
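The merging rules documented for `json::merge()` can be illustrated without a JSON or TOML library. The following is a rough sketch under stated assumptions, not the library's implementation: `Node`, `makeNull`, `makeScalar`, `makeObject` and `mergeNodes` are hypothetical names, every non-object value (including arrays) is modeled as an opaque scalar kept as text, and null-valued keys are pruned only at the levels that are actually merged.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a JSON/TOML value.
struct Node;
using NodePtr = std::shared_ptr<Node const>;

struct Node
{
    enum class Kind { Null, Scalar, Object };
    Kind kind = Kind::Null;
    std::string scalar;                     // used for Kind::Scalar
    std::map<std::string, NodePtr> members; // used for Kind::Object
};

NodePtr makeNull()
{
    return std::make_shared<Node>();
}

NodePtr makeScalar(std::string text)
{
    auto node = std::make_shared<Node>();
    node->kind = Node::Kind::Scalar;
    node->scalar = std::move(text);
    return node;
}

NodePtr makeObject(std::map<std::string, NodePtr> members)
{
    auto node = std::make_shared<Node>();
    node->kind = Node::Kind::Object;
    node->members = std::move(members);
    return node;
}

NodePtr mergeNodes(NodePtr const &defaultValue, NodePtr const &overwrite)
{
    assert(defaultValue && overwrite);
    bool const bothObjects = defaultValue->kind == Node::Kind::Object &&
        overwrite->kind == Node::Kind::Object;
    // Rule 2: anything other than two objects is replaced wholesale.
    if (!bothObjects)
        return overwrite;

    // Rule 1: union of keys, shared keys are merged recursively.
    std::map<std::string, NodePtr> merged = defaultValue->members;
    for (auto const &[key, value] : overwrite->members)
    {
        auto existing = merged.find(key);
        if (existing == merged.end())
            merged.emplace(key, value);
        else
            existing->second = mergeNodes(existing->second, value);
    }
    // Keys that point to null after merging are pruned.
    for (auto it = merged.begin(); it != merged.end();)
    {
        if (it->second->kind == Node::Kind::Null)
            it = merged.erase(it);
        else
            ++it;
    }
    return makeObject(std::move(merged));
}
```

In this model, an application default such as `{"hdf5": {"dataset": {"chunks": "auto"}}}` merged with a user override of `{"hdf5": {"dataset": {"chunks": null}}}` leaves an empty `dataset` object, which is how a user could unset a default while keeping all other defaults intact.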