JSON/TOML backend: introduce abbreviated IO modes #1493

Open
wants to merge 27 commits into
base: dev
224a0af
Add openPMD 2.0 standard setting
franzpoeschel Oct 9, 2023
2ef798c
Fix clang-tidy warning
franzpoeschel Nov 28, 2023
c2f6e30
Introduce getStandardMaximum(), deprecate getStandard()
franzpoeschel Dec 4, 2023
fe5c1b7
Add warning: openPMD 2.0 still under development
franzpoeschel May 24, 2024
5525a94
Introduce dataset template mode to JSON backend
franzpoeschel Aug 4, 2023
c65b091
Write used mode to JSON file
franzpoeschel Aug 4, 2023
de877d3
Use Attribute::getOptional for snapshot attribute
franzpoeschel Feb 23, 2023
a62841e
Introduce attribute mode
franzpoeschel Aug 4, 2023
8a8e794
Add example 14_toml_template.cpp
franzpoeschel Aug 4, 2023
f24ae79
Use Datatype::UNDEFINED to indicate no dataset definition in template
franzpoeschel Mar 10, 2023
59c93c9
Extend example
franzpoeschel May 19, 2022
5a7ed2e
Test short attribute mode
franzpoeschel Aug 7, 2023
c25743f
Copy datatypeToString to JSON implementation
franzpoeschel Aug 7, 2023
c8dd534
Fix after rebase: Init JSON config in parallel mode
franzpoeschel Sep 22, 2023
baf1bdc
Fix after rebase: Don't erase JSON datasets when writing
franzpoeschel Sep 22, 2023
0721e6e
openpmd-pipe: use short modes for test
franzpoeschel Oct 12, 2023
fd4a00a
Less intrusive warnings, allow disabling them
franzpoeschel Oct 12, 2023
c35c2e4
TOML: Use short modes by default
franzpoeschel Oct 12, 2023
e2fa15e
Python formatting
franzpoeschel Oct 26, 2023
deba1e9
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 24, 2023
bd357fe
Documentation
franzpoeschel Nov 24, 2023
a75060e
Short mode in default in openPMD >= 2.
franzpoeschel Nov 24, 2023
3f95d77
Short value by default in TOML
franzpoeschel Mar 19, 2024
1e948b2
Store the openPMD version information in the IOHandler
franzpoeschel Mar 19, 2024
dcb16d0
Fixes
franzpoeschel Mar 26, 2024
a4a0771
Adapt test to recent rebase
franzpoeschel Jun 7, 2024
20d6502
toml11 4.0 compatibility
franzpoeschel Aug 5, 2024
4 changes: 4 additions & 0 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -771,6 +771,7 @@ set(openPMD_EXAMPLE_NAMES
10_streaming_read
12_span_write
13_write_dynamic_configuration
14_toml_template
)
set(openPMD_PYTHON_EXAMPLE_NAMES
2_read_serial
Expand Down Expand Up @@ -1383,6 +1384,9 @@ if(openPMD_BUILD_TESTING)
${openPMD_RUNTIME_OUTPUT_DIRECTORY}/openpmd-pipe \
--infile ../samples/git-sample/thetaMode/data_%T.bp \
--outfile ../samples/git-sample/thetaMode/data%T.json \
--outconfig ' \
json.attribute.mode = \"short\" \n\
json.dataset.mode = \"template_no_warn\"' \
"
WORKING_DIRECTORY ${openPMD_RUNTIME_OUTPUT_DIRECTORY}
)
Expand Down
36 changes: 31 additions & 5 deletions docs/source/backends/json.rst
Expand Up @@ -38,20 +38,46 @@ when working with the JSON backend.
Datasets and groups have the same namespace, meaning that there may not be a subgroup
and a dataset with the same name contained in one group.

Any **openPMD dataset** is a JSON object with three keys:
Datasets
........

* ``attributes``: Attributes associated with the dataset. May be ``null`` or not present if no attributes are associated with the dataset.
* ``datatype``: A string describing the type of the stored data.
* ``data`` A nested array storing the actual data in row-major manner.
Datasets can be stored in two modes, either as actual datasets or as dataset templates.
The mode is selected by the :ref:`JSON/TOML parameter<backendconfig>` ``json.dataset.mode`` (resp. ``toml.dataset.mode``) with possible values ``["dataset", "template"]`` (default: ``"dataset"``).

Stored as an actual dataset, an **openPMD dataset** is a JSON object with three JSON keys:

* ``datatype`` (required): A string describing the type of the stored data.
* ``data`` (required): A nested array storing the actual data in row-major manner.
The data needs to be consistent with the fields ``datatype`` and ``extent``.
Checking whether this key points to an array can be (and is internally) used to distinguish groups from datasets.
* ``attributes``: Attributes associated with the dataset. May be ``null`` or not present if no attributes are associated with the dataset.

Stored as a **dataset template**, an openPMD dataset is represented by three JSON keys:

* ``datatype`` (required): As above.
* ``extent`` (required): A list of integers, describing the extent of the dataset.
* ``attributes``: As above.

**Attributes** are stored as a JSON object with a key for each attribute.
This mode stores only the dataset metadata.
Chunk load/store operations are ignored.
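
To make the two layouts concrete, here is a minimal sketch in plain C++ (no openPMD or JSON library involved) that renders the JSON one might expect for each mode. The helper names `renderTemplate`, `renderNulls` and `renderDataset` are hypothetical and the datatype string `FLOAT` is only illustrative. The optional ``attributes`` key is omitted, and the all-``null`` array assumes a JSON dataset to which no chunk has been written yet.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using Extent = std::vector<std::uint64_t>;

// Hypothetical helper: "template" mode keeps only metadata (datatype + extent).
std::string renderTemplate(std::string const &datatype, Extent const &extent)
{
    std::string res = "{\"datatype\": \"" + datatype + "\", \"extent\": [";
    for (std::size_t i = 0; i < extent.size(); ++i)
    {
        if (i > 0)
        {
            res += ", ";
        }
        res += std::to_string(extent[i]);
    }
    return res + "]}";
}

// Hypothetical helper: nested row-major array holding null for every
// value that has not been written.
std::string renderNulls(Extent const &extent, std::size_t dim = 0)
{
    if (dim == extent.size())
    {
        return "null";
    }
    std::string res = "[";
    for (std::uint64_t i = 0; i < extent[dim]; ++i)
    {
        if (i > 0)
        {
            res += ", ";
        }
        res += renderNulls(extent, dim + 1);
    }
    return res + "]";
}

// Hypothetical helper: "dataset" mode stores the datatype and the data itself;
// the extent is implied by the shape of the nested array.
std::string renderDataset(std::string const &datatype, Extent const &extent)
{
    return "{\"datatype\": \"" + datatype +
        "\", \"data\": " + renderNulls(extent) + "}";
}
```

Calling `renderTemplate("FLOAT", {5, 5})` yields an object with only ``datatype`` and ``extent``, while `renderDataset("FLOAT", {2, 2})` yields an object whose ``data`` key is a 2x2 nested array. The size of a template stays constant as the extent grows, whereas the dataset form grows with the number of elements.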

Attributes
..........

In order to avoid name clashes, attributes are generally stored within a separate subgroup ``attributes``.

Attributes can be stored in two formats.
The format is selected by the :ref:`JSON/TOML parameter<backendconfig>` ``json.attribute.mode`` (resp. ``toml.attribute.mode``) with possible values ``["long", "short"]``. The default is ``"long"`` for JSON in openPMD 1.* and ``"short"`` otherwise, i.e. generally in openPMD 2.* and always in TOML.

Attributes in **long format** store the datatype explicitly, by representing attributes as JSON objects.
Every such attribute is itself a JSON object with two keys:

* ``datatype``: A string describing the type of the value.
* ``value``: The actual value of type ``datatype``.

Attributes in **short format** are stored as just the plain value of the attribute.
Since JSON/TOML values are pretty-printed in a human-readable format, byte-level type details (e.g. the distinction between different integer types) can be lost when those values are read back later.
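
The difference between the two formats, and the type information that the short format drops, can be sketched in a few lines of plain C++. This is an illustration only, not the backend's implementation: the struct `IntAttribute`, the functions `renderLong` and `renderShort`, and the datatype labels `UCHAR` and `LONG` are assumptions made for the example.

```cpp
#include <cstdint>
#include <string>

// Hypothetical in-memory attribute: a datatype label plus an integer payload.
struct IntAttribute
{
    std::string datatype;
    std::int64_t value;
};

// Long format: a JSON object with the two keys "datatype" and "value".
std::string renderLong(IntAttribute const &attr)
{
    return "{\"datatype\": \"" + attr.datatype +
        "\", \"value\": " + std::to_string(attr.value) + "}";
}

// Short format: only the value itself; the datatype label is not written.
std::string renderShort(IntAttribute const &attr)
{
    return std::to_string(attr.value);
}
```

In this sketch, `renderShort({"UCHAR", 5})` and `renderShort({"LONG", 5})` produce the same text, so a reader of the short format has to pick an integer type on its own. The long format keeps the two cases distinguishable at the cost of a more verbose file.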

TOML File Format
----------------

Expand Down
23 changes: 19 additions & 4 deletions docs/source/details/backendconfig.rst
Expand Up @@ -104,6 +104,8 @@ The key ``rank_table`` allows specifying the creation of a **rank table**, used
Configuration Structure per Backend
-----------------------------------

Please refer to the respective backends' documentation for further information on their configuration.

.. _backendconfig-adios2:

ADIOS2
Expand Down Expand Up @@ -231,8 +233,21 @@ The parameters eligible for being passed to flush calls may be configured global

.. _backendconfig-other:

Other backends
^^^^^^^^^^^^^^
JSON/TOML
^^^^^^^^^

Do currently not read the configuration string.
Please refer to the respective backends' documentations for further information on their configuration.
A full configuration of the JSON backend:

.. literalinclude:: json.json
   :language: json

The TOML backend is configured analogously, replacing the ``"json"`` key with ``"toml"``.

All keys found under ``json.dataset`` (resp. ``toml.dataset``) are applicable globally as well as per dataset.
Explanation of the single keys:

* ``json.dataset.mode`` / ``toml.dataset.mode``: One of ``"dataset"`` (default) or ``"template"``.
In "dataset" mode, the dataset will be written as an n-dimensional (recursive) array, padded with nulls (JSON) or zeroes (TOML) for missing values.
In "template" mode, only the dataset metadata (type, extent and attributes) are stored and no chunks can be written or read.
* ``json.attribute.mode`` / ``toml.attribute.mode``: One of ``"long"`` (default for JSON in openPMD 1.*) or ``"short"`` (default in openPMD 2.* and always in TOML).
The long format explicitly encodes the attribute type on disk; the short format writes only the attribute value as a JSON/TOML value, requiring readers to recover the type.
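
Since the TOML backend is configured by swapping the top-level key, a configuration string for either backend can be assembled by one small function. The following is a sketch under that assumption; `makeBackendConfig` is a hypothetical helper that only builds a string and does not validate the mode names.

```cpp
#include <string>

// Hypothetical helper: builds a JSON configuration string for the backend
// named by backendKey ("json" or "toml"). No validation is performed.
std::string makeBackendConfig(
    std::string const &backendKey,
    std::string const &datasetMode,
    std::string const &attributeMode)
{
    return "{\"" + backendKey + "\": {\"dataset\": {\"mode\": \"" +
        datasetMode + "\"}, \"attribute\": {\"mode\": \"" + attributeMode +
        "\"}}}";
}
```

A string such as `makeBackendConfig("toml", "template", "short")` would then be passed as the third constructor argument of `openPMD::Series`, in the same way the example `14_toml_template.cpp` in this pull request passes its `config` string.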
10 changes: 10 additions & 0 deletions docs/source/details/json.json
@@ -0,0 +1,10 @@
{
  "json": {
    "dataset": {
      "mode": "template"
    },
    "attribute": {
      "mode": "short"
    }
  }
}
111 changes: 111 additions & 0 deletions examples/14_toml_template.cpp
@@ -0,0 +1,111 @@
#include <openPMD/openPMD.hpp>

#include <algorithm> // std::find
#include <string>

std::string backendEnding()
{
    auto extensions = openPMD::getFileExtensions();
    if (auto it = std::find(extensions.begin(), extensions.end(), "toml");
        it != extensions.end())
    {
        return *it;
    }
    else
    {
        // Fallback for buggy old NVidia compiler
        return "json";
    }
}

void write()
{
    std::string config = R"(
{
  "iteration_encoding": "variable_based",
  "json": {
    "dataset": {"mode": "template"},
    "attribute": {"mode": "short"}
  },
  "toml": {
    "dataset": {"mode": "template"},
    "attribute": {"mode": "short"}
  }
}
)";

    openPMD::Series writeTemplate(
        "../samples/tomlTemplate." + backendEnding(),
        openPMD::Access::CREATE,
        config);
    auto iteration = writeTemplate.writeIterations()[0];

    openPMD::Dataset ds{openPMD::Datatype::FLOAT, {5, 5}};

    auto temperature =
        iteration.meshes["temperature"][openPMD::RecordComponent::SCALAR];
    temperature.resetDataset(ds);

    auto E = iteration.meshes["E"];
    E["x"].resetDataset(ds);
    E["y"].resetDataset(ds);
    /*
     * Don't specify datatype and extent for this one to indicate that this
     * information is not yet known.
     */
    E["z"].resetDataset({openPMD::Datatype::UNDEFINED});

    ds.extent = {10};

    auto electrons = iteration.particles["e"];
    electrons["position"]["x"].resetDataset(ds);
    electrons["position"]["y"].resetDataset(ds);
    electrons["position"]["z"].resetDataset(ds);

    electrons["positionOffset"]["x"].resetDataset(ds);
    electrons["positionOffset"]["y"].resetDataset(ds);
    electrons["positionOffset"]["z"].resetDataset(ds);
    electrons["positionOffset"]["x"].makeConstant(3.14);
    electrons["positionOffset"]["y"].makeConstant(3.14);
    electrons["positionOffset"]["z"].makeConstant(3.14);

    ds.dtype = openPMD::determineDatatype<uint64_t>();
    electrons.particlePatches["numParticles"][openPMD::RecordComponent::SCALAR]
        .resetDataset(ds);
    electrons
        .particlePatches["numParticlesOffset"][openPMD::RecordComponent::SCALAR]
        .resetDataset(ds);
    electrons.particlePatches["offset"]["x"].resetDataset(ds);
    electrons.particlePatches["offset"]["y"].resetDataset(ds);
    electrons.particlePatches["offset"]["z"].resetDataset(ds);
    electrons.particlePatches["extent"]["x"].resetDataset(ds);
    electrons.particlePatches["extent"]["y"].resetDataset(ds);
    electrons.particlePatches["extent"]["z"].resetDataset(ds);
}

void read()
{
    /*
     * The config is entirely optional, these things are also detected
     * automatically when reading
     */

    // std::string config = R"(
    // {
    //   "iteration_encoding": "variable_based",
    //   "toml": {
    //     "dataset": {"mode": "template"},
    //     "attribute": {"mode": "short"}
    //   }
    // }
    // )";

    openPMD::Series read(
        "../samples/tomlTemplate." + backendEnding(),
        openPMD::Access::READ_LINEAR);
    read.readIterations(); // @todo change to read.parseBase()
    openPMD::helper::listSeries(read);
}

int main()
{
    write();
    read();
}
2 changes: 1 addition & 1 deletion include/openPMD/Dataset.hpp
Expand Up @@ -44,7 +44,7 @@ class Dataset
JOINED_DIMENSION = std::numeric_limits<std::uint64_t>::max()
};

Dataset(Datatype, Extent, std::string options = "{}");
Dataset(Datatype, Extent = {1}, std::string options = "{}");

/**
* @brief Constructor that sets the datatype to undefined.
Expand Down
6 changes: 6 additions & 0 deletions include/openPMD/Error.hpp
Expand Up @@ -109,6 +109,12 @@ namespace error
public:
NoSuchAttribute(std::string attributeName);
};

class IllegalInOpenPMDStandard : public Error
{
public:
IllegalInOpenPMDStandard(std::string what);
};
} // namespace error

/**
Expand Down
2 changes: 2 additions & 0 deletions include/openPMD/IO/AbstractIOHandler.hpp
Expand Up @@ -186,9 +186,11 @@ class AbstractIOHandler
{
friend class Series;
friend class ADIOS2IOHandlerImpl;
friend class JSONIOHandlerImpl;
friend class detail::ADIOS2File;

private:
std::string m_openPMDVersion;
IterationEncoding m_encoding = IterationEncoding::groupBased;

void setIterationEncoding(IterationEncoding encoding)
Expand Down
1 change: 1 addition & 0 deletions include/openPMD/IO/JSON/JSONIOHandler.hpp
Expand Up @@ -23,6 +23,7 @@

#include "openPMD/IO/AbstractIOHandler.hpp"
#include "openPMD/IO/JSON/JSONIOHandlerImpl.hpp"
#include "openPMD/auxiliary/JSON_internal.hpp"

#if openPMD_HAVE_MPI
#include <mpi.h>
Expand Down
67 changes: 65 additions & 2 deletions include/openPMD/IO/JSON/JSONIOHandlerImpl.hpp
Expand Up @@ -180,6 +180,8 @@ class JSONIOHandlerImpl : public AbstractIOHandlerImpl
std::string originalExtension);
#endif

void init(openPMD::json::TracingJSON config);

~JSONIOHandlerImpl() override;

void
Expand Down Expand Up @@ -265,8 +267,69 @@ class JSONIOHandlerImpl : public AbstractIOHandlerImpl
*/
FileFormat m_fileFormat{};

std::string backendConfigKey() const;

/*
* First return value: The location of the JSON value (either "json" or
* "toml").
* Second return value: The value found at this location, if any.
*/
std::pair<std::string, std::optional<openPMD::json::TracingJSON>>
getBackendConfig(openPMD::json::TracingJSON &) const;

std::string m_originalExtension;

enum class SpecificationVia
{
DefaultValue,
Manually
};

/////////////////////
// Dataset IO mode //
/////////////////////

enum class IOMode
{
Dataset,
Template
};

IOMode m_mode = IOMode::Dataset;
SpecificationVia m_IOModeSpecificationVia = SpecificationVia::DefaultValue;
bool m_printedSkippedWriteWarningAlready = false;

struct DatasetMode
{
IOMode m_IOMode;
SpecificationVia m_specificationVia;
bool m_skipWarnings;

template <typename A, typename B, typename C>
operator std::tuple<A, B, C>()
{
return std::tuple<A, B, C>{
m_IOMode, m_specificationVia, m_skipWarnings};
}
};
DatasetMode retrieveDatasetMode(openPMD::json::TracingJSON &config) const;

///////////////////////
// Attribute IO mode //
///////////////////////

enum class AttributeMode
{
Short,
Long
};

AttributeMode m_attributeMode = AttributeMode::Long;
SpecificationVia m_attributeModeSpecificationVia =
SpecificationVia::DefaultValue;

std::pair<AttributeMode, SpecificationVia>
retrieveAttributeMode(openPMD::json::TracingJSON &config) const;

// HELPER FUNCTIONS

// will use the IOHandler to retrieve the correct directory.
Expand Down Expand Up @@ -313,7 +376,7 @@ class JSONIOHandlerImpl : public AbstractIOHandlerImpl
// essentially: m_i = \prod_{j=0}^{i-1} extent_j
static Extent getMultiplicators(Extent const &extent);

static Extent getExtent(nlohmann::json &j);
static std::pair<Extent, IOMode> getExtent(nlohmann::json &j);

// remove single '/' in the beginning and end of a string
static std::string removeSlashes(std::string);
Expand Down Expand Up @@ -371,7 +434,7 @@ class JSONIOHandlerImpl : public AbstractIOHandlerImpl

// check whether the json reference contains a valid dataset
template <typename Param>
void verifyDataset(Param const &parameters, nlohmann::json &);
IOMode verifyDataset(Param const &parameters, nlohmann::json &);

static nlohmann::json platformSpecifics();

Expand Down
2 changes: 1 addition & 1 deletion include/openPMD/RecordComponent.hpp
Expand Up @@ -173,7 +173,7 @@ class RecordComponent : public BaseRecordComponent
*
* @return RecordComponent&
*/
virtual RecordComponent &resetDataset(Dataset);
RecordComponent &resetDataset(Dataset);

uint8_t getDimensionality() const;
Extent getExtent() const;
Expand Down