Add JSON schema #1426

Open. Wants to merge 11 commits into base: dev
18 changes: 17 additions & 1 deletion .github/workflows/linux.yml
@@ -257,7 +257,7 @@ jobs:
- name: Install
run: |
sudo apt-get update
sudo apt-get install g++ libopenmpi-dev libhdf5-openmpi-dev python3 python3-numpy python3-mpi4py python3-pandas
sudo apt-get install g++ libopenmpi-dev libhdf5-openmpi-dev python3 python3-numpy python3-mpi4py python3-pandas python3-pip
# TODO ADIOS2
- name: Build
env: {CXXFLAGS: -Werror, PKG_CONFIG_PATH: /usr/lib/x86_64-linux-gnu/pkgconfig}
@@ -272,6 +272,22 @@ jobs:
cd build
ctest --output-on-failure

python3 -m pip install jsonschema
cd ../share/openPMD/json_schema
PATH="../../../build/bin:$PATH" make -j 2
# We need to exclude the thetaMode example since that has a different
# meshesPath and the JSON schema needs to hardcode that.
Comment on lines +278 to +279

Member:
Are we able to patch this in check.py?

Contributor Author:
Not very easily. The JSON schema lives on the file system, and the individual .json files refer to each other by file name. Changing this would require (1) traversing the entire JSON schema and overriding the meshes path, the particles path and the references, and (2) somehow setting up python-jsonschema to cross-reference in-memory schemas (I don't even know whether it supports that), both at runtime of check.py.

find ../../../build/samples/ \
! -path '*thetaMode*' \
! -path '/*many_iterations/*' \
! -name 'profiling.json' \
! -name '*config.json' \
-iname '*.json' \
| while read i; do
echo "Checking $i"
./check.py "$i"
done
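
For readers less familiar with `find`, the file selection above can be sketched in Python. This is only an illustration of the filter logic, not part of the PR: `select_json_files` is a hypothetical helper, and the glob patterns are copied verbatim from the workflow step. Like `find -path`, `fnmatchcase` lets `*` match across `/`.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns copied verbatim from the workflow step above.
EXCLUDED_PATH_GLOBS = ["*thetaMode*", "/*many_iterations/*"]  # find ! -path
EXCLUDED_NAME_GLOBS = ["profiling.json", "*config.json"]      # find ! -name


def select_json_files(paths):
    """Return the paths that the find expression would hand to check.py."""
    selected = []
    for path in paths:
        name = PurePosixPath(path).name
        # "! -path" tests are applied to the whole path string.
        if any(fnmatchcase(path, glob) for glob in EXCLUDED_PATH_GLOBS):
            continue
        # "! -name" tests are applied to the last path component only.
        if any(fnmatchcase(name, glob) for glob in EXCLUDED_NAME_GLOBS):
            continue
        # "-iname '*.json'" is a case-insensitive match on the file name.
        if fnmatchcase(name.lower(), "*.json"):
            selected.append(path)
    return selected
```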

musllinux_py10:
runs-on: ubuntu-20.04
if: github.event.pull_request.draft == false
6 changes: 5 additions & 1 deletion CMakeLists.txt
@@ -747,11 +747,12 @@ set(openPMD_TEST_NAMES
# command line tools
set(openPMD_CLI_TOOL_NAMES
ls
convert-json-toml
)
set(openPMD_PYTHON_CLI_TOOL_NAMES
pipe
)
set(openPMD_PYTHON_CLI_MODULE_NAMES ${openPMD_CLI_TOOL_NAMES})
set(openPMD_PYTHON_CLI_MODULE_NAMES ls)
# examples
set(openPMD_EXAMPLE_NAMES
1_structure
@@ -920,6 +921,9 @@ if(openPMD_BUILD_CLI_TOOLS)
endif()

target_link_libraries(openpmd-${toolname} PRIVATE openPMD)
target_include_directories(openpmd-${toolname} SYSTEM PRIVATE
$<TARGET_PROPERTY:openPMD::thirdparty::nlohmann_json,INTERFACE_INCLUDE_DIRECTORIES>
$<TARGET_PROPERTY:openPMD::thirdparty::toml11,INTERFACE_INCLUDE_DIRECTORIES>)
endforeach()
endif()

15 changes: 12 additions & 3 deletions include/openPMD/auxiliary/JSON_internal.hpp
@@ -190,16 +190,25 @@ namespace json
* @param options as a parsed JSON object.
* @param considerFiles If yes, check if `options` refers to a file and read
* from there.
* @param convertLowercase If yes, lowercase conversion is applied
* recursively to keys and values, except for some hardcoded places
* that should be left untouched.
*/
ParsedConfig parseOptions(std::string const &options, bool considerFiles);
ParsedConfig parseOptions(
std::string const &options,
bool considerFiles,
bool convertLowercase = true);

#if openPMD_HAVE_MPI

/**
* Parallel version of parseOptions(). MPI-collective.
*/
ParsedConfig
parseOptions(std::string const &options, MPI_Comm comm, bool considerFiles);
ParsedConfig parseOptions(
std::string const &options,
MPI_Comm comm,
bool considerFiles,
bool convertLowercase = true);

#endif
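
The effect of the new `convertLowercase` parameter can be pictured with a small Python sketch. This is not the openPMD-api implementation, which is C++: `lowercase_config`, `parse_options` and the `untouched` path set are hypothetical names, and the exempted path in the usage note is an assumption chosen for illustration, since the diff does not list the hardcoded places. Passing `convert_lowercase=False` leaves case-sensitive input, such as JSON schema keywords like `oneOf`, unchanged.

```python
import json


def lowercase_config(node, path=(), untouched=frozenset()):
    """Recursively lowercase keys and string values, skipping exempted paths."""
    if path in untouched:
        # Hypothetical stand-in for the "hardcoded places" left untouched.
        return node
    if isinstance(node, dict):
        return {
            key.lower(): lowercase_config(value, path + (key.lower(),), untouched)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [lowercase_config(item, path, untouched) for item in node]
    if isinstance(node, str):
        return node.lower()
    return node  # numbers, booleans and null are returned as-is


def parse_options(options, convert_lowercase=True, untouched=frozenset()):
    """Parse a JSON string; optionally apply the lowercase conversion."""
    parsed = json.loads(options)
    if convert_lowercase:
        return lowercase_config(parsed, untouched=untouched)
    return parsed
```

For example, `parse_options('{"Backend": "ADIOS2"}')` yields `{"backend": "adios2"}`, while the same call with `convert_lowercase=False` returns the input unchanged.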

13 changes: 13 additions & 0 deletions share/openPMD/json_schema/Makefile
@@ -0,0 +1,13 @@
convert := openpmd-convert-json-toml

json_files = attribute_defs.json attributes.json dataset_defs.json iteration.json mesh.json mesh_record_component.json particle_patches.json particle_species.json patch_record.json record.json record_component.json series.json

.PHONY: all
all: $(json_files)

$(json_files): %.json: %.toml
$(convert) @$^ > $@

.PHONY: clean
clean:
-rm $(json_files)
47 changes: 47 additions & 0 deletions share/openPMD/json_schema/README.md
@@ -0,0 +1,47 @@
# JSON Validation

This folder contains a JSON schema for validation of openPMD files written as `.json` files.

## Usage

### Generating the JSON schema

For readability, maintainability and documentation purposes, the JSON schema is written in `.toml` format and must be "compiled" to `.json` files before use.
To do this, the openPMD-api installs a tool named `openpmd-convert-json-toml` which can be used to convert between JSON and TOML files in both directions, e.g.:

```bash
openpmd-convert-json-toml @series.toml > series.json
```

A `Makefile` is provided in this folder to simplify the application of this conversion tool.

### Verifying a file against the JSON schema

In theory, any JSON validator should be able to apply this JSON schema. However, the schema is split across multiple files, and most validators require special care to set up the links between them. A Python script `check.py` is provided in this folder; it sets up the [Python jsonschema](https://python-jsonschema.readthedocs.io) library and verifies a file against the schema, e.g.:

```bash
./check.py path/to/my/dataset.json
```

For further usage notes, see the documentation of the script itself: `./check.py --help`.
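
The "links between the single files" can be pictured with a standard-library sketch that resolves a reference by file name. This is not how `check.py` works (it relies on the Python jsonschema library); `load_store` and `resolve_ref` are hypothetical helpers, and the reference format `other.json#/json/pointer` is an assumption about how the files point at each other.

```python
import json
from pathlib import Path


def load_store(directory):
    """Read every *.json file in `directory` into a dict keyed by file name."""
    return {
        path.name: json.loads(path.read_text())
        for path in Path(directory).glob("*.json")
    }


def resolve_ref(ref, current_file, store):
    """Resolve 'other.json#/json/pointer' or '#/json/pointer' against the store."""
    file_part, _, pointer = ref.partition("#")
    # An empty file part means the reference stays within the current file.
    target_file = file_part or current_file
    node = store[target_file]
    for token in pointer.split("/")[1:]:
        # Undo JSON Pointer escaping (RFC 6901): "~1" is "/", "~0" is "~".
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return target_file, node
```

A validator needs the equivalent of `store` to be registered up front, which is the setup work that `check.py` takes care of.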

## Caveats

The openPMD standard is not entirely expressible in terms of a JSON schema:

* Many semantic dependencies, e.g. that the `position/x` and `position/y` vectors of a particle species be of the same size, or that the `axisLabels` have the same dimensionality as the dataset itself, will go unchecked.
* The `meshesPath` is assumed to be `meshes/` and the `particlesPath` is assumed to be `particles/`. This dependency cannot be expressed.
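
Semantic dependencies like the ones listed above would need a separate check after schema validation. The following is a minimal sketch under stated assumptions: both functions are hypothetical, and they take already-extracted extents and labels as plain Python values rather than reading the actual layout written by the JSON backend.

```python
def check_vector_components(extents: dict[str, list[int]]) -> list[str]:
    """Report components whose extent differs from the first component's."""
    if not extents:
        return []
    reference_name, reference = next(iter(extents.items()))
    return [
        f"{name} has extent {extent}, but {reference_name} has {reference}"
        for name, extent in extents.items()
        if extent != reference
    ]


def check_axis_labels(axis_labels: list[str], extent: list[int]) -> list[str]:
    """Report a mismatch between the number of labels and dataset dimensions."""
    if len(axis_labels) != len(extent):
        return [
            f"axisLabels has {len(axis_labels)} entries, "
            f"but the dataset has {len(extent)} dimensions"
        ]
    return []
```

For instance, `check_vector_components({"x": [100], "y": [99]})` returns one error message, whereas a JSON schema would accept both components individually.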

While a large part of the openPMD standard can be verified by checking against a JSON schema, the standard is large enough to push this approach to its limits. Validation against a JSON schema is similar to the use of a naive recursive-descent parser: error messages will often be unexpectedly verbose and not very informative.
Disjunctive statements such as "A Record is either a scalar Record Component or a vector of non-scalar Record Components" are a particular challenge for the JSON validator. If there is even a tiny mistake somewhere down in the hierarchy, the entire disjunctive branch will fail to validate.

The layout of attributes is assumed to be that which is created by the JSON backend of the openPMD-api, e.g.:

```json
"meshesPath": {
"datatype": "STRING",
"value": "meshes/"
}
```

Support for an abbreviated notation such as `"meshesPath": "meshes/"` is not yet available.
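
The difference between the two notations can be made concrete with a small reader-side sketch. `attribute_value` is a hypothetical helper, not part of the openPMD-api or of `check.py`; it simply unwraps the long notation shown above and passes any other value through.

```python
def attribute_value(attribute):
    """Return the payload of an attribute in long or abbreviated notation."""
    # Long notation: an object carrying both "datatype" and "value".
    if isinstance(attribute, dict) and {"datatype", "value"} <= attribute.keys():
        return attribute["value"]
    # Abbreviated notation: the value is stored directly.
    return attribute


long_form = {"datatype": "STRING", "value": "meshes/"}
print(attribute_value(long_form))
print(attribute_value("meshes/"))
```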
236 changes: 236 additions & 0 deletions share/openPMD/json_schema/attribute_defs.toml
@@ -0,0 +1,236 @@

["$defs"]

######################
# Vectors of strings #
######################

[["$defs".vec_string_attribute.oneOf]]
title = "Shorthand notation"
anyOf = [
{ type = "string" },
{ type = "array", items = { "type" = "string" } },
]

[["$defs".vec_string_attribute.oneOf]]
title = "Long notation"
type = "object"
required = ["value", "datatype"]

["$defs".vec_string_attribute.oneOf.properties]

value.anyOf = [
{ type = "string" },
{ type = "array", items = { "type" = "string" } },
]

datatype.enum = [
"STRING",
"CHAR",
"SCHAR",
"UCHAR",
"VEC_STRING",
"VEC_CHAR",
"VEC_SCHAR",
"VEC_UCHAR",
]

##################
# Vectors of int #
##################

[["$defs".vec_int_attribute.oneOf]]
title = "Shorthand notation"
anyOf = [
{ type = "integer" },
{ type = "array", items = { "type" = "integer" } },
]

[["$defs".vec_int_attribute.oneOf]]
title = "Long notation"
type = "object"
required = ["value", "datatype"]

["$defs".vec_int_attribute.oneOf.properties]

value.anyOf = [
{ type = "integer" },
{ type = "array", items = { "type" = "integer" } },
]

datatype.enum = [
"SHORT",
"INT",
"LONG",
"LONGLONG",
"USHORT",
"UINT",
"ULONG",
"ULONGLONG",
"VEC_SHORT",
"VEC_INT",
"VEC_LONG",
"VEC_LONGLONG",
"VEC_USHORT",
"VEC_UINT",
"VEC_ULONG",
"VEC_ULONGLONG",
]

####################
# Vectors of float #
####################

[["$defs".vec_float_attribute.oneOf]]
title = "Shorthand notation"
anyOf = [
{ type = "number" },
{ type = "array", items = { "type" = "number" } },
]

[["$defs".vec_float_attribute.oneOf]]
title = "Long notation"
type = "object"
required = ["value", "datatype"]

["$defs".vec_float_attribute.oneOf.properties]

value.anyOf = [
{ type = "number" },
{ type = "array", items = { "type" = "number" } },
]

datatype.enum = [
"CHAR",
"UCHAR",
"SCHAR",
"SHORT",
"INT",
"LONG",
"LONGLONG",
"USHORT",
"UINT",
"ULONG",
"ULONGLONG",
"FLOAT",
"DOUBLE",
"LONG_DOUBLE",
"CFLOAT",
"CDOUBLE",
"CLONG_DOUBLE",
"VEC_SHORT",
"VEC_INT",
"VEC_LONG",
"VEC_LONGLONG",
"VEC_USHORT",
"VEC_UINT",
"VEC_ULONG",
"VEC_ULONGLONG",
"VEC_FLOAT",
"VEC_DOUBLE",
"VEC_LONG_DOUBLE",
"VEC_CFLOAT",
"VEC_CDOUBLE",
"VEC_CLONG_DOUBLE",
]

###########################
# Special case: #
# unitDimension attribute #
###########################

[["$defs".unitDimension.oneOf]]
title = "Shorthand notation"
type = "array"
items.type = "number"

[["$defs".unitDimension.oneOf]]
title = "Long notation"
type = "object"
required = ["value", "datatype"]

["$defs".unitDimension.oneOf.properties]

value = { type = "array", items = { type = "number" } }
datatype.const = "ARR_DBL_7"

#####################
# string attributes #
#####################

[["$defs".string_attribute.oneOf]]
title = "Shorthand notation"
type = "string"

[["$defs".string_attribute.oneOf]]
title = "Long notation"
type = "object"
required = ["value", "datatype"]

["$defs".string_attribute.oneOf.properties]

value.type = "string"
datatype.enum = ["STRING", "CHAR", "SCHAR", "UCHAR"]

##################
# int attributes #
##################

[["$defs".int_attribute.oneOf]]
title = "Shorthand notation"
type = "integer"

[["$defs".int_attribute.oneOf]]
title = "Long notation"
type = "object"
required = ["value", "datatype"]

["$defs".int_attribute.oneOf.properties]

value.type = "integer"
datatype.enum = [
"SHORT",
"INT",
"LONG",
"LONGLONG",
"USHORT",
"UINT",
"ULONG",
"ULONGLONG",
]

####################
# float attributes #
####################

[["$defs".float_attribute.oneOf]]
title = "Shorthand notation"
type = "number"

[["$defs".float_attribute.oneOf]]
title = "Long notation"
type = "object"
required = ["value", "datatype"]

["$defs".float_attribute.oneOf.properties]

value.type = "number"
datatype.enum = [
"CHAR",
"UCHAR",
"SCHAR",
"SHORT",
"INT",
"LONG",
"LONGLONG",
"USHORT",
"UINT",
"ULONG",
"ULONGLONG",
"FLOAT",
"DOUBLE",
"LONG_DOUBLE",
"CFLOAT",
"CDOUBLE",
"CLONG_DOUBLE",
]