[WIP] Use JSON/TOML template for defining openPMD metadata in a config file #1277

Open
wants to merge 35 commits into
base: dev

franzpoeschel
Contributor

@franzpoeschel franzpoeschel commented May 17, 2022

Not relevant for next release

  • Until now: Simulations specify their metadata in code via API calls.
  • With this PR: In some workflows (e.g. experiments) there is no omniscient simulation. Instead, the experimenters enter the metadata via configuration files, and using the API is not a good workflow for that.

Idea: We already have a JSON backend, so we can use an openPMD-conforming JSON dataset that defines only metadata. With this, the configuration file is just another openPMD dataset.
Then, add some functionality to initialize an empty Series from such a metadata file.

TODO:

https://github.com/franzpoeschel/openPMD-api/compare/topic-json-short-modes..topic-json-template

@franzpoeschel
Contributor Author

franzpoeschel commented May 18, 2022

An openPMD dataset in TOML:

[platform_byte_widths]
USHORT = 2
ULONG = 8
BOOL = 1
CLONG_DOUBLE = 32
LONGLONG = 8
CFLOAT = 8
CHAR = 1
DOUBLE = 8
CDOUBLE = 16
SHORT = 2
UCHAR = 1
FLOAT = 4
INT = 4
ULONGLONG = 8
UINT = 4
LONG = 8
LONG_DOUBLE = 16

[data]

[data.0]

[data.0.meshes]

[data.0.meshes.E]

[data.0.meshes.E.x]
datatype = "FLOAT"
data = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

[data.0.meshes.E.x.attributes]

[data.0.meshes.E.x.attributes.unitSI]
value = 1.0
datatype = "DOUBLE"

[data.0.meshes.E.x.attributes.position]
value = [0.0]
datatype = "VEC_DOUBLE"

[data.0.meshes.E.attributes]

[data.0.meshes.E.attributes.timeOffset]
value = 0.0
datatype = "FLOAT"

[data.0.meshes.E.attributes.gridUnitSI]
value = 1.0
datatype = "DOUBLE"

[data.0.meshes.E.attributes.gridSpacing]
value = [1.0]
datatype = "VEC_DOUBLE"

[data.0.meshes.E.attributes.gridGlobalOffset]
value = [0.0]
datatype = "VEC_DOUBLE"

[data.0.meshes.E.attributes.unitDimension]
value = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
datatype = "ARR_DBL_7"

[data.0.meshes.E.attributes.geometry]
value = "cartesian"
datatype = "STRING"

[data.0.meshes.E.attributes.dataOrder]
value = "C"
datatype = "STRING"

[data.0.meshes.E.attributes.axisLabels]
value = ["x"]
datatype = "VEC_STRING"

[data.0.attributes]

[data.0.attributes.timeUnitSI]
value = 1.0
datatype = "DOUBLE"

[data.0.attributes.time]
value = 0.0
datatype = "DOUBLE"

[data.0.attributes.dt]
value = 1.0
datatype = "DOUBLE"

[attributes]

[attributes.softwareVersion]
value = "0.15.0-dev"
datatype = "STRING"

[attributes.software]
value = "openPMD-api"
datatype = "STRING"

[attributes.openPMDextension]
value = 0
datatype = "UINT"

[attributes.meshesPath]
value = "meshes/"
datatype = "STRING"

[attributes.iterationFormat]
value = "many_iterations_%T"
datatype = "STRING"

[attributes.iterationEncoding]
value = "fileBased"
datatype = "STRING"

[attributes.openPMD]
value = "1.1.0"
datatype = "STRING"

[attributes.date]
value = "2022-05-18 12:20:23 +0000"
datatype = "STRING"

[attributes.basePath]
value = "/data/%T/"
datatype = "STRING"

@franzpoeschel franzpoeschel force-pushed the topic-json-template branch 3 times, most recently from 0d475a5 to 1a23a03 Compare May 19, 2022 11:54
@franzpoeschel
Contributor Author

franzpoeschel commented May 19, 2022

This is now a simplified TOML openPMD template, created by {"json":{"mode": "template"}}:

[data]

[data.meshes]

[data.meshes.temperature]
extent = [5, 5]
datatype = "FLOAT"

[data.meshes.temperature.attributes]
timeOffset = 0.0
# Explicit datatype can still be used if needed
unitSI = {"value" = 1.0, "datatype" = "FLOAT"}
position = [0.0]
gridUnitSI = 1.0
gridSpacing = [1.0]
gridGlobalOffset = [0.0]
unitDimension = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
geometry = "cartesian"
dataOrder = "C"
axisLabels = ["x"]

[data.attributes]
timeUnitSI = 1.0
snapshot = 0
time = 0.0
dt = 1.0

[attributes]
softwareVersion = "0.15.0-dev"
software = "openPMD-api"
openPMDextension = 0
meshesPath = "meshes/"
iterationFormat = "/data"
iterationEncoding = "variableBased"
openPMD = "1.1.0"
date = "2022-05-19 11:55:07 +0000"
basePath = "/data"

Differences from regular JSON/TOML openPMD datasets:

  1. The platform byte width table is missing.
  2. Attributes don't explicitly store their datatypes. Instead, the datatypes are restored dynamically (and a bit heuristically) from the stored values.
  3. No actual datasets can be written. Instead, just the extent is stored (next to the datatype, as in the example above).
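
To illustrate what "restoring datatypes heuristically" could look like, here is a minimal Python sketch. It is not the logic used in this PR: the function names and the choice of `LONG` for integers are assumptions made for illustration, and only the datatype names that appear in the listings above are taken from the thread.

```python
from typing import Any


def _scalar_type(value: Any) -> str:
    """Map a single shorthand value to an openPMD-style datatype name."""
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "BOOL"
    if isinstance(value, int):
        return "LONG"  # assumption: the width actually chosen is not stated here
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, str):
        return "STRING"
    raise ValueError(f"cannot guess a datatype for {value!r}")


def guess_datatype(value: Any) -> str:
    """Guess the datatype of an attribute written in shorthand form."""
    # An explicit {"value": ..., "datatype": ...} table always wins.
    if isinstance(value, dict) and value.keys() >= {"value", "datatype"}:
        return value["datatype"]
    if isinstance(value, list):
        if not value:
            raise ValueError("an empty list carries no type information")
        kinds = {_scalar_type(v) for v in value}
        # A list mixing integers and floats is promoted to floating point.
        if "DOUBLE" in kinds and kinds <= {"LONG", "DOUBLE"}:
            kinds = {"DOUBLE"}
        if len(kinds) != 1:
            raise ValueError(f"mixed element types in {value!r}")
        return "VEC_" + kinds.pop()
    return _scalar_type(value)


# Values taken from the template above.
unit_si = guess_datatype({"value": 1.0, "datatype": "FLOAT"})
grid_spacing = guess_datatype([1.0])
axis_labels = guess_datatype(["x"])
```

Two limits of such a guess are visible in the listings: a seven-element float list comes back as `VEC_DOUBLE` here although the verbose file stored `ARR_DBL_7`, and a JSON value written as `1` looks like an integer even when the attribute was a floating-point number.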

Template mode is also available in JSON:

{
  "attributes": {
    "basePath": "/data",
    "date": "2022-05-19 12:00:09 +0000",
    "iterationEncoding": "variableBased",
    "iterationFormat": "/data",
    "meshesPath": "meshes/",
    "openPMD": "1.1.0",
    "openPMDextension": 0,
    "software": "openPMD-api",
    "softwareVersion": "0.15.0-dev"
  },
  "data": {
    "attributes": {
      "dt": 1,
      "snapshot": 0,
      "time": 0,
      "timeUnitSI": 1
    },
    "meshes": {
      "temperature": {
        "attributes": {
          "axisLabels": [
            "x"
          ],
          "dataOrder": "C",
          "geometry": "cartesian",
          "gridGlobalOffset": [
            0
          ],
          "gridSpacing": [
            1
          ],
          "gridUnitSI": 1,
          "position": [
            0
          ],
          "timeOffset": 0,
          "unitDimension": [
            0,
            0,
            0,
            0,
            0,
            0,
            0
          ],
          "unitSI": 1
        },
        "datatype": "FLOAT",
        "extent": [
          5,
          5
        ]
      }
    }
  }
}

@franzpoeschel
Contributor Author

Longer example:

[data]

[data.particles]

[data.particles.e]

[data.particles.e.positionOffset]

[data.particles.e.positionOffset.z]

[data.particles.e.positionOffset.z.attributes]
value = 3.14
unitSI = 1.0
shape = [5, 5]

[data.particles.e.positionOffset.y]

[data.particles.e.positionOffset.y.attributes]
value = 3.14
unitSI = 1.0
shape = [5, 5]

[data.particles.e.positionOffset.x]

[data.particles.e.positionOffset.x.attributes]
value = 3.14
unitSI = 1.0
shape = [5, 5]

[data.particles.e.positionOffset.attributes]
unitDimension = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
timeOffset = 0.0

[data.particles.e.position]

[data.particles.e.position.z]
extent = [5, 5]
datatype = "FLOAT"

[data.particles.e.position.z.attributes]
unitSI = 1.0

[data.particles.e.position.y]
extent = [5, 5]
datatype = "FLOAT"

[data.particles.e.position.y.attributes]
unitSI = 1.0

[data.particles.e.position.x]
extent = [5, 5]
datatype = "FLOAT"

[data.particles.e.position.x.attributes]
unitSI = 1.0

[data.particles.e.position.attributes]
unitDimension = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
timeOffset = 0.0

[data.particles.e.particlePatches]

[data.particles.e.particlePatches.numParticlesOffset]
extent = [5, 5]
datatype = "FLOAT"

[data.particles.e.particlePatches.numParticlesOffset.attributes]
unitSI = 1.0

[data.particles.e.particlePatches.numParticles]
extent = [5, 5]
datatype = "FLOAT"

[data.particles.e.particlePatches.numParticles.attributes]
unitSI = 1.0

[data.particles.e.particlePatches.offset]

[data.particles.e.particlePatches.offset.z]
extent = [5, 5]
datatype = "FLOAT"

[data.particles.e.particlePatches.offset.z.attributes]
unitSI = 1.0

[data.particles.e.particlePatches.offset.y]
extent = [5, 5]
datatype = "FLOAT"

[data.particles.e.particlePatches.offset.y.attributes]
unitSI = 1.0

[data.particles.e.particlePatches.offset.x]
extent = [5, 5]
datatype = "FLOAT"

[data.particles.e.particlePatches.offset.x.attributes]
unitSI = 1.0

[data.particles.e.particlePatches.offset.attributes]
unitDimension = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

[data.particles.e.particlePatches.extent]

[data.particles.e.particlePatches.extent.z]
extent = [5, 5]
datatype = "FLOAT"

[data.particles.e.particlePatches.extent.z.attributes]
unitSI = 1.0

[data.particles.e.particlePatches.extent.y]
extent = [5, 5]
datatype = "FLOAT"

[data.particles.e.particlePatches.extent.y.attributes]
unitSI = 1.0

[data.particles.e.particlePatches.extent.x]
extent = [5, 5]
datatype = "FLOAT"

[data.particles.e.particlePatches.extent.x.attributes]
unitSI = 1.0

[data.particles.e.particlePatches.extent.attributes]
unitDimension = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

[data.meshes]

[data.meshes.temperature]
extent = [5, 5]
datatype = "FLOAT"

[data.meshes.temperature.attributes]
timeOffset = 0.0
unitSI = 1.0
position = [0.0]
gridUnitSI = 1.0
gridSpacing = [1.0]
gridGlobalOffset = [0.0]
unitDimension = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
geometry = "cartesian"
dataOrder = "C"
axisLabels = ["x"]

[data.meshes.E]

[data.meshes.E.z]
extent = [5, 5]
datatype = "FLOAT"

[data.meshes.E.z.attributes]
unitSI = 1.0
position = [0.0]

[data.meshes.E.y]
extent = [5, 5]
datatype = "FLOAT"

[data.meshes.E.y.attributes]
unitSI = 1.0
position = [0.0]

[data.meshes.E.x]
extent = [5, 5]
datatype = "FLOAT"

[data.meshes.E.x.attributes]
unitSI = 1.0
position = [0.0]

[data.meshes.E.attributes]
timeOffset = 0.0
gridUnitSI = 1.0
gridSpacing = [1.0]
gridGlobalOffset = [0.0]
unitDimension = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
geometry = "cartesian"
dataOrder = "C"
axisLabels = ["x"]

[data.attributes]
timeUnitSI = 1.0
snapshot = 0
time = 0.0
dt = 1.0

[attributes]
softwareVersion = "0.15.0-dev"
particlesPath = "particles/"
software = "openPMD-api"
openPMDextension = 0
meshesPath = "meshes/"
iterationFormat = "/data"
iterationEncoding = "variableBased"
openPMD = "1.1.0"
date = "2022-05-19 15:26:37 +0000"
basePath = "/data"
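
A reader of such a template has to tell three kinds of tables apart: plain groups, declared datasets (`extent` plus `datatype`) and constant components (`value` and `shape` stored as attributes, as for `positionOffset` above). The Python sketch below walks an already parsed template and lists the components. The function name and the tuple layout are hypothetical, and this is not how the backend in this PR is implemented.

```python
def list_components(node, path=""):
    """Yield (path, kind, datatype, shape) for each component below node."""
    if not isinstance(node, dict):
        return
    attrs = node.get("attributes", {})
    # A dataset declaration has a list-valued "extent" and a string "datatype".
    # particlePatches also contains a *group* named "extent"; that one is a
    # table, not a list, so it is traversed like any other group.
    if isinstance(node.get("extent"), list) and isinstance(node.get("datatype"), str):
        yield (path, "dataset", node["datatype"], node["extent"])
    elif isinstance(attrs, dict) and "value" in attrs and "shape" in attrs:
        yield (path, "constant", None, attrs["shape"])
    for name, child in node.items():
        if name != "attributes":
            yield from list_components(child, f"{path}/{name}")


# Abbreviated version of the longer example, written as a Python dict.
example = {
    "data": {"particles": {"e": {
        "positionOffset": {"x": {
            "attributes": {"value": 3.14, "unitSI": 1.0, "shape": [5, 5]},
        }},
        "position": {"x": {
            "extent": [5, 5],
            "datatype": "FLOAT",
            "attributes": {"unitSI": 1.0},
        }},
        "particlePatches": {"extent": {"x": {
            "extent": [5, 5],
            "datatype": "FLOAT",
            "attributes": {"unitSI": 1.0},
        }}},
    }}},
}
components = list(list_components(example))
```

Checking the type of `extent` instead of only its presence is what keeps the `particlePatches.extent` record from being mistaken for a dataset declaration.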

@franzpoeschel franzpoeschel force-pushed the topic-json-template branch 5 times, most recently from 475be7b to ee8bdf1 Compare May 23, 2022 11:14
@franzpoeschel franzpoeschel changed the title Use JSON/TOML template for defining openPMD metadata in a config file [WIP] Use JSON/TOML template for defining openPMD metadata in a config file May 23, 2022
@franzpoeschel franzpoeschel force-pushed the topic-json-template branch 3 times, most recently from d32fff3 to 376bc2a Compare July 5, 2022 09:23
@franzpoeschel franzpoeschel force-pushed the topic-json-template branch 2 times, most recently from 3ee509d to a662865 Compare July 21, 2022 16:35
@franzpoeschel
Contributor Author

Notes for myself on the recent reordering of commits:

5 3ee509de (HEAD -> topic-json-template, origin/topic-json-template) Properly deal with undefined datasets
2 06da2d58 Make JSON and TOML look like two different backends
5 960ab21a Initialize Dataset definitions from template
5 b88bae67 Initialize Series attributes from template
3 6302a33c Fix NVHPC Toml11 open mode
2 d825008b Fix precision-losing type conversion
4 da960a23 Enable .toml tests in generic tests
4 0398b86f Extend example
3 7332996e Windows compatibility
x 85527799 Add and use Attribute::getOptional<T>()
1 64cde966 Template mode: Fill with zero upon read
1 fa483843 Write/read shorthand attributes without explicit datatype
3 bd8da013 CI fixes
1 d802d2ac Don't write platform datatype size table in template mode
2 cba71f7f Use .toml as filename extension
2 b019a7d1 TOML as alternative backend for JSON backend
1 4b25de8c Select template mode via JSON param
1 8ef4753f Add template mode to JSON backend

@franzpoeschel franzpoeschel force-pushed the topic-json-template branch 2 times, most recently from 1db63f6 to 55b72f8 Compare July 29, 2022 09:00
@franzpoeschel franzpoeschel force-pushed the topic-json-template branch 2 times, most recently from b07a2a8 to a41c2c6 Compare August 17, 2022 09:21
// throw error::WrongAPIUsage(
// "[RecordComponent] Must set specific datatype (Use "
// "resetDataset call).");
// }
Contributor Author

Note that this check was inactive until now, since RecordComponentData::RecordComponentData initialized that field with Datatype::CHAR. Using an optional would make these things more obvious and avoid such pitfalls.
To be done in a different PR, though.

Contributor Author

#1316 now uses std::optional
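
The pitfall discussed in this review thread can be sketched outside of C++. The Python snippet below is only an illustration: the class names, the helper functions and the `"UNDEFINED"` marker are hypothetical and do not mirror the openPMD-api sources. It contrasts a field that starts out with a real-looking default against an optional field.

```python
from dataclasses import dataclass
from typing import Optional

ERROR = "[RecordComponent] Must set specific datatype (Use resetDataset call)."


@dataclass
class SentinelComponent:
    # The field starts out as a valid datatype, as described above for CHAR.
    datatype: str = "CHAR"


@dataclass
class OptionalComponent:
    # "Not set yet" is a state of its own and cannot be mistaken for a datatype.
    datatype: Optional[str] = None


def require_datatype_sentinel(component: SentinelComponent) -> str:
    # Assumed form of the check: it looks for a marker that a default-constructed
    # component never carries, so it never fires.
    if component.datatype == "UNDEFINED":
        raise RuntimeError(ERROR)
    return component.datatype


def require_datatype_optional(component: OptionalComponent) -> str:
    if component.datatype is None:
        raise RuntimeError(ERROR)
    return component.datatype


# The sentinel variant silently hands back the default.
silently_accepted = require_datatype_sentinel(SentinelComponent())

# The optional variant reports the missing datatype.
try:
    require_datatype_optional(OptionalComponent())
    caught = None
except RuntimeError as err:
    caught = str(err)
```

With the optional, forgetting to set the datatype becomes an error at the point of use instead of a dataset that is quietly treated as `CHAR`.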

Labels
api: new additions to the API backend: JSON
2 participants