-
I think there's a point to be made in favor of a TOML-based schema for TOML. I'm personally also in favor of #792 over #116 in terms of how the schema is defined, as 792 is rather simple and concise, which also makes it more easily parseable by humans.
To me, this would seem like a dealbreaker: if you can't validate valid TOML with it, it's not fit for the job, imo. I'd also like to re-raise a concern from 792 regarding optional keys and defaults: injecting defaults would mean the output for valid TOML matching the schema differs between schema-aware and schema-unaware parsers!
-
For a lot of TOML use cases, "data read from the TOML file can also be serialised as JSON" is a design requirement. Even for those cases where it isn't, the file format frequently won't contain any floating point fields at all, and when it does, "NaN" and "inf" are often going to be invalid values in the floating point fields anyway. In either of those situations, the fact JSON schema intrinsically disallows passing NaN and Inf values through number fields ends up not being a problem. Compared to spinning up an entire parallel schema validation ecosystem, defining a mapping from TOML floats to a dual number/string JSON format that can handle NaN and Inf is a much smaller task.
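A minimal sketch of what such a mapping could look like in Python (the `encode_float`/`decode_float` names are made up for illustration, not an existing library API):

```python
import math

def encode_float(value: float) -> float | str:
    """Map a TOML float to a JSON-safe value for a ["number", "string"] field."""
    if math.isnan(value):
        return "nan"
    if math.isinf(value):
        return "inf" if value > 0 else "-inf"
    return value

def decode_float(value: float | str) -> float:
    """Reverse the mapping; float() accepts "nan", "inf", and "-inf" directly."""
    return float(value) if isinstance(value, str) else value
```

On the schema side, the corresponding field would then declare `"type": ["number", "string"]`, optionally with an `enum` or `pattern` constraining the string form to just the special values.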
-
"data read from the TOML file can also be serialised as JSON" does not, imo, mean "data from TOML maps 1:1 with JSON without processing." I agree the JSON approach seems like less work, but either the TOML and the JSON Schema features allowed with validation would have to be more limited, or either format would have to be changed to accommodate the other. And since I doubt JSON Schema is going to be changed to accommodate non-JSON formats, and the original linked issues establish that TOML will not be changed over validation challenges, that means forking either one. Don't get me wrong, for where it currently stands I think Taplo is doing an amazing job. I just think that to truly validate all valid TOML a specific validation format (perhaps based on JSON Schema with some different definitions to accomodate TOML features) would be better. Though maybe the people over at JSON Schema could be convinced to broaden their scope slightly to encompass other data formats a bit more. And on that note, as much as I like the idea of 792, I think some features like the default injection should be a for a custom parser to support, and not an mainline TOML (parser) problem or a validator's task. |
-
(Inspired by #792, but opened as a discussion rather than an issue, since I don't think this should even become a documentation proposal until there's initial agreement that it's a good path to take)
Defining comprehensive data schemas is difficult (especially if they can reference each other), so using JSON Schema to validate TOML documents seems like a more pragmatic path forward than attempting to build a separate TOML-specific schema validation ecosystem.
A version of this idea is already implemented in `taplo`, which uses `#:schema ./foo-schema.json` comments to reference JSON schema documents: https://taplo.tamasfe.dev/configuration/directives.html#the-schema-directive

(While the Python standard library's `tomllib` module doesn't provide access to TOML comments, the feature is available by iterating over the `body` attribute of a `tomlkit.TOMLDocument` instance, allowing scanning for schema references using the same format as `taplo`.)

Given a JSON schema reference, validating a TOML document against a JSON schema specification at runtime is going to be fairly straightforward: load the data from the TOML file, load the schema file into your preferred JSON schema validation library, and then check the data matches the schema.
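A sketch of how those pieces might fit together, using `tomllib` for the data and `tomlkit` only for comment scanning (the `find_schema_directive`/`validate_toml` helpers are hypothetical, and reading comment text back via `as_string()` is an assumption about how `tomlkit` exposes whole-line comments):

```python
import json
import tomllib  # Python 3.11+ standard library

import jsonschema  # pip install jsonschema
import tomlkit     # pip install tomlkit

def find_schema_directive(toml_text: str) -> str | None:
    """Scan a document's top-level comments for a taplo-style '#:schema <ref>' directive."""
    # TOMLDocument.body is a list of (key, item) pairs; whole-line
    # comments appear as items with a key of None.
    for _key, item in tomlkit.parse(toml_text).body:
        text = item.as_string().strip()
        if text.startswith("#:schema "):
            return text.removeprefix("#:schema ").strip()
    return None

def validate_toml(toml_path: str) -> None:
    with open(toml_path, encoding="utf-8") as f:
        toml_text = f.read()
    schema_ref = find_schema_directive(toml_text)
    if schema_ref is None:
        return  # nothing to validate against
    with open(schema_ref, encoding="utf-8") as f:
        schema = json.load(f)
    # Note: parsed date/time values would still need the serialisation
    # step discussed below before they'll pass JSON Schema validation.
    jsonschema.validate(instance=tomllib.loads(toml_text), schema=schema)
```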
What's missing is a clear explanation of how the different pieces of a TOML document map to different concepts in JSON schema, since the two specifications sometimes use different terminology for the same things, and there are some features of TOML that need to be skipped if you want the data read from the document to validate as JSON at all (let alone against a specific schema).
The TOML mapping for the basic JSON Schema types is straightforward (TOML type -> JSON type):

- string -> string
- integer -> integer
- float -> number
- boolean -> boolean
- array -> array
- table -> object
All of the regular JSON schema features for these types can be applied to TOML documents, remembering that they apply to the parsed values, not the exact text as written into the TOML file (so things like the string quoting format or whether a table is inline or not don't matter).
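For instance (a small illustration of the "parsed values" point, not from the original post), a literal string and a basic string validate identically:

```python
import tomllib
import jsonschema

schema = {
    "type": "object",
    "properties": {"name": {"type": "string", "minLength": 1}},
    "required": ["name"],
}

# The quoting style only affects the source text, not the parsed value,
# so both documents pass the same schema check.
for document in ("name = 'ferris'", 'name = "ferris"'):
    jsonschema.validate(instance=tomllib.loads(document), schema=schema)
```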
Notable caveats and limitations for the basic types:

- TOML `float` values include `nan` and `inf`, which can't be represented in a plain `number` JSON schema field. To pass schema validation, `nan` and `inf` (and their positive and negative variants) need to be encoded as dual type `["number", "string"]` fields rather than using the native float representations of the special values.
- TOML has no `null` values. The closest TOML has to a representation of `null` is "omit that key", which only applies to tables and the top level keys of a document.

The final case to consider is how dates, times, and their optional timezone offsets should be matched to the JSON schema RFC 3339 guidelines in https://json-schema.org/draft/2020-12/json-schema-validation#name-defined-formats
This last part isn't actually a TOML question; it's a question of how the structured date/time objects emitted by a compliant TOML parser are serialised to strings before being passed to the chosen JSON Schema validator (passing the structured date/time objects directly will always fail, since they're not a valid JSON type).
For Python, for example, making `jsonschema` happy with serialised `datetime` values requires ensuring that they're converted to strings which comply with RFC 3339 as JSON Schema specifies (the ISO 8601 based `isoformat()` methods are sufficient for this, since they include the separators that RFC 3339 requires).
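A minimal sketch of that pre-validation conversion step (the recursive `jsonify` helper is hypothetical, not part of `tomllib` or `jsonschema`):

```python
import datetime

def jsonify(value):
    """Recursively replace parsed TOML date/time objects with their
    RFC 3339 compliant isoformat() string representations."""
    if isinstance(value, (datetime.datetime, datetime.date, datetime.time)):
        return value.isoformat()
    if isinstance(value, dict):
        return {key: jsonify(item) for key, item in value.items()}
    if isinstance(value, list):
        return [jsonify(item) for item in value]
    return value
```

Validating `jsonify(data)` instead of the raw parsed data then lets `format: "date-time"` and related assertions see the string form they expect.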