Add JSON-schema generator #103

Merged
merged 5 commits into from
Feb 6, 2024
4 changes: 4 additions & 0 deletions .github/workflows/buildcheck.yml
Expand Up @@ -53,3 +53,7 @@ jobs:
          ifexgen_dbus vehicle_service_catalog/comfort-service.yml >d-bus.out
          cat d-bus.out

      - name: Run JSON-Schema generator
        run: |
          python ifex/schema/ifex_to_json_schema.py >temp-schema
          python ifex/schema/pretty_print_json.py temp-schema >ifex-core-idl-schema.json
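The second command in this step runs `pretty_print_json.py`, which is not shown in this diff. As a rough sketch only, assuming the script does nothing more than parse the generator output and re-indent it, such a step could look like the following (the function name `pretty_print` is hypothetical):

```python
import json


def pretty_print(text: str, indent: int = 2) -> str:
    """Parse JSON text and return it re-serialized with consistent indentation.

    json.loads raises json.JSONDecodeError on malformed input, so this also
    acts as a basic well-formedness check on the generated schema text.
    """
    return json.dumps(json.loads(text), indent=indent)


# Hypothetical input, standing in for the contents of the temp-schema file.
raw = '{"title":"demo","required":["name"]}'
print(pretty_print(raw))
```

Because the generator builds its output with `print` statements, a parse step like this would also catch a misplaced comma before the schema is published.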
11 changes: 7 additions & 4 deletions ifex/model/ifex_ast.py
Expand Up @@ -633,10 +633,13 @@ class Namespace:


@dataclass
class AST(Namespace):
class AST():
"""
Dataclass used to represent root element in a IFEX AST.
Behaviour is inherited from Namespace class.
"""

pass
name: Optional[str] = str() # Represents name of file. Usually better to name the Namespaces and Interfaces
description: Optional[str] = str()
major_version: Optional[int] = None # Version of file. Usually better to version Interfaces, and Namespaces!
minor_version: Optional[int] = None # ------ " ------
includes: Optional[List[Include]] = field(default_factory=EmptyList)
namespaces: Optional[List[Namespace]] = field(default_factory=EmptyList)
7 changes: 7 additions & 0 deletions ifex/model/ifex_ast_introspect.py
Expand Up @@ -127,6 +127,13 @@ def type_name(type_indicator):
else:
return type_indicator.__name__

def field_referenced_type(f):
    """Return the type of the field, but if it's a list, return the type inside the list"""
    if field_is_list(f):
        return field_inner_type(f)
    else:
        return field_actual_type(f)

VERBOSE = False

# Tree processing function:
33 changes: 33 additions & 0 deletions ifex/schema/LICENSE.README.md
@@ -0,0 +1,33 @@
# Comments on the LICENSE file of this directory.

## Applicability:

- `LICENSE.json-schema` (Revised BSD license) applies ONLY to the file `json-schema.org_draft_2020-12_schema`
- All _other_ files in this directory are subject to the license of the IFEX project repository, and information can be found in the root directory of the repository.

## Additional information

JSON-Schema is a specification published by the Internet Engineering Task Force (IETF) at this location: https://json-schema.org/specification

The IFEX project includes a file named `json-schema.org_draft_2020-12_schema`, which is a copy of the meta-schema ("JSON-schema for JSON-schemas").
It is used to validate that the JSON schema we generate is itself valid against the specification.

The meta-schema has been downloaded from this standard well-known URL: https://json-schema.org/draft/2020-12/schema
(The link to this file and to specifications can otherwise be found at: https://json-schema.org/specification-links)

The JSON Schema Core Specification, specifically the one stored here: https://json-schema.org/draft/2020-12/json-schema-core is at the time of writing the following version:

Internet-Draft: draft-bhutton-json-schema-01
Published: 16 June 2022

That specification states:

> "Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions"

It references this address for the legal provisions: `https://trustee.ietf.org/documents/trust-legal-provisions/`

The legal provisions include the Revised BSD license text, which is provided here in `LICENSE.json-schema`

- Since JSON documents cannot include comments, the actual meta-schema file itself does not include any Copyright/License information.
- It is interpreted as being a "code component" deriving from the JSON-Schema Specification, as per the above description.
- Thus, the file is subject to the Revised BSD License text as described in the Trust Legal Provisions. The license text is stored in the file `LICENSE.json-schema` in this directory.
8 changes: 8 additions & 0 deletions ifex/schema/LICENSE.json-schema
@@ -0,0 +1,8 @@
Copyright (c) 2022 IETF Trust and the persons identified as the document authors. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
Neither the name of Internet Society, IETF or IETF Trust, nor the names of specific contributors, may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
59 changes: 59 additions & 0 deletions ifex/schema/README.md
@@ -0,0 +1,59 @@
# IFEX JSON-schema

This file explains why JSON-schema support was added to the project and how it relates to the development philosophy.

## Background

Since its beginning, IFEX development has strived to follow the elusive "single source of truth" principle. In practice this means that the language syntax shall, as far as possible, be defined in a single place and in a machine-processable format, so that **all** programs and documentation derive from this single definition and are thereby always consistent with each other.

It is anticipated that the IFEX specification will evolve before becoming stabilized. It is crucial to minimize the number of inconsistencies and errors during that period.

As of now, important sections of the IFEX Core IDL Specification are _generated_ from an internal model of the language (a "meta-model" of sorts). Documentation is generated from the same model that is used to read and validate IFEX input and as a basis for output (e.g. when translating another input format to IFEX).

JSON-schema support is not a step away from this (see below for details). JSON-schema is added for the moment _primarily_ for IDE support (see 4th item under **Iteration 3** heading).

#### Iteration 1:
- The IFEX Core IDL language was first defined in a custom data-structure inside of the parser implementation (python). Early on, the common idea of a schema (likely JSON-schema specifically) was considered.

- Option A: Define the allowed syntax for the IDL in a separate "specification file" (JSON-schema). Write a program that reads the definition, which can then be used inside of a program. Then read the input, and use a library to see if the input conforms.
- Option B: Define a simple definition directly in python source code (still very readable), and use that directly inside the program. It is already written in the programming language, and does not need to be "created" by a previous step.
- Option C: Keep the schema file separate and implement some parts of implementation independently. Many programs take this approach and try to keep the consistency manually. This breaks the goal of a single place of definition as described in the background, so this option was discarded immediately.

Option B was chosen because a schema file felt like an unnecessary intermediate step. Furthermore, JSON-schema is somewhat complex, and its files are noisy and a lot less readable than the alternatives. To understand the allowed structure of the language, the python source file was arguably more readable than the JSON schema.

- The data-structure defining the "meta-model" was then iterated over to build python classes at run-time. These classes define the nodes inside the internal representation (abstract syntax tree) built from parsing the input, which is in YAML, specifically in the IFEX Core IDL format.

- In practice, the classes defining the types for the AST are dataclasses since they do not include any operations, just fields.

- A python dict is also used as an intermediate representation in all cases because that is what the standard YAML parsing library outputs.

- The python dict directly represents what was in YAML. We get to this stage for any valid YAML file, but it is not necessarily valid according to IFEX Core IDL.

- One reason to build up a "tree" of python (data-)classes/objects instead of simply keeping a python-dict as the representation, is that it enables the minimal syntax for traversing the object tree when writing python code inside Jinja templates: `item.namespaces[0].childnode.feature` as opposed to `item['namespaces'][0]['childnode']['feature']`.
- Furthermore, even though python is a dynamic language, it is more likely (for IDEs, for example) to indicate _before_ runtime errors occur, if some `.fieldname` does not exist at that position in the tree hierarchy (the valid fields are expressly defined by the class definitions).
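
The difference between the two traversal styles described above can be sketched as follows. The three classes are simplified stand-ins invented for this illustration; they are not the real definitions from `ifex_ast.py`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Simplified, hypothetical node types (not the real ifex_ast.py classes).
@dataclass
class Interface:
    name: str
    description: Optional[str] = None


@dataclass
class Namespace:
    name: str
    interface: Optional[Interface] = None


@dataclass
class AST:
    namespaces: List[Namespace] = field(default_factory=list)


# What a YAML parser typically returns: nested dicts and lists.
as_dict = {"namespaces": [{"name": "comfort", "interface": {"name": "Seats"}}]}

# The same content as a tree of dataclass objects.
as_tree = AST(namespaces=[Namespace(name="comfort", interface=Interface(name="Seats"))])

# Dict traversal needs brackets and string keys at every level...
name_from_dict = as_dict["namespaces"][0]["interface"]["name"]
# ...whereas attribute access is shorter, and the valid attribute names are
# known from the class definitions before the program runs.
name_from_tree = as_tree.namespaces[0].interface.name

print(name_from_dict, name_from_tree)
```

A misspelled key in the dict form is only an arbitrary string, while a misspelled attribute in the dataclass form refers to a name that does not exist in the class definition, which is something static tooling can flag.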

#### Iteration 2:

- Some developers were fond of the `@dataclass` definition in python, and had also found the `dacite` library. Part of the logic that was encoded in the original meta-definition data structures could instead be defined in the dataclasses by adding annotations from the `typing` module, namely: the intended type of each field, whether it is a single item or a list, and whether it is optional or mandatory.

- The dacite library then also replaced the custom code that translates between the python dict and the AST representation. Dacite will fail and provide some error information if the dict does not correspond to the hierarchy of dataclass definitions. However, the error information is less detailed than what the previous custom code could give, since that code expressly iterated over the dict and could give IFEX-specific hints about what was wrong.
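
How the three properties mentioned above (field type, single item or list, optional or mandatory) can be read back from `typing` annotations is sketched below using only the standard library. The classes `Argument` and `Method` and the helper `describe` are made up for this illustration, and the sketch assumes annotations are real type objects rather than strings.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional, Union, get_args, get_origin


# Hypothetical example types (not the real ifex_ast.py definitions).
@dataclass
class Argument:
    name: str
    datatype: str
    description: Optional[str] = None


@dataclass
class Method:
    name: str
    input: Optional[List[Argument]] = field(default_factory=list)


def describe(cls):
    """Return {field_name: (inner_type, is_list, is_optional)} for a dataclass."""
    result = {}
    for f in fields(cls):
        t = f.type
        is_optional = False
        # Optional[X] is shorthand for Union[X, None]: unwrap it to X.
        if get_origin(t) is Union and type(None) in get_args(t):
            is_optional = True
            t = next(a for a in get_args(t) if a is not type(None))
        # List[X]: remember that it is a list and unwrap it to X.
        is_list = get_origin(t) is list
        if is_list:
            t = get_args(t)[0]
        result[f.name] = (t, is_list, is_optional)
    return result


for name, (t, is_list, is_optional) in describe(Method).items():
    print(name, t.__name__,
          "list" if is_list else "single",
          "optional" if is_optional else "mandatory")
```

Here optionality is derived from the `Optional[...]` annotation and not from whether the field has a default value.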

#### Iteration 3: JSON Schema addition

- First we note that JSON-schema is better supported than (various) YAML-schema alternatives, and simultaneously JSON-YAML equivalence makes it possible to use JSON-schemas to validate YAML. That is why JSON-schema is selected.
- Second, the main validation strategy has not changed - the core program is still working according to Iteration 2. In other words, the "single source of definition" of the language has _not_ at this point switched to be a JSON schema file and the official definition is still in `ifex_ast.py`.
- Because of that, we flip things around: to ensure a consistent definition with minimal errors, the JSON-schema file is not written by hand but instead generated from the internal model! Such a program is now provided in this directory: `ifex_to_json_schema.py`.
- The primary driver to still create a JSON-schema was IDE support. By providing a JSON-schema, editing IFEX files in, for example, Visual Studio Code gets automatic syntax-checking without any other custom plugin. Validating against a JSON schema is already built into existing VSCode plugins.
- A future reason is that we could, if it proves useful, use a JSON-schema validation library to augment the formal check of the input. We might possibly get better error messages than the current YAML->dict->dacite chain does. In addition, it may be a pathway for alternative IFEX tool implementations (in another programming language) since JSON-schema is generally very well supported in all languages.
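
The idea of generating the schema from the dataclass model can be illustrated with a small stand-alone sketch. It is not the code in `ifex_to_json_schema.py`: the classes `Include` and `Root` and the helpers `unwrap`, `collect` and `to_schema` are hypothetical, and unlike the real generator this sketch builds a python dict and lets `json.dumps` handle the serialization.

```python
import json
from dataclasses import dataclass, fields
from typing import List, Optional, Union, get_args, get_origin

# How the python primitives are spelled in JSON-schema.
JSON_PRIMITIVES = {str: "string", int: "integer"}


# Hypothetical miniature model (not the real ifex_ast.py definitions).
@dataclass
class Include:
    file: str


@dataclass
class Root:
    name: Optional[str] = None
    major_version: Optional[int] = None
    includes: Optional[List[Include]] = None


def unwrap(t):
    """Return (inner_type, is_list, is_optional) for a field's type annotation."""
    is_optional = get_origin(t) is Union and type(None) in get_args(t)
    if is_optional:
        t = next(a for a in get_args(t) if a is not type(None))
    is_list = get_origin(t) is list
    if is_list:
        t = get_args(t)[0]
    return t, is_list, is_optional


def collect(cls, definitions):
    """Recursively add a definition for cls and for every type it references."""
    if cls.__name__ in definitions:
        return  # already seen
    properties, required = {}, []
    definitions[cls.__name__] = {
        "type": "object",
        "properties": properties,
        "required": required,
        "additionalProperties": False,
    }
    for f in fields(cls):
        inner, is_list, is_optional = unwrap(f.type)
        if inner in JSON_PRIMITIVES:
            item = {"type": JSON_PRIMITIVES[inner]}
        else:
            item = {"$ref": f"#/definitions/{inner.__name__}"}
            collect(inner, definitions)
        properties[f.name] = {"type": "array", "items": item} if is_list else item
        if not is_optional:
            required.append(f.name)


def to_schema(root):
    """Build a schema dict whose root refers to the definition of `root`."""
    definitions = {}
    collect(root, definitions)
    return {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "allOf": [{"$ref": f"#/definitions/{root.__name__}"}],
        "definitions": definitions,
    }


print(json.dumps(to_schema(Root), indent=2))
```

Since the schema is derived from the same classes that the parser uses, a change to a field in the model shows up in the schema the next time the generator runs, without a second definition to keep in sync.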

#### Iteration 4 (future?):
- It is possible that there will be additional rewrites, which should still continue to be derived from the one formal definition to ensure full compatibility. It is not yet decided but:
- This rewrite might modify dacite or replace it, to get more detailed validation.
- It might switch to JSON-schema as the official definition of the language (but there are no such plans now -- JSON-schema is still less readable and more complex).
- In either case, detailed validation of fields needs improvement - the tree structure is checked but individual strings are not parsed to see if they make sense.
- As noted above, IFEX tools may later on be implemented in other programming languages than python, and when that happens there might be some strategy to move the source-of-truth from `ifex_ast.py` to something else. It could be a JSON-schema file as the official definition. Or alternatively the initial seed for such development gets generated out of `ifex_ast.py` in some way.

### Issues to fix

- Examples are still not verified. Although they are written inside of the `ifex_ast.py` file to facilitate consistency with the actual code model, they are still just opaque strings, and not programmatically checked against any model or schema. This means that examples in the specification can still be wrong due to human error.

174 changes: 174 additions & 0 deletions ifex/schema/ifex_to_json_schema.py
@@ -0,0 +1,174 @@
# SPDX-License-Identifier: MPL-2.0

# =======================================================================
# (C) 2023 MBition GmbH
# Author: Gunnar Andersson
#
# This Source Code Form is subject to the terms of the Mozilla Public
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at https://mozilla.org/MPL/2.0/.
# =======================================================================

"""
Generate JSON Schema equivalent to the python-internal model definition.
"""

from dataclasses import fields
from ifex.model.ifex_ast import AST
from ifex.model.ifex_ast_introspect import actual_type, field_actual_type, field_inner_type, inner_type, is_forwardref, is_list, type_name, field_is_list, field_is_optional, field_referenced_type
from typing import Any
import json

# =======================================================================
# Helpers
# (Most general ones are in ifex_ast_introspect module instead)
# =======================================================================

# This function is local because "string" and "integer" are neither python
# nor IFEX types -- it is the way JSON-schema spells them
def get_type_name(t):
    if t is Any:
        return "Any"
    elif t is str:
        return "string"
    elif t is int:
        return "integer"
    else:  # "complex type"
        return type_name(actual_type(t))

# =======================================================================
# JSON SCHEMA OUTPUT
# =======================================================================

# Here are print-functions for the main object type variations in the JSON
# schema such as a single object, or an array of objects.

# Special case for "Any" (used with Option values). For now we allow it to be
# either an integer or a string when checking against a schema. Many languages
# assume enum value to be only represented by integers, but IFEX **in theory**
# allows _any_ data type for enumerations, and thus any value of that type.
# However, in YAML it seems, for now, only realistic to allow specifying
# constant values as either numbers or strings.

def print_field(field_name, type_name, is_primitive=False, description=None):
    if type_name == 'Any':
        # For now, considered to be *either* a number or string.
        print(f'"{field_name}" : {{ "anyOf": [\n {{ "type": "integer" }},\n {{ "type": "string" }}\n]\n', end="")
    elif is_primitive:
        print(f'"{field_name}" : {{ "type": "{type_name}"\n', end="")
    else:  # complex/object type
        print(f'"{field_name}" : {{ "type": "object",\n"$ref": "#/definitions/{type_name}"\n', end="")
    if description:
        print(f', "description": "{description}"')
    print('}')

def print_array_field(field_name, type_name, is_primitive=False, description=None):
    if type_name == 'Any':
        print(f'"{field_name}" : {{ "type": "array", "items": {{ "anyOf": [\n {{ "type": "integer" }},\n {{ "type": "string"}}\n]\n}}', end="")
    elif is_primitive:
        print(f'"{field_name}" : {{ "type": "array", "items": {{ "type": "{type_name}" }}')
    else:
        print(f'"{field_name}" : {{ "type": "array", "items": {{ "$ref": "#/definitions/{type_name}" }}')
    if description:
        print(f', "description": "{description}"')
    print('}')

def print_type(t, fields):
    print(f'"{t}": {{ "type": "object", "properties": {{')
    for n, (field_name, field_type, is_array, _) in enumerate(fields):
        # FIXME add description to print_field
        is_primitive = (get_type_name(field_type) in ["string", "integer"])
        if is_array:
            print_array_field(field_name, get_type_name(field_type), is_primitive)
        else:
            print_field(field_name, get_type_name(field_type), is_primitive)
        # Skipping last comma - is there a better idiom for this? Probably.
        if n != len(fields)-1:
            print(',')
    print('},')

    # The "required" list: join the names so that commas only appear between
    # entries, even if the first field of the type happens to be optional.
    required = [f'"{field_name}"' for (field_name, _, _, is_required) in fields if is_required]
    print('"required" : [')
    print(',\n'.join(required), end="")
    print('],')
    print('"additionalProperties" : false')

# =======================================================================
# Model traversal
# =======================================================================

def collect_type_info(t, collection=None, seen=None):
    """This is the main recursive function that loops through tree and collects
    information about its structure which is later used to output the schema:"""

    # Create fresh containers per top-level call; mutable default arguments
    # would otherwise be shared between calls.
    if collection is None:
        collection = {}
    if seen is None:
        seen = {}

    # We don't need to gather information about primitive types because they
    # will not have any member fields below them.
    if t in [str, int, Any]:
        return

    # ForwardRef will fail if we try to recurse over its children. However,
    # the types that are handled with ForwardRef (Namespace) ought to appear
    # anyhow *somewhere else* in the tree as a real type -> so we can skip it.
    if is_forwardref(t):
        return

    # Also skip types we have already seen because the tree search will
    # encounter duplicates
    typename = type_name(t)
    if seen.get(typename):
        return

    seen[typename] = True

    # From here, we know it is a composite type (a dataclass from ifex_ast.py)
    # Process each of its member fields, remembering the name and type of each so
    # that they can be printed to the JSON schema
    for f in fields(t):
        field_name = f.name
        field_type = field_referenced_type(f)

        # Define each type by storing each of its fields and those fields' types.
        # We should also remember if it is a collection/list (in JSON-schema called array)
        if not collection.get(typename):
            collection[typename] = []
        collection[typename].append((field_name, field_type, field_is_list(f), not field_is_optional(f)))

        # Self recursion on the type of each found member field
        collect_type_info(field_type, collection, seen)


# =======================================================================
# MAIN PROGRAM
# =======================================================================

if __name__ == "__main__":

    # First, collect info
    types = {}
    collect_type_info(AST, types)

    # Then print JSON-schema
    print('''{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "IFEX Core IDL (YAML format)",
"type": "object",
"allOf": [ { "$ref": "#/definitions/AST" } ],
"definitions": {
''')

    # The loop variable is named type_fields so that it does not shadow
    # dataclasses.fields, which collect_type_info() relies on.
    items = types.items()
    for n, (typ, type_fields) in enumerate(items):
        print_type(typ, type_fields)
        # print comma, but not on last item
        if n != len(items)-1:
            print('},')
        else:
            print('}')
    print('}\n}')
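
The generator skips `ForwardRef` types on the assumption that they appear elsewhere in the tree as real types. One way to gain some confidence in that assumption is to check the generated output for `$ref` entries that point at a missing definition. The following is a minimal sketch and not part of this PR; the helper names `find_refs` and `unresolved_refs` and the sample text are hypothetical.

```python
import json


def find_refs(node):
    """Yield every "$ref" string found anywhere in a parsed JSON document."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                yield value
            else:
                yield from find_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_refs(item)


def unresolved_refs(schema_text):
    """Parse schema text and return the sorted local refs that have no definition."""
    schema = json.loads(schema_text)  # raises json.JSONDecodeError if malformed
    defined = set(schema.get("definitions", {}))
    prefix = "#/definitions/"
    return sorted({
        ref for ref in find_refs(schema)
        if ref.startswith(prefix) and ref[len(prefix):] not in defined
    })


# Hypothetical, deliberately incomplete output: "Namespace" is referenced
# but never defined.
sample = '''{
  "allOf": [ { "$ref": "#/definitions/AST" } ],
  "definitions": {
    "AST": { "type": "object", "properties": {
      "namespaces": { "type": "array", "items": { "$ref": "#/definitions/Namespace" } } } }
  }
}'''
print(unresolved_refs(sample))  # ['#/definitions/Namespace']
```

An empty result means every local reference resolves. It does not show that the schema is valid against the meta-schema, which would need a real JSON-schema validation library.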
