Codegen JSON Schema into Python TypedDict
classes.
This project is currently an experiment to see how well JSON Schema can be converted into Python TypedDict
classes. If it goes well then it'll graduate to a PyPI package.
Running this command:
python -m json_schema_to_python --input examples/petstore.json
Will generate this Python code:
from __future__ import annotations
from enum import Enum
from typing import Literal, Union
from typing_extensions import NotRequired, TypedDict
class Animal(TypedDict):
is_adorable: NotRequired[bool]
species: NotRequired[Species]
weight: NotRequired[float]
class Pet(Animal):
id: int
name: str
toys: list[Toy]
class Toy(TypedDict):
is_squeaky: NotRequired[bool]
Species = Literal["cat", "dog"]
From this JSON Schema:
{
"id": "#root",
"properties": {
"Animal": {
"id": "#Animal",
"type": "object",
"properties": {
"is_adorable": {
"type": "boolean"
},
"species": {
"$ref": "#Species"
},
"weight": {
"type": "number"
}
}
},
"Pet": {
"id": "#Pet",
"type": "object",
"$ref": "#Animal",
"properties": {
"id": {
"type": "integer"
},
"name": {
"type": "string"
},
"toys": {
"type": "array",
"items": [{ "ref": "#Toy" }]
}
},
"required": ["id", "is_adorable", "name", "species", "toys"]
},
"Species": {
"id": "#Species",
"type": "string",
"enum": ["cat", "dog"]
},
"Toy": {
"id": "#Toy",
"type": "object",
"properties": {
"is_squeaky": {
"type": "boolean"
}
}
}
}
}
This is because Mypy doesn't support anonymous TypedDict
s (see this discussion).
If we had this schema:
{
"id": "#root",
"properties": {
"Foo": {
"properties": {
"bar": {
"type": "object",
"properties": {
"baz": { "type": "integer" }
}
}
}
}
}
}
Then it'd be nice to generate something like this:
Foo = TypedDict(
{
"bar": {
{"baz": int},
},
},
)
But that's impossible right now.
This is because TypedDict
has spotty support "extra" keys during dict creation (see this discussion). Sometimes Mypy is OK with extra keys and sometimes it isn't:
class Foo(TypedDict):
a: int
class Bar(TypedDict):
a: int
b: int
# Mypy error when declaring a dict with extra keys
foo: Foo = {"a": 1, "b": 2}
bar: Bar = {"a": 1, "b": 2}
def stuff(value: Foo) -> None:
pass
# No Mypy error when passing a dict variable with extra keys
stuff(bar)
# Mypy error when passing an anonymous dict with extra keys
stuff({"a": 1, "b": 2})
This is because Python Enum
members are actually objects and not primitives:
>>> from enum import Enum
>>> class Foo(Enum):
... a = "a"
...
>>> Foo.a
<Foo.a: 'a'>
>>> Foo.a.value
'a'
That won't match your runtime data if you just did a json.loads
on a request body.
This is primarily to simplify the library. It isn't impossible to avoid using id
, but the extra complexity doesn't seem worthwhile.
The primary reason for this decision is to avoid messy class name issues. For example, if you had schemas in #properties/foo/Thing
and #properties/bar/Thing
then what would you call each? FooThing
and BarThing
? And what would happen if you had another schema whose id
was already FooThing
? Adding class name magic is a can of worms, so it seems best to avoid it by using explicit class names via id
.