Schema-Harvester is a tool that parses existing JSON documents and tries to derive a JSON schema from them.
It comes with different "frontends" to consume JSON documents from different sources, currently via CLI or from Kafka.
You need a Kafka topic to which the service publishes schemas. Each schema is published with its source topic as the key.
It makes sense to enable log compaction (`cleanup.policy=compact`) for the schema topic, but of course this is optional.
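As a rough sketch, a compacted schema topic could be created as shown below. It assumes the `kafka-topics.sh` script from the Apache Kafka distribution is on your `PATH` (some distributions install it as `kafka-topics`). The topic name `schemas`, the `localhost:9092` fallback, and the partition and replication counts are placeholders for a single-broker development setup, not values required by Schema-Harvester. The `CREATE_TOPIC` switch is a hypothetical variable used only in this sketch.

```shell
# Assumed values: adjust broker address and topic name to your setup.
BROKER="${KAFKA_BROKER_ADDRESS_LIST:-localhost:9092}"
SCHEMA_TOPIC="${SCHEMA_TOPIC:-schemas}"

# Build the command as positional arguments so it can be inspected first.
set -- kafka-topics.sh \
  --bootstrap-server "$BROKER" \
  --create --topic "$SCHEMA_TOPIC" \
  --partitions 1 --replication-factor 1 \
  --config cleanup.policy=compact

create_cmd="$*"
echo "$create_cmd"

# Run it only when explicitly requested (CREATE_TOPIC is a hypothetical switch).
if [ "${CREATE_TOPIC:-0}" = "1" ]; then
  "$@"
fi
```

With compaction enabled, Kafka eventually keeps only the latest record per key. Since schemas are keyed by source topic, that means the most recent schema for each source topic.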
Create a `config.toml` (e.g. copy `config.sample.toml`, see `config.default.toml` for all options) and start the service:
harvesterd
By default, it consumes all topics it has access to.
Consume a file with line-separated JSON documents (one document per line):
$ cat line_separated.json | schema-harvester
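To try this without existing data, a small input file can be put together by hand. The documents below are made-up sample data, and the snippet only prepares the file. The final pipe into `schema-harvester` is left as a comment because it requires the binary to be installed.

```shell
# Write three sample documents, one JSON object per line (hypothetical data).
printf '%s\n' \
  '{"id": 1, "name": "alice"}' \
  '{"id": 2, "name": "bob", "tags": ["admin"]}' \
  '{"id": 3}' > line_separated.json

# Each line is one document, so the line count equals the document count.
doc_count=$(wc -l < line_separated.json | tr -d ' ')
echo "documents: $doc_count"   # prints "documents: 3"

# Then feed the file to the harvester as shown above:
# cat line_separated.json | schema-harvester
```

The sample documents deliberately differ in which fields they contain, which is the kind of variation a derived schema has to account for.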
Consume via MQTT (using Eclipse Mosquitto):
$ mosquitto_sub -t homeassistant/event | schema-harvester
Consume from Kafka (using kcat):
$ kcat -C -b $KAFKA_BROKER_ADDRESS_LIST -t your_topic | schema-harvester
To verify that a generated schema is itself a valid JSON schema, the repository provides an example executable that wraps the schema validation of the jsonschema crate:
cargo run --example validate schema.json
# or, e.g., directly from Kafka
kcat -b localhost:9092 -t schemas -o-1 -C -e | cargo run --example validate