elmarx/schema-harvester

 
 

Schema-Harvester

Schema-Harvester is a tool that parses existing JSON documents and tries to derive a JSON schema from them.

It comes with different "frontends" to consume JSON documents from different sources, currently via CLI or from Kafka.

Kafka-Service usage

You need a Kafka topic to which the service publishes schemas. Each schema is published with its source topic as the message key.

It makes sense to enable log-compaction (cleanup.policy=compact) for the schema-topic, but of course this is optional.
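As a rough sketch, a compacted schema topic could be created with Kafka's stock `kafka-topics.sh` tool. The topic name `schemas`, the fallback broker address, and the partition and replication counts are assumptions for a local single-broker setup, not values required by the service:

```shell
# Assumed values: adjust the broker address and topic name to your setup.
BROKER="${KAFKA_BROKER_ADDRESS_LIST:-localhost:9092}"
SCHEMA_TOPIC="schemas"

# Create the schema topic with log compaction enabled. With compaction,
# Kafka eventually retains only the most recent record per key, i.e. the
# latest schema for each source topic.
if command -v kafka-topics.sh >/dev/null 2>&1; then
  kafka-topics.sh --bootstrap-server "$BROKER" \
    --create --topic "$SCHEMA_TOPIC" \
    --partitions 1 --replication-factor 1 \
    --config cleanup.policy=compact
else
  echo "kafka-topics.sh not found in PATH" >&2
fi
```

Because schemas are keyed by source topic, compaction keeps the schema topic small while still holding one current schema per source topic.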

Create a config.toml (e.g. copy config.sample.toml; see config.default.toml for all options) and start the service:

harvesterd

By default, it consumes all topics it has access to.

CLI Usage

Consume a file of line-separated JSON documents (one document per line):

$ cat line_separated.json | schema-harvester
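To try this out without an existing data set, a small input file can be written by hand. The sketch below uses made-up sample documents (the field names `id`, `name`, and `temperature` are purely illustrative) and only calls the CLI if it is installed:

```shell
# Write three sample documents, one JSON object per line.
# The data is invented for illustration only.
cat > line_separated.json <<'EOF'
{"id": 1, "name": "sensor-a", "temperature": 21.5}
{"id": 2, "name": "sensor-b"}
{"id": 3, "name": "sensor-c", "temperature": null}
EOF

# Feed the file to the CLI on stdin if it is available.
if command -v schema-harvester >/dev/null 2>&1; then
  schema-harvester < line_separated.json
else
  echo "schema-harvester not found in PATH" >&2
fi
```

The documents deliberately differ (a missing field, a null value), since varying inputs like these are what a derived schema has to account for.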

Consume via MQTT (using Eclipse Mosquitto):

$ mosquitto_sub -t homeassistant/event | schema-harvester

Consume from Kafka (using kcat):

$ kcat -b $KAFKA_BROKER_ADDRESS_LIST -t your_topic | schema-harvester

Verify schemas

To verify that a generated schema is itself a valid JSON schema, the repository includes an example executable that wraps the jsonschema crate's schema validation.

cargo run --example validate schema.json
# or, e.g., directly from Kafka
kcat -b localhost:9092 -t schemas -o-1 -C -e | cargo run --example validate

License

Licensed under Apache-2.0 (LICENSE-APACHE) and MIT (LICENSE-MIT).
