Why use JSON Schema for validating CSV files? Why not JSON Table Schema? #322
jqnatividad started this conversation in FAQ
When the `schema` and `validate` command specifications were first written, the intent was to use JSON Table Schema, which is specifically designed for tabular data. Of course, being active in the CKAN community, I also considered using Frictionless Data; however, it was limited to Python.

After surveying the available crates that could be leveraged to build these commands, it became clear that we had to use JSON Schema instead, using the jsonschema crate.
And `validate` is quite performant: it validates a million rows in less than 3 seconds.[^1] That said, as the jsonschema crate is still evolving, qsv will also support JSON Table Schema if and when it becomes doable/available in Rust.
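To make the approach concrete, here is a toy sketch of what validating CSV rows against a JSON Schema amounts to: each row is treated as a JSON-like object and each cell is checked against its column's rules. This is written in Python for brevity and is not qsv's implementation (qsv uses the Rust jsonschema crate and supports the full spec); the schema, column names, and helper functions below are hypothetical, and only the `type` and `enum` keywords are handled.

```python
# Toy CSV-against-JSON-Schema validator. Handles only a tiny subset of
# the spec ("type" and "enum" per column) for illustration purposes.
import csv
import io

def check_value(value, rules):
    """Return a list of violation messages for one CSV cell."""
    errors = []
    coerced = value
    expected = rules.get("type")
    if expected == "integer":
        try:
            coerced = int(value)
        except ValueError:
            errors.append(f"{value!r} is not an integer")
    elif expected == "number":
        try:
            coerced = float(value)
        except ValueError:
            errors.append(f"{value!r} is not a number")
    # enum check runs against the coerced value, mirroring how JSON Schema
    # validates typed instances rather than raw strings
    if "enum" in rules and coerced not in rules["enum"]:
        errors.append(f"{coerced!r} not in enum {rules['enum']}")
    return errors

def validate_csv(csv_text, schema):
    """Yield (row_number, column, message) for every violation found."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_num, row in enumerate(reader, start=1):
        for column, rules in schema["properties"].items():
            for msg in check_value(row.get(column, ""), rules):
                yield (row_num, column, msg)

# Hypothetical mini-schema, loosely modeled on the 311 example in the post.
schema = {
    "properties": {
        "borough": {
            "type": "string",
            "enum": ["MANHATTAN", "BROOKLYN", "QUEENS",
                     "BRONX", "STATEN ISLAND"],
        },
        "incident_zip": {"type": "integer"},
    }
}

data = "borough,incident_zip\nBROOKLYN,11201\nGOTHAM,abc\n"
violations = list(validate_csv(data, schema))
# Row 2 fails both checks: "GOTHAM" is not in the enum, and "abc" is
# not an integer.
```

The real `validate` command does the same kind of per-record check, but compiles the schema once up front and fans records out across threads, which is where the sub-3-second throughput comes from.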
Footnotes

[^1]: Using this 1M row sample of NYC 311 Data and this JSON Schema file, generated from the first 50,000 rows of the same 1M row 311 sample. There were 2,995 "invalid" rows, as the 50K sample didn't contain certain enum values that appear later in the full 1M row sample. The benchmark was run on a Ryzen 4800H laptop with 8 physical/16 logical cores and 32 GB of memory; `validate` is multi-threaded and used all 16 cores.