refactor(jsonschema): reworking how we handle json schema #65
quick mid-point review, mainly pointing out style things to make life easier later. Looking great!
We're entering reviewable PR territory. All things considered, this is still very much a work in progress and needs the following finalized:
However, I would like to get going on the "overall logic" front. Here's a breakdown of what changed functionally from the previous implementation:
Codewise:
Schema parsing:
Schema validation:
Notable pieces of code:
There is also this prerequisite PR for qri-io/jsonpointer which is functionally the same, but carries some new options for performance improvements. Points for further discussion:
I deem this ready to start being reviewed. Refer to the above comment for a bit more guidance on the changes in general. The top comment reflects the current state of what still needs to be done before merging and what will be left for future work.
Ok, this is a first pass. I've looked at 23/34 files. Looking great. I'd love to get back over to package jsonschema ASAP.
The other thing we need to do is cut a release (v0.2.0) that updates CHANGELOG.md and specifies that this is the last version before a jump to 2019 draft support.
Ok, now we get to chat about SchemaContext
😄
schema_context.go
type SchemaContext struct {
	Local                   *Schema
	Root                    *Schema
	RecursiveAnchor         *Schema
	Instance                interface{}
	LastEvaluatedIndex      int
	LocalLastEvaluatedIndex int
	BaseURI                 string
	InstanceLocation        *jptr.Pointer
	RelativeLocation        *jptr.Pointer
	BaseRelativeLocation    *jptr.Pointer

	LocalRegistry *SchemaRegistry

	EvaluatedPropertyNames      map[string]bool
	LocalEvaluatedPropertyNames map[string]bool
	Misc                        map[string]interface{}

	ApplicationContext *context.Context
}
Ok, after a bunch of reading, I think this API needs work, but it clearly points to how it can be improved.
- It seems to me this struct is a state machine, not a context. A "context" in Go carries scope across API boundaries. This struct harmonizes state into a single location, allowing keywords to be stateless.
- because all fields are exported, the guts of this state machine are open to the world to modify
- the methods & fields on this struct are the primary API for keyword developers. Because we support custom keywords, it's a public API and needs to be written defensively.
- all keywords in this package are consumers of this API, and examples for other keyword developers, a massive win IMHO.
- this struct is not safe for concurrent use. That might be ok for now, but we should have a plan for making it safe for concurrency.
- the Instance field couples instance data to validation state, which feels incorrect. Data should flow separately through the keyword API, especially in a streaming context
- this state machine should collect validation errors (we've been using an outParam for this)
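To make the points above concrete, here is a minimal sketch of what an encapsulated state struct could look like. Everything in it is an assumption for illustration: the field set is heavily simplified, the instance location is a plain string rather than a *jptr.Pointer, errors are plain strings, and the mutex is just one possible way to approach concurrency safety, not a decided design.

```go
package main

import (
	"fmt"
	"sync"
)

// ValidationState is a hypothetical, simplified stand-in for SchemaContext.
// All fields are unexported, so keywords can only change state via methods.
type ValidationState struct {
	mu               sync.Mutex
	instanceLocation string          // assumption: a string instead of *jptr.Pointer
	evaluatedProps   map[string]bool // names of properties already evaluated
	errs             []string        // assumption: real errors would carry more fields
}

// NewValidationState returns an empty state that is ready for use.
func NewValidationState() *ValidationState {
	return &ValidationState{evaluatedProps: map[string]bool{}}
}

// SetInstanceLocation records where in the instance data validation currently is.
func (s *ValidationState) SetInstanceLocation(loc string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.instanceLocation = loc
}

// AddError records a validation failure at the current instance location,
// replacing the errs out-parameter with state owned by the struct.
func (s *ValidationState) AddError(msg string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.errs = append(s.errs, fmt.Sprintf("%s: %s", s.instanceLocation, msg))
}

// Errors returns a copy of the collected errors so callers cannot
// modify the internal slice.
func (s *ValidationState) Errors() []string {
	s.mu.Lock()
	defer s.mu.Unlock()
	return append([]string(nil), s.errs...)
}

// SetPropertiesEvaluated marks one or more property names as evaluated.
func (s *ValidationState) SetPropertiesEvaluated(names ...string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for _, n := range names {
		s.evaluatedProps[n] = true
	}
}

// IsPropertyEvaluated reports whether a property name was marked as evaluated.
func (s *ValidationState) IsPropertyEvaluated(name string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.evaluatedProps[name]
}

func main() {
	st := NewValidationState()
	st.SetInstanceLocation("/items/0")
	st.AddError("must be a number")
	st.SetPropertiesEvaluated("a", "b")
	fmt.Println(st.Errors(), st.IsPropertyEvaluated("a"))
}
```

Note that the instance data itself is deliberately absent from this struct; in the proposal it travels as a separate argument through the keyword API.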
Starting from the top level Schema "user API". Instead of being "a wrapper function to maintain some level of backwards compatibility with versions v0.1.2 and prior", let's break the API completely and define a very "generic" Go function that initializes a root state and transitions to the "keyword API":
func (s *Schema) Validate(ctx context.Context, data interface{}) {
	st := NewValidationState(s)
	s.ValidateKeyword(ctx, st, data)
}
The call to ValidateKeyword transitions us to the "keyword API", where we've changed the primary ValidateFromContext interface function to something like ValidateKeyword:
// Keyword is an interface for anything that can validate.
// JSON-Schema keywords are all examples of Keyword
type Keyword interface {
	// ValidateKeyword runs a validation check against decoded JSON data,
	// calling methods on ValidationState to record any discovered errors
	ValidateKeyword(ctx context.Context, state *ValidationState, data interface{})
	// ...
}
A keyword implementation would change to look like this:
// ValidateKeyword implements the Keyword interface for Maximum
func (m Maximum) ValidateKeyword(ctx context.Context, state *ValidationState, data interface{}) {
	SchemaDebug("[Maximum] Validating")
	if num, ok := data.(float64); ok {
		if num > float64(m) {
			// state now keeps the errs slice internally, has all the info it needs to
			// populate error fields
			state.AddError(fmt.Sprintf("must be less than or equal to %f", m))
		}
	}
}
A complex keyword needs a fairly rich API from the state struct. Here I've made up the methods that Items needs. Warning, untested code:
// ValidateKeyword implements the Keyword interface for Items
func (it Items) ValidateKeyword(ctx context.Context, state *ValidationState, data interface{}) {
	SchemaDebug("[Items] Validating")
	if arr, ok := data.([]interface{}); ok {
		// instead of "NewSchemaContextFromSourceClean(*schCtx)", state gets a method "NewSubState"
		// that initializes & returns a clean substate from the parent
		subState := state.NewSubState()
		if it.single {
			// BaseRelativeLocation should be a method that reads from a private field,
			// the method defends against nil access, making it safe to use like this:
			if newPtr := state.BaseRelativeDescendant("items"); newPtr != nil {
				subState.SetBaseRelativeLocation(newPtr)
			}
			// this could probably be turned into a one-liner:
			newPtr := state.RelativeDescendant("items")
			subState.SetRelativeLocation(newPtr)
			for i, elem := range arr {
				if _, ok := state.LocalKeyword("additionalItems"); ok {
					state.SetPropertiesEvaluated("0")
					state.SetLocalPropertiesEvaluated("0")
					// These might be combined into some sort of "only increment if higher" setter
					if state.LastEvaluatedIndex() < i {
						state.SetEvaluatedIndex(i)
					}
					if state.LocalLastEvaluatedIndex() < i {
						state.SetLocalLastEvaluatedIndex(i)
					}
				}
				subState.ClearContext()
				newPtr = state.InstanceLocationDescendant(strconv.Itoa(i))
				subState.SetInstanceLocation(newPtr)
				// here it's clearer we're using a subState with a different data element:
				it.Schemas[0].ValidateKeyword(ctx, subState, elem)
				if _, ok := state.LocalKeyword("additionalItems"); ok {
					// TODO(arqu): this might clash with additionalProperties
					// should separate items out
					state.SetPropertiesEvaluated(subState.EvaluatedProperties()...)
					state.SetLocalPropertiesEvaluated(subState.LocalEvaluatedProperties()...)
				}
			}
		} else {
			for i, vs := range it.Schemas {
				if i < len(arr) {
					if _, ok := state.LocalKeyword("additionalItems"); ok {
						state.SetPropertyEvaluated(strconv.Itoa(i))
						state.SetLocalPropertyEvaluated(strconv.Itoa(i))
						// These might be combined into some sort of "only increment if higher" setter
						if state.LastEvaluatedIndex() < i {
							state.SetEvaluatedIndex(i)
						}
						if state.LocalLastEvaluatedIndex() < i {
							state.SetLocalLastEvaluatedIndex(i)
						}
					}
					subState.ClearContext()
					if newPtr := state.BaseRelativeDescendant("items", strconv.Itoa(i)); newPtr != nil {
						subState.SetBaseRelativeLocation(newPtr)
					}
					newPtr := state.RelativeDescendant("items", strconv.Itoa(i))
					subState.SetRelativeLocation(newPtr)
					newPtr = state.InstanceLocationDescendant(strconv.Itoa(i))
					subState.SetInstanceLocation(newPtr)
					vs.ValidateKeyword(ctx, subState, arr[i])
					if _, ok := state.LocalKeyword("additionalItems"); ok {
						state.SetPropertiesEvaluated(subState.EvaluatedProperties()...)
						state.SetLocalPropertiesEvaluated(subState.LocalEvaluatedProperties()...)
					}
				}
			}
		}
	}
}
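The "only increment if higher" setter mentioned in the comments could look roughly like the sketch below. The method name UpdateEvaluatedIndex is made up, the struct is trimmed to the two index fields, and starting both indexes at -1 (meaning "nothing evaluated yet") is an assumption, not something this PR specifies.

```go
package main

import "fmt"

// ValidationState is trimmed to the two index fields this sketch needs.
type ValidationState struct {
	lastEvaluatedIndex      int
	localLastEvaluatedIndex int
}

// NewValidationState starts both indexes at -1, an assumed sentinel for
// "no array element has been evaluated yet", so that index 0 counts as higher.
func NewValidationState() *ValidationState {
	return &ValidationState{lastEvaluatedIndex: -1, localLastEvaluatedIndex: -1}
}

// UpdateEvaluatedIndex (hypothetical name) raises both the global and the
// local last-evaluated index to i, but never lowers either of them.
func (s *ValidationState) UpdateEvaluatedIndex(i int) {
	if s.lastEvaluatedIndex < i {
		s.lastEvaluatedIndex = i
	}
	if s.localLastEvaluatedIndex < i {
		s.localLastEvaluatedIndex = i
	}
}

// LastEvaluatedIndex returns the highest evaluated index seen so far.
func (s *ValidationState) LastEvaluatedIndex() int { return s.lastEvaluatedIndex }

// LocalLastEvaluatedIndex returns the highest locally evaluated index seen so far.
func (s *ValidationState) LocalLastEvaluatedIndex() int { return s.localLastEvaluatedIndex }

func main() {
	st := NewValidationState()
	for _, i := range []int{0, 3, 1} {
		st.UpdateEvaluatedIndex(i)
	}
	fmt.Println(st.LastEvaluatedIndex(), st.LocalLastEvaluatedIndex())
}
```

With a setter like this, each of the two four-line "if lower, then set" pairs in Items would collapse into a single call.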
The hard work of figuring out how to arrange this is done. All of these comments are just ergonomics, but they matter considering we're going to break the API. I think breaking the API is wholly appropriate with the transition to the 2019_09 spec.
🌟 🌟 🚀 🧑🚀 🚀 🌟 🌟
🔥 🔥 🎸 👩🎤 🎸 🔥 🔥
🚂 🚋 🚋 🚋 🚋 🚋 🚋
Let's merge a version bump first, but THIS IS SO GOOD YAY @Arqu
WIP PR for the jsonschema implementation rework.
Things to do:
traversal_test.go, val_error_test.go & validate_test.go
ref.json (or disregard them)
main and remove main.go
Things that will not be covered in this release: