clarify that contentSchema holds a subschema and when/how it applies #1564

Merged
merged 6 commits into from
Feb 14, 2025
29 changes: 21 additions & 8 deletions specs/jsonschema-validation.md
@@ -533,19 +533,32 @@ defined by [RFC 2046](#rfc2046).

### `contentSchema`

If the instance is a string, and if `contentMediaType` is present, this property
contains a schema which describes the structure of the string.
If the instance is a string, and if `contentMediaType` is present, this
keyword's subschema describes the structure of the string.

This keyword MAY be used with any media type that can be mapped into JSON
Schema's data model. Specifying such mappings is outside of the scope of this
specification.

The value of this property MUST be a valid JSON schema. It SHOULD be ignored if
`contentMediaType` is not present. Accessing the schema through the schema
location IRI included as part of the annotation will ensure that it is correctly
processed as a subschema. Using the extracted annotation value directly is only
safe if the schema is an embedded resource with both `$schema` and an
absolute IRI `$id`.
The value of this property MUST be a valid JSON schema. The subschema is
produced as an annotation.

Since `contentMediaType` is required to provide instruction on how to interpret
string content, `contentSchema` SHOULD NOT produce an annotation if
`contentMediaType` is not present.
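
As a rough illustration of the rule above, the following Python sketch collects content annotations for a single schema object and emits the `contentSchema` annotation only when `contentMediaType` is also present. The helper name `content_annotations`, the annotation record layout, and the `https://example.com/...` IRI are assumptions made for this example; they are not defined by the specification.

```python
from typing import Any


def content_annotations(
    schema: dict[str, Any], instance: Any, schema_location: str
) -> list[dict[str, Any]]:
    """Collect content annotations for one schema object (hypothetical helper).

    `schema_location` is the location of the schema object itself; each
    annotation records the location of the keyword that produced it.
    """
    annotations: list[dict[str, Any]] = []

    # The content keywords only describe string instances.
    if not isinstance(instance, str):
        return annotations

    if "contentMediaType" in schema:
        annotations.append(
            {
                "keyword": "contentMediaType",
                "schemaLocation": f"{schema_location}/contentMediaType",
                "value": schema["contentMediaType"],
            }
        )
        # contentSchema is reported only alongside contentMediaType.
        if "contentSchema" in schema:
            annotations.append(
                {
                    "keyword": "contentSchema",
                    "schemaLocation": f"{schema_location}/contentSchema",
                    "value": schema["contentSchema"],
                }
            )

    return annotations


schema = {
    "contentMediaType": "application/json",
    "contentSchema": {"type": "object"},
}

with_media_type = content_annotations(schema, "{}", "https://example.com/s#")
without_media_type = content_annotations(
    {"contentSchema": {"type": "object"}}, "{}", "https://example.com/s#"
)
```

Here `with_media_type` holds two annotations, each carrying its schema location, while `without_media_type` is empty because there is no media type to say how the string should be interpreted.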

Note that evaluating the `contentSchema` subschema in-place (i.e. as part of its
parent schema) will ensure that it is correctly processed. Independent use of
Comment on lines +550 to +551
Member

My initial reaction was that this doesn't make sense. The annotation is just the subschema. It's no longer in-place, and it doesn't include the context of where it came from. So how can it be evaluated in-place? Then it occurred to me that an annotation includes not just its value but also the schema location it came from, and that location can be used to evaluate the `contentSchema` in-place. I don't think most readers are going to be knowledgeable enough to make that leap. This could use some clarification.

Member Author

Does the footnote ([^7]) that follows not provide that clarity?

Member

No, that's not the thing I'm saying needs to be clarified. We say that the annotation is the subschema. We say that the subschema shouldn't be evaluated out of context from where it appeared in the schema, and we explain why in footnote 7. What we don't explain is how, given a subschema without its parent context, it is even possible to evaluate it in context. The value of the annotation is just the subschema, not the context. We can't evaluate the subschema in context because we don't know the context in which it needs to be evaluated. I hope that makes sense this time.

Of course the solution is that the location of the annotation keyword in the schema is how you know the context, but that's not intuitive. This is the only annotation where the location of the keyword in the schema is useful or necessary to know. Usually, we only care about the value of the annotation. In this case, we need to know the value and the schema location of the annotation. Actually, when used correctly (in context), the value of the annotation is useless and it's the schema location that the user actually uses.

Member Author

> This is the only annotation where the location of the keyword in the schema is useful or necessary to know. Usually, we only care about the value of the annotation.

This is incorrect. Annotation location has always been useful, especially in cases where you receive annotations from the same keyword in different locations, e.g. from `title`. The location allows the consumer to decide which one (or both, or all) it wants to use. This is in Core, where annotations are defined.

> We can't evaluate the subschema in context because we don't know the context in which it needs to be evaluated.

How would we not know the context? It's conveyed by the annotation location, which has always been defined to be a part of an annotation.


It's still not clear why you think that the existing text (including the lines following these) is insufficient. It's saying, "Don't just evaluate this annotation value as a schema because it may rely on things that exist externally to it. You probably need to evaluate it where it came from."

It's actually saying all of that, and then the footnote expands on that warning using an example.

Member

> > We can't evaluate the subschema in context because we don't know the context in which it needs to be evaluated.
>
> How would we not know the context? It's conveyed by the annotation location, which has always been defined to be a part of an annotation.

Yes, you're right. I acknowledged that in the next paragraph. I was walking you through my thought process when I first read it, which I believe is what the vast majority of readers will be thinking when they read this section. If it took me a minute to make that connection, most readers won't make it at all. Yes, the concept is unambiguously documented elsewhere, but most readers won't have every detail of JSON Schema memorized, and I think this is a pretty esoteric detail.

the extracted subschema (as returned in an annotation) is only safe if the
subschema is an embedded resource which defines both a `$schema` and an absolute
IRI `$id`.[^7]

[^7] Processing a non-resource subschema in place will ensure that any
references (e.g. `$ref`) are always resolved properly. This isn't a problem when
the subschema is itself a resource. See
https://github.com/json-schema-org/json-schema-spec/issues/1381 for several
examples where processing this subschema independently can cause `$ref`
resolution failure.
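
To make the footnote's warning concrete, here is a minimal Python sketch of the failure mode. It assumes a hypothetical parent schema whose `contentSchema` refers to a definition in the parent's `$defs`, and a deliberately simplified helper, `resolve_local_ref`, that handles only same-document JSON Pointer references and skips percent-decoding. None of these names come from the specification or from any particular implementation.

```python
from typing import Any


def resolve_local_ref(document: Any, ref: str) -> Any:
    """Resolve a same-document reference such as '#/$defs/payload'.

    Hypothetical, simplified helper: no base-IRI handling and no
    percent-decoding of the fragment.
    """
    if not ref.startswith("#"):
        raise ValueError("this sketch only handles same-document references")
    node = document
    # Drop the leading '#', then walk the JSON Pointer tokens.
    for token in ref[1:].split("/")[1:]:
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


# Hypothetical parent schema; the IRI is made up for illustration.
parent = {
    "$id": "https://example.com/message",
    "type": "string",
    "contentMediaType": "application/json",
    "contentSchema": {"$ref": "#/$defs/payload"},
    "$defs": {"payload": {"type": "object", "required": ["id"]}},
}

# The annotation value is just the subschema, without its parent.
extracted = parent["contentSchema"]
ref = extracted["$ref"]

# In place: the reference is resolved against the parent resource.
in_place = resolve_local_ref(parent, ref)

# Detached: treating the extracted value as its own document loses `$defs`.
try:
    resolve_local_ref(extracted, ref)
    detached_ok = True
except KeyError:
    detached_ok = False
```

In this sketch `in_place` is the `payload` definition, while the detached lookup fails because the extracted subschema has no `$defs` of its own. An embedded resource with its own `$schema` and absolute-IRI `$id` avoids the problem because its references resolve against itself rather than against the parent.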

### Example
