-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature] [JSON] Broader support of allOf
JSON keyword
#1070
Conversation
…alidate (arbitrary...)
… implementation of)
Codecov ReportAttention: Patch coverage is
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## main #1070 +/- ##
==========================================
- Coverage 65.35% 65.19% -0.17%
==========================================
Files 65 65
Lines 4892 5102 +210
==========================================
+ Hits 3197 3326 +129
- Misses 1695 1776 +81 ☔ View full report in Codecov by Sentry. |
guidance/library/_json.py
Outdated
const: Union[Unset, Any] = _unset | ||
|
||
def handle_keyword(key: str, value: Any, base_uri: str): | ||
nonlocal type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have learned a new Python keyword today.....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super handy for modifying closure variables :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sort of 'closed but not really' ? :-)
I made the mistake of refactoring test_json into a folder as well as running black on it... The diff in the tests looks much bigger than it should... ctrl-f for "unsatisfiable" in |
This reverts commit 496718e. Undo blacken to reduce diff size of PR
allOf
JSON keywordallOf
JSON keyword
@riedgar-ms just a heads up that I moved this back to WIP as @Saibo-creator gave me some examples of schemas for which this fails -- will let you know when it's ready for review again. |
Thanks. One suggestion: why not do the split into separate test files now, as a separate PR. Keep things orthogonal and more reasonably sized (well, slightly). |
…alidate (now prioritizing base schema)
allOf
JSON keywordallOf
JSON keyword
@riedgar-ms refactored a little bit just to move some of the potentially reusable parts out into their own methods. I was hoping to make TLDR: ready for review. |
try: | ||
additional_properties_grammar = self.json(json_schema=additional_properties, base_uri=base_uri) | ||
except UnsatisfiableSchemaError as e: | ||
if any(k not in properties for k in required): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No else
needed here? Even to just re-raise (and in other exception blocks with a conditional), since this is going to swallow exceptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As-is, this says "if the additionalProperties schema is unsatisfiable, we simply disallow additionalProperties". This can come up, for example, if you take the intersection (allOf) of two schemas with contradicting additionalProperties. From JSON schema's point of view, this is expected behavior. additionalProperties=False
(which we previously had an explicit check for) is just a sub-case of this, NOT a special case.
This only makes the parent schema unsatisfiable in the case that there are some required properties that don't have schemas in the properties dict -- (thereby requiring a property that is unsatisfiable). See just a few lines earlier (L574-L582) where we only raise an UnsatisfiableSchemaError for an unsatisfiable property if it's required, and we otherwise "blacklist" it.
TLDR: UnsatisfiableSchemaError
is semantically different from all other exception types, as unsatisfiable schemas are part of JSON schema's design. We only re-raise them in particular situations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. If this and other places this sort of pattern pops up are considered normal. Perhaps a comment, though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to add a comment
# i corresponds to the number of items we've already satisfied | ||
if i < min_items: | ||
raise UnsatisfiableSchemaError(f"prefixItems[{i}] is unsatisfiable but min_items is {min_items}") from e | ||
# Having an unsatisfiable prefix item is fine if we've already satisfied min_items, but this effectively sets max_items to i |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this worth a warnings.warn()
? Both from a Python perspective (breaking out of a loop by catching an exception is not normal flow control), and from a JSON-schema one? Your comment is one which makes me feel that the user might not have quite written the schema they meant to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
breaking out of a loop by catching an exception is not normal flow control
Not sure I agree with this premise, but I'll admit that doing something like {"prefixItems": [True, True, False, True]}
would be a VERY odd way of saying "no more than two items please". That being said, this behavior is semantically consistent with JSON schema, and we only need to re-raise an UnsatisfiableSchemaError if minItems
conflicts.
More philosophically, I'm not sure we should be in the business of policing the semantic meaning of peoples' schemas. I'd err on the side of giving the user what they asked for (without polluting them with warnings), only raising if there's something actually semantically invalid about what they asked for (e.g. an UnsatisfiableSchemaError makes its way up to the top without being caught) or if they ask for something we don't support (e.g. a NotImplementedError). Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some additional color on above -- this can arise organically if you try and do something like
{
"allOf": [
{"prefixItems": [{"type": "integer"}, {"type": "string"}, {"type": "integer"}]}
{"prefixItems": [{"type": "integer"}, {"type": "boolean"}, {"type": "integer"}]}
]
}
This schema should be equivalent to
{
"prefixItems": [
{"type": "integer"},
{"allOf": [{"type": "string"}, {"type": "string"}]},
{"type": "integer"}]
}
and therefore
{"prefixItems": [{"type": "integer"}, false, {"type": "integer"}]}
As false
isn't actually a grammar, we can't really return those from further down the stack. I suppose we could return a null grammar, but I (personally) think that's a bit trickier than having a special exception type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'm happy to leave it as-is for now. Your point about expecting users to know how to write a JSON schema is fair (although it seems that it's easy to mess up by accident in some subtle and non-obvious way).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My hope is that people writing JSON know json, and everyone else will use pydantic, which is less expressive and harder to shoot yourself in the foot with 😅
if len(allof_list) != 1: | ||
raise ValueError("Only support allOf with exactly one item") | ||
return lm + self.json(json_schema=allof_list[0]) | ||
assert not sibling_keys |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume that these bare assertions (without even a text message) should never be hit, unless things are going really wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, there's a philosophy that "production code" should never have bare asserts, which I may agree with. But yes, these asserts are essentially "whoever wrote this code really messed up".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to try and find something better to raise in these situations if it would make you more comfortable. I have a (maybe bad) tendency to use asserts almost as comments to help myself reason about state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like it's good to leave them in as-is. I tend to like rechecking things too, since it's often the really simple bit of code that couldn't possibly be wrong.....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few queries, but happy to keep this moving
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Implements framework for support of
allOf
by manually intersecting schemas. Tests are adapted from the JSON Schema test suite.Out of the box, this supports:
properties
, recursively merging duplicate properties usingallOf
required
listsadditionalProperties
with a recursiveallOf
items
with a recursiveallOf
type
senum
sminItems
,maxItems
,minLength
,maxLength
,pattern
,format
,minimum
,maximum
, ...allOf
will cause an exception, however we can extend this implementation in the futureallOf
,anyOf
,oneOf
, or$ref
by internally rewriting the node as anallOf
A couple of notes:
additionalProperties
of one schema with theproperties
of another in anallOf
. This is subtle and here is whereunevaluatedProperties
may start to become important. I say we punt and make incremental improvements hereadditionalProperties
note.)$ref
implementation again. TheallOf
tests revealed some edge cases I previously didn't know about. This caused some additional ugliness with passing around abase_uri
string, so I'll have a think on how to simplify this if possible. I think it's ok for now.