Add field validation on structs #52
I think this is a seriously elegant solution, and I really hope we can make it work for the general case. I think the skipped test error is related to the following example. Consider a nullable struct column:

from typing import Optional

import patito as pt

class XY(pt.Model):
    x: int
    y: int

class Coord(pt.Model):
    id: int
    xy: Optional[XY]  # <-- nullable

What we are saying here is that we allow the following values for
But not
Now let's take a look at how polars materializes the above three column values:

import polars as pl
schema = {"id": pl.Int64, "xy": pl.Struct({"x": pl.Int64, "y": pl.Int64})}
# note the difference between the two last elements in the struct:
data = {"id": [0, 1, 2], "xy": [{"x": 1, "y": 1}, None, {"x": None, "y": None}]}
df = pl.DataFrame(data, schema=schema)
print(df)
---
shape: (3, 2)
┌─────┬─────────────┐
│ id  ┆ xy          │
│ --- ┆ ---         │
│ i64 ┆ struct[2]   │
╞═════╪═════════════╡
│ 0   ┆ {1,1}       │
│ 1   ┆ {null,null} │
│ 2   ┆ {null,null} │
└─────┴─────────────┘

I think I was expecting

shape: (3, 2)
┌─────┬─────────────┐
│ id  ┆ xy          │
│ --- ┆ ---         │
│ i64 ┆ struct[2]   │
╞═════╪═════════════╡
│ 0   ┆ {1,1}       │
│ 1   ┆ null        │
│ 2   ┆ {null,null} │
└─────┴─────────────┘

I'm going to have to have a little think about how we do this one 😊
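To make the concern concrete, here is a minimal pure-Python sketch of why the coercion matters for validation. It does not call polars or patito: `is_valid_xy` and `coerce_null_struct` are hypothetical helpers, and the coercion only imitates the printed output above (a null struct shown as a struct of nulls), not polars internals.

```python
from __future__ import annotations

from typing import Any, Optional

FIELDS = ("x", "y")  # hypothetical stand-in for the fields of the XY model


def is_valid_xy(value: Optional[dict[str, Any]]) -> bool:
    """Optional[XY]: the struct itself may be None, but its fields must be ints."""
    if value is None:
        return True
    return all(isinstance(value.get(name), int) for name in FIELDS)


def coerce_null_struct(value: Optional[dict[str, Any]]) -> dict[str, Any]:
    """Imitate the printed output above: a null struct becomes a struct of nulls."""
    if value is None:
        return {name: None for name in FIELDS}
    return value


rows = [{"x": 1, "y": 1}, None, {"x": None, "y": None}]

# Validating the raw Python values: only the last row is invalid.
before = [is_valid_xy(row) for row in rows]

# Validating after coercion: the valid None row is now indistinguishable
# from the invalid struct-of-nulls row, so it is rejected as well.
after = [is_valid_xy(coerce_null_struct(row)) for row in rows]

print(before)
print(after)
```

Under these assumptions, the row that was valid as `None` fails validation once it has been coerced, which would be consistent with a test involving a missing struct value starting to fail.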
@thomasaarholt I've updated the PR to correctly select Since polars coerces
I'll likely update this PR to include a few more test cases for testing optional lists of structs and optional deeply nested structs, but I don't think the current fix will need to change. What do you think?
Super! Overall, I agree with you. I've asked on the Discord just to make sure the implementation is correct. This is kinda an edge case, but someone might have an opinion.
Hi, any news on this PR?
Let's go ahead and merge this, as well as make a new release. @dsgibbons, if you want to add a few more test cases, especially ones demonstrating null behaviour, in another PR, I'd appreciate it.
Yes, I'm still looking at adding additional test cases @thomasaarholt. I've been a bit busy lately, but hopefully I'll be able to add more test cases over the next few weeks. Thanks for accepting the PR!
Closes #43.
This fix takes advantage of the existing validation logic by recursively calling _find_errors whenever pl.Struct or pl.List(pl.Struct) columns are detected.
This fix exposes a (possible) bug with one of the pre-existing test cases, test_model.test_missing_date_struct. I've currently added a @pytest.mark.skip to the failing test. Is this actually a bug, and if so, can I continue to skip this test case and raise a new issue to fix it? Do you have any insights into why Test.model_fields['c'].annotation.columns does not exist?
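For readers unfamiliar with the approach, the recursion described above can be sketched in plain Python. This is not patito's `_find_errors`: the `find_errors` function, the schema notation (a Python type for a leaf, a dict for a struct, a one-element list for a list of structs), and the `track` field are all assumptions made for illustration.

```python
from __future__ import annotations

from typing import Any


def find_errors(value: Any, schema: Any, path: str = "") -> list[str]:
    """Collect error messages, recursing into structs and lists of structs."""
    if isinstance(schema, dict):
        # Struct: reuse the same validation logic for every field.
        if not isinstance(value, dict):
            return [f"{path or '<root>'}: expected struct, got {type(value).__name__}"]
        errors: list[str] = []
        for name, field_schema in schema.items():
            child = f"{path}.{name}" if path else name
            if name not in value:
                errors.append(f"{child}: missing field")
            else:
                errors.extend(find_errors(value[name], field_schema, child))
        return errors
    if isinstance(schema, list):
        # List: validate each element against the single element schema.
        if not isinstance(value, list):
            return [f"{path}: expected list, got {type(value).__name__}"]
        errors = []
        for i, item in enumerate(value):
            errors.extend(find_errors(item, schema[0], f"{path}[{i}]"))
        return errors
    # Leaf: a plain isinstance check.
    if not isinstance(value, schema):
        return [f"{path}: expected {schema.__name__}, got {type(value).__name__}"]
    return []


schema = {"id": int, "xy": {"x": int, "y": int}, "track": [{"x": int, "y": int}]}
good = {"id": 0, "xy": {"x": 1, "y": 1}, "track": [{"x": 0, "y": 0}]}
bad = {"id": 1, "xy": {"x": 1, "y": "one"}, "track": [{"x": 0}, {"x": 0, "y": None}]}

good_errors = find_errors(good, schema)
bad_errors = find_errors(bad, schema)

for message in bad_errors:
    print(message)
```

The point of the design is that nested structs need no separate validation code: each level is handed back to the same function, with a path prefix so that errors can be traced to the nested field that caused them.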