-
Notifications
You must be signed in to change notification settings - Fork 166
feat: Add assert_series_equal
#2983
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
modin & cudf `clip` issue leading to `is_close` exceptionThe modin errors are related to `is_close` failing, which is because of `clip` with pyarrow integers (I couldn't really replicate with floats). And that brought up an old memory: https://github.com/modin-project/modin/issues/7415.Which now leads to the harsh part:
Once that's merged, I will take care of the issues in the tests with old pandas version is now solved with a workaround which enables |
|
def assert_series_equal( | ||
left: IntoSeriesT, | ||
right: IntoSeriesT, | ||
*, | ||
check_dtypes: bool = True, | ||
check_names: bool = True, | ||
check_order: bool = True, | ||
check_exact: bool = False, | ||
rel_tol: float = 1e-05, | ||
abs_tol: float = 1e-08, | ||
categorical_as_str: bool = False, | ||
) -> None: | ||
"""Assert that the left and right Series are equal. | ||
Raises a detailed `AssertionError` if the Series differ. | ||
This function is intended for use in unit tests. | ||
Arguments: | ||
left: The first Series to compare. | ||
right: The second Series to compare. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should be accepting anything that isn't nw.Series
here
I'm not opposed to that being another function (e.g. assert_native_series_equal
), but this one should just be comparing - not doing conversion as well IMO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fair (and thank for the commits fixing it). However, then we might need to be even more e.g. also check that the narwhals namespace version is the same. In principle the check:
narwhals/narwhals/testing/asserts/series.py
Lines 83 to 90 in 5d7040b
if any(not is_narwhals_series(obj) for obj in (left, right)): | |
msg = ( | |
"Expected `narwhals.Series` instance, found:\n" | |
f"[left]: {qualified_type_name(type(left))}\n" | |
f"[right]: {qualified_type_name(type(left))}\n\n" | |
"Hint: Use `nw.from_native(obj, series_only=True) to convert to `narwhals.Series`" | |
) | |
raise TypeError(msg) |
would pass for:
import narwhals.stable.v1 as nw_v1
import narwhals.stable.v2 as nw_v2
assert_series_equal(
nw_v1.from_native(...),
nw_v2.from_native(...),
...
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I may have been too quick suggesting this 😅
Coercing the right
or whatever is equivalent to expected
; might make sense for ergonomics. E.g. like our assert_equal_data
But I think allowing both inputs to be native is a bit much
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing you've already checked this, but pandas
has quite a few of these
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I think allowing both inputs to be native is a bit much
I might disagree with this. I would prefer having the API to be symmetric in both left
and right
arguments, so either cast both or none of them. We can have both assert_series_equal
and assert_native_series_equal
as you suggested. The second is quite low effort after the first
Coercing the
right
or whatever is equivalent toexpected
; might make sense for ergonomics. E.g. like ourassert_equal_data
We can always create a tests utility function for better ergonomics!
I actually didn't check pandas at all 🙈
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might disagree with this. I would prefer having the API to be symmetric in both left and right arguments, so either cast both or none of them.
Totally fair!
I actually didn't check pandas at all 🙈
Now that surprised me!
I suspected the _check_*
naming scheme and usage of __tracebackhide__
came from there - but just an interesting coincidence it seems 😂
check_exact=check_exact, | ||
rel_tol=rel_tol, | ||
abs_tol=abs_tol, | ||
categorical_as_str=categorical_as_str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The nested cases need coverage for an inner Categorical
+ categorical_as_str
was surprised to see this didn't have an effect
diff --git a/narwhals/testing/asserts/series.py b/narwhals/testing/asserts/series.py
index 1521a335d..ca4e29c70 100644
--- a/narwhals/testing/asserts/series.py
+++ b/narwhals/testing/asserts/series.py
@@ -105,7 +105,6 @@ def assert_series_equal(
check_exact=check_exact,
rel_tol=rel_tol,
abs_tol=abs_tol,
- categorical_as_str=categorical_as_str,
)
else:
_check_approximate_values(left_vals, right_vals, rel_tol=rel_tol, abs_tol=abs_tol)
@@ -160,7 +159,6 @@ def _check_exact_values(
check_exact: bool,
rel_tol: float,
abs_tol: float,
- categorical_as_str: bool,
) -> None:
"""Check exact value equality for various data types."""
left_impl = left.implementation
@@ -182,7 +180,6 @@ def _check_exact_values(
check_exact=check_exact,
rel_tol=rel_tol,
abs_tol=abs_tol,
- categorical_as_str=categorical_as_str,
)
_check_list_like(left, right, left_dtype, right_dtype, check_fn=check_fn)
# If `_check_list_like` didn't raise, then every nested element is equal
@@ -196,7 +193,6 @@ def _check_exact_values(
check_exact=check_exact,
rel_tol=rel_tol,
abs_tol=abs_tol,
- categorical_as_str=categorical_as_str,
)
_check_struct(left, right, left_dtype, right_dtype, check_fn=check_fn)
# If `_check_struct` didn't raise, then every nested element is equal
assert_series_equal
assert_series_equal
def _check_metadata( | ||
left: SeriesT, right: SeriesT, *, check_dtypes: bool, check_names: bool | ||
) -> None: | ||
"""Check metadata information: implementation, length, dtype, and names.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would you feel about moving every function from here onwards into utils
?
https://github.com/narwhals-dev/narwhals/blob/9a4b605da3e7ef83555c94c413c8648b39865a92/narwhals/testing/asserts/utils.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a strong preference but slightly in favor of keeping it like this until we are aware that something can be re-used also for dataframes. The reason for this is that say we move _check_metadata
, it's very likely that dataframe metadata is composed of different check statements. Then we would end up with _check_series_metadata
and _check_frame_metadata
.
These functions are internal only, we can always move them around as we please
I do hope to get back to reviewing this soon - please nag me if I dont! 🙏 |
It falls back to `Series[Any]` already
Will add a thread on `SeriesDetail`
SeriesDetail: TypeAlias = Literal[ | ||
"implementation mismatch", | ||
"length mismatch", | ||
"dtype mismatch", | ||
"name mismatch", | ||
"null value mismatch", | ||
"exact value mismatch", | ||
"values not within tolerance", | ||
"nested value mismatch", | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
refactor(suggestion): Ensure consistent reasons?
I noticed a couple were used more than once - so I thought this might help us keep them consistent in the future? 🙂
If as you've planned, we add assert_frame_equal
, then we can probably share some of these between the two
For now it's just a lil bit of autocomplete 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would partially argue that this seems a bit of an overkill?
I can see the advantage of
For now it's just a lil bit of autocomplete
but otherwise if I was developing something and would not know any better, it would feel like I am constrained to use one of those detail value without a clear reason 🧐
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could add a # NOTE
by the alias saying to extend it when adding new features?
@MarcoGorelli gentle ping to get your review here :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks both, i haven't checked through everything but i'm on board with the idea, if you're both happy with it then feel free to ship it
Closes #3128
Description
This PR introduces the testing module (mirroring the polars structure) and
assert_series_equal
. It's a step towards: #739 and #2804.I think it's a good structure to leave space for other modules (let's say, a
constructors
module within it, see #2959 and in particular the discord conversation linked in the issue).Additional comments:
NotImplementedError
).assert_series_equal
since if this goes through, thenassert_frame_equal
can re-use it for the value checks. However I didn't want to bloat the PR with 1k+ lines changesassert_frame_equal
end up being implemented, I think we can have an issue to track usage for them in the test suite itselfWhat type of PR is this? (check all applicable)
Checklist
If you have comments or can explain your changes, please do so below