-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
FIX: assert_identical now considers xindexes + improve RangeIndex equals
#11035
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thinking about @keewis comment: #11033 (comment) here I have it attempt to compare via |
assert_identical now considers xindexes
|
Oh no. I should have run the full test suite locally instead of just my new ones Not super stoked to go in and modify many tests 🤔 |
|
draft until tests are something resembling passing |
|
Ok so a number of the test failures are basically floating point accumulation errors. e.g.: import xarray as xr
from xarray.indexes import RangeIndex
# Create a RangeIndex-backed dataset
index = RangeIndex.arange(0.0, 1.0, 0.1, dim='x')
ds = xr.Dataset(coords=xr.Coordinates.from_xindex(index))
# Slice it
sliced = ds.isel(x=slice(1, 3))
# Create a fresh RangeIndex with the 'same' values
fresh_index = RangeIndex.arange(0.1, 0.3, 0.1, dim='x')
fresh = xr.Dataset(coords=xr.Coordinates.from_xindex(fresh_index))
# Compare the indexes
sliced_idx = sliced.xindexes['x']
fresh_idx = fresh.xindexes['x']
print('Both have the same coordinate values:')
print(f' sliced.x.values: {sliced.x.values}') # [0.1 0.2]
print(f' fresh.x.values: {fresh.x.values}') # [0.1 0.2]
print('But the internal RangeIndex state differs due to floating point:')
print(f' sliced: stop={sliced_idx.stop}, step={sliced_idx.step}')
# sliced: stop=0.30000000000000004, step=0.10000000000000002
print(f' fresh: stop={fresh_idx.stop}, step={fresh_idx.step}')
# fresh: stop=0.3, step=0.09999999999999999
print(f'sliced_idx.equals(fresh_idx): {sliced_idx.equals(fresh_idx)}') # False
print(f'sliced.identical(fresh): {sliced.identical(fresh)}') # Falsegives: so I have added a backwards compat that uses |
assert_identical now considers xindexesassert_identical now considers xindexes + improve RangeIndex equals
Without this you get: AssertionError: Left and right Dataset objects are not identical Differing indexes: L x IntervalIndex([(-1, 0], (0, 1]], dtype='interval[int64, right]', name='x') R x Index([(-1, 0], (0, 1]], dtype='object', name='x')
just matching what it is set to above. this was not caught before by assert_identical
|
I would support making incremental changes if that lets us make changes — e.g. make the change to the function, fix a few of the tests, but then have an LLM set some flag and then future contributions can work through the 50 places... |
I got sucked into a rhythm. I've fixed most of the issues and left a commit by commit breakdown in the first comment. The remaining ones I ran out of steam to fully fix are the changes in test_units and test_dataset which i use an escape hatch with a TODO: 27b4275 |
| return compat | ||
|
|
||
|
|
||
| def diff_indexes_repr(a_indexes, b_indexes, col_width: int = 20) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've implemented something very similar: https://github.com/xarray-contrib/xdggs/blob/52d8b1dd23bf809757c7e3f5c04945129f6905af/xdggs/tests/__init__.py#L64-L105
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh neat! I'll take a close look tomorrow. Is there anything different we should do here that would have made your xdggs use cases easier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's not much that's different (the diff formatting is slightly different). However, compared to indexes_equal it may be worth grouping indexes with indexes.group_by_index() (which would mean we don't have to worry about caching)
| try: | ||
| a_repr = inline_index_repr( | ||
| a_indexes.to_pandas_indexes()[key], max_width=70 | ||
| ) | ||
| b_repr = inline_index_repr( | ||
| b_indexes.to_pandas_indexes()[key], max_width=70 | ||
| ) | ||
| except TypeError: | ||
| # Custom indexes may not support to_pandas_index() | ||
| a_repr = repr(a_idx) | ||
| b_repr = repr(b_idx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be worth calling index._repr_inline_(max_width=70) with a fallback to repr(index)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this well defined API for a custom index to support? Def happy to add it, just also wondering if the knoweldge of that being helpful is (or should be) written down somewhere
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we already use those in inline_index_repr, so yes, this should be well defined.
This should definitely be part of the custom index development page, and worth adding if it is not already part of that.
|
see: 27b4275 I'm happy to open an issue about this instead of fixing here. If I understand correctly that there is a bug here? |
|
that would be great, thanks. In the long run I'd like to replace those with the tests in xarray-array-testing but didn't have time to make progress on that. |
|
thanks @ianhi ! |
identicaland friends now also compare xindexes. Checking that they are__equals__Note for Reviewers
This PR contains the initial fix for
assert_identical. However this uncovered several would have been failures in tests. The rest of this PR is correcting those issues. I tried to keep them in separate commits as much as possible.RangeIndex - dbb649a
RangeIndex was failing assert_identical due to floating point error accumulation after slicing. I updated the RangeIndex equals method to use
np.iscloseby default, but with the ability to fall back to exact comparisontest_concat - dcfc46e
It seems like this test was not checking the correct behavior. I think this changed at some point after #6385 and it wasn't caught by the identical check.
groupby intervalindex - 1a1219a
similar to concat. the current behavior is that an intervalIndex gets constructed.
timedelta dtype - 6214d4f
I think this was a typo that wasn't caught. the data is encoded with
sso I would expect it to come back withsnotnsThe thing I feel least confident in is how nice the formatting of the assertion error diff looks like. Some examples (from the tests):
🤖 Ideas and directions mine, typing by claude. I went through a few rounds of local review and revision before opening.
assert_identicalalso compare indexes? #11033whats-new.rstapi.rst