@ianhi
Contributor

@ianhi ianhi commented Dec 18, 2025

identical and friends now also compare xindexes, checking that the indexes:

  1. are the same type,
  2. are equal, if they define an equals method, or
  3. have equal coords, as a fallback otherwise.
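
The comparison order above can be sketched in a few lines. This is only an illustration under stated assumptions, not xarray's implementation: `indexes_match`, `PlainIndex`, and `StepIndex` are hypothetical stand-ins, and the coordinate fallback just compares plain lists.

```python
def indexes_match(a_index, b_index, a_coords, b_coords):
    """Hypothetical sketch of the comparison order described above."""
    # 1. The two indexes must be exactly the same type.
    if type(a_index) is not type(b_index):
        return False
    # 2. Prefer the index's own equality method when it defines one.
    equals = getattr(a_index, "equals", None)
    if callable(equals):
        return bool(equals(b_index))
    # 3. Otherwise fall back to comparing the coordinate values.
    return a_coords == b_coords


class PlainIndex:
    """Stand-in for an index without an equality method."""


class StepIndex:
    """Stand-in for an index that defines its own equality method."""

    def __init__(self, step):
        self.step = step

    def equals(self, other):
        return self.step == other.step


print(indexes_match(StepIndex(1), StepIndex(1), [0, 1], [0, 1]))  # True
print(indexes_match(StepIndex(1), PlainIndex(), [0, 1], [0, 1]))  # False
print(indexes_match(PlainIndex(), PlainIndex(), [0, 1], [0, 2]))  # False
```

The type check comes first so that two indexes with equal coordinate values but different classes are still reported as different.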

Note for Reviewers

This PR contains the initial fix for assert_identical. However, this uncovered several would-be failures in the tests. The rest of this PR corrects those issues. I tried to keep them in separate commits as much as possible.

RangeIndex - dbb649a

RangeIndex was failing assert_identical due to floating-point error accumulation after slicing. I updated the RangeIndex equals method to use np.isclose by default, but with the ability to fall back to exact comparison.
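
As a rough illustration of a tolerant-by-default comparison with an exact fallback, here is a minimal sketch. It is not the RangeIndex code: `ToyRange` and the `exact` flag are hypothetical names, and `math.isclose` from the standard library stands in for np.isclose so the example has no dependencies.

```python
import math


class ToyRange:
    """Hypothetical stand-in for a range-style index (not xarray's RangeIndex)."""

    def __init__(self, start, stop, size):
        self.start, self.stop, self.size = start, stop, size

    def equals(self, other, exact=False):
        # Sizes are integers, so they are always compared exactly.
        if self.size != other.size:
            return False
        pairs = [(self.start, other.start), (self.stop, other.stop)]
        if exact:
            return all(a == b for a, b in pairs)
        # The tolerant comparison absorbs rounding error picked up by slicing.
        return all(
            math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12) for a, b in pairs
        )


sliced = ToyRange(0.1, 0.30000000000000004, 2)
fresh = ToyRange(0.1, 0.3, 2)
print(sliced.equals(fresh))              # True
print(sliced.equals(fresh, exact=True))  # False
```

The tolerances here are arbitrary choices for the sketch; the right values depend on how much drift slicing can realistically introduce.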

test_concat - dcfc46e

It seems like this test was not checking the correct behavior. I think this changed at some point after #6385 and it wasn't caught by the identical check.

groupby intervalindex - 1a1219a

Similar to concat: the current behavior is that an IntervalIndex gets constructed.

timedelta dtype - 6214d4f

I think this was a typo that wasn't caught. The data is encoded with s, so I would expect it to come back with s, not ns.


The thing I feel least confident about is the formatting of the assertion error diff. Some examples (from the tests):

AssertionError: Left and right Dataset objects are not identical
Indexes only on the right object: ['time_metadata']

AssertionError: Left and right DataArray objects are not identical
Differing indexes:
L   group_bins           IntervalIndex([(-2, -1], (-1, 0], (0, 1], (1, 2]], dtype='interval[int64, right]', name='group_bins')
R   group_bins           Index([(-2, -1], (-1, 0], (0, 1], (1, 2]], dtype='object', name='group_bins')

AssertionError: Left and right Dataset objects are not identical
Differing indexes:
    Indexes only on the left object: ['x']
    Indexes only on the right object: ['y']

AssertionError: Left and right Dataset objects are not identical
Differing indexes:
    Differing index types: ['x: PandasIndex vs CustomIndex']

AssertionError: Left and right Dataset objects are not identical
Differing coordinates:
L * x        (z) int64 32B 10 10 20 20
R * x        (z) int64 32B 10 20 10 20
L * y        (z) <U1 16B 'a' 'b' 'a' 'b'
R * y        (z) <U1 16B 'a' 'a' 'b' 'b'
L * z        (z) object 32B MultiIndex
R * z        (z) object 32B MultiIndex
Differing data variables:
L   data     (z) int64 32B 1 2 3 4
R   data     (z) int64 32B 1 3 2 4
Differing indexes:
    Differing index values: ['x', 'y', 'z']
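
For readers curious how messages of this shape might be assembled, here is a small sketch that works on plain dicts. It is an assumption-laden illustration, not the PR's diff_indexes_repr: `diff_index_names` and the two empty classes are hypothetical, and only the "only on the left/right" and "differing types" cases are covered.

```python
class PandasIndex:
    """Empty stand-in class, used only for its name."""


class CustomIndex:
    """Empty stand-in class, used only for its name."""


def diff_index_names(a_indexes, b_indexes):
    """Hypothetical sketch: summarize index differences between two mappings."""
    lines = []
    only_left = sorted(set(a_indexes) - set(b_indexes))
    only_right = sorted(set(b_indexes) - set(a_indexes))
    if only_left:
        lines.append(f"    Indexes only on the left object: {only_left}")
    if only_right:
        lines.append(f"    Indexes only on the right object: {only_right}")
    # For names present on both sides, report any mismatch in index type.
    type_diffs = [
        f"{name}: {type(a_indexes[name]).__name__} vs {type(b_indexes[name]).__name__}"
        for name in sorted(set(a_indexes) & set(b_indexes))
        if type(a_indexes[name]) is not type(b_indexes[name])
    ]
    if type_diffs:
        lines.append(f"    Differing index types: {type_diffs}")
    return "\n".join(["Differing indexes:"] + lines) if lines else ""


print(diff_index_names({"x": PandasIndex()}, {"x": CustomIndex()}))
```

Returning an empty string when nothing differs lets a caller append the section to a larger report only when it has content.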

🤖 Ideas and directions mine, typing by claude. I went through a few rounds of local review and revision before opening.

@ianhi
Contributor Author

ianhi commented Dec 18, 2025

Thinking about @keewis's comment: #11033 (comment)

Here I have it attempt to compare via equals if available, and only then fall back to a coord-level comparison. But I have not changed indexes_equal to do a similar thing. We can add that here if you'd like, or leave that function be.

@ianhi ianhi changed the title True ident FIX: assert_identical now considers xindexes Dec 18, 2025
@ianhi ianhi changed the title FIX: assert_identical now considers xindexes FIX: assert_identical now considers xindexes Dec 18, 2025
@ianhi
Contributor Author

ianhi commented Dec 18, 2025

Oh no. I should have run the full test suite locally instead of just my new ones

Not super stoked to go in and modify many tests 🤔

FAILED xarray/tests/test_backends.py::TestGenericNetCDFData::test_roundtrip_timedelta_data - AssertionError: Left and right Dataset objects are not identical
Indexes with differing values: ['td']
FAILED xarray/tests/test_backends.py::TestScipyInMemoryData::test_roundtrip_timedelta_data - AssertionError: Left and right Dataset objects are not identical
Indexes with differing values: ['td']
FAILED xarray/tests/test_backends.py::TestScipyFileObject::test_roundtrip_timedelta_data - AssertionError: Left and right Dataset objects are not identical
Indexes with differing values: ['td']
FAILED xarray/tests/test_backends.py::TestScipyFilePath::test_roundtrip_timedelta_data - AssertionError: Left and right Dataset objects are not identical
Indexes with differing values: ['td']
FAILED xarray/tests/test_formatting.py::TestFormatting::test_diff_array_repr - AssertionError: assert 'Left and rig...ription: desc' == 'Left and rig...ription: desc'
  
  Skipping 401 identical leading characters in diff, use -v to show
    4 16B 1 2
  + Indexes only on the left object:  ['y']
  + Indexes with differing values: ['x']
    Differing attributes:
    L   units: m
    R   units: kg
    Attributes only on the left object:
        description: desc
FAILED xarray/tests/test_formatting.py::TestFormatting::test_diff_dataset_repr - AssertionError: assert 'Left and rig...ription: desc' == 'Left and rig...ription: desc'
  
  Skipping 606 identical leading characters in diff, use -v to show
    4 16B 3 4
  + Indexes only on the left object:  ['y']
  + Indexes with differing values: ['x']
    Differing attributes:
    L   title: mytitle
    R   title: newtitle
    Attributes only on the left object:
        description: desc
FAILED xarray/tests/test_concat.py::TestConcatDataset::test_concat_promote_shape_with_scalar_coordinates - AssertionError: Left and right Dataset objects are not identical
Indexes with differing values: ['x']
FAILED xarray/tests/test_groupby.py::TestDataArrayGroupBy::test_groupby_bins_multidim - AssertionError: Left and right DataArray objects are not identical
Indexes with differing values: ['group_bins']
FAILED xarray/tests/test_dataset.py::TestDataset::test_rename_dims - AssertionError: Left and right Dataset objects are not identical
Indexes only on the right object: ['x']
FAILED xarray/tests/test_dataset.py::TestDataset::test_rename_vars - AssertionError: Left and right Dataset objects are not identical
Indexes only on the right object: ['x_new']
FAILED xarray/tests/test_dataset.py::TestDataset::test_expand_dims_create_index_from_iterable - AssertionError: Left and right Dataset objects are not identical
Indexes only on the left object:  ['x']
FAILED xarray/tests/test_dataset.py::TestDataset::test_to_and_from_dict_with_nan_nat[array] - AssertionError: Left and right Dataset objects are not identical
Indexes with differing values: ['t']
FAILED xarray/tests/test_groupby.py::test_multiple_groupers_mixed[True-True] - AssertionError: Left and right Dataset objects are not identical
Indexes with differing values: ['x_bins']
FAILED xarray/tests/test_groupby.py::test_multiple_groupers_mixed[True-False] - AssertionError: Left and right Dataset objects are not identical
Indexes with differing values: ['x_bins']
FAILED xarray/tests/test_range_index.py::test_range_index_isel - AssertionError: Left and right Dataset objects are not identical
Indexes with differing values: ['x']
FAILED xarray/tests/test_range_index.py::test_range_index_sel - AssertionError: Left and right Dataset objects are not identical
Indexes with differing values: ['x']
FAILED xarray/tests/test_groupby.py::test_multiple_groupers_mixed[False-True] - AssertionError: Left and right Dataset objects are not identical
Indexes with differing values: ['x_bins']
FAILED xarray/tests/test_groupby.py::test_multiple_groupers_mixed[False-False] - AssertionError: Left and right Dataset objects are not identical
Indexes with differing values: ['x_bins']

@ianhi ianhi marked this pull request as draft December 18, 2025 21:03
@ianhi
Contributor Author

ianhi commented Dec 18, 2025

draft until tests are something resembling passing

@ianhi
Contributor Author

ianhi commented Dec 18, 2025

OK, so a number of the test failures are basically floating-point accumulation errors, e.g.:

import xarray as xr
from xarray.indexes import RangeIndex

# Create a RangeIndex-backed dataset
index = RangeIndex.arange(0.0, 1.0, 0.1, dim='x')
ds = xr.Dataset(coords=xr.Coordinates.from_xindex(index))

# Slice it
sliced = ds.isel(x=slice(1, 3))

# Create a fresh RangeIndex with the 'same' values
fresh_index = RangeIndex.arange(0.1, 0.3, 0.1, dim='x')
fresh = xr.Dataset(coords=xr.Coordinates.from_xindex(fresh_index))

# Compare the indexes
sliced_idx = sliced.xindexes['x']
fresh_idx = fresh.xindexes['x']

print('Both have the same coordinate values:')
print(f'  sliced.x.values: {sliced.x.values}')  # [0.1 0.2]
print(f'  fresh.x.values:  {fresh.x.values}')   # [0.1 0.2]

print('But the internal RangeIndex state differs due to floating point:')
print(f'  sliced: stop={sliced_idx.stop}, step={sliced_idx.step}')
# sliced: stop=0.30000000000000004, step=0.10000000000000002
print(f'  fresh:  stop={fresh_idx.stop}, step={fresh_idx.step}')
# fresh:  stop=0.3, step=0.09999999999999999

print(f'sliced_idx.equals(fresh_idx): {sliced_idx.equals(fresh_idx)}')  # False
print(f'sliced.identical(fresh): {sliced.identical(fresh)}')  # False

gives:

Both have the same coordinate values:
  sliced.x.values: [0.1 0.2]
  fresh.x.values:  [0.1 0.2]
But the internal RangeIndex state differs due to floating point:
  sliced: stop=0.30000000000000004, step=0.10000000000000002
  fresh:  stop=0.3, step=0.09999999999999999
sliced_idx.equals(fresh_idx): False
sliced.identical(fresh): False

So I have added a backwards-compatibility path: check_default_indexes=False now also implies not checking indexes for identicalness. This makes far fewer tests fail, which is nice, but it probably remains worthwhile to go through them all one by one and see

@ianhi ianhi changed the title FIX: assert_identical now considers xindexes FIX: assert_identical now considers xindexes + improve RangeIndex equals Dec 18, 2025
Without this you get:

AssertionError: Left and right Dataset objects are not identical
Differing indexes:
L   x                    IntervalIndex([(-1, 0], (0, 1]], dtype='interval[int64, right]', name='x')
R   x                    Index([(-1, 0], (0, 1]], dtype='object', name='x')

Just matching what it is set to above. This was not caught before by assert_identical.
@max-sixty
Collaborator

I would support making incremental changes if that lets us make changes — e.g. make the change to the function, fix a few of the tests, but then have an LLM set some flag check_indexes=False and a TODO in 50 places

and then future contributions can work through the 50 places...

@ianhi
Contributor Author

ianhi commented Dec 18, 2025

I would support making incremental changes if that lets us make changes — e.g. make the change to the function, fix a few of the tests, but then have an LLM set some flag check_indexes=False and a TODO in 50 places

and then future contributions can work through the 50 places...

I got sucked into a rhythm. I've fixed most of the issues and left a commit-by-commit breakdown in the first comment. The remaining ones, which I ran out of steam to fully fix, are the changes in test_units and test_dataset, for which I use an escape hatch with a TODO: 27b4275

return compat


def diff_indexes_repr(a_indexes, b_indexes, col_width: int = 20) -> str:
Collaborator

Contributor Author

oh neat! I'll take a close look tomorrow. Is there anything different we should do here that would have made your xdggs use cases easier?

Collaborator

there's not much that's different (the diff formatting is slightly different). However, compared to indexes_equal it may be worth grouping indexes with indexes.group_by_index() (which would mean we don't have to worry about caching)
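
The grouping idea can be sketched without xarray. This is a hypothetical stand-in, assuming only that one index object may back several coordinates; the real group_by_index() is not reproduced here, and `group_names_by_index` is an invented name.

```python
def group_names_by_index(indexes):
    """Hypothetical sketch: pair each unique index object with its coordinate names."""
    groups = {}
    for name, index in indexes.items():
        # Key on object identity: one index may back several coordinates.
        entry = groups.setdefault(id(index), (index, []))
        entry[1].append(name)
    return list(groups.values())


multi = object()   # stand-in for an index shared by two coordinates
single = object()  # stand-in for an index backing one coordinate
grouped = group_names_by_index({"x": multi, "y": multi, "t": single})
print([names for _, names in grouped])  # [['x', 'y'], ['t']]
```

Because each unique index appears once in the result, it only needs to be compared once, which is why no cache of already-compared indexes is required.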

Comment on lines +1010 to +1020
try:
    a_repr = inline_index_repr(
        a_indexes.to_pandas_indexes()[key], max_width=70
    )
    b_repr = inline_index_repr(
        b_indexes.to_pandas_indexes()[key], max_width=70
    )
except TypeError:
    # Custom indexes may not support to_pandas_index()
    a_repr = repr(a_idx)
    b_repr = repr(b_idx)
Collaborator

might be worth calling index._repr_inline_(max_width=70) with a fallback to repr(index)
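
A minimal sketch of that suggestion, under the assumption that an index may or may not define `_repr_inline_`. The helper name `index_diff_repr` and both stand-in classes are hypothetical and are not part of the PR.

```python
def index_diff_repr(index, max_width=70):
    """Hypothetical helper: prefer _repr_inline_, else fall back to repr()."""
    repr_inline = getattr(index, "_repr_inline_", None)
    if callable(repr_inline):
        return repr_inline(max_width=max_width)
    return repr(index)


class InlineIndex:
    """Stand-in for an index that provides a compact inline repr."""

    def _repr_inline_(self, max_width):
        return "InlineIndex(...)"[:max_width]


class BareIndex:
    """Stand-in for an index that only has a regular repr."""

    def __repr__(self):
        return "<BareIndex>"


print(index_diff_repr(InlineIndex()))  # InlineIndex(...)
print(index_diff_repr(BareIndex()))    # <BareIndex>
```

Unlike the try/except in the snippet under review, this version never needs a conversion to a pandas index, so it should work the same way for custom indexes.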

Contributor Author

Is this a well-defined API for a custom index to support? Definitely happy to add it; just also wondering if the knowledge that it is helpful is (or should be) written down somewhere.

Collaborator

we already use those in inline_index_repr, so yes, this should be well defined.

This should definitely be part of the custom index development page, and worth adding if it is not already part of that.

@keewis
Collaborator

keewis commented Dec 18, 2025

I ran out of steam to fully fix are the changes in test_units

test_units predates the custom indexes, which means it tries not to create any indexes (units would get stripped by the pandas index). If there are indexes anyways it might be worth marking (those would be bugs)

@ianhi ianhi marked this pull request as ready for review December 18, 2025 22:44
@ianhi
Contributor Author

ianhi commented Dec 18, 2025

If there are indexes anyways it might be worth marking (those would be bugs)

see: 27b4275

I'm happy to open an issue about this instead of fixing it here, if I understand correctly that there is a bug here?

@keewis
Collaborator

keewis commented Dec 18, 2025

that would be great, thanks. In the long run I'd like to replace those with the tests in xarray-array-testing but didn't have time to make progress on that.

@max-sixty max-sixty added the plan to merge Final call for comments label Dec 19, 2025
@max-sixty max-sixty merged commit 0c07685 into pydata:main Dec 21, 2025
52 checks passed
@max-sixty
Collaborator

thanks @ianhi !


Labels

plan to merge Final call for comments topic-indexing

Development

Successfully merging this pull request may close these issues.

Should assert_identical also compare indexes?

3 participants