@max-sixty
Collaborator

@max-sixty max-sixty commented Dec 16, 2025

fixes #11019

I'm not sure this is the best solution for this specific case, but I believe it is consistent with our existing behavior (though I'm not fully confident of that), and it is reasonable behavior in its own right.

other options:

  • drop all attrs on keep_attrs=False
  • use a dict-like merge on keep_attrs=True

Changes:

  • Dataset.map() / DataTree.map(): When keep_attrs=True, merge attrs from function result and original using drop_conflicts (matching attrs kept, conflicting attrs dropped). When keep_attrs=False, leave attrs as the function returned them.

  • Weighted operations: Explicitly clear attrs when keep_attrs=False, since internal computations (like dot) propagate attrs from weights.
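
The merge rule described in the first bullet can be sketched in plain Python. This is a minimal illustration, not xarray's implementation: `drop_conflicts` and `map_attrs` are hypothetical helper names, attrs are modeled as plain dicts, and values are compared with `==` (an assumption; real attrs may hold arrays that need a more careful comparison).

```python
def drop_conflicts(attr_dicts):
    """Keep keys whose values agree wherever they appear; drop conflicting keys."""
    merged = {}
    dropped = set()
    for attrs in attr_dicts:
        for key, value in attrs.items():
            if key in dropped:
                continue
            if key in merged and merged[key] != value:
                # Conflict: remove the key and never re-add it.
                del merged[key]
                dropped.add(key)
            else:
                merged[key] = value
    return merged


def map_attrs(result_attrs, original_attrs, keep_attrs):
    """Toy model of the attrs a mapped variable ends up with (hypothetical helper)."""
    if keep_attrs:
        return drop_conflicts([result_attrs, original_attrs])
    # keep_attrs=False: leave attrs exactly as the function returned them.
    return dict(result_attrs)


original = {"units": "K", "long_name": "temperature"}
returned = {"units": "degC", "note": "converted"}

kept = map_attrs(returned, original, keep_attrs=True)
not_kept = map_attrs(returned, original, keep_attrs=False)
```

In this toy case, `kept` retains `long_name` and `note` but loses `units` because the two sides disagree on it, while `not_kept` is simply whatever the function returned.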

Update the `keep_attrs` behavior in `Dataset.map()` and `DataTree.map()` to
merge attributes from the original and function results using the
`drop_conflicts` strategy, rather than unconditionally copying original attrs.

When `keep_attrs=True`, matching attrs are kept and conflicting attrs are
dropped. When `keep_attrs=False`, only attrs set by the function are retained.

Add comprehensive tests for the new attr merging behavior.
@github-actions github-actions bot added the topic-DataTree label (Related to the implementation of a DataTree class) Dec 16, 2025
max-sixty and others added 4 commits December 15, 2025 22:19
Weighted operations internally propagate attrs from weights through
computations like dot(). When keep_attrs=False is passed, users expect
no attrs on the result, but attrs from weights were leaking through.

Clear attrs explicitly in _implementation when keep_attrs is False.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
The `test-nightly` environment uses pandas nightly wheels from PyPI,
which currently don't have win-64 builds available. This causes
`pixi lock` to fail when solving for all platforms.

RTD builds fail because they have no lock file cache (unlike GitHub
Actions CI which caches pixi.lock). When RTD runs `pixi install -e doc`,
pixi must generate the lock file from scratch, which fails on the
unsolvable test-nightly/win-64 combination.

This restriction can be removed once pandas nightly provides win-64
wheels again.

Co-authored-by: Claude <[email protected]>
@dopplershift
Contributor

I'll pipe in to say that this PR greatly reduces the number of test failures MetPy has with the latest xarray, so it's a 👍 from me for what that's worth.

Comment on lines 6964 to +6977
 if keep_attrs:
+    # Merge attrs from function result and original, dropping conflicts
+    from xarray.structure.merge import merge_attrs
+
     for k, v in variables.items():
-        v._copy_attrs_from(self.data_vars[k])
+        v.attrs = merge_attrs(
+            [v.attrs, self.data_vars[k].attrs], "drop_conflicts"
+        )
     for k, v in coords.items():
         if k in self.coords:
-            v._copy_attrs_from(self.coords[k])
-else:
-    for v in variables.values():
-        v.attrs = {}
-    for v in coords.values():
-        v.attrs = {}
+            v.attrs = merge_attrs(
+                [v.attrs, self.coords[k].attrs], "drop_conflicts"
+            )
+# When keep_attrs=False, leave attrs as the function returned them
Collaborator

The problem with interpreting keep_attrs=False as "leave the attrs as returned by the function" is that we would no longer have an equivalent of keep_attrs="drop".

I'd argue that keep_attrs=True should be closer to what you're proposing for keep_attrs=False, which I do think would be more intuitive.

So instead we may need to consider supporting keep_attrs with strategy names / a strategy function, like apply_ufunc does. That would still allow you to choose "drop_conflicts" if preferred (or maybe as the default? Not sure), while not changing behavior too drastically.
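
One way to picture this suggestion is a small resolver that accepts a bool, a strategy name, or a callable. The sketch below is only an assumption about how such an option could look: `resolve_attrs` and the `_STRATEGIES` table are hypothetical, attrs are plain dicts compared with `==`, and only three strategy names ("drop", "override", "drop_conflicts") are modeled. Mapping `True` to "override" and `False` to "drop" is meant to mirror `apply_ufunc`; treat that mapping as an assumption here as well.

```python
def _drop(attrs_list):
    # Discard everything, including attrs set by the function.
    return {}


def _override(attrs_list):
    # Copy attrs from the first entry without comparing.
    return dict(attrs_list[0]) if attrs_list else {}


def _drop_conflicts(attrs_list):
    # Keep keys whose values agree; drop keys with conflicting values.
    merged, dropped = {}, set()
    for attrs in attrs_list:
        for key, value in attrs.items():
            if key in dropped:
                continue
            if key in merged and merged[key] != value:
                del merged[key]
                dropped.add(key)
            else:
                merged[key] = value
    return merged


_STRATEGIES = {
    "drop": _drop,
    "override": _override,
    "drop_conflicts": _drop_conflicts,
}


def resolve_attrs(keep_attrs, original_attrs, result_attrs):
    """Pick the output attrs from a bool, a strategy name, or a callable."""
    attrs_list = [original_attrs, result_attrs]
    if callable(keep_attrs):
        return dict(keep_attrs(attrs_list))
    if isinstance(keep_attrs, bool):
        keep_attrs = "override" if keep_attrs else "drop"
    try:
        strategy = _STRATEGIES[keep_attrs]
    except KeyError:
        raise ValueError(f"unknown keep_attrs strategy: {keep_attrs!r}") from None
    return strategy(attrs_list)


original = {"units": "K", "long_name": "temperature"}
result = {"units": "degC", "note": "converted"}

dropped_all = resolve_attrs(False, original, result)
overridden = resolve_attrs(True, original, result)
merged = resolve_attrs("drop_conflicts", original, result)
custom = resolve_attrs(lambda attrs_list: attrs_list[-1], original, result)
```

With this shape, `keep_attrs=False` still means "drop everything", and the merging behavior from this PR becomes an explicit opt-in via "drop_conflicts" or a custom function.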

Collaborator Author

yes, the proposed code treats keep_attrs=False as "remove all the input attrs", but not as "remove all the output attrs".

@keewis can you see a reasonable change to fix the immediate issue without adding a whole strategy to keep_attrs? I don't have a particularly strong view on this specific implementation, but it does seem reasonable / logical, and it does let us solve this immediate bug...

Collaborator Author

(zooming out: as I mentioned before, for me the best "blank-slate" implementation for keep_attrs is to mostly not have the option at all, and folks can drop attrs if they want. though I agree with you that merging is a case that neither approach handles well...)

Labels

needs review topic-DataTree Related to the implementation of a DataTree class

Development

Successfully merging this pull request may close these issues.

Problem using assign_attrs() in map()