-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Refactor Dataset.map to merge attrs instead of copying #11020
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Update the `keep_attrs` behavior in `Dataset.map()` and `DataTree.map()` to merge attributes from the original and function results using the `drop_conflicts` strategy, rather than unconditionally copying original attrs. When `keep_attrs=True`, matching attrs are kept and conflicting attrs are dropped. When `keep_attrs=False`, only attrs set by the function are retained. Add comprehensive tests for the new attr merging behavior.
Weighted operations internally propagate attrs from weights through computations like dot(). When keep_attrs=False is passed, users expect no attrs on the result, but attrs from weights were leaking through. Clear attrs explicitly in _implementation when keep_attrs is False. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
The `test-nightly` environment uses pandas nightly wheels from PyPI, which currently don't have win-64 builds available. This causes `pixi lock` to fail when solving for all platforms. RTD builds fail because they have no lock file cache (unlike GitHub Actions CI which caches pixi.lock). When RTD runs `pixi install -e doc`, pixi must generate the lock file from scratch, which fails on the unsolvable test-nightly/win-64 combination. This restriction can be removed once pandas nightly provides win-64 wheels again. Co-authored-by: Claude <[email protected]>
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
|
I'll pipe in to say that this PR greatly reduces the number of test failures MetPy has with the latest xarray, so it's a 👍 from me for what that's worth. |
| if keep_attrs: | ||
| # Merge attrs from function result and original, dropping conflicts | ||
| from xarray.structure.merge import merge_attrs | ||
|
|
||
| for k, v in variables.items(): | ||
| v._copy_attrs_from(self.data_vars[k]) | ||
| v.attrs = merge_attrs( | ||
| [v.attrs, self.data_vars[k].attrs], "drop_conflicts" | ||
| ) | ||
| for k, v in coords.items(): | ||
| if k in self.coords: | ||
| v._copy_attrs_from(self.coords[k]) | ||
| else: | ||
| for v in variables.values(): | ||
| v.attrs = {} | ||
| for v in coords.values(): | ||
| v.attrs = {} | ||
| v.attrs = merge_attrs( | ||
| [v.attrs, self.coords[k].attrs], "drop_conflicts" | ||
| ) | ||
| # When keep_attrs=False, leave attrs as the function returned them |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with interpreting keep_attrs=False as "leave the attrs as returned by the function" is that that means we don't have keep_attrs="drop" anymore.
I'd argue that keep_attrs=True should be closer to what you're proposing for keep_attrs=False, which I do think would be more intuitive.
So instead we may need to consider supporting keep_attrs with strategy names / a strategy function, like apply_ufunc does. That would still allow you to choose "drop_conflicts" if preferred (or maybe as the default? Not sure), while not changing behavior too drastically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, the proposed code treats keep_attrs=False as "remove all the input attrs". but not "remove all the output attrs".
@keewis can you see a reasonable change to fix the immediate issue without adding a whole strategy to keep_attrs? I don't have a particularly strong view on this specific implementation, but it does seem reasonable / logical, and it does let us solve this immediate bug...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(zooming out — as I mentioned before, for me the best "blank-slate" implementation for keep_attrs is to mostly not have a the option at all, and folks can drop attrs if they want. though I agree with you that merging is case that neither approach handles well...)
fixes #11019
I'm not sure this is the best solution for the specific case, but it's at least consistent with our existing behavior (I think? not super confident), and is quite reasonable behavior.
other options:
keep_attrs=Falsekeep_attrs=TrueChanges:
Dataset.map()/DataTree.map(): Whenkeep_attrs=True, merge attrs from function result and original usingdrop_conflicts(matching attrs kept, conflicting attrs dropped). Whenkeep_attrs=False, leave attrs as the function returned them.Weighted operations: Explicitly clear attrs when
keep_attrs=False, since internal computations (likedot) propagate attrs from weights.whats-new.rstapi.rst