`Dataset.eval` works with >2 dims #11064

max-sixty · 2025-12-31T23:11:30Z

this ended up being a much bigger effort than expected

we needed to leave behind pandas' implementation for `Dataset.eval` because it's limited to 2 dims
we keep pandas' implementation for `.query`, because we should be more careful about changing that, it uses numexpr (which is fast), and it has no requirement for > 2 dims
so then I added code to keep `Dataset.eval` consistent with the `.query` interface; e.g. handling `and` & `or`, etc

I added some constraints similar to those pandas has around limiting what eval can do. I'm not that confident they're robust, and not sure how valuable they are.

most of the added code is tests

Commentary from Claude below (+ Claude wrote the code, for transparency, albeit with lots of oversight)

This commit removes the dependency on pandas.eval() and implements a native expression evaluator in Dataset.eval() using Python's ast module. The new implementation provides better support for multi-dimensional arrays and maintains backward compatibility with deprecated operators through automatic transformation.
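As a rough illustration of the approach described in this commentary, the sketch below parses an expression with `ast`, compiles it, and evaluates it against a namespace of variables with a restricted set of builtins. The names `SAFE_BUILTINS` and `eval_expression` are hypothetical and are not the PR's actual code; the real allow-list and namespace handling may differ.

```python
import ast
from typing import Any, Mapping, Optional

# Hypothetical allow-list for illustration; the PR's real set of builtins may differ.
SAFE_BUILTINS = {
    "abs": abs, "min": min, "max": max,
    "round": round, "len": len, "sum": sum,
}


def eval_expression(
    expression: str,
    variables: Mapping[str, Any],
    extra_names: Optional[Mapping[str, Any]] = None,
) -> Any:
    """Evaluate ``expression`` against ``variables`` with restricted builtins."""
    # mode="eval" accepts a single expression and raises SyntaxError on statements.
    tree = ast.parse(expression, mode="eval")
    code = compile(tree, filename="<eval>", mode="eval")
    # Later entries win, so the data variables shadow any caller-supplied names.
    namespace = {**(extra_names or {}), **variables}
    # Only the names in SAFE_BUILTINS are visible as builtins (no open, no __import__).
    return eval(code, {"__builtins__": SAFE_BUILTINS}, namespace)


print(eval_expression("abs(a) + max(b, 2)", {"a": -3, "b": 1}))
```

Because an `ast`-compiled expression simply calls the operators of whatever objects are in the namespace, this style of evaluation places no limit on the number of dimensions of array-like operands. Restricting builtins this way is a convenience, not a sandbox.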

Key changes:

- Remove pd.eval() call and replace with custom _eval_expression() method
- Add _LogicalOperatorTransformer to convert deprecated operators (and/or/not) to bitwise operators (&/|/~) that work element-wise on arrays
- Implement automatic transformation of chained comparisons to explicit bitwise AND operations
- Add security validation to block lambda expressions and private attributes
- Emit FutureWarning for deprecated constructs (logical operators, chained comparisons, parser= argument)
- Support assignment statements (target = expression) in eval()
- Make data variables and coordinates take priority in namespace resolution
- Provide safe builtins (abs, min, max, round, len, sum, pow, any, all, type constructors, iteration helpers) while blocking __import__, open, etc.
- Add comprehensive test coverage including edge cases, error messages, dask compatibility, and security validation
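The operator and chained-comparison rewrites in the list above could look roughly like the following. This is a minimal sketch using only the standard library, assuming Python 3.9+ for `ast.unparse`; the class name `LogicalToBitwise` and the helper `to_bitwise` are hypothetical, and the sketch omits the FutureWarning that the PR emits.

```python
import ast


class LogicalToBitwise(ast.NodeTransformer):
    """Hypothetical stand-in for the PR's transformer; not xarray's actual code."""

    def visit_BoolOp(self, node: ast.BoolOp) -> ast.AST:
        self.generic_visit(node)  # rewrite nested expressions first
        op = ast.BitAnd() if isinstance(node.op, ast.And) else ast.BitOr()
        # Fold `a and b and c` into `(a & b) & c`.
        result = node.values[0]
        for value in node.values[1:]:
            result = ast.BinOp(left=result, op=op, right=value)
        return result

    def visit_UnaryOp(self, node: ast.UnaryOp) -> ast.AST:
        self.generic_visit(node)
        if isinstance(node.op, ast.Not):
            return ast.UnaryOp(op=ast.Invert(), operand=node.operand)
        return node

    def visit_Compare(self, node: ast.Compare) -> ast.AST:
        self.generic_visit(node)
        if len(node.ops) == 1:
            return node
        # `a < b < c` becomes `(a < b) & (b < c)`; note that `b` now appears twice.
        operands = [node.left, *node.comparators]
        result = None
        for left, op, right in zip(operands, node.ops, operands[1:]):
            pair = ast.Compare(left=left, ops=[op], comparators=[right])
            if result is None:
                result = pair
            else:
                result = ast.BinOp(left=result, op=ast.BitAnd(), right=pair)
        return result


def to_bitwise(expression: str) -> str:
    """Return the source of ``expression`` after the rewrite."""
    tree = ast.parse(expression, mode="eval")
    tree = ast.fix_missing_locations(LogicalToBitwise().visit(tree))
    return ast.unparse(tree)


print(to_bitwise("a > 0 and b < 5"))  # (a > 0) & (b < 5)
print(to_bitwise("0 < a < 10"))       # (0 < a) & (a < 10)
```

Working on the tree rather than on the source string means the rewritten expression keeps the grouping of the original, even though `&` binds more tightly than comparisons in Python source.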

- Closes #xxxx
- Tests added
- User visible changes (including notable bug fixes) are documented in whats-new.rst
- New functions/methods are listed in api.rst

- Use pd.isna(ds["a"].values) instead of pd.isna(ds["a"]) since pandas type stubs don't have overloads for DataArray
- Use abs() instead of np.abs() to get DataArray return type

Co-authored-by: Claude <[email protected]>

The lambda and dunder restrictions emulate pd.eval() behavior rather than providing security guarantees. Pandas explicitly doesn't claim these as security measures.

Co-authored-by: Claude <[email protected]>
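For readers unfamiliar with this kind of check, here is a minimal sketch of how lambda and underscore-prefixed attribute restrictions can be expressed as a walk over the parsed tree. The function name `reject_lambda_and_private`, the exception type, and the messages are assumptions for illustration, not the PR's actual `validate_expression`. In line with the commit message above, this mirrors a compatibility restriction and should not be read as a security boundary.

```python
import ast


def reject_lambda_and_private(tree: ast.AST) -> None:
    """Raise ValueError for lambdas and underscore-prefixed attributes (illustrative)."""
    for node in ast.walk(tree):
        if isinstance(node, ast.Lambda):
            raise ValueError("lambda expressions are not allowed")
        if isinstance(node, ast.Attribute) and node.attr.startswith("_"):
            raise ValueError(
                f"access to private attribute {node.attr!r} is not allowed"
            )


# An ordinary method call passes the check without raising.
reject_lambda_and_private(ast.parse("a + b.mean()", mode="eval"))

# Each of these expressions is rejected.
for bad in ["(lambda: 1)()", "a.__class__", "a._private"]:
    try:
        reject_lambda_and_private(ast.parse(bad, mode="eval"))
    except ValueError as err:
        print(f"{bad!r}: {err}")
```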

xarray/core/dataset.py

Extract AST-based expression evaluation code to xarray/core/eval.py:

- EVAL_BUILTINS dict
- LogicalOperatorTransformer class
- validate_expression function

This addresses the review feedback to keep the Dataset class focused.

Co-authored-by: Claude <[email protected]>

Extract eval tests from test_dataset.py to test_eval.py:

- 35 tests covering basic functionality, error messages, edge cases, and dask
- Mirrors the implementation structure (core/eval.py <-> tests/test_eval.py)
- Reduces test_dataset.py by 574 lines

Co-authored-by: Claude <[email protected]>

xarray/tests/test_eval.py

Illviljan · 2026-01-05T01:14:44Z

xarray/core/eval.py

+    __bool__(), which is ambiguous for multi-element arrays.
+    """
+
+    def visit_BoolOp(self, node: ast.BoolOp) -> ast.AST:


Why not snake_case? I'm surprised we don't have a linter that catches this.

This is a required convention from Python's ast.NodeTransformer. The visitor methods must be named visit_<NodeType> where <NodeType> matches the AST node class name exactly (e.g., BoolOp, UnaryOp, Compare). Using snake_case like visit_bool_op would break the visitor pattern - Python's ast module wouldn't find the methods.

[This is Claude Code on behalf of max-sixty]

Here's the implementation, the only documentation I managed to understand:
https://github.com/python/cpython/blob/d0e9f4445a0d9039e1a2367ecee376b4b3ba7593/Lib/ast.py#L502-L506

right, I think it needs to be visit_Foo; we can't change that. which the link seems to support?

method = 'visit_' + node.__class__.__name__

(this is Max himself!)
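The dispatch rule quoted above can be checked with a small standard-library experiment. The `Recorder` class is hypothetical and exists only for this demonstration; it subclasses `ast.NodeVisitor`, whose `visit` method `ast.NodeTransformer` inherits.

```python
import ast


class Recorder(ast.NodeVisitor):
    """Record which visitor methods the ast machinery actually calls."""

    def __init__(self) -> None:
        self.seen = []

    def visit_BoolOp(self, node: ast.BoolOp) -> None:
        # Matches 'visit_' + node.__class__.__name__, so this is called.
        self.seen.append("visit_BoolOp")
        self.generic_visit(node)

    def visit_bool_op(self, node: ast.BoolOp) -> None:
        # The snake_case name never matches a node class name, so this is never called.
        self.seen.append("visit_bool_op")


recorder = Recorder()
recorder.visit(ast.parse("a and b", mode="eval"))
print(recorder.seen)  # ['visit_BoolOp']
```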

Address review feedback:

- Convert TestEvalErrorMessages class to test_eval_error_* functions
- Convert TestEvalEdgeCases class to test_eval_* functions
- Convert TestEvalDask class to test_eval_dask_* functions

This follows xarray's preference for standalone test functions over classes.

Co-authored-by: Claude <[email protected]>

max-sixty · 2026-01-17T04:41:38Z

will merge, but feel free to revert if anyone disagrees or has even a small reservation, and we can reopen. (I think it's reasonable to merge something like this after a couple of weeks with the plan to merge label; please let me know if this is too eager...)

max-sixty mentioned this pull request Dec 31, 2025

Dataset.eval does not support N-dimensional objects with N > 2 #11062

Closed

max-sixty and others added 3 commits January 1, 2026 10:57

Merge branch 'main' into eval

ca87f1b

Fix mypy errors in eval tests

cac8e5a

Remove security framing, frame restrictions as pd.eval() compatibility

67f27c2

TomNicholas reviewed Jan 2, 2026

xarray/core/dataset.py (outdated, resolved)

max-sixty and others added 2 commits January 2, 2026 10:50

Illviljan reviewed Jan 5, 2026

max-sixty added the "plan to merge" (Final call for comments) label Jan 5, 2026

jsignell linked an issue Jan 12, 2026 that may be closed by this pull request

Dataset.eval does not support N-dimensional objects with N > 2 #11062

Closed

Merge branch 'main' into eval

3cf050c

max-sixty enabled auto-merge (squash) January 17, 2026 04:41

max-sixty merged commit f1526ac into pydata:main Jan 17, 2026
39 checks passed

max-sixty deleted the eval branch January 17, 2026 05:03
