Collaborator

@max-sixty max-sixty commented Dec 31, 2025

this ended up being a much bigger effort than expected

  • we needed to leave behind pandas' implementation for Dataset.eval because it's limited to 2 dims
  • we keep pandas' implementation for .query: we should be more careful about changing that, it uses numexpr (which is fast), and it has no need for > 2 dims
  • so I added code to keep .eval consistent with the .query interface, e.g. supporting and / or

I added constraints similar to the ones pandas has on what eval can do. I'm not that confident they're robust, and I'm not sure how valuable they are.

most of the added code is tests

Commentary from Claude below (+ Claude wrote the code, for transparency, albeit with lots of oversight)


This commit removes the dependency on pandas.eval() and implements a native expression evaluator in Dataset.eval() using Python's ast module. The new implementation provides better support for multi-dimensional arrays and maintains backward compatibility with deprecated operators through automatic transformation.

Key changes:

  • Remove pd.eval() call and replace with custom _eval_expression() method
  • Add _LogicalOperatorTransformer to convert deprecated operators (and/or/not) to bitwise operators (&/|/~) that work element-wise on arrays
  • Implement automatic transformation of chained comparisons to explicit bitwise AND operations
  • Add security validation to block lambda expressions and private attributes
  • Emit FutureWarning for deprecated constructs (logical operators, chained comparisons, parser= argument)
  • Support assignment statements (target = expression) in eval()
  • Make data variables and coordinates take priority in namespace resolution
  • Provide safe builtins (abs, min, max, round, len, sum, pow, any, all, type constructors, iteration helpers) while blocking __import__, open, etc.
  • Add comprehensive test coverage including edge cases, error messages, dask compatibility, and security validation
  • Closes #xxxx
  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
  • New functions/methods are listed in api.rst
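
For readers who want to see the shape of the operator rewriting described in the key changes, here is a minimal sketch using only the standard library. The class name LogicalToBitwise and the helper rewrite are hypothetical, chosen for illustration; the PR's own transformer may differ in structure and also emits warnings, which this sketch omits.

```python
import ast


class LogicalToBitwise(ast.NodeTransformer):
    """Hypothetical sketch: rewrite and/or/not and chained comparisons."""

    def visit_BoolOp(self, node: ast.BoolOp) -> ast.AST:
        self.generic_visit(node)  # rewrite nested operands first
        op_type = ast.BitAnd if isinstance(node.op, ast.And) else ast.BitOr
        result = node.values[0]
        for value in node.values[1:]:
            # a and b and c  ->  (a & b) & c
            result = ast.BinOp(left=result, op=op_type(), right=value)
        return result

    def visit_UnaryOp(self, node: ast.UnaryOp) -> ast.AST:
        self.generic_visit(node)
        if isinstance(node.op, ast.Not):
            # not a  ->  ~a
            return ast.UnaryOp(op=ast.Invert(), operand=node.operand)
        return node

    def visit_Compare(self, node: ast.Compare) -> ast.AST:
        self.generic_visit(node)
        if len(node.ops) == 1:
            return node
        # a < b < c  ->  (a < b) & (b < c); note that b is evaluated twice
        operands = [node.left, *node.comparators]
        result = None
        for left, op, right in zip(operands, node.ops, operands[1:]):
            pair = ast.Compare(left=left, ops=[op], comparators=[right])
            if result is None:
                result = pair
            else:
                result = ast.BinOp(left=result, op=ast.BitAnd(), right=pair)
        return result


def rewrite(expr: str) -> str:
    """Return the expression source after the rewrite (needs Python 3.9+)."""
    tree = ast.parse(expr, mode="eval")
    tree = ast.fix_missing_locations(LogicalToBitwise().visit(tree))
    return ast.unparse(tree)


print(rewrite("a > 0 and b < 5"))  # (a > 0) & (b < 5)
print(rewrite("0 < a < 10"))       # (0 < a) & (a < 10)
```

The rewrite matters because and, or, and not call __bool__() on their operands, which is ambiguous for multi-element arrays, whereas &, |, and ~ are applied element-wise.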

max-sixty and others added 3 commits January 1, 2026 10:57
- Use pd.isna(ds["a"].values) instead of pd.isna(ds["a"]) since pandas
  type stubs don't have overloads for DataArray
- Use abs() instead of np.abs() to get DataArray return type

Co-authored-by: Claude <[email protected]>
The lambda and dunder restrictions emulate pd.eval() behavior rather than
providing security guarantees. Pandas explicitly doesn't claim these as
security measures.

Co-authored-by: Claude <[email protected]>
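
As a rough illustration of the lambda and dunder restrictions and the builtins allow-list, the sketch below walks the parsed expression and then evaluates it with a reduced set of builtins. The names check_expression, restricted_eval, and SAFE_BUILTINS are hypothetical, and the assumption that only double-underscore attributes are rejected is ours; the PR's validate_expression and EVAL_BUILTINS may behave differently. Consistent with the commit message above, this emulates pd.eval() behavior and is not a security boundary.

```python
import ast

# Hypothetical allow-list; the PR's real EVAL_BUILTINS contents may differ.
SAFE_BUILTINS = {
    "abs": abs, "min": min, "max": max,
    "round": round, "len": len, "sum": sum,
}


def check_expression(tree: ast.AST) -> None:
    """Reject lambdas and dunder attribute access (illustrative only)."""
    for node in ast.walk(tree):
        if isinstance(node, ast.Lambda):
            raise ValueError("lambda expressions are not supported")
        if isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            raise ValueError(
                f"access to dunder attribute {node.attr!r} is not supported"
            )


def restricted_eval(expr: str, namespace: dict) -> object:
    """Validate, compile, and evaluate an expression against a namespace."""
    tree = ast.parse(expr, mode="eval")
    check_expression(tree)
    code = compile(tree, "<expr>", "eval")
    # Supplying "__builtins__" explicitly stops Python from injecting the
    # full builtins module, so names such as open are simply undefined.
    return eval(code, {"__builtins__": SAFE_BUILTINS}, dict(namespace))


print(restricted_eval("abs(a) + 1", {"a": -2}))  # 3
```

With this setup, "(lambda: 1)()" and "a.__class__" raise ValueError during validation, and "open('x')" raises NameError at evaluation time.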
max-sixty and others added 2 commits January 2, 2026 10:50
Extract AST-based expression evaluation code to xarray/core/eval.py:
- EVAL_BUILTINS dict
- LogicalOperatorTransformer class
- validate_expression function

This addresses the review feedback to keep the Dataset class focused.

Co-authored-by: Claude <[email protected]>
Extract eval tests from test_dataset.py to test_eval.py:
- 35 tests covering basic functionality, error messages, edge cases, and dask
- Mirrors the implementation structure (core/eval.py <-> tests/test_eval.py)
- Reduces test_dataset.py by 574 lines

Co-authored-by: Claude <[email protected]>
__bool__(), which is ambiguous for multi-element arrays.
"""

def visit_BoolOp(self, node: ast.BoolOp) -> ast.AST:
Contributor

Why not snake_case? I'm surprised we don't have a linter that catches this.

Collaborator Author

This is a required convention from Python's ast.NodeTransformer. The visitor methods must be named visit_<NodeType> where <NodeType> matches the AST node class name exactly (e.g., BoolOp, UnaryOp, Compare). Using snake_case like visit_bool_op would break the visitor pattern - Python's ast module wouldn't find the methods.

[This is Claude Code on behalf of max-sixty]

Contributor

Here's the implementation, the only documentation I managed to understand:
https://github.com/python/cpython/blob/d0e9f4445a0d9039e1a2367ecee376b4b3ba7593/Lib/ast.py#L502-L506

Collaborator Author

right, I think it needs to be visit_Foo and we can't change that, which the link seems to support?

        method = 'visit_' + node.__class__.__name__

(this is Max himself!)
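
The dispatch rule discussed in this thread can be checked with a small standalone example. The Recorder class is hypothetical and exists only to show which method ast.NodeVisitor.visit looks up; ast.NodeTransformer inherits the same lookup.

```python
import ast


class Recorder(ast.NodeVisitor):
    """Hypothetical visitor that records which of its methods get called."""

    def __init__(self) -> None:
        self.called: list[str] = []

    def visit_BoolOp(self, node: ast.BoolOp) -> None:
        # Found by NodeVisitor.visit via 'visit_' + node.__class__.__name__
        self.called.append("visit_BoolOp")
        self.generic_visit(node)

    def visit_bool_op(self, node: ast.BoolOp) -> None:
        # Never looked up: the name does not match the node class name.
        self.called.append("visit_bool_op")


recorder = Recorder()
recorder.visit(ast.parse("a and b", mode="eval"))
print(recorder.called)  # ['visit_BoolOp']
```

A snake_case method is silently ignored rather than raising an error, because visit falls back to generic_visit when no matching method exists.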

Address review feedback:
- Convert TestEvalErrorMessages class to test_eval_error_* functions
- Convert TestEvalEdgeCases class to test_eval_* functions
- Convert TestEvalDask class to test_eval_dask_* functions

This follows xarray's preference for standalone test functions over classes.

Co-authored-by: Claude <[email protected]>
@max-sixty added the "plan to merge" (Final call for comments) label Jan 5, 2026