WIP: Implement inequality joins by translating to cross + filter #17000

Draft
wants to merge 23 commits into base: branch-24.12

wence-
Contributor

@wence- wence- commented Oct 4, 2024

Description

Before working through the plumbing in pylibcudf for mixed and conditional joins and the ast evaluator, let's just support inequality joins by doing the basic thing.
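As a rough illustration of "cross + filter" (not the code in this PR), an inequality join can be expressed as a cross join followed by a filter on the join predicate. The sketch below uses plain Python lists of dicts; the `inequality_join` helper and the column names `a` and `b` are hypothetical.

```python
from itertools import product


def inequality_join(left, right, predicate):
    """Join two lists of row dicts by filtering their cross product.

    Hypothetical helper for illustration only; it is not part of
    cudf.polars or pylibcudf.
    """
    # Cross join: pair every left row with every right row.
    pairs = product(left, right)
    # Filter: keep only the pairs that satisfy the join predicate.
    return [{**lrow, **rrow} for lrow, rrow in pairs if predicate(lrow, rrow)]


left = [{"a": 1}, {"a": 5}, {"a": 9}]
right = [{"b": 4}, {"b": 8}]
result = inequality_join(left, right, lambda lrow, rrow: lrow["a"] < rrow["b"])
```

The intermediate cross product has `len(left) * len(right)` rows, so this trades performance for simplicity, which is the price of "doing the basic thing" until dedicated conditional joins are wired up.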

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@wence-
Contributor Author

wence- commented Oct 4, 2024

Needs pola-rs/polars#19104

@github-actions bot added the Python (Affects Python cuDF API.) and cudf.polars (Issues specific to cudf.polars) labels Oct 4, 2024
@wence-
Contributor Author

wence- commented Oct 9, 2024

> Needs pola-rs/polars#19104

Which is now merged but not yet released.

We will use this to provide infrastructure for making IR nodes easier
to traverse. Expr nodes already use this facility, but we want to
share it.

And tests of basic functionality.

This way we will be able to write generic traversals more easily.

Now that we have a uniform child attribute, this is easier.

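To show why a uniform child attribute helps, here is a minimal sketch of generic traversal and rewriting over a tree of nodes. The `Node` class, its `children` field, and the `traverse` and `rewrite` helpers are assumptions made for illustration; they are not the classes used in cudf.polars.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical node type: every node exposes the same `children` field."""

    name: str
    children: tuple = ()


def traverse(node):
    """Yield the node and all of its descendants in pre-order."""
    yield node
    for child in node.children:
        yield from traverse(child)


def rewrite(node, fn):
    """Rebuild the tree bottom-up, applying fn to each rebuilt node."""
    new_children = tuple(rewrite(child, fn) for child in node.children)
    return fn(Node(node.name, new_children))


plan = Node("Join", (Node("Scan"), Node("Filter", (Node("Scan"),))))
names = [n.name for n in traverse(plan)]
upper = rewrite(plan, lambda n: Node(n.name.upper(), n.children))
```

Because both helpers only touch `children`, they work for any node type that follows the same convention, whether it represents an expression or a plan node.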
We will use this for inequality joins and filter pushdown in the
parquet reader.

The handling is a bit complicated, since the subset of expressions
that the parquet filter accepts is smaller than all possible
expressions. Since much of the logic is similar, however, we just
dispatch on a transformer state variable to determine which case we're
handling.

We attempt to turn the predicate into a filter expression that the
parquet reader understands. If successful then we don't have to apply
the predicate as a post-filter.

We can only do this when a row index is not requested.

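The two ideas above, dispatching on a state variable and falling back to a post-filter, can be sketched as follows. Everything here is hypothetical: the `Col`, `Lit`, and `BinOp` classes, the `for_parquet` flag, and in particular the set of operators assumed to be accepted by the parquet filter are made up for illustration and do not reflect the real reader's capabilities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Lit:
    value: object


@dataclass(frozen=True)
class BinOp:
    op: str
    left: object
    right: object


class Unsupported(Exception):
    """Raised when an expression cannot be translated in the current mode."""


# Assumption: the parquet filter accepts only a subset of operators.
PARQUET_OPS = {"==", "<", ">", "and"}
ALL_OPS = PARQUET_OPS | {"+", "*"}


def translate(expr, *, for_parquet):
    """Translate expr to a nested tuple, dispatching on the for_parquet flag."""
    if isinstance(expr, Col):
        return ("col", expr.name)
    if isinstance(expr, Lit):
        return ("lit", expr.value)
    if isinstance(expr, BinOp):
        allowed = PARQUET_OPS if for_parquet else ALL_OPS
        if expr.op not in allowed:
            raise Unsupported(expr.op)
        return (
            expr.op,
            translate(expr.left, for_parquet=for_parquet),
            translate(expr.right, for_parquet=for_parquet),
        )
    raise Unsupported(type(expr).__name__)


def plan_scan(predicate, *, with_row_index):
    """Return (pushed_filter, post_filter) for a scan with a predicate."""
    if with_row_index:
        # Pushdown is only attempted when no row index is requested.
        return None, predicate
    try:
        return translate(predicate, for_parquet=True), None
    except Unsupported:
        # Fall back to applying the predicate after the read.
        return None, predicate


simple = BinOp("<", Col("a"), Lit(3))
arith = BinOp("<", BinOp("+", Col("a"), Lit(1)), Lit(3))
```

With these definitions, `plan_scan(simple, with_row_index=False)` pushes the filter down, while `arith` (or any predicate combined with a row index) is kept as a post-filter.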

Before working through the plumbing in pylibcudf for mixed and
conditional joins and the ast evaluator, let's just support inequality
joins by doing the basic thing.

Expressions referring to the right table must be suffixed if the name
overlaps with that in the left table.
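
The suffixing rule can be sketched as a mapping from each right-table column name to the name it carries in the join output. The `resolve_right_names` helper is hypothetical, and the `"_right"` default is an assumption chosen to mirror the default join suffix in Polars.

```python
def resolve_right_names(left_columns, right_columns, suffix="_right"):
    """Map each right column name to its name in the joined output.

    Hypothetical helper: a right column is suffixed only when its name
    also appears in the left table.
    """
    taken = set(left_columns)
    return {
        name: (name + suffix if name in taken else name)
        for name in right_columns
    }


mapping = resolve_right_names(["a", "b"], ["b", "c"])
```

Here `b` exists in both tables, so references to the right table's `b` resolve to `b_right`, while `c` keeps its name.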
@wence- force-pushed the wence/fea/16926-polars-iejoin branch from fdfe737 to 4350006 on October 16, 2024 16:01
@github-actions bot added the pylibcudf (Issues specific to the pylibcudf package) label Oct 16, 2024
Labels
cudf.polars: Issues specific to cudf.polars
pylibcudf: Issues specific to the pylibcudf package
Python: Affects Python cuDF API.