-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Refactor create_hashes to accept array references #18448
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
4737894 to
57f98ad
Compare
|
What is this series of PRs implementing? Is it for query anti-pattern in #18393 (comment), stat pruning is not working, so we're pushing down build-side dynamic filter like By the way I suggest to open an umbrella issue to describe the high-level ideas, now the discussion seems to be scattered around, and the work is hard to follow. |
Yes that is precisely it. I'm not sure what you mean by I'll put a summary and links to the relevant PRs in #17171 (comment). That said #17171 is already a pretty busy issue so it's buried at the bottom... I'm not sure what more I can do about that. |
Maybe
Perhaps edit the issue top description, or open a new one? 🤔 |
Okay yes agreed. Anti-pattern was only confusing to me because I thought it might be suggesting that pattern of query is wrong / that the query itself is wrong and had me thinking maybe my example was a bad one 😔
Hmm there's others working on the same issue I don't want to make my work sounds like the only solution. I'll add a more extensive description and cross references to the PRs tomorrow. |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @adriangb
datafusion/common/src/hash_utils.rs
Outdated
| /// | ||
| /// This is the same as [`create_hashes`] but accepts `&dyn Array`s instead of requiring | ||
| /// `ArrayRef`s. | ||
| pub fn create_hashes_from_arrays<'a>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can avoid this new function and instead update create_hashes with a little generics magic so it can take array references
Here is a PR that shows how it works
|
🤖 |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am somewhat biased, but this looks good to me 👍
it looks like this PR has some conflicts that need to be resolved, and I am running some benchmarks on it just to be sure it doesn't impact performance (I don't expect that it would)
| Ok(()) | ||
| } | ||
|
|
||
| pub trait AsDynArray { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| pub trait AsDynArray { | |
| /// Something that can be returned as a `&dyn Array` | |
| /// | |
| /// For some reason we can't use `AsRef<dyn Array>` because | |
| /// it is not implemented for`&dyn Array` | |
| pub trait AsDynArray { |
e50907a to
a14159a
Compare
Co-authored-by: Andrew Lamb <[email protected]>
a14159a to
da34dac
Compare
|
🤖: Benchmark completed Details
|
|
I'll reschedule to double check (no need to wait for merge) |
|
🤖 |
|
🤖: Benchmark completed Details
|
|
it is strange that the second run also shows speedups. I agree that overall it looks good to me and shipping is the right solution here |
|
Not the same speedups / in the same queries though. I think it's just noise. Unfortunate that noise is +-25%... |
Yeah, I agree FYI @rluvaton had a great idea here: |
Background
This PR is part of an EPIC to push down hash table references from HashJoinExec into scans. The EPIC is tracked in #17171.
A "target state" is tracked in #18393.
There is a series of PRs to get us to this target state in smaller more reviewable changes that are still valuable on their own:
HashJoinExecand use CASE expressions for more precise filters #18451Changes in this PR
Change create_hashes and related functions to work with &dyn Array references instead of requiring ArrayRef (Arc-wrapped arrays). This avoids unnecessary Arc::clone() calls and enables calls that only have an &dyn Array to use the hashing utilities.