Delete redundant joins by substitution equi-join keys for their mirror to render one side pruneable #450
hadia206
left a comment
Looks good to me.
Only have some questions
      return False
  # The current level is fine, so check any levels above it next.
- return True if self.parent is None else self.parent.always_exists()
+ return True if self.parent is None else self.parent.is_singular()
why this change?
This was a bug in the previous version that I happened to notice. This function checks whether a hybrid tree is singular with regard to its parent context, which is true only if the current level is singular and all levels above it are also singular.
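A minimal sketch of the recursion described above, for illustration only. The `Level` class and its `singular_here` field are hypothetical stand-ins, not the actual hybrid tree classes from this PR; only the shape of the final `return` mirrors the diff.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Level:
    """Hypothetical stand-in for one level of a hybrid tree."""

    singular_here: bool          # assumed flag: is this level itself singular?
    parent: Level | None = None  # the level above, or None at the root

    def is_singular(self) -> bool:
        # A plural level anywhere in the chain makes the whole chain plural.
        if not self.singular_here:
            return False
        # The current level is fine, so check any levels above it next.
        return True if self.parent is None else self.parent.is_singular()


root = Level(singular_here=True)
plural_mid = Level(singular_here=False, parent=root)
leaf = Level(singular_here=True, parent=plural_mid)
```

Here `leaf` is singular at its own level but sits under a plural level, so the check has to recurse with `is_singular()` rather than a different predicate to get the right answer.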
lhs_refs = {
    ref
    for ref in col_refs
    if ref.input_name == join.default_input_aliases[0]
For my understanding, how does default_input_aliases identify the LHS refs?
I thought it includes both the LHS and RHS inputs.
Yes, but this is using join.default_input_aliases[0], which is the LHS name, while join.default_input_aliases[1] is the RHS name.
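As a rough illustration of that indexing, the sketch below splits a set of column references by input name. `ColumnRef` is a hypothetical stand-in for the real reference type, and the alias values `"t0"` and `"t1"` are assumed from the plan snippets elsewhere in this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnRef:
    """Hypothetical column reference: which join input it reads, and which column."""

    input_name: str
    name: str


# Assumed alias list: index 0 names the LHS input, index 1 names the RHS input.
default_input_aliases = ["t0", "t1"]

col_refs = {
    ColumnRef("t0", "ps_partkey"),
    ColumnRef("t0", "ps_suppkey"),
    ColumnRef("t1", "p_partkey"),
}

# The list holds both names, but each comparison only matches one side.
lhs_refs = {ref for ref in col_refs if ref.input_name == default_input_aliases[0]}
rhs_refs = {ref for ref in col_refs if ref.input_name == default_input_aliases[1]}
```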
john-sanchez31
left a comment
Good job with this one!
@@ -0,0 +1,101 @@
"""
Logic for switching references to join keys from one side of a join to the other
when certain conditions are met, thus allowing the join to be removed by the
Can we add, at a really high level, what these conditions are?
JOIN(condition=t0.s_suppkey == t1.ps_suppkey, type=ANTI, columns={'s_name': t0.s_name})
SCAN(table=tpch.SUPPLIER, columns={'s_name': s_name, 's_suppkey': s_suppkey})
JOIN(condition=t0.ps_partkey == t1.p_partkey, type=INNER, cardinality=SINGULAR_FILTER, reverse_cardinality=PLURAL_FILTER, columns={'ps_suppkey': t0.ps_suppkey})
JOIN(condition=t0.ps_partkey == t1.p_partkey, type=INNER, cardinality=SINGULAR_FILTER, reverse_cardinality=PLURAL_ACCESS, columns={'ps_suppkey': t0.ps_suppkey})
Is this change related to this optimization?
Incidental from the bugfix in hybrid_tree.py
Co-authored-by: Hadia Ahmed <[email protected]>
for more information, see https://pre-commit.ci
Adds an optimization pass that rewrites column references from one side of a join to the other when the referenced column is a join key. The substitution is applied only under a certain set of conditions, chosen so that one side of the join ends up with none of its columns in use, which lets the column pruner optimize the join out.
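The core substitution can be sketched roughly as follows. This is not the pass from this PR: the function names, the tuple-based column representation, and the `"t0"`/`"t1"` aliases are all assumptions made for illustration, and the sketch deliberately skips the safety conditions the real pass checks, since they are not spelled out in this conversation.

```python
def substitute_join_keys(columns, key_pairs, lhs="t0", rhs="t1"):
    """Rewrite references to RHS join keys as references to the matching LHS keys.

    columns:   dict of output name -> (input alias, column name)   [hypothetical shape]
    key_pairs: list of (lhs column, rhs column) taken from the equi-join condition
    """
    rhs_to_lhs = {(rhs, r): (lhs, l) for l, r in key_pairs}
    return {out: rhs_to_lhs.get(ref, ref) for out, ref in columns.items()}


def side_is_unused(columns, side):
    """True when no output column reads from the given join input."""
    return all(input_name != side for input_name, _ in columns.values())


# Equi-join condition t0.ps_partkey == t1.p_partkey, with one output reading the RHS key.
columns = {"part_key": ("t1", "p_partkey"), "supp_key": ("t0", "ps_suppkey")}
key_pairs = [("ps_partkey", "p_partkey")]

rewritten = substitute_join_keys(columns, key_pairs)
```

After the rewrite, `part_key` reads `t0.ps_partkey` and nothing reads from `t1`, which is the state that would let a column pruner consider dropping that side. Whether dropping it is actually valid depends on the conditions the pass enforces.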