Skip to content

Conversation

@knassre-bodo
Copy link
Contributor

@knassre-bodo knassre-bodo commented Oct 30, 2025

Adds an optimization pass which substitutes columns used by a join from one side to the other when passing through a join key. This is done under a certain set of conditions to ensure that the one side of the join no longer has any of its columns being used, thus ensuring that under those same conditions, the join can be optimized out by the column pruner.

@knassre-bodo knassre-bodo marked this pull request as ready for review November 7, 2025 18:21
@knassre-bodo knassre-bodo requested review from a team, hadia206, john-sanchez31 and juankx-bodo and removed request for a team November 7, 2025 18:22
Copy link
Contributor

@hadia206 hadia206 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.
Only have some questions

return False
# The current level is fine, so check any levels above it next.
return True if self.parent is None else self.parent.always_exists()
return True if self.parent is None else self.parent.is_singular()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this change?

Copy link
Contributor Author

@knassre-bodo knassre-bodo Nov 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a bug in the previous version that I happened to notice. This function is checking if a hybrid tree is singular with regards to its parent context, which mis true if the current level is singular + all levels above it are also singular.

lhs_refs = {
ref
for ref in col_refs
if ref.input_name == join.default_input_aliases[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my understanding, how the default_input_aliases identify LHS refs?
I thought it includes both LHS and RHS inputs

Copy link
Contributor Author

@knassre-bodo knassre-bodo Nov 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but this is using join.default_input_aliases[0], which is the LHS name, while join.default_input_aliases[1] is the RHS name.

Copy link
Contributor

@john-sanchez31 john-sanchez31 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job with this one!

@@ -0,0 +1,101 @@
"""
Logic for switching references to join keys from one side of a join to the other
when certain conditions are met, thus allowing the join to be removed by the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add in a really high level what are these conditions?

JOIN(condition=t0.s_suppkey == t1.ps_suppkey, type=ANTI, columns={'s_name': t0.s_name})
SCAN(table=tpch.SUPPLIER, columns={'s_name': s_name, 's_suppkey': s_suppkey})
JOIN(condition=t0.ps_partkey == t1.p_partkey, type=INNER, cardinality=SINGULAR_FILTER, reverse_cardinality=PLURAL_FILTER, columns={'ps_suppkey': t0.ps_suppkey})
JOIN(condition=t0.ps_partkey == t1.p_partkey, type=INNER, cardinality=SINGULAR_FILTER, reverse_cardinality=PLURAL_ACCESS, columns={'ps_suppkey': t0.ps_suppkey})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change related to this optimization?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incidental from the bugfix in hybrid_tree.py

Base automatically changed from kian/join_aggregate_transpose to main November 24, 2025 18:35
@knassre-bodo knassre-bodo merged commit 3dd37c4 into main Nov 26, 2025
12 checks passed
@knassre-bodo knassre-bodo deleted the kian/join_key_substitution branch November 26, 2025 18:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants