Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove unnecessary FilterElementsNodes before measure aggregation #1506

Merged
merged 4 commits into from
Nov 6, 2024

Conversation

courtneyholcomb
Copy link
Contributor

We are working to dedupe logic in the dataflow plan builder in order to reduce bugs. There are currently three different paths the dataflow plan may take, each of which duplicates much of the dataflow plan building logic:

  • measure aggregation
  • secondary aggregation for conversion metrics
  • no-metric queries
    And I am in the process of adding one more path related to time spine nodes.

The differences in these paths is frequently the source of bugs that could be easily prevented. For example, forgetting to include a where filter spec in the original source node is a bug that frequently pops up for new paths in the dataflow plan.

Further up the stack, I'm going to dedupe a lot of the dataflow plan logic in order to prevent these bugs from popping up in the future. Specifically, I'll move all the logic after building a source node recipe and before applying aggregations to a helper function that can be reused in all paths (current or future). That will help to ensure that applying joins, joining custom granularities, applying where filters, and applying time constraints, always uses the correct logic.

Measure aggregation logic applies several FilterElementsNodes that do not actually impact the outcome of optimized queries. Other paths do not have these unnecessary nodes. Removing these nodes is a step needed to ensure measures use the same logic as other dataflow plans, which will then allow us to dedupe that logic.

Copy link

github-actions bot commented Nov 5, 2024

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

@courtneyholcomb courtneyholcomb added Skip Changelog Run Tests With Other SQL Engines Runs the test suite against the SQL engines in our target environment labels Nov 5, 2024
@github-actions github-actions bot removed the Run Tests With Other SQL Engines Runs the test suite against the SQL engines in our target environment label Nov 5, 2024
Copy link
Contributor

@WilliamDee WilliamDee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks right to me! The constant filtering of nodes throughout does seem alil silly in the world where we have the optimizers, but it does seem to make the unoptimized plans abit more wild. Though I don't see a use case of us ever dropping the optimizer for whatever reason, so i think this is a good change!

Comment on lines -1663 to -1677
specs_to_keep_after_join = InstanceSpecSet(measure_specs=(measure_spec,)).merge(
InstanceSpecSet.create_from_specs(measure_recipe.all_linkable_specs_required_for_source_nodes.as_tuple),
)

after_join_filtered_node = FilterElementsNode.create(
parent_node=filtered_measures_with_joined_elements, include_specs=specs_to_keep_after_join
)
unaggregated_measure_node = after_join_filtered_node
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: This logic is all unnecessary because it gets filtered out in the optimizer anyways?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep exactly

Measure aggregation logic applys several FilterElementsNodes that do not actually impact the outcome of optimized queries. Me and Will are making an effort to align the logic in the dataflow plan builder as much as possible. In order to align measure aggregation logic with other logic in the dataflow plan builder, we need to remove these nodes.
@courtneyholcomb courtneyholcomb merged commit 5f2ab0b into main Nov 6, 2024
15 checks passed
@courtneyholcomb courtneyholcomb deleted the court/filter-less2 branch November 6, 2024 19:45
courtneyholcomb added a commit that referenced this pull request Nov 14, 2024
The difference between `queried_linkable_specs`,
`required_linkable_specs`, and `extraneous_linkable_specs` has often
been overlooked and therefore been the source of many bugs. In [this
PR](#1506), we removed the
unneeded `FilterElementsNodes` that were the main reason we needed those
different spec variables. So now we can remove the
`extraneous_linkable_specs` variables altogether, and hopefully reduce
confusion here.
There is also some related cleanup in here. It's not a huge PR but I
still recommend reviewing by commit.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants