Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[V1] Adds APIs for Strategy and Pattern to compiler. #1625

Draft
wants to merge 4 commits into
base: v1
Choose a base branch
from

Conversation

RCHowell
Copy link
Member

Description

This PR begins the strategy-based compiler which enables users to provide patterns for matching logical sub-tree operators, then the match is sent to a strategy which returns a custom physical operators.

This is similar to both calcite's rule-based logical optimizer and the physical operator strategies from spark.

Other Information

License Information

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Copy link

github-actions bot commented Oct 22, 2024

CROSS-ENGINE-REPORT ❌

BASE (LEGACY-V0.14.8) TARGET (EVAL-095FD52) +/-
% Passing 89.67% 94.39% 4.72% ✅
Passing 5287 5565 278 ✅
Failing 609 50 -559 ✅
Ignored 0 281 281 🔶
Total Tests 5896 5896 0 ✅

Testing Details

  • Base Commit: v0.14.8
  • Base Engine: LEGACY
  • Target Commit: 095fd52
  • Target Engine: EVAL

Result Details

  • ❌ REGRESSION DETECTED. See Now Failing/Ignored Tests. ❌
  • Passing in both: 2643
  • Failing in both: 17
  • Ignored in both: 0
  • PASSING in BASE but now FAILING in TARGET: 3
  • PASSING in BASE but now IGNORED in TARGET: 108
  • FAILING in BASE but now PASSING in TARGET: 180
  • IGNORED in BASE but now PASSING in TARGET: 0

Now FAILING Tests ❌

The following 3 test(s) were previously PASSING in BASE but are now FAILING in TARGET:

Click here to see
  1. undefinedUnqualifiedVariableWithUndefinedVariableBehaviorMissing, compileOption: PERMISSIVE
  2. undefinedUnqualifiedVariableIsNullExprWithUndefinedVariableBehaviorMissing, compileOption: PERMISSIVE
  3. undefinedUnqualifiedVariableIsMissingExprWithUndefinedVariableBehaviorMissing, compileOption: PERMISSIVE

Now IGNORED Tests ❌

The complete list can be found in GitHub CI summary, either from Step Summary or in the Artifact.

Now Passing Tests

180 test(s) were previously failing in BASE (LEGACY-V0.14.8) but now pass in TARGET (EVAL-095FD52). Before merging, confirm they are intended to pass.

The complete list can be found in GitHub CI summary, either from Step Summary or in the Artifact.

CROSS-COMMIT-REPORT ✅

BASE (EVAL-F0569BD) TARGET (EVAL-095FD52) +/-
% Passing 94.39% 94.39% 0.00% ✅
Passing 5565 5565 0 ✅
Failing 50 50 0 ✅
Ignored 281 281 0 ✅
Total Tests 5896 5896 0 ✅

Testing Details

  • Base Commit: f0569bd
  • Base Engine: EVAL
  • Target Commit: 095fd52
  • Target Engine: EVAL

Result Details

  • Passing in both: 5565
  • Failing in both: 50
  • Ignored in both: 281
  • PASSING in BASE but now FAILING in TARGET: 0
  • PASSING in BASE but now IGNORED in TARGET: 0
  • FAILING in BASE but now PASSING in TARGET: 0
  • IGNORED in BASE but now PASSING in TARGET: 0

Base automatically changed from v1-udop to v1 October 22, 2024 23:00
Copy link
Member

@johnedquinn johnedquinn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes seem intuitive, though adding tests for replacing a single node and multiple nodes would garner some confidence in the public APIs exposed.

Comment on lines 167 to 177
for (strategy in strategies) {
val op = strategy.apply(operator)
if (op != null) {
// first match
return op
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that this PR is only intending for the strategy to work for single node replacements, but I'm trying to see how the public APIs will allow for multi-node replacements.

Let's say a tree looks like:

SELECT var(0)
  \__ PROJECT t.c
        \__ FILTER t.b > 127
              \__ FILTER t.a == 49
                    \__ SCAN t

Perhaps you want to consolidate two of those relational operators to a single node:

SELECT_IMPL var(0)
  \__ PROJECT_IMPL t.c
        \__ FILTER_IMPL t.b > 127 AND t.a == 49
              \__ SCAN_IMPL t

With the existing APIs, I don't know if you'd be able to do this. I could be reading it wrong, but it would be useful to have a test applying a multi-node strategy to replace a subtree. I think each strategy might need to have a reference to all of the existing strategies to allow for replacing of its children.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point and a bit more interesting since the calcite model is only based on logical nodes, so need to recursively invoke the compiler – whereas spark has a "plan later" stub for its strategies. This model is somewhat in-between and not completed. The simplest solution would be to pass the compiler instance to apply(..).

In the interest on time, I will do this and add the test to show it in practice. Interestingly the cascades paper blends physical and logical, so an optimizer rule need only return a top-level physical and the remaining can be logical. I don't think I have a strong grasp of these various models yet, but seems Spark is more like cascades insofar as it returns exec nodes (physical impls) but can stub out logical ones for "plan later".

We might consider this, but the benefit isn't immediate - it's simple enough now to pass the compiler in apply.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't read the cascades paper, but it seems similar to how substrait handles logical/physical. They are all part of the same tree. From my interpretation, the compilation of those nodes (either logical/physical) would therefore be 1:1 with the executable that it is compiled to.

So, a plan might go through your strategies (a more comprehensive implementation of physical planner pass) and produce a plan, not an executable. So, from the example above, it could look like:

Select -- var(0)
  \__ Project -- t.c
        \__ Filter -- t.b > 127
              \__ ScanKey : Custom  -- expr = t, key = a, value = 49

And, the compiler would have a registry of 1:1 plan nodes (logical and physical) to implementations:

Select::class -> SelectImplDefault(it)
Project::class -> ProjectImplDefault(it)
Filter::class -> FilterImplDefault(it)
ScanKey::class -> SomeCustomImplProvidedByUser(it)

Not proposing this -- just conveying another option on how we can leverage strategies by introducing a "custom" rel node -- which would just be an empty interface (maybe with a string identifier).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, this is what the rules are though in the logical planning phase. Strategies are different than rules. While a rule maps plan-to-plan (calcite/volcano), the strategies map plans to expression (physical).

An interesting idea is the 1:1 to physical, but then we limit custom operators to what is available in the logical plans – meaning we may need more custom operators. This might be simplest though, but would introduce many more logical operators.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for clarity for users reading this in the future -- we discussed offline, and this implementation doesn't limit ourselves from introducing a custom logical node with a logical plan rewrite. This PR is the second half of the processing -- converting plan node(s) (either the built-in ones or the potential custom ones) into physical implementations. If users, however, wanted to introduce custom logical operators, they are not barred from doing that -- they can use their custom plan rewrites in combination with these strategies for whatever their use-case is.

@RCHowell RCHowell changed the title Adds APIs for Strategy and Pattern to compiler. [V1] Adds APIs for Strategy and Pattern to compiler. Nov 4, 2024
johnedquinn added a commit that referenced this pull request Nov 5, 2024
 [V1] Adds basic strategies to compiler (see #1625)
@johnedquinn johnedquinn marked this pull request as draft November 15, 2024 17:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants