-
Notifications
You must be signed in to change notification settings - Fork 5.5k
feat(planner): Add MaterializedViewScanNode #26492
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Reviewer's GuideThis PR adds support for non-legacy materialized views by capturing view metadata in the analyzer, introducing a new MaterializedViewScanNode, extending the statement and relation planners to produce unified plans over both data table and view query, and adding a rewrite rule that chooses between fresh and materialized data at runtime, with SPI, optimizer, and test updates. Sequence diagram for planning and rewriting a MaterializedViewScanNodesequenceDiagram
participant Analyzer
participant StatementAnalyzer
participant RelationPlanner
participant Optimizer
participant MaterializedViewRewrite
Analyzer->>StatementAnalyzer: setMaterializedViewInfo(table, mvInfo)
StatementAnalyzer->>RelationPlanner: process(table)
RelationPlanner->>RelationPlanner: planMaterializedView(table, mvInfo, context)
RelationPlanner->>RelationPlanner: create MaterializedViewScanNode
Optimizer->>MaterializedViewRewrite: apply(MaterializedViewScanNode, ...)
MaterializedViewRewrite->>MaterializedViewScanNode: getDataTablePlan()/getViewQueryPlan()
MaterializedViewRewrite->>Optimizer: return rewritten ProjectNode
Class diagram for the new MaterializedViewScanNode and related changesclassDiagram
class MaterializedViewScanNode {
+PlanNode dataTablePlan
+PlanNode viewQueryPlan
+QualifiedObjectName materializedViewName
+Map<VariableReferenceExpression, VariableReferenceExpression> dataTableMappings
+Map<VariableReferenceExpression, VariableReferenceExpression> viewQueryMappings
+List<VariableReferenceExpression> outputVariables
+getDataTablePlan()
+getViewQueryPlan()
+getMaterializedViewName()
+getDataTableMappings()
+getViewQueryMappings()
+getOutputVariables()
+getSources()
+sourceOutputLayout(int)
+replaceChildren(List<PlanNode>)
+accept(PlanVisitor, context)
+assignStatsEquivalentPlanNode(Optional<PlanNode>)
}
MaterializedViewScanNode --|> PlanNode
class Analysis {
+Map<NodeRef<Table>, MaterializedViewInfo> materializedViewInfoMap
+setMaterializedViewInfo(Table, MaterializedViewInfo)
+getMaterializedViewInfo(Table): Optional<MaterializedViewInfo>
}
class Analysis.MaterializedViewInfo {
+QualifiedObjectName materializedViewName
+QualifiedName storageTableName
+Table dataTable
+Query viewQuery
+MaterializedViewDefinition materializedViewDefinition
+getMaterializedViewName()
+getStorageTableName()
+getDataTable()
+getViewQuery()
+getMaterializedViewDefinition()
}
Analysis "1" *-- "*" Analysis.MaterializedViewInfo
class MaterializedViewRewrite {
+apply(MaterializedViewScanNode, Captures, Context): Result
}
MaterializedViewRewrite ..> MaterializedViewScanNode
class PlanVisitor {
+visitMaterializedViewScan(MaterializedViewScanNode, context)
}
PlanVisitor ..> MaterializedViewScanNode
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
|
@sourcery-ai review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/RelationPlanner.java:187-188` </location>
<code_context>
@Override
protected RelationPlan visitTable(Table node, SqlPlannerContext context)
{
+ Optional<Analysis.MaterializedViewInfo> materializedViewInfo = analysis.getMaterializedViewInfo(node);
+ if (materializedViewInfo.isPresent()) {
+ return planMaterializedView(node, materializedViewInfo.get(), context);
+ }
</code_context>
<issue_to_address>
**issue:** Consider handling the case where the viewQueryPlan and dataTablePlan have mismatched field counts.
Adding a check or assertion for matching field counts and orderings between dataTablePlan and viewQueryPlan would help prevent runtime errors and ensure query correctness.
</issue_to_address>
### Comment 2
<location> `presto-memory/src/test/java/com/facebook/presto/plugin/memory/TestMemoryMaterializedViewPlanner.java:70` </location>
<code_context>
+ }
+
+ @Test
+ public void testMaterializedViewRefreshed()
+ {
+ assertUpdate("CREATE TABLE base_table (id BIGINT, value BIGINT)");
</code_context>
<issue_to_address>
**suggestion (testing):** Suggestion to add error condition tests for querying dropped materialized views.
Please add a test that verifies querying a dropped materialized view results in the expected error and no execution plan is generated.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
presto-main-base/src/main/java/com/facebook/presto/sql/planner/RelationPlanner.java
Show resolved
Hide resolved
...emory/src/test/java/com/facebook/presto/plugin/memory/TestMemoryMaterializedViewPlanner.java
Show resolved
Hide resolved
|
@sourcery-ai review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes - here's some feedback:
- There’s a lot of duplicated logic in StatementAnalyzer.processMaterializedView between the legacy and new materialized‐view paths – consider extracting the common setup and teardown steps into helper methods to reduce maintenance overhead.
- Using NodeRef as the key for materializedViewInfoMap could lead to lookup misses if Table nodes aren’t guaranteed to be canonical; consider keying by QualifiedObjectName or otherwise ensuring stable identity.
- The default getMaterializedViewStatus overload discards the baseQueryDomain, which might hide important filtering information from connectors – either make the overload abstract or document its limitations more explicitly.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- There’s a lot of duplicated logic in StatementAnalyzer.processMaterializedView between the legacy and new materialized‐view paths – consider extracting the common setup and teardown steps into helper methods to reduce maintenance overhead.
- Using NodeRef<Table> as the key for materializedViewInfoMap could lead to lookup misses if Table nodes aren’t guaranteed to be canonical; consider keying by QualifiedObjectName or otherwise ensuring stable identity.
- The default getMaterializedViewStatus overload discards the baseQueryDomain, which might hide important filtering information from connectors – either make the overload abstract or document its limitations more explicitly.
## Individual Comments
### Comment 1
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/RelationPlanner.java:338-339` </location>
<code_context>
+ ImmutableMap.Builder<VariableReferenceExpression, VariableReferenceExpression> dataTableMappingsBuilder = ImmutableMap.builder();
+ ImmutableMap.Builder<VariableReferenceExpression, VariableReferenceExpression> viewQueryMappingsBuilder = ImmutableMap.builder();
+
+ for (Field field : dataTableDescriptor.getVisibleFields()) {
+ int fieldIndex = dataTableDescriptor.indexOf(field);
+ VariableReferenceExpression dataTableVar = dataTableVariables.get(fieldIndex);
+ VariableReferenceExpression viewQueryVar = viewQueryVariables.get(fieldIndex);
</code_context>
<issue_to_address>
**issue (bug_risk):** Potential mismatch if visible fields differ from output variable order.
This approach relies on visible fields and output variables having matching order, which may not always be guaranteed. Please ensure consistency or clarify this assumption in the documentation.
</issue_to_address>
### Comment 2
<location> `presto-spi/src/main/java/com/facebook/presto/spi/plan/MaterializedViewScanNode.java:123-132` </location>
<code_context>
+ return unmodifiableList(sources);
+ }
+
+ public List<VariableReferenceExpression> sourceOutputLayout(int sourceIndex)
+ {
+ if (sourceIndex == 0) {
+ // Data table plan - use dataTableMappings
+ return outputVariables.stream()
+ .map(dataTableMappings::get)
+ .collect(toList());
+ }
+ else if (sourceIndex == 1) {
+ // View query plan - use viewQueryMappings
+ return outputVariables.stream()
+ .map(viewQueryMappings::get)
+ .collect(toList());
+ }
+ else {
+ throw new IllegalArgumentException("MaterializedViewScanNode only has 2 sources (index 0 and 1), but got index " + sourceIndex);
+ }
</code_context>
<issue_to_address>
**issue (bug_risk):** sourceOutputLayout may return nulls if mappings are incomplete.
Currently, missing mappings result in nulls in the returned list, which could lead to errors later. Please validate that all output variables have corresponding mappings and handle missing cases appropriately.
</issue_to_address>
### Comment 3
<location> `presto-main-base/src/test/java/com/facebook/presto/sql/planner/iterative/rule/TestMaterializedViewRewrite.java:47-48` </location>
<code_context>
+public class TestMaterializedViewRewrite
+ extends BaseRuleTest
+{
+ @Test
+ public void testUseFreshDataWhenFullyMaterialized()
+ {
+ QualifiedObjectName materializedViewName = QualifiedObjectName.valueOf("catalog.schema.mv");
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding tests for error conditions and invalid mappings in MaterializedViewScanNode.
Additional tests should cover scenarios with invalid output-to-source variable mappings and cases where exceptions are thrown by underlying plans to ensure robustness against edge cases.
</issue_to_address>Sourcery is free for open source - if you like our reviews please consider sharing them ✨
| for (Field field : dataTableDescriptor.getVisibleFields()) { | ||
| int fieldIndex = dataTableDescriptor.indexOf(field); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (bug_risk): Potential mismatch if visible fields differ from output variable order.
This approach relies on visible fields and output variables having matching order, which may not always be guaranteed. Please ensure consistency or clarify this assumption in the documentation.
| public List<VariableReferenceExpression> sourceOutputLayout(int sourceIndex) | ||
| { | ||
| if (sourceIndex == 0) { | ||
| // Data table plan - use dataTableMappings | ||
| return outputVariables.stream() | ||
| .map(dataTableMappings::get) | ||
| .collect(toList()); | ||
| } | ||
| else if (sourceIndex == 1) { | ||
| // View query plan - use viewQueryMappings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (bug_risk): sourceOutputLayout may return nulls if mappings are incomplete.
Currently, missing mappings result in nulls in the returned list, which could lead to errors later. Please validate that all output variables have corresponding mappings and handle missing cases appropriately.
| @Test | ||
| public void testUseFreshDataWhenFullyMaterialized() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (testing): Consider adding tests for error conditions and invalid mappings in MaterializedViewScanNode.
Additional tests should cover scenarios with invalid output-to-source variable mappings and cases where exceptions are thrown by underlying plans to ensure robustness against edge cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this feature. Overall looks good to me! Just one nit and a few points for discussion.
...rc/test/java/com/facebook/presto/sql/planner/iterative/rule/TestMaterializedViewRewrite.java
Outdated
Show resolved
Hide resolved
presto-main-base/src/main/java/com/facebook/presto/sql/planner/PlanOptimizers.java
Show resolved
Hide resolved
...se/src/main/java/com/facebook/presto/sql/planner/iterative/rule/MaterializedViewRewrite.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix. LGTM!
| materializedViewName.getCatalogName(), | ||
| materializedViewName.getSchemaName(), | ||
| materializedViewDefinition.getTable()); | ||
| Table dataTable = new Table(storageTableName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So dataTable and storageTable are the same. Shall we rename storageTableName to be dataTableName to make it clear?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, done. Also removed the unused storage table name.
6aaaafa to
354bbe9
Compare
354bbe9 to
f5bb038
Compare
|
|
||
| builder.add(new LogicalCteOptimizer(metadata)); | ||
|
|
||
| builder.add(new IterativeOptimizer( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if the plans in the MaterializedViewScanNode could contain CTEReference nodes ? If so, we should move this above L316.
If this is not possible, lets add an invariant for this in the MaterializedViewScanNode constructor
…en columns (#26545) ## Description This PR fixes a bug when using the planning-oriented materialized view framework (introduced in PR #26492) in Hive connector. Unlike the tables in the Memory connector, the underlying hive tables of the materialized view contain hidden columns. Therefore, for the comparison with the number of output variables in the view query, we should use the underlying data table's visible field count instead of its total field count. ## Motivation and Context - Fix a bug when using the planning-oriented materialized view framework in Hive connector ## Impact N/A ## Test Plans - New test case in `TestHiveMaterializedViewLogicalPlanner` which would fail without this change ## Contributor checklist - [ ] Please make sure your submission complies with our [contributing guide](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md), in particular [code style](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#code-style) and [commit standards](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#commit-standards). - [ ] PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced. - [ ] Documented new properties (with its default value), SQL syntax, functions, or other functionality. - [ ] If release notes are required, they follow the [release notes guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines). - [ ] Adequate tests were added if applicable. - [ ] CI passed. - [ ] If adding new dependencies, verified they have an [OpenSSF Scorecard](https://securityscorecards.dev/#the-checks) score of 5.0 or higher (or obtained explicit TSC approval for lower scores). ## Release Notes ``` == NO RELEASE NOTE == ```
Description
Implements the planning-oriented materialized view framework from RFC-0016. Adds MaterializedViewScanNode that holds both the materialized data plan and view query plan, letting the optimizer choose based on freshness during planning instead of at analysis time.
New code path is opt-in via
legacy_materialized_views=falsesession property. Legacy SQL stitching path is preserved as default.Motivation and Context
See RFC-0016.
Impact
No user-facing changes with default settings.
Test Plan
Tests are included in this PR, existing Memory connector end to end tests validate the behavior for unpartitioned tables.
Contributor checklist
Release Notes