Skip to content

Conversation

@tdcmeehan
Copy link
Contributor

@tdcmeehan tdcmeehan commented Oct 31, 2025

Description

Implements the planning-oriented materialized view framework from RFC-0016. Adds MaterializedViewScanNode that holds both the materialized data plan and view query plan, letting the optimizer choose based on freshness during planning instead of at analysis time.

New code path is opt-in via legacy_materialized_views=false session property. Legacy SQL stitching path is preserved as default.

Motivation and Context

See RFC-0016.

Impact

No user-facing changes with default settings.

Test Plan

Tests are included in this PR, existing Memory connector end to end tests validate the behavior for unpartitioned tables.

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.
  • If adding new dependencies, verified they have an OpenSSF Scorecard score of 5.0 or higher (or obtained explicit TSC approval for lower scores).

Release Notes

== NO RELEASE NOTE ==

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Oct 31, 2025
@sourcery-ai
Copy link
Contributor

sourcery-ai bot commented Oct 31, 2025

Reviewer's Guide

This PR adds support for non-legacy materialized views by capturing view metadata in the analyzer, introducing a new MaterializedViewScanNode, extending the statement and relation planners to produce unified plans over both data table and view query, and adding a rewrite rule that chooses between fresh and materialized data at runtime, with SPI, optimizer, and test updates.

Sequence diagram for planning and rewriting a MaterializedViewScanNode

sequenceDiagram
    participant Analyzer
    participant StatementAnalyzer
    participant RelationPlanner
    participant Optimizer
    participant MaterializedViewRewrite

    Analyzer->>StatementAnalyzer: setMaterializedViewInfo(table, mvInfo)
    StatementAnalyzer->>RelationPlanner: process(table)
    RelationPlanner->>RelationPlanner: planMaterializedView(table, mvInfo, context)
    RelationPlanner->>RelationPlanner: create MaterializedViewScanNode
    Optimizer->>MaterializedViewRewrite: apply(MaterializedViewScanNode, ...)
    MaterializedViewRewrite->>MaterializedViewScanNode: getDataTablePlan()/getViewQueryPlan()
    MaterializedViewRewrite->>Optimizer: return rewritten ProjectNode
Loading

Class diagram for the new MaterializedViewScanNode and related changes

classDiagram
    class MaterializedViewScanNode {
        +PlanNode dataTablePlan
        +PlanNode viewQueryPlan
        +QualifiedObjectName materializedViewName
        +Map<VariableReferenceExpression, VariableReferenceExpression> dataTableMappings
        +Map<VariableReferenceExpression, VariableReferenceExpression> viewQueryMappings
        +List<VariableReferenceExpression> outputVariables
        +getDataTablePlan()
        +getViewQueryPlan()
        +getMaterializedViewName()
        +getDataTableMappings()
        +getViewQueryMappings()
        +getOutputVariables()
        +getSources()
        +sourceOutputLayout(int)
        +replaceChildren(List<PlanNode>)
        +accept(PlanVisitor, context)
        +assignStatsEquivalentPlanNode(Optional<PlanNode>)
    }
    MaterializedViewScanNode --|> PlanNode

    class Analysis {
        +Map<NodeRef<Table>, MaterializedViewInfo> materializedViewInfoMap
        +setMaterializedViewInfo(Table, MaterializedViewInfo)
        +getMaterializedViewInfo(Table): Optional<MaterializedViewInfo>
    }
    class Analysis.MaterializedViewInfo {
        +QualifiedObjectName materializedViewName
        +QualifiedName storageTableName
        +Table dataTable
        +Query viewQuery
        +MaterializedViewDefinition materializedViewDefinition
        +getMaterializedViewName()
        +getStorageTableName()
        +getDataTable()
        +getViewQuery()
        +getMaterializedViewDefinition()
    }
    Analysis "1" *-- "*" Analysis.MaterializedViewInfo

    class MaterializedViewRewrite {
        +apply(MaterializedViewScanNode, Captures, Context): Result
    }
    MaterializedViewRewrite ..> MaterializedViewScanNode

    class PlanVisitor {
        +visitMaterializedViewScan(MaterializedViewScanNode, context)
    }
    PlanVisitor ..> MaterializedViewScanNode
Loading

File-Level Changes

Change Details Files
Capture and propagate materialized view metadata in analysis
  • Added materializedViewInfoMap map with set/get methods
  • Introduced immutable MaterializedViewInfo class to hold view metadata
presto-analyzer/src/main/java/com/facebook/presto/sql/analyzer/Analysis.java
Extend StatementAnalyzer to plan non-legacy materialized views
  • Branched on legacy_materialized_views session property to choose new planning path
  • Parsed original view SQL and storage table into MaterializedViewInfo
  • Processed view query and data table to build scopes and unregister temporary named query
presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/StatementAnalyzer.java
Implement MaterializedViewScanNode planning in RelationPlanner
  • Checked for MaterializedViewInfo when visiting Table nodes
  • Built data and view query plans, mapped output variables, and created MaterializedViewScanNode
presto-main-base/src/main/java/com/facebook/presto/sql/planner/RelationPlanner.java
Define new MaterializedViewScanNode in SPI
  • Added new PlanNode class with JSON annotations and mapping logic
  • Implemented getSources, replaceChildren, visitor accept, and sourceOutputLayout
presto-spi/src/main/java/com/facebook/presto/spi/plan/MaterializedViewScanNode.java
Add rewrite rule to choose between data table or view query
  • Implemented MaterializedViewRewrite rule using MetadataResolver.getMaterializedViewStatus
  • Integrated rule into PlanOptimizers and enabled in ApplyConnectorOptimization
presto-main-base/src/main/java/com/facebook/presto/sql/planner/iterative/rule/MaterializedViewRewrite.java
presto-main-base/src/main/java/com/facebook/presto/sql/planner/PlanOptimizers.java
presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/Patterns.java
presto-spi/src/main/java/com/facebook/presto/spi/plan/PlanVisitor.java
presto-main-base/src/main/java/com/facebook/presto/sql/planner/optimizations/ApplyConnectorOptimization.java
Ensure validator and rule infrastructure supports the new scan node
  • Updated ValidateDependenciesChecker to validate MaterializedViewScanNode mappings
  • Extended test PlanBuilder to construct MaterializedViewScanNode
  • Added default overload in MetadataResolver SPI
presto-main-base/src/main/java/com/facebook/presto/sql/planner/sanity/ValidateDependenciesChecker.java
presto-main-base/src/test/java/com/facebook/presto/sql/planner/iterative/rule/test/PlanBuilder.java
presto-spi/src/main/java/com/facebook/presto/spi/analyzer/MetadataResolver.java
Add end-to-end tests for materialized view planning and rewrite
  • Created TestMaterializedViewRewrite for full/partial materialization scenarios
  • Added TestMemoryMaterializedViewPlanner for connector-level planning verification
presto-main-base/src/test/java/com/facebook/presto/sql/planner/iterative/rule/TestMaterializedViewRewrite.java
presto-memory/src/test/java/com/facebook/presto/plugin/memory/TestMemoryMaterializedViewPlanner.java

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@tdcmeehan
Copy link
Contributor Author

@sourcery-ai review

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes and they look great!

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/RelationPlanner.java:187-188` </location>
<code_context>
     @Override
     protected RelationPlan visitTable(Table node, SqlPlannerContext context)
     {
+        Optional<Analysis.MaterializedViewInfo> materializedViewInfo = analysis.getMaterializedViewInfo(node);
+        if (materializedViewInfo.isPresent()) {
+            return planMaterializedView(node, materializedViewInfo.get(), context);
+        }
</code_context>

<issue_to_address>
**issue:** Consider handling the case where the viewQueryPlan and dataTablePlan have mismatched field counts.

Adding a check or assertion for matching field counts and orderings between dataTablePlan and viewQueryPlan would help prevent runtime errors and ensure query correctness.
</issue_to_address>

### Comment 2
<location> `presto-memory/src/test/java/com/facebook/presto/plugin/memory/TestMemoryMaterializedViewPlanner.java:70` </location>
<code_context>
+    }
+
+    @Test
+    public void testMaterializedViewRefreshed()
+    {
+        assertUpdate("CREATE TABLE base_table (id BIGINT, value BIGINT)");
</code_context>

<issue_to_address>
**suggestion (testing):** Suggestion to add error condition tests for querying dropped materialized views.

Please add a test that verifies querying a dropped materialized view results in the expected error and no execution plan is generated.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@tdcmeehan
Copy link
Contributor Author

@sourcery-ai review

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • There’s a lot of duplicated logic in StatementAnalyzer.processMaterializedView between the legacy and new materialized‐view paths – consider extracting the common setup and teardown steps into helper methods to reduce maintenance overhead.
  • Using NodeRef as the key for materializedViewInfoMap could lead to lookup misses if Table nodes aren’t guaranteed to be canonical; consider keying by QualifiedObjectName or otherwise ensuring stable identity.
  • The default getMaterializedViewStatus overload discards the baseQueryDomain, which might hide important filtering information from connectors – either make the overload abstract or document its limitations more explicitly.
  • Prompt for AI Agents
    Please address the comments from this code review:
    
    ## Overall Comments
    - There’s a lot of duplicated logic in StatementAnalyzer.processMaterializedView between the legacy and new materialized‐view paths – consider extracting the common setup and teardown steps into helper methods to reduce maintenance overhead.
    - Using NodeRef<Table> as the key for materializedViewInfoMap could lead to lookup misses if Table nodes aren’t guaranteed to be canonical; consider keying by QualifiedObjectName or otherwise ensuring stable identity.
    - The default getMaterializedViewStatus overload discards the baseQueryDomain, which might hide important filtering information from connectors – either make the overload abstract or document its limitations more explicitly.
    
    ## Individual Comments
    
    ### Comment 1
    <location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/RelationPlanner.java:338-339` </location>
    <code_context>
    +        ImmutableMap.Builder<VariableReferenceExpression, VariableReferenceExpression> dataTableMappingsBuilder = ImmutableMap.builder();
    +        ImmutableMap.Builder<VariableReferenceExpression, VariableReferenceExpression> viewQueryMappingsBuilder = ImmutableMap.builder();
    +
    +        for (Field field : dataTableDescriptor.getVisibleFields()) {
    +            int fieldIndex = dataTableDescriptor.indexOf(field);
    +            VariableReferenceExpression dataTableVar = dataTableVariables.get(fieldIndex);
    +            VariableReferenceExpression viewQueryVar = viewQueryVariables.get(fieldIndex);
    </code_context>
    
    <issue_to_address>
    **issue (bug_risk):** Potential mismatch if visible fields differ from output variable order.
    
    This approach relies on visible fields and output variables having matching order, which may not always be guaranteed. Please ensure consistency or clarify this assumption in the documentation.
    </issue_to_address>
    
    ### Comment 2
    <location> `presto-spi/src/main/java/com/facebook/presto/spi/plan/MaterializedViewScanNode.java:123-132` </location>
    <code_context>
    +        return unmodifiableList(sources);
    +    }
    +
    +    public List<VariableReferenceExpression> sourceOutputLayout(int sourceIndex)
    +    {
    +        if (sourceIndex == 0) {
    +            // Data table plan - use dataTableMappings
    +            return outputVariables.stream()
    +                    .map(dataTableMappings::get)
    +                    .collect(toList());
    +        }
    +        else if (sourceIndex == 1) {
    +            // View query plan - use viewQueryMappings
    +            return outputVariables.stream()
    +                    .map(viewQueryMappings::get)
    +                    .collect(toList());
    +        }
    +        else {
    +            throw new IllegalArgumentException("MaterializedViewScanNode only has 2 sources (index 0 and 1), but got index " + sourceIndex);
    +        }
    </code_context>
    
    <issue_to_address>
    **issue (bug_risk):** sourceOutputLayout may return nulls if mappings are incomplete.
    
    Currently, missing mappings result in nulls in the returned list, which could lead to errors later. Please validate that all output variables have corresponding mappings and handle missing cases appropriately.
    </issue_to_address>
    
    ### Comment 3
    <location> `presto-main-base/src/test/java/com/facebook/presto/sql/planner/iterative/rule/TestMaterializedViewRewrite.java:47-48` </location>
    <code_context>
    +public class TestMaterializedViewRewrite
    +        extends BaseRuleTest
    +{
    +    @Test
    +    public void testUseFreshDataWhenFullyMaterialized()
    +    {
    +        QualifiedObjectName materializedViewName = QualifiedObjectName.valueOf("catalog.schema.mv");
    </code_context>
    
    <issue_to_address>
    **suggestion (testing):** Consider adding tests for error conditions and invalid mappings in MaterializedViewScanNode.
    
    Additional tests should cover scenarios with invalid output-to-source variable mappings and cases where exceptions are thrown by underlying plans to ensure robustness against edge cases.
    </issue_to_address>

    Sourcery is free for open source - if you like our reviews please consider sharing them ✨
    Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +338 to +339
for (Field field : dataTableDescriptor.getVisibleFields()) {
int fieldIndex = dataTableDescriptor.indexOf(field);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): Potential mismatch if visible fields differ from output variable order.

This approach relies on visible fields and output variables having matching order, which may not always be guaranteed. Please ensure consistency or clarify this assumption in the documentation.

Comment on lines +123 to +132
public List<VariableReferenceExpression> sourceOutputLayout(int sourceIndex)
{
if (sourceIndex == 0) {
// Data table plan - use dataTableMappings
return outputVariables.stream()
.map(dataTableMappings::get)
.collect(toList());
}
else if (sourceIndex == 1) {
// View query plan - use viewQueryMappings
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): sourceOutputLayout may return nulls if mappings are incomplete.

Currently, missing mappings result in nulls in the returned list, which could lead to errors later. Please validate that all output variables have corresponding mappings and handle missing cases appropriately.

Comment on lines +47 to +49
@Test
public void testUseFreshDataWhenFullyMaterialized()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (testing): Consider adding tests for error conditions and invalid mappings in MaterializedViewScanNode.

Additional tests should cover scenarios with invalid output-to-source variable mappings and cases where exceptions are thrown by underlying plans to ensure robustness against edge cases.

@tdcmeehan tdcmeehan changed the title Add MaterializedViewScanNode feat(planner): Add MaterializedViewScanNode Oct 31, 2025
@tdcmeehan tdcmeehan requested a review from zation99 October 31, 2025 19:54
@tdcmeehan tdcmeehan marked this pull request as ready for review October 31, 2025 19:54
@prestodb-ci prestodb-ci requested review from a team, NivinCS and nishithakbhaskaran and removed request for a team October 31, 2025 19:55
Copy link
Member

@hantangwangd hantangwangd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this feature. Overall looks good to me! Just one nit and a few points for discussion.

@tdcmeehan tdcmeehan requested a review from aaneja November 3, 2025 16:06
hantangwangd
hantangwangd previously approved these changes Nov 3, 2025
Copy link
Member

@hantangwangd hantangwangd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix. LGTM!

materializedViewName.getCatalogName(),
materializedViewName.getSchemaName(),
materializedViewDefinition.getTable());
Table dataTable = new Table(storageTableName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So dataTable and storageTable are the same. Shall we rename storageTableName to be dataTableName to make it clear?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, done. Also removed the unused storage table name.

@tdcmeehan tdcmeehan force-pushed the matview-pr-3-thru-6 branch from 354bbe9 to f5bb038 Compare November 5, 2025 02:21
@tdcmeehan tdcmeehan merged commit b6c6475 into prestodb:master Nov 5, 2025
81 of 82 checks passed

builder.add(new LogicalCteOptimizer(metadata));

builder.add(new IterativeOptimizer(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if the plans in the MaterializedViewScanNode could contain CTEReference nodes ? If so, we should move this above L316.
If this is not possible, lets add an invariant for this in the MaterializedViewScanNode constructor

tdcmeehan pushed a commit that referenced this pull request Nov 6, 2025
…en columns (#26545)

## Description

This PR fixes a bug when using the planning-oriented materialized view
framework (introduced in PR #26492) in Hive connector. Unlike the tables
in the Memory connector, the underlying hive tables of the materialized
view contain hidden columns. Therefore, for the comparison with the
number of output variables in the view query, we should use the
underlying data table's visible field count instead of its total field
count.

## Motivation and Context

- Fix a bug when using the planning-oriented materialized view framework
in Hive connector

## Impact

N/A

## Test Plans

- New test case in `TestHiveMaterializedViewLogicalPlanner` which would
fail without this change

## Contributor checklist

- [ ] Please make sure your submission complies with our [contributing
guide](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md),
in particular [code
style](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#code-style)
and [commit
standards](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#commit-standards).
- [ ] PR description addresses the issue accurately and concisely. If
the change is non-trivial, a GitHub Issue is referenced.
- [ ] Documented new properties (with its default value), SQL syntax,
functions, or other functionality.
- [ ] If release notes are required, they follow the [release notes
guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines).
- [ ] Adequate tests were added if applicable.
- [ ] CI passed.
- [ ] If adding new dependencies, verified they have an [OpenSSF
Scorecard](https://securityscorecards.dev/#the-checks) score of 5.0 or
higher (or obtained explicit TSC approval for lower scores).

## Release Notes

```
== NO RELEASE NOTE ==
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

from:IBM PR from IBM

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants