Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CALCITE-6592] RelMdPredicates pull up wrong predicate from UNION whe… #3975

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

NobiGo
Copy link
Contributor

@NobiGo NobiGo commented Sep 24, 2024

…n it's input include NULL VALUE

@mihaibudiu mihaibudiu added the LGTM-will-merge-soon Overall PR looks OK. Only minor things left. label Sep 24, 2024
@NobiGo
Copy link
Contributor Author

NobiGo commented Sep 25, 2024

Forch-pushed to rerun CI.

Copy link
Member

@caicancai caicancai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -475,7 +475,7 @@ public RelOptPredicateList getPredicates(Union union, RelMetadataQuery mq) {
RexNode disjunctivePredicate =
new RexSimplify(rexBuilder, RelOptPredicateList.EMPTY, executor)
.simplifyUnknownAs(rexBuilder.makeCall(SqlStdOperatorTable.OR, finalResidualPredicates),
RexUnknownAs.FALSE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some methods in this class are considering RexUnknownAs.FALSE and some others RexUnknownAs.UNKNOWN when performing the simplification. Why do we need to simplify differently?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This question needs to be answered before merging.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zabetak I will check this.

final RelMetadataQuery mq = rel.getCluster().getMetadataQuery();
RelOptPredicateList inputSet = mq.getPulledUpPredicates(rel);
ImmutableList<RexNode> pulledUpPredicates = inputSet.pulledUpPredicates;
assertThat(pulledUpPredicates, sortsAs("[OR(null, =($0, 5))]"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The result does not look 100% accurate. Shouldn't this rather be OR(IS NULL($0), =($0, 5)). What happens if there were more columns involved?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zabetak Hi, testPullUpPredicatesFromUnionWithValues3 added for more columns. Maybe we should extract nothing when we can't get a specific value like OR(IS NULL($0), =($0, 5)). WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why we can't extract IS NULL($0) in this case. Can you please elaborate?

Copy link
Contributor

@zabetak zabetak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the impression that the problem is actually on the way that we extract predicates from Values and not so on what happens in Union. I think we should use the eqConstant method instead of plain equals.

@zabetak
Copy link
Contributor

zabetak commented Sep 25, 2024

If the problem is indeed in Values please add specific tests for this expression and update the JIRA/PR accordingly.

@NobiGo
Copy link
Contributor Author

NobiGo commented Sep 26, 2024

If the problem is indeed in Values please add specific tests for this expression and update the JIRA/PR accordingly.

@zabetak Hi, It is not a VALUES problem. But When I try to add a test for VALUERS. I find another problem in VALUES.
For example:

VALUES(1,2,3)

Predicates should be ($0=1,$1=2,$2=3), but now is $0=1. In my PR I have make it right and add the unit test.

Copy link

sonarcloud bot commented Sep 26, 2024

final RelMetadataQuery mq = values.getCluster().getMetadataQuery();
RelOptPredicateList inputSet = mq.getPulledUpPredicates(values);
ImmutableList<RexNode> pulledUpPredicates = inputSet.pulledUpPredicates;
assertThat(pulledUpPredicates, sortsAs("[=($0, null)]"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The result seems wrong.The predicate =($0, null) does not hold in the output rows of the values(null) query. If you add a filter on top of the values that says =($0, null) it will evaluate to false but here we know that $0 is always null so the correct predicate to infer is IS NULL($0).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More general, I think that the Predicates metadata should never use the = operator with the null literal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. We can use IS NULL($0) to replace =($0, null). I will take care of it.

Copy link
Contributor Author

@NobiGo NobiGo Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CALCITE-6599 has covered the [IS NULL($0)] replace [=$($0, null)]. IS NULL($0) can eliminate ambiguity caused by [=$($0, null)].

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider is_not_distinct_from

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zabetak @mihaibudiu I noticed we have a method to handle this:

  1. If field i is equal to a constant expression, then we use EQUALS
  2. If field i is nullable, then we use IS NOT DISTINCT FROM
  3. If filed i is the literal null, then we use IS NULL

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we have a method. See the link on my previous comment: #3975 (review)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I missed it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is not distinct from can be used in all cases

final RelMetadataQuery mq = rel.getCluster().getMetadataQuery();
RelOptPredicateList inputSet = mq.getPulledUpPredicates(rel);
ImmutableList<RexNode> pulledUpPredicates = inputSet.pulledUpPredicates;
assertThat(pulledUpPredicates, sortsAs("[OR(null, =($0, 5))]"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why we can't extract IS NULL($0) in this case. Can you please elaborate?

@@ -475,7 +475,7 @@ public RelOptPredicateList getPredicates(Union union, RelMetadataQuery mq) {
RexNode disjunctivePredicate =
new RexSimplify(rexBuilder, RelOptPredicateList.EMPTY, executor)
.simplifyUnknownAs(rexBuilder.makeCall(SqlStdOperatorTable.OR, finalResidualPredicates),
RexUnknownAs.FALSE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This question needs to be answered before merging.

@zabetak
Copy link
Contributor

zabetak commented Sep 26, 2024

I see that you are performing the changes that we discuss here in another pull request (#3981). This is a bit confusing and I am losing track on where I should comment on.

@NobiGo
Copy link
Contributor Author

NobiGo commented Sep 26, 2024

I see that you are performing the changes that we discuss here in another pull request (#3981). This is a bit confusing and I am losing track on where I should comment on.

These are two different questions. So I open another PR for VALUES. But we can talk about it together here.
These are some of my opinions:

For Values:
values(1, 2, 3, null), (1, 2, null, null), (5, 2, 3, null)
We pull up the predicate is [=($1, 2), IS NULL($3)].
Do we need to extract [OR(IS NULL($2), =($2, 3))]? I think we don't need it. Because when VALUES size is large, It will be complicated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
LGTM-will-merge-soon Overall PR looks OK. Only minor things left.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants