-
Notifications
You must be signed in to change notification settings - Fork 5.5k
feat: Add RemoveInsensitiveAggregateDistinct rule #26493
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
feat: Add RemoveInsensitiveAggregateDistinct rule #26493
Conversation
Reviewer's GuideThis PR introduces a new distinct‐sensitivity flag for aggregation functions (both in Java SPI and C++ protocol), propagates the flag through all built-in and SQL-invoked implementations and function namespaces, and adds a new optimizer rule to strip DISTINCT from aggregates that are insensitive to duplicates or order (including registration and tests). Class diagram for AggregationFunctionMetadata and related changesclassDiagram
class AggregationFunctionMetadata {
-TypeSignature intermediateType
-boolean isOrderSensitive
-boolean isDistinctSensitive
+AggregationFunctionMetadata(intermediateType, isOrderSensitive, isDistinctSensitive)
+TypeSignature getIntermediateType()
+boolean isOrderSensitive()
+boolean isDistinctSensitive()
+String toString()
}
class AggregationFunctionImplementation {
<<interface>>
+boolean isDecomposable()
+boolean isOrderSensitive()
+boolean isDistinctSensitive()
}
class SqlInvokedAggregationFunctionImplementation {
-Type intermediateType
-Type finalType
-boolean isOrderSensitive
-boolean isDistinctSensitive
-List<Type> parameterTypes
+SqlInvokedAggregationFunctionImplementation(intermediateType, finalType, isOrderSensitive, isDistinctSensitive, parameterTypes)
+boolean isOrderSensitive()
+boolean isDistinctSensitive()
+List<Type> getParameterTypes()
}
AggregationFunctionImplementation <|.. SqlInvokedAggregationFunctionImplementation
Class diagram for BuiltInAggregationFunctionImplementation and AggregationHeader changesclassDiagram
class BuiltInAggregationFunctionImplementation {
-List<Class> lambdaInterfaces
-boolean decomposable
-boolean orderSensitive
-boolean distinctSensitive
-AggregationMetadata aggregationMetadata
+BuiltInAggregationFunctionImplementation(..., decomposable, orderSensitive, distinctSensitive, ...)
+boolean isOrderSensitive()
+boolean isDistinctSensitive()
+AggregationMetadata getAggregationMetadata()
}
class AggregationHeader {
-String name
-Optional<String> description
-boolean decomposable
-boolean orderSensitive
-boolean distinctSensitive
-SqlFunctionVisibility visibility
-boolean isCalledOnNullInput
+AggregationHeader(name, description, decomposable, orderSensitive, distinctSensitive, visibility, isCalledOnNullInput)
+boolean isOrderSensitive()
+boolean isDistinctSensitive()
}
Class diagram for HiveAggregationFunctionDescription and HiveAggregationFunctionImplementation changesclassDiagram
class HiveAggregationFunctionDescription {
-QualifiedObjectName name
-List<Type> parameterTypes
-List<Type> intermediateTypes
-Type finalType
-boolean decomposable
-boolean orderSensitive
-boolean distinctSensitive
+HiveAggregationFunctionDescription(..., decomposable, orderSensitive, distinctSensitive)
+boolean isOrderSensitive()
+boolean isDistinctSensitive()
}
class HiveAggregationFunctionImplementation {
+boolean isOrderSensitive()
+boolean isDistinctSensitive()
}
HiveAggregationFunctionImplementation --> HiveAggregationFunctionDescription : uses
Class diagram for new optimizer rule RemoveInsensitiveAggregateDistinctclassDiagram
class RemoveInsensitiveAggregateDistinct {
-Pattern<AggregationNode> pattern
-FunctionAndTypeManager functionAndTypeManager
+RemoveInsensitiveAggregateDistinct(functionAndTypeManager)
+Pattern<AggregationNode> getPattern()
+Result apply(AggregationNode node, Captures captures, Context context)
-boolean canRemoveDistinct(AggregationNode aggregationNode)
-boolean canRemoveDistinct(Aggregation aggregation)
}
Class diagram for AggregationFunction annotation changesclassDiagram
class AggregationFunction {
<<annotation>>
+boolean isOrderSensitive() default false
+boolean isDistinctSensitive() default true
+SqlFunctionVisibility visibility() default PUBLIC
+String[] alias() default ""
}
File-Level Changes
Assessment against linked issues
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
| public class RemoveInsensitiveAggregateDistinct | ||
| implements Rule<AggregationNode> | ||
| { | ||
| private static final Set<QualifiedObjectName> DISTINCT_INSENSITIVE_AGGREGATION_NAMES = ImmutableSet.<QualifiedObjectName>builder() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to think about how we can put this into the function metadata itself, rather than hardcoding this list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tdcmeehan Thanks for the suggestion! I've updated the code to remove the hardcoding and use a flag in AggregationFunctionMetadata for this. Could you take another look when you have a chance? Thanks again!
4500631 to
44e138b
Compare
44e138b to
d373b7f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/iterative/rule/RemoveInsensitiveAggregateDistinct.java:51-52` </location>
<code_context>
+ @Override
+ public Result apply(AggregationNode node, Captures captures, Context context)
+ {
+ ImmutableMap.Builder<VariableReferenceExpression, Aggregation> aggregations = ImmutableMap.builder();
+ for (Map.Entry<VariableReferenceExpression, Aggregation> entry : node.getAggregations().entrySet()) {
+ Aggregation aggregation = entry.getValue();
+ if (canRemoveDistinct(aggregation)) {
</code_context>
<issue_to_address>
**suggestion (performance):** Aggregations are always rebuilt, even if no changes are made.
Consider returning the original node when no distinct flags are removed to prevent unnecessary reconstruction and reduce plan node churn.
Suggested implementation:
```java
public Result apply(AggregationNode node, Captures captures, Context context)
{
ImmutableMap.Builder<VariableReferenceExpression, Aggregation> aggregations = ImmutableMap.builder();
boolean anyDistinctRemoved = false;
for (Map.Entry<VariableReferenceExpression, Aggregation> entry : node.getAggregations().entrySet()) {
Aggregation aggregation = entry.getValue();
if (canRemoveDistinct(aggregation)) {
aggregations.put(entry.getKey(),
new Aggregation(
aggregation.getCall(),
aggregation.getFilter(),
aggregation.getOrderBy(),
false,
aggregation.getMask()));
anyDistinctRemoved = true;
}
else {
aggregations.put(entry);
```
```java
}
if (!anyDistinctRemoved) {
return Result.empty();
}
return Result.of(
new AggregationNode(
node.getId(),
node.getSource(),
aggregations.build(),
node.getGroupingSets(),
node.getPreGroupedVariables(),
node.getStep(),
node.getHashVariable(),
node.getGroupIdVariable()));
}
```
</issue_to_address>
### Comment 2
<location> `presto-main-base/src/main/java/com/facebook/presto/operator/aggregation/BitwiseOrAggregation.java:27` </location>
<code_context>
import com.facebook.presto.spi.function.OutputFunction;
import com.facebook.presto.spi.function.SqlType;
-@AggregationFunction("bitwise_or_agg")
+@AggregationFunction(value = "bitwise_or_agg", isDistinctSensitive = false)
public class BitwiseOrAggregation
{
</code_context>
<issue_to_address>
**question (bug_risk):** Explicitly setting isDistinctSensitive to false may affect semantic expectations.
Confirm that ignoring duplicates in bitwise_or_agg aligns with all use cases, as this change may cause correctness issues if not intended.
</issue_to_address>
### Comment 3
<location> `presto-main-base/src/main/java/com/facebook/presto/operator/aggregation/BooleanAndAggregation.java:31` </location>
<code_context>
import static com.facebook.presto.operator.aggregation.state.TriStateBooleanState.NULL_VALUE;
import static com.facebook.presto.operator.aggregation.state.TriStateBooleanState.TRUE_VALUE;
-@AggregationFunction(value = "bool_and", alias = "every")
+@AggregationFunction(value = "bool_and", alias = "every", isDistinctSensitive = false)
public final class BooleanAndAggregation
{
</code_context>
<issue_to_address>
**question:** Setting isDistinctSensitive to false for bool_and may have semantic implications.
Please verify that handling of duplicates and nulls in bool_and remains correct with isDistinctSensitive set to false.
</issue_to_address>
### Comment 4
<location> `presto-main-base/src/test/java/com/facebook/presto/sql/planner/iterative/rule/TestRemoveInsensitiveAggregateDistinct.java:143` </location>
<code_context>
+ @Test
+ public void testMixedDistinct()
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding a test for aggregations with filters or order by clauses.
Please include test cases with filters and order by clauses in aggregations to ensure the rule handles these scenarios correctly.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| ImmutableMap.Builder<VariableReferenceExpression, Aggregation> aggregations = ImmutableMap.builder(); | ||
| for (Map.Entry<VariableReferenceExpression, Aggregation> entry : node.getAggregations().entrySet()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (performance): Aggregations are always rebuilt, even if no changes are made.
Consider returning the original node when no distinct flags are removed to prevent unnecessary reconstruction and reduce plan node churn.
Suggested implementation:
public Result apply(AggregationNode node, Captures captures, Context context)
{
ImmutableMap.Builder<VariableReferenceExpression, Aggregation> aggregations = ImmutableMap.builder();
boolean anyDistinctRemoved = false;
for (Map.Entry<VariableReferenceExpression, Aggregation> entry : node.getAggregations().entrySet()) {
Aggregation aggregation = entry.getValue();
if (canRemoveDistinct(aggregation)) {
aggregations.put(entry.getKey(),
new Aggregation(
aggregation.getCall(),
aggregation.getFilter(),
aggregation.getOrderBy(),
false,
aggregation.getMask()));
anyDistinctRemoved = true;
}
else {
aggregations.put(entry); }
if (!anyDistinctRemoved) {
return Result.empty();
}
return Result.of(
new AggregationNode(
node.getId(),
node.getSource(),
aggregations.build(),
node.getGroupingSets(),
node.getPreGroupedVariables(),
node.getStep(),
node.getHashVariable(),
node.getGroupIdVariable()));
}| import com.facebook.presto.spi.function.OutputFunction; | ||
| import com.facebook.presto.spi.function.SqlType; | ||
|
|
||
| @AggregationFunction("bitwise_or_agg") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question (bug_risk): Explicitly setting isDistinctSensitive to false may affect semantic expectations.
Confirm that ignoring duplicates in bitwise_or_agg aligns with all use cases, as this change may cause correctness issues if not intended.
| import static com.facebook.presto.operator.aggregation.state.TriStateBooleanState.NULL_VALUE; | ||
| import static com.facebook.presto.operator.aggregation.state.TriStateBooleanState.TRUE_VALUE; | ||
|
|
||
| @AggregationFunction(value = "bool_and", alias = "every") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Setting isDistinctSensitive to false for bool_and may have semantic implications.
Please verify that handling of duplicates and nulls in bool_and remains correct with isDistinctSensitive set to false.
Description
Added isDistinctSensitive() flag in AggregationFunction to indicate whether the function is sensitive to duplicate inputs, and an optimizer rule to remove distinct from aggregates which are insensitive to duplicates and orders.
Fixes #26075.
Motivation and Context
Impact
Test Plan
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.