Refactor the filter rewrite optimization #14464
Split the single Helper class and move the classes into a new package for any optimization we introduce for the search path. Rename the classes to make them more straightforward and general. Signed-off-by: bowenlan-amzn <[email protected]>
Refactor the canOptimize logic: sort out the basic rule for how the aggregator provides data, and where to put common logic. Signed-off-by: bowenlan-amzn <[email protected]>
Refactor the data provider and the try optimize logic. Signed-off-by: bowenlan-amzn <[email protected]>
❌ Gradle check result for 1a067ba: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
extract segment match all logic Signed-off-by: bowenlan-amzn <[email protected]>
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #14464 +/- ##
============================================
+ Coverage 71.42% 71.64% +0.22%
- Complexity 59978 62107 +2129
============================================
Files 4985 5122 +137
Lines 282275 291966 +9691
Branches 40946 42200 +1254
============================================
+ Hits 201603 209169 +7566
- Misses 63999 65583 +1584
- Partials 16673 17214 +541
☔ View full report in Codecov by Sentry.
inline class Signed-off-by: bowenlan-amzn <[email protected]>
❌ Gradle check result for 8f10faf: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
@github-actions commented on Jun 20, 2024, 10:03 PM PDT:
The failure reproduces consistently with the seed, but if I remove the seed, it succeeds. @rishabhmaurya already has a PR to fix this: #14445
protected boolean canOptimize(ValuesSourceConfig config, RangeAggregator.Range[] ranges) {
    if (config.fieldType() == null) return false;
    MappedFieldType fieldType = config.fieldType();
Is this local variable needed? Can this.fieldType be set instead?
if (parent != null || subAggLength != 0) return false;

boolean rewriteable = aggregatorBridge.canOptimize();
Is the local var rewriteable needed? Can this.rewriteable be used instead?
    return false;
}

Ranges ranges = prepareFromSegment(leafCtx);
Ranges ranges to this.ranges?
protected boolean canOptimize(CompositeValuesSourceConfig[] sourceConfigs) {
    if (sourceConfigs.length != 1 || !(sourceConfigs[0].valuesSource() instanceof RoundingValuesSource)) return false;
    return canOptimize(sourceConfigs[0].missingBucket(), sourceConfigs[0].hasScript(), sourceConfigs[0].fieldType());
}
It doesn't make sense to have different canOptimize methods in the same class. We should further subclass DateHistogramAggregatorBridge to ensure only the needed functionality is accessible by the specific Aggregator class.
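One possible shape for this suggestion is sketched below. It is only an illustration under stated assumptions, not the PR's code: every class name here (`BridgeSubclassSketch`, `PlainDateHistogramBridge`, `CompositeDateHistogramBridge`, `fieldIsOptimizable`) is hypothetical, the boolean `hasFieldType` and `roundingSource` flags stand in for the real `MappedFieldType` and `RoundingValuesSource` checks, and the shared rule is a placeholder. The point is that each aggregator gets its own subclass, configured through a constructor, and exposes a single no-argument `canOptimize()`.

```java
public class BridgeSubclassSketch {

    /** Hypothetical stand-in for the PR's AggregatorBridge. */
    abstract static class AggregatorBridge {
        abstract boolean canOptimize();
    }

    /** Shared date histogram logic, visible only to subclasses. */
    abstract static class DateHistogramBridge extends AggregatorBridge {
        // Placeholder rule (an assumption): no missing bucket, no script, a mapped field.
        protected final boolean fieldIsOptimizable(boolean missing, boolean hasScript, boolean hasFieldType) {
            return !missing && !hasScript && hasFieldType;
        }
    }

    /** Bridge for a plain date histogram aggregator. */
    static final class PlainDateHistogramBridge extends DateHistogramBridge {
        private final boolean missing;
        private final boolean hasScript;
        private final boolean hasFieldType;

        PlainDateHistogramBridge(boolean missing, boolean hasScript, boolean hasFieldType) {
            this.missing = missing;
            this.hasScript = hasScript;
            this.hasFieldType = hasFieldType;
        }

        @Override
        boolean canOptimize() {
            return fieldIsOptimizable(missing, hasScript, hasFieldType);
        }
    }

    /** Bridge for a composite aggregator: also requires exactly one rounding source. */
    static final class CompositeDateHistogramBridge extends DateHistogramBridge {
        private final int sourceCount;
        private final boolean roundingSource;
        private final boolean missing;
        private final boolean hasScript;
        private final boolean hasFieldType;

        CompositeDateHistogramBridge(int sourceCount, boolean roundingSource,
                                     boolean missing, boolean hasScript, boolean hasFieldType) {
            this.sourceCount = sourceCount;
            this.roundingSource = roundingSource;
            this.missing = missing;
            this.hasScript = hasScript;
            this.hasFieldType = hasFieldType;
        }

        @Override
        boolean canOptimize() {
            if (sourceCount != 1 || !roundingSource) return false;
            return fieldIsOptimizable(missing, hasScript, hasFieldType);
        }
    }

    public static void main(String[] args) {
        AggregatorBridge plain = new PlainDateHistogramBridge(false, false, true);
        AggregatorBridge composite = new CompositeDateHistogramBridge(2, true, false, false, true);
        System.out.println(plain.canOptimize());     // true
        System.out.println(composite.canOptimize()); // false: more than one source
    }
}
```

With this layout the composite-only check never leaks into the plain date histogram path, and callers never see the overloads.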
*/
public abstract class DateHistogramAggregatorBridge extends AggregatorBridge {

    protected boolean canOptimize(boolean missing, boolean hasScript, MappedFieldType fieldType) {
This method can be private
}

@Override
final void tryFastFilterAggregation(PointValues values, BiConsumer<Long, Long> incrementDocCount, OptimizationContext.Ranges ranges)
nit: tryFastFilterAggregation seems like a misnomer now
@Override
protected boolean canOptimize() {
    return canOptimize(valuesSourceConfig);
}
You can define specific constructors in the bridge class to avoid inline implementations.
private static class RangeCollectorForPointTree {
    private final BiConsumer<Integer, Integer> incrementRangeDocCount;
    private int counter = 0;

    private final Ranges ranges;
    private int activeIndex;
Can the multiRangesTraverse logic be part of a separate class?
Description
As more code comes into the filter rewrite optimization, it is becoming hard to understand.
This not only makes code review slow and painful, it will also slow down new contributors to this area. Hence this refactoring work.
Idea
The refactoring shouldn't change any business logic.
This refactoring keeps the same philosophy for structuring the code as before, and makes it clearer.
Refactoring
Why the name "filter rewrite optimization"?
Filter in the OpenSearch world has a similar meaning to query, but it indicates that no relevance score is calculated.
Rewrite in the OpenSearch world can mean transforming an OpenSearch query into a Lucene query, or transforming a query so that it performs better.
Generally speaking, the optimization rewrites the aggregation into certain filters to improve performance. Aggregation execution is a plain iteration and collection over all matches, while filters can take advantage of the Lucene index to get the expected results in logarithmic or even constant time.
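To make the difference concrete, here is a minimal sketch that counts the values falling into one range bucket in two ways: by visiting every value, and by two binary searches over sorted values. It is an analogy only, under the assumption that a sorted array can stand in for the index; it does not use Lucene's point tree or any OpenSearch API, and the names `RangeCountSketch`, `countByIteration`, and `countBySortedIndex` are made up for illustration.

```java
public class RangeCountSketch {

    /** Aggregation-style counting: visit every value, O(n). */
    static int countByIteration(long[] values, long from, long to) {
        int count = 0;
        for (long v : values) {
            if (v >= from && v < to) count++;
        }
        return count;
    }

    /** Index of the first element >= key in a sorted array (sorted.length if none). */
    static int lowerBound(long[] sorted, long key) {
        int lo = 0;
        int hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] < key) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Filter-style counting on sorted values: two binary searches, O(log n). Assumes from <= to. */
    static int countBySortedIndex(long[] sorted, long from, long to) {
        return lowerBound(sorted, to) - lowerBound(sorted, from);
    }

    public static void main(String[] args) {
        long[] timestamps = {5, 12, 12, 20, 31, 47}; // already sorted
        System.out.println(countByIteration(timestamps, 10, 40));   // 4
        System.out.println(countBySortedIndex(timestamps, 10, 40)); // 4
    }
}
```

Both methods return the doc count for the half-open bucket [from, to), but the second never touches the values inside the bucket, which is the kind of saving the rewrite aims for.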
Benchmark
TBD
Related Issues
Resolves #14435
Check List
[ ] Functionality includes testing.
[ ] API changes companion pull request created, if applicable.
[ ] Public documentation issue/PR created, if applicable.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.