Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor the filter rewrite optimization #14464

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

bowenlan-amzn
Copy link
Member

@bowenlan-amzn bowenlan-amzn commented Jun 19, 2024

Description

As more code coming into the filter rewrite optimization, it starts to be hard to understand.
Not only making the code review slower and painful, it also will slow down the new contributors into this area. So here comes the refactoring work.

Idea

The refactoring shouldn't change any business logic.
This refactoring keeps the same philosophy to structure the code as before, and make it more clear.

  • Add only declarative code to the Aggregator, while keep any optimization related business logic in another place.
    • Declarative code here is mainly just a Context object, which can invoke the optimization workflow, combined with a Bridge object to provide Aggregator's fields access to optimization.
  • Utilize an inner Bridge class to access any needed fields of the Aggregator for the optimization logic.
    • Any field needed can be passed in a function of AggregatorBridge. Compare to saving the field in a class, the method way is more readable because it tells you where this field is needed and why is it needed.
    • Other than providing access, AggregatorBridge can also host the aggregator specific optimization related logic.
  • This time I didn't bother to onboard the term aggregation optimization which uses a similar method, but this refactoring is a solid step towards combining them.

Refactoring

  • Split the Helper calss into independent components
  • Tighten up any member access modifier of the components
  • Clean the unnecessary references from the components. For example, SearchContext, instead of passing into the OptimizationContext, try to utilize the functions in AggregatorBridge to provide it whenever needed.

Why the name — filter rewrite optimization?

Filter in OpenSearch world has similar meaning as query, while it indicates no relavance scoring calculated.
Rewrite in OpenSearch world can mean transform OpenSearch query into lucene query, or transform a query to perform better.

Generally speaking, the optimization rewrites the aggregation into certain filters to improve performance. Aggregation execution is plain and simple iteration and collection on all matches, while filters can take advantage of the Lucene index to get expected results in log or even constant time.

Benchmark

TBD

Related Issues

Resolves #14435

Check List

  • [ ] Functionality includes testing.
  • [ ] API changes companion pull request created, if applicable.
  • [ ] Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Split the single Helper classes and move the classes into a new package for any optimization we introduced for search path.
Rename the class name to make it more straightforward and general

Signed-off-by: bowenlan-amzn <[email protected]>
refactor the canOptimize logic
sort out the basic rule about how to provide data from aggregator, and where to put common logic

Signed-off-by: bowenlan-amzn <[email protected]>
refactor the data provider and try optimize logic

Signed-off-by: bowenlan-amzn <[email protected]>
@bowenlan-amzn bowenlan-amzn changed the title Refactor Refactor the filter rewrite optimization Jun 19, 2024
@github-actions github-actions bot added Search:Aggregations v2.16.0 Issues and PRs related to version 2.16.0 labels Jun 19, 2024
Copy link
Contributor

❌ Gradle check result for 1a067ba: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: bowenlan-amzn <[email protected]>
extract segment match all logic

Signed-off-by: bowenlan-amzn <[email protected]>
Copy link
Contributor

✅ Gradle check result for 7c491b9: SUCCESS

Copy link

codecov bot commented Jun 20, 2024

Codecov Report

Attention: Patch coverage is 85.64232% with 57 lines in your changes missing coverage. Please review.

Project coverage is 71.64%. Comparing base (b15cb0c) to head (7c491b9).
Report is 468 commits behind head on main.

Current head 7c491b9 differs from pull request most recent head 6c097f3

Please upload reports for the commit 6c097f3 to get more accurate results.

Files Patch % Lines
...earch/optimization/ranges/OptimizationContext.java 85.71% 13 Missing and 10 partials ⚠️
.../opensearch/search/optimization/ranges/Helper.java 85.00% 4 Missing and 8 partials ⚠️
...nges/AbstractDateHistogramAggAggregatorBridge.java 85.00% 1 Missing and 8 partials ⚠️
...mization/ranges/AbstractRangeAggregatorBridge.java 83.33% 1 Missing and 5 partials ⚠️
...egations/bucket/composite/CompositeAggregator.java 85.71% 2 Missing and 1 partial ⚠️
.../bucket/histogram/AutoDateHistogramAggregator.java 90.90% 1 Missing ⚠️
...ions/bucket/histogram/DateHistogramAggregator.java 91.66% 1 Missing ⚠️
...rch/aggregations/bucket/range/RangeAggregator.java 91.66% 1 Missing ⚠️
...h/search/optimization/ranges/AggregatorBridge.java 75.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #14464      +/-   ##
============================================
+ Coverage     71.42%   71.64%   +0.22%     
- Complexity    59978    62107    +2129     
============================================
  Files          4985     5122     +137     
  Lines        282275   291966    +9691     
  Branches      40946    42200    +1254     
============================================
+ Hits         201603   209169    +7566     
- Misses        63999    65583    +1584     
- Partials      16673    17214     +541     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Signed-off-by: bowenlan-amzn <[email protected]>
inline class

Signed-off-by: bowenlan-amzn <[email protected]>
Copy link
Contributor

❌ Gradle check result for 8f10faf: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: bowenlan-amzn <[email protected]>
@bowenlan-amzn
Copy link
Member Author

bowenlan-amzn commented Jun 21, 2024

@github-actions commented on Jun 20, 2024, 10:03 PM PDT:

❌ Gradle check result for 8f10faf: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Originally posted by @github-actions[bot] in #14464 (comment)

https://build.ci.opensearch.org/job/gradle-check/41432/testReport/junit/org.opensearch.painless/LangPainlessClientYamlTestSuiteIT/test__yaml_painless_derived_fields_30_derived_field_search_definition_Test_derived_field_supported_type_using_search_definition_/

Failure can reproduce consistently with the seed

 ./gradlew ':modules:lang-painless:yamlRestTest' --tests "org.opensearch.painless.LangPainlessClientYamlTestSuiteIT.test {yaml=painless/derived_fields/30_derived_field_search_definition/Test derived_field supported type using search definition}" -Dtests.seed=E20A1378A0D1317D

But if I remove the seed, it succeed.

@rishabhmaurya already has a PR to fix this #14445

Copy link
Contributor

✅ Gradle check result for 6c097f3: SUCCESS


protected boolean canOptimize(ValuesSourceConfig config, RangeAggregator.Range[] ranges) {
if (config.fieldType() == null) return false;
MappedFieldType fieldType = config.fieldType();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this local variable needed? Can this.fieldType be set instead.


if (parent != null || subAggLength != 0) return false;

boolean rewriteable = aggregatorBridge.canOptimize();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Local var rewriteable needed? Can this.rewriteable be used instead?

return false;
}

Ranges ranges = prepareFromSegment(leafCtx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ranges ranges to this.ranges?

Comment on lines +61 to +64
protected boolean canOptimize(CompositeValuesSourceConfig[] sourceConfigs) {
if (sourceConfigs.length != 1 || !(sourceConfigs[0].valuesSource() instanceof RoundingValuesSource)) return false;
return canOptimize(sourceConfigs[0].missingBucket(), sourceConfigs[0].hasScript(), sourceConfigs[0].fieldType());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't make sense to have different canOptimize methods in the same class. We should further subclass DateHistogramAggregatorBridge to ensure only needed functionality is accessible by the specific Aggregator class

*/
public abstract class DateHistogramAggregatorBridge extends AggregatorBridge {

protected boolean canOptimize(boolean missing, boolean hasScript, MappedFieldType fieldType) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method can be private

}

@Override
final void tryFastFilterAggregation(PointValues values, BiConsumer<Long, Long> incrementDocCount, OptimizationContext.Ranges ranges)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: tryFastFilterAggregation seems like a misnomer now

Comment on lines +121 to +124
@Override
protected boolean canOptimize() {
return canOptimize(valuesSourceConfig);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can define specific constructors in bridge class to avoid inline implementations.

Comment on lines +197 to +202
private static class RangeCollectorForPointTree {
private final BiConsumer<Integer, Integer> incrementRangeDocCount;
private int counter = 0;

private final Ranges ranges;
private int activeIndex;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the multiRangesTraverse logic be part of separate class?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Search:Aggregations skip-changelog v2.16.0 Issues and PRs related to version 2.16.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Refactor FastFilterRewriteHelper
4 participants