-
Notifications
You must be signed in to change notification settings - Fork 2.4k
Coordinator reduce logic Optimization to use QuickSelect #18777
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Rishabh Maurya <[email protected]>
Signed-off-by: Vinay Krishna Pudyodu <[email protected]>
Signed-off-by: Vinay Krishna Pudyodu <[email protected]>
❌ Gradle check result for a711100: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for a711100: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for f2ad692: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
0b85cba
to
862535c
Compare
❌ Gradle check result for 862535c: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for 5410283: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
5410283
to
5087f77
Compare
❌ Gradle check result for 5087f77: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for c563b43: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
1a2c2a9
to
775f402
Compare
❌ Gradle check result for 775f402: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
eb6b16e
to
f638796
Compare
❌ Gradle check result for f638796: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #18777 +/- ##
============================================
- Coverage 72.80% 72.79% -0.02%
- Complexity 68535 68542 +7
============================================
Files 5572 5573 +1
Lines 314779 314843 +64
Branches 45691 45703 +12
============================================
+ Hits 229166 229178 +12
- Misses 67014 67033 +19
- Partials 18599 18632 +33 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
f638796
to
6c58e5f
Compare
Benchmarking ResultsTo measure the performance improvements from this optimization, I conducted comprehensive benchmarking tests using opensearch-benchmark. Test SetupInitial Data Ingestion: opensearch-benchmark execute-test --workload=big5 --target-hosts=localhost:9200 \
--kill-running-processes --workload-params "corpus_size:60,ingest_percentage:1.16" \
--exclude-tasks="check-cluster-health" Test Configuration:
Benchmark Execution: opensearch-benchmark execute-test --pipeline=benchmark-only --workload=big5 \
--target-hosts=localhost:9200 --kill-running-processes \
--include-tasks "numeric-terms-agg" --telemetry=node-stats Iteration details for this task: numeric-terms-agg Performance ResultsI measured the execution time for the
Resource UtilizationThe optimization also significantly reduced system resource consumption:
![]() ![]() Key Findings
|
We now have two approaches for reducing merged buckets to obtain the top N buckets:
Based on theoretical analysis, we expected the Quick Select solution to outperform the Priority Queue when the request size (top N) is large relative to cardinality, while the Priority Queue should perform better for smaller request sizes. To validate this hypothesis, I conducted performance tests measuring the execution duration for each method similar to previous comment. Performance Results:
Conclusion: The Quick Select solution introduces slight overhead for smaller datasets (≤1,000 items) but demonstrates significant performance improvements for larger datasets. At 5,000+ items, Quick Select shows better performance, making it the preferred choice for high-cardinality use cases. |
6c58e5f
to
7edf16f
Compare
Signed-off-by: Vinay Krishna Pudyodu <[email protected]>
Adding @rishabhmaurya for review |
❌ Gradle check result for 0b7e48f: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
See if we can reuse the code - #18702 (comment), all 3 optimizations have duplicate logic, its better we maintain one version of it and reuse it |
Although high level logic looks same, the way we use, steps are different.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering if we should use the object allocation metric(similar to #17447) instead of the JVM utilization metric in the benchmark shared - #18777 (comment)?
This PR is stalled because it has been open for 30 days with no activity. |
Signed-off-by: Vinay Krishna Pudyodu <[email protected]>
ca8a1c5
to
70904ce
Compare
❌ Gradle check result for 70904ce: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Description
Optimized the reduce logic by:
Applying QuickSelect instead of PriorityQueue when the final size was smaller than the bucket count and simply copying the entire buckets and sorting when the final size == bucket count
I have aded the tests in the comment sections bellow: #18777 (comment)
Benchmarking results:
#18777 (comment)
Related Issues
Resolves #18705
Related: #18650
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.