Introduce ApproximateRangeQuery and ApproximateableQuery #13788

Merged: 20 commits, Sep 2, 2024

Conversation

harshavamsi
Contributor

@harshavamsi harshavamsi commented May 22, 2024

Description

Most of the logic follows #13566. I've introduced a new ApproximateableQuery that works much like Lucene's IndexOrDocValuesQuery does today: it wraps an originalQuery and an approximateQuery and returns one of the two. At search time we check whether the request meets the requirements for rewriting originalQuery to approximateQuery. To start, I've only converted the date range query to use the approximation. If there is a top-level range query on a date field, we approximate the results by scoring only up to 10K documents, or `size` documents.
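The wrapper idea described above can be sketched without any Lucene or OpenSearch dependency. This is only an illustration, not the code in this PR: `ApproximationSketch`, `RequestInfo`, `EitherQuery`, and the predicate are hypothetical names, and the eligibility rule (top-level query on a date field) is simply the condition stated in the description.

```java
import java.util.function.Predicate;

public class ApproximationSketch {

    /** Hypothetical stand-in for per-request search state; not an OpenSearch class. */
    static final class RequestInfo {
        final boolean topLevelQuery;
        final boolean dateField;

        RequestInfo(boolean topLevelQuery, boolean dateField) {
            this.topLevelQuery = topLevelQuery;
            this.dateField = dateField;
        }
    }

    /** Holds both forms of a query and picks one of them at search time. */
    static final class EitherQuery<Q> {
        private final Q originalQuery;
        private final Q approximateQuery;
        private final Predicate<RequestInfo> canApproximate;

        EitherQuery(Q originalQuery, Q approximateQuery, Predicate<RequestInfo> canApproximate) {
            this.originalQuery = originalQuery;
            this.approximateQuery = approximateQuery;
            this.canApproximate = canApproximate;
        }

        /** Returns the approximate form only when the request qualifies. */
        Q resolve(RequestInfo request) {
            return canApproximate.test(request) ? approximateQuery : originalQuery;
        }
    }

    public static void main(String[] args) {
        // Assumed rule: approximate only a top-level range query on a date field.
        EitherQuery<String> query = new EitherQuery<>(
            "exact range", "approximate range", r -> r.topLevelQuery && r.dateField);

        System.out.println(query.resolve(new RequestInfo(true, true)));   // approximate range
        System.out.println(query.resolve(new RequestInfo(false, true)));  // exact range
    }
}
```

Keeping the decision in a single `resolve` step means the original query stays available as a fallback whenever the request does not qualify.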

Related Issues

Resolves #11251 #9541 #13566

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • API changes companion pull request created.
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions github-actions bot added the enhancement and Search:Performance labels May 22, 2024
Contributor

❌ Gradle check result for 95236d6: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

❌ Gradle check result for c98b56c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

❌ Gradle check result for 76b4abe: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

❌ Gradle check result for 2cf5e27: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

❌ Gradle check result for 090ddc6: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

❌ Gradle check result for 9ac309a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@msfroh
Collaborator

msfroh commented Sep 2, 2024

I tried retrying the Mend Security Check, but that doesn't seem to be working.

As mentioned by @harshavamsi above, the bulk of the uncovered code is copy/pasted from PointRangeQuery.

Merging...

@msfroh msfroh merged commit 2e9db40 into opensearch-project:main Sep 2, 2024
37 of 39 checks passed
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-13788-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 2e9db40a50735eacc95a4fc8926e8bb7042a696a
# Push it to GitHub
git push --set-upstream origin backport/backport-13788-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-13788-to-2.x.

harshavamsi added a commit to harshavamsi/OpenSearch that referenced this pull request Sep 2, 2024
…ect#13788)

This introduces a basic "approximation" framework that improves
query performance by modifying the query in a way that should be
functionally equivalent.

To start, we can reduce the bounds of a range query in order to
satisfy the `track_total_hits` value (which defaults to 10,000).

---------

Signed-off-by: Harsha Vamsi Kalluri <[email protected]>
Signed-off-by: Michael Froh <[email protected]>
Co-authored-by: Michael Froh <[email protected]>
(cherry picked from commit 2e9db40)
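
To illustrate why a hit limit such as `track_total_hits` allows a range query to do less work, here is a minimal sketch that assumes the values are visited in ascending order and stops once enough matches are collected. It is not the implementation from this commit; `BoundedRangeSketch` and `collect` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class BoundedRangeSketch {

    /**
     * Collects at most {@code limit} values in [lower, upper] from an array
     * sorted in ascending order, stopping as soon as the limit is reached.
     */
    static List<Long> collect(long[] sortedValues, long lower, long upper, int limit) {
        List<Long> hits = new ArrayList<>();
        for (long value : sortedValues) {
            // Past the upper bound, or enough hits already: nothing more to do.
            if (value > upper || hits.size() >= limit) {
                break;
            }
            if (value >= lower) {
                hits.add(value);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        long[] timestamps = {1, 3, 5, 7, 9, 11};
        System.out.println(collect(timestamps, 3, 10, 2));   // [3, 5]
        System.out.println(collect(timestamps, 3, 10, 10));  // [3, 5, 7, 9]
    }
}
```

With a limit of 2, the loop never looks at 7 or 9, which has the same effect as having issued the query with a narrower upper bound.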
@harshavamsi harshavamsi deleted the pointrange_optimization branch September 2, 2024 22:47
@jainankitk
Collaborator

> As mentioned by @harshavamsi above, the bulk of the uncovered code is copy/pasted from PointRangeQuery.
>
> Merging...

Ideally we should avoid copy/pasting code, but I also don't see a good way to do that until PointRangeQuery is more extensible. I was hoping to do this sooner, and finally got around to it in apache/lucene#13711.

@rishabh6788
Contributor

hmm, we're not seeing any difference in benchmarks because this is behind a feature flag

@harshavamsi You can create a new config for this; see f351c01, which added one for the concurrent search feature.

msfroh pushed a commit that referenced this pull request Sep 4, 2024
opensearch-trigger-bot bot pushed a commit that referenced this pull request Sep 4, 2024
This introduces a basic "approximation" framework that improves
query performance by modifying the query in a way that should be
functionally equivalent.

To start, we can reduce the bounds of a range query in order to
satisfy the `track_total_hits` value (which defaults to 10,000).

---------

Signed-off-by: Harsha Vamsi Kalluri <[email protected]>
Signed-off-by: Michael Froh <[email protected]>
Co-authored-by: Michael Froh <[email protected]>
(cherry picked from commit 2e9db40)
(cherry picked from commit 3ddb199)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
msfroh added a commit that referenced this pull request Sep 4, 2024
#15700)

This introduces a basic "approximation" framework that improves
query performance by modifying the query in a way that should be
functionally equivalent.

To start, we can reduce the bounds of a range query in order to
satisfy the `track_total_hits` value (which defaults to 10,000).

---------




(cherry picked from commit 2e9db40)
(cherry picked from commit 3ddb199)

Signed-off-by: Harsha Vamsi Kalluri <[email protected]>
Signed-off-by: Michael Froh <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Michael Froh <[email protected]>
akolarkunnu pushed a commit to akolarkunnu/OpenSearch that referenced this pull request Sep 10, 2024
…ect#13788)

Labels
backport 2.x, backport-failed, enhancement, Search:Performance, v2.17.0, v3.0.0
Development

Successfully merging this pull request may close these issues.

Understand/Improve the performance of range queries
9 participants