Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix record skipping when querying paginated data across shards #3061

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

Swiddis
Copy link
Collaborator

@Swiddis Swiddis commented Oct 8, 2024

Description

When pulling unordered data from an index with multiple shards, data gets lost if the fetchSize is not a multiple of the shard count, as the persisted cursor position to continue paging is based on the last seen _doc which is duplicated when the primary shard count exceeds 1. This PR currently adds a reproducer for the bug -- finding a fix is still in progress.

Related Issues

N/A

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@Swiddis Swiddis added the bug Something isn't working label Oct 9, 2024
Signed-off-by: Simeon Widdis <[email protected]>
Copy link
Collaborator

@penghuo penghuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it impact Join in legacy module also?


@Test
public void testMultiShardPagesEqualsActualData() throws IOException {
// A bug made it so when pulling unordered data from an index with multiple shards, data gets lost if the fetchSize
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pulling unordered data from an index
it is not accurate, it is ordered data (sord by _doc)?

@@ -189,6 +189,9 @@ public OpenSearchResponse searchWithPIT(Function<SearchRequest, SearchResponse>
// Set sort field for search_after
if (this.sourceBuilder.sorts() == null) {
this.sourceBuilder.sort(DOC_FIELD_NAME, ASC);
// Workaround to preserve sort location more exactly,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this is workaround, could u add the long-term solution issue?

@@ -189,6 +189,9 @@ public OpenSearchResponse searchWithPIT(Function<SearchRequest, SearchResponse>
// Set sort field for search_after
if (this.sourceBuilder.sorts() == null) {
this.sourceBuilder.sort(DOC_FIELD_NAME, ASC);
// Workaround to preserve sort location more exactly,
// see https://github.com/opensearch-project/sql/pull/3061
this.sourceBuilder.sort("_id", ASC);
Copy link
Contributor

@fddattal fddattal Oct 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see there are a couple places in the code where we sort on doc field.
Can you help me understand why we only need this here?

https://github.com/search?q=repo%3Aopensearch-project%2Fsql+%22DOC_FIELD_NAME%22&type=code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants