-
Notifications
You must be signed in to change notification settings - Fork 137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix record skipping when querying paginated data across shards #3061
base: main
Are you sure you want to change the base?
Fix record skipping when querying paginated data across shards #3061
Conversation
Signed-off-by: Simeon Widdis <[email protected]>
ba741cd
to
e143507
Compare
Signed-off-by: Simeon Widdis <[email protected]>
3946e6a
to
a8af162
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it impact Join in legacy module also?
|
||
@Test | ||
public void testMultiShardPagesEqualsActualData() throws IOException { | ||
// A bug made it so when pulling unordered data from an index with multiple shards, data gets lost if the fetchSize |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pulling unordered data from an index
it is not accurate, it is ordered data (sord by _doc)?
@@ -189,6 +189,9 @@ public OpenSearchResponse searchWithPIT(Function<SearchRequest, SearchResponse> | |||
// Set sort field for search_after | |||
if (this.sourceBuilder.sorts() == null) { | |||
this.sourceBuilder.sort(DOC_FIELD_NAME, ASC); | |||
// Workaround to preserve sort location more exactly, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this is workaround, could u add the long-term solution issue?
@@ -189,6 +189,9 @@ public OpenSearchResponse searchWithPIT(Function<SearchRequest, SearchResponse> | |||
// Set sort field for search_after | |||
if (this.sourceBuilder.sorts() == null) { | |||
this.sourceBuilder.sort(DOC_FIELD_NAME, ASC); | |||
// Workaround to preserve sort location more exactly, | |||
// see https://github.com/opensearch-project/sql/pull/3061 | |||
this.sourceBuilder.sort("_id", ASC); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see there are a couple places in the code where we sort on doc field.
Can you help me understand why we only need this here?
https://github.com/search?q=repo%3Aopensearch-project%2Fsql+%22DOC_FIELD_NAME%22&type=code
Description
When pulling unordered data from an index with multiple shards, data gets lost if the fetchSize is not a multiple of the shard count, as the persisted cursor position to continue paging is based on the last seen
_doc
which is duplicated when the primary shard count exceeds 1. This PR currently adds a reproducer for the bug -- finding a fix is still in progress.Related Issues
N/A
Check List
--signoff
.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.