
Correct offloading of large batch entries to s3. Closes #154 #155

Open · wants to merge 2 commits into base: master
Conversation

shotmk
Contributor

@shotmk shotmk commented Jun 13, 2024

Fix Issue #154

Fix the case where individual entries are under the threshold but the total batch size exceeds it. Add a test case covering this scenario, and adjust an old test that is now failing (it asserted the wrong behavior).

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Nikita Bahliuk added 2 commits June 13, 2024 17:50
Fix case when individual entries are under the threshold, but total batch size exceeds the threshold.
Add test case covering the use-case, adjust old test that is now failing (wrong test-case)
@advoretsky

I'm looking forward to see this PR merged. Good work, @shotmk!

Contributor

@ziyanli-amazon ziyanli-amazon left a comment


List<Pair<Integer, Long>> entrySizes = IntStream.range(0, originalEntries.size())
        .boxed()
        .map(i -> Pair.of(i, sizeOf(originalEntries.get(i))))
        .sorted((p1, p2) -> Long.compare(p2.right(), p1.right()))
Contributor

what's the purpose of sorting here?

Contributor Author


The overall idea is to offload as few batch entries as possible: fewer AWS calls mean less time spent on round trips.
So we sort entries by size and then move them to S3 from largest to smallest, stopping as soon as totalSize fits within the limit.
We only offload in that sorted order; the original order of entries in the resulting batch request is preserved, as it matters for the FIFO use case.
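A minimal sketch of the sort step described above (not the PR's actual code; `EntrySize` and the sample sizes are hypothetical). It pairs each entry's original index with its byte size and sorts largest-first, leaving the original batch order untouched:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class OffloadOrder {
    // Hypothetical pair of (original index, entry size in bytes).
    record EntrySize(int index, long size) {}

    // Sort indices by entry size, largest first, so the greedy offload
    // loop can pick the biggest entries before the smaller ones.
    static List<EntrySize> sortBySizeDescending(long[] sizes) {
        List<EntrySize> pairs = new ArrayList<>();
        for (int i = 0; i < sizes.length; i++) {
            pairs.add(new EntrySize(i, sizes[i]));
        }
        pairs.sort(Comparator.comparingLong(EntrySize::size).reversed());
        return pairs;
    }

    public static void main(String[] args) {
        List<EntrySize> order = sortBySizeDescending(new long[] {100, 900, 300});
        // Largest entry (index 1, size 900) comes first.
        System.out.println(order.get(0).index()); // prints 1
    }
}
```

Keeping the index alongside the size is what lets the code offload in size order while emitting the final batch in its original order.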

hasS3Entries = true;
for (Pair<Integer, Long> pair : entrySizes) {
    // Verify that total size of batch request is within limits
    if (totalSize <= clientConfiguration.getPayloadSizeThreshold() && !clientConfiguration.isAlwaysThroughS3()) {
Contributor


Since we are checking the total size, does it make sense to move the if condition outside of the for loop? If it is true, we can throw an exception; otherwise, we keep adding entries to S3.

Contributor Author


The overall idea is described in the comment above.
The code follows that idea: we move entries one by one (starting from the largest) and keep track of the total size after each offload.
We stop offloading the moment the total is under the threshold, to prevent unnecessary offloading, which lowers overall execution time and cost.


@ziyanli-amazon the idea is to offload as few entries as possible, as each offload is a round trip to S3. So offloading goes from the biggest entries to the smallest until the batch fits into the size limit.
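The stop-early loop discussed in this thread can be sketched as follows (again a hypothetical illustration, not the PR's code; the threshold and sizes are invented). It walks the size-sorted entries largest-first and stops as soon as the remaining in-band total fits under the threshold:

```java
public class GreedyOffload {
    // Given entry sizes already sorted in descending order, return how
    // many entries must be offloaded to S3 before the remaining batch
    // fits under the payload size threshold.
    static int countEntriesToOffload(long[] sortedSizesDesc, long threshold) {
        long totalSize = 0;
        for (long s : sortedSizesDesc) totalSize += s;

        int offloaded = 0;
        for (long s : sortedSizesDesc) {
            if (totalSize <= threshold) break; // batch now fits; stop offloading
            totalSize -= s;                    // "move" this entry to S3
            offloaded++;
        }
        return offloaded;
    }

    public static void main(String[] args) {
        // Threshold 1000: offloading only the 900-byte entry brings the
        // total from 1300 down to 400, so a single offload suffices.
        System.out.println(countEntriesToOffload(new long[] {900, 300, 100}, 1000)); // prints 1
    }
}
```

Checking the threshold at the top of each iteration is what makes the loop greedy: it never offloads an entry that is no longer needed to get under the limit.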

@shotmk
Contributor Author

shotmk commented Jul 29, 2024

Can this get some attention, please?
