inputs/redpanda: add fetch_max_wait option #3100

Merged
merged 1 commit into from
Jan 3, 2025

birdayz (Contributor) commented Dec 26, 2024

kgo.FetchMaxWait is a config option supported by franz-go. Combined with kgo.FetchMinBytes, it lets you force the broker to send big batches when enough data is available, while capping how long the broker waits to fill a batch when there is not enough data.

Why add this option now?
This is especially important with the redpanda input, as it uses franz-go in ordered mode: it only hands out the next batch for a partition once the previous batch of that partition has been consumed. If the broker keeps sending very small batches, e.g. of size 1, this is likely to stall batched outputs. I could reproduce this locally with a producer that sends lots of batches of size 1.
I tried to work around this ordering limitation by batching in my output, but that doesn't help in this specific case. More records are only added to the output batch once the previous batch of the partition has been consumed, so in the extreme case of one record per Kafka batch on a single partition, rpcn processes only one record at a time.

Using kgo.FetchMinBytes in combination with kgo.FetchMaxWait can solve this problem.
In any case, it is a useful tuning knob, offered by both franz-go and the standard Java client.

@birdayz birdayz requested a review from Jeffail December 26, 2024 18:20
@birdayz birdayz force-pushed the jb/redpanda-input-fetch-max-wait branch from e66b830 to 416ada1 Compare December 26, 2024 18:33
rockwotj (Collaborator) left a comment

Thanks @birdayz

rockwotj (Collaborator) commented Jan 3, 2025

Mind updating the changelog?

@birdayz birdayz force-pushed the jb/redpanda-input-fetch-max-wait branch from 416ada1 to b706752 Compare January 3, 2025 09:53
birdayz (Contributor, Author) commented Jan 3, 2025

added changelog entry.

@rockwotj rockwotj merged commit b2697c6 into main Jan 3, 2025
4 checks passed
@rockwotj rockwotj deleted the jb/redpanda-input-fetch-max-wait branch January 3, 2025 09:58