Slow (timeout) shutdown and rebalancing with withRebalanceSafeCommits
#1132
Comments
Thanks @jgordijn, that is a really interesting finding! For now you could apply the workaround of setting a shorter |
If I set Also, please look at the rebalance time. It is not only the shutdown that seems to fail. |
It depends. In the old behavior the program gets no chance at all to do commits. If you set a maxRebalanceTime, it at least gets some chance. Most programs commit everything within a few seconds. With slow processing like here, it will be necessary to reduce the amount of records that are pre-fetched (
Can you elaborate on that please? What error do you see? |
I start consumer1. Then I start consumer2, which doesn't start immediately (or after a short delay). Meanwhile the new consumer doesn't show anything in the log, and in consumer1 I see:
It takes nearly 3 minutes (differs per run), before consumer2 starts consuming. |
Did you add |
Ah yes, flag ( I'm a bit worried about the number of flags and combinations I need to use to get it to work. |
The kafka-client does not do any pre-fetching. By default zio-kafka does quite a bit of pre-fetching.
I know... Kafka has a huge number of knobs that can be turned. It's a pain to support people with it because there is always one more setting that can be tweaked. I am really happy that this solved the issue though! We will need to add this gotcha to the documentation. |
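To make the effect of pre-fetching concrete, here is a rough back-of-the-envelope sketch in plain Scala (no zio-kafka dependency). It assumes that everything already buffered for a partition has to be processed and committed before the partition can be released, which is a simplification of the behavior discussed in this thread. The 100 ms per record comes from the issue; the buffer sizes and the `drainTime` helper are hypothetical and only for illustration.

```scala
import scala.concurrent.duration._

object PrefetchDrainTime {
  // Time needed to process everything already buffered for one partition,
  // assuming records are processed one at a time at a constant rate.
  def drainTime(bufferedRecords: Long, perRecord: FiniteDuration): FiniteDuration =
    perRecord * bufferedRecords

  def main(args: Array[String]): Unit = {
    val perRecord = 100.millis // processing time reported in this issue
    // Hypothetical buffer sizes, not actual zio-kafka defaults.
    List(10L, 100L, 1000L).foreach { n =>
      println(s"$n buffered records -> ${drainTime(n, perRecord).toSeconds} s")
    }
  }
}
```

Under these assumptions the drain time grows linearly with the number of buffered records, which is why slow per-record processing and generous pre-fetching combine badly.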
I read about prefetching here: https://www.conduktor.io/kafka/kafka-consumer-important-settings-poll-and-internal-threads-behavior/#Kafka-Consumer-Poll-Behavior-0 . Is that incorrect? |
I can only speculate. Conduktor uses zio-kafka, so most probably they are describing how their product works, not the underlying java client. |
Why did you close this issue? The issue with shutdown is not resolved. |
@jgordijn Yep, you are right. Thanks for correcting me. |
@erikvanoosten You think this would be solved by #1201 ? |
Yes, that would be my expectation 😄 |
Sounds related and perhaps fully fixed by #1358. Shutdown of one of the members of a consumer group will result in a rebalancing. Is that the situation here as well, or was shutdown of a member of a single-instance consumer group resulting in long shutdown times? During rebalancing sometimes partitions are assigned and then removed without any records having been fetched for those partitions. Partition streams are created and emitted in the top-level stream but may not have been pulled from by the downstream stages (in user code). In that case, the safe rebalance mechanism would wait for an end signal for those partition streams that never arrived and timeout only after the We will be adding some logging for this situation. |
I tried out the new withRebalanceSafeCommits feature and it has unexpected behavior:
It seems that on shutdown the stream is stopped, but the rebalanceListener is waiting until the last message is committed. As the stream is stopped, this will never happen and it takes 3 min (3/5 of the maxPollInterval) to stop.
When I start the second application it seems that rebalancing is started. However it still takes 3 minutes to join.
Maybe it has something to do with the slow (100ms) processing per message, but having a 10 message poll should mitigate this. I would thus expect rebalancing to happen in (worst case) 1 sec (10x100ms).
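For reference, the two numbers above can be worked out as follows. This is only an arithmetic sketch in plain Scala, not zio-kafka code. It assumes the Kafka default of 5 minutes for max.poll.interval.ms; the object and value names are made up for illustration.

```scala
import scala.concurrent.duration._

object RebalanceTimings {
  // Assumption: Kafka's default max.poll.interval.ms (300000 ms = 5 minutes).
  val maxPollInterval: FiniteDuration = 5.minutes

  // Observed in the issue: the wait ends after 3/5 of maxPollInterval.
  val observedWait: FiniteDuration = maxPollInterval * 3L / 5L

  // Expected in the issue: 10 records per poll, 100 ms processing each.
  val recordsPerPoll: Long = 10L
  val processingTimePerRecord: FiniteDuration = 100.millis
  val expectedWorstCase: FiniteDuration = processingTimePerRecord * recordsPerPoll

  def main(args: Array[String]): Unit = {
    println(s"observed wait: ${observedWait.toSeconds} s")                // 180 s
    println(s"expected worst case: ${expectedWorstCase.toMillis} ms")     // 1000 ms
  }
}
```

So the observed delay is roughly 180 times longer than the worst case I would expect from the poll size alone.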