-
The current architecture of Kafka/Redpanda assumes you know how much load you have in advance, since you need to choose the number of partitions for your topics statically. However, imagine this scenario: I have a topic T with 2 partitions, A and B, and thus 2 consumers, CA and CB, each consuming from one partition. Each can process 100 msgs/second, which aligns perfectly with my current load. But over the weekend, consumer CA dies, and we only notice the issue and restart it two days later. Now CA has to process 2 days' worth of backlogged data, and catching up might require processing 1,000 msgs/second. CA simply cannot process data this fast, and it seems like it will forever lag behind CB by a very large margin. What does one do in this case to have the consumer for partition A catch up?
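To make the "1,000 msgs/second" figure concrete: the catch-up time depends on the backlog size and on how much faster than the incoming rate the consumer can drain it. A minimal sketch of that arithmetic (the function name and numbers are illustrative, matching the scenario above):

```python
SECONDS_PER_DAY = 86_400

def catch_up_seconds(backlog_msgs, process_rate, incoming_rate):
    """Time to drain a backlog while new messages keep arriving.
    Only finite when process_rate exceeds incoming_rate."""
    net_drain = process_rate - incoming_rate
    if net_drain <= 0:
        return float("inf")  # the consumer never catches up
    return backlog_msgs / net_drain

# Two days of backlog at 100 msgs/s = 17,280,000 messages.
backlog = 2 * SECONDS_PER_DAY * 100
hours = catch_up_seconds(backlog, 1_000, 100) / 3_600
print(hours)  # ≈ 5.3 hours at a sustained 1,000 msgs/s
```

So even at 10x the normal processing rate, CA needs roughly five hours to catch up; at anything at or below 100 msgs/s it never does.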
Replies: 1 comment 4 replies
-
One potential solution: Kafka/Redpanda allows reading from specific offsets (positions in the log). You could update consumer CA to start reading from "now". That leaves a missing range in the log that hasn't been consumed, and you could spin up additional consumers to read just the sections of the log that haven't been processed yet. There are situations where this doesn't work (e.g. when you must process messages in order). Happy to help with more specifics for your environment/language.

Another solution is to increase the number of partitions, then spin up new consumers to handle the existing load so that the consumer for partition A can catch up. Once you've caught up, you can redirect CA and CB to read from partitions C and D and spin down the other consumers (you don't need a 1:1 consumer-to-partition mapping).
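A rough sketch of the first idea: after CA is moved to "now" (most clients expose a seek-to-end operation, e.g. `seek_to_end` in kafka-python), the unconsumed offset range can be split into chunks, one per temporary backfill consumer; each would seek to its chunk's start and stop once it reaches the chunk's end. The helper below is a hypothetical illustration covering only the offset arithmetic; the actual seek/poll calls depend on your client library:

```python
def split_backfill_range(committed, now, workers):
    """Divide the unconsumed offset range [committed, now) into
    near-equal chunks, one per temporary backfill consumer."""
    total = now - committed
    if total <= 0 or workers <= 0:
        return []
    chunk = -(-total // workers)  # ceiling division
    ranges = []
    start = committed
    while start < now:
        end = min(start + chunk, now)
        ranges.append((start, end))
        start = end
    return ranges

# Example: CA's last committed offset is 1,000 and "now" is offset
# 17,281,000 (roughly two days of backlog at 100 msgs/s), split
# across three backfill workers.
print(split_backfill_range(1_000, 17_281_000, 3))
# [(1000, 5761000), (5761000, 11521000), (11521000, 17281000)]
```

Each backfill consumer processes only its half-open range and exits, so ordering is preserved within each chunk but not across chunks, which is why this approach only works when cross-message ordering doesn't matter.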