-
The current architecture of Kafka/Redpanda assumes you know how much load you have in advance, since you need to choose the number of partitions for your topics statically. However, imagine this scenario: I have a topic T with 2 partitions, A and B, and thus 2 consumers, CA and CB, each consuming from one partition. Each can process 100 msgs/second, which aligns perfectly with my current load. But over the weekend, consumer CA dies, and we only notice the issue and restart it two days later. Now CA has to process 2 days' worth of backlogged data, and catching up might require processing 1,000 msgs/second. CA simply cannot process data this fast, and it seems like it will forever lag behind CB by a very large margin. What does one do in this case to have the consumer for partition A catch up?
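To make the "1,000 msgs/second" figure concrete: the catch-up time depends on the backlog size and on how much faster than the incoming rate the consumer can drain it. A minimal sketch of that arithmetic (the function name and numbers are illustrative, matching the scenario above):

```python
SECONDS_PER_DAY = 86_400

def catch_up_seconds(backlog_msgs, process_rate, incoming_rate):
    """Time to drain a backlog while new messages keep arriving.
    Only finite when process_rate exceeds incoming_rate."""
    net_drain = process_rate - incoming_rate
    if net_drain <= 0:
        return float("inf")  # the consumer never catches up
    return backlog_msgs / net_drain

# Two days of backlog at 100 msgs/s = 17,280,000 messages.
backlog = 2 * SECONDS_PER_DAY * 100
hours = catch_up_seconds(backlog, 1_000, 100) / 3_600
print(hours)  # ≈ 5.3 hours at a sustained 1,000 msgs/s
```

So even at 10x the normal processing rate, CA needs roughly five hours to catch up; at anything at or below 100 msgs/s it never does.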
Replies: 1 comment 4 replies
-
One potential solution: Kafka/Redpanda allows reading from specific offsets (positions in the log). You could update consumer CA to start reading from "now". That leaves a missing range in the log that hasn't been consumed, and you could spin up additional consumers to read just the sections of the log that haven't been processed yet. There are situations where this doesn't work (e.g. when you must process messages in order). Happy to help with more specifics for your environment/language.

Another solution is to increase the number of partitions, then spin up new consumers to handle the existing load so that the consumer for partition A can catch up. Once you've caught up, you can redirect CA and CB to read from partitions C and D and spin down the other consumers (you don't need a 1:1 consumer-to-partition mapping).
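A rough sketch of the first idea: after CA is moved to "now" (most clients expose a seek-to-end operation, e.g. `seek_to_end` in kafka-python), the unconsumed offset range can be split into chunks, one per temporary backfill consumer; each would seek to its chunk's start and stop once it reaches the chunk's end. The helper below is a hypothetical illustration covering only the offset arithmetic; the actual seek/poll calls depend on your client library:

```python
def split_backfill_range(committed, now, workers):
    """Divide the unconsumed offset range [committed, now) into
    near-equal chunks, one per temporary backfill consumer."""
    total = now - committed
    if total <= 0 or workers <= 0:
        return []
    chunk = -(-total // workers)  # ceiling division
    ranges = []
    start = committed
    while start < now:
        end = min(start + chunk, now)
        ranges.append((start, end))
        start = end
    return ranges

# Example: CA's last committed offset is 1,000 and "now" is offset
# 17,281,000 (roughly two days of backlog at 100 msgs/s), split
# across three backfill workers.
print(split_backfill_range(1_000, 17_281_000, 3))
# [(1000, 5761000), (5761000, 11521000), (11521000, 17281000)]
```

Each backfill consumer processes only its half-open range and exits, so ordering is preserved within each chunk but not across chunks, which is why this approach only works when cross-message ordering doesn't matter.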