batching parameter on consume() #781
Thanks for the suggestion, @arita37. Your mention of needing to "limit database connection access" does give me some pause, though. Are you imagining an argument that would perform some kind of grouping based on message content? If so, that's better handled by producing with a partition key. It's also straightforward enough to build the list in client code, like so:
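A minimal sketch of that client-side approach, assuming a pykafka-style consumer whose `consume()` returns a message or `None` when nothing is available; the helper name `consume_batch` and the batch size of 10 are illustrative, not part of pykafka's API:

```python
def consume_batch(consumer, max_msgs=10):
    """Collect up to max_msgs messages from a pykafka-style consumer.

    Assumes consumer.consume(block=False) returns a message, or None
    when no message is currently available (as SimpleConsumer does).
    """
    batch = []
    for _ in range(max_msgs):
        msg = consumer.consume(block=False)
        if msg is None:
            # No more messages buffered; return the partial batch.
            break
        batch.append(msg)
    return batch
```

The post-processing (e.g. the database writes) can then operate on `batch` as a unit, amortizing connection overhead across the mini-batch.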
1) Yes, that is the core of the question: grouping messages based on their contents, then post-processing the grouped messages. Usually, the latency comes from post-processing the messages (database access).
Pykafka aims to remain completely ignorant of the contents of the messages it processes, so I don't think mechanisms allowing grouping based on message content will ever be added. That said, Kafka itself provides partition keying as a way to consume messages in meaningful subgroups within a topic; you can pass a `partition_key` when producing to control this grouping. I would also recommend examining your topic setup and considering using more topics to achieve your logical grouping. If every message in your topic is part of the same logical group, you'll be able to take advantage of pykafka's automatic consumption balancing.
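To illustrate why partition keying yields per-key grouping, here is a toy sketch of keyed partition assignment. This is not pykafka's actual partitioner (which uses a different hash), and `n_partitions = 4` is illustrative; the point is only that every message carrying the same key lands on the same partition, so one consumer sees all of that key's messages together:

```python
def assign_partition(key: bytes, n_partitions: int) -> int:
    # Toy hash-based partitioner: deterministic in the key, so all
    # messages with the same key map to the same partition.
    return sum(key) % n_partitions

# Messages sharing a key always map to the same partition:
p1 = assign_partition(b"customer-42", n_partitions=4)
p2 = assign_partition(b"customer-42", n_partitions=4)
# p1 == p2, so a single consumer of that partition receives
# every "customer-42" message and can batch its database work.
```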
The reason is: I need to limit database connection access by grouping key access together. Hence the idea of processing messages in mini-batches:

msg_list = consume(10msg, ...)

We can do this ourselves in a loop, but I'm wondering whether it could be part of pykafka.