Skip to content
This repository has been archived by the owner on Mar 24, 2021. It is now read-only.

Consume multiple topics at once #395

Closed
magcius opened this issue Dec 17, 2015 · 6 comments
Closed

Consume multiple topics at once #395

magcius opened this issue Dec 17, 2015 · 6 comments
Assignees
Labels

Comments

@magcius
Copy link

magcius commented Dec 17, 2015

Hey,

Since you retrieve a consumer from a topic, the API doesn't have a possibility to consume multiple topics at once. Is it possible to consume multiple topics at once from the same consumer, with messages from each topic interleaved?

I suppose I could create multiple consumers and somehow round-robin between them, but I can't figure out a way to poll on all of them at a time, to know when one is ready.

Am I missing something obvious?

@emmettbutler
Copy link
Contributor

Duplicate of #354

@emmettbutler
Copy link
Contributor

Hi @magcius, thanks for asking. The current pykafka consumer API doesn't support using a single SimpleConsumer instance to consume multiple topics. This behavior is something that's been considered before (see #354) and I wouldn't be surprised to see it included in a future release.
You can get similar behavior by creating multiple consumers and polling them continuously - something as simple as this would get you most of the way there:

consumers = mymodule.get_consumers()
while True:
    for consumer in consumers:
        msg = consumer.consume(block=False)
        if msg is not None:
            print "Got message {} for topic {}".format(msg.value, consumer.topic.name)

I think this should be ok CPU-wise, since the consumer internally uses a semaphore to avoid busywaiting. If that doesn't prove to be true, you could look at accessing its internal semaphore to avoid unnecessary CPU usage.

I'm going to close this as a duplicate, so please post any other concerns related to this issue on #354

@magcius
Copy link
Author

magcius commented Dec 17, 2015

Oh, I didn't realize there was a block=False flag. Thanks, that's exactly what I needed!

@magcius
Copy link
Author

magcius commented Dec 17, 2015

Actually, this will still spin CPU-wise, since we still need to keep checking the semaphore over and over -- there's no way to wait on multiple consumers at once, you need an event flag for that.

A better approach would be to split out the topic balancer and the thing that pulls from the consumer, so that all the partitions go through the semaphore.

@magcius
Copy link
Author

magcius commented Dec 18, 2015

Yeah, can confirm that your approach, as written, rapidly spins the CPU 100% on my EC2 boxes. Access to the internal semaphore doesn't help, since you effectively need to poll on multiple semaphores at once.

@pdex
Copy link
Contributor

pdex commented Mar 1, 2016

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

3 participants