Specifying multiple topics in the topics property attribute logically fails in AEP if topic messages need different dataset id's, operations, stream flow id's etc #53

Open
Leeacarroll opened this issue May 30, 2023 · 3 comments

Comments

@Leeacarroll

Subject of the issue

This is an enhancement request. Support for multi-topic ingestion is limited to the use case where messages on all topics share the same dataset ID, operation, dataflow ID, etc.

Messages should be filtered by topic and streamed to AEPPublisher.producer.post as topic-specific batches.
Each topic batch could then have topic-specific headers appended, using something like this in the config:

aep.connection.endpoint.topic-x.headers=...
aep.connection.endpoint.topic-y.headers=...
aep.connection.endpoint.topic-z.headers=...
aep.connection.endpoint.headers=...

The current headers attribute could provide default/common headers, with the topic-specific headers overwriting them or adding new ones.
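A minimal sketch of that default-plus-override merge, in plain Java. The per-topic property names follow the proposal above and do not exist in the connector today; the `name:value,name:value` value format, the `TopicHeaders` class, and the header names in the usage note are assumptions made only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TopicHeaders {
    // Property naming taken from the proposal above (hypothetical, not a shipped feature).
    static final String PREFIX = "aep.connection.endpoint.";
    static final String SUFFIX = ".headers";

    /** Parses "k1:v1,k2:v2" into an ordered map. This value format is an assumption. */
    static Map<String, String> parse(String raw) {
        Map<String, String> out = new LinkedHashMap<>();
        if (raw == null || raw.trim().isEmpty()) {
            return out;
        }
        for (String pair : raw.split(",")) {
            int idx = pair.indexOf(':');
            if (idx <= 0) {
                continue; // skip malformed entries that have no header name
            }
            out.put(pair.substring(0, idx).trim(), pair.substring(idx + 1).trim());
        }
        return out;
    }

    /** Starts from the default headers, then lets topic-specific headers overwrite or add entries. */
    static Map<String, String> headersFor(String topic, Map<String, String> config) {
        Map<String, String> merged = parse(config.get(PREFIX + "headers"));
        merged.putAll(parse(config.get(PREFIX + topic + SUFFIX)));
        return merged;
    }
}
```

With defaults `x-common:1,x-flow:default` and a `topic-x` entry of `x-flow:fx,x-extra:2`, `headersFor("topic-x", config)` keeps `x-common`, replaces `x-flow`, and adds `x-extra`; a topic with no entry of its own simply gets the defaults.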

I could create a pull request for this if the committers are interested / supportive. The issues I'm concerned with are:

  • How does this change play out in terms of connector performance? Each set of sink records provided by Kafka will now produce zero to many HTTP requests to the AEP endpoint.
  • Can we share the same auth token? (I think we can...)
  • How does this impact the configuration of the Kafka micro-batching parameters? (Maybe it doesn't.)
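To make the performance question concrete, here is a rough sketch of the per-topic split, assuming it happens on each collection of records handed to the sink task. `Rec` is a hypothetical stand-in for Kafka Connect's `SinkRecord` (which exposes `topic()`), and `TopicBatcher` is an illustrative name, not a class in the connector.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TopicBatcher {
    /** Minimal stand-in for a sink record: topic plus payload only. */
    static final class Rec {
        final String topic;
        final String value;

        Rec(String topic, String value) {
            this.topic = topic;
            this.value = value;
        }
    }

    /** Splits one delivered batch into per-topic batches, preserving arrival order within each topic. */
    static Map<String, List<String>> groupByTopic(Collection<Rec> records) {
        Map<String, List<String>> batches = new LinkedHashMap<>();
        for (Rec r : records) {
            batches.computeIfAbsent(r.topic, t -> new ArrayList<>()).add(r.value);
        }
        return batches;
    }
}
```

Each entry of the returned map would become one HTTP request carrying that topic's headers, so a delivered batch yields between zero requests (empty input) and one request per distinct topic present in it.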

Your environment

All

Steps to reproduce

Set the property topics=a,b,c

where a, b and c are topics whose messages have different AEP dataset IDs or require different operations or flow IDs.

Observe that stitching, topic update logic and values are broken within AEP.

Expected behaviour

NA

Actual behaviour

NA

@OneCricketeer
Contributor

The recommendation would be to make N different configs

name=connector-a
topics=a
aep.connection.endpoint.headers=a-headers

name=connector-b
topics=b
aep.connection.endpoint.headers=b-headers

@Leeacarroll
Author

Hi
The issue with running multiple connectors is the expense of running on third-party SaaS offerings such as MSK. Effectively, you end up running x serverless clusters rather than just one. MSK also limits the number of compacted partitions on a Kafka cluster, so the cluster can only handle fewer than 4 connectors.

The above begins to add up to a valid use case. At the very least, a documentation change explaining when to use the "topics" (plural) property and when not to would be good.

@OneCricketeer
Contributor

OneCricketeer commented May 31, 2023

I'd recommend using ECS over MSK Connect.

Compacted topics have nothing to do with running connectors
