Skip to content
This repository has been archived by the owner on Sep 2, 2024. It is now read-only.

KafkaChannel Configuration option on a per object base #1005

Open
matzew opened this issue Nov 30, 2021 · 23 comments
Open

KafkaChannel Configuration option on a per object base #1005

matzew opened this issue Nov 30, 2021 · 23 comments
Labels
kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature-request triage/accepted Issues which should be fixed (post-triage)

Comments

@matzew
Copy link
Contributor

matzew commented Nov 30, 2021

Problem

We currently have a global configuration for one dedicated Apache Kafka cluster:
https://github.com/knative-sandbox/eventing-kafka/blob/main/config/channel/consolidated/configmaps/kafka-config.yaml#L24

this forces all KafkaChannel objects to be connected against a single kafka cluster.

This does not fly well, when offering "Knative Kafka" as a service, where customers have desire for individual Kafka cluster usage, on a per channel bases.

The global default is a convenient setting for on prem, but not for more cloud-centric cases

Proposal

Adding the following fields to the KafkaChannel spec

  • bootstrapServers
  • authSecretName (assumes that the actual secret is living in the namespace of the KafkaChannel object)

Persona:
Which persona is this feature for?

Exit Criteria
A measurable (binary) test that would indicate that the problem has been resolved.

Time Estimate (optional):
How many developer-days do you think this may take to resolve?

Additional context (optional)
Add any other context about the feature request here.

@aliok
Copy link
Member

aliok commented Nov 30, 2021

+1

Why these are missing in the first place:

  • Being able to use the generic Channel
  • Preventing app developers to change "admin"y things like the bootstrap server and auth that's defined per cluster

Although I understand these usecases, it would be really nice to have the ability to pass these in the spec.

Something logic like:

  • If available in the spec, use them
  • If not, fallback to the configmap

In managed service case, we wouldn't like the fallback actually. So, maybe also a way to enable/disable fallback too (probably in the configmap) ?

@pierDipi pierDipi added the kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API label Nov 30, 2021
@pierDipi
Copy link
Member

pierDipi commented Nov 30, 2021

What about this similar to KafkaSink:

spec:
   bootstrapServers:
      - my-cluster-kafka-bootstrap.kafka:9092
   auth:
     secret:
       ref:
         name: my_secret

It's a bit more verbose, but it will give us more flexibility in the future regarding "auth(n|z)".

@matzew
Copy link
Contributor Author

matzew commented Nov 30, 2021

Being able to use the generic Channel

IMO a weak part of the API, only interesting on the surface

@matzew
Copy link
Contributor Author

matzew commented Nov 30, 2021

In managed service case, we wouldn't like the fallback actually. So, maybe also a way to enable/disable fallback too (probably in the configmap) ?

I'd expect someone would set it up WITHOUT any default, on cases like that - hence the fallback would fail, due to no bootstrap present. Log will tell the devs.

No need to configure this too

@travis-minke-sap
Copy link
Contributor

You would still be limited to a single "configuration" of the Kafka "clients" (i.e. SASL, TLS, etc in Sarama.Config) - is that an acceptable assumption (i.e. all Kafka instances are similar and use the same set of Kafka Root Certs, etc..) or should there also be a field pointing to the ConfigMap (in which case the brokers and Secret name/namespace can just stay in the ConfigMap)? We are also watching the ConfigMap / Secret for changes and reacting to them in controller - would have to consider impacts there (watching multiple/all such instances, etc.)

I thought the intent was to use the "scope" (namespaced) annotation to allow distinct ConfigMap / Secret per-namespace ? (not that the we support that fully today or anything ; ). This allows for a middle-ground where each Kafka instance is specified once without requiring every Topic in the Kafka instance to replicate it. I have't thought through that solution completely so maybe there are blockers/difficulties? Has that annotation based approach been scrapped, or are the two supposed to co-exist?

The intent is valid/good (multi-tenant separate Kafka instance support) and in general this approach is ok as long as there is fallback to current usage for those who don't want to have to specify the brokers every time (because they will always only have a single instance).

@pierDipi
Copy link
Member

You would still be limited to a single "configuration" of the Kafka "clients" (i.e. SASL, TLS, etc in Sarama.Config) - is that an acceptable assumption (i.e. all Kafka instances are similar and use the same set of Kafka Root Certs, etc..) or should there also be a field pointing to the ConfigMap (in which case the brokers and Secret name/namespace can just stay in the ConfigMap)? We are also watching the ConfigMap / Secret for changes and reacting to them in controller - would have to consider impacts there (watching multiple/all such instances, etc.)

See #1005 (comment), auth and bootstrap servers would be for each KafkaChannel instance.

I thought the intent was to use the "scope" (namespaced) annotation to allow distinct ConfigMap / Secret per-namespace ? (not that the we support that fully today or anything ; ). This allows for a middle-ground where each Kafka instance is specified once without requiring every Topic in the Kafka instance to replicate it. I have't thought through that solution completely so maybe there are blockers/difficulties? Has that annotation based approach been scrapped, or are the two supposed to co-exist?

This might not work in all cases.

The intent is valid/good (multi-tenant separate Kafka instance support) and in general this approach is ok as long as there is fallback to current usage for those who don't want to have to specify the brokers every time (because they will always only have a single instance).

Agree!

@cardil
Copy link
Contributor

cardil commented Jan 27, 2022

I don't like the idea to add auth to channel object. Instead, a better approach would be to create additional CR KafkaAuthentication (or even one more KafkaAuthenticationBinding).

As @aliok mentioned, adding auth breaks option to use default channel, and creep administrative information into developer scoped type.

Putting auth in separate type allows us to keep those important functions. It also, allows to have centralized authentication model, that can be fine tuned in various ways - matching by ref, by namespace, by prefix. It's also less verbose, as you don't need to type auth in every object (sometimes hundreds of times).

@aliok
Copy link
Member

aliok commented Feb 10, 2022

The intent is valid/good (multi-tenant separate Kafka instance support) and in general this approach is ok as long as there is fallback to current usage for those who don't want to have to specify the brokers every time (because they will always only have a single instance).

Glad to hear this @travis-minke-sap, thanks for your support!

What I would want to have in the spec are at least the bootstrapServers and authsecretName/namespace.

... or should there also be a field pointing to the ConfigMap (in which case the brokers and Secret name/namespace can just stay in the ConfigMap)?

I think I would still prefer having brokers and secret name+namespace in separate fields in the spec. ...and another field pointing to a configmap. In this case, configmap wouldn't have the broker and auth info in it. Or, it can, but there would be a an order of preference for broker and secret info as:

  • spec
  • configmap pointed in the spec
  • fall back configmap (config-kafka in knative-eventing namespace)

So, let me add some YAML:

apiVersion: messaging.knative.dev/v1beta1
kind: KafkaChannel
metadata:
  name: kafka-channel
spec:
  bootstrapServers:
    - my-cluster-kafka-bootstrap.kafka:9093
  auth:
    secret:
      ref:
        name: my_secret
        # not 100% sure about this
        namespace: foo
  config:
      ref:
        name: my-config
        namespace: bar

We can fallback to my-config configmap (referenced in config field) for missing spec.bootstrapServers and spec.auth fields.
If those information also don't exist in my-config, we can fallback to the global config, config-kafka in knative-eventing.

I think this way, we can keep all channel implementations getting the fields what they need and things would still work even when users don't change anything.

About the default channel...

Changes above would still make the generic Channel resource working just like before, as things will fallback to global config.

Another thing I mentioned:

In managed service case, we wouldn't like the fallback actually. So, maybe also a way to enable/disable fallback too (probably in the configmap) ?

We can do this on the KafkaChannel implementations. Config doesn't have a strong API anyway, so the implementation doc can say "add field XYZ to prevent fallback".

@matzew
Copy link
Contributor Author

matzew commented Feb 10, 2022

+1 on the bootstrap server field

not sure here:

    secret:
      ref:
        name: my_secret
        # not 100% sure about this
        namespace: foo

I'd kick that.

Instead, let's start w/ Duck typing a spec.config, like we have on the BROKER (do we care about generic ChannelAPI?)

E.g:

...
spec:
  config:
    apiVersion: v1
    kind: ConfigMap
    name: matze-kafka-channel-config
    namespace: mwessend-dev
...

The referenced config map would follow the schema of the auth parts:

apiVersion: v1
kind: ConfigMap
metadata:
  name: matze-kafka-channel-config
  namespace: mwessend-dev
data:
  auth.secret.ref.name: kafka-sasl-secret

That follows the same syntax as in: https://knative.dev/docs/eventing/broker/kafka-broker/#security

Custom Kafka Config CRDs

In a later version, we can create those, and make the spec.config understand those too

@aliok
Copy link
Member

aliok commented Feb 10, 2022

Instead, let's start w/ Duck typing a spec.config, like we have on the BROKER (do we care about generic ChannelAPI?)

That's also an option. What I offered is consistent with KafkaSink, what you offered is consistent with KafkaBroker.
I don't have a strong opinion but, to me, every KafkaChannel implementation will need a bootstrapServer info and also auth secret info. I prefer having more and more strongly typed API fields for these, but I am fine with both.

@pierDipi
Copy link
Member

pierDipi commented Feb 10, 2022

I'd recommend following KafkaSink instead of KafkaBroker because the ConfigMap is an external object and its lifecycle management is something that becomes easily complicated and I don't see a reason for it when you can have proper spec fields.

In addition, with a ConfigMap, we also lose the ability to do validation at the webhook level (which has a nicer UX).

@matzew
Copy link
Contributor Author

matzew commented Feb 10, 2022 via email

@aliok
Copy link
Member

aliok commented Feb 11, 2022

Ok, then you agree to this one?

apiVersion: messaging.knative.dev/v1beta1
kind: KafkaChannel
metadata:
  name: kafka-channel
spec:
  bootstrapServers:
    - my-cluster-kafka-bootstrap.kafka:9093
  auth:
    secret:
      ref:
        name: my_secret
        # not 100% sure about this
        namespace: foo
  config:
      ref:
        name: my-config
        namespace: bar

@matzew
Copy link
Contributor Author

matzew commented Feb 11, 2022

IMO this looks good, except for the config fallback. Dont like that

@aliok
Copy link
Member

aliok commented Feb 11, 2022

IMO this looks good, except for the config fallback. Dont like that

I think we need the fallback because we need to be backwards compatible and we need to support the scenarios that the broker and auth info is not in the spec.

@aliok
Copy link
Member

aliok commented Feb 11, 2022

BTW, one implication of having these in spec is that reconciliation is more expensive.

  • We won't be able to cache the Sarama client and the admin client anymore.
  • We need to read the configmap referenced in the spec in every reconciliation cycle.

We can still cache the global config: config-kafka configmap.

cc @travis-minke-sap @pierDipi

@pierDipi
Copy link
Member

I think the config ref could be part of a separate discussion.

I'm imagining this high-level logic:

  • if there is bootstrapServers and/or auth use them, otherwise fall back to the same behavior as before by reading the global configmap

We won't be able to cache the Sarama client and the admin client anymore.

This is a good point but I don't think we do this even today because sarama is buggy on long-running connections, do we?

We need to read the configmap referenced in the spec in every reconciliation cycle.

That's the reason, I'd separate the 2 discussions, bootstrapServers/auth from the config.

@travis-minke-sap
Copy link
Contributor

travis-minke-sap commented Feb 14, 2022

Hey - sorry for not responding earlier - been wrapping up some internal work...

The Distributed KafkaChannel (receiver and dispatcher deployments) is currently watching a Secret (kafka-cluster), specified via environment variables, in order to dynamically re-create the Sarama clients when the SASL config changes. This is done to support automatic rotation of client passwords / tokens.

Both KafkaChannel implementations dynamically create new Sarama admin clients for every Topic creation / deletion.

If the Secret is specified in the KafkaChannel spec, then this would require control-plane -> data-plane communication from the Controller to the Receiver / Dispatchers to let them know to start using a different Secret. This could be a control-protocol type communication (in-place), or the Controller updating environment variables on the respective Deployments in order to force a rollout restart.

The main problem for the Distributed KafkaChannel though, is that there is only a single "Receiver" Deployment for the entire cluster. This instance services ALL KafkaChannels and couldn't be tied to a specific auth/config in a single KafkaChannel. Not sure how we'd address this without tackling multi-tenant namespacing.

I don't want the run-time architectural needs of the Distributed KafkaChannel to prevent you from moving forward with this more granular configuration, but we will need to present a coherent documented description of what level of configuration is supported by which components. Or have a plan to ensure all the components can support the new structure?

Maybe it's worth considering a split in the config for the new Java implementations and the existing components. I know it is too early to consider deprecating things, but sometimes dragging old baggage forward isn't worth it? Just something to stew on ; )

@aliok
Copy link
Member

aliok commented Feb 16, 2022

That's the reason, I'd separate the 2 discussions, bootstrapServers/auth from the config.

Ok, let's leave the config reference discussion here.

@aliok
Copy link
Member

aliok commented Feb 16, 2022

Thanks for detailed comments @travis-minke-sap

I don't want the run-time architectural needs of the Distributed KafkaChannel to prevent you from moving forward with this more granular configuration, but we will need to present a coherent documented description of what level of configuration is supported by which components. Or have a plan to ensure all the components can support the new structure?

We can add a validation in distributed channel webhook to reject channel resources that use the proposed fields and we can create a good documentation about the implementation differences around the fields.

Would this be an option?

@aliok
Copy link
Member

aliok commented Feb 22, 2022

Quick note:
There's another usecase: cluster admins don't want the developers to specify bootstrapServer and secret in the spec as they want to manage it themselves. So, we need something additional to enforce this.

thx @lionelvillard

@github-actions
Copy link

This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 24, 2022
@pierDipi
Copy link
Member

/remove-lifecycle stale
/triage accepted

@knative-prow knative-prow bot added triage/accepted Issues which should be fixed (post-triage) and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature-request triage/accepted Issues which should be fixed (post-triage)
Projects
Status: Ready To Work
Development

Successfully merging a pull request may close this issue.

5 participants