Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DPE-6371] Add TLS support #4

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open

[DPE-6371] Add TLS support #4

wants to merge 5 commits into from

Conversation

imanenami
Copy link
Contributor

Changes:

  • Adds TLS support on the REST interface.
  • Adds TLS support on the kafka-client interface

Note: There is an issue with kafka-operator when TLS is enabled on kafka-client interface, reported here, which affects this PR. once that issue is fixed, the kafka-client + TLS should work properly on this charm.

@imanenami imanenami changed the base branch from main to dpe-6369 January 27, 2025 13:32
@imanenami imanenami force-pushed the dpe-6371 branch 2 times, most recently from 9a95f78 to 4498bd3 Compare January 27, 2025 14:03
@imanenami imanenami marked this pull request as ready for review January 27, 2025 14:03
@imanenami imanenami requested a review from deusebio January 27, 2025 14:04
Copy link
Collaborator

@deusebio deusebio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks very very good! I feel that we really do have a lot of re-usable code and patterns for TLS that we can use across the board. Reviewing these PRs is becoming almost boring :P I believe the next step here would really be to decouple the classes into a common class, and then implement the specifics bit in a subclass. It is similar to a comment I had to a PR in Spark, see here. Maybe you can pair between teams to discuss this

Also by taking inspiration from the PR in Spark, I only have a small point to add a "direct" check for the certificate in the integration tests.

src/core/models.py Outdated Show resolved Hide resolved
src/literals.py Outdated Show resolved Hide resolved
tests/integration/test_tls.py Show resolved Hide resolved
@imanenami imanenami requested a review from deusebio January 28, 2025 10:53
Copy link
Collaborator

@deusebio deusebio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! thank you for addressing the comments

Copy link

@marcoppenheimer marcoppenheimer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Outstanding work - though I think you're a victim of success here. I mentioned in a comment already, but I think that a lot of the small tasteful changes you've made here, really should be ported back to Kafka. I think this repo is shaping up to be the example that we should follow in Kafka.

src/core/models.py Show resolved Hide resolved
src/events/kafka.py Show resolved Hide resolved
src/events/tls.py Outdated Show resolved Hide resolved
elif self.context.ready:
self.charm._set_status(Status.SERVICE_NOT_RUNNING)
else:
self.charm._set_status(self.context.status)

def _on_config_changed(self, event: ConfigChangedEvent):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: Here you want to add something similar to what we have in Kafka, to ensure that we request new certificates when our SANs change (if IP changed during a pod reschedule, for example). Example from Kafka:

        current_sans = self.tls_manager.get_current_sans()
        if not (properties and zk_jaas):
            # Event fired before charm has properly started
            event.defer()
            return
        current_sans_ip = set(current_sans["sans_ip"]) if current_sans else set()
        expected_sans_ip = set(self.tls_manager.build_sans()["sans_ip"]) if current_sans else set()
        sans_ip_changed = current_sans_ip ^ expected_sans_ip
        current_sans_dns = set(current_sans["sans_dns"]) if current_sans else set()
        expected_sans_dns = (
            set(self.tls_manager.build_sans()["sans_dns"]) if current_sans else set()
        )

        sans_dns_changed = (current_sans_dns ^ expected_sans_dns) - {
            # we omit 'kafka/{unit_id}' and 'kafka' here to avoid a bug with Digicert not supporting '/' characters in SANs
            # Digicert truncates the 'kafka/{unit_id}' to just 'kafka'
            # i.e don't assume we need new certs if 'diff' includes those value, as these SANs aren't typically used anyway
            self.charm.state.unit_broker.unit.name,
            self.charm.state.cluster.app.name,
        }

        # update environment
        self.config_manager.set_environment()

    
          
            
    

          
          Expand Down
    
    
  
        self.charm.unit.set_workload_version(self.workload.get_version())
        if sans_ip_changed or sans_dns_changed:
            logger.info(
                (
                    f'Broker {self.charm.unit.name.split("/")[1]} updating certificate SANs - '
                    f"OLD SANs IP = {current_sans_ip - expected_sans_ip}, "
                    f"NEW SANs IP = {expected_sans_ip - current_sans_ip}, "
                    f"OLD SANs DNS = {current_sans_dns - expected_sans_dns}, "
                    f"NEW SANs DNS = {expected_sans_dns - current_sans_dns}"
                )
            )
            self.charm.tls.certificates.on.certificate_expiring.emit(
                certificate=self.charm.state.unit_broker.certificate,
                expiry=datetime.now().isoformat(),
            )  # new cert will eventually be dynamically loaded by the broker
            self.charm.state.unit_broker.update(
                {"certificate": ""}
            )  # ensures only single requested new certs, will be replaced on new certificate-available event
return # early return here to ensure new node cert arrives before updating advertised.listeners

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That paste was awful. It's in canonical/kafka-operator#297

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question how would the SANs change here? I mean, pod rescheduling would always re-request certificates I would think, no? it is like a new unit coming up live... We don't have extra listeners / sans or load-balancing yet, right?

Having said this, probably have the code to be already structured when the IP may change is not bad, I'm just wondering whether we would already have a use-case (and therefore a test for this)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To test the logic, I've already added a unit test. We could add an integration test by simulating an IP change with lxc config device add {machine_name} eth0 none and then lxc config device remove {machine_name} eth0 but I think it's more applicable to k8s ...

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the pod reschedules, it's still the same 'unit', so it would grab the existing certs from relation data. But the IP would be different now.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though come to think of it, we should probably add the expose-external logic here. I'll put it in the next sprint.

src/managers/tls.py Show resolved Hide resolved
src/managers/tls.py Show resolved Hide resolved
@imanenami imanenami changed the base branch from dpe-6369 to main January 30, 2025 12:11
@imanenami
Copy link
Contributor Author

@deusebio @marcoppenheimer
I had to rebase on main and sign the commits, and I have made the necessary changes for supporting SANs change detection. I'll really appreciate a re-review. The changes after your first approval is on this commit: a0de51c

src/managers/tls.py Show resolved Hide resolved
Copy link
Collaborator

@deusebio deusebio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Great job!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants