Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Exception when hot-reloading Let's Encrypt certs issued by alternate intermediate CAs #4509

Open
perios101 opened this issue Jul 1, 2024 · 13 comments · May be fixed by #4752 or parislarkins/security#1
Assignees
Labels
bug Something isn't working triaged Issues labeled as 'Triaged' have been reviewed and are deemed actionable.

Comments

@perios101
Copy link

What is the bug?
As part of the process using the REST API to hot reload transport and http certificates, we receive a 500 error.

The old certificate (and all previous ones to date) has been issued by Let's Encrypt's R3 CA (which seems to have been alive and well as DR CA) and the new one by R11 CA.
Same scenario is happening on an environment where intermediate CA changed from R11 to R10. Both are covered by parent ISRG Root X1 CA (along with the old R3).

Restarted individual pods automatically pick up new certificate as valid without issue however hot-reloading existing instances throws the same error over and over despite both CAs are Let's Encrypt issuers covered by root CA.

How can one reproduce the bug?
Steps to reproduce the behavior:

  1. Configure and use OpenSearch using Let's Encrypt certificate
  2. Renew certificate (using different intermediate CA to the original one
  3. Hot-reload certificates using REST API
  4. See error

What is the expected behavior?
Certificate validation should succeed as both certificates were issued via valid CA - pinning to same CA as the old one is not considered good practice, especially in DR scenarios (yes, Issuers have those too).

What is your host/environment?

  • OS: Amazon Linux (containerised for Kubernetes)
  • Version: OpenSearch 2.14.1
  • Plugins:

Do you have any screenshots?
Exception:
java.io.IOException: OpenSearchSecurityException[Error while initializing transport SSL layer from PEM: java.lang.Exception: New Certs do not have valid Issuer DN, Subject DN or SAN.]; nested: Exception[New Certs do not have valid Issuer DN, Subject DN or SAN.];

Do you have any additional context?

  • From digging into the Security plugin code, it appears logic HERE is the culprit.
  • We have the same issue across three different environments (affects also OpenSearch Dashboards - using same security plugin).
  • Workaround (restarting pods) exists but it's not scalable / fit as a long-term solution.
@perios101 perios101 added bug Something isn't working untriaged Require the attention of the repository maintainers and may need to be prioritized labels Jul 1, 2024
@shikharj05
Copy link
Contributor

Thanks for filing the issue, I see a previous issue #2360 where this was reported and being worked on. @shubhmkr-amazon are you still working on this one?

@shubhmkr-amazon
Copy link

Hi @shikharj05,
Yes, I am still working on this.

@stephen-crawford
Copy link
Contributor

[Triage] HI @perios101 thank you for filing this issue. Looks like a similar bug is still being worked on. I am going to assign this to @willyborankin since he has expressed interest in taking a look at this in the time being.

@stephen-crawford stephen-crawford added triaged Issues labeled as 'Triaged' have been reviewed and are deemed actionable. and removed untriaged Require the attention of the repository maintainers and may need to be prioritized labels Jul 8, 2024
@tan3-netapp
Copy link

We encountered the same issue after Let's Encrypt's replaced its intermediate certs to R10/ R11. We still have the same workaround method: restart all nodes to pick up the new cert, which seems unscalable for a large number of nodes and could impact clusters without proper replication configuration.

@tan3-netapp
Copy link

Hi @shubhmkr-amazon and @willyborankin,
Could I get to know your progress on this issue? I could give you a hand if necessary.

@shubhmkr-amazon
Copy link

Hi @tan3-netapp, Earlier, I made the changes but got stuck in the unit tests due to self-signed certificates used in testing. I started working on this PR again and will try to raise it by August 29th. In case @willyborankin has made some progress, he can take the lead on this PR.

Thanks

@willyborankin
Copy link
Collaborator

Hi @tan3-netapp, the issue has been resolved in this PR: #4624. It has been merged into both the main and 2.x branches and will be included in the 2.17 release.

@shubhmkr-amazon
Copy link

Hi @willyborankin, it looks like PR: #4624 is solving a different problem. In the mentioned issue, the error is coming because of a change in the intermediate CA. Here, during the hot-reload, the issuer DN of the in-memory certificate will fail to match the new certificate in case the ICA has changed.

@tan3-netapp
Copy link

Hi @shubhmkr-amazon,
I would appreciate it if I could have a look at your working PR, just out of curiosity.

@parislarkins
Copy link

Hi @shubhmkr-amazon / @willyborankin , are either of you still working on a potential fix for this issue?

It is still causing us a fair bit of pain, so I was thinking of adding two new bool configs that could be set to false to skip the DN verification (https://github.com/opensearch-project/security/blob/main/src/main/java/org/opensearch/security/ssl/DefaultSecurityKeyStore.java#L656), for example:

  • plugins.security.ssl.transport.enforce_cert_reload_dn_verification
  • plugins.security.ssl.http.enforce_cert_reload_dn_verification

How does that sound as a potential approach?

@shubhmkr-amazon
Copy link

Hi @parislarkins, the approach you suggested of bypassing the issuerDN check is not ideal. In this case, a self-signed certificate can also be renewed, which is not trusted. I am working on a fix that will perform issuer DN validation by building a chain of trust with an in-memory trust store(However, there will be one limitation with this approach: what if the root certificate itself gets rotated?).

@parislarkins
Copy link

Hi @shubhmkr-amazon, thanks for getting back to me!

While the point you make about self-signed certificates is true, I'm unsure why this is a particular concern given that there is no such validation when OpenSearch is restarted.

My understanding is that rotating certs (either by a full restart or hot-reload) requires updating the certs on disk and then being a super admin or having the ability to restart OpenSearch. If the concern is that a malicious actor could trigger a hot-reload to change to a self-signed cert, then I think it likely that they would have the ability to restart OpenSearch as well, given they must have access to the OpenSearch server in order to update the cert on disk.

Would you be able to clarify what the concern is with being able to hot-reload to a potentially self-signed cert (on an opt-in basis), given this is allowed by restarting OpenSearch?

Thanks!

@parislarkins
Copy link

I've raised #4752 as a potential fix for this issue, to allow disabling the Distinguished Names validation when performing a hot-reload. In our case we do not think the validation is providing us additional security value, so are happy to disable it. There may be other use cases where this is not appropriate though, so the fix suggested by @shubhmkr-amazon may be the best option there, even if it doesn't work for root CA rotation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working triaged Issues labeled as 'Triaged' have been reviewed and are deemed actionable.
Projects
None yet
7 participants