Handling TLS Cert chain? #51

Open
ryancbutler opened this issue Nov 18, 2019 · 16 comments
@ryancbutler

ryancbutler commented Nov 18, 2019

Expected Behavior

I should be able to include an intermediate certificate while building a cluster, allowing a full TLS/SSL chain to be present. I am currently using Let's Encrypt for a wildcard certificate.

Actual Behavior

Only the host certificate is present in the chain; the intermediates are missing.

Steps to Reproduce

Export a PFX including all intermediate and/or root certificates.
image
Once the PFX is exported and the TFE cluster is deployed, openssl s_client -connect or the SSL Labs host checker reports chain issues, which cause problems with VCS webhooks.
image
image
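
For anyone who wants to script the check in the reproduction steps, here is a minimal sketch. It assumes you have already captured the output of openssl s_client with the -showcerts flag into a string; the function names are made up for illustration and are not part of TFE or this module. It only counts the PEM blocks the server sent, so it is a heuristic and not a validation of the chain.

```python
import re

# Matches one PEM-encoded certificate block, including its BEGIN/END markers.
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----",
    re.DOTALL,
)


def served_certificates(s_client_output):
    """Return every PEM certificate block found in the captured output, in order."""
    return PEM_CERT.findall(s_client_output)


def looks_incomplete(s_client_output):
    """Heuristic only: a publicly issued leaf served on its own usually means
    the intermediates were dropped. This does not verify any signatures."""
    return len(served_certificates(s_client_output)) < 2
```

A server that sends only the host certificate yields exactly one block, which is the symptom described above.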

@ausfestivus
Contributor

Hey @ryancbutler, I suggest you use the template available here to better flesh out the explanation of this fault.

@ryancbutler
Author

Thanks! Hope this works!

@ryancbutler
Author

I attempted to update the cert from https://servername:8800/console/settings. That updates the Replicated console cert, but it doesn't look like it does anything for the TFE app. Is there seriously no way to update a certificate?

@bnferguson
Contributor

Hi Ryan! Apologies for missing this! Updating the certificate there doesn't work the way it did in the non-HA version, as the TLS connection is now terminated at the load balancer. You'll need to update the file used in your Terraform configuration and apply that, so that it gives the load balancer the new certificate.

We keep that setting there as some folks may want to do their certificate setup differently, but I think we could benefit from a note in the HA version of the application as it is quite confusing!

@ryancbutler
Author

Ok, updating the cert makes sense. Thanks! What about the chain issue @bnferguson ?

@bnferguson
Contributor

Sure! I believe you need to export the entire cert chain for the cert before converting it to PFX, as outlined here: https://docs.microsoft.com/en-us/azure/app-service/configure-ssl-certificate#upload-a-private-certificate. Not the whole thing, but the merging and exporting bits as prep.

@ryancbutler
Author

Yep, I verified the PFX contains the entire chain before running (see above). Something along the way grabs only the main cert and not the entire chain.

@bnferguson
Contributor

Hmm, interesting. When you see this, is it when connecting to the server or when the server is reaching out (say, when doing a run or setting up VCS connections)? With the latter you may need to add a CA bundle to the Replicated console, though I'd expect the Let's Encrypt cert to work without it.

@ryancbutler
Author

I first noticed it when attempting to configure the VCS, since GitHub didn't like the missing cert. After checking the actual server (443 and 8800), I noticed it was missing. At one point I tried adding it to the CA bundle just for troubleshooting, without any luck.

@bnferguson
Contributor

@ryancbutler Been working out some other Azure issues for the last week. Planning on getting to this this week to try to reproduce this.

@bnferguson
Contributor

Sorry for the delay here; I had some other pressing bugs and then the holidays hit. I jumped on this first thing and have reproduced the issue. It's quite baffling! The PFX has the full cert chain, but when it gets served it seems to have been stripped down to just the final cert without the intermediate.

I'm also working on the 0.12 conversion, so as I do that I'll be looking at how we put together some of the cert handling to see what might be causing this. I'm also asking around, as I've not seen a cert do this before. (Well, I have, but it was more like someone forgot to add the line that included the intermediate certs. When they're bundled together as they are in a PFX, or even in some of my experiments with a full chain in the cert, I would expect that not to be possible.)

Anyway, just wanted to give an update that this is still something I'm looking at!

@bnferguson bnferguson self-assigned this Jan 8, 2020
@bnferguson
Contributor

Oh, and on the webhooks side of things, there is a workaround: disable SSL verification on the GitHub side (https://github.com/[owner]/[repository]/settings/hooks, then look for the hook to the TFE install). It's sub-optimal but it gets things working.

I had no issues with the OAuth/org setup in my reproduction.

@bnferguson
Contributor

I have tracked this issue down to how Azure's waagent decodes PFX files from the Key Vault. Apparently it's a known behavior that it places only the root and the leaf, skipping any intermediates. This only happens when you add the certificate to the Vault as a certificate as opposed to, say, as a secret.

If we go this route, we'll probably change the interface to take a cert and a key like we do with other cloud providers (instead of PFX) and rely less on Azure's method of placing certs on servers.

We're reworking the interfaces of the modules to be much easier to work with along with the 0.12 upgrade, and I think we may fix this issue with that since it'd be changing the interface.
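
To make the failure mode concrete, here is a rough sketch of how a "root and leaf, no intermediate" result shows up as a gap in the chain. It is not how waagent or TFE works internally. The subject and issuer names are hypothetical, and comparing name strings is a simplification: real validation checks signatures, not just names.

```python
def find_chain_gaps(chain):
    """chain: list of (subject, issuer) tuples in served order, leaf first.

    Returns the positions where a certificate's issuer is not the subject
    of the next certificate, i.e. where an intermediate appears to be missing.
    """
    gaps = []
    for position in range(len(chain) - 1):
        _, issuer = chain[position]
        next_subject, _ = chain[position + 1]
        if issuer != next_subject:
            gaps.append(position)
    return gaps


# Hypothetical names, for illustration only.
full_chain = [
    ("CN=*.example.com", "CN=Example Intermediate CA"),
    ("CN=Example Intermediate CA", "CN=Example Root CA"),
    ("CN=Example Root CA", "CN=Example Root CA"),
]
# The situation described in this comment: root and leaf placed,
# intermediate skipped.
leaf_and_root_only = [full_chain[0], full_chain[2]]

print(find_chain_gaps(full_chain))          # []
print(find_chain_gaps(leaf_and_root_only))  # [0]
```

A client that does not already hold the intermediate cannot bridge the gap at position 0, which would be consistent with the chain errors reported earlier in the thread.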

@pearcec

pearcec commented Jan 21, 2020

I'm not sure whether I'm facing the same or a similar issue, and I'm looking for advice on next steps. We want to deploy internally, but I was willing to try a public IP in order to test the deployment. Further, we use an internal CA that has an intermediate, and it doesn't look like I can specify the root CA. Also, do I need a wildcard cert? I couldn't find documentation on this other than this thread.

Basically, the install seems to be stalled. I think the health probes from the LB are failing because of the SSL handshake, so the installer doesn't continue.

from:tail -f apiserver
{"log":"I0121 18:10:03.498299 1 log.go:172] http: TLS handshake error from 168.63.129.16:58994: EOF\n","stream":"stderr","time":"2020-01-21T18:10:03.498488441Z"}

from:systemctl status kubelet
Jan 21 18:12:01 tfe-iofnjq38-primary-0 kubelet[9944]: E0121 18:12:01.720811 9944 kubelet.go:2248] node "tfe-iofnjq38-primary-0" not found

Is there a Terraform module for an individual deployment in Azure?

@pearcec

pearcec commented Jan 21, 2020

I did find https://www.terraform.io/docs/enterprise/before-installing/index.html#tls-certificate-and-private-key about using a wildcard -- I also now see information about the CA bundle and private CAs. I will take a look at this.

@pearcec

pearcec commented Jan 22, 2020

I deployed with the private bundle and it picked that up; I could see the log in /tmp/ptfe-customer-certs/. I am confused as to why the load balancer still isn't working. Shouldn't the primary respond with my certificate on port 6443? I still get the error messages from #51 (comment).

root@tfe-810jzzx7-primary-0:/var/log# wget https://10.134.34.6:6443/
--2020-01-22 14:36:20--  https://10.134.34.6:6443/
Connecting to 10.134.34.6:6443... connected.
ERROR: cannot verify 10.134.34.6's certificate, issued by ‘CN=kubernetes’:
  Unable to locally verify the issuer's authority.
To connect to 10.134.34.6 insecurely, use `--no-check-certificate'.
root@tfe-810jzzx7-primary-0:/var/log# openssl s_client -connect 10.134.34.6:6443
CONNECTED(00000003)
depth=0 CN = kube-apiserver
verify error:num=20:unable to get local issuer certificate
verify return:1
depth=0 CN = kube-apiserver
verify error:num=21:unable to verify the first certificate
verify return:1
---
Certificate chain
 0 s:/CN=kube-apiserver
   i:/CN=kubernetes
---
Server certificate
-----BEGIN CERTIFICATE-----
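
For reading output like the above programmatically, here is a small sketch that pulls the subject and issuer pairs out of the "Certificate chain" section. It assumes the s:/i: line layout shown in this transcript; the function name is made up for illustration.

```python
import re

# Matches the " 0 s:/CN=..." subject lines and the "   i:/CN=..." issuer
# lines of the "Certificate chain" section, in the layout shown above.
SUBJECT_LINE = re.compile(r"^\s*(\d+)\s+s:(.+)$")
ISSUER_LINE = re.compile(r"^\s*i:(.+)$")


def parse_chain_summary(s_client_output):
    """Return a list of (depth, subject, issuer) tuples, in served order."""
    entries = []
    pending = None  # (depth, subject) waiting for its issuer line
    for line in s_client_output.splitlines():
        subject_match = SUBJECT_LINE.match(line)
        if subject_match:
            pending = (int(subject_match.group(1)), subject_match.group(2).strip())
            continue
        issuer_match = ISSUER_LINE.match(line)
        if issuer_match and pending is not None:
            entries.append((pending[0], pending[1], issuer_match.group(1).strip()))
            pending = None
    return entries


# The chain section copied from the transcript above.
captured = """Certificate chain
 0 s:/CN=kube-apiserver
   i:/CN=kubernetes
---
"""
print(parse_chain_summary(captured))
# [(0, '/CN=kube-apiserver', '/CN=kubernetes')]
```

The single entry matches what the transcript shows: one certificate with subject CN=kube-apiserver, issued by CN=kubernetes.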
