diff --git a/linkerd.io/content/blog/2024/0206-linkerd-and-vault.md b/linkerd.io/content/blog/2024/0206-linkerd-and-vault.md new file mode 100644 index 0000000000..b3057bfde7 --- /dev/null +++ b/linkerd.io/content/blog/2024/0206-linkerd-and-vault.md @@ -0,0 +1,668 @@ +--- +author: 'flynn' +date: 2024-02-06T00:00:00Z +title: |- + Workshop Recap: Linkerd Certificate Management with Vault +url: + /2024/02/06/linkerd-certificates-with-vault/ +thumbnail: '/uploads/2023/09/nasa-world-square.jpg' +featuredImage: '/uploads/2023/09/nasa-world-rect.jpg' +tags: [Linkerd, linkerd, "2.14", features, vault] +featured: false +--- + +{{< fig + alt="Linkerd 2.14" + title="image credit: [NASA](https://unsplash.com/@nasa?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText)" + src="/uploads/2023/09/nasa-world-rect.jpg" >}} + +_This blog post is based on a workshop that I delivered way back in September +2023(!) at Buoyant’s [Service Mesh +Academy](https://buoyant.io/service-mesh-academy). If this seems interesting, +check out the [full +recording](https://buoyant.io/service-mesh-academy/linkerd-with-external-cas-using-vault)!_ + +## Linkerd Certificate Management with Vault + +Linkerd's ability to automatically secure communications using mTLS has always +been one of its headline features. Of course, mTLS requires certificates, and +managing certificates can be very tricky: you need to generate them, rotate +them, and distribute them to all the places that need them... while still +being careful to avoid exposing any private keys. + +For many of the demos we do, we sidestep all this by letting `linkerd install` +silently generate the certificates we need, then ignoring them beyond +that. This is vaguely OK for a demo, but it's totally unacceptable for the +real world. In the real world: + +- The secret key for Linkerd's trust anchor should never be stored on the + cluster at all. 
+ +- However, you'll need access to the secret key to rotate the identity issuer + certificate, which should happen frequently. + +- Finally, your organization may require that the trust anchor certificate + must be issued by a corporate certifying authority (CA), rather than being + some self-signed monstrosity. (They might require that of the identity + issuer certificate too: in many situations, the corporate security folks + don't like delegating issuing authority, for various reasons.) + +Ultimately, the way to tackle all of these issues is to use an _external CA_ +to issue at least the trust anchor certificate. There are several ways to set +that up; in this article, we'll walk through a fairly real-world scenario: + +- We'll install Linkerd without generating any certificates by hand, and + without having Linkerd generate the certificates itself; + +- We'll have Vault running _outside_ the cluster to store keys and generate + certificates; and + +- We'll have cert-manager running _inside_ the cluster to get the things + Linkerd needs from Vault, and store them where Linkerd needs to find them. + +Note that our goal is **not** to teach you how to use Vault, in particular: +it's to show a practical, relatively low-effort way to actually use external +PKI with Linkerd to bootstrap a zero-trust environment in Kubernetes. Many +companies have existing external PKI already set up (whether with Vault or +something else); being able to make use of it without too much work is a huge +win. + +## The Setup + +In order to demo all this simply, we'll be running Kubernetes in a `k3d` +cluster. We'll run Vault in Docker to make things easy to demo, but we will +_not_ be running Vault in Kubernetes: Vault will run as a separate Docker +container that happens to be connected to the same Docker network as our `k3d` +cluster. + +The big win of this setup is that you can run it completely on a laptop with +no external dependencies. 
If you want to replicate this with a cluster in the +cloud, that's no problem: just figure out a reasonable place outside the +cluster to run your Vault instance, and make sure that both your Kubernetes +cluster and your local machine have IP connectivity to your Vault +instance. Everything else should be pretty much the same. + +The way all the pieces fit together here is more complex than normal: + +- We'll start by creating our `k3d` cluster. This will be named `pki-cluster`, + and we'll tell `k3d` to connect it to a network named `pki-network`. + +- We'll then fire up Vault in a Docker container that's also connected to + `pki-network`. (And yes, we'll use Vault in dev mode to make life easier, + but that's the only way we'll cheat in this setup.) + +- We'll then use the `vault` CLI _running on our local machine_ to configure + Vault in Docker. + +Taken together, this implies that we'll have to make sure that we can talk to +the Vault instance both from inside the Docker network and from our host +machine. This mirrors many real-world setups where your Kubernetes cluster is +on one network, but you do administration from a different network. + +### Tools of the trade + +You'll need several CLI tools for this: + +- `linkerd`, from `https://linkerd.io/2/getting-started/`; +- `kubectl`, from `https://kubernetes.io/docs/tasks/tools/`; +- `helm`, from `https://helm.sh/docs/intro/quickstart/`; +- `jq`, from `https://jqlang.github.io/jq/download/`; +- `vault`, from `https://developer.hashicorp.com/vault/docs/install`; and +- `step`, from `https://smallstep.com/docs/step-cli/installation`. + +Of course you'll also need Docker. You can get that from +`https://docs.docker.com/engine/install/`, or you can try Colima from +`https://github.com/abiosoft/colima` instead. 
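Before going any further, it can help to confirm that all of these tools are actually on your `PATH`. Here's a minimal sketch of one way to do that. The `missing_tools` function name is our own invention, and we've added `docker` and `k3d` to the list since the rest of the walkthrough uses them too.

```bash
# Print the name of every given command that is NOT found on PATH.
# (missing_tools is a hypothetical helper, not part of any of these CLIs.)
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# The tools from the list above, plus docker and k3d.
missing=$(missing_tools linkerd kubectl helm jq vault step docker k3d)

if [ -n "$missing" ]; then
  # Left unquoted on purpose so the names print on a single line.
  echo "missing tools:" $missing
else
  echo "all tools found"
fi
```

If anything shows up as missing, install it from the links above before continuing.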
+ +### Starting our `k3d` cluster + +Creating the `k3d` cluster looks horrible, but isn't that bad: + +```bash +k3d cluster create pki-cluster \ + -p "80:80@loadbalancer" -p "443:443@loadbalancer" \ + --network=pki-network \ + --k3s-arg '--disable=local-storage,traefik,metrics-server@server:*;agents:*' +``` + +(If you already have a cluster named `pki-cluster`, you'll need to delete it, +or change the name above.) + +This command looks complex, but it's actually less terrible than you might +think -- most of it is just turning off things we don't need (traefik, +local-storage, and metrics-server), and we also expose ports 80 and 443 to our +local system to make it easy to try services out. + +At this point, you should be able to run things like `kubectl get ns` or +`kubectl cluster-info` to verify that you can talk to your cluster. If not, +you'll need to figure out what's wrong and fix it. + +### Starting Vault + +We have a running `k3d` cluster, so now let's get Vault going. This is another +complex-looking command: + +```bash +docker run \ + --detach \ + --rm --name vault \ + -p 8200:8200 \ + --network=pki-network \ + --cap-add=IPC_LOCK \ + hashicorp/vault \ + server \ + -dev -dev-listen-address 0.0.0.0:8200 \ + -dev-root-token-id my-token +``` + +Breaking this down, we start with `docker run` since we want to start a +container running, and then provide a lot of parameters: + +- `--detach`: basically, run the container in the background; + +- `--rm --name vault`: remove the container when it dies, and name it "vault" + so we can find it easily later; + +- `-p 8200:8200`: expose Vault's API port to our local system; + +- `--network=pki-network`: connect to the same network as our `k3d` cluster; + and + +- `--cap-add=IPC_LOCK`: give the container the `IPC_LOCK` capability, which + Vault needs. 
+ +Next is the image name (`hashicorp/vault`), and then comes the command line +for Vault itself: + +- `server` is the (creatively named) command to run; + +- `-dev`: run Vault in developer mode; + +- `-dev-listen-address 0.0.0.0:8200`: bind on port 8200 on all interfaces + rather than just `localhost`; and + +- `-dev-root-token-id my-token`: set the dev-mode root "password" to + `my-token`, which we will use to trivially log in later. + +Once you run that, you'll have Vault running in a Docker container, hooked up +to the same network as the `pki-cluster` we started a moment ago. (Again, if +you already have a container named `vault` you'll either need to kill it or +change the name above.) + +Next up, we'll want to use the `vault` CLI on the local host to configure +Vault. We'll start by setting the `VAULT_ADDR` environment variable, so that +we don't have to include it in every command. Remember, we'll be running the +`vault` CLI on our local system, so we can just do this all using our local +shell. + +```bash +export VAULT_ADDR=http://0.0.0.0:8200/ +``` + +At this point you should be able to run `vault status` to make sure that all +is well. + +### Setting up Vault + +While this isn't a blog about how to operate Vault, we still need to configure +Vault to work the way Linkerd needs it to. We're not going to dive too deep +into the details here, but we'll talk a bit about it as we go. + +First up, we'll authenticate our `vault` CLI to the Vault server, using the +`dev-root-token-id` that we passed to the server when we started it running. + +```bash +vault login my-token +``` + +Next up, we need to enable the Vault PKI engine, so that we can work with +X.509 certificates at all, and configure its maximum allowed expiry time for +certificates. Here we're using 90 days (2160 hours). 
+ +```bash +vault secrets enable pki +vault secrets tune -max-lease-ttl=2160h pki +``` + +After that, we need to tell Vault to enable the URLs that cert-manager expects +to use when talking to Vault. + +```bash +vault write pki/config/urls \ + issuing_certificates="http://127.0.0.1:8200/v1/pki/ca" \ + crl_distribution_points="http://127.0.0.1:8200/v1/pki/crl" +``` + +Finally, cert-manager will need to present Vault with a token before Vault +will actually do things that cert-manager needs. Vault associates tokens with +_policies_, which are kind of like roles in other systems, so we'll start by +creating a policy that allows us to do anything... + +```bash +cat < +Annotations: + +Type: Opaque + +Data +==== +token: 95 bytes +``` + +## Configuring cert-manager: the Vault issuer + +Recall that Linkerd needs two certificates: + +- the _trust anchor_ is the root of the hierarchy for Linkerd; and +- the _identity issuer_ is an intermediate CA cert that must be signed by the + trust anchor. + +We've already told Vault to create the trust anchor for us: next up, we need +to configure cert-manager to create the identity issuer certificate. To do +this, cert-manager will produce a _certificate signing request_ (CSR), which +it will then hand to Vault. Vault will use the CSR to produce a signed +identity issuer certificate for cert-manager. + +To make all this happen, we use a cert-manager ClusterIssuer resource to tell +cert-manager how to talk to Vault. This ClusterIssuer needs three critical bits +of information: + +1. The access token, which we just saved in a Secret. +2. The address of the Vault server. +3. The URL path to use to ask Vault for a new certificate. For Vault, this is + `pki/root/sign-intermediate`. + +So the address of the Vault server is the missing bit at the moment: we can't +use `0.0.0.0` as we've been doing from our local host, because cert-manager +needs to talk to Vault from inside the Docker network. 
That means we need to +figure out the address of the `vault` container within that network. + +Fortunately, that's not that hard: `docker inspect pki-network` will show us +all the details of everything attached to the `pki-network`, as JSON, so we +can use `jq` to extract the single bit that we need: the `IPv4Address` +contained in the block that also has a `Name` of `vault`: + +```bash +VAULT_DOCKER_ADDRESS=$( + docker inspect pki-network \ + | jq -r '.[0].Containers | .[] | select(.Name == "vault") | .IPv4Address' \ + | cut -d/ -f1 + ) +``` + +Given the right address for Vault, we can assemble the correct YAML: + +```bash +sed -e "s/%VAULT_DOCKER_ADDRESS%/${VAULT_DOCKER_ADDRESS}/g" \ + <<EOF > /tmp/vault-issuer.yaml +apiVersion: cert-manager.io/v1 +kind: ClusterIssuer +metadata: + name: vault-issuer + namespace: cert-manager +spec: + vault: + path: pki/root/sign-intermediate + server: http://%VAULT_DOCKER_ADDRESS%:8200 + auth: + tokenSecretRef: + name: my-secret-token + key: token +EOF +``` + +(If you look at `/tmp/vault-issuer.yaml`, you'll see that the `server` element +has the correct IP address in it.) Let's go ahead and apply that, then check +to make sure it's happy. + +```bash +kubectl apply -f /tmp/vault-issuer.yaml +kubectl get clusterissuers -o wide +``` + +You should see `vault-issuer` show up with `READY` true and `STATUS` "Vault +verified", telling us that cert-manager was able to talk to Vault. + +```text +NAME READY STATUS AGE +vault-issuer True Vault verified 6s +``` + +Now that cert-manager can sign our certificates, let's go ahead and tell +cert-manager how to set things up for Linkerd. First, we'll use a Certificate +resource to tell cert-manager how to use the Vault issuer to issue our +identity issuer certificate: + +```bash +kubectl apply -f - <