PROXY protocol support for internal-to-LoadBalancer traffic for Kubernetes Ingress users, specifically for cert-manager self-checks.
If you've had problems with ingress-nginx, cert-manager, LetsEncrypt ACME HTTP01 self-check failures, and the PROXY protocol, read on.
Note that these instructions were written for upstream; you'll need to adapt them for this fork, sorry. PRs welcome.
kubectl apply -f https://raw.githubusercontent.com/q-m/hairpin-proxy/v0.5.2/deploy.yml
If you're using ingress-nginx and cert-manager, it will work out of the box. See detailed installation and testing instructions below.
If you run a service behind a load balancer, your downstream server will see all connections as originating from the load balancer's IP address. The user's source IP address will be lost and will not be visible to your server. To solve this, the PROXY protocol preserves source addresses on proxied TCP connections by having the load balancer prepend a simple string such as "PROXY TCP4 255.255.255.255 255.255.255.255 65535 65535\r\n" at the beginning of the downstream TCP connection.
Because this injects data at the application-level, the PROXY protocol must be supported on both ends of the connection. Fortunately, this is widely supported already:
- Load balancers such as AWS ELB, AWS NLB, DigitalOcean Load Balancers, GCP Cloud Load Balancing, and Linode NodeBalancers support adding the PROXY protocol line to their downstream TCP connections.
- Web servers such as Apache, Caddy, Lighttpd, and NGINX support receiving the PROXY protocol line and use the passed source IP for access logging and for passing it to the application server in an X-Forwarded-For HTTP header, where it can be accessed by your backend.
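To make the PROXY line described above concrete, here is a minimal Python sketch that builds a version 1 (human-readable) header and puts it in front of an HTTP request, the way a load balancer would. The function name `proxy_v1_header` and the example addresses are illustrative assumptions, not part of hairpin-proxy.

```python
import ipaddress


def proxy_v1_header(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> bytes:
    """Build a PROXY protocol v1 header line (hypothetical helper for illustration)."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    if src.version != dst.version:
        raise ValueError("source and destination must use the same address family")
    for port in (src_port, dst_port):
        if not 0 <= port <= 65535:
            raise ValueError(f"port out of range: {port}")
    family = "TCP4" if src.version == 4 else "TCP6"
    # The header is a single ASCII line terminated by CRLF.
    return f"PROXY {family} {src} {dst} {src_port} {dst_port}\r\n".encode("ascii")


# The load balancer sends the header first, then the client's bytes unchanged.
header = proxy_v1_header("203.0.113.7", "10.0.0.5", 51234, 80)
request = header + b"GET / HTTP/1.1\r\nHost: subdomain.example.com\r\n\r\n"
```

The web server reads the first line to learn the client address, then treats the rest of the stream as ordinary HTTP.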
If you configure both your load balancer and web server to send/accept the PROXY protocol, everything just works! Until...
In this case, Kubernetes networking is too smart for its own good. See upstream Kubernetes issue
An ingress controller service deploys a LoadBalancer, which is provisioned by your cloud provider. Kubernetes notices the LoadBalancer's external IP address. As an "optimization", kube-proxy on each node writes iptables rules that rewrite all outbound traffic to the LoadBalancer's external IP address to instead be redirected to the cluster-internal Service ClusterIP address. If your cloud load balancer doesn't modify the traffic, then indeed this is a helpful optimization.
However, when you have the PROXY protocol enabled, the external load balancer does modify the traffic, prepending the PROXY line before each TCP connection. If you connect directly to the web server internally, bypassing the external load balancer, then it will receive traffic without the PROXY line. In the case of ingress-nginx with use-proxy-protocol: "true", you'll find that NGINX fails when receiving a bare GET request. As a result, accessing http://subdomain.example.com/ from inside the cluster fails!
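The failure can be illustrated with a small Python sketch of a strict receiver. This is not NGINX's implementation; `split_proxy_v1` is a hypothetical helper that only shows why a connection starting with a bare GET is rejected once the PROXY line is mandatory.

```python
from typing import Tuple


def split_proxy_v1(data: bytes) -> Tuple[str, bytes]:
    """Return (client_ip, remaining_bytes); raise ValueError if the PROXY line is missing.

    Simplified sketch: the "PROXY UNKNOWN" form is not handled.
    """
    if not data.startswith(b"PROXY "):
        raise ValueError("expected a PROXY protocol line")
    line, sep, rest = data.partition(b"\r\n")
    if not sep:
        raise ValueError("PROXY line is not terminated by CRLF")
    # Expected fields: PROXY <TCP4|TCP6> <src ip> <dst ip> <src port> <dst port>
    fields = line.decode("ascii").split(" ")
    if len(fields) != 6 or fields[1] not in ("TCP4", "TCP6"):
        raise ValueError("malformed PROXY line")
    return fields[2], rest


# Traffic that passed through the external load balancer carries the PROXY line.
external = b"PROXY TCP4 203.0.113.7 10.0.0.5 51234 80\r\nGET / HTTP/1.1\r\n\r\n"
client_ip, http = split_proxy_v1(external)

# Traffic rewritten by kube-proxy reaches the server without it.
internal = b"GET / HTTP/1.1\r\n\r\n"
try:
    split_proxy_v1(internal)
    bare_get_accepted = True
except ValueError:
    bare_get_accepted = False
```

Here the first call yields the client address and the untouched HTTP request, while the second is rejected before any HTTP parsing happens.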
This is particularly a problem when using cert-manager for provisioning SSL certificates. Cert-manager uses HTTP01 validation, and before asking LetsEncrypt to hit http://subdomain.example.com/.well-known/acme-challenge/some-special-code, it tries to access this URL itself as a self-check. This fails. Cert-manager does not allow you to skip the self-check. As a result, your certificate is never provisioned, even though the verification URL would be perfectly accessible externally. See upstream cert-manager issues: "proxy_protocol mode breaks HTTP01 challenge Check stage", "http-01 self check failed for domain", "Self check always fail".
There are several ways to solve this problem:
- Modify Kubernetes to not rewrite the external IP address of a LoadBalancer.
- Modify nginx to treat the PROXY line as optional.
- Modify cert-manager to add the PROXY line on its self-check.
- Modify cert-manager to bypass the self-check.
None of these are particularly easy without modifying upstream packages, and the upstream maintainers don't seem eager to address the reported issues linked above.
- hairpin-proxy intercepts and modifies cluster-internal DNS lookups for hostnames that are served by your ingress controller, pointing them to the IP of an internal hairpin-proxy-haproxy service instead. (This DNS redirection is managed by hairpin-proxy-controller, which simply polls the Kubernetes API for new/modified Ingress resources, examines their spec.tls.hosts, and updates the CoreDNS ConfigMap when necessary.)
- The internal hairpin-proxy-haproxy service runs a minimal HAProxy instance which is configured to prepend the PROXY line and forward the traffic on to the internal ingress controller.
As a result, when pods in your cluster (such as cert-manager) try to access http://your-site/, they resolve to the hairpin-proxy, which adds the PROXY line and sends the traffic to your ingress-nginx. NGINX parses the PROXY protocol just as it would if the traffic had come from an external load balancer, so it sees a valid request and handles it identically to external requests.
Let's suppose that http://subdomain.example.com/ is served from your cluster, behind a cloud load balancer with PROXY protocol enabled, and served by ingress-nginx. You've just tried to add cert-manager but found that your certificates are stuck because the self-check is failing.
Get a shell within your cluster and try to access the site to confirm that it isn't working:
kubectl run my-test-container --image=alpine -it --rm -- /bin/sh
apk add bind-tools curl
dig subdomain.example.com
curl http://subdomain.example.com/
curl http://subdomain.example.com/ --haproxy-protocol
The dig should show the external load balancer IP address. The first curl should fail with "Empty reply from server" because NGINX expects the PROXY protocol. However, the second curl with --haproxy-protocol should succeed, indicating that despite the external-appearing IP address, the traffic is being rewritten by Kubernetes to bypass the external load balancer.
Step 1: Install hairpin-proxy in your cluster
kubectl apply -f https://raw.githubusercontent.com/q-m/hairpin-proxy/v0.5.2/deploy.yml
Note that this hairpin-proxy fork discovers ingress controllers and sets TARGET_SERVER automatically.
Usually, CoreDNS listens on port 53, but there are cases where it listens on another port. In that case, set the environment variable COREDNS_PORT accordingly.
kubectl edit -n hairpin-proxy deployment hairpin-proxy-haproxy
# Within spec.template.spec.containers[0], add something like:
env:
  - name: COREDNS_PORT
    value: '8053'
Step 2: Confirm that your CoreDNS configuration was updated
kubectl get configmap -n kube-system coredns -o=jsonpath='{.data.Corefile}'
Once the hairpin-proxy-controller pod starts, you should immediately see one rewrite line per TLS-enabled ingress host, such as:
rewrite name subdomain.example.com hairpin-proxy.hairpin-proxy.svc.cluster.local # Added by hairpin-proxy
Note that the comment # Added by hairpin-proxy is used to prevent hairpin-proxy-controller from modifying any other rewrites you may have.
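The role of the marker comment can be sketched in Python as follows. This is an illustration under stated assumptions, not the controller's actual code: `update_corefile` is a hypothetical function, and inserting the rewrite lines directly after the first line ending in `{` is an assumed placement.

```python
from typing import List

MARKER = "# Added by hairpin-proxy"
TARGET = "hairpin-proxy.hairpin-proxy.svc.cluster.local"


def update_corefile(corefile: str, hosts: List[str], target: str = TARGET) -> str:
    """Replace marker-tagged rewrite lines; leave every other line untouched."""
    # Drop only the lines this tool added earlier, identified by the marker.
    kept = [line for line in corefile.splitlines() if MARKER not in line]
    rewrites = [f"    rewrite name {host} {target} {MARKER}" for host in sorted(set(hosts))]
    out: List[str] = []
    inserted = False
    for line in kept:
        out.append(line)
        # Assumed placement: right after the first server block opener, e.g. ".:53 {".
        if not inserted and line.rstrip().endswith("{"):
            out.extend(rewrites)
            inserted = True
    if not inserted:
        raise ValueError("no server block found in Corefile")
    return "\n".join(out) + "\n"


corefile = (
    ".:53 {\n"
    "    errors\n"
    "    rewrite name foo.example bar.svc.cluster.local\n"
    f"    rewrite name old.example.com {TARGET} {MARKER}\n"
    "    forward . /etc/resolv.conf\n"
    "}\n"
)
updated = update_corefile(corefile, ["subdomain.example.com"])
```

In this example the stale `old.example.com` entry is dropped, the hand-written `foo.example` rewrite survives, and running the function again on its own output changes nothing.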
Step 3: Confirm that your DNS has propagated and that HTTP now works from containers in your cluster
kubectl run my-test-container --image=alpine -it --rm -- /bin/sh
# In the container shell:
apk add bind-tools curl
dig subdomain.example.com
dig hairpin-proxy.hairpin-proxy.svc.cluster.local
curl http://subdomain.example.com/
This time, the first dig should show an internal service IP address (generally 10.x.y.z), matching the second dig. This time, the curl should succeed.
NOTE: CoreDNS is a cache, so even if you see the rewrite rules in Step 2, it will take another minute or two before the queries resolve correctly. Be patient. You may wish to run watch -n 1 dig subdomain.example.com to see when this changeover happens.
At this point, cert-manager's self-check will pass, and you'll get valid LetsEncrypt certificates within a few minutes.
Note that the CoreDNS rewrites above only cover access within containers, while the iptables rewrite applies to the Node itself. This mismatch causes a problem if your node itself needs to access something behind your ingress. An example is if you're hosting your own container registry with trow and it's behind the ingress. If you follow only steps 1-3 above, you'll experience image pull failures because the Docker daemon (running on the Node directly, not in a container) can't access your registry.
To resolve this, we need to rewrite the DNS on the Node itself. The Node does not use CoreDNS, so we can instead rewrite /etc/hosts to point to the IP address of the hairpin-proxy-haproxy service. The rewriter runs as a DaemonSet, so that it can modify each Node's copy of /etc/hosts.
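The /etc/hosts rewrite can be pictured with the following Python sketch. It is only an illustration: `update_hosts_file` is a hypothetical function, the service IP is made up, and tagging managed lines with the same marker comment used for CoreDNS is an assumption about how such entries could be tracked.

```python
from typing import List

MARKER = "# Added by hairpin-proxy"


def update_hosts_file(content: str, service_ip: str, hosts: List[str]) -> str:
    """Return hosts-file content with managed entries pointing at service_ip."""
    # Keep every line that was not added by a previous run.
    kept = [line for line in content.splitlines() if not line.rstrip().endswith(MARKER)]
    # Append one tagged entry per ingress hostname.
    added = [f"{service_ip}\t{host}\t{MARKER}" for host in sorted(set(hosts))]
    return "\n".join(kept + added) + "\n"


original = (
    "127.0.0.1\tlocalhost\n"
    f"10.0.0.9\tregistry.example.com\t{MARKER}\n"
)
# 10.96.0.42 stands in for the ClusterIP of the hairpin-proxy-haproxy service.
updated = update_hosts_file(original, "10.96.0.42", ["registry.example.com"])
```

Entries without the marker, such as `localhost`, are left alone, so the Node's own configuration is preserved.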
To install this DaemonSet:
kubectl apply -f https://raw.githubusercontent.com/q-m/hairpin-proxy/v0.5.2/deploy-etchosts-daemonset.yml
Note: this DaemonSet is untested with this fork.