
Windows R: Nexus proxy broken by LetsEncrypt intermediate CA rotation #4896

@TonyWildish-BH


Symptom

R on Windows workspace VMs cannot install packages via the Nexus proxy. install.packages() fails with:

Warning: unable to access index for repository https://nexus-<tre-id>.uksouth.cloudapp.azure.com/repository/r-proxy/...
cannot open URL '...PACKAGES'
package 'X' is not available for this version of R

R on Linux workspace VMs is unaffected. The Nexus proxy URL, credentials, and network path are all correct.

Root cause

Why Windows fails

On Windows, R's libcurl uses Schannel, Windows' native TLS stack. With that backend, every HTTPS connection includes certificate revocation checking (CRL/OCSP) for the certificates in the chain. If the revocation endpoints cannot be reached, the TLS handshake is aborted with CRYPT_E_REVOCATION_OFFLINE, which R's libcurl reports as "SSL connect error".

On Linux, R's libcurl is typically built against OpenSSL, which does not check revocation by default. This is why the same Nexus proxy URL works on Linux and fails on Windows.
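Python's ssl module wraps OpenSSL, so it offers a quick way to see the OpenSSL default on a Linux VM. This is an illustration of OpenSSL's defaults, not R's code path. R goes through libcurl, whose OpenSSL backend likewise skips CRL and OCSP checks unless it is explicitly configured (for example with a CRL file). The sketch assumes Python 3 is available on the VM, and the helper name crl_checking_enabled is illustrative.

```python
import ssl


def crl_checking_enabled(ctx: ssl.SSLContext) -> bool:
    """True if the context asks OpenSSL to check CRLs (leaf only or whole chain)."""
    crl_flags = ssl.VERIFY_CRL_CHECK_LEAF | ssl.VERIFY_CRL_CHECK_CHAIN
    return bool(ctx.verify_flags & crl_flags)


# Default client context: the chain and hostname are verified,
# but no CRL check is requested.
default_ctx = ssl.create_default_context()
print("default context checks CRLs:", crl_checking_enabled(default_ctx))

# Revocation checking has to be switched on explicitly. Even then OpenSSL
# does not download CRLs by itself: they must be loaded with
# load_verify_locations(), or verification fails with
# "unable to get certificate CRL".
strict_ctx = ssl.create_default_context()
strict_ctx.verify_flags |= ssl.VERIFY_CRL_CHECK_CHAIN
print("strict context checks CRLs:", crl_checking_enabled(strict_ctx))
```

Schannel takes the opposite approach: it fetches revocation data itself and fails closed when it cannot reach the endpoints.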

Why the revocation check fails

The workspace_vm_allowed_fqdns firewall rule in the Nexus shared service (templates/shared_services/sonatype-nexus-vm/terraform/locals.tf) contained only:

r3.o.lencr.org    # OCSP for LetsEncrypt R3 (RSA) intermediate
x1.c.lencr.org    # CRL for ISRG Root X1 (intermediate revocation)

These were correct when the App Gateway certificate was signed by the R3 RSA intermediate. The certificate is now signed by the E8 ECDSA intermediate, which has different revocation endpoints:

e8.o.lencr.org    # OCSP for leaf cert (E8 is issuer)
e8.i.lencr.org    # CA Issuers / AIA (download E8 intermediate if not cached)
e8.c.lencr.org    # CRL for leaf cert (E8 is issuer)

None of the e8.* endpoints were in the firewall allow-list, so all revocation checks timed out.
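Rather than assuming which revocation hosts are needed, they can be read from the certificate that is actually served. The following is a minimal sketch that assumes Python 3 and network access to the Nexus hostname. ssl.SSLSocket.getpeercert() exposes the leaf certificate's OCSP, CA Issuers and CRL Distribution Points URLs. The helper names (fetch_peercert, revocation_hosts, missing_from_allow_list) are illustrative. The sketch only sees the leaf. Endpoints listed in the intermediate certificate itself, such as the x1.c.lencr.org CRL, are not returned and still have to be listed by hand.

```python
import socket
import ssl
from urllib.parse import urlsplit

# Keys used by ssl.SSLSocket.getpeercert() for the AIA (OCSP, CA Issuers)
# and CRL Distribution Points extensions of the server's leaf certificate.
REVOCATION_KEYS = ("OCSP", "caIssuers", "crlDistributionPoints")


def fetch_peercert(host: str, port: int = 443) -> dict:
    """Connect with certificate verification and return the decoded leaf cert."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def revocation_hosts(peercert: dict) -> set:
    """Host names a revocation-checking client may need to reach for this cert."""
    hosts = set()
    for key in REVOCATION_KEYS:
        for url in peercert.get(key, ()):
            hostname = urlsplit(url).hostname
            if hostname:
                hosts.add(hostname.lower())
    return hosts


def missing_from_allow_list(peercert: dict, allowed_fqdns: str) -> set:
    """Revocation hosts not covered by a comma-separated allow-list."""
    allowed = {h.strip().lower() for h in allowed_fqdns.split(",") if h.strip()}
    return revocation_hosts(peercert) - allowed
```

As a usage example, pass fetch_peercert() the Nexus hostname from the error above (with the real `<tre-id>`) together with the current workspace_vm_allowed_fqdns value. A non-empty result lists the hosts the firewall is still blocking.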

Why the intermediate changed

The LetsEncrypt certificate is issued by certbot inside the certs shared service Porter bundle (templates/shared_services/certs/Dockerfile.tmpl):

/opt/certbot/bin/pip install --no-cache-dir certbot   # no version pin

certbot 2.0 (released November 2022) changed the default key type for new certificate requests from RSA to ECDSA. Because certbot is unpinned, a rebuild of the certs shared service pulled in a release with the new default, and the next certificate issuance or renewal generated an ECDSA private key. LetsEncrypt then signed the certificate with the E8 ECDSA intermediate instead of R3, and the firewall rules silently became wrong.
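The intermediate can change without any change to the TRE code, so it may help to log which intermediate signed the certificate after each rebuild. The following sketch assumes a dictionary in the format returned by Python's ssl.SSLSocket.getpeercert(). The function names are illustrative. The R*/E* rule in key_family() is a heuristic based on the names in this issue (R3, E8), not a documented LetsEncrypt guarantee.

```python
from typing import Optional


def issuer_common_name(peercert: dict) -> Optional[str]:
    """Issuer CN from a getpeercert() dict, i.e. the name of the signing intermediate."""
    for rdn in peercert.get("issuer", ()):
        for attr, value in rdn:
            if attr == "commonName":
                return value
    return None


def key_family(intermediate_cn: str) -> str:
    """Heuristic only: classify by the first letter of the intermediate's name."""
    first = intermediate_cn[:1].upper()
    if first == "E":
        return "ECDSA"
    if first == "R":
        return "RSA"
    return "unknown"
```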

The letsencrypt.sh script (templates/shared_services/certs/scripts/letsencrypt.sh) also does not specify --key-type, leaving the key type entirely at certbot's discretion:

/opt/certbot/bin/certbot certonly \
    --manual \
    --preferred-challenges=http \
    ...
    # no --key-type flag

Fix

1. Update workspace_vm_allowed_fqdns

templates/shared_services/sonatype-nexus-vm/terraform/locals.tf:

workspace_vm_allowed_fqdns = "r3.o.lencr.org,x1.c.lencr.org,e8.o.lencr.org,e8.i.lencr.org,e8.c.lencr.org"

r3.o.lencr.org is retained for deployments whose certificate was issued under R3. All three e8.* endpoints must be reachable over HTTP on port 80, because OCSP, CRL and CA Issuers downloads all use plain HTTP.
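Because these lookups are plain HTTP, a quick probe from a Windows workspace VM can confirm that the rule change took effect. The following is a sketch that assumes Python 3 is installed on the VM. It sends `HEAD /` to port 80 of each allow-listed host. The function names are illustrative, and make_conn is injectable only so the logic can be exercised without a network.

```python
import http.client
from typing import Callable, Dict, List, Union


def parse_fqdns(allowed_fqdns: str) -> List[str]:
    """Split a comma-separated workspace_vm_allowed_fqdns value into host names."""
    return [h.strip() for h in allowed_fqdns.split(",") if h.strip()]


def probe_http(
    hosts: List[str],
    make_conn: Callable[..., http.client.HTTPConnection] = http.client.HTTPConnection,
    timeout: float = 5.0,
) -> Dict[str, Union[int, str]]:
    """Send a plain-HTTP HEAD / to port 80 of each host.

    The value per host is the HTTP status code, or a "no response: ..." string
    when the request failed (timeout, DNS failure, refused or reset connection).
    """
    results: Dict[str, Union[int, str]] = {}
    for host in hosts:
        conn = make_conn(host, 80, timeout=timeout)
        try:
            conn.request("HEAD", "/")
            results[host] = conn.getresponse().status
        except (OSError, http.client.HTTPException) as exc:
            results[host] = f"no response: {exc}"
        finally:
            conn.close()
    return results
```

A host that gives no response is still blocked. A status code means something answered over HTTP, but a firewall that denies a request may answer with its own error page. An unexpected status is therefore worth cross-checking in the firewall logs.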

2. Pin certbot and make key type explicit (prevents recurrence)

templates/shared_services/certs/Dockerfile.tmpl — pin certbot to a known version:

&& /opt/certbot/bin/pip install --no-cache-dir certbot==2.11.0

templates/shared_services/certs/scripts/letsencrypt.sh — make the key type explicit so behaviour does not change silently on a certbot upgrade:

/opt/certbot/bin/certbot certonly \
    --key-type ecdsa \
    --elliptic-curve secp384r1 \
    ...

Pinning to ECDSA (rather than reverting to RSA) is preferred because ECDSA keys and signatures are smaller, which makes TLS handshakes cheaper, and the updated firewall rules now cover the E8 endpoints. What matters is that the key type is declared explicitly rather than inferred from certbot's default.
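A repository check in CI could stop the pin or the flag from being dropped again. The following is a hypothetical sketch. The file paths are the ones named in this issue. The function names and regular expressions are assumptions, and they only recognise the forms shown above (`certbot==X.Y.Z` and `--key-type rsa|ecdsa`).

```python
import re
from pathlib import Path
from typing import List, Optional

# Hypothetical patterns for the two fixes: a pinned certbot install
# and an explicit --key-type flag on the certbot command line.
CERTBOT_PIN = re.compile(r"pip install\b[^\n]*\bcertbot==\d+(?:\.\d+)*")
KEY_TYPE_FLAG = re.compile(r"--key-type[ =](rsa|ecdsa)\b")


def certbot_is_pinned(dockerfile_text: str) -> bool:
    """True if some pip install line installs certbot with an exact version."""
    return CERTBOT_PIN.search(dockerfile_text) is not None


def declared_key_type(script_text: str) -> Optional[str]:
    """The value passed to --key-type, or None if the flag is absent."""
    match = KEY_TYPE_FLAG.search(script_text)
    return match.group(1) if match else None


def check_certs_bundle(repo_root: str) -> List[str]:
    """Return a list of problems found in the certs shared service sources."""
    certs = Path(repo_root) / "templates" / "shared_services" / "certs"
    problems = []
    if not certbot_is_pinned((certs / "Dockerfile.tmpl").read_text()):
        problems.append("Dockerfile.tmpl: certbot version is not pinned")
    if declared_key_type((certs / "scripts" / "letsencrypt.sh").read_text()) is None:
        problems.append("letsencrypt.sh: certbot is called without --key-type")
    return problems
```

A CI step could call check_certs_bundle() on the repository root and fail the build if the returned list is not empty.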

Broader note

The workspace_vm_allowed_fqdns list must track whichever LetsEncrypt intermediate CA signs the App Gateway certificate. LetsEncrypt periodically introduces new intermediates, so this list will need updating again when that happens. Making the key type explicit in letsencrypt.sh keeps issuance within one key-type family (ECDSA), so the intermediate can no longer change merely because certbot's default changed.
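The host names in this issue follow one pattern per intermediate: `<name>.o.lencr.org` (OCSP), `<name>.i.lencr.org` (CA Issuers) and `<name>.c.lencr.org` (CRL). The following is a hypothetical helper that applies that pattern to the Terraform value. The pattern is inferred from the hosts listed here, so a future intermediate may not publish all three. Its certificate's AIA and CRL extensions remain the authoritative source.

```python
from typing import List

# Per-intermediate host labels seen in this issue (r3.o, e8.o, e8.i, e8.c):
# o = OCSP, i = CA Issuers (AIA), c = CRL. Assumed pattern; verify per cert.
ENDPOINT_LABELS = ("o", "i", "c")


def expected_hosts(intermediate: str) -> List[str]:
    """Revocation/AIA host names for an intermediate, by the assumed pattern."""
    name = intermediate.strip().lower()
    return [f"{name}.{label}.lencr.org" for label in ENDPOINT_LABELS]


def merge_allow_list(current: str, intermediate: str) -> str:
    """Add any missing hosts for `intermediate`, keeping existing entries and order."""
    hosts = [h.strip() for h in current.split(",") if h.strip()]
    for host in expected_hosts(intermediate):
        if host not in hosts:
            hosts.append(host)
    return ",".join(hosts)
```

Applied to the original value with "E8", this reproduces the string used in the Fix section. Running it a second time leaves the value unchanged.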

This bug is independent of the TRE version; it depends only on when the certs shared service was last rebuilt.
