May 23 00:00:01 monitoring.infra.rust-lang.org systemd[1]: Starting Renew SSL certificates...
May 23 00:00:03 monitoring.infra.rust-lang.org renew-ssl-certs[2866]: 2022/05/23 00:00:03 [INFO] [grafana.rust-lang.org] acme: Trying renewal with 647 hours remaining
May 23 00:00:03 monitoring.infra.rust-lang.org renew-ssl-certs[2866]: 2022/05/23 00:00:03 [INFO] [grafana.rust-lang.org] acme: Obtaining bundled SAN certificate
May 23 00:00:09 monitoring.infra.rust-lang.org renew-ssl-certs[2866]: 2022/05/23 00:00:09 acme: error: 500 :: POST :: https://acme-v02.api.letsencrypt.org/acme/new-order :: urn:ietf:params:acme:error:serverInternal :: Error creating new order, url:
May 23 00:00:09 monitoring.infra.rust-lang.org systemd[1]: renew-ssl-certs.service: Main process exited, code=exited, status=1/FAILURE
May 23 00:00:09 monitoring.infra.rust-lang.org systemd[1]: renew-ssl-certs.service: Failed with result 'exit-code'.
May 23 00:00:09 monitoring.infra.rust-lang.org systemd[1]: Failed to start Renew SSL certificates.
My guess is this is an upstream spurious failure of some kind -- presumably one-off and resolvable by retrying. Let's Encrypt doesn't note any problems on their status page at this time:
So my best guess is brief, intermittent unavailability on their side; we can likely fix this by retrying on our side, either at the systemd layer or within the executed script.
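To make the "retry within the executed script" option concrete, here is a minimal sketch of a retry wrapper with exponential backoff. It is not the actual renew-ssl-certs script: the function name `run_with_retries`, the attempt count, and the delays are all assumptions chosen for illustration, and the real renewal command would need to be substituted in.

```python
import subprocess
import sys
import time


def run_with_retries(cmd, attempts=5, base_delay=30.0, sleep=time.sleep):
    """Run cmd until it exits 0 or attempts run out; return the last exit code.

    Hypothetical helper: attempts and base_delay are illustrative defaults,
    not values taken from the real service.
    """
    returncode = 1
    for attempt in range(1, attempts + 1):
        returncode = subprocess.run(cmd).returncode
        if returncode == 0:
            return 0
        if attempt < attempts:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            delay = base_delay * 2 ** (attempt - 1)
            print(
                f"attempt {attempt}/{attempts} exited with {returncode}; "
                f"retrying in {delay:.0f}s",
                file=sys.stderr,
            )
            sleep(delay)
    return returncode
```

The wrapper returns the last exit code, so if every attempt fails the unit would still fail and the Grafana alert would still fire; only transient errors like the 500 in the log above would be absorbed.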
The renew-ssl-certs.service fails periodically, which triggers the following alert in Grafana:
The fix for this alert is to restart the service manually:
We should investigate the reason why the service fails in the first place and retry automatically.