Contributor

@volodimyr volodimyr commented May 27, 2025

Summary

#1117
Enhanced certificate monitoring to check the entire certificate chain (leaf, intermediate, and root certificates) instead of just the leaf certificate.
Added a new metric, gatus_results_certificate_chain_expiration_seconds, with subject and issuer labels, to track the time until expiration of each certificate in the chain.
Updated the endpoint evaluation logic in the evaluateSTARTTLS and evaluateTLS functions to store and expose the full certificate chain information while maintaining backward compatibility with the existing [CERTIFICATE_EXPIRATION] placeholder.

# HELP gatus_results_certificate_chain_expiration_seconds Number of seconds until each certificate in the chain expires
# TYPE gatus_results_certificate_chain_expiration_seconds gauge
gatus_results_certificate_chain_expiration_seconds{group="",issuer="CN=DigiCert Global G2 TLS RSA SHA256 2020 CA1,O=DigiCert Inc,C=US",key="_cloudflare-tls-chain",name="cloudflare-tls-chain",subject="CN=cloudflare-dns.com,O=Cloudflare\\, Inc.,L=San Francisco,ST=California,C=US",type="TLS"} 2.06647973355771e+07
gatus_results_certificate_chain_expiration_seconds{group="",issuer="CN=DigiCert Global Root G2,OU=www.digicert.com,O=DigiCert Inc,C=US",key="_cloudflare-tls-chain",name="cloudflare-tls-chain",subject="CN=DigiCert Global G2 TLS RSA SHA256 2020 CA1,O=DigiCert Inc,C=US",type="TLS"} 1.842199973355771e+08
gatus_results_certificate_chain_expiration_seconds{group="",issuer="CN=DigiCert Global Root G2,OU=www.digicert.com,O=DigiCert Inc,C=US",key="_cloudflare-tls-chain",name="cloudflare-tls-chain",subject="CN=DigiCert Global Root G2,OU=www.digicert.com,O=DigiCert Inc,C=US",type="TLS"} 3.987943983355771e+08
gatus_results_certificate_chain_expiration_seconds{group="",issuer="CN=GTS Root R4,O=Google Trust Services LLC,C=US",key="_gmail-starttls-chain",name="gmail-starttls-chain",subject="CN=WE2,O=Google Trust Services,C=US",type="STARTTLS"} 1.179151988945293e+08
gatus_results_certificate_chain_expiration_seconds{group="",issuer="CN=GlobalSign Root CA,OU=Root CA,O=GlobalSign nv-sa,C=BE",key="_gmail-starttls-chain",name="gmail-starttls-chain",subject="CN=GTS Root R4,O=Google Trust Services LLC,C=US",type="STARTTLS"} 8.42552408945293e+07
gatus_results_certificate_chain_expiration_seconds{group="",issuer="CN=WE2,O=Google Trust Services,C=US",key="_gmail-starttls-chain",name="gmail-starttls-chain",subject="CN=smtp.gmail.com",type="STARTTLS"} 5.9218428945293e+06
# HELP gatus_results_certificate_expiration_seconds Number of seconds until the certificate expires
# TYPE gatus_results_certificate_expiration_seconds gauge
gatus_results_certificate_expiration_seconds{group="",key="_cloudflare-tls-chain",name="cloudflare-tls-chain",type="TLS"} 2.06647973355771e+07
gatus_results_certificate_expiration_seconds{group="",key="_gmail-starttls-chain",name="gmail-starttls-chain",type="STARTTLS"} 5.9218428945293e+06
gatus_results_certificate_expiration_seconds{group="",key="_google-cert-chain",name="google-cert-chain",type="HTTP"} 4.8374085797242e+06
# HELP gatus_results_code_total Total number of results by code
# TYPE gatus_results_code_total counter
gatus_results_code_total{code="200",group="",key="_google-cert-chain",name="google-cert-chain",type="HTTP"} 1
# HELP gatus_results_connected_total Total number of results in which a connection was successfully established
# TYPE gatus_results_connected_total counter
gatus_results_connected_total{group="",key="_cloudflare-tls-chain",name="cloudflare-tls-chain",type="TLS"} 1
gatus_results_connected_total{group="",key="_gmail-starttls-chain",name="gmail-starttls-chain",type="STARTTLS"} 1
gatus_results_connected_total{group="",key="_google-cert-chain",name="google-cert-chain",type="HTTP"} 1
# HELP gatus_results_duration_seconds Duration of the request in seconds
# TYPE gatus_results_duration_seconds gauge
gatus_results_duration_seconds{group="",key="_cloudflare-tls-chain",name="cloudflare-tls-chain",type="TLS"} 0.0214952
gatus_results_duration_seconds{group="",key="_gmail-starttls-chain",name="gmail-starttls-chain",type="STARTTLS"} 0.2408066
gatus_results_duration_seconds{group="",key="_google-cert-chain",name="google-cert-chain",type="HTTP"} 0.3330691
# HELP gatus_results_endpoint_success Displays whether or not the endpoint was a success
# TYPE gatus_results_endpoint_success gauge
gatus_results_endpoint_success{group="",key="_cloudflare-tls-chain",name="cloudflare-tls-chain",type="TLS"} 1
gatus_results_endpoint_success{group="",key="_gmail-starttls-chain",name="gmail-starttls-chain",type="STARTTLS"} 1
gatus_results_endpoint_success{group="",key="_google-cert-chain",name="google-cert-chain",type="HTTP"} 1
# HELP gatus_results_total Number of results per endpoint
# TYPE gatus_results_total counter
gatus_results_total{group="",key="_cloudflare-tls-chain",name="cloudflare-tls-chain",success="true",type="TLS"} 1
gatus_results_total{group="",key="_gmail-starttls-chain",name="gmail-starttls-chain",success="true",type="STARTTLS"} 1
gatus_results_total{group="",key="_google-cert-chain",name="google-cert-chain",success="true",type="HTTP"} 1

Checklist

  • Tested and/or added tests to validate that the changes work as intended, if applicable.
  • Updated documentation in README.md, if applicable.

Owner

@TwiN TwiN left a comment


If the root certificate in the chain expires in, for instance, 5 days, but the child certificate expires in 30 days, but the root certificate authority isn't owned by the user using Gatus to monitor his environment, does it make sense to alert based on the expiration of the root certificate? Maybe they automatically renew a day before it expires.

(This isn't a rhetorical question, feel free to let me know what you think. I'm just trying to figure out if this change in behavior may upset some users)

client/client.go Outdated
  hostAndPort := strings.Split(address, ":")
  if len(hostAndPort) != 2 {
-     return false, nil, errors.New("invalid address for starttls, format must be host:port")
+     return CertificateChainInfo{Error: errors.New("invalid address for starttls, format must be host:port")}
Owner

I don't know how I feel about not passing the error as a separate return value

Contributor Author

Agreed. It looks weird.
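For illustration, the conventional Go shape the reviewer alludes to keeps the error as its own return value rather than as a field on the result struct. This is only a sketch under assumptions: `target` and `parseTarget` are hypothetical names, not part of the PR, and the same pattern would apply to a signature such as `(CertificateChainInfo, error)`.

```go
package main

import (
	"errors"
	"strings"
)

// target is a hypothetical result type; note that it has no Error field.
type target struct {
	Host string
	Port string
}

// parseTarget validates a host:port address and reports failure through
// a separate error value, so callers can use the usual `if err != nil` check.
func parseTarget(address string) (target, error) {
	hostAndPort := strings.Split(address, ":")
	if len(hostAndPort) != 2 {
		return target{}, errors.New("invalid address for starttls, format must be host:port")
	}
	return target{Host: hostAndPort[0], Port: hostAndPort[1]}, nil
}
```

With this shape, the zero-value struct is never mistaken for a successful result, because the caller has to inspect the error first.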

@volodimyr
Contributor Author

If the root certificate in the chain expires in, for instance, 5 days, but the child certificate expires in 30 days, but the root certificate authority isn't owned by the user using Gatus to monitor his environment, does it make sense to alert based on the expiration of the root certificate? Maybe they automatically renew a day before it expires.

(This isn't a rhetorical question, feel free to let me know what you think. I'm just trying to figure out if this change in behavior may upset some users)

That's a really good question. It would indeed confuse users, especially if they get alerts about a (root) certificate expiring when they don't control the CA.
However, having full control over the tool you're using matters most. I would also prefer to see the full picture of the risk by default rather than have it hidden implicitly and silently.
My suggested approach is:

  1. Make the root certificate expiration alert configurable (allow users to opt out if desired).
  2. Document this configuration option.

@atc0005

atc0005 commented May 28, 2025

If the root certificate in the chain expires in, for instance, 5 days, but the child certificate expires in 30 days, but the root certificate authority isn't owned by the user using Gatus to monitor his environment, does it make sense to alert based on the expiration of the root certificate?

FWIW, I've seen various older Linux distros handle expiring root certificates explicitly included in a certificate chain rather poorly (e.g., AddTrust External CA Root expiration in 2020).

At that time I supported several systems that were dependent on a load-balanced LDAP service managed by another team. From what I remember, the cert chain for that service included the leaf certificate (still valid), intermediate certificates (one valid, one expired), and a root certificate (also expired).

Existing monitoring checks (applied by another product) passed since the check evaluated only the leaf certificate. Real-world clients failed to connect.

If the root certificate is not explicitly included in a cert chain, then alerting on that expiration (pending or passed) is probably not that useful (particularly where cross-signed intermediates are used).

Reference link (covers the incident well):

@TwiN
Owner

TwiN commented May 29, 2025

In light of that, I think that what I'd want is a root-level configuration such as disable-full-chain-certificate-expiration-check.

By default, Gatus should check the full certificate chain, which means this variable would default to false, but if a user only wants to check the leaf certificate, then they can set that configuration to true.

P.S. If you can think of a better name for the configuration, please feel free to suggest it.

@volodimyr
Contributor Author

In light of that, I think that what I'd want is a root-level configuration such as disable-full-chain-certificate-expiration-check.

By default, Gatus should check the full certificate chain, which means this variable would default to false, but if a user only wants to check the leaf certificate, then they can set that configuration to true.

P.S. If you can think of a better name for the configuration, please feel free to suggest it.

Finalised the work and tested it. Looks good.
I hope I didn't break any of the previous logic, though I did find a few mistakes today.

@volodimyr
Contributor Author

metrics: true  # Enable Prometheus metrics endpoint

endpoints:
  - name: upwork-cert-leaf-only
    url: https://www.upwork.com/
    interval: 1m
    client:
      disable-full-chain-certificate-expiration-check: true
    conditions:
      - "[CERTIFICATE_EXPIRATION] > 48h"
      - "[CONNECTED] == true"

  - name: gmail-starttls-leaf-only
    url: "starttls://smtp.gmail.com:587"
    client:
      disable-full-chain-certificate-expiration-check: true
    interval: 1m
    conditions:
      - "[CONNECTED] == true"
      - "[CERTIFICATE_EXPIRATION] > 48h"

  - name: cloudflare-tls-leaf-only
    url: "tls://1.1.1.1:853"
    client:
      disable-full-chain-certificate-expiration-check: true
    interval: 1m
    conditions:
      - "[CONNECTED] == true"
      - "[CERTIFICATE_EXPIRATION] > 48h"

Here is a config I used to test.

t.Error("expected true")
if rtt == 0 {
t.Error("Round-trip time returned on failure should've been 0")
}
Contributor Author

This test fails on Windows, so it might be a good idea to skip it there?

@Exagone313
Contributor

My two cents about this: would it be possible to keep the current behavior as default and only enable full-chain certificate check on demand?

@TwiN
Owner

TwiN commented Aug 2, 2025

I've got another idea instead. We could implement a new placeholder, [CERTIFICATE_CHAIN_EXPIRATION].
That would allow people to check for whatever they wish. I'm open to other names; perhaps [CERTIFICATE_FULL_CHAIN_EXPIRATION] would make more sense, though it's quite lengthy.
