TLS for DRBD Replication doesn't work #614

Open
retinio opened this issue Mar 2, 2024 · 4 comments
Labels
documentation Improvements or additions to documentation

Comments

@retinio

retinio commented Mar 2, 2024

Hi!
I am trying to configure TLS for DRBD by following this manual.
TLS for internal traffic is enabled:

kind: LinstorSatelliteConfiguration
spec:
  internalTLS:
    tlsHandshakeDaemon: true
    secretName: linstor-satellite-internal-tls

λ kubectl exec -n linstor deploy/linstor-controller -- linstor node list
+------------------------------------------------------------+
| Node      | NodeType  | Addresses                 | State  |
|============================================================|
| worker-01 | SATELLITE | 192.168.160.20:3367 (SSL) | Online |
| worker-02 | SATELLITE | 192.168.160.21:3367 (SSL) | Online |
| worker-03 | SATELLITE | 192.168.160.22:3367 (SSL) | Online |
+------------------------------------------------------------+

But the DRBD resources don't connect to each other:
λ kubectl exec -n linstor deploy/linstor-controller -- linstor r l
+---------------------------------------------------------------------------------------------------------------------+
| ResourceName                             | Node      | Port | Usage  | Conns                           | State      |
|=====================================================================================================================|
| pvc-4973e04e-44cf-49fe-9094-98dfbfda10d5 | worker-01 | 7000 | Unused | StandAlone(worker-03,worker-02) | UpToDate   |
| pvc-4973e04e-44cf-49fe-9094-98dfbfda10d5 | worker-02 | 7000 | Unused | StandAlone(worker-03,worker-01) | TieBreaker |
| pvc-4973e04e-44cf-49fe-9094-98dfbfda10d5 | worker-03 | 7000 | InUse  | StandAlone(worker-01,worker-02) | UpToDate   |
+---------------------------------------------------------------------------------------------------------------------+
The ktls-utils containers log errors:
λ kubectl -n linstor logs -l app.kubernetes.io/component=linstor-satellite -c ktls-utils

tlshd[29]: gnutls: The TLS connection was non-properly terminated. (-110)
tlshd[29]: Handshake with 'worker-02' (192.168.160.21) failed
tlshd[34]: gnutls: Error in the certificate. (-43)
tlshd[34]: Handshake with 'worker-03' (192.168.160.22) failed
tlshd[32]: gnutls: Error in the certificate. (-43)
tlshd[32]: Handshake with 'worker-02' (192.168.160.21) failed
tlshd[33]: gnutls: The TLS connection was non-properly terminated. (-110)
tlshd[33]: Handshake with 'worker-02' (192.168.160.21) failed
tlshd[35]: gnutls: The TLS connection was non-properly terminated. (-110)
tlshd[35]: Handshake with 'worker-03' (192.168.160.22) failed
tlshd[28]: gnutls: The TLS connection was non-properly terminated. (-110)
tlshd[28]: Handshake with 'worker-01' (192.168.160.20) failed
tlshd[32]: gnutls: Error in the certificate. (-43)
tlshd[32]: Handshake with 'worker-03' (192.168.160.22) failed

Piraeus Operator: 2.4.0
Host operating system: AlmaLinux 9, kernel 5.14.0-362.18.1.el9_3.x86_64
DRBD version: 9.2.7 (api:2/proto:86-122)

@retinio
Author

retinio commented Mar 3, 2024

I have enabled debug logging in tlshd.conf:

[debug]
loglevel=1
tls=1
nl=1

and got extended logs:
λ kubectl -n linstor logs linstor-satellite.worker-01-7wgmg -c ktls-utils

tlshd[7]: Built from ktls-utils 0.10 on Oct  4 2023 07:26:06
tlshd[7]: x.509 priority string: SECURE256:+SECURE128:-COMP-ALL:-VERS-ALL:+VERS-TLS1.3:%NO_TICKETS:-CIPHER-ALL:+AES-256-GCM:+CHACHA20-POLY1305:+AES-128-GCM:+AES-128-CCM
tlshd[7]: PSK priority string: SECURE256:+SECURE128:-COMP-ALL:-VERS-ALL:+VERS-TLS1.3:%NO_TICKETS:-CIPHER-ALL:+AES-256-GCM:+CHACHA20-POLY1305:+AES-128-GCM:+AES-128-CCM:+PSK:+DHE-PSK:+ECDHE-PSK
tlshd[9]: Querying the handshake service
tlshd[8]: Querying the handshake service
tlshd[9]: Parsing a valid netlink message
tlshd[9]: No peer identities found
tlshd[9]: No certificates found
tlshd[9]: System config file: /etc/gnutls/config
tlshd[8]: Parsing a valid netlink message
tlshd[9]: Client x.509 truststore is /etc/tlshd.d/ca.crt
tlshd[8]: No peer identities found
tlshd[8]: No certificates found
tlshd[8]: System config file: /etc/gnutls/config
tlshd[8]: Client x.509 truststore is /etc/tlshd.d/ca.crt
tlshd[11]: Querying the handshake service
tlshd[11]: Parsing a valid netlink message
tlshd[8]: System trust: Loaded 1 certificate(s).
tlshd[11]: No peer identities found
tlshd[11]: No certificates found
tlshd[11]: System config file: /etc/gnutls/config
tlshd[11]: Server x.509 truststore is /etc/tlshd.d/ca.crt
tlshd[8]: Retrieved x.509 certificate from /etc/tlshd.d/tls.crt
tlshd[11]: System trust: Loaded 1 certificate(s).
tlshd[8]: Retrieved private key from /etc/tlshd.d/tls.key
tlshd[11]: Retrieved x.509 certificate from /etc/tlshd.d/tls.crt
tlshd[10]: Querying the handshake service
tlshd[10]: Parsing a valid netlink message
tlshd[10]: No peer identities found
tlshd[10]: No certificates found
tlshd[10]: System config file: /etc/gnutls/config
tlshd[10]: Server x.509 truststore is /etc/tlshd.d/ca.crt
tlshd[10]: System trust: Loaded 1 certificate(s).
tlshd[10]: Retrieved x.509 certificate from /etc/tlshd.d/tls.crt
tlshd[10]: Retrieved private key from /etc/tlshd.d/tls.key
tlshd[8]: Server's trusted authorities:
tlshd[9]: System trust: Loaded 1 certificate(s).
tlshd[8]:    [0]: CN=linstor-internal-ca
tlshd[11]: Retrieved private key from /etc/tlshd.d/tls.key
tlshd[8]: The certificate is NOT trusted. The name in the certificate does not match the expected.
tlshd[8]: gnutls: Error in the certificate. (-43)
tlshd[8]: Handshake with 'worker-03' (192.168.160.22) failed
DBG<1>././lib/cache_mngt.c:302  nl_cache_mngt_unregister: Unregistered cache operations genl/family
tlshd[9]: Retrieved x.509 certificate from /etc/tlshd.d/tls.crt
tlshd[9]: Retrieved private key from /etc/tlshd.d/tls.key
tlshd[9]: Server's trusted authorities:
tlshd[9]:    [0]: CN=linstor-internal-ca
tlshd[9]: The certificate is NOT trusted. The name in the certificate does not match the expected.
tlshd[9]: gnutls: Error in the certificate. (-43)
tlshd[9]: Handshake with 'worker-02' (192.168.160.21) failed
DBG<1>././lib/cache_mngt.c:302  nl_cache_mngt_unregister: Unregistered cache operations genl/family
tlshd[10]: gnutls: The TLS connection was non-properly terminated. (-110)
tlshd[11]: gnutls: The TLS connection was non-properly terminated. (-110)
tlshd[10]: Handshake with 'worker-03' (192.168.160.22) failed
tlshd[11]: Handshake with 'worker-02' (192.168.160.21) failed
DBG<1>././lib/cache_mngt.c:302  nl_cache_mngt_unregister: Unregistered cache operations genl/family
DBG<1>././lib/cache_mngt.c:302  nl_cache_mngt_unregister: Unregistered cache operations genl/family

λ kubectl -n linstor get secret linstor-satellite-internal-tls -o jsonpath="{.data['tls.crt']}" | base64 -d > tls.crt
λ kubectl -n linstor get secret linstor-satellite-internal-tls -o jsonpath="{.data['ca.crt']}" | base64 -d > ca.crt
λ openssl verify -CAfile ca.crt tls.crt
tls.crt: OK

@WanzenBug
Member

Looks like you used the "openssl" method from here to create those certificates?

If so, the issue is that those certificates only set a generic common name:

openssl req -new -sha256 -key satellite.key -subj "/CN=linstor-satellite" -out satellite.csr

So with strict validation, this certificate is only valid for an entity named linstor-satellite. For LINSTOR itself this is fine, as we don't do strict hostname validation there. For tlshd, however, it means that when it sees a DRBD connection for worker-01 but gets a certificate for linstor-satellite, validation simply fails.

You either need to manually add all the node names to the subject alternative names in the certificate:

openssl req -new -sha256 -key satellite.key -subj "/CN=linstor-satellite" -out satellite.csr -addext "subjectAltName = DNS:linstor-satellite,DNS:worker-01,DNS:worker-02,DNS:worker-03"
openssl x509 -req -in satellite.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out satellite.crt -days 3650 -sha256 -copy_extensions copy

Or you use cert-manager and get that all automatically 😄

WanzenBug added the documentation label on Mar 4, 2024
@retinio
Author

retinio commented Mar 4, 2024

@WanzenBug Thank you so much! Everything worked out.
This might interest you: with the newest version of ktls-utils (0.10-6), the connection error still persists.
λ kubectl -n linstor logs -l app.kubernetes.io/component=linstor-satellite -c ktls-utils

tlshd[12]: No peer identities found
tlshd[12]: No certificates found
tlshd[12]: System config file: /etc/gnutls/config
tlshd[12]: Server x.509 truststore is /etc/tlshd.d/ca.crt
tlshd[12]: System trust: Loaded 1 certificate(s).
tlshd[12]: Retrieved x.509 certificate from /etc/tlshd.d/tls.crt
tlshd[12]: Retrieved private key from /etc/tlshd.d/tls.key
tlshd[11]: System trust: Loaded 140 certificate(s).
tlshd[11]: Handshake with 'worker-02' (10.0.4.171) failed
DBG<1>././lib/cache_mngt.c:302  nl_cache_mngt_unregister: Unregistered cache operations genl/family
tlshd[11]: Server x.509 truststore is /etc/tlshd.d/ca.crt
tlshd[11]: System trust: Loaded 1 certificate(s).
tlshd[11]: Retrieved x.509 certificate from /etc/tlshd.d/tls.crt
tlshd[11]: Retrieved private key from /etc/tlshd.d/tls.key
tlshd[9]: System trust: Loaded 140 certificate(s).
tlshd[9]: Handshake with 'worker-01' (10.0.3.154) failed
DBG<1>././lib/cache_mngt.c:302  nl_cache_mngt_unregister: Unregistered cache operations genl/family
tlshd[10]: System trust: Loaded 140 certificate(s).
tlshd[10]: Handshake with 'worker-03' (10.0.5.212) failed
DBG<1>././lib/cache_mngt.c:302  nl_cache_mngt_unregister: Unregistered cache operations genl/family
tlshd[10]: Retrieved x.509 certificate from /etc/tlshd.d/tls.crt
tlshd[10]: Retrieved private key from /etc/tlshd.d/tls.key
tlshd[11]: Querying the handshake service
tlshd[11]: Parsing a valid netlink message
tlshd[11]: No peer identities found
tlshd[11]: No certificates found
tlshd[11]: System config file: /etc/gnutls/config
tlshd[11]: System trust: Loaded 140 certificate(s).
tlshd[11]: Handshake with 'worker-02' (10.0.4.171) failed
DBG<1>././lib/cache_mngt.c:302  nl_cache_mngt_unregister: Unregistered cache operations genl/family

@WanzenBug
Member

I'm wondering why it would try to load the system trust store:

tlshd[11]: System trust: Loaded 140 certificate(s).

But sometimes it loads the right certificates instead:

tlshd[12]: Server x.509 truststore is /etc/tlshd.d/ca.crt
tlshd[12]: System trust: Loaded 1 certificate(s).
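
To get a rough idea of how often each trust store is picked up, one option is to tally the certificate counts reported in the "System trust" lines. The sketch below embeds the six such lines from the ktls-utils 0.10-6 log above so that it runs on its own; the `tally` helper is a hypothetical name, and it assumes the log line format stays exactly as shown.

```shell
#!/bin/sh
# Sample lines copied from the log output quoted earlier in this thread.
log='tlshd[12]: System trust: Loaded 1 certificate(s).
tlshd[11]: System trust: Loaded 140 certificate(s).
tlshd[11]: System trust: Loaded 1 certificate(s).
tlshd[9]: System trust: Loaded 140 certificate(s).
tlshd[10]: System trust: Loaded 140 certificate(s).
tlshd[11]: System trust: Loaded 140 certificate(s).'

# tally: read tlshd log lines on stdin and print "<certificates loaded> <occurrences>".
# Field 5 of a "System trust: Loaded N certificate(s)." line is N.
tally() {
  awk '/System trust: Loaded/ { count[$5]++ }
       END { for (n in count) print n, count[n] }' | sort -n
}

summary=$(printf '%s\n' "$log" | tally)
printf '%s\n' "$summary"
```

On a live cluster the same helper could be fed from the command used above, for example `kubectl -n linstor logs -l app.kubernetes.io/component=linstor-satellite -c ktls-utils | tally`. A count of 1 corresponds to /etc/tlshd.d/ca.crt, while 140 points at the system trust store.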
