How to categorize this issue?
/area quality
/kind bug
What happened:
The secret controller does not correctly handle the case where TLS was previously enabled for an Etcd resource (either etcd client TLS, etcd peer TLS, or etcd-backup-restore TLS), TLS is then removed from the Etcd spec, but the change is not yet reconciled by etcd-druid. In this case, the secret controller simply removes its finalizer from the previously referenced secrets, since they are no longer referenced by any Etcd resource spec, even though they are still mounted and used by the etcd StatefulSet until the Etcd resource is next reconciled by druid. This leaves the etcd cluster in a vulnerable state, especially when druid is configured with auto reconciliation disabled.
How to reproduce it (as minimally and precisely as possible):
- Run druid with auto reconciliation disabled
- Deploy an Etcd resource with any of the three TLS configs enabled (etcd client TLS, etcd peer TLS, or etcd-backup-restore TLS)
- Wait for, or trigger, reconciliation by druid
- Remove the TLS config from the Etcd resource spec
- Observe from the druid logs, as well as from the TLS secrets themselves, that the secret controller removes the finalizer from the TLS secrets even though they are still used by the etcd cluster (StatefulSet)
- Delete any of the TLS secrets whose finalizer was removed
- Restart any of the etcd pods
This can lead to quorum loss if more than one pod fails or gets rescheduled for any reason.
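For reference, the TLS sections removed in the steps above look roughly like the sketch below. Field names follow the `druid.gardener.cloud/v1alpha1` Etcd CRD, but secret names are illustrative and the exact structure should be verified against the druid version in use:

```yaml
apiVersion: druid.gardener.cloud/v1alpha1
kind: Etcd
metadata:
  name: etcd-main
spec:
  etcd:
    # etcd client TLS -- deleting this block (without a druid
    # reconciliation) causes the finalizer removal described above
    clientUrlTls:
      tlsCASecretRef:
        name: etcd-ca            # illustrative secret name
      serverTLSSecretRef:
        name: etcd-server-tls
      clientTLSSecretRef:
        name: etcd-client-tls
    # etcd peer TLS
    peerUrlTls:
      tlsCASecretRef:
        name: etcd-peer-ca
      serverTLSSecretRef:
        name: etcd-peer-server-tls
  backup:
    # etcd-backup-restore TLS
    tls:
      tlsCASecretRef:
        name: backup-ca
      serverTLSSecretRef:
        name: backup-server-tls
      clientTLSSecretRef:
        name: backup-client-tls
```

Removing any one of these three TLS blocks is enough to reproduce the issue, since the referenced secrets remain mounted by the StatefulSet until the next reconciliation.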