
Secret controller does not handle TLS disablement correctly #808

@shreyas-s-rao

Description


How to categorize this issue?

/area quality
/kind bug

What happened:

The secret controller does not correctly handle the case where TLS was previously enabled for an Etcd resource (etcd client TLS, etcd peer TLS, or etcd-backup-restore TLS) and is then removed from the Etcd spec before etcd-druid has reconciled the change. In this case, the secret controller removes the finalizer from the previously referenced secrets because no Etcd resource spec references them anymore, even though the etcd statefulset keeps mounting and using them until druid reconciles the Etcd resource. This leaves the etcd cluster in a vulnerable state, especially when druid runs with auto reconciliation disabled, because the secrets then stay unprotected until a reconciliation is explicitly triggered.
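The gap can be illustrated with a small, self-contained Go model. This is a sketch under stated assumptions, not etcd-druid's actual code: the types `EtcdSpec` and `StatefulSet`, the helper functions, and the secret name `etcd-client-tls` are all hypothetical simplifications. The real controller works against Kubernetes objects through a client. The spec-only check mirrors the behaviour described above. The "safe" variant is one possible guard: it keeps the finalizer while any etcd statefulset still mounts the secret.

```go
package main

import "fmt"

// EtcdSpec is a hypothetical, simplified stand-in for an Etcd resource spec.
// It only records the names of the TLS secrets the spec references.
type EtcdSpec struct {
	Name          string
	TLSSecretRefs []string
}

// StatefulSet is a hypothetical stand-in for the etcd statefulset that druid
// last reconciled. It only records the secrets its pods mount as volumes.
type StatefulSet struct {
	Name           string
	MountedSecrets []string
}

// contains reports whether s is present in list.
func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// referencedBySpec reports whether any Etcd spec references the secret.
func referencedBySpec(secret string, etcds []EtcdSpec) bool {
	for _, e := range etcds {
		if contains(e.TLSSecretRefs, secret) {
			return true
		}
	}
	return false
}

// mountedByStatefulSet reports whether any etcd statefulset still mounts the secret.
func mountedByStatefulSet(secret string, stss []StatefulSet) bool {
	for _, s := range stss {
		if contains(s.MountedSecrets, secret) {
			return true
		}
	}
	return false
}

// canRemoveFinalizerSpecOnly models the reported behaviour: only Etcd specs
// are consulted, so an unreconciled statefulset is ignored.
func canRemoveFinalizerSpecOnly(secret string, etcds []EtcdSpec) bool {
	return !referencedBySpec(secret, etcds)
}

// canRemoveFinalizerSafe is a hypothetical guard that also waits until no
// statefulset mounts the secret before allowing the finalizer to be removed.
func canRemoveFinalizerSafe(secret string, etcds []EtcdSpec, stss []StatefulSet) bool {
	return !referencedBySpec(secret, etcds) && !mountedByStatefulSet(secret, stss)
}

func main() {
	// TLS has been removed from the Etcd spec, but druid has not reconciled yet,
	// so the statefulset still mounts the old client TLS secret.
	etcds := []EtcdSpec{{Name: "etcd-main"}}
	stss := []StatefulSet{{Name: "etcd-main", MountedSecrets: []string{"etcd-client-tls"}}}

	fmt.Println(canRemoveFinalizerSpecOnly("etcd-client-tls", etcds))   // true: finalizer dropped too early
	fmt.Println(canRemoveFinalizerSafe("etcd-client-tls", etcds, stss)) // false: secret still in use
}
```

Running it prints `true` and then `false`. The spec-only check releases the secret while the statefulset still depends on it. The guarded check holds the finalizer until a reconciliation has removed the mount.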

How to reproduce it (as minimally and precisely as possible):

  • Run druid with auto reconciliation disabled
  • Deploy an Etcd resource with any of the three TLS configs enabled (etcd client TLS, etcd peer TLS, or etcd-backup-restore TLS)
  • Wait for, or trigger, reconciliation by druid
  • Remove the TLS config from the Etcd resource spec
  • Observe, from the druid logs and from the TLS secrets themselves, that the secret controller removes the finalizer from the TLS secrets even though the etcd cluster (statefulset) still uses them
  • Delete any of the TLS secrets whose finalizer was removed
  • Restart any of the etcd pods

This can lead to quorum loss if more than one pod fails or is rescheduled for any reason, because the restarted pods can no longer mount the deleted secrets.
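To see why "more than one pod" matters, here is a short worked example of etcd's majority-quorum arithmetic. It assumes a 3-member etcd cluster, which is a common multi-node size but is not stated in the issue. The function names are illustrative only. A cluster of n members needs a strict majority, n/2 + 1 (integer division), to stay available. With 3 members the quorum is 2, so the cluster tolerates one unavailable pod. If two pods restart and cannot start because a TLS secret they mount was deleted, the cluster drops below quorum.

```go
package main

import "fmt"

// quorum returns the strict majority of n members that a Raft-based etcd
// cluster needs in order to stay available.
func quorum(n int) int {
	return n/2 + 1
}

// tolerableFailures returns how many members can be unavailable at the same
// time while the cluster still has quorum.
func tolerableFailures(n int) int {
	return n - quorum(n)
}

func main() {
	for _, n := range []int{1, 3, 5} {
		fmt.Printf("members=%d quorum=%d tolerable failures=%d\n", n, quorum(n), tolerableFailures(n))
	}
}
```

This prints a quorum of 1, 2 and 3 for clusters of 1, 3 and 5 members, which tolerate 0, 1 and 2 unavailable members respectively.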


Labels: area/quality, kind/bug, lifecycle/frozen
