Commit

Update etcd-druid documentation and enhance kind-up script to start a local kind registry container (#889)

Improved etcd-druid documentation and enhanced hack/kind-up.sh script
unmarshall authored Oct 25, 2024
1 parent ac9753e commit df3ff21
Showing 48 changed files with 1,594 additions and 615 deletions.
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
/args
/bin
/hack/tools/bin
/hack/kind/*
/.kube-secrets
/tmp/*
/dev
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -1 +1 @@
Please refer to the [Gardener contributor guide](https://gardener.cloud/docs/contribute).
Please refer to the [etcd-druid contributor guide](docs/development/contribution.md).
10 changes: 5 additions & 5 deletions Makefile
@@ -15,9 +15,9 @@ PLATFORM ?= $(shell docker info --format '{{.OSType}}/{{.Architecture
BUILD_DIR := build
PROVIDERS := ""
BUCKET_NAME := "e2e-test"
KUBECONFIG_PATH := $(HACK_DIR)/e2e-test/infrastructure/kind/kubeconfig
TEST_COVER := "true"
IMG ?= ${IMAGE_REPOSITORY}:${IMAGE_BUILD_TAG}
TEST_COVER := "true"
KUBECONFIG_PATH := $(HACK_DIR)/kind/kubeconfig

# Tools
# -------------------------------------------------------------------------
@@ -123,7 +123,7 @@ test-e2e: $(KUBECTL) $(HELM) $(SKAFFOLD) $(KUSTOMIZE)
@VERSION=$(VERSION) GIT_SHA=$(GIT_SHA) $(HACK_DIR)/e2e-test/run-e2e-test.sh $(PROVIDERS)

.PHONY: ci-e2e-kind
ci-e2e-kind: $(GINKGO)
ci-e2e-kind: $(GINKGO) $(YQ) $(KIND)
@BUCKET_NAME=$(BUCKET_NAME) $(HACK_DIR)/ci-e2e-kind.sh

.PHONY: ci-e2e-kind-azure
@@ -165,12 +165,12 @@ kind-up kind-down ci-e2e-kind ci-e2e-kind-azure deploy-localstack deploy-azurite

.PHONY: kind-up
kind-up: $(KIND)
@printf "\n\033[0;33m📌 NOTE: To target the newly created KinD cluster, please run the following command:\n\n export KUBECONFIG=$(KUBECONFIG_PATH)\n\033[0m\n"
@$(HACK_DIR)/kind-up.sh
@printf "\n\033[0;33m📌 NOTE: To target the newly created KinD cluster, please run the following command:\n\n export KUBECONFIG=$(KUBECONFIG_PATH)\n\033[0m\n"

.PHONY: kind-down
kind-down: $(KIND)
$(KIND) delete cluster --name etcd-druid-e2e
@$(HACK_DIR)/kind-down.sh

# Install CRDs into a cluster
.PHONY: install
9 changes: 3 additions & 6 deletions README.md
@@ -1,10 +1,8 @@
# etcd-druid

<image src="logo/etcd-druid-logo.png" style="width:300px"></image>
<img src="docs/assets/logo/etcd-druid-with-tagline.png" style="width:120%"></img>

[![REUSE status](https://api.reuse.software/badge/github.com/gardener/etcd-druid)](https://api.reuse.software/info/github.com/gardener/etcd-druid) [![CI Build status](https://concourse.ci.gardener.cloud/api/v1/teams/gardener/pipelines/etcd-druid-master/jobs/master-head-update-job/badge)](https://concourse.ci.gardener.cloud/teams/gardener/pipelines/etcd-druid-master/jobs/master-head-update-job) [![Go Report Card](https://goreportcard.com/badge/github.com/gardener/etcd-druid)](https://goreportcard.com/report/github.com/gardener/etcd-druid) [![License: Apache-2.0](https://img.shields.io/badge/License-Apache--2.0-blue.svg)](LICENSE) [![Release](https://img.shields.io/github/v/release/gardener/etcd-druid.svg?style=flat)](https://github.com/gardener/etcd-druid) [![Go Reference](https://pkg.go.dev/badge/github.com/gardener/etcd-druid.svg)](https://pkg.go.dev/github.com/gardener/etcd-druid)

`etcd-druid` is an [etcd](https://github.com/etcd-io/etcd) [operator](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/) which makes it easy to configure, provision, reconcile and monitor etcd clusters. It enables management of an etcd cluster through [declarative Kubernetes API model](config/crd/bases/crd-druid.gardener.cloud_etcds.yaml).
`etcd-druid` is an [etcd](https://github.com/etcd-io/etcd) [operator](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/) which makes it easy to configure, provision, reconcile, monitor and delete etcd clusters. It enables management of etcd clusters through [declarative Kubernetes API model](config/crd/bases/crd-druid.gardener.cloud_etcds.yaml).

In every etcd cluster managed by `etcd-druid`, each etcd member is a two container `Pod` which consists of:

@@ -23,7 +21,6 @@ In every etcd cluster managed by `etcd-druid`, each etcd member is a two contain
- Offers an asynchronous and threshold based capability to process backed up snapshots to:
- Potentially minimize the recovery time by leveraging restoration from backups followed by [etcd's compaction and defragmentation](https://etcd.io/docs/v3.4/op-guide/maintenance/).
- Indirectly assert integrity of the backed up snapshots.

- Allows seamless copy of backups between any two object store buckets.

## Start using or developing `etcd-druid` locally
@@ -36,7 +33,7 @@ For detailed documentation, see our `/docs` folder. Please find the [index](docs

## Contributions

If you wish to contribute then please see our [guidelines](https://github.com/gardener/etcd-druid/blob/4e9971aba3c3880a4cb6583d05843eabb8ca1409/CONTRIBUTING.md).
If you wish to contribute then please see our [contributor guidelines](docs/development/contribution.md).

## Feedback and Support

File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
45 changes: 26 additions & 19 deletions docs/README.md
@@ -1,31 +1,42 @@
# Documentation Index



## Concepts

* [Controllers](concepts/controllers.md)
* [Webhooks](concepts/webhooks.md)
* [Etcd Cluster Components](concepts/etcd-cluster-components.md)
* [Protecting Etcd Cluster Resources](concepts/etcd-cluster-resource-protection.md)

## Development

* [Testing(Unit, Integration and E2E Tests)](development/testing.md)
* [etcd Network Latency](development/etcd-network-latency.md)
* [Getting started locally using azurite emulator](development/getting-started-locally-azurite.md)
* [Getting started locally using localstack emulator](development/getting-started-locally-localstack.md)
* [Prepare Dev Environment](development/prepare-dev-environment.md)
* [Getting started locally](development/getting-started-locally.md)
* [Local End-To-End Tests](development/local-e2e-tests.md)
* [Dependency Management](development/dependency-management.md)
* [Changing the API](development/changing-api.md)
* [Controllers](development/controllers.md)
* [Add a new Etcd Cluster Component](development/add-new-etcd-cluster-component.md)
* [Raising a Pull Request](development/raising-a-pr.md)
* [Testing (Unit, Integration and E2E Tests)](development/testing.md)

## Deployment

* [etcd-druid CLI Flags](deployment/cli-flags.md)
* [Getting started locally](deployment/getting-started-locally/getting-started-locally.md)
* [Configure etcd-druid](deployment/configure-etcd-druid.md)
* [Feature Gates](deployment/feature-gates.md)
* [Recommendations for productive setup](deployment/production-setup-recommendations.md)
* [Version Compatibility Matrix](deployment/version-compatibility-matrix.md)

## Monitoring

* [Metrics](monitoring/metrics.md)

## Benchmarks

## Operations
* [etcd Network Latency](benchmark/etcd-network-latency.md)

* [Metrics](operations/metrics.md)
* [Recovery from Permanent Quorum Loss in etcd cluster](operations/recovery-from-permanent-quorum-loss-in-etcd-cluster.md)
* [Restoring single member in a Multi-Node etcd cluster](operations/restoring-single-member-in-multi-node-etcd-cluster.md)
## Usage

* [Managing Etcd Clusters](usage/managing-etcd-clusters.md)
* [Securing Etcd Clusters](usage/securing-etcd-clusters.md)
* [Recovering Etcd Clusters](usage/recovering-etcd-clusters.md)

## Proposals

@@ -34,8 +45,4 @@
* [DEP-2: Snapshot compaction](proposals/02-snapshot-compaction.md)
* [DEP-3: Scaling up an Etcd cluster](proposals/03-scaling-up-an-etcd-cluster.md)
* [DEP-4: Etcd Member custom resource](proposals/04-etcd-member-custom-resource.md)
* [DEP-5: Etcd Operator Tasks](proposals/05-etcd-operator-tasks.md)

## Usage

* [Supported K8S versions](usage/supported_k8s_versions.md)
* [DEP-5: Etcd Operator Tasks](proposals/05-etcd-operator-tasks.md)
File renamed without changes
Binary file added docs/assets/logo/etcd-druid-with-tagline.png
File renamed without changes.
77 changes: 77 additions & 0 deletions docs/concepts/etcd-cluster-components.md
@@ -0,0 +1,77 @@
# Etcd Cluster Components

For every `Etcd` cluster that it provisions, `etcd-druid` deploys a set of resources. The following sections provide information about, and a code reference for, each such resource.

## StatefulSet

[StatefulSet](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/) is the primary Kubernetes resource that gets provisioned for an etcd cluster.

* Replicas for the StatefulSet are derived from `Etcd.Spec.Replicas` in the custom resource.

* Each pod comprises two containers:
  * `etcd-wrapper`: This is the main container, which runs an etcd process.

  * `etcd-backup-restore`: This is a side-car container which does the following:

    * Orchestrates the initialization of etcd. This includes validation of any existing etcd data directory and, for a single-member etcd cluster, restoration if the etcd data directory files are corrupt.
    * Periodically renews the member lease.
    * Optionally takes schedule-based and threshold-based delta and full snapshots and pushes them to a configured object store.
    * Orchestrates scheduled etcd-db defragmentation.

> NOTE: This is not a complete list of the functionality offered by `etcd-backup-restore`.

**Code reference:** [StatefulSet-Component](https://github.com/gardener/etcd-druid/tree/480213808813c5282b19aff5f3fd6868529e779c/internal/component/statefulset)

> For detailed information on each container you can visit the [etcd-wrapper](https://github.com/gardener/etcd-wrapper) and [etcd-backup-restore](https://github.com/gardener/etcd-backup-restore) repositories.

## ConfigMap

Every `etcd` member requires [configuration](https://etcd.io/docs/v3.4/op-guide/configuration/) with which it must be started. `etcd-druid` creates a [ConfigMap](https://kubernetes.io/docs/concepts/configuration/configmap/) which gets mounted onto the `etcd-backup-restore` container. The `etcd-backup-restore` container modifies the etcd configuration and serves it to the `etcd-wrapper` container upon request.

**Code reference:** [ConfigMap-Component](https://github.com/gardener/etcd-druid/tree/480213808813c5282b19aff5f3fd6868529e779c/internal/component/configmap)

## PodDisruptionBudget

An etcd cluster requires quorum for all write operations. Clients can additionally configure quorum-based reads to ensure [linearizable](https://jepsen.io/consistency/models/linearizable) reads (kube-apiserver's etcd client is configured for linearizable reads and writes). In a cluster of size 3, only 1 member failure is tolerated. [Failure tolerance](https://etcd.io/docs/v3.3/faq/#what-is-failure-tolerance) for an etcd cluster with `n` replicas is computed as `(n-1)/2`, rounded down to the nearest integer.
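
The arithmetic above can be sketched in a few lines. This is only an illustration of the majority-quorum formula; the function names `quorum` and `failure_tolerance` are made up for this example and do not refer to anything in the etcd-druid code base.

```python
def quorum(n: int) -> int:
    """Smallest number of members that forms a majority of an n-member cluster."""
    if n < 1:
        raise ValueError("cluster size must be at least 1")
    return n // 2 + 1


def failure_tolerance(n: int) -> int:
    """Members that may fail while a majority remains; equals (n-1)/2 rounded down."""
    return n - quorum(n)


# Print cluster size, quorum and failure tolerance for a few common sizes.
for size in (1, 2, 3, 4, 5):
    print(size, quorum(size), failure_tolerance(size))
```

Note that growing a cluster from 3 to 4 members raises the quorum from 2 to 3 without raising the failure tolerance, which is why odd cluster sizes are generally preferred.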

To ensure that no more etcd pods are evicted than the cluster's failure tolerance allows, `etcd-druid` creates a [PodDisruptionBudget](https://kubernetes.io/docs/concepts/workloads/pods/disruptions/#pod-disruption-budgets).

> **NOTE:** For a single-node etcd cluster a `PodDisruptionBudget` will be created; however, `pdb.spec.minAvailable` is set to 0, effectively disabling it.

**Code reference:** [PodDisruptionBudget-Component](https://github.com/gardener/etcd-druid/tree/480213808813c5282b19aff5f3fd6868529e779c/internal/component/poddistruptionbudget)

## ServiceAccount

The `etcd-backup-restore` container, running as a side-car in every etcd member, requires permissions to access resources like `Lease`, `StatefulSet` etc. A dedicated [ServiceAccount](https://kubernetes.io/docs/concepts/security/service-accounts/) is created per `Etcd` cluster for this purpose.

**Code reference:** [ServiceAccount-Component](https://github.com/gardener/etcd-druid/tree/3383e0219a6c21c6ef1d5610db964cc3524807c8/internal/component/serviceaccount)

## Role & RoleBinding

The `etcd-backup-restore` container, running as a side-car in every etcd member, requires permissions to access resources like `Lease`, `StatefulSet` etc. A dedicated `Role` and `RoleBinding` are created and linked to the [ServiceAccount](https://kubernetes.io/docs/concepts/security/service-accounts/) created per `Etcd` cluster.

**Code reference:** [Role-Component](https://github.com/gardener/etcd-druid/tree/3383e0219a6c21c6ef1d5610db964cc3524807c8/internal/component/role) & [RoleBinding-Component](https://github.com/gardener/etcd-druid/tree/master/internal/component/rolebinding)

## Client & Peer Service

To enable clients to connect to an etcd cluster, a ClusterIP `Client` [Service](https://kubernetes.io/docs/concepts/services-networking/service/) is created. To enable `etcd` members to talk to each other (for discovery, leader-election, raft consensus etc.), `etcd-druid` also creates a [Headless Service](https://kubernetes.io/docs/concepts/services-networking/service/#headless-services).

**Code reference:** [Client-Service-Component](https://github.com/gardener/etcd-druid/tree/480213808813c5282b19aff5f3fd6868529e779c/internal/component/clientservice) & [Peer-Service-Component](https://github.com/gardener/etcd-druid/tree/480213808813c5282b19aff5f3fd6868529e779c/internal/component/peerservice)
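
As a rough illustration of why a headless service suits peer traffic: Kubernetes gives each StatefulSet pod behind a headless service a stable DNS name of the form `<pod>.<service>.<namespace>.svc.<cluster-domain>`, where the pod name is `<statefulset>-<ordinal>`. The sketch below builds such peer URLs. The resource names in the usage lines, the `https` scheme and the default cluster domain are assumptions for this example, and port 2380 is simply etcd's conventional peer port; none of these are taken from etcd-druid's actual naming.

```python
from typing import List


def peer_urls(statefulset: str, peer_service: str, namespace: str, replicas: int,
              port: int = 2380, scheme: str = "https",
              cluster_domain: str = "cluster.local") -> List[str]:
    """Build one peer URL per StatefulSet pod using its stable headless-service DNS name."""
    return [
        f"{scheme}://{statefulset}-{ordinal}.{peer_service}.{namespace}.svc.{cluster_domain}:{port}"
        for ordinal in range(replicas)
    ]


# Hypothetical names, chosen only for this example.
for url in peer_urls("etcd-test", "etcd-test-peer", "default", replicas=3):
    print(url)
```

Because each member keeps the same DNS name across restarts, peers can keep addressing one another without going through a load-balanced virtual IP.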

## Member Lease

Every member in an `Etcd` cluster has a dedicated [Lease](https://kubernetes.io/docs/concepts/architecture/leases/) which signifies that the member is alive. It is the responsibility of the `etcd-backup-restore` side-car container to periodically renew the lease.

> Today the lease object is also used to indicate the member-ID and the role of the member in an etcd cluster. Possible roles are `Leader` and `Member` (which denotes that this is a member but not a leader). This will change in the future with [EtcdMember resource](https://github.com/gardener/etcd-druid/blob/3383e0219a6c21c6ef1d5610db964cc3524807c8/docs/proposals/04-etcd-member-custom-resource.md).

**Code reference:** [Member-Lease-Component](https://github.com/gardener/etcd-druid/tree/3383e0219a6c21c6ef1d5610db964cc3524807c8/internal/component/memberlease)
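
The general idea of lease-based liveness can be sketched as follows. A Kubernetes `Lease` carries a `renewTime` and a `leaseDurationSeconds` in its spec, and a holder is usually considered alive while the current time is within that duration of the last renewal. How etcd-druid itself evaluates member leases is not described here, so treat the function below (whose name is made up) as a generic illustration rather than the operator's logic.

```python
from datetime import datetime, timedelta, timezone


def lease_is_current(renew_time: datetime, lease_duration_seconds: int, now: datetime) -> bool:
    """Return True while `now` is within the lease duration of the last renewal."""
    return now <= renew_time + timedelta(seconds=lease_duration_seconds)


renewed = datetime(2024, 10, 25, 12, 0, 0, tzinfo=timezone.utc)
print(lease_is_current(renewed, 40, renewed + timedelta(seconds=30)))  # True
print(lease_is_current(renewed, 40, renewed + timedelta(seconds=60)))  # False
```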

## Delta & Full Snapshot Leases

One of the responsibilities of `etcd-backup-restore` container is to take periodic or threshold based snapshots (delta and full) of the etcd DB. Today `etcd-backup-restore` communicates the end-revision of the latest full/delta snapshots to `etcd-druid` operator via leases.

`etcd-druid` creates two [Lease](https://kubernetes.io/docs/concepts/architecture/leases/) resources, one for delta snapshots and another for full snapshots. This information is used by the operator to trigger [snapshot-compaction](../proposals/02-snapshot-compaction.md) jobs. Snapshot leases are also used to derive the health of backups, which gets updated in the `Status` subresource of every `Etcd` resource.

> In the future these leases will be replaced by [EtcdMember resource](https://github.com/gardener/etcd-druid/blob/3383e0219a6c21c6ef1d5610db964cc3524807c8/docs/proposals/04-etcd-member-custom-resource.md).

**Code reference:** [Snapshot-Lease-Component](https://github.com/gardener/etcd-druid/tree/3383e0219a6c21c6ef1d5610db964cc3524807c8/internal/component/snapshotlease)
24 changes: 24 additions & 0 deletions docs/concepts/etcd-cluster-resource-protection.md
@@ -0,0 +1,24 @@
# Etcd Cluster Resource Protection

`etcd-druid` provisions and manages [Kubernetes resources (a.k.a. components)](etcd-cluster-components.md) for each `Etcd` cluster. To ensure that each component's specification is in line with the configured attributes defined in the `Etcd` custom resource, and to protect against unintended changes to any of these *managed components*, a [Validating Webhook](https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/) is employed.

[Etcd Components Webhook](https://github.com/gardener/etcd-druid/tree/55efca1c8f6c852b0a4e97f08488ffec2eed0e68/internal/webhook/etcdcomponents) is the *validating webhook* which prevents unintended *UPDATE* and *DELETE* operations on all managed resources. The following sections describe what is prohibited and under which specific conditions changes are permitted.

## Configure Etcd Components Webhook

A prerequisite for enabling the validating webhook is to [configure the Webhook Server](../deployment/configure-etcd-druid.md#webhook-server). Additionally, you need to enable the `Etcd Components` validating webhook and optionally configure other options. You can look at all the options [here](../deployment/configure-etcd-druid.md#etcd-components-webhook).

## What is allowed?

Modifications to managed resources under the following circumstances will be allowed:

* `Create` and `Connect` operations are allowed and no validation is done.
* Changes to a Kubernetes resource (e.g. StatefulSet, ConfigMap etc.) not managed by etcd-druid are allowed.
* Changes to a resource whose Group-Kind is amongst the resources managed by etcd-druid but which does not have a parent `Etcd` resource are allowed.
* An operator may wish to explicitly disable etcd-component protection. This can be done by setting the `druid.gardener.cloud/disable-etcd-component-protection` annotation on an `Etcd` resource. If this annotation is present then changes to managed components will be allowed.
* If the `Etcd` resource has a deletion timestamp set, indicating that it is marked for deletion and is awaiting etcd-druid to delete all managed resources, then deletion requests for all managed resources of this etcd cluster will be allowed if:
  * The deletion request has come from a `ServiceAccount` associated with etcd-druid. If not explicitly specified via `--reconciler-service-account` then a [default-reconciler-service-account](https://github.com/gardener/etcd-druid/blob/55efca1c8f6c852b0a4e97f08488ffec2eed0e68/internal/webhook/etcdcomponents/config.go#L23) will be assumed.
  * The deletion request has come from a `ServiceAccount` configured via `--etcd-components-webhook-exempt-service-accounts`.
* `Lease` objects are periodically updated by each etcd member pod. A single `ServiceAccount` is created for all members. The `Update` operation on `Lease` objects from [this ServiceAccount](https://github.com/gardener/etcd-druid/blob/55efca1c8f6c852b0a4e97f08488ffec2eed0e68/api/v1alpha1/helper.go#L28) is allowed.
* If an active reconciliation is in progress then only operations initiated by etcd-druid are allowed.
* If no active reconciliation is in progress then updates to managed resources from `ServiceAccounts` configured via `--etcd-components-webhook-exempt-service-accounts` are allowed.
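
The rules above can be read as a decision procedure. The sketch below is a simplified model of that list, not the webhook's implementation: the class and field names, the order in which the rules are checked, the deny-by-default fallback and the service account names in the usage lines are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class EtcdState:
    """Hypothetical view of the parent Etcd resource."""
    protection_disabled: bool = False  # disable-etcd-component-protection annotation is set
    marked_for_deletion: bool = False  # deletion timestamp is set
    reconciling: bool = False          # an active reconciliation is in progress
    member_service_account: str = ""   # ServiceAccount shared by all member pods


@dataclass(frozen=True)
class WebhookConfig:
    """Hypothetical view of the relevant webhook settings."""
    reconciler_service_account: str
    exempt_service_accounts: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class Request:
    operation: str                     # "CREATE", "CONNECT", "UPDATE" or "DELETE"
    kind: str                          # e.g. "StatefulSet" or "Lease"
    requester: str                     # ServiceAccount making the request
    managed: bool = True               # resource is managed by etcd-druid
    etcd: Optional[EtcdState] = None   # parent Etcd resource, None if there is none


def is_allowed(req: Request, cfg: WebhookConfig) -> bool:
    """Apply the documented rules in one assumed order; anything unmatched is denied."""
    if req.operation in ("CREATE", "CONNECT"):
        return True
    if not req.managed or req.etcd is None:
        return True
    etcd = req.etcd
    if etcd.protection_disabled:
        return True
    from_druid = req.requester == cfg.reconciler_service_account
    exempt = req.requester in cfg.exempt_service_accounts
    if etcd.marked_for_deletion and req.operation == "DELETE":
        return from_druid or exempt
    if (req.kind == "Lease" and req.operation == "UPDATE"
            and req.requester == etcd.member_service_account):
        return True
    if etcd.reconciling:
        return from_druid
    return req.operation == "UPDATE" and exempt


# Hypothetical service account names, used only for this example.
cfg = WebhookConfig("druid-sa", frozenset({"ops-sa"}))
idle = EtcdState(member_service_account="member-sa")
print(is_allowed(Request("UPDATE", "StatefulSet", "someone", etcd=idle), cfg))  # False
print(is_allowed(Request("UPDATE", "Lease", "member-sa", etcd=idle), cfg))      # True
```

Modelling the check as a pure function over explicit inputs makes each rule easy to exercise in isolation, which is the main reason for this shape.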