Replies: 3 comments 3 replies
-
For me, Option 2 seems to be the most reasonable course of action.
-
Some random brain dump: given some of the concepts @ccremer, @cimnine and I discussed (k8up as a client, using native k8s cron, etc.), I'm actually not sure HA is really necessary anymore for the operator itself. If we use k8s-native crons, it doesn't matter whether the operator is running or not -> no more lost schedules during restarts/maintenance/crashes. So the argument for an HA k8up setup diminishes quite a bit. Any thoughts?
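To make the idea concrete (purely a sketch, not an agreed design): if the operator translated each schedule into a native CronJob, kube-controller-manager would fire the backup Pods on time regardless of whether the operator Pod is currently running. All names, images and specs below are hypothetical:

```go
package main

import (
	"context"

	batchv1 "k8s.io/api/batch/v1"
	batchv1beta1 "k8s.io/api/batch/v1beta1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	client := kubernetes.NewForConfigOrDie(ctrl.GetConfigOrDie())

	// Hypothetical: a K8up schedule rendered as a native CronJob. Once it
	// exists, Kubernetes triggers the backup Jobs itself, so an operator
	// restart does not skip any schedule.
	cron := &batchv1beta1.CronJob{
		ObjectMeta: metav1.ObjectMeta{Name: "backup-example", Namespace: "default"},
		Spec: batchv1beta1.CronJobSpec{
			Schedule: "0 2 * * *", // illustrative cron expression
			JobTemplate: batchv1beta1.JobTemplateSpec{
				Spec: batchv1.JobSpec{
					Template: corev1.PodTemplateSpec{
						Spec: corev1.PodSpec{
							RestartPolicy: corev1.RestartPolicyNever,
							Containers: []corev1.Container{{
								Name:  "backup",
								Image: "k8up-backup:latest", // hypothetical image
								Args:  []string{"backup"},
							}},
						},
					},
				},
			},
		},
	}

	if _, err := client.BatchV1beta1().CronJobs("default").Create(
		context.TODO(), cron, metav1.CreateOptions{}); err != nil {
		panic(err)
	}
}
```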
-
We have long since implemented this leader election and have updated controller-runtime beyond 0.7.
-
In 38233b1 I tried to upgrade to said version. I found the following issues that I'd like to discuss here.
Starting situation
With v0.6.x, controller-runtime creates a ConfigMap to store leader-election data. This CM acts like a mutex that each Pod queries before starting up. Multiple Pods can occur in the following scenarios: multiple replicas for an HA hot-standby, or during rolling upgrades of a 1-replica deployment.
This worked well, and is also backwards compatible with K8s 1.11 and OpenShift 3.11.
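For reference, a minimal sketch of how leader election is typically switched on in a controller-runtime manager; the election ID and namespace are illustrative, not K8up's actual values:

```go
package main

import (
	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	// With controller-runtime v0.6.x, enabling leader election like this
	// makes the manager claim a ConfigMap lock before starting its
	// controllers; additional replicas block until they win the lock.
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		LeaderElection:          true,
		LeaderElectionID:        "k8up-leader-election", // illustrative ID
		LeaderElectionNamespace: "k8up-system",          // illustrative namespace
	})
	if err != nil {
		panic(err)
	}
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}
```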
New situation
With K8s 1.16+, a new API object became available: coordination.k8s.io/Lease. This is a resource that exists specifically for this kind of "mutex" situation.
From 0.7.x onwards, controller-runtime (the heart of operators) switched from the CM to the new Lease API.
The problem: the Lease API is not available in OpenShift 3.11. As long as we support this age-old K8s version, we won't be able to offer the built-in leader-election feature on clusters running K8s 1.15 and below.
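To make the dependency concrete, a small sketch of what the new lock looks like from a client's point of view; the namespace and Lease name are made up for illustration. On clusters that do not serve coordination.k8s.io/v1 (such as OpenShift 3.11), a call like this fails, which is exactly the problem described above:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	client := kubernetes.NewForConfigOrDie(ctrl.GetConfigOrDie())

	// With controller-runtime 0.7+, the election state lives in a
	// coordination.k8s.io/v1 Lease instead of a ConfigMap.
	lease, err := client.CoordinationV1().Leases("k8up-system").Get(
		context.TODO(), "k8up-leader-election", metav1.GetOptions{})
	if err != nil {
		fmt.Println("no Lease available:", err)
		return
	}
	if lease.Spec.HolderIdentity != nil {
		fmt.Printf("current leader: %s\n", *lease.Spec.HolderIdentity)
	}
}
```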
Options
I suggest going with Option 2 and making leader election configurable (or auto-detected based on the K8s version) in the Helm chart (currently it's always enabled).
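A rough sketch of what the configurable / auto-detected variant could look like on the operator side; the flag name and the fallback behaviour are assumptions, not an implemented feature. The Helm chart would simply pass the flag via the Deployment args:

```go
package main

import (
	"flag"

	"k8s.io/client-go/discovery"
	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	// Hypothetical flag, toggled by the Helm chart.
	enableLeaderElection := flag.Bool("enable-leader-election", true, "enable leader election")
	flag.Parse()

	cfg := ctrl.GetConfigOrDie()

	// Auto-detection sketch: if the cluster does not serve
	// coordination.k8s.io/v1 (e.g. OpenShift 3.11), run without
	// leader election instead of failing at startup.
	if *enableLeaderElection {
		dc := discovery.NewDiscoveryClientForConfigOrDie(cfg)
		if _, err := dc.ServerResourcesForGroupVersion("coordination.k8s.io/v1"); err != nil {
			*enableLeaderElection = false
		}
	}

	mgr, err := ctrl.NewManager(cfg, ctrl.Options{
		LeaderElection:   *enableLeaderElection,
		LeaderElectionID: "k8up-leader-election", // illustrative ID
	})
	if err != nil {
		panic(err)
	}
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}
```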