Replies: 3 comments 3 replies
-
For me, Option 2 seems to be the most reasonable course of action.
-
Some random brain dump: given some of the concepts @ccremer, @cimnine and I discussed (k8up as a client, using native k8s cron, etc.), I'm actually not sure HA is really necessary anymore for the operator itself. If we use k8s-native crons, it doesn't matter whether the operator is running or not -> no more lost schedules during restarts/maintenance/crashes. So the argument for an HA k8up setup diminishes quite a bit. Any thoughts?
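To make the idea concrete (purely a sketch, not an agreed design): if the operator translated each schedule into a native CronJob, kube-controller-manager would fire the backup Pods on time regardless of whether the operator Pod is currently running. All names, images and specs below are hypothetical:

```go
package main

import (
	"context"

	batchv1 "k8s.io/api/batch/v1"
	batchv1beta1 "k8s.io/api/batch/v1beta1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	client := kubernetes.NewForConfigOrDie(ctrl.GetConfigOrDie())

	// Hypothetical: a K8up schedule rendered as a native CronJob. Once it
	// exists, Kubernetes triggers the backup Jobs itself, so an operator
	// restart does not skip any schedule.
	cron := &batchv1beta1.CronJob{
		ObjectMeta: metav1.ObjectMeta{Name: "backup-example", Namespace: "default"},
		Spec: batchv1beta1.CronJobSpec{
			Schedule: "0 2 * * *", // illustrative cron expression
			JobTemplate: batchv1beta1.JobTemplateSpec{
				Spec: batchv1.JobSpec{
					Template: corev1.PodTemplateSpec{
						Spec: corev1.PodSpec{
							RestartPolicy: corev1.RestartPolicyNever,
							Containers: []corev1.Container{{
								Name:  "backup",
								Image: "k8up-backup:latest", // hypothetical image
								Args:  []string{"backup"},
							}},
						},
					},
				},
			},
		},
	}

	if _, err := client.BatchV1beta1().CronJobs("default").Create(
		context.TODO(), cron, metav1.CreateOptions{}); err != nil {
		panic(err)
	}
}
```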
-
We have long since implemented this leader election and have updated controller-runtime beyond 0.7.
-
In 38233b1 I tried to upgrade to said version. I found the following issues that I'd like to discuss here.
Starting situation
With v0.6.x, controller-runtime creates a ConfigMap to store leader-election data. This CM acts like a mutex that each Pod queries before starting up. Multiple Pods can occur in the following scenarios: multiple replicas for an HA hot-standby, or during rolling upgrades of a 1-replica deployment.
This worked well, and is also backwards compatible with K8s 1.11 and OpenShift 3.11.
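For reference, a minimal sketch of how leader election is typically switched on in a controller-runtime manager; the election ID and namespace are illustrative, not K8up's actual values:

```go
package main

import (
	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	// With controller-runtime v0.6.x, enabling leader election like this
	// makes the manager claim a ConfigMap lock before starting its
	// controllers; additional replicas block until they win the lock.
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		LeaderElection:          true,
		LeaderElectionID:        "k8up-leader-election", // illustrative ID
		LeaderElectionNamespace: "k8up-system",          // illustrative namespace
	})
	if err != nil {
		panic(err)
	}
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}
```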
New situation
With K8s 1.16+, a new API object became available: coordination.k8s.io/Lease. This is a resource that exists specifically for this kind of "mutex" situation.
From 0.7.x onwards, controller-runtime (the heart of operators) switched from the CM to the new Lease API.
The problem: the Lease API is not available in OpenShift 3.11. As long as we support this age-old K8s version, we won't be able to offer the built-in leader-election feature on clusters running K8s 1.15 and below.
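To make the dependency concrete, a small sketch of what the new lock looks like from a client's point of view; the namespace and Lease name are made up for illustration. On clusters that do not serve coordination.k8s.io/v1 (such as OpenShift 3.11), a call like this fails, which is exactly the problem described above:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	client := kubernetes.NewForConfigOrDie(ctrl.GetConfigOrDie())

	// With controller-runtime 0.7+, the election state lives in a
	// coordination.k8s.io/v1 Lease instead of a ConfigMap.
	lease, err := client.CoordinationV1().Leases("k8up-system").Get(
		context.TODO(), "k8up-leader-election", metav1.GetOptions{})
	if err != nil {
		fmt.Println("no Lease available:", err)
		return
	}
	if lease.Spec.HolderIdentity != nil {
		fmt.Printf("current leader: %s\n", *lease.Spec.HolderIdentity)
	}
}
```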
Options
I suggest going with Option 2 and making leader election configurable (or auto-detected based on the K8s version) in the Helm chart (currently it's always enabled).
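A rough sketch of what the configurable / auto-detected variant could look like on the operator side; the flag name and the fallback behaviour are assumptions, not an implemented feature. The Helm chart would simply pass the flag via the Deployment args:

```go
package main

import (
	"flag"

	"k8s.io/client-go/discovery"
	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	// Hypothetical flag, toggled by the Helm chart.
	enableLeaderElection := flag.Bool("enable-leader-election", true, "enable leader election")
	flag.Parse()

	cfg := ctrl.GetConfigOrDie()

	// Auto-detection sketch: if the cluster does not serve
	// coordination.k8s.io/v1 (e.g. OpenShift 3.11), run without
	// leader election instead of failing at startup.
	if *enableLeaderElection {
		dc := discovery.NewDiscoveryClientForConfigOrDie(cfg)
		if _, err := dc.ServerResourcesForGroupVersion("coordination.k8s.io/v1"); err != nil {
			*enableLeaderElection = false
		}
	}

	mgr, err := ctrl.NewManager(cfg, ctrl.Options{
		LeaderElection:   *enableLeaderElection,
		LeaderElectionID: "k8up-leader-election", // illustrative ID
	})
	if err != nil {
		panic(err)
	}
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}
```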