UPSTREAM: 1663: Recommended leaderelection setting (#1663)
Extensive e2e tests revealed that operator-controller can run into
leader election timeouts during cluster bootstrap, which causes
sporadic alerts.

This commit applies the recommended leader election settings:
LeaseDuration: 15s -> 137s
RenewDeadline: 10s -> 107s
RetryPeriod:    2s ->  26s

Warning: This increases the worst-case downtime (LeaseDuration +
RetryPeriod) of catalogd and operator-controller to 163s, up from 17s.
thetechnick authored Jan 29, 2025
1 parent 7ee4ced commit 011773e
Showing 2 changed files with 17 additions and 3 deletions.
11 changes: 9 additions & 2 deletions catalogd/cmd/catalogd/main.go
@@ -42,6 +42,7 @@ import (
 	_ "k8s.io/client-go/plugin/pkg/client/auth"
 	"k8s.io/klog/v2"
 	"k8s.io/klog/v2/textlogger"
+	"k8s.io/utils/ptr"
 	ctrl "sigs.k8s.io/controller-runtime"
 	crcache "sigs.k8s.io/controller-runtime/pkg/cache"
 	"sigs.k8s.io/controller-runtime/pkg/certwatcher"
@@ -231,8 +232,14 @@ func main() {
 		HealthProbeBindAddress: probeAddr,
 		LeaderElection: enableLeaderElection,
 		LeaderElectionID: "catalogd-operator-lock",
-		WebhookServer: webhookServer,
-		Cache: cacheOptions,
+		// Recommended Leader Election values
+		// https://github.com/openshift/enhancements/blob/61581dcd985130357d6e4b0e72b87ee35394bf6e/CONVENTIONS.md#handling-kube-apiserver-disruption
+		LeaseDuration: ptr.To(137 * time.Second),
+		RenewDeadline: ptr.To(107 * time.Second),
+		RetryPeriod: ptr.To(26 * time.Second),
+
+		WebhookServer: webhookServer,
+		Cache: cacheOptions,
 	})
 	if err != nil {
 		setupLog.Error(err, "unable to create manager")
9 changes: 8 additions & 1 deletion cmd/operator-controller/main.go
@@ -40,6 +40,7 @@ import (
 	_ "k8s.io/client-go/plugin/pkg/client/auth"
 	"k8s.io/klog/v2"
 	"k8s.io/klog/v2/textlogger"
+	"k8s.io/utils/ptr"
 	ctrl "sigs.k8s.io/controller-runtime"
 	crcache "sigs.k8s.io/controller-runtime/pkg/cache"
 	"sigs.k8s.io/controller-runtime/pkg/certwatcher"
@@ -229,7 +230,13 @@ func main() {
 		HealthProbeBindAddress: probeAddr,
 		LeaderElection: enableLeaderElection,
 		LeaderElectionID: "9c4404e7.operatorframework.io",
-		Cache: cacheOptions,
+		// Recommended Leader Election values
+		// https://github.com/openshift/enhancements/blob/61581dcd985130357d6e4b0e72b87ee35394bf6e/CONVENTIONS.md#handling-kube-apiserver-disruption
+		LeaseDuration: ptr.To(137 * time.Second),
+		RenewDeadline: ptr.To(107 * time.Second),
+		RetryPeriod: ptr.To(26 * time.Second),
+
+		Cache: cacheOptions,
 		// LeaderElectionReleaseOnCancel defines if the leader should step down voluntarily
 		// when the Manager ends. This requires the binary to immediately end when the
 		// Manager is stopped, otherwise, this setting is unsafe. Setting this significantly
