[GOG-1162] Fix Cluster HealthCheck to Exclude HAProxy Check During Bootstrap #712
Allow Operator to Be Namespace-Scoped
* make redis-operator namespace scoped
* add OP_TYPE variable

  In case we would like to make the operator namespace scoped, we can use a variable `OP_TYPE` and set it to `namespaced`. Add `env` to the deployment manifest:

      env:
        - name: OP_TYPE
          value: "namespaced"

* rename env variable to WATCH_NAMESPACE
* add namespace-scoped instructions to readme
* rename variable

  Co-authored-by: Ryan Rodriguez <[email protected]>

* rename variable to human readable

---------

Co-authored-by: artur.zheludkov <[email protected]>
Co-authored-by: @rurkss <[email protected]>
* make redis-operator namespace scoped
* add OP_TYPE variable

  In case we would like to make the operator namespace scoped, we can use a variable `OP_TYPE` and set it to `namespaced`. Add `env` to the deployment manifest:

      env:
        - name: OP_TYPE
          value: "namespaced"

* rename env variable to WATCH_NAMESPACE
* add namespace-scoped instructions to readme
* rename variable

  Co-authored-by: Ryan Rodriguez <[email protected]>

* rename variable to human readable
* automatic creation of networkpolicy object
* test sign
* add metrics port / metric namespace
* add networkpolicy as optional service
* format code

---------

Co-authored-by: artur.zheludkov <[email protected]>
Co-authored-by: @rurkss <[email protected]>
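A minimal sketch of how a `WATCH_NAMESPACE`-style variable is typically consumed at operator startup. This is illustrative only: the operator's actual wiring differs, and treating an empty value as "watch all namespaces" is an assumption here, not confirmed behavior.

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	// WATCH_NAMESPACE narrows the operator to a single namespace.
	// Assumption: an empty value means cluster-scoped (watch everything).
	ns := os.Getenv("WATCH_NAMESPACE")
	if ns == "" {
		fmt.Println("WATCH_NAMESPACE unset: running cluster-scoped")
		return
	}
	fmt.Printf("running namespace-scoped, watching %q\n", ns)
}
```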
* Add HAProxy Init On Demand
* update networkpolicy match label
* HAProxy customConfig

      apiVersion: databases.spotahome.com/v1
      kind: RedisFailover
      metadata:
        name: redisfailover
        labels:
          app.kubernetes.io/component: redis
      spec:
        haproxy:
          customConfig: |
            global
              daemon
              maxconn 256
            defaults
              mode tcp
              timeout connect 5000ms
              timeout client 50000ms
              timeout server 50000ms
              timeout check 5000ms
            resolvers k8s
              parse-resolv-conf
              hold other 10s
              hold refused 10s
              hold nx 10
              hold timeout 10s
              hold valid 10s
              hold obsolete 10s
            frontend redis-master
              bind *:<%= port %>
              default_backend redis-master
            backend redis-master
              mode tcp
              balance first
              option tcp-check
              tcp-check send info\ replication\r\n
              tcp-check expect string role:master
              server-template redis <%= replicas %> _redis._tcp.redis.<%= namespace %>.svc.cluster.local:<%= port %> check inter 1s resolvers k8s init-addr none
          redisHost: "redis-haproxy"
          replicas: 1
          image: "haproxy:2.4"
        sentinel:
          replicas: 3
        redis:
          replicas: 2

* fix indent. add customConfig description to the crd.yaml
* Affinity Config For Haproxy
* Use standardized labels for HAProxy

---------

Co-authored-by: Ryan Rodriguez <[email protected]>
* Add Status Subresource
* Update operator/redisfailover/handler.go
  Co-authored-by: Ryan Rodriguez <[email protected]>
* Update operator/redisfailover/handler.go
  Co-authored-by: Ryan Rodriguez <[email protected]>
* Update operator/redisfailover/handler.go
  Co-authored-by: Ryan Rodriguez <[email protected]>
* Update operator/redisfailover/handler.go
  Co-authored-by: Ryan Rodriguez <[email protected]>
* add haproxy in check
* case similarity for haproxy
* Update operator/redisfailover/handler.go
  Co-authored-by: Ryan Rodriguez <[email protected]>
* Update operator/redisfailover/handler.go
  Co-authored-by: Ryan Rodriguez <[email protected]>

---------

Co-authored-by: Ryan Rodriguez <[email protected]>
* Update ObservedGeneration
* Fix typo, expand comment

---------

Co-authored-by: Ryan Rodriguez <[email protected]>
* haproxy opt params
* revert haproxy image as opt
* Update generator.go
* remove duplicate variables
* fix redis reconciling error
* Update handler.go
* Update handler.go
* Update handler.go
* Update handler.go
* Update handler.go
* Update operator/redisfailover/handler.go
  Co-authored-by: Ryan Rodriguez <[email protected]>
* add error handler

---------

Co-authored-by: Ryan Rodriguez <[email protected]>
…rs.yaml

Co-authored-by: Aaron Kuehler <[email protected]>
Prepare the repository for tagging / release of version 2.1.0.

Note, this release contains a [fix](#49) for #48 - hence the minor version upgrade.

References
----------

- https://semver.org/#what-do-i-do-if-i-accidentally-release-a-backward-incompatible-change-as-a-minor-version
This PR removes the redis-slave-HAProxy, deeming it unnecessary and potentially dangerous. Originally, this resource was designed as an endpoint through which replicating Redis nodes could connect to the slave nodes of a source cluster. However, when a replicated Redis cluster uses this resource, the sentinels on the source side detect it as a potential slave for a failover scenario. During failover, the sentinels mistakenly treat the HAProxy pods as legitimate Redis slaves and attempt to promote an HAProxy pod as the next master; since that promotion is impossible, the sentinels get stuck in a promotion loop.

The workaround is to use the redis-slave-service as the endpoint on the source cluster. This way, when the sentinels detect the replicated Redis node as a potential master in a failover scenario, they will not be able to promote it, because the node is unreachable for connection and the sentinels will `forget` it.

---------

Co-authored-by: Aaron Kuehler <[email protected]>
Update the default haproxy image to the latest release
An alternative approach to restarting StatefulSet-related pods is to delete the StatefulSet itself and allow the operator to automatically recreate it during the next reconciliation loop. This differs from the previous implementation in #54.

---------

Co-authored-by: Aaron Kuehler <[email protected]>
Prepare the repository for tagging / release of version 3.1.0.
There is a brief period of time during a Redis replica restart when HAProxy can send writes to it, causing the following error:

```
READONLY You can't write against a read only replica.
```

This is because HAProxy (< 3.1.0) always treats newly detected backends as immediately UP - ready to serve traffic - until they fail their checks. HAProxy v3.1.0 adds the ability to configure a backend server's initial state - `init-state` - and therefore how/when HAProxy determines that the backend is ready to receive traffic.

This changes the HAProxy configuration so that a new Redis node is disqualified from receiving client traffic until HAProxy determines that the node is indeed a "master" node. This also updates the operator's default HAProxy image accordingly, so it supports the `init-state` HAProxy configuration option.

References:

- haproxy/haproxy#51
- haproxy/haproxy@50322df
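A hedged sketch of the kind of backend line this change enables. The operator generates the real configuration; the replica count, DNS name, and port below are placeholders, and only `init-state down` (the HAProxy >= 3.1.0 keyword) is the point being illustrated.

```
backend redis-master
  mode tcp
  option tcp-check
  tcp-check send info\ replication\r\n
  tcp-check expect string role:master
  # init-state down: a newly discovered server starts in the DOWN state and
  # receives no traffic until its checks prove it is the master
  server-template redis 3 _redis._tcp.redis.default.svc.cluster.local:6379 check inter 1s init-addr none init-state down
```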
Prepare the repository for tagging / release of version 4.0.0
Related to:

- medyagh/setup-minikube#565
- https://github.com/powerhome/redis-operator/actions/runs/12774059053/job/35608701152#step:5:78

```
Preparing to unpack .../socat_1.8.0.0-4build3_amd64.deb ...
Unpacking socat (1.8.0.0-4build3) ...
Setting up socat (1.8.0.0-4build3) ...
Processing triggers for man-db (2.12.0-4build2) ...
Running kernel seems to be up-to-date.
No services need to be restarted.
No containers need to be restarted.
No user sessions are running outdated binaries.
No VM guests are running outdated hypervisor (qemu) binaries on this host.
/usr/bin/lsb_release --short --codename
noble
Error: Unexpected HTTP response: 404
```
Some clients do not gracefully handle when a failover occurs; they keep the established connection open. When the redis node comes back up as a replica, the client receives an error indicating that they cannot write against the replica:

```
READONLY You can't write against a read only replica
```

This closes client connections to a redis node when haproxy detects that it is in a "down" state. The idea is that the clients will recognize they need to reestablish a connection.

References:

- https://docs.haproxy.org/3.1/configuration.html#5.2-on-marked-down
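A hedged sketch of the relevant server option; the operator's generated config is the source of truth, and the server-template arguments below are placeholders.

```
backend redis-master
  mode tcp
  # on-marked-down shutdown-sessions: when a server is marked DOWN, HAProxy
  # closes its established sessions, forcing clients to reconnect
  server-template redis 3 _redis._tcp.redis.default.svc.cluster.local:6379 check on-marked-down shutdown-sessions
```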
Prepare the repository for tagging / release of version 4.1.0
Fix buildx and bake action compatibility issues in the `ci` GitHub workflow.

- https://github.com/powerhome/redis-operator/actions/runs/12935287363/job/36078911603

```
docker/bake-action < v5 is not compatible with buildx >= 0.20.0, please update your workflow to latest docker/bake-action or use an older buildx version.
```
This is to change the owner of this operator from the now defunct Forever People team to Guardians of the Galaxy.

Co-authored-by: Aaron Kuehler <[email protected]>
Bootstrapping fails if the bootstrapNode's and the RF's redis ports are different. The operator uses the bootstrapNode's port when trying to run `SLAVEOF` on the RF pods and fails to connect.

Example RF config:

```yaml
bootstrapNode:
  enabled: true
  host: redis-primary-host.io
  port: "36379"
...
redis:
  port: 6679
...
```

Example failure logs:

```
time="2025-06-05T19:08:29Z" level=info msg="Making pod rfr-redis-0 slave of redis-primary-host.io:36379" namespace=redis-test redisfailover=redis service=redis.healer src="checker.go:261"
time="2025-06-05T19:08:29Z" level=error msg="error on object processing: dial tcp 10.244.2.7:36379: connect: connection refused" controller-id=redisfailover object-key=redis-test/redis operator=redisfailover service=kooper.controller src="controller.go:282"
```

This alters `MakeSlaveOfWithPort` to allow the caller to specify different ports for the primary and replica redis instances. Additionally, `SetExternalMasterOnAll` is altered to pass along the bootstrapNode's port as well as the RF resource's port - which _can_ be different.
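A rough sketch of the shape of the fix, assuming a go-redis client; the function name, signature, and helper here are hypothetical and the operator's real `MakeSlaveOfWithPort` differs.

```go
package main

import (
	"context"
	"net"

	"github.com/go-redis/redis/v8"
)

// makeSlaveOfWithPort (hypothetical) connects to the replica on the RF's own
// redis port, then points SLAVEOF at the master's (possibly different) port.
func makeSlaveOfWithPort(ctx context.Context, replicaIP, replicaPort, masterHost, masterPort string) error {
	rc := redis.NewClient(&redis.Options{
		// Dial the replica on its own port (e.g. 6679), not the master's.
		Addr: net.JoinHostPort(replicaIP, replicaPort),
	})
	defer rc.Close()
	// SLAVEOF <masterHost> <masterPort> - e.g. redis-primary-host.io:36379.
	return rc.SlaveOf(ctx, masterHost, masterPort).Err()
}

func main() {
	_ = makeSlaveOfWithPort(context.Background(), "10.244.2.7", "6679", "redis-primary-host.io", "36379")
}
```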
Optionally expose Haproxy Prometheus metrics[^1].

Example:

```
spec:
  ...
  haproxy:
    exporter: true
```

[^1]: https://www.haproxy.com/documentation/haproxy-configuration-tutorials/alerts-and-monitoring/prometheus/
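Under the hood, HAProxy's built-in Prometheus exporter is typically enabled with a dedicated frontend like the sketch below. This is a hedged illustration: the frontend name and port 9101 are assumptions, not the operator's actual generated config.

```
frontend prometheus
  bind *:9101
  mode http
  # serve metrics from HAProxy's built-in Prometheus exporter on /metrics
  http-request use-service prometheus-exporter if { path /metrics }
  no log
```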
Bring the haproxy service labels into a similar shape as the redis service labels. This makes it easier to select just the haproxy service by its labels.

Current Redis "Master" Service labels:

```yaml
metadata:
  labels:
    app.kubernetes.io/component: redis
    app.kubernetes.io/managed-by: redis-operator
    app.kubernetes.io/name: redis
    app.kubernetes.io/part-of: redis-failover
    redisfailovers-role: master
    redisfailovers.databases.spotahome.com/name: redis
```

Additions to the Haproxy Service labels:

```diff
metadata:
  labels:
+   app.kubernetes.io/component: haproxy
    app.kubernetes.io/managed-by: redis-operator
+   app.kubernetes.io/name: redis
+   app.kubernetes.io/part-of: redis-failover
+   redisfailovers-role: master
    redisfailovers.databases.spotahome.com/name: redis
```
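With the unified labels, selecting just the haproxy service becomes a one-liner; this usage example assumes a RedisFailover named `redis`.

```
kubectl get svc -l app.kubernetes.io/component=haproxy,redisfailovers.databases.spotahome.com/name=redis
```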
Prepare the repository for tagging / release of version 4.2.0
I messed up the changelog version number in #67.
When bootstrapping, we expect no master to be running. However, the Haproxy master resources are still deployed and looking for a master. This skips or removes the Haproxy master resources when a RedisFailover is bootstrapping.
Prepare the repository for tagging / release of version v4.3.0
PR description:

Starting with version 2.4.0 of the redis-operator, we are removing HAProxy from the cluster in bootstrap mode. However, the cluster health check still expects HAProxy to be present. As a result, the health check fails, preventing the Redis custom resource from updating its status and causing the deployment to fail.

The logic now distinguishes between three modes (sketched after this list):

- Bootstrap mode with sentinels disabled → only Redis must be running.
- Bootstrap mode with sentinels allowed → Redis and Sentinel must be running.
- Normal mode → Redis, Sentinel, and HAProxy must all be running.
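A minimal sketch of the mode-aware check, assuming hypothetical field names (`Bootstrapping`, `SentinelsAllowed`); the operator's actual types and health-check code differ.

```go
package main

// RedisFailoverModes is an illustrative stand-in for the fields the operator
// consults on the RedisFailover resource; the real API types differ.
type RedisFailoverModes struct {
	Bootstrapping    bool // bootstrapNode.enabled
	SentinelsAllowed bool // whether sentinels run alongside the bootstrap
}

// expectedComponentsReady reports whether the cluster should be considered
// healthy, checking only the components each mode actually deploys.
func expectedComponentsReady(rf RedisFailoverModes, redisOK, sentinelOK, haproxyOK bool) bool {
	switch {
	case rf.Bootstrapping && !rf.SentinelsAllowed:
		// Bootstrap mode with sentinels disabled: only Redis must be running.
		return redisOK
	case rf.Bootstrapping:
		// Bootstrap mode with sentinels allowed: Redis and Sentinel must be running.
		return redisOK && sentinelOK
	default:
		// Normal mode: Redis, Sentinel, and HAProxy must all be running;
		// HAProxy is excluded from the check only while bootstrapping.
		return redisOK && sentinelOK && haproxyOK
	}
}

func main() {
	// While bootstrapping without sentinels, missing HAProxy must not fail the check.
	_ = expectedComponentsReady(RedisFailoverModes{Bootstrapping: true}, true, false, false)
}
```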