[GOG-1162] Fix Cluster HealthCheck to Exclude HAProxy Check During Bootstrap #712
Allow Operator to Be Namespace-Scoped
* make redis-operator namespace scoped
* add OP_TYPE variable

  In case we would like to make the operator namespace scoped, we can use a variable `OP_TYPE` and set it to `namespaced`. Add `env` to the deployment manifest:

      env:
        - name: OP_TYPE
          value: "namespaced"

* rename env variable to WATCH_NAMESPACE
* add namespace-scoped instructions to readme
* rename variable

  Co-authored-by: Ryan Rodriguez <[email protected]>

* rename variable to human readable

---------

Co-authored-by: artur.zheludkov <[email protected]>
Co-authored-by: @rurkss <[email protected]>
* make redis-operator namespace scoped
* add OP_TYPE variable

  In case we would like to make the operator namespace scoped, we can use a variable `OP_TYPE` and set it to `namespaced`. Add `env` to the deployment manifest:

      env:
        - name: OP_TYPE
          value: "namespaced"

* rename env variable to WATCH_NAMESPACE
* add namespace-scoped instructions to readme
* rename variable

  Co-authored-by: Ryan Rodriguez <[email protected]>

* rename variable to human readable
* automatic creation of networkpolicy object
* test sign
* add metrics port / metric namespace
* add networkpolicy as optional service
* format code

---------

Co-authored-by: artur.zheludkov <[email protected]>
Co-authored-by: @rurkss <[email protected]>
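A minimal sketch of how a `WATCH_NAMESPACE`-style variable is typically consumed at operator startup. This is illustrative only: the operator's actual wiring differs, and treating an empty value as "watch all namespaces" is an assumption here, not confirmed behavior.

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	// WATCH_NAMESPACE narrows the operator to a single namespace.
	// Assumption: an empty value means cluster-scoped (watch everything).
	ns := os.Getenv("WATCH_NAMESPACE")
	if ns == "" {
		fmt.Println("WATCH_NAMESPACE unset: running cluster-scoped")
		return
	}
	fmt.Printf("running namespace-scoped, watching %q\n", ns)
}
```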
* Add HAProxy Init On Demand
* update networkpolicy match label
* HAProxy customConfig

      apiVersion: databases.spotahome.com/v1
      kind: RedisFailover
      metadata:
        name: redisfailover
        labels:
          app.kubernetes.io/component: redis
      spec:
        haproxy:
          customConfig: |
            global
              daemon
              maxconn 256
            defaults
              mode tcp
              timeout connect 5000ms
              timeout client 50000ms
              timeout server 50000ms
              timeout check 5000ms
            resolvers k8s
              parse-resolv-conf
              hold other 10s
              hold refused 10s
              hold nx 10
              hold timeout 10s
              hold valid 10s
              hold obsolete 10s
            frontend redis-master
              bind *:<%= port %>
              default_backend redis-master
            backend redis-master
              mode tcp
              balance first
              option tcp-check
              tcp-check send info\ replication\r\n
              tcp-check expect string role:master
              server-template redis <%= replicas %> _redis._tcp.redis.<%= namespace %>.svc.cluster.local:<%= port %> check inter 1s resolvers k8s init-addr none
          redisHost: "redis-haproxy"
          replicas: 1
          image: "haproxy:2.4"
        sentinel:
          replicas: 3
        redis:
          replicas: 2

* fix indent. add customConfig description to the crd.yaml
* Affinity Config For Haproxy
* Use standardized labels for HAProxy

---------

Co-authored-by: Ryan Rodriguez <[email protected]>
* Add Status Subresource
* Update operator/redisfailover/handler.go
  Co-authored-by: Ryan Rodriguez <[email protected]>
* Update operator/redisfailover/handler.go
  Co-authored-by: Ryan Rodriguez <[email protected]>
* Update operator/redisfailover/handler.go
  Co-authored-by: Ryan Rodriguez <[email protected]>
* Update operator/redisfailover/handler.go
  Co-authored-by: Ryan Rodriguez <[email protected]>
* add haproxy in check
* case similarity for haproxy
* Update operator/redisfailover/handler.go
  Co-authored-by: Ryan Rodriguez <[email protected]>
* Update operator/redisfailover/handler.go
  Co-authored-by: Ryan Rodriguez <[email protected]>

---------

Co-authored-by: Ryan Rodriguez <[email protected]>
* Update ObservedGeneration
* Fix typo, expand comment

---------

Co-authored-by: Ryan Rodriguez <[email protected]>
* haproxy opt params
* revert haproxy image as opt
* Update generator.go
* remove duplicate variables
* fix redis reconciling error
* Update handler.go
* Update handler.go
* Update handler.go
* Update handler.go
* Update handler.go
* Update operator/redisfailover/handler.go
  Co-authored-by: Ryan Rodriguez <[email protected]>
* add error handler

---------

Co-authored-by: Ryan Rodriguez <[email protected]>
…rs.yaml

Co-authored-by: Aaron Kuehler <[email protected]>
Prepare the repository for tagging / release of version 2.1.0.

Note, this release contains a [fix](#49) for #48 - hence the minor version upgrade.

References
----------

- https://semver.org/#what-do-i-do-if-i-accidentally-release-a-backward-incompatible-change-as-a-minor-version
This PR removes the redis-slave-HAProxy, deeming it unnecessary and potentially dangerous. Originally, this resource was designed as an endpoint through which replicating Redis nodes could connect to the slave nodes of a source cluster. However, when a replicated Redis cluster uses this resource, the sentinels on the source side detect it as a potential slave for a failover scenario. During failover, the sentinels mistakenly treat the HAProxy pods as legitimate Redis slaves and attempt to promote an HAProxy pod as the next master; since that promotion is impossible, the sentinels get stuck in a promotion loop.

The workaround is to use the redis-slave-service as the endpoint on the source cluster. This way, when the sentinels detect the replicated Redis node as a potential master in a failover scenario, they will not be able to promote it, because the node is unreachable for connection and the sentinels will `forget` it.

---------

Co-authored-by: Aaron Kuehler <[email protected]>
Update the default haproxy image to the latest release
An alternative approach to restarting StatefulSet-related pods is to delete the StatefulSet itself and allow the operator to automatically recreate it during the next reconciliation loop. This differs from the previous implementation in #54.

---------

Co-authored-by: Aaron Kuehler <[email protected]>
Prepare the repository for tagging / release of version 3.1.0.
There is a brief period of time during a Redis replica restart when HAProxy can send writes to it, causing the following error:

```
READONLY You can't write against a read only replica.
```

This is because HAProxy (< 3.1.0) always treats newly detected backends as immediately UP - ready to serve traffic - until they fail their checks. HAProxy v3.1.0 adds the ability to configure a backend server's initial state - `init-state` - and therefore how/when HAProxy determines that the backend is ready to receive traffic.

This changes the HAProxy configuration so that a new Redis node is disqualified from receiving client traffic until HAProxy determines that the node is indeed a "master" node. This also updates the operator's default HAProxy image accordingly, so it supports the `init-state` HAProxy configuration option.

References:

- haproxy/haproxy#51
- haproxy/haproxy@50322df
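A hedged sketch of the kind of backend line this change enables. The operator generates the real configuration; the replica count, DNS name, and port below are placeholders, and only `init-state down` (the HAProxy >= 3.1.0 keyword) is the point being illustrated.

```
backend redis-master
  mode tcp
  option tcp-check
  tcp-check send info\ replication\r\n
  tcp-check expect string role:master
  # init-state down: a newly discovered server starts in the DOWN state and
  # receives no traffic until its checks prove it is the master
  server-template redis 3 _redis._tcp.redis.default.svc.cluster.local:6379 check inter 1s init-addr none init-state down
```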
Prepare the repository for tagging / release of version 4.0.0
Related to:

- medyagh/setup-minikube#565
- https://github.com/powerhome/redis-operator/actions/runs/12774059053/job/35608701152#step:5:78

```
Preparing to unpack .../socat_1.8.0.0-4build3_amd64.deb ...
Unpacking socat (1.8.0.0-4build3) ...
Setting up socat (1.8.0.0-4build3) ...
Processing triggers for man-db (2.12.0-4build2) ...
Running kernel seems to be up-to-date.
No services need to be restarted.
No containers need to be restarted.
No user sessions are running outdated binaries.
No VM guests are running outdated hypervisor (qemu) binaries on this host.
/usr/bin/lsb_release --short --codename
noble
Error: Unexpected HTTP response: 404
```
Some clients do not gracefully handle when a failover occurs; they keep the established connection open. When the redis node comes back up as a replica, the client receives an error indicating that they cannot write against the replica:

```
READONLY You can't write against a read only replica
```

This closes client connections to a redis node when haproxy detects that it is in a "down" state. The idea is that the clients will recognize they need to reestablish a connection.

References:

- https://docs.haproxy.org/3.1/configuration.html#5.2-on-marked-down
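A hedged sketch of the relevant server option; the operator's generated config is the source of truth, and the server-template arguments below are placeholders.

```
backend redis-master
  mode tcp
  # on-marked-down shutdown-sessions: when a server is marked DOWN, HAProxy
  # closes its established sessions, forcing clients to reconnect
  server-template redis 3 _redis._tcp.redis.default.svc.cluster.local:6379 check on-marked-down shutdown-sessions
```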
Prepare the repository for tagging / release of version 4.1.0
Fix buildx and bake action compatibility issues in the `ci` GitHub workflow.

- https://github.com/powerhome/redis-operator/actions/runs/12935287363/job/36078911603

```
docker/bake-action < v5 is not compatible with buildx >= 0.20.0, please update your workflow to latest docker/bake-action or use an older buildx version.
```
This is to change the owner of this operator from the now defunct Forever People team to Guardians of the Galaxy.

Co-authored-by: Aaron Kuehler <[email protected]>
Bootstrapping fails if the bootstrapNode's and the RF's redis ports are different. The operator uses the bootstrapNode's port when trying to run `SLAVEOF` on the RF pods and fails to connect.

Example RF config:

```yaml
bootstrapNode:
  enabled: true
  host: redis-primary-host.io
  port: "36379"
...
redis:
  port: 6679
...
```

Example failure logs:

```
time="2025-06-05T19:08:29Z" level=info msg="Making pod rfr-redis-0 slave of redis-primary-host.io:36379" namespace=redis-test redisfailover=redis service=redis.healer src="checker.go:261"
time="2025-06-05T19:08:29Z" level=error msg="error on object processing: dial tcp 10.244.2.7:36379: connect: connection refused" controller-id=redisfailover object-key=redis-test/redis operator=redisfailover service=kooper.controller src="controller.go:282"
```

This alters `MakeSlaveOfWithPort` to allow the caller to specify different ports for the primary and replica redis instances. Additionally, `SetExternalMasterOnAll` is altered to pass along the bootstrapNode's port as well as the RF resource's port - which _can_ be different.
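A rough sketch of the shape of the fix, assuming a go-redis client; the function name, signature, and helper here are hypothetical and the operator's real `MakeSlaveOfWithPort` differs.

```go
package main

import (
	"context"
	"net"

	"github.com/go-redis/redis/v8"
)

// makeSlaveOfWithPort (hypothetical) connects to the replica on the RF's own
// redis port, then points SLAVEOF at the master's (possibly different) port.
func makeSlaveOfWithPort(ctx context.Context, replicaIP, replicaPort, masterHost, masterPort string) error {
	rc := redis.NewClient(&redis.Options{
		// Dial the replica on its own port (e.g. 6679), not the master's.
		Addr: net.JoinHostPort(replicaIP, replicaPort),
	})
	defer rc.Close()
	// SLAVEOF <masterHost> <masterPort> - e.g. redis-primary-host.io:36379.
	return rc.SlaveOf(ctx, masterHost, masterPort).Err()
}

func main() {
	_ = makeSlaveOfWithPort(context.Background(), "10.244.2.7", "6679", "redis-primary-host.io", "36379")
}
```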
Optionally expose Haproxy Prometheus metrics[^1].

Example:

```
spec:
  ...
  haproxy:
    exporter: true
```

[^1]: https://www.haproxy.com/documentation/haproxy-configuration-tutorials/alerts-and-monitoring/prometheus/
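Under the hood, HAProxy's built-in Prometheus exporter is typically enabled with a dedicated frontend like the sketch below. This is a hedged illustration: the frontend name and port 9101 are assumptions, not the operator's actual generated config.

```
frontend prometheus
  bind *:9101
  mode http
  # serve metrics from HAProxy's built-in Prometheus exporter on /metrics
  http-request use-service prometheus-exporter if { path /metrics }
  no log
```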
Bring the haproxy service labels into a similar shape as the redis service labels. This makes it easier to select just the haproxy service by its labels.

Current Redis "Master" Service labels:

```yaml
metadata:
  labels:
    app.kubernetes.io/component: redis
    app.kubernetes.io/managed-by: redis-operator
    app.kubernetes.io/name: redis
    app.kubernetes.io/part-of: redis-failover
    redisfailovers-role: master
    redisfailovers.databases.spotahome.com/name: redis
```

Additions to the Haproxy Service labels:

```diff
metadata:
  labels:
+   app.kubernetes.io/component: haproxy
    app.kubernetes.io/managed-by: redis-operator
+   app.kubernetes.io/name: redis
+   app.kubernetes.io/part-of: redis-failover
+   redisfailovers-role: master
    redisfailovers.databases.spotahome.com/name: redis
```
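With the unified labels, selecting just the haproxy service becomes a one-liner; this usage example assumes a RedisFailover named `redis`.

```
kubectl get svc -l app.kubernetes.io/component=haproxy,redisfailovers.databases.spotahome.com/name=redis
```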
Prepare the repository for tagging / release of version 4.2.0
I messed up the changelog version number in #67.
When bootstrapping, we expect no master to be running. However, the Haproxy master resources are still deployed and looking for a master. This skips or removes the Haproxy master resources when a RedisFailover is bootstrapping.
Prepare the repository for tagging / release of version v4.3.0
PR description:

Starting with version 2.4.0 of the redis-operator, we are removing HAProxy from the cluster in bootstrap mode. However, the cluster health check still expects HAProxy to be present. As a result, the health check fails, preventing the Redis custom resource from updating its status and causing the deployment to fail.

The logic now distinguishes between three modes (sketched after this list):

- Bootstrap mode with sentinels disabled → only Redis must be running.
- Bootstrap mode with sentinels allowed → Redis and Sentinel must be running.
- Normal mode → Redis, Sentinel, and HAProxy must all be running.
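A minimal sketch of the mode-aware check, assuming hypothetical field names (`Bootstrapping`, `SentinelsAllowed`); the operator's actual types and health-check code differ.

```go
package main

// RedisFailoverModes is an illustrative stand-in for the fields the operator
// consults on the RedisFailover resource; the real API types differ.
type RedisFailoverModes struct {
	Bootstrapping    bool // bootstrapNode.enabled
	SentinelsAllowed bool // whether sentinels run alongside the bootstrap
}

// expectedComponentsReady reports whether the cluster should be considered
// healthy, checking only the components each mode actually deploys.
func expectedComponentsReady(rf RedisFailoverModes, redisOK, sentinelOK, haproxyOK bool) bool {
	switch {
	case rf.Bootstrapping && !rf.SentinelsAllowed:
		// Bootstrap mode with sentinels disabled: only Redis must be running.
		return redisOK
	case rf.Bootstrapping:
		// Bootstrap mode with sentinels allowed: Redis and Sentinel must be running.
		return redisOK && sentinelOK
	default:
		// Normal mode: Redis, Sentinel, and HAProxy must all be running;
		// HAProxy is excluded from the check only while bootstrapping.
		return redisOK && sentinelOK && haproxyOK
	}
}

func main() {
	// While bootstrapping without sentinels, missing HAProxy must not fail the check.
	_ = expectedComponentsReady(RedisFailoverModes{Bootstrapping: true}, true, false, false)
}
```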