@rurkss rurkss commented Jul 15, 2025

PR:

Starting with version 2.4.0 of the redis-operator, we are removing HAProxy from the cluster in bootstrap mode.

However, the cluster health check still expects HAProxy to be present.

if rFailover.Bootstrapping() && !rFailover.SentinelsAllowed() {
    return r.IsRedisRunning(rFailover) && r.IsHAProxyRunning(rFailover)
}

As a result, the health check fails, preventing the Redis custom resource from updating its status and causing the deployment to fail.

The health-check logic now distinguishes between three cases (a sketch follows this list):

  • Bootstrap mode with sentinels disabled → only Redis must be running.

  • Bootstrap mode with sentinels allowed → Redis and Sentinel must be running.

  • Normal mode → Redis, Sentinel, and HAProxy must all be running.
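
The distinction can be sketched in Go roughly as follows. This is a hedged illustration only: `IsSentinelRunning` and the small interfaces below are assumptions modeled on the `IsRedisRunning` / `IsHAProxyRunning` helpers quoted above, and the operator's actual checker types and method names may differ.

```go
// Hedged sketch of the revised health check; real type and method names in
// the operator may differ. Only IsRedisRunning and IsHAProxyRunning appear
// in the snippet quoted above; IsSentinelRunning is an assumption.
type failover interface {
	Bootstrapping() bool
	SentinelsAllowed() bool
}

type checker interface {
	IsRedisRunning(rf failover) bool
	IsSentinelRunning(rf failover) bool
	IsHAProxyRunning(rf failover) bool
}

func allComponentsRunning(r checker, rf failover) bool {
	switch {
	case rf.Bootstrapping() && !rf.SentinelsAllowed():
		// Bootstrap mode, sentinels disabled: HAProxy is not deployed,
		// so only Redis must be running.
		return r.IsRedisRunning(rf)
	case rf.Bootstrapping() && rf.SentinelsAllowed():
		// Bootstrap mode with sentinels allowed: Redis and Sentinel must be running.
		return r.IsRedisRunning(rf) && r.IsSentinelRunning(rf)
	default:
		// Normal mode: Redis, Sentinel, and HAProxy must all be running.
		return r.IsRedisRunning(rf) && r.IsSentinelRunning(rf) && r.IsHAProxyRunning(rf)
	}
}
```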

rurkss and others added 30 commits April 6, 2023 09:14
in case we would like to make operator namespace scoped we can use a variable `OP_TYPE` and set it to `namespaced`
Add `env` to deployment manifest:
env:
  - name: OP_TYPE
    value: "namespaced"
Co-authored-by: Ryan Rodriguez <[email protected]>
* make redis-operator namespaced scope

* add OP_TYPE variable

in case we would like to make operator namespace scoped we can use a variable `OP_TYPE` and set it to `namespaced`
Add `env` to deployment manifest:
env:
  - name: OP_TYPE
    value: "namespaced"

* rename env variable to WATCH_NAMESPACE

* add namespace-scoped instructions to readme

* rename variable

Co-authored-by: Ryan Rodriguez <[email protected]>

* rename variable to human readable

---------

Co-authored-by: artur.zheludkov <[email protected]>
Co-authored-by: @rurkss <[email protected]>
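
As a hedged illustration of the namespace scoping described above (assuming `WATCH_NAMESPACE` holds the namespace to watch, and ignoring how the value is actually plumbed into the operator's controller), the operator can read the variable at startup and fall back to watching all namespaces when it is unset:

```go
package operator

import (
	"os"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// watchNamespace returns the namespace the operator should watch.
// If WATCH_NAMESPACE is unset, it falls back to all namespaces
// (cluster-scoped behavior).
func watchNamespace() string {
	if ns := os.Getenv("WATCH_NAMESPACE"); ns != "" {
		return ns
	}
	return metav1.NamespaceAll // "" means every namespace
}
```
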
* make redis-operator namespaced scope

* add OP_TYPE variable

in case we would like to make operator namespace scoped we can use a variable `OP_TYPE` and set it to `namespaced`
Add `env` to deployment manifest:
env:
  - name: OP_TYPE
    value: "namespaced"

* rename env variable to WATCH_NAMESPACE

* add namespace-scoped instructions to readme

* rename variable

Co-authored-by: Ryan Rodriguez <[email protected]>

* rename variable to human readable

* automatic creation of networkpolicy object

* test sign

* add metrics port / metric namespace

* add networkpolicy as optional service

* format code

---------

Co-authored-by: artur.zheludkov <[email protected]>
Co-authored-by: @rurkss <[email protected]>
* Add HAProxy Init On Demand

* update networkpolicy match label

* HAProxy customConfig

apiVersion: databases.spotahome.com/v1
kind: RedisFailover
metadata:
  name: redisfailover
  labels:
    app.kubernetes.io/component: redis
spec:
  haproxy:
    customConfig: |

        global
        daemon
        maxconn 256

        defaults
        mode tcp
        timeout connect 5000ms
        timeout client 50000ms
        timeout server 50000ms
        timeout check 5000ms

        resolvers k8s
        parse-resolv-conf
        hold other 10s
        hold refused 10s
        hold nx 10
        hold timeout 10s
        hold valid 10s
        hold obsolete 10s

        frontend redis-master
        bind *:<%= port %>
        default_backend redis-master

        backend redis-master
        mode tcp
        balance first
        option tcp-check
        tcp-check send info\ replication\r\n
        tcp-check expect string role:master
        server-template redis <%= replicas %> _redis._tcp.redis.<%= namespace %>.svc.cluster.local:<%= port %> check inter 1s resolvers k8s init-addr none
    redisHost: "redis-haproxy"
    replicas: 1
    image: "haproxy:2.4"
  sentinel:
    replicas: 3
  redis:
    replicas: 2

* fix indent. add customConfig description to the crd.yaml
* Affinity Config For Haproxy

* Use standardized labels for HAProxy

---------

Co-authored-by: Ryan Rodriguez <[email protected]>
* Add Status Subresource

* Update operator/redisfailover/handler.go

Co-authored-by: Ryan Rodriguez <[email protected]>

* Update operator/redisfailover/handler.go

Co-authored-by: Ryan Rodriguez <[email protected]>

* Update operator/redisfailover/handler.go

Co-authored-by: Ryan Rodriguez <[email protected]>

* Update operator/redisfailover/handler.go

Co-authored-by: Ryan Rodriguez <[email protected]>

* add haproxy in check

* case similarity for haproxy

* Update operator/redisfailover/handler.go

Co-authored-by: Ryan Rodriguez <[email protected]>

* Update operator/redisfailover/handler.go

Co-authored-by: Ryan Rodriguez <[email protected]>

---------

Co-authored-by: Ryan Rodriguez <[email protected]>
* Update ObservedGeneration

* Fix typo, expand comment

---------

Co-authored-by: Ryan Rodriguez <[email protected]>
* haproxy opt params

* revert haproxy image as opt

* Update generator.go

* remove duplicate variables
* fix redis reconciling error

* Update handler.go

* Update handler.go

* Update handler.go

* Update handler.go

* Update handler.go

* Update operator/redisfailover/handler.go

Co-authored-by: Ryan Rodriguez <[email protected]>

* add error handler

---------

Co-authored-by: Ryan Rodriguez <[email protected]>
indiebrain and others added 26 commits February 26, 2024 11:18
Prepare the repository for tagging / release of version 2.1.0.

Note, this release contains a
[fix](#49) for
#48 - hence the minor
version upgrade.

References
----------

- https://semver.org/#what-do-i-do-if-i-accidentally-release-a-backward-incompatible-change-as-a-minor-version
This PR removes the redis-slave-HAProxy, deeming it unnecessary and
potentially dangerous.
Originally, this resource was designed as an endpoint for Redis
replication nodes to connect to the slave nodes of the source cluster.
However, when a replicated redis uses this resource, the sentinels on
the source side detect it as a potential slave for a failover scenario.
During a failover, the sentinels mistakenly treat the HAProxy pods as
legitimate redis slaves and attempt to promote an HAProxy pod as the
next master; since that is impossible, the sentinels get stuck in a
promotion loop.
The workaround is to use the redis-slave-service as the endpoint on the
source cluster. This way, when the sentinels detect the replicated redis
as a potential master in a failover scenario, they will not be able to
promote it, because the replicated redis node is unreachable and the
sentinels will `forget` it.

---------

Co-authored-by: Aaron Kuehler <[email protected]>
Update the default haproxy image to the latest release
An alternative approach to restarting StatefulSet-related pods is to
delete the StatefulSet itself and allow the operator to automatically
recreate it during the next reconciliation loop. This differs from the
previous implementation made in
#54

---------

Co-authored-by: Aaron Kuehler <[email protected]>
Prepare the repository for tagging / release of version 3.1.0.
There is a brief period of time during a Redis replica restart when
HAProxy can send writes to it, causing the following error:

```
READONLY You can't write against a read only replica.
```

This is because HAProxy (< 3.1.0) always treats newly detected backends
as immediately UP - ready to serve traffic - until they fail their
checks. HAProxy v3.1.0 adds the ability to configure a backend's initial
state - init-state - and therefore how/when HAProxy determines that the
backend is ready to receive traffic.

This changes the HAProxy configuration so that a new Redis node is
disqualified from receiving client traffic until HAProxy determines that
the node is indeed a "master" node.

This also updates the operator's default HAProxy image accordingly so
that it supports the `init-state` HAProxy configuration option.

References:

- haproxy/haproxy#51
- haproxy/haproxy@50322df
Prepare the repository for tagging / release of version 4.0.0
Related to: 
- medyagh/setup-minikube#565
- https://github.com/powerhome/redis-operator/actions/runs/12774059053/job/35608701152#step:5:78

```
 Preparing to unpack .../socat_1.8.0.0-4build3_amd64.deb ...
Unpacking socat (1.8.0.0-4build3) ...
Setting up socat (1.8.0.0-4build3) ...
Processing triggers for man-db (2.12.0-4build2) ...

Running kernel seems to be up-to-date.

No services need to be restarted.

No containers need to be restarted.

No user sessions are running outdated binaries.

No VM guests are running outdated hypervisor (qemu) binaries on this host.
/usr/bin/lsb_release --short --codename
noble
Error: Unexpected HTTP response: 404
```
Some clients do not handle a failover gracefully; they keep the
established connection open. When the redis node comes back up as a
replica, the client receives an error indicating that it cannot write
against the replica:

```
READONLY You can't write against a read only replica
```

This closes client connections to a redis node when haproxy detects that
it is in a "down" state. The idea is that the clients will recognize
they need to reestablish a connection.

References:
- https://docs.haproxy.org/3.1/configuration.html#5.2-on-marked-down
Prepare the repository for tagging / release of version 4.1.0
Fix buildx and bake action compatibility issues in the `ci` GitHub
workflow.

- https://github.com/powerhome/redis-operator/actions/runs/12935287363/job/36078911603

```
docker/bake-action < v5 is not compatible with buildx >= 0.20.0, please update your workflow to latest docker/bake-action or use an older buildx version.
```
This is to change the owner of this operator from the now defunct
Forever People team to Guardians of the Galaxy.

Co-authored-by: Aaron Kuehler <[email protected]>
Bootstrapping fails if the bootstrapNode's port and the RF's redis port
are different.

The operator uses the bootstrapNode's port when trying to run `SLAVEOF`
on the RF pods and fails to connect:

Example RF config:
```yaml
  bootstrapNode:
    enabled: true
    host: redis-primary-host.io
    port: "36379"
...
  redis:
    port: 6679
...
```

Example failure logs:
```
time="2025-06-05T19:08:29Z" level=info msg="Making pod rfr-redis-0 slave of redis-primary-host.io:36379" namespace=redis-test redisfailover=redis service=redis.healer src="checker.go:261"
time="2025-06-05T19:08:29Z" level=error msg="error on object processing: dial tcp 10.244.2.7:36379: connect: connection refused" controller-id=redisfailover object-key=redis-test/redis operator=redisfailover service=kooper.controller src="controller.go:282"
```

This alters `MakeSlaveOfWithPort` to allow the caller to specify
different ports for the primary and replica redis instances.
Additionally, `SetExternalMasterOnAll` is altered to pass along the
bootstrapNode's port as well as the RF resource's port - which _can_ be
different.
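
As a hedged sketch of the idea only (the real `MakeSlaveOfWithPort` lives in the operator's redis client service and its signature and wiring differ), the key point is that the operator dials the replica on the replica's own port while the `SLAVEOF` command still targets the bootstrapNode's host and port:

```go
package operator

import (
	"context"
	"net"

	rediscli "github.com/go-redis/redis/v8"
)

// makeSlaveOfWithPort is a hypothetical stand-in for the operator's
// MakeSlaveOfWithPort. It dials the replica on the replica's own port
// (6679 in the example above) instead of reusing the bootstrapNode's
// port (36379), and then points the replica at the bootstrap node.
func makeSlaveOfWithPort(ctx context.Context, replicaIP, replicaPort, masterHost, masterPort string) error {
	rc := rediscli.NewClient(&rediscli.Options{
		Addr: net.JoinHostPort(replicaIP, replicaPort), // replica's port, not the bootstrapNode's
	})
	defer rc.Close()

	// Equivalent of running SLAVEOF <masterHost> <masterPort> on the replica.
	return rc.SlaveOf(ctx, masterHost, masterPort).Err()
}
```
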
Optionally expose Haproxy Prometheus metrics[^1]. Example:

```
spec:
    ...
    haproxy:
      exporter: true
```

[^1]:
https://www.haproxy.com/documentation/haproxy-configuration-tutorials/alerts-and-monitoring/prometheus/
Bring the haproxy service labels into a shape similar to the redis
service labels. This makes it easier to select just the haproxy service
by its labels.

Current Redis "Master" Service labels:
``` yaml
metadata:
  labels:
    app.kubernetes.io/component: redis
    app.kubernetes.io/managed-by: redis-operator
    app.kubernetes.io/name: redis
    app.kubernetes.io/part-of: redis-failover
    redisfailovers-role: master
    redisfailovers.databases.spotahome.com/name: redis
```

Additions to the Haproxy Service labels:
``` diff
metadata:
  labels:
+   app.kubernetes.io/component: haproxy
    app.kubernetes.io/managed-by: redis-operator
+   app.kubernetes.io/name: redis
+   app.kubernetes.io/part-of: redis-failover
+   redisfailovers-role: master
    redisfailovers.databases.spotahome.com/name: redis
```
Prepare the repository for tagging / release of version 4.2.0
I messed up the changelog version number in
#67.
When bootstrapping, we expect no master to be running. However, the
Haproxy master resources are still deployed and looking for a master.

This skips or removes the Haproxy master resources when a RedisFailover
is bootstrapping.
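
A minimal sketch of that guard, assuming a hypothetical `ensureHAProxyMaster` helper (the operator's actual reconcile code is structured differently):

```go
// bootstrapper is the minimal surface needed for this sketch; Bootstrapping()
// is the same method used by the health check earlier in this PR.
type bootstrapper interface {
	Bootstrapping() bool
}

// ensureHAProxyMaster is a hypothetical helper illustrating the guard: while
// a RedisFailover is bootstrapping there is no master to route to, so the
// HAProxy "master" Deployment and Service are skipped.
func ensureHAProxyMaster(rf bootstrapper) error {
	if rf.Bootstrapping() {
		// Nothing to route to yet; skip (or remove) the HAProxy master
		// resources until bootstrapping finishes.
		return nil
	}
	// ... create or update the HAProxy master Deployment and Service ...
	return nil
}
```
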
Prepare the repository for tagging / release of version v4.3.0
@rurkss rurkss requested a review from a team as a code owner July 15, 2025 18:15
@rurkss rurkss closed this Jul 15, 2025
@rurkss rurkss deleted the GOG-1162 branch July 15, 2025 18:16
@rurkss rurkss restored the GOG-1162 branch July 15, 2025 18:17