[v23.2.x] k8s: Error out when annotation can not be set #13901

vbotbuildovich (Collaborator)

Backport of PR #13711

Given the following operator logic:
https://github.com/redpanda-data/redpanda/blob/51766019d6e85b7252a0c637043fc5d9edbc4673/src/go/k8s/controllers/redpanda/cluster_controller.go#L266-L275
The chain of events that leads to incorrect decommissioning is:
* `setPodNodeIDAnnotation` failed to get the Redpanda Node ID
* `setPodNodeIDAnnotation` does not return the error; it only logs it
* `setPodNodeIDLabel` was able to get the Redpanda Node ID, so the broker booted and registered its Node ID
* `decommissionGhostBrokers` filters brokers based on the Pod annotation, which is incorrect in this situation

Logs from the faulty run:
```
{"level":"info","ts":"2023-09-27T06:51:41.644Z","logger":"ClusterReconciler.Reconcile.setPodNodeIDLabel","msg":"setting node-id label","controller":"cluster","controllerGroup":"redpanda.vectorized.io","controllerKind":"Cluster","Cluster":{"name":"repanda-cluster","namespace":"redpanda"},"namespace":"redpanda","name":"repanda-cluster","reconcileID":"57bf8b57-60b5-4248-bedd-3ffcaa5c4e07","pod-name":"repanda-cluster-1","new-node-id":3}
{"level":"info","ts":"2023-09-27T06:51:41.718Z","logger":"ClusterReconciler.Reconcile.setPodNodeIDAnnotation","msg":"decommission old node-id","controller":"cluster","controllerGroup":"redpanda.vectorized.io","controllerKind":"Cluster","Cluster":{"name":"repanda-cluster","namespace":"redpanda"},"namespace":"redpanda","name":"repanda-cluster","reconcileID":"fa64d3d8-02cf-4c1f-8a67-13a2dfb339ea","pod-name":"repanda-cluster-1","old-node-id":1}
{"level":"info","ts":"2023-09-27T06:51:41.758Z","logger":"ClusterReconciler.Reconcile.setPodNodeIDAnnotation","msg":"setting node-id annotation","controller":"cluster","controllerGroup":"redpanda.vectorized.io","controllerKind":"Cluster","Cluster":{"name":"repanda-cluster","namespace":"redpanda"},"namespace":"redpanda","name":"repanda-cluster","reconcileID":"fa64d3d8-02cf-4c1f-8a67-13a2dfb339ea","pod-name":"repanda-cluster-1","new-node-id":3}
{"level":"error","ts":"2023-09-27T06:51:42.497Z","msg":"Reconciler error","controller":"cluster","controllerGroup":"redpanda.vectorized.io","controllerKind":"Cluster","Cluster":{"name":"repanda-cluster","namespace":"redpanda"},"namespace":"redpanda","name":"repanda-cluster","reconcileID":"fa64d3d8-02cf-4c1f-8a67-13a2dfb339ea","error":"deleting ghost brokers: failed to decommission ghost broker: request PUT http://repanda-cluster-0.repanda-cluster.redpanda.svc.cluster.local/.:9644/v1/brokers/1/decommission failed: Bad Request, body: \"{\\\"message\\\": \\\"can not update broker 1 state, invalid state transition requested\\\", \\\"code\\\": 400}\"\n"}
```

(cherry picked from commit 90e9860)
When a broker is removed from the mock admin API, the controller always fails because the Pod cannot set the Node ID annotation. Decommission is never called, so the StatefulSet is never updated, and the integration tests hang as a result.

(cherry picked from commit 556b97b)
(cherry picked from commit 2b21d59)
(cherry picked from commit 0f8777c)
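
The integration-test hang described in the second commit message can be reproduced with a minimal sketch. All names here (`fakeAdminAPI`, `NodeID`, `RemoveBroker`, `reconcileOnce`) are hypothetical and do not reflect the real mock admin API or test harness; the assumption is that a reconcile pass must resolve every Pod's Node ID before it decommissions and updates the StatefulSet. Once a broker is removed, every retry fails at the lookup, so the update step is never reached.

```go
package main

import "fmt"

// fakeAdminAPI is a hypothetical in-memory stand-in for the mock admin API;
// it maps Pod names to broker Node IDs.
type fakeAdminAPI struct {
	brokers map[string]int
}

// NodeID resolves the Node ID for a Pod, failing if no broker is registered.
func (f *fakeAdminAPI) NodeID(pod string) (int, error) {
	id, ok := f.brokers[pod]
	if !ok {
		return 0, fmt.Errorf("no broker registered for pod %q", pod)
	}
	return id, nil
}

// RemoveBroker simulates a test removing a broker from the mock admin API.
func (f *fakeAdminAPI) RemoveBroker(pod string) {
	delete(f.brokers, pod)
}

// reconcileOnce is a hypothetical reconcile pass: every Pod's Node ID must
// resolve before decommissioning and the StatefulSet update can happen.
func reconcileOnce(api *fakeAdminAPI, pods []string, decommissionAndUpdate func()) error {
	for _, p := range pods {
		if _, err := api.NodeID(p); err != nil {
			return fmt.Errorf("setting node-id annotation: %w", err)
		}
	}
	decommissionAndUpdate()
	return nil
}

func main() {
	api := &fakeAdminAPI{brokers: map[string]int{"pod-0": 0, "pod-1": 1}}
	pods := []string{"pod-0", "pod-1"}
	api.RemoveBroker("pod-1")

	failures, updates := 0, 0
	// Retrying does not help: the removed broker never comes back.
	for attempt := 0; attempt < 5; attempt++ {
		if err := reconcileOnce(api, pods, func() { updates++ }); err != nil {
			failures++
		}
	}
	fmt.Printf("failures=%d statefulset updates=%d\n", failures, updates)
}
```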
vbotbuildovich added this to the v23.2.x-next milestone on Oct 3, 2023
vbotbuildovich added the kind/backport label (PRs targeting a stable branch) on Oct 3, 2023
RafalKorepta marked this pull request as ready for review on Oct 3, 2023 at 20:27
RafalKorepta requested a review from a team as a code owner on Oct 3, 2023 at 20:27
RafalKorepta merged commit 5e3bd94 into redpanda-data:v23.2.x on Oct 3, 2023 (16 checks passed)
RafalKorepta modified the milestones: v23.2.x-next, v23.2.12 on Oct 4, 2023
Labels
area/k8s, kind/backport (PRs targeting a stable branch)
2 participants