Filed by @canonical/solutions-qa
Problem
The katib-controller charm deletes its ServiceAccount in the remove hook, preventing the StatefulSet from recreating pods after deletion or eviction.
Reproduction
- Deploy katib-controller r1262 (0.18/stable)
- Scale the application in to 0
- Scale the application back out to 1
- Pod creation fails:
serviceaccount "target" not found
Test execution: https://test-observer.canonical.com/#/charms/310142?testExecutionId=309790
Logs: https://charm.logs.test-observer.canonical.com/production/309790/juju-logs.tar.gz
Model                           Controller                Cloud/Region    Version  SLA          Timestamp
model-20961994438-260113151740  20961994438-260113151740  k8s-production  3.6.12   unsupported  15:44:48Z

App             Version                  Status   Scale  Charm             Channel      Rev   Address         Exposed  Message
grafana-k8s     12.0.2                   active   1      grafana-k8s       2/stable     172   10.152.183.75   no
mysql-k8s       8.0.41-0ubuntu0.22.04.1  active   1      mysql-k8s         8.0/stable   255   10.152.183.149  no
neighbor                                 active   1      katib-db-manager  0.18/stable  1123  10.152.183.209  no
prometheus-k8s  2.53.3                   active   1      prometheus-k8s    2/stable     272   10.152.183.118  no
target                                   waiting  0/1    katib-controller  0.18/stable  1262  10.152.183.239  no       installing agent

Unit               Workload  Agent  Address     Ports  Message
grafana-k8s/0*     active    idle   10.1.4.189
mysql-k8s/0*       active    idle   10.1.2.107         Primary
neighbor/0*        active    idle   10.1.4.78
prometheus-k8s/0*  active    idle   10.1.2.138
Root Cause
When the application is scaled in, the remove hook deletes all managed resources (debug logs, 15:29:09):
Deleting <class 'lightkube.resources.core_v1.ServiceAccount'> target...
HTTP Request: DELETE .../serviceaccounts/target "HTTP/1.1 200 OK"
However, the StatefulSet still references the deleted ServiceAccount in its pod template (spec.template.spec.serviceAccount: target), so pod creation fails with:
error: serviceaccount "target" not found
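One possible mitigation, sketched here under stated assumptions, is for the remove hook to skip the ServiceAccount that the pod template still references. This is not the charm's actual code: `ManagedResource`, `resources_safe_to_delete`, and the `katib-config` ConfigMap are hypothetical names used only for illustration. Only the ServiceAccount name `target` comes from the logs above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ManagedResource:
    """Hypothetical stand-in for a Kubernetes object the charm manages."""

    kind: str
    name: str


def resources_safe_to_delete(managed, pod_service_account):
    """Return the resources a remove hook could delete without breaking pod creation.

    The ServiceAccount named in spec.template.spec.serviceAccount is kept,
    because the StatefulSet needs it to create pods when scaled back out.
    """
    return [
        r
        for r in managed
        if not (r.kind == "ServiceAccount" and r.name == pod_service_account)
    ]


# Illustrative inventory; only the ServiceAccount "target" appears in this report.
managed = [
    ManagedResource("ServiceAccount", "target"),
    ManagedResource("ConfigMap", "katib-config"),  # hypothetical resource
]
to_delete = resources_safe_to_delete(managed, pod_service_account="target")
print([f"{r.kind}/{r.name}" for r in to_delete])
```

Whether the ServiceAccount should instead be cleaned up only when the whole application is removed is a separate design question for the charm maintainers.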