ServiceAccount deleted on application scale 0, prevents scaling back up #379

Description

@rpbritton

Filed by @canonical/solutions-qa

Problem

The katib-controller charm deletes its ServiceAccount in the remove hook, preventing the StatefulSet from recreating pods after deletion or eviction.

Reproduction

  1. Deploy katib-controller r1262 (0.18/stable)
  2. Scale the application in to 0
  3. Scale the application back out to 1
  4. Pod creation fails: serviceaccount "target" not found

Test execution: https://test-observer.canonical.com/#/charms/310142?testExecutionId=309790
Logs: https://charm.logs.test-observer.canonical.com/production/309790/juju-logs.tar.gz

Model                           Controller                Cloud/Region    Version  SLA          Timestamp
model-20961994438-260113151740  20961994438-260113151740  k8s-production  3.6.12   unsupported  15:44:48Z

App             Version                  Status   Scale  Charm             Channel       Rev  Address         Exposed  Message
grafana-k8s     12.0.2                   active       1  grafana-k8s       2/stable      172  10.152.183.75   no       
mysql-k8s       8.0.41-0ubuntu0.22.04.1  active       1  mysql-k8s         8.0/stable    255  10.152.183.149  no       
neighbor                                 active       1  katib-db-manager  0.18/stable  1123  10.152.183.209  no       
prometheus-k8s  2.53.3                   active       1  prometheus-k8s    2/stable      272  10.152.183.118  no       
target                                   waiting    0/1  katib-controller  0.18/stable  1262  10.152.183.239  no       installing agent

Unit               Workload  Agent  Address     Ports  Message
grafana-k8s/0*     active    idle   10.1.4.189         
mysql-k8s/0*       active    idle   10.1.2.107         Primary
neighbor/0*        active    idle   10.1.4.78          
prometheus-k8s/0*  active    idle   10.1.2.138

Root Cause

When the application is scaled in, the remove hook deletes all managed resources (debug logs, 15:29:09):

Deleting <class 'lightkube.resources.core_v1.ServiceAccount'> target...
HTTP Request: DELETE .../serviceaccounts/target "HTTP/1.1 200 OK"

However, the StatefulSet still references the deleted ServiceAccount in spec.template.spec.serviceAccount: target, so pod creation fails with:

error: serviceaccount "target" not found
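
The mismatch described above can be illustrated with a minimal, self-contained sketch. It is not the charm's actual code, and not necessarily how a fix should be implemented: the `Resource` type, the helper names, and the `hypothetical-config` ConfigMap are all assumptions made for illustration. The idea is simply that a remove hook could skip any ServiceAccount the StatefulSet pod template still references, so that scaling back out can create pods again.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Resource:
    """Hypothetical stand-in for a Kubernetes object managed by the charm."""
    kind: str
    name: str


def referenced_service_account(statefulset: dict) -> Optional[str]:
    """Return the ServiceAccount name used by the StatefulSet pod template.

    Checks both serviceAccountName and its older alias serviceAccount.
    """
    pod_spec = statefulset.get("spec", {}).get("template", {}).get("spec", {})
    return pod_spec.get("serviceAccountName") or pod_spec.get("serviceAccount")


def resources_safe_to_delete(managed: List[Resource], statefulset: dict) -> List[Resource]:
    """Filter out the ServiceAccount that the StatefulSet still depends on."""
    in_use = referenced_service_account(statefulset)
    return [
        r for r in managed
        if not (r.kind == "ServiceAccount" and r.name == in_use)
    ]


if __name__ == "__main__":
    # StatefulSet fragment matching the issue: pods run as ServiceAccount "target".
    statefulset = {"spec": {"template": {"spec": {"serviceAccount": "target"}}}}
    # Hypothetical list of resources the remove hook would otherwise delete.
    managed = [
        Resource("ServiceAccount", "target"),
        Resource("ConfigMap", "hypothetical-config"),
    ]
    for resource in resources_safe_to_delete(managed, statefulset):
        print(f"Deleting {resource.kind} {resource.name}...")
```

With the inputs above, only the hypothetical ConfigMap is selected for deletion; the ServiceAccount target is kept because the pod template still names it. Whether the charm should keep the ServiceAccount on scale-in, or delete it only on full application removal, is a design decision for the maintainers.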
