Note: This issue was generated with AI assistance (GitHub Copilot) based on automated log analysis and triage.
Filed by @canonical/solutions-qa
Summary
When mysql-k8s (a database provider related to katib-db-manager) has its pod deleted and recreated, katib-controller receives repeated reconciliation events and runs the full reconciliation cycle approximately 4 times over ~8 minutes. This causes integration tests that wait for all apps to return to active (with a 5-minute timeout) to fail intermittently, as the timeout fires mid-reconciliation on the 4th pass.
Test Observer Link
https://test-observer.canonical.com/#/charms/406078?testExecutionId=443475&testResultId=10110627
Environment
- katib-controller rev 1163, channel 0.18/stable
- katib-db-manager rev 1123, channel 0.18/stable
- mysql-k8s rev 400, channel 8.0/candidate
- Juju 3.6.20, Kubernetes (ubuntu:22.04), amd64
- Test: integration/mysql-k8s:database/mysql_client/katib-db-manager:relational-db
Observed Behaviour
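After the mysql-k8s pod is deleted and recreated by the test (simulating a pod failure), katib-controller reconciles repeatedly before settling. The status log shows 4 reconciliation passes in total, the first being the initial settle after deploy: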
| Time (UTC) | Event |
|---|---|
| 11:44:15Z | Reconciliation 1 (initial settle after deploy) |
| 11:46:34Z | Reconciliation 2 (~2 min after pod deletion) |
| 11:48:16Z | Reconciliation 3 (~2 min later) |
| 11:50:32Z | Reconciliation 4; test times out at 11:50:35Z, 3 seconds before active |
| 11:50:38Z | katib-controller becomes active (too late) |
Each reconciliation itself is healthy and fast (~6 seconds, all Kubernetes API calls return 200 OK). The issue is the charm enqueuing itself multiple times from what should be a single relation-changed event cascade: mysql-k8s pod restart → katib-db-manager:relational-db → katib-controller:k8s-service-info.
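As a quick sanity check on the timeline, the snippet below recomputes the spacing between the reconciliation starts in the table and how far the test's timeout missed `active`. It is only arithmetic on the timestamps reported above; the constant and function names are illustrative, not taken from the test suite.

```python
from datetime import datetime, timedelta

# Timestamps copied from the Observed Behaviour table (UTC, 27 Mar 2026).
RECONCILE_STARTS = ["11:44:15", "11:46:34", "11:48:16", "11:50:32"]
TEST_TIMED_OUT_AT = "11:50:35"
BECAME_ACTIVE_AT = "11:50:38"


def parse(hms: str) -> datetime:
    """Parse an HH:MM:SS timestamp (the date is irrelevant for differences)."""
    return datetime.strptime(hms, "%H:%M:%S")


def gaps(times: list[str]) -> list[timedelta]:
    """Return the interval between each pair of consecutive reconciliation starts."""
    parsed = [parse(t) for t in times]
    return [later - earlier for earlier, later in zip(parsed, parsed[1:])]


def missed_by(timed_out_at: str, active_at: str) -> timedelta:
    """How long after the test's timeout the app actually reached active."""
    return parse(active_at) - parse(timed_out_at)


if __name__ == "__main__":
    for n, gap in enumerate(gaps(RECONCILE_STARTS), start=2):
        print(f"reconciliation {n} started {gap} after reconciliation {n - 1}")
    print("timeout missed active by", missed_by(TEST_TIMED_OUT_AT, BECAME_ACTIVE_AT))
```

The gaps come out to 2:19, 1:42 and 2:16, consistent with the ~2 minute re-queue interval, and the timeout fires 3 seconds before `active`.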
Expected Behaviour
katib-controller should settle to active after a single reconciliation pass triggered by the relation-changed event, not continue re-queuing every ~2 minutes.
Impact
Intermittent ~50% failure rate on test_pod_deletion for mysql-k8s rev 400 (2 of 4 runs failed with identical symptoms: executions 443473 and 443475).
Steps to Reproduce
- Deploy katib-controller + katib-db-manager + mysql-k8s
- Wait for all apps to go active
- Delete the target-0 (mysql-k8s) pod: kubectl delete pod target-0 -n <model>
- Wait and observe juju status; katib-controller will cycle through maintenance multiple times before settling
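The last reproduction step can be made less manual by sampling `juju status --format=json` and recording status changes. This is a hypothetical helper, not part of the test suite. It assumes the `juju` CLI is on PATH, that the JSON layout is `applications.<app>.application-status.current`, and that the application status tracks the unit's workload status. Sampling every 2 seconds should catch the ~6 second maintenance windows, though polling can still miss a very short one.

```python
import json
import subprocess
import sys
import time

APP = "katib-controller"


def app_status(status: dict, app: str = APP) -> str:
    """Read the application's current status from `juju status --format=json` output."""
    return status["applications"][app]["application-status"]["current"]


def collapse(samples: list[str]) -> list[str]:
    """Drop consecutive duplicates so only status changes remain."""
    changes: list[str] = []
    for s in samples:
        if not changes or changes[-1] != s:
            changes.append(s)
    return changes


def count_passes(changes: list[str]) -> int:
    """Count maintenance -> active transitions, i.e. completed reconciliation passes."""
    return sum(1 for a, b in zip(changes, changes[1:]) if (a, b) == ("maintenance", "active"))


def poll(model: str, duration_s: float = 600, interval_s: float = 2) -> list[str]:
    """Sample the app status every interval_s seconds for duration_s seconds."""
    samples = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        out = subprocess.run(
            ["juju", "status", "-m", model, "--format=json"],
            capture_output=True, text=True, check=True,
        ).stdout
        samples.append(app_status(json.loads(out)))
        time.sleep(interval_s)
    return collapse(samples)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python watch_status.py <model>  (run right after deleting the pod)
    changes = poll(sys.argv[1])
    print(" -> ".join(changes))
    print("reconciliation passes observed:", count_passes(changes))
```

With the behaviour described here, a run would be expected to show several `maintenance -> active` passes rather than one.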
Logs
The juju status log shows the repeated reconciliation pattern clearly:
27 Mar 2026 11:46:34Z workload maintenance Reconciling charm: executing component kubernetes:auths-webhooks-crds-configmaps
27 Mar 2026 11:46:40Z workload active
27 Mar 2026 11:48:16Z workload maintenance Reconciling charm: executing component kubernetes:auths-webhooks-crds-configmaps
27 Mar 2026 11:48:23Z workload active
27 Mar 2026 11:50:32Z workload maintenance Reconciling charm: executing component kubernetes:auths-webhooks-crds-configmaps
27 Mar 2026 11:50:38Z workload active ← test already timed out at 11:50:35Z
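To quantify the pattern in the log excerpt, the sketch below pairs each workload `maintenance` entry with the following `active` entry and reports when each pass started and how long it took. The log lines are copied from above with the trailing annotation dropped; the function names are illustrative.

```python
from datetime import datetime, timedelta

# The three maintenance/active pairs from the juju status log above.
LOG = """\
27 Mar 2026 11:46:34Z workload maintenance Reconciling charm: executing component kubernetes:auths-webhooks-crds-configmaps
27 Mar 2026 11:46:40Z workload active
27 Mar 2026 11:48:16Z workload maintenance Reconciling charm: executing component kubernetes:auths-webhooks-crds-configmaps
27 Mar 2026 11:48:23Z workload active
27 Mar 2026 11:50:32Z workload maintenance Reconciling charm: executing component kubernetes:auths-webhooks-crds-configmaps
27 Mar 2026 11:50:38Z workload active
"""


def parse_line(line: str) -> tuple[datetime, str, str]:
    """Split a status-log line into (timestamp, kind, status)."""
    parts = line.split()
    ts = datetime.strptime(" ".join(parts[:4]), "%d %b %Y %H:%M:%SZ")
    return ts, parts[4], parts[5]


def reconcile_passes(log: str) -> list[tuple[datetime, timedelta]]:
    """Pair each workload 'maintenance' entry with the next 'active' one.

    Returns (start time, duration) for every completed pass.
    """
    passes = []
    start = None
    for line in log.splitlines():
        if not line.strip():
            continue
        ts, kind, status = parse_line(line)
        if kind != "workload":
            continue
        if status == "maintenance" and start is None:
            start = ts
        elif status == "active" and start is not None:
            passes.append((start, ts - start))
            start = None
    return passes


if __name__ == "__main__":
    for start, duration in reconcile_passes(LOG):
        print(f"{start:%H:%M:%S}Z pass took {duration.total_seconds():.0f}s")
```

On this excerpt it finds three passes of 6, 7 and 6 seconds, in line with the observation that each reconciliation is fast; the problem is the number of passes, not their duration.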