Controller pod restarts when instance watcher restarts #662

Closed
lilustga opened this issue Sep 27, 2023 · 1 comment
Labels
bug Something isn't working

Comments


Describe the bug
The following error causes the akri-controller pod to restart:

thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: Instance watcher restarted - throwing error to restart controller', controller/src/main.rs:56:18

This is caused by the following watch error:

Error during watch: error returned by apiserver during watch: too old resource version: 9117689 (9120023): Expired

Kubernetes Version: Native Kubernetes 1.26.3

Expected behavior
Restarting the controller is the expected behavior today; however, it would be better if the error were handled gracefully instead of through a panic.
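As a rough illustration of what "more gracefully" could mean, the sketch below restarts only the watch when it hits a recoverable error such as an expired resource version or a connection reset, and surfaces anything else to the caller. It is a standard-library-only model, not Akri's code: `WatchError`, `is_recoverable`, and `run_watch_with_restart` are hypothetical names, and the watch itself is simulated by a closure.

```rust
/// Hypothetical error type standing in for what a watch stream can yield.
#[derive(Debug, Clone, PartialEq)]
enum WatchError {
    /// The API server reported "too old resource version ... Expired".
    Expired,
    /// The connection dropped, e.g. "Connection reset by peer".
    ConnectionReset,
    /// Anything that restarting the watch is unlikely to fix.
    Fatal(String),
}

/// Decide whether restarting just the watch is a reasonable response.
fn is_recoverable(err: &WatchError) -> bool {
    matches!(err, WatchError::Expired | WatchError::ConnectionReset)
}

/// Run `watch_once` until it ends cleanly, restarting it on recoverable
/// errors up to `max_restarts` times. Returns the number of restarts used.
fn run_watch_with_restart<F>(mut watch_once: F, max_restarts: u32) -> Result<u32, WatchError>
where
    F: FnMut() -> Result<(), WatchError>,
{
    let mut restarts = 0;
    loop {
        match watch_once() {
            Ok(()) => return Ok(restarts),
            Err(e) if is_recoverable(&e) && restarts < max_restarts => {
                restarts += 1;
                // Assumption: real code would re-list to get a fresh resource
                // version and back off before watching again.
                eprintln!("watch failed ({:?}); restarting watch (attempt {})", e, restarts);
            }
            // Fatal errors, or too many restarts, are returned instead of panicking.
            Err(e) => return Err(e),
        }
    }
}

fn main() {
    // Simulated watch: fails twice the way the logs show, then ends cleanly.
    let mut outcomes = vec![
        Err(WatchError::ConnectionReset),
        Err(WatchError::Expired),
        Ok(()),
    ]
    .into_iter();
    let result = run_watch_with_restart(|| outcomes.next().unwrap_or(Ok(())), 5);
    println!("{:?}", result); // prints "Ok(2)"

    // A non-recoverable error is handed back to the caller as a value.
    let fatal = run_watch_with_restart(|| Err(WatchError::Fatal("bad config".to_string())), 5);
    println!("{:?}", fatal);
}
```

The restart cap keeps a persistently failing watch from looping forever; once it is exceeded, the caller still gets an `Err` and can decide whether exiting the process is warranted.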

Logs (please share snips of applicable logs)

[2023-09-26T18:22:24Z INFO  controller::util::node_watcher] handle_node - Added or modified: aks-agentpool-35625459-vmss000003
--
[2023-09-26T18:22:29Z TRACE controller::util::node_watcher] is_node_ready - for node Some("aks-agentpool-35625459-vmss000002")
[2023-09-26T18:22:29Z INFO  controller::util::node_watcher] handle_node - Added or modified: aks-agentpool-35625459-vmss000002
[2023-09-26T18:22:29Z TRACE controller::util::node_watcher] handle_node - enter
[2023-09-26T18:22:34Z TRACE controller::util::node_watcher] is_node_ready - for node Some("aks-agentpool-35625459-vmss000003")
[2023-09-26T18:22:34Z INFO  controller::util::node_watcher] handle_node - Added or modified: aks-agentpool-35625459-vmss000003
[2023-09-26T18:22:34Z TRACE controller::util::node_watcher] handle_node - enter
[2023-09-26T18:22:39Z ERROR controller::util::instance_action] Error during watch: watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
[2023-09-26T18:22:39Z TRACE controller::util::node_watcher] is_node_ready - for node Some("aks-agentpool-35625459-vmss000002")
[2023-09-26T18:22:39Z INFO  controller::util::node_watcher] handle_node - Added or modified: aks-agentpool-35625459-vmss000002
[2023-09-26T18:22:39Z TRACE controller::util::node_watcher] handle_node - enter
[2023-09-26T18:22:40Z ERROR controller::util::instance_action] Error during watch: error returned by apiserver during watch: too old resource version: 9117689 (9120023): Expired
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: Instance watcher restarted - throwing error to restart controller', controller/src/main.rs:56:18
[2023-09-26T18:22:41Z TRACE controller::util::instance_action] handle_instance - enter
[2023-09-26T18:22:41Z TRACE controller::util::instance_action] internal_do_instance_watch - aquired sync lock
Error: JoinError::Panic(Id(4), ...)
lilustga added the bug label Sep 27, 2023

lilustga commented Dec 5, 2023

This will no longer be an issue once the new controller moves to the kubecontroller construct.

@lilustga lilustga closed this as completed Dec 5, 2023