Accidental deletion of a container #1268
It seems the container had already been deleted on the 29th of April around 17:20 CEST. The logs on the node show that around this time, some new reservations were deployed and their flists were not in cache, including another container using the same flist. Since the container was supposedly running at the time, the flist should have been there. For some reason, the container exited, got restarted by the container daemon, and then failed because zinit was not found in the path. This further supports that the flist was no longer there. We will need to investigate what exactly caused this. About 2 minutes later, the daemons were restarting, though there does not seem to be an indication of an upgrade; this is possibly related.
I need to clarify something first: a node can initiate a delete if it failed to start a workload, even if that workload has been running for some time. So if an error crashes the workload, or if the node was rebooted and couldn't bring the workload back to its running state, the workload gets deleted, since that is the only way to communicate an error to the owner. It's better than having it reported as deployed but not actually running (see the sketch below). Also, this container got deleted on the 29th of April, so that is already a long time ago. Any reason why it was only reported recently? Now, about what I think has happened: I will have to look deeper into the logs to see what exactly happened to the flist mount.
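For illustration, here is a minimal Go sketch of that delete-on-failed-start behavior. All names here (`Workload`, `start`, `decommission`) are hypothetical placeholders, not the actual zos API:

```go
package main

import (
	"errors"
	"log"
)

// Workload is a stand-in for a provisioned reservation on the node.
type Workload struct {
	ID string
}

func start(w Workload) error {
	// Stand-in for the real start path; here it fails the same way the
	// node logs did when the flist contents were gone.
	return errors.New("zinit: executable file not found in $PATH")
}

func decommission(w Workload, cause error) {
	// Stand-in for the node deleting the workload and reporting the
	// result, which is its only channel back to the owner.
	log.Printf("deleting workload %s: %v", w.ID, cause)
}

// ensureRunning mirrors the behavior described above: if the workload
// cannot be brought (back) to its running state, delete it so the owner
// sees an error instead of a workload that is "deployed" but not running.
func ensureRunning(w Workload) {
	if err := start(w); err != nil {
		decommission(w, err)
	}
}

func main() {
	ensureRunning(Workload{ID: "35722"})
}
```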
On the other hand, the bot should recover by redeploying another container on a different node if this node suddenly becomes unreachable.
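A hedged sketch of that recovery loop, assuming hypothetical `nodeReachable` and `deploy` helpers rather than the bot's real API:

```go
package main

import "fmt"

// nodeReachable and deploy are placeholders; their signatures are
// assumptions, not the bot's real API.
func nodeReachable(node string) bool { return node != "node-a" }

func deploy(node, flist string) error {
	fmt.Printf("redeploying %s on %s\n", flist, node)
	return nil
}

// redeploy walks a list of candidate nodes and deploys the container on
// the first reachable node that is not the one that was lost.
func redeploy(lostNode string, candidates []string, flist string) error {
	for _, n := range candidates {
		if n == lostNode || !nodeReachable(n) {
			continue
		}
		return deploy(n, flist)
	}
	return fmt.Errorf("no reachable node left to redeploy %s", flist)
}

func main() {
	candidates := []string{"node-a", "node-b", "node-c"}
	if err := redeploy("node-a", candidates, "example.flist"); err != nil {
		fmt.Println(err)
	}
}
```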
So this was caused by the following issues:
@xmonader @muhamadazmy Although this is hard to verify, I saw behavior similar to the one described in this issue happen even after the fix was merged.
Workload 35722: https://explorer.testnet.grid.tf/api/v1/reservations/workloads/35722