Handle failed deployments #3072

Open
stuartwdouglas opened this issue Oct 10, 2024 · 3 comments
Labels: next (Work that will be picked up next), P2

Comments

@stuartwdouglas
Collaborator

At present, if you deploy something that ends up in CrashLoopBackOff, FTL will wait forever. We need to be able to handle failed deployments without hanging.

@github-actions bot added the triage label (Issue needs triaging) on Oct 10, 2024
@ftl-robot mentioned this issue on Oct 10, 2024
@wesbillman added the next and P2 labels and removed the triage label on Oct 10, 2024
@alecthomas
Collaborator

IIRC this used to work prior to the change to a pull model. As part of the runner state machine, there were readiness timeouts: if a runner did not become ready in time, the controller rejected it and a new runner was scheduled.
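A rough Python sketch of the readiness timeout described above: the controller waits a bounded time for a runner to report ready, rejects it if it does not, and schedules a replacement. This is an illustration under stated assumptions, not FTL's controller code; `Runner`, `await_ready`, `reconcile_once`, the polling interval and the state strings are all hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Runner:
    """Hypothetical runner record; `is_ready` stands in for a readiness probe."""
    runner_id: int
    is_ready: Callable[[], bool]
    state: str = "pending"


def await_ready(runner: Runner, timeout_s: float, poll_s: float = 0.01) -> bool:
    """Poll the runner until it reports ready or the readiness timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if runner.is_ready():
            runner.state = "ready"
            return True
        time.sleep(poll_s)
    # Readiness timeout elapsed: the controller rejects this runner.
    runner.state = "rejected"
    return False


def reconcile_once(runner: Runner, schedule: Callable[[], Runner], timeout_s: float) -> Runner:
    """Return the runner to keep: the current one if it becomes ready in time,
    otherwise a freshly scheduled replacement."""
    if await_ready(runner, timeout_s):
        return runner
    return schedule()
```

Nothing in this sketch limits how many replacements get scheduled: if every new runner also fails its readiness check, the caller keeps calling `reconcile_once` indefinitely.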

@stuartwdouglas
Collaborator Author

The runners time out and restart, AFAIK. The issue is that if the replacement runner also fails, the 'deploy' operation just hangs. At some point the controller needs to decide that the deployment isn't working and abort, keeping the old deployment if one exists.
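A minimal sketch of the abort-and-keep-the-old-deployment behavior proposed here, assuming the controller simply caps the number of runner attempts per deployment. `deploy_with_limit`, `start_runner`, `DeployResult`, the deployment keys and the default of three attempts are hypothetical and not part of FTL's API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DeployResult:
    active: Optional[str]  # deployment left serving traffic (None if there was none)
    succeeded: bool
    attempts: int


def deploy_with_limit(
    new_key: str,
    old_key: Optional[str],
    start_runner: Callable[[str], bool],
    max_attempts: int = 3,
) -> DeployResult:
    """Try to bring up `new_key`, giving up after `max_attempts` failed runners.

    `start_runner` is a hypothetical hook that schedules one runner and returns
    True once it becomes ready, or False if it crashed or timed out.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        if start_runner(new_key):
            # The new deployment is healthy, so it replaces the old one.
            return DeployResult(active=new_key, succeeded=True, attempts=attempt)
    # Every attempt failed: abort instead of hanging and keep the old deployment, if any.
    return DeployResult(active=old_key, succeeded=False, attempts=max_attempts)
```

A wall-clock deadline for the whole deployment would be an equally reasonable trigger for giving up; counting attempts just keeps the example deterministic.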

@alecthomas
Collaborator

Ah, I see what you're saying 👍

Projects: None yet · Development: No branches or pull requests · 3 participants