fix: have all flavors delete themselves even in failure #1508

Open
wants to merge 1 commit into
base: master
from

Conversation

tommartensen
Contributor

Some flavors already do this, and some already clean up if they fail during create (like OCP). Still, it is better to be consistent. This is a patch for an issue that should be solved properly.

@tommartensen tommartensen self-assigned this Mar 5, 2025
@tommartensen tommartensen marked this pull request as ready for review March 5, 2025 12:04
@tommartensen tommartensen requested a review from a team as a code owner March 5, 2025 12:04
@rhacs-bot
Contributor

A single node development cluster (infra-pr-1508) was allocated in production infra for this PR.

CI will attempt to deploy quay.io/rhacs-eng/infra-server:0.10.86-1-gdaee46d474 to it.

🔌 You can connect to this cluster with:

gcloud container clusters get-credentials infra-pr-1508 --zone us-central1-a --project acs-team-temp-dev

🛠️ And pull infractl from the deployed dev infra-server with:

nohup kubectl -n infra port-forward svc/infra-server-service 8443:8443 &
make pull-infractl-from-dev-server

🚲 You can then use the dev infra instance e.g.:

bin/infractl -k -e localhost:8443 whoami

⚠️ Any clusters that you start using your dev infra instance should have a lifespan shorter than that of the development cluster instance. Otherwise they will not be destroyed, because the dev infra instance ceases to exist when the development cluster is deleted. ⚠️

Further Development

☕ If you make changes, you can commit and push and CI will take care of updating the development cluster.

🚀 If you only modify configuration (chart/infra-server/configuration) or templates (chart/infra-server/{static,templates}), you can get a faster update with:

make helm-deploy

Logs

Logs for the development infra depending on your @redhat.com authuser:

Or:

kubectl -n infra logs -l app=infra-server --tail=1 -f

Contributor

@davdhacs davdhacs left a comment

tldr: I don't know enough about Argo Workflows to know whether we should add onExit now. On some create failures, do the workflows exit without ever running the destroy step, and is the workflow deleted before the cleanup runs?

detail:
There may be reasons not to add this.
onExit handlers were removed from the infra workflows in #320.
And the Argo Workflows code marks onExit as deprecated (https://github.com/argoproj/argo-workflows/blame/68fde4ffcbe9b84dfad969a45133cd4cc659d346/pkg/apis/workflow/v1alpha1/workflow_types.go#L1573).

The infra periodic cleanup code should find clusters that failed with their Argo workflow paused, and resume those workflows so that their destroy steps run (added in https://github.com/stackrox/infra/pull/47/files#diff-c18c4c95a88e0fc7894470522a609f99R639 and scheduled at https://github.com/stackrox/infra/blame/master/service/cluster/cluster.go#L131).
Maybe Argo Workflows has changed in some way so that this is no longer valid, or does not always work?
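
To make the question concrete, here is a minimal sketch of the selection logic a periodic cleanup pass like the one described might apply. It is not the code in service/cluster/cluster.go: the `Workflow` type, its fields, and both function names are hypothetical, and the assumption that a failed create can leave a workflow un-paused is exactly the open question raised in this review.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    """Hypothetical, simplified view of a cluster workflow (not the Argo API type)."""
    name: str
    cluster_failed: bool  # assumed: cluster creation was reported as failed
    suspended: bool       # assumed: workflow is paused and could be resumed
    destroy_ran: bool     # assumed: the destroy step has already completed


def workflows_to_resume(workflows):
    """Names of failed, paused workflows whose destroy step has not run yet."""
    return [
        wf.name
        for wf in workflows
        if wf.cluster_failed and wf.suspended and not wf.destroy_ran
    ]


def workflows_missed(workflows):
    """Names of failed workflows that are not paused.

    Resuming cannot reach their destroy step, so a cleanup pass that only
    resumes paused workflows would leave their resources behind.
    """
    return [
        wf.name
        for wf in workflows
        if wf.cluster_failed and not wf.suspended and not wf.destroy_ran
    ]


if __name__ == "__main__":
    sample = [
        Workflow("paused-after-failure", True, True, False),
        Workflow("exited-after-failure", True, False, False),
        Workflow("healthy", False, True, False),
    ]
    print("resume:", workflows_to_resume(sample))
    print("missed:", workflows_missed(sample))
```

If workflows in the second group can occur in practice, that would be an argument for having every flavor clean up after itself on failure, independent of the periodic resume.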
