---
hip: 9999
title: "Wait With kstatus"
authors: [ "@austinabro321" ]
created: "2024-12-06"
type: "feature"
status: "draft"
---

## Abstract

Currently, the `--wait` flag on `helm install` and `helm upgrade` does not wait for all resources to be fully reconciled, and does not wait for custom resources (CRs) at all. By replacing the wait logic with [kstatus](https://github.com/kubernetes-sigs/cli-utils/blob/master/pkg/kstatus/README.md), Helm will achieve more intuitive waits while simplifying its code and documentation.
## Motivation

Certain workflows require custom resources to be ready. There is no way to tell Helm to wait for custom resources, so anyone with this requirement must write their own wait logic.

Certain workflows require resources to be fully reconciled. For example, Helm waits for all new pods in an upgraded deployment to be ready, but it does not wait for the previous pods in that deployment to be removed.
By introducing kstatus, we will lower user friction with the `--wait` flag.
## Rationale

Leveraging an existing status-management library maintained by the Kubernetes team will simplify the code and documentation Helm has to maintain, while improving the functionality of `--wait`.
## Specification
From a CLI user's perspective, there will be no change in how waits are invoked: users will still pass the `--wait` flag.
Kstatus does not output any logs itself. Helm will emit a log message each second signalling that a resource is not yet ready. To avoid overwhelming users with output, Helm will log one not-ready resource at a time, chosen alphabetically by `metadata.name`. This behavior will look similar to the current `--wait` logging.
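A minimal sketch of how that per-tick selection could look, assuming the chart's resources are already in hand as unstructured objects; `logFirstNotReady` is a hypothetical helper, while `status.Compute` is kstatus' actual status entry point:

```go
package wait

import (
	"fmt"
	"sort"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"sigs.k8s.io/cli-utils/pkg/kstatus/status"
)

// logFirstNotReady logs the alphabetically-first (by metadata.name)
// resource that kstatus does not yet consider Current. Called once per
// second, it produces the single "still waiting" line described above.
func logFirstNotReady(resources []*unstructured.Unstructured) {
	sort.Slice(resources, func(i, j int) bool {
		return resources[i].GetName() < resources[j].GetName()
	})
	for _, r := range resources {
		res, err := status.Compute(r)
		if err != nil {
			continue // skip resources whose status cannot be computed
		}
		if res.Status != status.CurrentStatus {
			fmt.Printf("waiting for %s %q to be ready: %s\n", r.GetKind(), r.GetName(), res.Message)
			return // log only one not-ready resource per tick
		}
	}
}
```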
A [dynamic REST mapper](https://github.com/kubernetes-sigs/controller-runtime/blob/aea2e32a936584b06ae6f7992f856fe7292b0297/pkg/client/apiutil/restmapper.go#L36) will be used so that, if a chart installs a custom resource definition, kstatus can discover the corresponding custom resources during the wait.
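For illustration, here is a sketch of building such a mapper with controller-runtime. The constructor's signature has changed between controller-runtime versions, so treat this as an assumption-laden example rather than the final wiring; `newLazyMapper` is a hypothetical helper:

```go
package wait

import (
	"k8s.io/apimachinery/pkg/api/meta"
	"k8s.io/client-go/rest"
	"sigs.k8s.io/controller-runtime/pkg/client/apiutil"
)

// newLazyMapper returns a REST mapper that re-runs discovery when it
// encounters an unknown GroupVersionKind, so CRDs installed by the
// chart itself are resolvable by the time Helm waits on their custom
// resources.
func newLazyMapper(cfg *rest.Config) (meta.RESTMapper, error) {
	httpClient, err := rest.HTTPClientFor(cfg)
	if err != nil {
		return nil, err
	}
	return apiutil.NewDynamicRESTMapper(cfg, httpClient)
}
```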
<!-- TODO: Decide if we want more than alphabetical ordering, such as: the APIVersion/Kind of a resource determines its priority for being logged. For example, the first log messages would always describe deployments, and all deployments would be logged first. Once all deployments are in ready status, all stateful sets would be logged, and so forth. -->
## Backwards compatibility
Waiting for custom resources, and for every resource to be fully reconciled, could cause charts to time out that previously did not.
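Another compatibility consideration is RBAC: kstatus reads status via list/watch on the resources themselves (and their generated resources, such as a Deployment's ReplicaSets), so `--wait` may need permissions a user's role does not currently grant. Initial testing in [kstatus-rbac-test](https://github.com/AustinAbro321/kstatus-rbac-test) suggests the footprint is small; waiting on a Deployment required only the rules below, and notably no access to events. This is a single observed case, not an exhaustive matrix:

```yaml
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["list", "watch"]
  - apiGroups: ["apps"]
    resources: ["replicasets"]
    verbs: ["list"]
```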
## Security implications

No security implications.
## How to teach this

Replace the [existing wait documentation](https://helm.sh/docs/intro/using_helm/) with an explanation that Helm uses kstatus, pointing readers to the [kstatus documentation](https://github.com/kubernetes-sigs/cli-utils/blob/master/pkg/kstatus/README.md). This comes with the added benefit of not needing to maintain Helm-specific wait documentation.
## Reference implementation

Once I have feedback that this HIP is headed in the right direction, I will open a draft PR implementing this feature in Helm.
For now, see [healthchecks.go](https://github.com/zarf-dev/zarf/blob/main/src/internal/healthchecks/healthchecks.go) in Zarf, a project I help maintain. After each Helm install or upgrade, Zarf collects every resource from the chart and passes them to the `WaitForReady` function.
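To give a self-contained flavor of such a wait, here is a hedged sketch of a `WaitForReady`-style loop built on kstatus' `status.Compute`. The function and its `fetch` callback (which would re-read live objects through a dynamic client each tick) are hypothetical, not Zarf's or Helm's actual code:

```go
package wait

import (
	"context"
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"sigs.k8s.io/cli-utils/pkg/kstatus/status"
)

// waitForReady polls until every resource reaches kstatus' Current
// status or the context expires. kstatus understands built-in kinds
// and falls back to standard conditions (e.g. Ready) for custom
// resources.
func waitForReady(ctx context.Context, fetch func(context.Context) ([]*unstructured.Unstructured, error)) error {
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return fmt.Errorf("timed out waiting for resources: %w", ctx.Err())
		case <-ticker.C:
			resources, err := fetch(ctx) // re-read live state each tick
			if err != nil {
				return err
			}
			ready := true
			for _, r := range resources {
				res, err := status.Compute(r)
				if err != nil || res.Status != status.CurrentStatus {
					ready = false
					break
				}
			}
			if ready {
				return nil
			}
		}
	}
}
```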
## Rejected ideas

TBD
## Open issues

[helm/helm#8661](https://github.com/helm/helm/issues/8661)
## References

- Existing wait documentation: https://helm.sh/docs/intro/using_helm/
- kstatus documentation: https://github.com/kubernetes-sigs/cli-utils/blob/master/pkg/kstatus/README.md
- Zarf kstatus implementation: https://github.com/zarf-dev/zarf/blob/main/src/internal/healthchecks/healthchecks.go