
H4HIP: Wait with kstatus #374

Open. AustinAbro321 wants to merge 5 commits into base: main


AustinAbro321

proposal to replace the current wait logic in Helm with kstatus

Signed-off-by: Austin Abro <[email protected]>
Member

@gjenkins8 gjenkins8 left a comment

Thanks for the HIP! I have been wanting to write this one myself for some time. I agree, kstatus is where the Kubernetes community has put significant effort into thinking about Kubernetes resource "readiness". And Helm would do well to reuse this effort.

I have left some comments. They mostly center on what differences (if any) users would notice compared with the existing mechanism, and how to mitigate or manage them.


<!-- TODO: Decide if we want more than alphabetically, such as - The APIVersion/Kind of the resource will determine its priority for being logged. For example, the first log messages will always describe deployments. All deployments will be logged first. Once all deployments are in ready status, all stateful sets will be logged, and so forth. -->

## Backwards compatibility
Member

Curiosity: will kstatus require RBAC rules beyond those needed by the existing watch/ready mechanism?

Author

@AustinAbro321 AustinAbro321 Dec 14, 2024

Great question! I made this repo to test it out - https://github.com/AustinAbro321/kstatus-rbac-test. It looks to be pretty minimal. In my case, I tested a deployment, and only these RBAC permissions were necessary. I will add this to the doc.

rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["list", "watch"]
  - apiGroups: ["apps"]
    resources: ["replicasets"]
    verbs: ["list"]
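
As a rough illustration of what the rules above do and do not grant, here is a minimal Go sketch. `PolicyRule`, `Allows`, and `MinimalDeploymentWaitRules` are hypothetical names made up for this example; they are simplified stand-ins, not the Kubernetes RBAC types, and the matching ignores wildcards and resource names.

```go
package main

import "fmt"

// PolicyRule is a simplified, hypothetical stand-in for one RBAC rule.
// It is not the k8s.io/api/rbac/v1 type.
type PolicyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// contains reports whether want appears in list.
func contains(list []string, want string) bool {
	for _, s := range list {
		if s == want {
			return true
		}
	}
	return false
}

// Allows reports whether any rule grants verb on resource in group.
// Assumption: exact string matching only, no "*" wildcards.
func Allows(rules []PolicyRule, group, resource, verb string) bool {
	for _, r := range rules {
		if contains(r.APIGroups, group) && contains(r.Resources, resource) && contains(r.Verbs, verb) {
			return true
		}
	}
	return false
}

// MinimalDeploymentWaitRules mirrors the two rules quoted in the comment above.
func MinimalDeploymentWaitRules() []PolicyRule {
	return []PolicyRule{
		{APIGroups: []string{"apps"}, Resources: []string{"deployments"}, Verbs: []string{"list", "watch"}},
		{APIGroups: []string{"apps"}, Resources: []string{"replicasets"}, Verbs: []string{"list"}},
	}
}

func main() {
	rules := MinimalDeploymentWaitRules()
	fmt.Println("watch deployments:", Allows(rules, "apps", "deployments", "watch"))
	fmt.Println("watch replicasets:", Allows(rules, "apps", "replicasets", "watch"))
	fmt.Println("list events:", Allows(rules, "", "events", "list"))
}
```

The sketch only restates the rule set reported for a single Deployment; other resource kinds may need different permissions.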

Author

What really surprised me was that events weren't necessary. I thought for sure they would be.

Author

I added a section in backwards compatibility. Let me know your thoughts, or if you want a deeper evaluation.


<!-- TODO: Decide if we want more than alphabetically, such as - The APIVersion/Kind of the resource will determine its priority for being logged. For example, the first log messages will always describe deployments. All deployments will be logged first. Once all deployments are in ready status, all stateful sets will be logged, and so forth. -->

## Backwards compatibility
Member

Is there any situation where kstatus will not return ready, but existing logic would?

Author

Besides the two called out here (kstatus waits to return ready until reconciliation is complete, and it waits for custom resources), I can't think of any, but I am not 100% sure.


## Backwards compatibility

Waiting for custom resources and for reconciliation to complete for every resource could lead to charts timing out that weren't previously.
Member

I'm wondering if we want an "opt-in" (or opt-out) mechanism for charts to specify that they are compatible with the new ready logic, at least initially, and/or a CLI flag for users to control the behavior.

One of the premises of Helm 4 is that we can and do want to move Helm functionality forward, but we also need to remain compatible with existing user workflows as much as possible. So while it would certainly be okay to introduce new wait functionality, I think we would want a path for users to fall back to the old functionality if their situation warrants it, or for a chart to opt in to the new functionality if the chart author deems the chart compatible.

What we should do IMHO depends on how much we think kstatus is a drop-in replacement for the existing wait functionality (i.e. whether kstatus should become the default in Helm 4), and on whether we think it would be better for existing charts to opt in to the new functionality or for chart users to be able to opt out if they need to.

Author

@AustinAbro321 AustinAbro321 Dec 14, 2024

I will leave the final call to you, but I suspect kstatus will be a drop-in replacement. I'm not sure if it will work 90%, 99%, or 99.9% of the time with existing deployments. I think it's most likely closer to the latter percentages, but I would love a way to test that out and gain additional confidence.

My confidence so far comes from the fact that in Zarf, we changed the logic so kstatus runs by default for all charts unless wait is explicitly turned off. We did not expose a way to turn off kstatus separately, and I have not heard any users complain or say they've run into problems.

hips/hip-0999.md Outdated

## Motivation

Certain workflows require custom resources to be ready. There is no way to tell Helm to wait for custom resources to be ready, so anyone that has this requirement must write their own logic to wait for their custom resources.
Member

comment: I agree, this is something Helm needs to be able to address in the future. Custom resources IMHO are becoming more prolific, as e.g. the Kubernetes community tries to have less "in-core" but still official types (e.g. Gateway API). Or simply, folk attempt to extend Kubernetes APIs for their purpose at hand.

hips/hip-0999.md Outdated

Certain workflows require custom resources to be ready. There is no way to tell Helm to wait for custom resources to be ready, so anyone that has this requirement must write their own logic to wait for their custom resources.

Certain workflows require resources to be fully reconciled. For example, Helm waits for all new pods in an upgraded deployment to be ready. However, Helm does not wait for the previous pods in that deployment to be removed.
Member

comment: not exactly sure how this fits as a motivation? I think it is trying to say Helm doesn't currently / correctly handle this situation, but kstatus would?

Author

Yeah, kstatus handles that situation. I will add that.
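
To make the distinction in this thread concrete, here is a small Go sketch contrasting a check that only looks at new pods with one that also waits for old pods to go away. `DeploymentStatus`, `NewPodsReady`, and `FullyReconciled` are hypothetical names, and the conditions are a deliberate simplification; they are not Helm's or kstatus's actual logic.

```go
package main

import "fmt"

// DeploymentStatus holds a hypothetical subset of Deployment fields.
type DeploymentStatus struct {
	SpecReplicas      int // desired replicas (spec.replicas)
	Replicas          int // non-terminated pods, old and new (status.replicas)
	UpdatedReplicas   int // pods on the new template (status.updatedReplicas)
	AvailableReplicas int // available pods (status.availableReplicas)
}

// NewPodsReady approximates a check that is satisfied once enough
// updated pods exist and enough pods are available.
func NewPodsReady(d DeploymentStatus) bool {
	return d.UpdatedReplicas >= d.SpecReplicas && d.AvailableReplicas >= d.SpecReplicas
}

// FullyReconciled additionally requires that no surplus (old) pods
// remain, so the total pod count matches the desired count.
func FullyReconciled(d DeploymentStatus) bool {
	return NewPodsReady(d) && d.Replicas == d.SpecReplicas
}

func main() {
	// Mid-rollout: three new pods are up, one old pod has not been removed yet.
	mid := DeploymentStatus{SpecReplicas: 3, Replicas: 4, UpdatedReplicas: 3, AvailableReplicas: 4}
	// Rollout finished: only the three new pods remain.
	done := DeploymentStatus{SpecReplicas: 3, Replicas: 3, UpdatedReplicas: 3, AvailableReplicas: 3}

	fmt.Println("mid-rollout:", NewPodsReady(mid), FullyReconciled(mid))
	fmt.Println("finished:   ", NewPodsReady(done), FullyReconciled(done))
}
```

In the mid-rollout state the first check already passes while the second does not, which is the gap the HIP text describes.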


## Specification

From a CLI user's perspective there will be no changes in how waits are called; they will still use the `--wait` flag.
Member

@gjenkins8 gjenkins8 Dec 14, 2024

On the below subject of compatibility, and how waits are actioned, we might want e.g. `--wait=watch|poll|legacy`. IIUC, kstatus has a watch-based mechanism for determining readiness? We may also want to allow falling back to the "legacy" mechanism (to be decided). I would propose `--wait=watch` as the default.
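
A minimal sketch of how a flag along these lines could be parsed while keeping a bare `--wait` working, using only Go's standard `flag` package. The `WaitStrategy` type, `ParseWait`, and the mapping of a bare `--wait` to `watch` are assumptions taken from the proposal in this comment; Helm's CLI uses a different flag library and none of this reflects its actual code.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// WaitStrategy is a hypothetical enum for the values floated above.
type WaitStrategy string

const (
	WaitNone   WaitStrategy = ""
	WaitWatch  WaitStrategy = "watch"
	WaitPoll   WaitStrategy = "poll"
	WaitLegacy WaitStrategy = "legacy"
)

// String implements flag.Value.
func (w *WaitStrategy) String() string {
	if w == nil {
		return ""
	}
	return string(*w)
}

// Set implements flag.Value. A bare --wait arrives here as "true"
// because IsBoolFlag returns true; it maps to the proposed default.
func (w *WaitStrategy) Set(v string) error {
	switch v {
	case "true":
		*w = WaitWatch
	case "false":
		*w = WaitNone
	case "watch", "poll", "legacy":
		*w = WaitStrategy(v)
	default:
		return fmt.Errorf("unknown wait strategy %q (want watch, poll, or legacy)", v)
	}
	return nil
}

// IsBoolFlag tells the flag package that --wait may appear without a value.
func (w *WaitStrategy) IsBoolFlag() bool { return true }

// ParseWait parses args and returns the selected strategy.
func ParseWait(args []string) (WaitStrategy, error) {
	var w WaitStrategy
	fs := flag.NewFlagSet("install", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the example output
	fs.Var(&w, "wait", "wait strategy: watch, poll, or legacy")
	if err := fs.Parse(args); err != nil {
		return WaitNone, err
	}
	return w, nil
}

func main() {
	for _, args := range [][]string{{}, {"--wait"}, {"--wait=poll"}, {"--wait=bogus"}} {
		w, err := ParseWait(args)
		fmt.Printf("args=%v strategy=%q err=%v\n", args, w, err)
	}
}
```

Treating the flag as boolean-like means existing invocations that pass `--wait` with no value would keep working under this sketch.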

Contributor

Are there cases where the watch version would not work?

Author

I've not run into any issues with watch. I know Flux uses the poll method; I'm not sure whether watch was available when they implemented kstatus, or whether there was a reason they decided to go with poll.

Contributor

@mattfarina mattfarina left a comment

Thanks for the HIP. I like the idea of using something from the Kubernetes community to know the status. When Helm's current code was built, nothing like this was available.


Leveraging an existing status management library maintained by the Kubernetes team will simplify the code and documentation that Helm needs to maintain and improve the functionality of `--wait`.

## Specification
Contributor

I would like to see kstatus behind an adapter/interface. Helm should use it but not expose it in the API. There are two reasons I would like to see this:

  1. Helm has been long lived. Helm v3 has been GA for more than 5 years. Other projects come and go. If kstatus goes and something replaces it, we would like to be able to do that without it impacting the public API to the Helm SDK. While I don't expect a change like this, we have seen this kind of thing happen in the past.
  2. kstatus has yet to reach 1.0.0 status. There could be breaking changes. We want to shield the Helm SDK public API from any of those changes.

Author

Makes perfect sense, I'll add that to the doc.

Signed-off-by: Austin Abro <[email protected]>
@AustinAbro321
Author

Thank you for the feedback. I am aiming to create a draft PR sometime next week so we can get a sense of what it will look like.
