diff --git a/hips/README.md b/hips/README.md index 80c5a24d..fb9e6ab2 100644 --- a/hips/README.md +++ b/hips/README.md @@ -36,3 +36,4 @@ restricted markdown format and can be found in the - [hip-0024: Improve the Helm Documentation](hip-0024.md) - [hip-0025: Better Support for Resource Creation Sequencing](hip-0025.md) - [hip-0026: H4HIP: Wasm plugin system](hip-0026.md) +- [hip-9999: Handling unknown hook-delete-policy values](hip-9999.md) diff --git a/hips/hip-9999.md b/hips/hip-9999.md new file mode 100644 index 00000000..d8ac9321 --- /dev/null +++ b/hips/hip-9999.md @@ -0,0 +1,180 @@ +--- +hip: 9999 +title: "Handling unknown hook-delete-policy values" +authors: [ "Marcin Owsiany " ] +created: "2025-07-23" +type: "feature" +status: "draft" +--- + +## Abstract + +Currently, unknown _values_ of `helm.sh/hook-delete-policy` annotations are silently permitted. +They result in the given hook _never_ being deleted. +However this behaviour seems to be by coincidence, rather than intentional. +This HIP proposes how to handle them going forward. + +## Motivation + +There are [a number of _documented_ hook deletion policies][Docs]. +All of them result in hook deletion _at some point_. +A default hook delete policy (namely `before-hook-creation`) +is [applied when the list of hook policies for a resource is empty][Code]. + +However, when an _unrecognized_ value (e.g. `badger`) is specified: +* Helm does not complain in any way, and +* a hook resource annotated this way is **never deleted**. + +It seems that the current behaviour is a coincidence or mistake, rather than intended. + +## Rationale + +In line with [Hyrum's Law][Hyrum's Law], at least [one project][StackRox chart] depends on the current behaviour. +Namely, it specifies a hook deletion policy of `never` on most of its storage-related resources +(such as `PersistentVolumeClaim`). +These resources are applied as hooks, in order to guard against data loss, +see [appendix A](#appendix-a-why-create-storage-resources-as-hooks) if you would like to know more. + +I was able to find [one more unrelated chart][Anance personal chart] that also specifies the same value. + +Some charts also allow the users to specify the policy using a Helm var, +and it's not clear whether they validate the value before using it in a template. + +The current Helm behaviour in this case is unspecified. +There is a concern that a change in the current implementation +(for example applying the default policy more aggressively) could cause a catastrophic data loss. + +## Specification + +This proposal: +- suggests keeping status quo for Helm charts API v2, for the sake of backwards compatibility, +- outlines a few options for charts API v3. + +### For Helm charts API v2 + +- No semantic changes to production code. +- Add a regression test to make sure that the current behaviour is not changed by mistake. + Example: https://github.com/helm/helm/pull/31385 + +### For Helm charts API v3 + +There are a few dimensions in which we can make a change, that are somewhat interconnected: +- whether to accept a deletion disabling policy at all, or break compatibility and reject it, +- if accepted, whether support it officially (documented, etc), or deprecate it, +- what to do about (other) undocumented hook deletion policy values, + +#### 1. Official support for a policy which disables hook deletion + +Add a new, documented hook deletion policy value: `never`. +When specified, such hook resource is not deleted. +Effectively, the same as suggestion for Helm charts API v2, but explicitly supported. + +Pros: +- Keeps compatibility with Helm charts API v2. +- Allows a notion of resources which are installed by Helm, but afterwards not managed by it in any way. +- StackRox chart continues to work as before without changing. +- Other charts (if any) which happen depend on current undocumented behaviour could easily be made to work by changing + whatever value they use to `never`. + +Cons: +- It seems that hooks were generally intended to be resources whose previous instances are cleaned up + before (subsequent) chart applications. Supporting this policy breaks this assumption and + introduces some complexity when reasoning about possible scenarios. For example such `pre-install` + hook resources need to be guarded by a `if $.Release.IsInstall` condition in order not to break upgrades. + +#### 2. Keep the status quo + +The same as described above for Helm charts API v2. + +Pros: +- Keeps compatibility with Helm charts API v2. +- StackRox chart continues to work as before without changing. + +#### 3. Reject all undocumented hook deletion policies (including `never`) + +Make it an error to use undocumented hook deletion policies, including `never`. + +Pros: +- Mental model for hook deletion stays simple: they are always deleted (at _some_ point). + +Cons: +- Breaks compatibility with Helm charts API v2. In particular: + - StackRox Helm chart stops working as is. This would most likely force StackRox to stop supporting direct usage of `helm` + for installation; StackRox operator would still use Helm library internally, and manage the storage resources differently, + as it does now. + +#### 4. Reject undocumented policies (_other than_ `never`) + +Make it an error to use undocumented hook deletion policies, except `never` (which would be treated +as described in point either 1 or 2). + +Pros: +- _Mostly_ keeps compatibility with Helm charts API v2. Technically breaks it, but unlike proposal (3), + there is a migration path for charts which use an undocumented value other than `never` - + they need to change it to `never`). +- A little easier to reason about the behaviour, since there are fewer options. + +Cons: +- If an older chart exists out there that uses an undocumented value other than `never`, + then its legacy versions would be incompatible with Helm charts API v3. + +#### 5. Warn about undocumented policies + +Add a linter check that complains if an undocumented value for hook deletion policy is used. + +Pros: +- Raises awareness about this issue. + +Cons: +- None? + +## Backwards compatibility + +Described above, this HIP is all about compatibility. + +## Security implications + +None. + +## How to teach this + +Documentation and linter concerns mentioned above. + +## Reference implementation + +N/A ATM. + +## Rejected ideas + +N/A + +## Open issues + +See alternative points for Helm charts API v3 above. + +## Appendix A: Why create storage resources as hooks? + +Disclaimer: some of the reasoning below are conjectures since the motivation is lost in the mists of time. + +A Helm chart was introduced for the StackRox project around 2020. +At that time, there was a strong concern that a user mistake might lead to data loss, +in case the `PersistentVolume` or `PersistentVolumeClaim` were deleted in case the Helm release was deleted accidentally. + +Because of that it looks like all possible safeguards were applied, including `helm.sh/resource-policy: keep` +**and** `helm.sh/hook: pre-install,pre-upgrade` together with `helm.sh/hook-delete-policy: never`. +I do not know why this undocumented value was chosen. +I assume it might have been a mistake that was not caught until recently, since the result worked as desired. + +According to [the documentation][Resource policy docs], using `helm.sh/resource-policy: keep` alone +would be sufficient to prevent deletion on uninstallation. +However, it is not completely clear (to me), given how Helm behaves during `--force` rollbacks. +Deleting and recreating a `PersistentVolumeClaim` in that case would lead to data loss. + +## References + +- [Docs]: https://helm.sh/docs/topics/charts_hooks/#hook-deletion-policies +- [Code]: https://github.com/helm/helm/blob/bd897c96fbaf7546d6a5c57be009f16f9d38d6de/pkg/action/hooks.go#L47 +- [Resource policy docs]: https://helm.sh/docs/howto/charts_tips_and_tricks/#tell-helm-not-to-uninstall-a-resource +- [StackRox chart]: https://github.com/stackrox/stackrox/tree/master/image/templates/helm/stackrox-central +- [Anance personal chart]: https://github.com/ananace/personal-charts/blob/0e60e96373c8d827c0723ec0bfaa336bd09ffb35/charts/matrix-synapse/templates/signing-key-job.yaml#L176 +- [Hyrum's Law]: https://www.hyrumslaw.com/