Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migration Design: Rename Bundle to ClusterBundle #485

Open
wants to merge 9 commits into
base: main
Choose a base branch
from
252 changes: 252 additions & 0 deletions design/20241124-rename-bunde-to-clusterbundle.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,252 @@
# Design: Renaming Bundle to ClusterBundle

- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
- [Motivation](#motivation)
- [Goals](#goals)
- [Non-Goals](#non-goals)
- [Proposal](#proposal)
- [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
- [Test Plan](#test-plan)
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
- [Alternatives](#alternatives)
- [Future Work](#future-work)

## Release Signoff Checklist

This checklist contains actions which must be completed before a PR implementing this design can be
merged.

- [ ] This design doc has been discussed and approved
- [ ] Test plan has been agreed upon and the tests implemented
- [ ] User-facing documentation has been PR-ed against the release branch in [cert-manager/website]

TODO: add more items to checklist

## Summary

We want to rename `Bundle` to `ClusterBundle` for reasons described in "Motivation" below.

Conversion webhooks in Kubernetes can only be used to convert between versions of an API,
and not to convert between resources. This makes renaming harder to do, as it will represent
deleting the old API and creating a new API.

Even though trust-manager `Bundle` API is officially in an alpha state, we want to provide a smooth
migration path for our users. For this reason we are proposing to do this change over multiple releases,
requiring end-user action on the way, but without risking trust-manager target resources - which can provide
critical features in a cluster. For additional details, please see "Proposal" below.

## Motivation

We have [an open issue](https://github.com/cert-manager/trust-manager/issues/63) suggesting renaming Bundle to ClusterBundle.
Might seem like an obvious WontFix, considering all the assumed work (and pain) to do this,
but it turns out that the current name somehow blocks introducing an equivalent namespace-scoped resource -
for trust bundle management inside namespaces in multi-tenant clusters.

In addition, `ClusterBundle` would be a better name looking at cert-manager having `Issuer` and `ClusterIssuer`.
The name would also be better aligned with upstream [ClusterTrustBundle](https://kubernetes.io/docs/reference/access-authn-authz/certificate-signing-requests/#cluster-trust-bundles).

By pushing users to migrate smoothly to a new API, we can perform requested API changes at the same time.
Consider it similar to a one-way conversion webhook.

Since trust-manager is maturing and getting more attention, we suggest establishing its own API group: `trust-manager.io`.

### Goals

- `Bundle` resource is renamed to `ClusterBundle`
- `ClusterBundle` is created in the new `trust-manager.io` API group
- Provide a smooth migration for users of trust-manager
- Use the opportunity window to improve the API
- [Issue summarizing proposed API changes (#242)](https://github.com/cert-manager/trust-manager/issues/242)
- [Draft PR for proposed API changes done when migrating (#486)](https://github.com/cert-manager/trust-manager/pull/486)


### Non-Goals

- Behavior changes in `Bundle`/`ClusterBundle` API while migrating (only simpler API improvements)
- Introduce a new API version
- Introduce a namespace-scoped bundle API (at least not now)

## Proposal

We propose to do this change over multiple releases:

1. Version X: Only `Bundle` API is served and has the functional controller. (present state)
2. Version X+1: We introduce `ClusterBundle` with API otherwise identical to `Bundle`:
- `Bundle` is marked as deprecated.
- `ClusterBundle` introduced with agreed API improvements.
- Functional controller is migrated to `ClusterBundle`
- Validating webhook for both `Bundle` and `ClusterBundle` with same validations
- New technical controller added for `Bundle`, converting bundles to cluster bundles
3. Version Y: Remove `Bundle` and only serve `ClusterBundle` (desired state)
erikgb marked this conversation as resolved.
Show resolved Hide resolved

Some of these actions/states require extra attention, which are further described below.

### Functional Controller Migration

This should be rather simple. Since the APIs are identical, we just have to perform a "smart"
search and replace in the controller code. When the controller initially boots up on the X+1 version,
there will be no `ClusterBundle` resources in the cluster, as these will be created/updated by the technical
controller.

The `ClusterBundle` reconciler should do the same as for `Bundle` in version X, but we should test/verify that
the controller is able to change owner references on target resources from `Bundle` to `ClusterBundle` without
facing issues.

We must also ensure that the `ClusterBundle` reconciler isn't deleting target resources it thinks are "orphaned".
We believe the controller will only delete target resources if/when the controller resource exists, and
otherwise rely on the Kubernetes garbage collector to delete orphans. But this should be tested/verified.

### Temporary Technical Controller

When the functional controller is migrated to target `ClusterBundle`, we want to introduce a temporary technical
controller targeting the now deprecated `Bundle` resource.

This controller should just create or update the `ClusterBundle` with the same name as the reconciled `Bundle`.
Since the APIs should be quite similar, this should be a simple controller.

- `Bundle` `.spec` should be copied (and eventually slightly converted) into `ClusterBundle` `.spec`
- `Bundle` `.status` should be updated from `ClusterBundle` `.status`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should also add a clear message in the Bundle .status stating that using the Bundle is deprecated.
Also, the validation webhook should throw a warning explaining that the Bundle resources are deprecated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CRDs supports deprecating APIs/versions, and we can tailor the warning message. Isn't this enough?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im pretty sure the CRD deprecation warning tells you every time you run kubectl against it, we can test this though to make sure thats true with all the commands (describe, get, apply). If that is the case then I think a CRD deprecation warning is enough


To allow a user to migrate their resources from `Bundle` to `ClusterBundle`, the controller should not act
on `Bundle` delete events - nor should it add an owner reference to `ClusterBundle` to avoid the Kubernetes garbage collector
acting on deletes.

For reconciling a `ClusterBundle` from `Bundle` we should investigate the possibility of "smart field manager name logic".
It should be possible to read the field manager name used (by user) to create/update the `Bundle` resource.
If we use the same field manager name in controller (per resource) to create/update `ClusterBundle`, we think it's less
likely that users will have to force conflicts when migrating from `Bundle` to `ClusterBundle`.

### Proposed API changes when migrating

Improving the trust-manager API while migrating is not something we have to do, but
appears like nice opportunity to fix some of the long-standing proposed API changes.

There is an [open draft PR for proposed (and breaking) API changes](https://github.com/cert-manager/trust-manager/pull/486),
but in order to make this a non-braking change for users, only the `ClusterBundle` API will include these changes.

While this is subject to change, an example on how a user would migrate from `Bundle` to `ClusterBundle` without
changing anything for target resources is shown below:

```yaml
---
apiVersion: trust.cert-manager.io/v1alpha1
kind: Bundle
metadata:
name: my-bundle
spec:
sources:
- configMap:
name: my-ca
key: ca.crt
target:
configMap:
key: ca.crt
additionalFormats:
jks:
key: bundle.jks
pkcs12:
key: bundle.p12
```

Which would be migrated to:

```yaml
---
apiVersion: trust-manager.io/v1alpha1
kind: ClusterBundle
metadata:
name: my-bundle
spec:
sources:
- configMap:
name: my-ca
key: ca.crt
target:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've often thought that instead of having spec.target.configMap and spec.target.secret we should just have spec.targets which is an array of object references including apiVersion and kind.

This makes supporting future sources much easier, say for example if we wanted to add a plugin system like cert-manager has.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting! Could you draw an example? 😊

Copy link
Contributor

@ThatsMrTalbot ThatsMrTalbot Nov 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦 I was thinking of sources not targets, for example:

apiVersion: trust-manager.io/v1alpha1
kind: ClusterBundle
metadata:
  name: my-bundle
spec:
  sources:
  # Using apiVersion/kind means we can utilise this same array for 
  # any future "extensions"/"plugins".
  #
  # Loading from other sources could be a valuable feature and this API change
  # would enable that option in the future.
  - apiVersion: v1
    kind: ConfigMap
    name: my-ca
    key: ca.crt
  ...

In my head it would mean that AWS for example could release an operator that exposes the AWSPrivateCABundleSource that a user creates to reference their private ca. Then by referencing that AWSPrivateCABundleSource object in a bundle we pick up the CA.

That kind of support would be added in the longer term, not this piece of work, but the API change could enable that future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good input, thanks! I suggest moving the detailed API changes discussion to #242. I have also opened #486, which will be modified to change the initially unserved ClusterBundle - after it's introduced.

For this design, I suggest we agree that we should consider changing the API. And not exactly what we want to change. Are you ok with this approach?

configMap:
- key: ca.crt
- key: bundle.jks
format: JKS
- key: bundle.p12
format: PKCS12
namespaceSelector: {} # Selecting all namespaces here, but now required in API
```

### Risks and Mitigations

#### Target configmaps/secrets are accidentally deleted
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two conflicting goals here:

  • we do not want to break the behavior when deleting the Bundle resource
  • we want to support users who remove the Bundle resource and start using the ClusterBundle instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative approach here would be to add owner references and document how to remove those owner references when you want to start using the ClusterBundle as the new "source of truth".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If "owner references" are too complex, we can also implement our own garbage collector (eg. using labels).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you consider this as "conflicting goals"? I did not mention "do not want to break the behavior when deleting the Bundle resource" as a goal specifically, did I? 🤔 Considering the alpha state of Bundle API, I don't see a big harm in potentially orphaning some resources that can easily be deleted by the user as a workaround. The alternative could be a lot more destructive.

As a user, I don't want my ClusterBundle to be deleted when the corresponding Bundle is deleted. Maybe I am missing your point. Please clarify!


Since using owner references in this project, we need to be extra careful when performing changes like this.

We mitigate this risk by **NOT** cascading deletes of `Bundle` to `ClusterBundle`.
This is done by **NOT** adding owner references to `ClusterBundle`, and instead use the resource `name`
to map between `ClusterBundle` and `Bundle` resources.

Owner references on target resources are changed by the reconciler from `Bundle` to `ClusterBundle`,
which means targets could only be deleted if the `ClusterBundle` is deleted or `namespaceSelector`
is changed. This is the expected behavior.

#### Target configmaps/secrets appearing as orphaned

This could happen if a users deletes a `Bundle` resource expecting the target configmaps/secrets
to vanish. Unfortunately this won't happen by the defensive approach taken to not cascade `Bundle`
delete to `ClusterBundle`.

In order to achieve the wanted behavior, a user would have to delete the corresponding `ClusterBundle`,
which appears like a simple workaround that we should document.

## Design Details

### Test Plan

TODO

### Upgrade / Downgrade Strategy

Upgrading should represent less of challenge if we ensure enough release between version X+1 and Y.
We should document that upgrading directly from version X (or earlier) to Y is explicitly **NOT** supported.

Downgrading to version X (or earlier) from a version where the migration has started could be problematic.
We can probably mitigate this risk by keeping owner references (non-controller, as Kubernetes doesn't allow
multiple controller references) from `Bundle` to target resources, in addition to the controller references from
`ClusterBundle`, during the migration period.

## Alternatives
erikgb marked this conversation as resolved.
Show resolved Hide resolved

### Introduce `ClusterBundle` as a new and Independent API

Since a `Bundle` and a `ClusterBundle` could then address the same target resources (by name), we would have to
take this into account in controllers. In addition, we would have a **lot** more code to maintain and test.
This could be partly mitigated by creating reusable abstractions that work for both resources.
But if we want to allow `ClusterBundle` to provide new features, this would become complex.

The migration path from `Bundle` to `ClusterBundle` would also be unclear and complex.
Use case: As a user, I would like to use `ClusterBundle` instead of `Bundle` **without interrupting the target resources**.

```
Bundle -> ClusterBundle -> target resources
```

seems a lot simpler to manage than

```
Bundle -> target resources AND
ClusterBundle -> target resources
```

### Just rename resource between releases

Since the `Bundle` API version is `v1alpha1`, we could justify just doing the simplest thing and rename.
This approach could cause potentially catastrophic failures in user clusters when the `Bundle` CRD is deleted
since all target configmaps/secrets are owned by bundle and would be deleted by the Kubernetes garbage collector.

### Doing Nothing

See "Motivation" above.

## Future Work

- Introduce a **namespace-scoped** `Bundle` resource.
- Integration with upstream [ClusterTrustBundle API](https://kubernetes.io/docs/reference/access-authn-authz/certificate-signing-requests/#cluster-trust-bundles).
- New and improved version of the `ClusterBundle` API