Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Additional complianceTypes for ConfigurationPolicy #131

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
376 changes: 376 additions & 0 deletions enhancements/sig-policy/130-policy-merge-compliancetypes/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,376 @@
# ConfigurationPolicy ComplianceTypes for different merge behaviors

## Release Signoff Checklist

- [ ] Enhancement is `implementable`
- [ ] Design details are appropriately documented from clear requirements
- [ ] Test plan is defined
- [ ] Graduation criteria for dev preview, tech preview, GA
- [ ] User-facing documentation is created in [website](https://github.com/open-cluster-management-io/open-cluster-management-io.github.io/)

## Summary

We propose to add new options for the `spec["object-templates][*].complianceType` field in
ConfigurationPolicies to allow the user to specify new merge strategies: `musthaveapply`,
`musthavestrategic` and `musthavemerge`. These new options correspond to existing options in
`kubectl`, which users may be familiar with. Along with the existing `musthave`, `mustonlyhave`, and
`mustnothave` options, this should provide more flexibility and control to users, and be able to
provide a more predictable experience.

## Motivation

The existing `musthave` complianceType has a bespoke algorithm for merging content from the cluster
state with what is defined in the policy. One specific difference is when the policy includes a list
in the object template: `musthave` will check the existing list for any items specified in the
policy, and if any are missing, it will *add* them to the list and apply the change. Tools like
`kubectl apply` can do something similar on some objects, via a "strategic merge", but not on custom
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, good to know that server-side-apply can do strategic merges on custom resources, although it begs the question of why exactly the client-side can't do the same thing - it should have access to the same openapi information from the CRDs...

I'll have to try server-side-apply more, but from a quick read I'd be worried it would have more "conflicts" than we'd want?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JustinKuli could you research what it'd look like if we did server-side apply while forcing ownership of fields if the field is in the ConfigurationPolicy?

resources.

The existing `mustonlyhave` complianceType is simpler to understand: any fields on the object not
defined in the policy will be removed from the object on the cluster (possibly just reverting them
to default values). This can be used to avoid other merge behaviors, but requires the policy author
to include more of the object in the policy, and does not allow the unset fields to be set by other
controllers. This complianceType has been known to cause issues by repeatedly updating objects on
the cluster, causing high API server load.

New complianceTypes would give more tools to policy authors to specify the exact behavior they want.

### Goals

1. Policy authors should be able to create a policy to *replace* a specific list in an object on the
cluster with what is defined in the policy, *without* that policy reverting other customizations
on the object.
2. Policy authors should be able to create a policy that has the exact same behavior as `kubectl`
commands like `apply`, `patch --type=strategic` and `patch --type=merge`.
3. Clarify in documentation the exact behaviors of the complianceType options.

### Non-Goals

1. Support strategic merge on custom resources (`kubectl` does not support this).

## Proposal
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you look into the options that Argo CD provides and why? I believe this is the diff code:
https://github.com/argoproj/gitops-engine/blob/72bcdda3f0a5b80432d9f72e5b30827a530ac349/pkg/diff/diff.go#L75

They seem to allow server side apply if asked for. They seem to try to use strategic merge on native Kubernetes types and fallback to JSON merge (see the ThreeWayDiff function, though we wouldn't provide last-applied-configuration and would just provide a copy of the desired like in the TwoWayDiff function).


Add a `musthavemerge` complianceType, corresponding to the `kubectl` command `patch --type=merge`.
In particular, this allows for replacement of specific fields / objects / lists in an object without
reverting other "unrelated" changes in the object. This merge strategy works on all resource types,
including custom ones.

Add a `musthavestrategic` complianceType, corresponding to `kubectl patch --type=strategic`. This
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would happen if the user specified this and it was applied on a custom resource? Would it fall back to merge?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like kubectl patch operatorpolicy --type=strategic ... fails with an error mentioning that application/strategic-merge-patch+json is not supported.

strategy works only on "built-in" types on the cluster, but allows for more intelligent merges using
"patchStrategy" information available in those type definitions. For example, it allows for adding
an additional environment variable to a Deployment, as long as it has a new name.

Add a `musthaveapply` complianceType, corresponding to `kubectl apply`. The patch type is chosen
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking that it may be better to just pick this option and not have the other two options since I don't really see a reason to have merge behavior when strategic merge is available.

What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strategic merge is very similar to the bespoke musthave. The name key that musthave uses as a mergeKey is the most common one I found in the API, so it covers a good number of cases. The biggest difference is that musthave works on custom resources.

If I had to pick only one new option, I'd actually prefer musthavemerge. Its behavior is the most different from what we currently have, and I think story 1 makes a good case for why it would be better than mustonlyhave.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean. Perhaps both does make sense.

based on the involved resource's kind: custom resources use `--type=merge`, and built-in types use
`--type=strategic`.

These new types will only change the behavior when *updating* a resource on the cluster.

### User Stories

#### Story 1

A policy author can define a ConfigurationPolicy to *replace* the `env` list in a container in a
Deployment, without specifying *all* of the other non-default fields of the Deployment in the
ConfigurationPolicy.

Say this is the existing Deployment:
```yaml
kind: Deployment
apiVersion: apps/v1
metadata:
name: example
namespace: default
spec:
replicas: 0
selector:
matchLabels:
app: example
template:
metadata:
creationTimestamp: null
labels:
app: example
spec:
containers:
- name: container
image: 'image-registry.openshift-image-registry.svc:5000/openshift/httpd:latest'
ports:
- containerPort: 8080
protocol: TCP
env:
- name: hello
value: world
resources: {}
terminationMessagePath: /dev/my-termination-log # (customized)
terminationMessagePolicy: File
imagePullPolicy: Always
restartPolicy: Always
terminationGracePeriodSeconds: 30
dnsPolicy: ClusterFirstWithHostNet # (customized)
securityContext: {}
schedulerName: default-scheduler
strategy:
type: Recreate
revisionHistoryLimit: 10
progressDeadlineSeconds: 600
```

This ConfigurationPolicy should change *only* the environment variable section:
```yaml
apiVersion: policy.open-cluster-management.io/v1
kind: ConfigurationPolicy
metadata:
name: example
spec:
namespaceSelector:
exclude: []
include:
- default
object-templates:
- complianceType: musthavemerge
objectDefinition:
kind: Deployment
apiVersion: apps/v1
metadata:
name: example
namespace: default
spec:
template:
spec:
containers:
- name: container
image: 'image-registry.openshift-image-registry.svc:5000/openshift/httpd:latest'
ports:
- containerPort: 8080
protocol: TCP
env:
- name: foo
value: bar
terminationMessagePath: /dev/my-termination-log
recreateOption: None
pruneObjectBehavior: None
remediationAction: enforce
severity: medium
```

Note: effectively, the merge is replacing the entire `containers` list, so the `image`, `ports`, and
`terminationMessagePath` fields must be set to preserve them. However, this is far fewer fields than
would be required to accomplish this via a `mustonlyhave` policy, and allows for other processes to
customize things like the number of replicas on the Deployment.

For comparison, a `mustonlyhave` policy to accomplish this would look like:
```yaml
apiVersion: policy.open-cluster-management.io/v1
kind: ConfigurationPolicy
metadata:
name: example
spec:
namespaceSelector:
exclude: []
include:
- default
object-templates:
- complianceType: mustonlyhave
objectDefinition:
kind: Deployment
apiVersion: apps/v1
metadata:
name: example
namespace: default
spec:
replicas: 0
selector:
matchLabels:
app: example
template:
metadata:
labels:
app: example
spec:
containers:
- name: container
image: 'image-registry.openshift-image-registry.svc:5000/openshift/httpd:latest'
ports:
- containerPort: 8080
protocol: TCP
env:
- name: hello
value: world
terminationMessagePath: /dev/my-termination-log
dnsPolicy: ClusterFirstWithHostNet
strategy:
type: Recreate
recreateOption: None
pruneObjectBehavior: None
remediationAction: enforce
severity: medium
```

#### Story 2

A policy author can define a ConfigurationPolicy to update an item in the `ports` list of a
Deployment, without specifying other (non-identifying) fields in the policy. More generally, users
should be able to benefit from arbitrary `patchMergeKey` settings on built-in types.

With the same initial cluster state as Story 1, this ConfigurationPolicy should make the correct
update:
```yaml
apiVersion: policy.open-cluster-management.io/v1
kind: ConfigurationPolicy
metadata:
name: example
spec:
namespaceSelector:
exclude: []
include:
- default
object-templates:
- complianceType: musthavestrategic
objectDefinition:
kind: Deployment
apiVersion: apps/v1
metadata:
name: example
namespace: default
spec:
template:
spec:
containers:
- name: container
ports:
- containerPort: 8080
protocol: UDP
recreateOption: None
pruneObjectBehavior: None
remediationAction: enforce
severity: medium
```

If the complianceType was `musthave`, the resulting deployment would specify two ports: one UDP
and one TCP. The bespoke implementation for `musthave` does not recognize `containerPort` as the
`patchMergeKey` for that list - it only uses `name` for that purpose.

A `mustonlyhave` or `musthavemerge` policy would require many more fields to be set in the policy.

### Implementation Details/Notes/Constraints [optional]

The existing `musthave` complianceType is likely the best general choice, working on both built-in
types and custom resources the same way, and allowing the policy author to be sparse on the fields
they define in the policy. The new types would allow policy authors to have *some* more control over
the behavior, without requiring the full object definition like in `mustonlyhave`.

The `kubectl patch` command has three types of patches that can be specified. The names can be
confusing:
- `--type=json` [JSON Patch, RFC 6902](https://datatracker.ietf.org/doc/html/rfc6902) is not
covered by this proposal. Patches look like `[{"op": "replace", "path": "/a/b/c", "value": 42}]`.
- `--type=merge` [JSON Merge Patch, RFC 7386](https://datatracker.ietf.org/doc/html/rfc7386).
Patches look like a snippet of the resource in question, with omitted fields (generally) being
merged with the state of the object on the cluster.
- `--type=strategic` is kubernetes-specific. It can use additional API information on built-in types
in order to make more intelligent merge decisions. Patches look like a snippet of the resource.

### Risks and Mitigation

Adding new options makes the API more complex. Documentation with examples will help clarify what
each option does, including making the existing `musthave` behavior more clear.

As mentioned in User Story 1, the proposed `musthavemerge` will still require some additional fields
to be specified in the policy to prevent them from reverting to defaults. However, since this is the
same behavior as a `kubectl patch` command, it may be more familiar to some users. It also requires
fewer additional fields in general than `mustonlyhave`.

We do not forsee any increased use of resources (cpu, memory, API calls) by using these types. In
fact, using `musthavemerge` instead of `mustonlyhave` may reduce the total number of API calls by
preventing some cases of "thrashing".

## Design Details

### Open Questions [optional]

1. Is `musthaveapply` particularly useful? It may have to guess whether a strategic merge is
possible, for example on types in the OpenShift API which might support it. It may be more clear
to simply require the policy author to choose. There might not be a case where the strategy
should dynamically be chosen per-cluster, where this option would be helpful.

### Test Plan

**Note:** *Section not required until targeted at a release.*

<!--

Consider the following in developing a test plan for this enhancement:
- Will there be e2e and integration tests, in addition to unit tests?
- How will it be tested in isolation vs with other components?

No need to outline all of the test cases, just the general strategy. Anything
that would count as tricky in the implementation and anything particularly
challenging to test should be called out.

All code is expected to have adequate tests (eventually with coverage
expectations).

All code is expected to have sufficient e2e tests.

-->

### Graduation Criteria

**Note:** *Section not required until targeted at a release.*

<!--

Define graduation milestones.

These may be defined in terms of API maturity, or as something else. Initial proposal
should keep this high-level with a focus on what signals will be looked at to
determine graduation.

Consider the following in developing the graduation criteria for this
enhancement:

- [Maturity levels][maturity-levels]
- [Deprecation policy][deprecation-policy]

Clearly define what graduation means by either linking to the [API doc definition](https://kubernetes.io/docs/concepts/overview/kubernetes-api/#api-versioning),
or by redefining what graduation means.

In general, we try to use the same stages (alpha, beta, stable), regardless how the functionality is accessed.

[maturity-levels]: https://git.k8s.io/community/contributors/devel/sig-architecture/api_changes.md#alpha-beta-and-stable-versions
[deprecation-policy]: https://kubernetes.io/docs/reference/using-api/deprecation-policy/

-->

### Upgrade / Downgrade Strategy

As new options, there are no concerns with upgrades. After upgrading to a version that supports
these options, they should just work. Downgrading is only a concern if a policy has been created
that uses a new option. In that case, it may be required to delete the policy, or change the field
to a supported value.

### Version Skew Strategy

Through CRD validation, the policy framework will not be able to apply a ConfigurationPolicy using
one of the new `complianceType` options on a cluster that does not support it yet. The framework
will remove the existing ConfigurationPolicy (if it exists) and emit a template error.

## Implementation History

<!--
Major milestones in the life cycle of a proposal should be tracked in `Implementation
History`.
-->

## Drawbacks

See "Risks and Mitigation"

## Alternatives

Using `object-templates-raw` allows a policy author to effectively "roll their own" merge strategy,
but the syntax could be very complex, and it would likely be error-prone.

## Infrastructure Needed [optional]

None.
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
title: ConfigurationPolicy ComplianceTypes for different merge behaviors
authors:
- "@JustinKuli"
reviewers:
- TBD
approvers:
- TBD
creation-date: "2024-10-28"
last-updated: "2024-10-28"
status: implementable
Loading