fix(manifests): fix full CRDs and update tests to use them
The full CRDs at `manifests/base/crds/full` are intended only for use
in editors
(#11266 (comment))
and are unusable otherwise: trying to install them causes spurious
validation errors with nearly every workflow.

Having the full CRDs installed would enable many useful features, such as
`kubectl explain` output (#8190),
[validating admission
policies](https://kubernetes.io/docs/reference/access-authn-authz/validating-admission-policy/)
(#13503), and
integration with tools like cdk8s
(#8532).

As explained at
#13754 (comment),
there are two issues that prevent us from using the full CRDs everywhere:
1. [Kubernetes size
   limits](kubernetes/kubernetes#82292) when
   using client-side apply. This is not a problem when using
   server-side apply.
2. Wrong validation information due to kubebuilder limitations.
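
For context on the first issue: client-side `kubectl apply` copies the entire manifest into the `kubectl.kubernetes.io/last-applied-configuration` annotation, and Kubernetes caps total annotation size (commonly cited as 256 KiB), so a full CRD can be rejected on that alone. A minimal pre-flight check, as a sketch — the helper and the exact threshold are illustrative assumptions, not part of this PR:

```go
package main

import (
	"fmt"
	"os"
)

// Assumed annotation cap: Kubernetes rejects objects whose annotations total
// more than 256 KiB. Client-side apply stores the whole manifest in the
// last-applied-configuration annotation, so large CRDs hit this limit;
// server-side apply writes no such annotation.
const maxAnnotationBytes = 256 * 1024

// exceedsClientSideLimit reports whether a manifest is too large to be
// installed with client-side apply (hypothetical helper for illustration).
func exceedsClientSideLimit(manifest []byte) bool {
	return len(manifest) > maxAnnotationBytes
}

func main() {
	for _, path := range os.Args[1:] {
		data, err := os.ReadFile(path)
		if err != nil {
			panic(err)
		}
		if exceedsClientSideLimit(data) {
			fmt.Printf("%s (%d bytes) needs server-side apply\n", path, len(data))
		}
	}
}
```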

For the first issue, I verified that this is no longer an issue when
using [server-side
apply](https://kubernetes.io/docs/reference/using-api/server-side-apply/),
so I updated the `make install` target to use it. Since `kubectl apply --server-side`
isn't compatible with `kubectl apply --prune`, I had to switch to
[apply
sets](https://kubernetes.io/docs/tasks/manage-kubernetes-objects/declarative-config/#alternative-kubectl-apply-f-directory-prune),
which are intended to replace `--prune` and seem to work well.

For the second issue, I went through and added workarounds to
`hack/manifests/crds.go` for all the kubebuilder issues I could find.
Since the E2E test suite now uses the full CRDs, it should tell us if
there's anything I missed.

I didn't update the release manifests to use the full CRDs, since that's
a big change and we should probably wait a while to make sure there
aren't any unexpected issues. Users can easily opt into the full
CRDs if they want.

Signed-off-by: Mason Malone <[email protected]>
MasonM committed Jan 2, 2025
1 parent 81993b5 commit 0d64c1d
Showing 73 changed files with 39,004 additions and 22,690 deletions.
5 changes: 4 additions & 1 deletion Makefile
@@ -480,7 +480,10 @@ install: githooks
kubectl get ns $(KUBE_NAMESPACE) || kubectl create ns $(KUBE_NAMESPACE)
kubectl config set-context --current --namespace=$(KUBE_NAMESPACE)
@echo "installing PROFILE=$(PROFILE)"
kubectl kustomize --load-restrictor=LoadRestrictionsNone test/e2e/manifests/$(PROFILE) | sed 's|quay.io/argoproj/|$(IMAGE_NAMESPACE)/|' | sed 's/namespace: argo/namespace: $(KUBE_NAMESPACE)/' | kubectl -n $(KUBE_NAMESPACE) apply --prune -l app.kubernetes.io/part-of=argo -f -
kubectl kustomize --load-restrictor=LoadRestrictionsNone test/e2e/manifests/$(PROFILE) \
| sed 's|quay.io/argoproj/|$(IMAGE_NAMESPACE)/|' \
| sed 's/namespace: argo/namespace: $(KUBE_NAMESPACE)/' \
| KUBECTL_APPLYSET=true kubectl -n $(KUBE_NAMESPACE) apply --applyset=configmaps/install --server-side --prune -f -
ifeq ($(PROFILE),stress)
kubectl -n $(KUBE_NAMESPACE) apply -f test/stress/massive-workflow.yaml
endif
10 changes: 1 addition & 9 deletions api/jsonschema/schema.json

Some generated files are not rendered by default.

10 changes: 1 addition & 9 deletions api/openapi-spec/swagger.json

Some generated files are not rendered by default.

8 changes: 2 additions & 6 deletions docs/executor_swagger.md
@@ -983,7 +983,7 @@ ConfigMap volumes support ownership management and SELinux relabeling.
| template | string| `string` | | | Name of template to execute | |
| templateRef | [TemplateRef](#template-ref)| `TemplateRef` | | | | |
| when | string| `string` | | | When is an expression in which the task should conditionally execute | |
| withItems | [][Item](#item)| `[]Item` | | | WithItems expands a task into multiple parallel tasks from the items in the list | |
| withItems | [][Item](#item)| `[]Item` | | | WithItems expands a task into multiple parallel tasks from the items in the list</br>+kubebuilder:validation:Schemaless</br>+kubebuilder:pruning:PreserveUnknownFields | |
| withParam | string| `string` | | | WithParam expands a task into multiple parallel tasks from the value in the parameter,</br>which is expected to be a JSON list. | |
| withSequence | [Sequence](#sequence)| `Sequence` | | | | |

@@ -1856,7 +1856,6 @@ ISCSI volumes support ownership management and SELinux relabeling.


> +protobuf.options.(gogoproto.goproto_stringer)=false
+kubebuilder:validation:Type=object


@@ -2511,11 +2510,8 @@ be cluster-scoped, so there is no namespace field.
### <span id="parallel-steps"></span> ParallelSteps


> +kubebuilder:validation:Type=array




[interface{}](#interface)

### <span id="parameter"></span> Parameter
@@ -3391,7 +3387,7 @@ cause implementors to also use a fixed point implementation.
| resources | [ResourceRequirements](#resource-requirements)| `ResourceRequirements` | | | | |
| restartPolicy | [ContainerRestartPolicy](#container-restart-policy)| `ContainerRestartPolicy` | | | | |
| securityContext | [SecurityContext](#security-context)| `SecurityContext` | | | | |
| source | string| `string` | | | Source contains the source code of the script to execute | |
| source | string| `string` | | | Source contains the source code of the script to execute</br>This is required, but it's possible for this to be set in templateDefaults, so we have to mark it as optional in the CRD.</br>+optional | |
| startupProbe | [Probe](#probe)| `Probe` | | | | |
| stdin | boolean| `bool` | | | Whether this container should allocate a buffer for stdin in the container runtime. If this</br>is not set, reads from stdin in the container will always result in EOF.</br>Default is false.</br>+optional | |
| stdinOnce | boolean| `bool` | | | Whether the container runtime should close the stdin channel after it has been opened by</br>a single attach. When stdin is true the stdin stream will remain open across multiple attach</br>sessions. If stdinOnce is set to true, stdin is opened on container start, is empty until the</br>first client attaches to stdin, and then remains open and accepts data until the client disconnects,</br>at which time stdin is closed and remains closed until the container is restarted. If this</br>flag is false, a container processes that reads from stdin will never receive an EOF.</br>Default is false</br>+optional | |
2 changes: 1 addition & 1 deletion docs/fields.md
@@ -3066,7 +3066,7 @@ ScriptTemplate is a template subtype to enable scripting through code steps
|`resources`|[`ResourceRequirements`](#resourcerequirements)|Compute Resources required by this container. Cannot be updated. More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/|
|`restartPolicy`|`string`|RestartPolicy defines the restart behavior of individual containers in a pod. This field may only be set for init containers, and the only allowed value is "Always". For non-init containers or when this field is not specified, the restart behavior is defined by the Pod's restart policy and the container type. Setting the RestartPolicy as "Always" for the init container will have the following effect: this init container will be continually restarted on exit until all regular containers have terminated. Once all regular containers have completed, all init containers with restartPolicy "Always" will be shut down. This lifecycle differs from normal init containers and is often referred to as a "sidecar" container. Although this init container still starts in the init container sequence, it does not wait for the container to complete before proceeding to the next init container. Instead, the next init container starts immediately after this init container is started, or after any startupProbe has successfully completed.|
|`securityContext`|[`SecurityContext`](#securitycontext)|SecurityContext defines the security options the container should be run with. If set, the fields of SecurityContext override the equivalent fields of PodSecurityContext. More info: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/|
|`source`|`string`|Source contains the source code of the script to execute|
|`source`|`string`|Source contains the source code of the script to execute This is required, but it's possible for this to be set in templateDefaults, so we have to mark it as optional in the CRD.|
|`startupProbe`|[`Probe`](#probe)|StartupProbe indicates that the Pod has successfully initialized. If specified, no other probes are executed until this completes successfully. If this probe fails, the Pod will be restarted, just as if the livenessProbe failed. This can be used to provide different probe parameters at the beginning of a Pod's lifecycle, when it might take a long time to load data or warm a cache, than during steady-state operation. This cannot be updated. More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes|
|`stdin`|`boolean`|Whether this container should allocate a buffer for stdin in the container runtime. If this is not set, reads from stdin in the container will always result in EOF. Default is false.|
|`stdinOnce`|`boolean`|Whether the container runtime should close the stdin channel after it has been opened by a single attach. When stdin is true the stdin stream will remain open across multiple attach sessions. If stdinOnce is set to true, stdin is opened on container start, is empty until the first client attaches to stdin, and then remains open and accepts data until the client disconnects, at which time stdin is closed and remains closed until the container is restarted. If this flag is false, a container processes that reads from stdin will never receive an EOF. Default is false|
13 changes: 13 additions & 0 deletions docs/installation.md
@@ -25,6 +25,19 @@ You can use Kustomize to patch your preferred [configurations](managed-namespace

You can install Argo Workflows using the community maintained [Helm charts](https://github.com/argoproj/argo-helm).

### Full CRDs

The official release manifests come with stripped-down CRDs that omit validation information.
This is a workaround for [Kubernetes size limitations](https://github.com/kubernetes/kubernetes/issues/82292) when using client-side apply.
With [server-side apply](https://kubernetes.io/docs/reference/using-api/server-side-apply/), you can install the full CRDs using the following command:

```bash
kubectl apply --server-side --kustomize https://github.com/argoproj/argo-workflows/manifests/base/crds/full?ref=<appVersion>

# e.g. for version 3.6.2:
kubectl apply --server-side --kustomize https://github.com/argoproj/argo-workflows/manifests/base/crds/full?ref=v3.6.2
```

## Installation options

Determine your base installation option.
2 changes: 1 addition & 1 deletion hack/manifests/crdgen.sh
@@ -8,7 +8,7 @@ add_header() {
mv tmp "$1"
}

controller-gen crd:maxDescLen=0 paths=./pkg/apis/... output:dir=manifests/base/crds/full
controller-gen crd:maxDescLen=0,generateEmbeddedObjectMeta=true paths=./pkg/apis/... output:dir=manifests/base/crds/full

find manifests/base/crds/full -name 'argoproj.io*.yaml' | while read -r file; do
# remove junk fields
147 changes: 61 additions & 86 deletions hack/manifests/crds.go
@@ -1,108 +1,83 @@
package main

import (
"fmt"
"os"
"path/filepath"

"sigs.k8s.io/yaml"
)
import "fmt"

// cleanCRD does post-processing of the CRDs generated by kubebuilder to workaround limitations
// (see https://github.com/kubernetes-sigs/controller-tools/issues/461)
func cleanCRD(filename string) {
data, err := os.ReadFile(filepath.Clean(filename))
if err != nil {
panic(err)
}
crd := make(obj)
err = yaml.Unmarshal(data, &crd)
if err != nil {
panic(err)
}
delete(crd, "status")
metadata := crd["metadata"].(obj)
delete(metadata, "annotations")
delete(metadata, "creationTimestamp")
spec := crd["spec"].(obj)
versions := spec["versions"].([]interface{})
version := versions[0].(obj)
schema := version["schema"].(obj)["openAPIV3Schema"].(obj)
name := crd["metadata"].(obj)["name"].(string)
switch name {
crd := ParseYaml(Read(filename))
crd.RemoveNestedField("status")
crd.RemoveNestedField("metadata", "annotations")
crd.RemoveNestedField("metadata", "creationTimestamp")
schema := crd.OpenAPIV3Schema()
switch crd.Name() {
case "cronworkflows.argoproj.io":
properties := schema["properties"].(obj)["spec"].(obj)["properties"].(obj)["workflowSpec"].(obj)["properties"].(obj)["templates"].(obj)["items"].(obj)["properties"]
properties.(obj)["container"].(obj)["required"] = []string{"image"}
properties.(obj)["script"].(obj)["required"] = []string{"image", "source"}
patchWorkflowSpecTemplateFields(&schema, "spec", "properties", "workflowSpec", "properties")
case "clusterworkflowtemplates.argoproj.io", "workflows.argoproj.io", "workflowtemplates.argoproj.io":
properties := schema["properties"].(obj)["spec"].(obj)["properties"].(obj)["templates"].(obj)["items"].(obj)["properties"]
properties.(obj)["container"].(obj)["required"] = []string{"image"}
properties.(obj)["script"].(obj)["required"] = []string{"image", "source"}
patchWorkflowSpecTemplateFields(&schema, "spec", "properties")
}
data, err = yaml.Marshal(crd)
if err != nil {
panic(err)
}
err = os.WriteFile(filename, data, 0o600)
if err != nil {
panic(err)
if crd.Name() == "workflows.argoproj.io" {
patchTemplateFields(&schema, "status", "properties", "storedTemplates", "additionalProperties")
patchWorkflowSpecTemplateFields(&schema, "status", "properties", "storedWorkflowTemplateSpec", "properties")
}
crd.WriteYaml(filename)
}

// minimizeCRD is a workaround for two separate issues:
// 1. "Request entity too large: limit is 3145728" errors due to https://github.com/kubernetes/kubernetes/issues/82292
// 2. "spec.validation.openAPIV3Schema.properties[spec].properties[tasks].additionalProperties.properties[steps].items.items: Required value: must be specified" due to kubebuilder issues (https://github.com/argoproj/argo-workflows/pull/3809#discussion_r472383090)
func minimizeCRD(filename string) {
data, err := os.ReadFile(filepath.Clean(filename))
if err != nil {
panic(err)
}
func patchWorkflowSpecTemplateFields(schema *obj, baseFields ...string) {
patchTemplateFields(schema, append(baseFields, "templateDefaults")...)
patchTemplateFields(schema, append(baseFields, "templates", "items")...)
}

shouldMinimize := false
if len(data) > 1024*1024 {
fmt.Printf("Minimizing %s due to CRD size (%d) exceeding 1MB\n", filename, len(data))
shouldMinimize = true
}
func patchTemplateFields(schema *obj, baseFields ...string) {
// "container" and "script" templates embed the k8s.io/api/core/v1/Container
// struct, which requires the "name" field to be specified, but it's not actually required,
// and there's no easy way to override that in kubebuilder.
schema.RemoveNestedField(append(baseFields, "properties", "container", "required")...)
schema.RemoveNestedField(append(baseFields, "properties", "script", "required")...)

crd := make(obj)
err = yaml.Unmarshal(data, &crd)
if err != nil {
panic(err)
}
// Hack to transform "steps" from an array of objects, e.g.
// steps:
// - steps:
// - name: foo
// template: bar
// to an array of arrays, e.g.:
// steps:
// - - name: foo
// template: bar
//
// The reason it's represented as an array of objects in "workflow_types.go"
// is because that's the only way to get kubebuilder to generate the spec
// without breaking API compatibility.
stepFields := append(baseFields, "properties", "steps", "items")
schema.CopyNestedField(append(stepFields, "properties", "steps"), stepFields)
}

if !shouldMinimize {
name := crd["metadata"].(obj)["name"].(string)
switch name {
case "cronworkflows.argoproj.io", "clusterworkflowtemplates.argoproj.io", "workflows.argoproj.io", "workflowtemplates.argoproj.io", "workflowtasksets.argoproj.io":
fmt.Printf("Minimizing %s due to kubebuilder issues\n", filename)
shouldMinimize = true
}
// minimizeCRD generates a stripped-down CRD as a workaround for "Request entity too large: limit is 3145728" errors due to https://github.com/kubernetes/kubernetes/issues/82292.
// This isn't an issue if you use server-side apply, but not everyone has that available.
func minimizeCRD(filename string) {
data := Read(filename)
shouldMinimize := false
// TODO: 512KB is just to avoid changing too much in this PR. Increase this back to 1MB after it's merged.
if len(data) > 512*1024 {
fmt.Printf("Minimizing %s due to CRD size (%d) exceeding 512KB\n", filename, len(data))
shouldMinimize = true
}

if !shouldMinimize {
return
}

crd = stripSpecAndStatusFields(crd)

data, err = yaml.Marshal(crd)
if err != nil {
panic(err)
}
err = os.WriteFile(filename, data, 0o600)
if err != nil {
panic(err)
}
crd := ParseYaml(data)
stripSpecAndStatusFields(crd)
crd.WriteYaml(filename)
}

// stripSpecAndStatusFields strips the "spec" and "status" fields from the CRD, as those are usually the largest.
func stripSpecAndStatusFields(crd obj) obj {
spec := crd["spec"].(obj)
versions := spec["versions"].([]interface{})
version := versions[0].(obj)
properties := version["schema"].(obj)["openAPIV3Schema"].(obj)["properties"].(obj)
for k := range properties {
if k == "spec" || k == "status" {
properties[k] = obj{"type": "object", "x-kubernetes-preserve-unknown-fields": true, "x-kubernetes-map-type": "atomic"}
}
func stripSpecAndStatusFields(crd *obj) {
schema := crd.OpenAPIV3Schema()
preserveMarker := obj{"type": "object", "x-kubernetes-preserve-unknown-fields": true, "x-kubernetes-map-type": "atomic"}
if _, ok := schema["spec"]; ok {
schema["spec"] = preserveMarker
}
if _, ok := schema["status"]; ok {
schema["status"] = preserveMarker
}
return crd
}
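
The "array of objects" to "array of arrays" rewrite that `patchTemplateFields` performs is easier to see on a concrete schema fragment. Below is a stdlib-only sketch of the same move — `nested` and `flattenSteps` are hypothetical stand-ins for the `CopyNestedField` helper, written without the k8s.io/apimachinery dependency:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// nested walks a decoded JSON/YAML map along the given keys, panicking if a
// key is missing (mirroring the fail-fast style of hack/manifests).
func nested(m map[string]interface{}, keys ...string) interface{} {
	var cur interface{} = m
	for _, k := range keys {
		node, ok := cur.(map[string]interface{})
		if !ok {
			panic(fmt.Sprintf("not an object at %q", k))
		}
		cur, ok = node[k]
		if !ok {
			panic(fmt.Sprintf("missing field %q", k))
		}
	}
	return cur
}

// flattenSteps replaces the object schema under steps.items with the inner
// steps array schema, turning "array of objects" into "array of arrays" —
// the same effect as schema.CopyNestedField(stepFields+["properties","steps"], stepFields).
func flattenSteps(schema map[string]interface{}) {
	inner := nested(schema, "properties", "steps", "items", "properties", "steps")
	steps := nested(schema, "properties", "steps").(map[string]interface{})
	steps["items"] = inner
}

func main() {
	// Hypothetical stand-in for the generated ParallelSteps schema fragment.
	raw := `{"properties":{"steps":{"type":"array","items":{"type":"object",` +
		`"properties":{"steps":{"type":"array","items":{"type":"object"}}}}}}}`
	var schema map[string]interface{}
	if err := json.Unmarshal([]byte(raw), &schema); err != nil {
		panic(err)
	}
	flattenSteps(schema)
	out, _ := json.Marshal(nested(schema, "properties", "steps"))
	fmt.Println(string(out)) // "items" is now itself an array schema
}
```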
71 changes: 71 additions & 0 deletions hack/manifests/helpers.go
@@ -0,0 +1,71 @@
package main

import (
"fmt"
"os"
"path/filepath"

"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
"sigs.k8s.io/yaml"
)

type obj map[string]interface{}

func (o *obj) RemoveNestedField(fields ...string) {
unstructured.RemoveNestedField(*o, fields...)
}

func (o *obj) CopyNestedField(sourceFields []string, targetFields []string) {
value := nestedFieldNoCopy[any](o, sourceFields...)
parentField := nestedFieldNoCopy[map[string]interface{}](o, targetFields[:len(targetFields)-1]...)
parentField[targetFields[len(targetFields)-1]] = value
}

func (o *obj) Name() string {
return nestedFieldNoCopy[string](o, "metadata", "name")
}

func (o *obj) OpenAPIV3Schema() obj {
versions := nestedFieldNoCopy[[]interface{}](o, "spec", "versions")
version := obj(versions[0].(map[string]interface{}))
return nestedFieldNoCopy[map[string]interface{}](&version, "schema", "openAPIV3Schema", "properties")
}

func nestedFieldNoCopy[T any](o *obj, fields ...string) T {
value, found, err := unstructured.NestedFieldNoCopy(*o, fields...)
if !found {
panic(fmt.Sprintf("failed to find field %v", fields))
}
if err != nil {
panic(err.Error())
}
return value.(T)
}

func (o *obj) WriteYaml(filename string) {
data, err := yaml.Marshal(o)
if err != nil {
panic(err)
}
err = os.WriteFile(filename, data, 0o600)
if err != nil {
panic(err)
}
}

func Read(filename string) []byte {
data, err := os.ReadFile(filepath.Clean(filename))
if err != nil {
panic(err)
}
return data
}

func ParseYaml(data []byte) *obj {
crd := make(obj)
err := yaml.Unmarshal(data, &crd)
if err != nil {
panic(err)
}
return &crd
}
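
The generic `nestedFieldNoCopy[T]` accessor above also works without the apimachinery dependency. A minimal stdlib-only equivalent, as a sketch (hypothetical reimplementation, not the code in this PR):

```go
package main

import "fmt"

type obj = map[string]interface{}

// nestedField walks o along fields and asserts the result to T, panicking on
// a missing key or type mismatch — the same fail-fast behavior as
// nestedFieldNoCopy in helpers.go, without unstructured.NestedFieldNoCopy.
func nestedField[T any](o obj, fields ...string) T {
	var cur interface{} = o
	for _, f := range fields {
		m, ok := cur.(obj)
		if !ok {
			panic(fmt.Sprintf("field %q: parent is not an object", f))
		}
		cur, ok = m[f]
		if !ok {
			panic(fmt.Sprintf("failed to find field %v", fields))
		}
	}
	v, ok := cur.(T)
	if !ok {
		panic(fmt.Sprintf("field %v has unexpected type %T", fields, cur))
	}
	return v
}

func main() {
	crd := obj{"metadata": obj{"name": "workflows.argoproj.io"}}
	fmt.Println(nestedField[string](crd, "metadata", "name"))
}
```

The panics keep the manifest generators loud about schema drift: a malformed CRD fails the build instead of silently producing a wrong manifest.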
3 changes: 0 additions & 3 deletions hack/manifests/types.go

This file was deleted.

4 changes: 2 additions & 2 deletions manifests/base/crds/full/README.md
@@ -1,4 +1,4 @@
# Full CRDs

These CRDs have the full schema.
As a result, they are large and probably not suitable to be used in your cluster due to [kubernetes/kubernetes#82292](https://github.com/kubernetes/kubernetes/issues/82292).
These CRDs have the full schema and must be applied using [server-side apply](https://kubernetes.io/docs/reference/using-api/server-side-apply/) to avoid hitting size limits.
See [Full CRDs](https://argo-workflows.readthedocs.io/en/latest/installation/#full-crds) for more details.
