diff --git a/design/Implemented/node-agent-affinity.md b/design/Implemented/node-agent-affinity.md
index 8a2911b10a..d95312c7ab 100644
--- a/design/Implemented/node-agent-affinity.md
+++ b/design/Implemented/node-agent-affinity.md
@@ -26,8 +26,8 @@ Therefore, in order to improve the compatibility, it is worthy to configure the
 ## Non-Goals
 - It is also beneficial to support VGDP instances affinity for PodVolume backup/restore, however, it is not possible since VGDP instances for PodVolume backup/restore should always run in the node where the source/target pods are created.
-- It is also beneficial to support VGDP instances affinity for data movement restores, however, it is not possible in some cases. For example, when the `volumeBindingMode` in the storageclass is `WaitForFirstConsumer`, the restore volume must be mounted in the node where the target pod is scheduled, so the VGDP instance must run in the same node. On the other hand, considering the fact that restores may not frequently and centrally run, we will not support data movement restores.
-- As elaberated in the [Volume Snapshot Data Movement Design][2], the Exposer may take different ways to expose snapshots, i.e., through backup pods (this is the only way supported at present). The implementation section below only considers this approach currently, if a new expose method is introduced in future, the definition of the affinity configurations and behaviors should still work, but we may need a new implementation.
+- It is also beneficial to support VGDP instances affinity for data movement restores, however, it is not possible in some cases. For example, when the `volumeBindingMode` in the StorageClass is `WaitForFirstConsumer`, the restore volume must be mounted in the node where the target pod is scheduled, so the VGDP instance must run in the same node. On the other hand, considering the fact that restores may not frequently and centrally run, we will not support data movement restores.
+- As elaborated in the [Volume Snapshot Data Movement Design][2], the Exposer may take different ways to expose snapshots, i.e., through backup pods (this is the only way supported at present). The implementation section below only considers this approach currently, if a new expose method is introduced in future, the definition of the affinity configurations and behaviors should still work, but we may need a new implementation.
 
 ## Solution
diff --git a/design/repo_maintenance_job_config.md b/design/repo_maintenance_job_config.md
new file mode 100644
index 0000000000..90548a68fb
--- /dev/null
+++ b/design/repo_maintenance_job_config.md
@@ -0,0 +1,148 @@
+# Repository maintenance job configuration design
+
+## Abstract
+Add this design to let the repository maintenance job read its configuration from a dedicated ConfigMap and to make the Job's `PodSpec.Affinity` configurable.
+
+## Background
+Repository maintenance was split from the Velero server into a k8s Job in v1.14 by the [repository maintenance job](Implemented/repository-maintenance.md) design.
+The repository maintenance Job configuration was read from the Velero server CLI parameters.
+The Job inherits most of its PodSpec configuration from the Velero server Deployment's PodSpec.
+
+This design introduces a new way for users to customize the repository maintenance behavior instead of inheriting it from the Velero server Deployment.
+The configurations added in this design include the resource limitations and node selection.
+New configurations may be introduced later based on this design.
+
+For node selection, the repository maintenance Job previously also inherited from the Velero server Deployment, but the Job may last for a while and consume non-negligible resources, especially memory.
+Users need the ability to choose which k8s node runs the maintenance Job.
+This design reuses the data structure introduced by the [node-agent affinity configuration](Implemented/node-agent-affinity.md) design so that the repository maintenance Job can choose which node to run on.
+
+## Goals
+- Let users choose which nodes the repository maintenance Job runs on.
+- Unify the repository maintenance Job configuration in one place.
+- Deprecate the existing `velero server` parameters `--maintenance-job-cpu-request`, `--maintenance-job-mem-request`, `--maintenance-job-cpu-limit` and `--maintenance-job-mem-limit`.
+
+## Non Goals
+- There was an [issue](https://github.com/vmware-tanzu/velero/issues/7911) requesting that the whole Job's PodSpec be configurable. That is not in the scope of this design.
+- The introduced configuration is universal for all repositories. Configuration dedicated to each repository maintenance Job is not in this design's scope.
+- Please notice this new configuration is dedicated to repository maintenance. Configuration of the repository itself is not covered.
+
+## Alternatives Considered
+Another option is giving each BackupRepository its own repository maintenance job configuration.
+Each BackupRepository has its own storage capacity, so they may require different resources to maintain.
+This could be done by adding a field to the BackupRepository CRD to specify the referenced ConfigMap.
+After discussion, we think there is no need to make the new configuration per BackupRepository.
+The configuration should be specific to the repository maintenance job.
+
+## Compatibility
+v1.14 uses the `velero server` CLI parameters to pass the repository maintenance job configuration.
+If the ConfigMap labelled as `velero.io/config-map-type=repo-maintenance-job-config` doesn't exist, those parameters are still respected.
+If the ConfigMap labelled as `velero.io/config-map-type=repo-maintenance-job-config` exists, the parameters provided in the ConfigMap override the CLI parameters.
+The values not provided in the ConfigMap get the default values instead of being read from the `velero server` CLI parameters.
+
+## Design
+This design introduces a new ConfigMap labelled as `velero.io/config-map-type=repo-maintenance-job-config` as the source of the repository maintenance job configuration.
+If the `repo-maintenance-job-config` ConfigMap doesn't exist, the `velero server` parameters related to the repository maintenance job are used.
+
+**Notice**
+* Velero doesn't own this ConfigMap. If the user wants to customize the repository maintenance job, the user needs to create this ConfigMap.
+* Velero reads this ConfigMap's content when starting a new repository maintenance job, so a ConfigMap change does not take effect until the next job is created.
+
+### Structure
+The data structure for ```repo-maintenance-job-config``` is as below:
+```go
+type Configs struct {
+    // LoadAffinity is the config for data path load affinity.
+    LoadAffinity []*LoadAffinity `json:"loadAffinity,omitempty"`
+
+    // The repository maintenance job CPU request setting
+    CPURequest string `json:"cpuRequest,omitempty"`
+
+    // The repository maintenance job memory request setting
+    MemRequest string `json:"memRequest,omitempty"`
+
+    // The repository maintenance job CPU limit setting
+    CPULimit string `json:"cpuLimit,omitempty"`
+
+    // The repository maintenance job memory limit setting
+    MemLimit string `json:"memLimit,omitempty"`
+}
+
+type LoadAffinity struct {
+    // NodeSelector specifies the label selector to match nodes
+    NodeSelector metav1.LabelSelector `json:"nodeSelector"`
+}
+```
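+
+The four resource fields are plain strings in Kubernetes quantity format. For illustration only (this is a hedged sketch, not Velero's actual code, and the helper name `toResourceRequirements` is hypothetical), they could be converted into the maintenance Job container's `ResourceRequirements` roughly as below, leaving any empty value unset:
+```go
+package main
+
+import (
+    "fmt"
+
+    corev1 "k8s.io/api/core/v1"
+    "k8s.io/apimachinery/pkg/api/resource"
+)
+
+// toResourceRequirements builds the container resources from the ConfigMap
+// values, skipping any value the user did not provide.
+func toResourceRequirements(cpuRequest, memRequest, cpuLimit, memLimit string) (corev1.ResourceRequirements, error) {
+    res := corev1.ResourceRequirements{
+        Requests: corev1.ResourceList{},
+        Limits:   corev1.ResourceList{},
+    }
+
+    set := func(list corev1.ResourceList, name corev1.ResourceName, value string) error {
+        if value == "" {
+            return nil // not provided, keep the default behavior
+        }
+        quantity, err := resource.ParseQuantity(value)
+        if err != nil {
+            return fmt.Errorf("invalid quantity %q for %s: %w", value, name, err)
+        }
+        list[name] = quantity
+        return nil
+    }
+
+    if err := set(res.Requests, corev1.ResourceCPU, cpuRequest); err != nil {
+        return res, err
+    }
+    if err := set(res.Requests, corev1.ResourceMemory, memRequest); err != nil {
+        return res, err
+    }
+    if err := set(res.Limits, corev1.ResourceCPU, cpuLimit); err != nil {
+        return res, err
+    }
+    if err := set(res.Limits, corev1.ResourceMemory, memLimit); err != nil {
+        return res, err
+    }
+    return res, nil
+}
+
+func main() {
+    // Values taken from the sample ConfigMap in this design.
+    res, err := toResourceRequirements("100m", "100Gi", "200m", "200Gi")
+    if err != nil {
+        panic(err)
+    }
+    fmt.Println("cpu request:", res.Requests.Cpu().String())
+    fmt.Println("memory limit:", res.Limits.Memory().String())
+}
+```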
+
+The `LoadAffinity` structure is reused from the [node-agent affinity configuration](Implemented/node-agent-affinity.md) design.
+It's possible that users want the job to run on nodes that match condition A or condition B.
+For example, the user may want the job to run on nodes that have a GPU or on nodes located in the us-central1 zones.
+This can be done by adding multiple entries in the `LoadAffinity` array.
+
+### Sample
+A sample of the ```repo-maintenance-job-config``` ConfigMap is as below:
+``` bash
+cat <<EOF > repo-maintenance-job-config.json
+{
+    "cpuRequest": "100m",
+    "cpuLimit": "200m",
+    "memRequest": "100Gi",
+    "memLimit": "200Gi",
+    "loadAffinity": [
+        {
+            "nodeSelector": {
+                "matchExpressions": [
+                    {
+                        "key": "device-type",
+                        "operator": "In",
+                        "values": [
+                            "GPU"
+                        ]
+                    }
+                ]
+            }
+        },
+        {
+            "nodeSelector": {
+                "matchExpressions": [
+                    {
+                        "key": "topology.kubernetes.io/zone",
+                        "operator": "In",
+                        "values": [
+                            "us-central1-a",
+                            "us-central1-b",
+                            "us-central1-c"
+                        ]
+                    }
+                ]
+            }
+        }
+    ]
+}
+EOF
+```
+This sample showcases two affinity configurations:
+- matchExpressions: the maintenance job runs on nodes with label key `device-type` and value `GPU`.
+- matchExpressions: the maintenance job runs on nodes located in `us-central1-a`, `us-central1-b` and `us-central1-c`.
+
+To create the ConfigMap, users need to save something like the above sample to a JSON file and then run the commands below:
+``` bash
+kubectl create cm repo-maintenance-job-config -n velero --from-file=repo-maintenance-job-config.json
+kubectl -n velero label cm repo-maintenance-job-config velero.io/config-map-type=repo-maintenance-job-config
+```
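+
+Once the ConfigMap exists, Velero needs to locate it by its label and decode its JSON payload into the `Configs` structure from the Structure section. Below is a hedged sketch of how such a lookup could work; the exact signature, client usage and error handling are assumptions for illustration, not necessarily the final implementation:
+```go
+package repository
+
+import (
+    "context"
+    "encoding/json"
+    "fmt"
+
+    corev1 "k8s.io/api/core/v1"
+    "sigs.k8s.io/controller-runtime/pkg/client"
+)
+
+// GetConfigs looks up the ConfigMap labelled as
+// velero.io/config-map-type=repo-maintenance-job-config in the given namespace
+// and unmarshals its JSON payload into a Configs value.
+func GetConfigs(ctx context.Context, namespace string, crClient client.Client) (*Configs, error) {
+    cmList := &corev1.ConfigMapList{}
+    if err := crClient.List(ctx, cmList,
+        client.InNamespace(namespace),
+        client.MatchingLabels{"velero.io/config-map-type": "repo-maintenance-job-config"},
+    ); err != nil {
+        return nil, err
+    }
+    if len(cmList.Items) == 0 {
+        return nil, fmt.Errorf("no repo-maintenance-job-config ConfigMap found in namespace %s", namespace)
+    }
+
+    configs := &Configs{}
+    // The ConfigMap is created with --from-file, so the JSON document is stored
+    // under a key derived from the file name; decode whichever key is present.
+    for _, data := range cmList.Items[0].Data {
+        if err := json.Unmarshal([]byte(data), configs); err != nil {
+            return nil, err
+        }
+    }
+    return configs, nil
+}
+```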
+
+### Implementation
+During Velero server startup, the `velero server` CLI initializes the `Repository.Manager` through the call chain `run -> s.initRepoManager -> repository.NewManager`.
+In the function `manager.PruneRepo`, the ConfigMap labelled as `velero.io/config-map-type=repo-maintenance-job-config` is retrieved to reinitialize the repository `MaintenanceConfig` setting.
+
+``` go
+    config, err := GetConfigs(context.Background(), namespace, crClient)
+    if err == nil {
+        if len(config.LoadAffinity) > 0 {
+            mgr.maintenanceCfg.Affinity = toSystemAffinity((*nodeagent.LoadAffinity)(config.LoadAffinity[0]))
+        }
+        ......
+    } else {
+        log.Infof("Cannot find the repo-maintenance-job-config ConfigMap: %s", err.Error())
+    }
+```
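+
+`toSystemAffinity` is expected to translate the `LoadAffinity` node selector into the `corev1.Affinity` applied to the maintenance Job's PodSpec. A rough sketch of such a conversion is shown below; it is illustrative only, assumes the `LoadAffinity` type from the `nodeagent` package referenced in the snippet above, and the real implementation may differ in details:
+```go
+package repository
+
+import (
+    corev1 "k8s.io/api/core/v1"
+
+    "github.com/vmware-tanzu/velero/pkg/nodeagent"
+)
+
+// toSystemAffinity converts the ConfigMap's node selector into a node affinity
+// applied to the maintenance Job's pod.
+func toSystemAffinity(loadAffinity *nodeagent.LoadAffinity) *corev1.Affinity {
+    if loadAffinity == nil {
+        return nil
+    }
+
+    requirements := []corev1.NodeSelectorRequirement{}
+    // matchLabels entries become simple "In" requirements.
+    for key, value := range loadAffinity.NodeSelector.MatchLabels {
+        requirements = append(requirements, corev1.NodeSelectorRequirement{
+            Key:      key,
+            Operator: corev1.NodeSelectorOpIn,
+            Values:   []string{value},
+        })
+    }
+    // matchExpressions entries are carried over as-is.
+    for _, expression := range loadAffinity.NodeSelector.MatchExpressions {
+        requirements = append(requirements, corev1.NodeSelectorRequirement{
+            Key:      expression.Key,
+            Operator: corev1.NodeSelectorOperator(expression.Operator),
+            Values:   expression.Values,
+        })
+    }
+
+    return &corev1.Affinity{
+        NodeAffinity: &corev1.NodeAffinity{
+            RequiredDuringSchedulingIgnoredDuringExecution: &corev1.NodeSelector{
+                NodeSelectorTerms: []corev1.NodeSelectorTerm{
+                    {MatchExpressions: requirements},
+                },
+            },
+        },
+    }
+}
+```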