From 95f6f33d623a12f2ac025a80828bc6e5af7837a0 Mon Sep 17 00:00:00 2001 From: Xun Jiang Date: Tue, 10 Sep 2024 16:31:48 +0800 Subject: [PATCH] Add the ConfigMap-specified parameters into `velero install` CLI. Rename backup-repository-config to backup-repository-configmap. Rename repo-maintenance-job-config to repo-maintenance-job-configmap. Rename node-agent-config to node-agent-configmap. Add those three parameters to `velero install` CLI. Modify the design and the site documents. Signed-off-by: Xun Jiang --- design/Implemented/node-agent-affinity.md | 16 +++---- design/Implemented/node-agent-concurrency.md | 14 +++--- design/backup-pvc-config.md | 16 +++---- design/backup-repo-config.md | 2 +- design/repo_maintenance_job_config.md | 46 ++++++++++++------- .../vgdp-micro-service/vgdp-micro-service.md | 6 +-- pkg/cmd/cli/install/install.go | 24 ++++++++++ pkg/cmd/cli/nodeagent/server.go | 2 +- pkg/cmd/server/config/config.go | 6 +-- pkg/install/daemonset.go | 4 ++ pkg/install/daemonset_test.go | 4 ++ pkg/install/deployment.go | 28 +++++++++++ pkg/install/deployment_test.go | 8 ++++ pkg/install/resources.go | 15 ++++++ .../main/backup-repository-configuration.md | 6 ++- .../data-movement-backup-node-selection.md | 43 +++++++++-------- .../data-movement-backup-pvc-configuration.md | 9 ++-- .../docs/main/node-agent-concurrency.md | 43 ++++++++--------- .../docs/main/repository-maintenance.md | 9 ++-- 19 files changed, 205 insertions(+), 96 deletions(-) diff --git a/design/Implemented/node-agent-affinity.md b/design/Implemented/node-agent-affinity.md index d95312c7ab..604ac7f791 100644 --- a/design/Implemented/node-agent-affinity.md +++ b/design/Implemented/node-agent-affinity.md @@ -31,13 +31,13 @@ Therefore, in order to improve the compatibility, it is worthy to configure the ## Solution -We will use the ```node-agent-config``` configMap to host the node affinity configurations. -This configMap is not created by Velero, users should create it manually on demand. 
The configMap should be in the same namespace where Velero is installed. If multiple Velero instances are installed in different namespaces, there should be one configMap in each namespace which applies to node-agent in that namespace only. -Node-agent server checks these configurations at startup time and use it to initiate the related VGDP modules. Therefore, users could edit this configMap any time, but in order to make the changes effective, node-agent server needs to be restarted. -Inside ```node-agent-config``` configMap we will add one new kind of configuration as the data in the configMap, the name is ```loadAffinity```. +We will use the ```node-agent-configmap``` ConfigMap to host the node affinity configurations. +This ConfigMap is not created by Velero; users should create it manually on demand. The ConfigMap should be in the same namespace where Velero is installed. If multiple Velero instances are installed in different namespaces, there should be one ConfigMap in each namespace which applies to node-agent in that namespace only. +Node-agent server checks these configurations at startup time and uses them to initiate the related VGDP modules. Therefore, users could edit this ConfigMap at any time, but in order to make the changes effective, the node-agent server needs to be restarted. +Inside the ```node-agent-configmap``` ConfigMap we will add one new kind of configuration as the data in the ConfigMap; its name is ```loadAffinity```. Users may want to set different LoadAffinity configurations according to different conditions (i.e., for different storages represented by StorageClass, CSI driver, etc.), so we define ```loadAffinity``` as an array. This is for extensibility consideration, at present, we don't implement multiple configurations support, so if there are multiple configurations, we always take the first one in the array.
-The data structure for ```node-agent-config``` is as below: +The data structure for ```node-agent-configmap``` is as below: ```go type Configs struct { // LoadConcurrency is the config for load concurrency per node. @@ -63,7 +63,7 @@ Anti-affinity configuration means preventing VGDP instances running in the nodes - It could be defined by `MatchExpressions` of `metav1.LabelSelector`. The labels are defined in `Key` and `Values` of `MatchExpressions` and the `Operator` should be defined as `LabelSelectorOpNotIn` or `LabelSelectorOpDoesNotExist`. ### Sample -A sample of the ```node-agent-config``` configMap is as below: +A sample of the ```node-agent-configmap``` ConfigMap is as below: ```json { "loadAffinity": [ @@ -99,9 +99,9 @@ This sample showcases two affinity configurations: This sample showcases one anti-affinity configuration: - matchExpressions: VGDP instances will not run in nodes with label key `xxx/critial-workload` -To create the configMap, users need to save something like the above sample to a json file and then run below command: +To create the ConfigMap, users need to save something like the above sample to a json file and then run below command: ``` -kubectl create cm node-agent-config -n velero --from-file= +kubectl create cm node-agent-configmap -n velero --from-file= ``` ### Implementation diff --git a/design/Implemented/node-agent-concurrency.md b/design/Implemented/node-agent-concurrency.md index 3efd8bd369..6c1c681f7f 100644 --- a/design/Implemented/node-agent-concurrency.md +++ b/design/Implemented/node-agent-concurrency.md @@ -26,11 +26,11 @@ Therefore, in order to gain the optimized performance with the limited resources ## Solution -We introduce a configMap named ```node-agent-config``` for users to specify the node-agent related configurations. This configMap is not created by Velero, users should create it manually on demand. The configMap should be in the same namespace where Velero is installed. 
If multiple Velero instances are installed in different namespaces, there should be one configMap in each namespace which applies to node-agent in that namespace only. -Node-agent server checks these configurations at startup time and use it to initiate the related VGDP modules. Therefore, users could edit this configMap any time, but in order to make the changes effective, node-agent server needs to be restarted. -The ```node-agent-config``` configMap may be used for other purpose of configuring node-agent in future, at present, there is only one kind of configuration as the data in the configMap, the name is ```loadConcurrency```. +We introduce a ConfigMap named ```node-agent-configmap``` for users to specify the node-agent related configurations. This ConfigMap is not created by Velero; users should create it manually on demand. The ConfigMap should be in the same namespace where Velero is installed. If multiple Velero instances are installed in different namespaces, there should be one ConfigMap in each namespace which applies to node-agent in that namespace only. +Node-agent server checks these configurations at startup time and uses them to initiate the related VGDP modules. Therefore, users could edit this ConfigMap at any time, but in order to make the changes effective, the node-agent server needs to be restarted. +The ```node-agent-configmap``` ConfigMap may be used for other purposes of configuring node-agent in the future; at present, there is only one kind of configuration as the data in the ConfigMap, and its name is ```loadConcurrency```. -The data structure for ```node-agent-config``` is as below: +The data structure for ```node-agent-configmap``` is as below: ```go type Configs struct { // LoadConcurrency is the config for load concurrency per node.
@@ -82,7 +82,7 @@ At least one node is expected to have a label with the specified ```RuledConfigs If one node falls into more than one rules, e.g., if node1 also has the label ```beta.kubernetes.io/instance-type=Standard_B4ms```, the smallest number (3) will be used. ### Sample -A sample of the ```node-agent-config``` configMap is as below: +A sample of the ```node-agent-configmap``` ConfigMap is as below: ```json { "loadConcurrency": { @@ -108,9 +108,9 @@ A sample of the ```node-agent-config``` configMap is as below: } } ``` -To create the configMap, users need to save something like the above sample to a json file and then run below command: +To create the ConfigMap, users need to save something like the above sample to a json file and then run below command: ``` -kubectl create cm node-agent-config -n velero --from-file= +kubectl create cm node-agent-configmap -n velero --from-file= ``` ### Global data path manager diff --git a/design/backup-pvc-config.md b/design/backup-pvc-config.md index 7f37b15059..cf0f64373c 100644 --- a/design/backup-pvc-config.md +++ b/design/backup-pvc-config.md @@ -27,13 +27,13 @@ In some scenarios, users may need to configure some advanced settings of the bac ## Solution -We will use the ```node-agent-config``` configMap to host the backupPVC configurations. -This configMap is not created by Velero, users should create it manually on demand. The configMap should be in the same namespace where Velero is installed. If multiple Velero instances are installed in different namespaces, there should be one configMap in each namespace which applies to node-agent in that namespace only. -Node-agent server checks these configurations at startup time and use it to initiate the related Exposer modules. Therefore, users could edit this configMap any time, but in order to make the changes effective, node-agent server needs to be restarted. 
-Inside ```node-agent-config``` configMap we will add one new kind of configuration as the data in the configMap, the name is ```backupPVC```. +We will use the ```node-agent-configmap``` ConfigMap to host the backupPVC configurations. +This ConfigMap is not created by Velero; users should create it manually on demand. The ConfigMap should be in the same namespace where Velero is installed. If multiple Velero instances are installed in different namespaces, there should be one ConfigMap in each namespace which applies to node-agent in that namespace only. +Node-agent server checks these configurations at startup time and uses them to initiate the related Exposer modules. Therefore, users could edit this ConfigMap at any time, but in order to make the changes effective, the node-agent server needs to be restarted. +Inside the ```node-agent-configmap``` ConfigMap we will add one new kind of configuration as the data in the ConfigMap; its name is ```backupPVC```. Users may want to set different backupPVC configurations for different volumes, therefore, we define the configurations as a map and allow users to specific configurations by storage class. Specifically, the key of the map element is the storage class name used by the sourcePVC and the value is the set of configurations for the backupPVC created for the sourcePVC. -The data structure for ```node-agent-config``` is as below: +The data structure for ```node-agent-configmap``` is as below: ```go type Configs struct { // LoadConcurrency is the config for data path load concurrency per node.
@@ -56,7 +56,7 @@ type BackupPVC struct { ``` ### Sample -A sample of the ```node-agent-config``` configMap is as below: +A sample of the ```node-agent-configmap``` ConfigMap is as below: ```json { "backupPVC": { @@ -74,9 +74,9 @@ A sample of the ```node-agent-config``` configMap is as below: } } ``` -To create the configMap, users need to save something like the above sample to a json file and then run below command: +To create the ConfigMap, users need to save something like the above sample to a json file and then run below command: ``` -kubectl create cm node-agent-config -n velero --from-file= +kubectl create cm node-agent-configmap -n velero --from-file= ``` ### Implementation diff --git a/design/backup-repo-config.md b/design/backup-repo-config.md index f6a25e8ec3..0fbf529d18 100644 --- a/design/backup-repo-config.md +++ b/design/backup-repo-config.md @@ -79,7 +79,7 @@ Therefore, a BackupRepository configMap is introduced as a template of the confi When the backup repository CR is created by the BackupRepository controller, the configurations in the configMap are copied to the ```repositoryConfig``` field. For an existing BackupRepository CR, the configMap is never visited, if users want to modify the configuration value, they should directly edit the BackupRepository CR. -The BackupRepository configMap is created by users in velero installation namespace. The configMap name must be specified in the velero server parameter ```--backup-repository-config```, otherwise, it won't effect. +The BackupRepository configMap is created by users in the Velero installation namespace. The configMap name must be specified in the velero server parameter ```--backup-repository-configmap```; otherwise, it won't take effect. If the configMap name is specified but the configMap doesn't exist by the time of a backup repository is created, the configMap name is ignored.
For any reason, if the configMap doesn't effect, nothing is specified to the backup repository CR, so the Unified Repo modules use the hard-coded values to configure the backup repository. diff --git a/design/repo_maintenance_job_config.md b/design/repo_maintenance_job_config.md index 07a7a21339..aa909c3a4b 100644 --- a/design/repo_maintenance_job_config.md +++ b/design/repo_maintenance_job_config.md @@ -27,9 +27,9 @@ This design reuses the data structure introduced by design [node-agent affinity ## Compatibility v1.14 uses the `velero server` CLI's parameter to pass the repository maintenance job configuration. In v1.15, those parameters are still kept, including `--maintenance-job-cpu-request`, `--maintenance-job-mem-request`, `--maintenance-job-cpu-limit`, `--maintenance-job-mem-limit`, and `--keep-latest-maintenance-jobs`. -But the parameters read from the ConfigMap specified by `velero server` CLI parameter `--repo-maintenance-job-config` introduced by this design have a higher priority. +But the parameters read from the ConfigMap specified by the `velero server` CLI parameter `--repo-maintenance-job-configmap` introduced by this design have a higher priority. -If there `--repo-maintenance-job-config` is not specified, then the `velero server` parameters are used if provided. +If `--repo-maintenance-job-configmap` is not specified, then the `velero server` parameters are used if provided. If the `velero server` parameters are not specified too, then the default values are used. * `--keep-latest-maintenance-jobs` default value is 3. @@ -41,19 +41,19 @@ If the `velero server` parameters are not specified too, then the default values ## Deprecation Propose to deprecate the `velero server` parameters `--maintenance-job-cpu-request`, `--maintenance-job-mem-request`, `--maintenance-job-cpu-limit`, `--maintenance-job-mem-limit`, and `--keep-latest-maintenance-jobs` in release-1.15. That means those parameters will be deleted in release-1.17.
-After deletion, those resources-related parameters are replaced by the ConfigMap specified by `velero server` CLI's parameter `--repo-maintenance-job-config`. +After deletion, those resources-related parameters are replaced by the ConfigMap specified by `velero server` CLI's parameter `--repo-maintenance-job-configmap`. `--keep-latest-maintenance-jobs` is deleted from `velero server` CLI. It turns into a non-configurable internal parameter, and its value is 3. Please check [issue 7923](https://github.com/vmware-tanzu/velero/issues/7923) for more information why deleting this parameter. ## Design -This design introduces a new ConfigMap specified by `velero server` CLI parameter `--repo-maintenance-job-config` as the source of the repository maintenance job configuration. The specified ConfigMap is read from the namespace where Velero is installed. +This design introduces a new ConfigMap specified by `velero server` CLI parameter `--repo-maintenance-job-configmap` as the source of the repository maintenance job configuration. The specified ConfigMap is read from the namespace where Velero is installed. If the ConfigMap doesn't exist, the internal default values are used. -Example of using the parameter `--repo-maintenance-job-config`: +Example of using the parameter `--repo-maintenance-job-configmap`: ``` velero server \ ... - --repo-maintenance-job-config repo-job-config + --repo-maintenance-job-configmap repo-job-config ... ``` @@ -62,7 +62,7 @@ velero server \ * Velero reads this ConfigMap content at starting a new repository maintenance job, so the ConfigMap change will not take affect until the next created job. ### Structure -The data structure for ```repo-maintenance-job-config``` is as below: +The data structure for ```repo-maintenance-job-configmap``` is as below: ```go type Configs struct { // LoadAffinity is the config for data path load affinity. 
@@ -124,7 +124,7 @@ For example, the user want to let the nodes is in a specified machine type or th This can be done by adding multiple entries in the `LoadAffinity` array. ### Affinity Example -A sample of the ```repo-maintenance-job-config``` ConfigMap is as below: +A sample of the ```repo-maintenance-job-configmap``` ConfigMap is as below: ``` bash cat < repo-maintenance-job-config.json { @@ -277,17 +277,29 @@ config := Configs { ### Implementation During the Velero repository controller starts to maintain a repository, it will call the repository manager's `PruneRepo` function to build the maintenance Job. -The ConfigMap specified by `velero server` CLI parameter `--repo-maintenance-job-config` is get to reinitialize the repository `MaintenanceConfig` setting. +The ConfigMap specified by the `velero server` CLI parameter `--repo-maintenance-job-configmap` is read to reinitialize the repository `MaintenanceConfig` setting. ``` go - config, err := GetConfigs(context.Background(), namespace, crClient) - if err == nil { - if len(config.LoadAffinity) > 0 { - mgr.maintenanceCfg.Affinity = toSystemAffinity((*nodeagent.LoadAffinity)(config.LoadAffinity[0])) - } - ...... - } else { - log.Info("Cannot find the repo-maintenance-job-config ConfigMap: %s", err.Error()) + jobConfig, err := getMaintenanceJobConfig( + context.Background(), + m.client, + m.log, + m.namespace, + m.repoMaintenanceJobConfig, + repo, + ) + if err != nil { + log.Infof("Cannot find the repo-maintenance-job-config ConfigMap: %s. 
Use default value.", err.Error()) + } + + log.Info("Start to maintenance repo") + + maintenanceJob, err := m.buildMaintenanceJob( + jobConfig, + param, + ) + if err != nil { + return errors.Wrap(err, "error to build maintenance job") } ``` diff --git a/design/vgdp-micro-service/vgdp-micro-service.md b/design/vgdp-micro-service/vgdp-micro-service.md index d40eda1e9b..7dd6203e0e 100644 --- a/design/vgdp-micro-service/vgdp-micro-service.md +++ b/design/vgdp-micro-service/vgdp-micro-service.md @@ -60,7 +60,7 @@ Below are the parameters of the commands: **log-format** and **log-level**: This is to control the behavior of log generation inside VGDP-MS. In order to have the same capability and permission with node-agent, below pod configurations are inherited from node-agent and set to backupPod/restorePod's spec: -- Volumes: Some configMaps will be mapped as volumes to node-agent, so we add the same volumes of node-agent to the backupPod/restorePod +- Volumes: Some ConfigMaps will be mapped as volumes to node-agent, so we add the same volumes of node-agent to the backupPod/restorePod - Environment Variables - Security Contexts We may not actually need all the capabilities in the VGDP-MS as the node-agent. At present, we just duplicate all of them, if we find any problem in future, we can filter out the capabilities that are not required by VGDP-MS. @@ -178,7 +178,7 @@ This log redirecting mechanism is thread safe since the hook acquires the write ### Resource Control The CPU/memory resource of backupPod/restorePod is configurable, which means users are allowed to configure resources per volume backup/restore. -By default, the [Best Effort policy][5] is used, and users are allowed to change it through the ```node-agent-config``` configMap. Specifically, we add below structures to the configMap: +By default, the [Best Effort policy][5] is used, and users are allowed to change it through the ```node-agent-configmap``` ConfigMap. 
Specifically, we add below structures to the ConfigMap: ``` type Configs struct { // PodResources is the resource config for various types of pods launched by node-agent, i.e., data mover pods. @@ -194,7 +194,7 @@ type PodResources struct { ``` The string values must mactch Kubernetes Quantity expressions; for each resource, the "request" value must not be larger than the "limit" value. Otherwise, if any one of the values fail, all the resource configurations will be ignored. -The configurations are loaded by node-agent at start time, so users can change the values in the configMap any time, but the changes won't effect until node-agent restarts. +The configurations are loaded by node-agent at start time, so users can change the values in the ConfigMap at any time, but the changes won't take effect until node-agent restarts. ## node-agent diff --git a/pkg/cmd/cli/install/install.go b/pkg/cmd/cli/install/install.go index 04c8ae1764..5390a02848 100644 --- a/pkg/cmd/cli/install/install.go +++ b/pkg/cmd/cli/install/install.go @@ -86,6 +86,9 @@ type Options struct { ScheduleSkipImmediately bool PodResources kubeutil.PodResources KeepLatestMaintenanceJobs int + BackupRepoConfig string + RepoMaintenanceJobConfig string + NodeAgentConfigMap string } // BindFlags adds command line values to the options struct. @@ -161,6 +164,24 @@ func (o *Options) BindFlags(flags *pflag.FlagSet) { o.PodResources.MemoryLimit, "Memory limit for maintenance jobs. 
Default is no limit.", ) + flags.StringVar( + &o.BackupRepoConfig, + "backup-repository-configmap", + o.BackupRepoConfig, + "The name of ConfigMap containing backup repository configurations.", + ) + flags.StringVar( + &o.RepoMaintenanceJobConfig, + "repo-maintenance-job-configmap", + o.RepoMaintenanceJobConfig, + "The name of ConfigMap containing repository maintenance Job configurations.", + ) + flags.StringVar( + &o.NodeAgentConfigMap, + "node-agent-configmap", + o.NodeAgentConfigMap, + "The name of ConfigMap containing node-agent configurations.", + ) } // NewInstallOptions instantiates a new, default InstallOptions struct. @@ -259,6 +280,9 @@ func (o *Options) AsVeleroOptions() (*install.VeleroOptions, error) { ScheduleSkipImmediately: o.ScheduleSkipImmediately, PodResources: o.PodResources, KeepLatestMaintenanceJobs: o.KeepLatestMaintenanceJobs, + BackupRepoConfig: o.BackupRepoConfig, + RepoMaintenanceJobConfig: o.RepoMaintenanceJobConfig, + NodeAgentConfigMap: o.NodeAgentConfigMap, }, nil } diff --git a/pkg/cmd/cli/nodeagent/server.go b/pkg/cmd/cli/nodeagent/server.go index 73f0fa0d97..ba7c1610fb 100644 --- a/pkg/cmd/cli/nodeagent/server.go +++ b/pkg/cmd/cli/nodeagent/server.go @@ -125,7 +125,7 @@ func NewServerCommand(f client.Factory) *cobra.Command { command.Flags().DurationVar(&config.resourceTimeout, "resource-timeout", config.resourceTimeout, "How long to wait for resource processes which are not covered by other specific timeout parameters. Default is 10 minutes.") command.Flags().DurationVar(&config.dataMoverPrepareTimeout, "data-mover-prepare-timeout", config.dataMoverPrepareTimeout, "How long to wait for preparing a DataUpload/DataDownload. 
Default is 30 minutes.") command.Flags().StringVar(&config.metricsAddress, "metrics-address", config.metricsAddress, "The address to expose prometheus metrics") - command.Flags().StringVar(&config.nodeAgentConfig, "node-agent-config", config.nodeAgentConfig, "The name of configMap containing node-agent configurations.") + command.Flags().StringVar(&config.nodeAgentConfig, "node-agent-configmap", config.nodeAgentConfig, "The name of ConfigMap containing node-agent configurations.") return command } diff --git a/pkg/cmd/server/config/config.go b/pkg/cmd/server/config/config.go index d6cedbc5ce..032cfdd763 100644 --- a/pkg/cmd/server/config/config.go +++ b/pkg/cmd/server/config/config.go @@ -280,13 +280,13 @@ func (c *Config) BindFlags(flags *pflag.FlagSet) { ) flags.StringVar( &c.BackupRepoConfig, - "backup-repository-config", + "backup-repository-configmap", c.BackupRepoConfig, - "The name of configMap containing backup repository configurations.", + "The name of ConfigMap containing backup repository configurations.", ) flags.StringVar( &c.RepoMaintenanceJobConfig, - "repo-maintenance-job-config", + "repo-maintenance-job-configmap", c.RepoMaintenanceJobConfig, "The name of ConfigMap containing repository maintenance Job configurations.", ) diff --git a/pkg/install/daemonset.go b/pkg/install/daemonset.go index 17580f05d6..9cc3a814c1 100644 --- a/pkg/install/daemonset.go +++ b/pkg/install/daemonset.go @@ -50,6 +50,10 @@ func DaemonSet(namespace string, opts ...podTemplateOption) *appsv1.DaemonSet { daemonSetArgs = append(daemonSetArgs, fmt.Sprintf("--features=%s", strings.Join(c.features, ","))) } + if len(c.nodeAgentConfigMap) > 0 { + daemonSetArgs = append(daemonSetArgs, fmt.Sprintf("--node-agent-configmap=%s", c.nodeAgentConfigMap)) + } + userID := int64(0) mountPropagationMode := corev1.MountPropagationHostToContainer diff --git a/pkg/install/daemonset_test.go b/pkg/install/daemonset_test.go index a9cce29ea4..c181306e2c 100644 --- a/pkg/install/daemonset_test.go 
+++ b/pkg/install/daemonset_test.go @@ -41,6 +41,10 @@ func TestDaemonSet(t *testing.T) { assert.Len(t, ds.Spec.Template.Spec.Containers[0].Args, 3) assert.Equal(t, "--features=foo,bar,baz", ds.Spec.Template.Spec.Containers[0].Args[2]) + ds = DaemonSet("velero", WithNodeAgentConfigMap("node-agent-config-map")) + assert.Len(t, ds.Spec.Template.Spec.Containers[0].Args, 3) + assert.Equal(t, "--node-agent-configmap=node-agent-config-map", ds.Spec.Template.Spec.Containers[0].Args[2]) + ds = DaemonSet("velero", WithServiceAccountName("test-sa")) assert.Equal(t, "test-sa", ds.Spec.Template.Spec.ServiceAccountName) } diff --git a/pkg/install/deployment.go b/pkg/install/deployment.go index e4aac20eaa..713d6fdbde 100644 --- a/pkg/install/deployment.go +++ b/pkg/install/deployment.go @@ -54,6 +54,9 @@ type podTemplateConfig struct { scheduleSkipImmediately bool podResources kube.PodResources keepLatestMaintenanceJobs int + backupRepoConfig string + repoMaintenanceJobConfig string + nodeAgentConfigMap string } func WithImage(image string) podTemplateOption { @@ -174,6 +177,12 @@ func WithPrivilegedNodeAgent(b bool) podTemplateOption { } } +func WithNodeAgentConfigMap(nodeAgentConfigMap string) podTemplateOption { + return func(c *podTemplateConfig) { + c.nodeAgentConfigMap = nodeAgentConfigMap + } +} + func WithScheduleSkipImmediately(b bool) podTemplateOption { return func(c *podTemplateConfig) { c.scheduleSkipImmediately = b @@ -192,6 +201,17 @@ func WithKeepLatestMaintenanceJobs(keepLatestMaintenanceJobs int) podTemplateOpt } } +func WithBackupRepoConfig(backupRepoConfig string) podTemplateOption { + return func(c *podTemplateConfig) { + c.backupRepoConfig = backupRepoConfig + } +} +func WithRepoMaintenanceJobConfig(repoMaintenanceJobConfig string) podTemplateOption { + return func(c *podTemplateConfig) { + c.repoMaintenanceJobConfig = repoMaintenanceJobConfig + } +} + func Deployment(namespace string, opts ...podTemplateOption) *appsv1.Deployment { // TODO: Add support for 
server args c := &podTemplateConfig{ @@ -269,6 +289,14 @@ func Deployment(namespace string, opts ...podTemplateOption) *appsv1.Deployment args = append(args, fmt.Sprintf("--maintenance-job-mem-request=%s", c.podResources.MemoryRequest)) } + if len(c.backupRepoConfig) > 0 { + args = append(args, fmt.Sprintf("--backup-repository-configmap=%s", c.backupRepoConfig)) + } + + if len(c.repoMaintenanceJobConfig) > 0 { + args = append(args, fmt.Sprintf("--repo-maintenance-job-configmap=%s", c.repoMaintenanceJobConfig)) + } + deployment := &appsv1.Deployment{ ObjectMeta: objectMeta(namespace, "velero"), TypeMeta: metav1.TypeMeta{ diff --git a/pkg/install/deployment_test.go b/pkg/install/deployment_test.go index 598b585db3..f98ff0a81d 100644 --- a/pkg/install/deployment_test.go +++ b/pkg/install/deployment_test.go @@ -91,4 +91,12 @@ func TestDeployment(t *testing.T) { assert.Equal(t, "--maintenance-job-cpu-request=100m", deploy.Spec.Template.Spec.Containers[0].Args[2]) assert.Equal(t, "--maintenance-job-mem-limit=512Mi", deploy.Spec.Template.Spec.Containers[0].Args[3]) assert.Equal(t, "--maintenance-job-mem-request=256Mi", deploy.Spec.Template.Spec.Containers[0].Args[4]) + + deploy = Deployment("velero", WithBackupRepoConfig("test-backup-repo-config")) + assert.Len(t, deploy.Spec.Template.Spec.Containers[0].Args, 2) + assert.Equal(t, "--backup-repository-configmap=test-backup-repo-config", deploy.Spec.Template.Spec.Containers[0].Args[1]) + + deploy = Deployment("velero", WithRepoMaintenanceJobConfig("test-repo-maintenance-config")) + assert.Len(t, deploy.Spec.Template.Spec.Containers[0].Args, 2) + assert.Equal(t, "--repo-maintenance-job-configmap=test-repo-maintenance-config", deploy.Spec.Template.Spec.Containers[0].Args[1]) } diff --git a/pkg/install/resources.go b/pkg/install/resources.go index f65737f915..4a834b8d4b 100644 --- a/pkg/install/resources.go +++ b/pkg/install/resources.go @@ -264,6 +264,9 @@ type VeleroOptions struct { ScheduleSkipImmediately bool PodResources 
kube.PodResources KeepLatestMaintenanceJobs int + BackupRepoConfig string + RepoMaintenanceJobConfig string + NodeAgentConfigMap string } func AllCRDs() *unstructured.UnstructuredList { @@ -376,6 +379,14 @@ func AllResources(o *VeleroOptions) *unstructured.UnstructuredList { deployOpts = append(deployOpts, WithDisableInformerCache(true)) } + if len(o.BackupRepoConfig) > 0 { + deployOpts = append(deployOpts, WithBackupRepoConfig(o.BackupRepoConfig)) + } + + if len(o.RepoMaintenanceJobConfig) > 0 { + deployOpts = append(deployOpts, WithRepoMaintenanceJobConfig(o.RepoMaintenanceJobConfig)) + } + deploy := Deployment(o.Namespace, deployOpts...) if err := appendUnstructured(resources, deploy); err != nil { @@ -397,6 +408,10 @@ func AllResources(o *VeleroOptions) *unstructured.UnstructuredList { if o.PrivilegedNodeAgent { dsOpts = append(dsOpts, WithPrivilegedNodeAgent(true)) } + if len(o.NodeAgentConfigMap) > 0 { + dsOpts = append(dsOpts, WithNodeAgentConfigMap(o.NodeAgentConfigMap)) + } + ds := DaemonSet(o.Namespace, dsOpts...) if err := appendUnstructured(resources, ds); err != nil { fmt.Printf("error appending DaemonSet %s: %s\n", ds.GetName(), err.Error()) diff --git a/site/content/docs/main/backup-repository-configuration.md b/site/content/docs/main/backup-repository-configuration.md index 8d33b0176e..46301c54eb 100644 --- a/site/content/docs/main/backup-repository-configuration.md +++ b/site/content/docs/main/backup-repository-configuration.md @@ -8,7 +8,11 @@ Velero uses selectable backup repositories for various backup/restore methods, i Velero uses a BackupRepository CR to represent the instance of the backup repository. Now, a new field `repositoryConfig` is added to support various configurations to the underlying backup repository. Velero also allows you to specify configurations before the BackupRepository CR is created through a configMap. The configurations in the configMap will be copied to the BackupRepository CR when it is created at the due time. 
-The configMap should be in the same namespace where Velero is installed. If multiple Velero instances are installed in different namespaces, there should be one configMap in each namespace which applies to Velero instance in that namespace only. The name of the configMap should be specified in the Velero server parameter `--backup-repository-config`. +The configMap should be in the same namespace where Velero is installed. If multiple Velero instances are installed in different namespaces, there should be one configMap in each namespace which applies to the Velero instance in that namespace only. The name of the configMap should be specified in the Velero server parameter `--backup-repository-configmap`. + + +Users can specify the ConfigMap name during Velero installation via the CLI: +`velero install --backup-repository-configmap=` Conclusively, you have two ways to add/change/delete configurations of a backup repository: - If the BackupRepository CR for the backup repository is already there, you should modify the `repositoryConfig` field. The new changes will be applied to the backup repository at the due time, it doesn't require Velero server to restart. diff --git a/site/content/docs/main/data-movement-backup-node-selection.md b/site/content/docs/main/data-movement-backup-node-selection.md index d8bd8bbd79..18b42e3636 100644 --- a/site/content/docs/main/data-movement-backup-node-selection.md +++ b/site/content/docs/main/data-movement-backup-node-selection.md @@ -3,20 +3,23 @@ title: "Node Selection for Data Movement Backup" layout: docs --- -Velero node-agent is a daemonset hosting the data movement modules to complete the concrete work of backups/restores. -Varying from the data size, data complexity, resource availability, the data movement may take a long time and remarkable resources (CPU, memory, network bandwidth, etc.) during the backup and restore.
+Velero node-agent is a daemonset hosting the data movement modules to complete the concrete work of backups/restores.
+Depending on the data size, data complexity, and resource availability, the data movement may take a long time and considerable resources (CPU, memory, network bandwidth, etc.) during the backup and restore.
 
-Velero data movement backup supports to constrain the nodes where it runs. This is helpful in below scenarios:
-- Prevent the data movement backup from running in specific nodes because users have more critical workloads in the nodes
-- Constrain the data movement backup to run in specific nodes because these nodes have more resources than others
-- Constrain the data movement backup to run in specific nodes because the storage allows volume/snapshot provisions in these nodes only
+Velero data movement backup supports constraining the nodes where it runs. This is helpful in the following scenarios:
+- Prevent the data movement backup from running on specific nodes because users have more critical workloads on those nodes
+- Constrain the data movement backup to run on specific nodes because these nodes have more resources than others
+- Constrain the data movement backup to run on specific nodes because the storage allows volume/snapshot provisioning on these nodes only
 
-Velero introduces a new section in the node-agent configMap, called ```loadAffinity```, through which you can specify the nodes to/not to run data movement backups, in the affinity and anti-affinity flavors.
-If it is not there, a configMap should be created manually. The configMap should be in the same namespace where Velero is installed. If multiple Velero instances are installed in different namespaces, there should be one configMap in each namespace which applies to node-agent in that namespace only. The name of the configMap should be specified in the node-agent server parameter ```--node-agent-config```.
-Node-agent server checks these configurations at startup time.
Therefore, you could edit this configMap any time, but in order to make the changes effective, node-agent server needs to be restarted.
+Velero introduces a new section in the node-agent ConfigMap, called ```loadAffinity```, through which you can specify the nodes to run, or not to run, data movement backups, in the affinity and anti-affinity flavors.
+If it does not already exist, the ConfigMap should be created manually. The ConfigMap should be in the same namespace where Velero is installed. If multiple Velero instances are installed in different namespaces, there should be one ConfigMap in each namespace which applies to node-agent in that namespace only. The name of the ConfigMap should be specified in the node-agent server parameter ```--node-agent-configmap```.
+Node-agent server checks these configurations at startup time. Therefore, you could edit this ConfigMap any time, but in order to make the changes effective, node-agent server needs to be restarted.
+
+Users can also specify the ConfigMap name at installation time via the Velero CLI:
+`velero install --node-agent-configmap=`
 
 ### Sample
-Here is a sample of the configMap with ```loadAffinity```:
+Here is a sample of the ConfigMap with ```loadAffinity```:
 ```json
 {
     "loadAffinity": [
@@ -45,31 +48,31 @@ Here is a sample of the configMap with ```loadAffinity```:
     ]
 }
 ```
-To create the configMap, save something like the above sample to a json file and then run below command:
+To create the ConfigMap, save something like the above sample to a json file and then run the command below:
 ```
-kubectl create cm node-agent-config -n velero --from-file=
+kubectl create cm node-agent-configmap -n velero --from-file=
 ```
-To provide the configMap to node-agent, edit the node-agent daemonset and add the ```- --node-agent-config``` argument to the spec:
-1. Open the node-agent daemonset spec
+To provide the ConfigMap to node-agent, edit the node-agent daemonset and add the ```- --node-agent-configmap``` argument to the spec:
+1.
Open the node-agent daemonset spec ``` kubectl edit ds node-agent -n velero ``` -2. Add ```- --node-agent-config``` to ```spec.template.spec.containers``` +2. Add ```- --node-agent-configmap``` to ```spec.template.spec.containers``` ``` spec: template: spec: containers: - args: - - --node-agent-config= + - --node-agent-configmap= ``` ### Affinity Affinity configuration means allowing the data movement backup to run in the nodes specified. There are two ways to define it: -- It could be defined by `MatchLabels`. The labels defined in `MatchLabels` means a `LabelSelectorOpIn` operation by default, so in the current context, they will be treated as affinity rules. In the above sample, it defines to run data movement backups in nodes with label `beta.kubernetes.io/instance-type` of value `Standard_B4ms` (Run data movement backups in `Standard_B4ms` nodes only). -- It could be defined by `MatchExpressions`. The labels are defined in `Key` and `Values` of `MatchExpressions` and the `Operator` should be defined as `LabelSelectorOpIn` or `LabelSelectorOpExists`. In the above sample, it defines to run data movement backups in nodes with label `kubernetes.io/hostname` of values `node-1`, `node-2` and `node-3` (Run data movement backups in `node-1`, `node-2` and `node-3` only). +- It could be defined by `MatchLabels`. The labels defined in `MatchLabels` means a `LabelSelectorOpIn` operation by default, so in the current context, they will be treated as affinity rules. In the above sample, it defines to run data movement backups in nodes with label `beta.kubernetes.io/instance-type` of value `Standard_B4ms` (Run data movement backups in `Standard_B4ms` nodes only). +- It could be defined by `MatchExpressions`. The labels are defined in `Key` and `Values` of `MatchExpressions` and the `Operator` should be defined as `LabelSelectorOpIn` or `LabelSelectorOpExists`. 
In the above sample, it defines to run data movement backups in nodes with label `kubernetes.io/hostname` of values `node-1`, `node-2` and `node-3` (Run data movement backups in `node-1`, `node-2` and `node-3` only). ### Anti-affinity -Anti-affinity configuration means preventing the data movement backup from running in the nodes specified. Below is the way to define it: -- It could be defined by `MatchExpressions`. The labels are defined in `Key` and `Values` of `MatchExpressions` and the `Operator` should be defined as `LabelSelectorOpNotIn` or `LabelSelectorOpDoesNotExist`. In the above sample, it disallows data movement backups to run in nodes with label `xxx/critial-workload`. \ No newline at end of file +Anti-affinity configuration means preventing the data movement backup from running in the nodes specified. Below is the way to define it: +- It could be defined by `MatchExpressions`. The labels are defined in `Key` and `Values` of `MatchExpressions` and the `Operator` should be defined as `LabelSelectorOpNotIn` or `LabelSelectorOpDoesNotExist`. In the above sample, it disallows data movement backups to run in nodes with label `xxx/critial-workload`. \ No newline at end of file diff --git a/site/content/docs/main/data-movement-backup-pvc-configuration.md b/site/content/docs/main/data-movement-backup-pvc-configuration.md index 61bbf53e14..ee4ebe1b3b 100644 --- a/site/content/docs/main/data-movement-backup-pvc-configuration.md +++ b/site/content/docs/main/data-movement-backup-pvc-configuration.md @@ -16,7 +16,7 @@ operation could perform better. Specifically: However, it doesn't make any sense to keep replicas when an intermediate volume used by the backup. Therefore, users should be allowed to configure another storage class specifically used by the `backupPVC`. 
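The storage-class override described above can be sketched concretely. In the backupPVC configuration the map is keyed by the source PVC's storage class; the class names `source-sc` and `single-replica-sc` below are placeholders, and the ConfigMap/file names are illustrative only:

```shell
# Map volumes whose source storage class is "source-sc" onto a cheaper
# single-replica class for the intermediate backupPVC (names are placeholders).
cat > node-agent-config.json <<'EOF'
{
  "backupPVC": {
    "source-sc": {
      "storageClass": "single-replica-sc",
      "readOnly": true
    }
  }
}
EOF

# Create the ConfigMap next to the Velero install; its name is then handed
# to node-agent via --node-agent-configmap, or at install time with
# velero install --node-agent-configmap=node-agent-config
kubectl create configmap node-agent-config -n velero --from-file=node-agent-config.json
```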
-Velero introduces a new section in the node agent configuration configMap (the name of this configMap is passed using `--node-agent-config` velero server argument)
+Velero introduces a new section in the node agent configuration ConfigMap (the name of this ConfigMap is passed using the `--node-agent-configmap` velero server argument)
 called `backupPVC`, through which you can specify the following configurations:
@@ -26,7 +26,10 @@ default the source PVC's storage class will be used.
 - `readOnly`: This is a boolean value. If set to `true` then `ReadOnlyMany` will be the only value set to the backupPVC's access modes. Otherwise `ReadWriteOnce`
 value will be used.
 
-A sample of `backupPVC` config as part of the configMap would look like:
+Users can also specify the ConfigMap name at installation time via the Velero CLI:
+`velero install --node-agent-configmap=`
+
+A sample of `backupPVC` config as part of the ConfigMap would look like:
 ```json
 {
     "backupPVC": {
@@ -47,7 +50,7 @@ A sample of `backupPVC` config as part of the configMap would look like:
 **Note:**
 - Users should make sure that the storage class specified in `backupPVC` config should exist in the cluster and can be used by the
 `backupPVC`, otherwise the corresponding DataUpload CR will stay in `Accepted` phase until timeout (data movement prepare timeout value is 30m by default).
-- If the users are setting `readOnly` value as `true` in the `backupPVC` config then they must also make sure that the storage class that is being used for
+- If users set the `readOnly` value to `true` in the `backupPVC` config, then they must also make sure that the storage class being used for
- If any of the above problems occur, then the DataUpload CR is `canceled` after timeout, and the backupPod and backupPVC will be deleted, and the backup diff --git a/site/content/docs/main/node-agent-concurrency.md b/site/content/docs/main/node-agent-concurrency.md index b4c6554b5f..4e7e476952 100644 --- a/site/content/docs/main/node-agent-concurrency.md +++ b/site/content/docs/main/node-agent-concurrency.md @@ -4,35 +4,38 @@ layout: docs --- Velero node-agent is a daemonset hosting modules to complete the concrete tasks of backups/restores, i.e., file system backup/restore, CSI snapshot data movement. -Varying from the data size, data complexity, resource availability, the tasks may take a long time and remarkable resources (CPU, memory, network bandwidth, etc.). These tasks make the loads of node-agent. +Varying from the data size, data complexity, resource availability, the tasks may take a long time and remarkable resources (CPU, memory, network bandwidth, etc.). These tasks make the loads of node-agent. -Node-agent concurrency configurations allow you to configure the concurrent number of node-agent loads per node. When the resources are sufficient in nodes, you can set a large concurrent number, so as to reduce the backup/restore time; otherwise, the concurrency should be reduced, otherwise, the backup/restore may encounter problems, i.e., time lagging, hang or OOM kill. +Node-agent concurrency configurations allow you to configure the concurrent number of node-agent loads per node. When the resources are sufficient in nodes, you can set a large concurrent number, so as to reduce the backup/restore time; otherwise, the concurrency should be reduced, otherwise, the backup/restore may encounter problems, i.e., time lagging, hang or OOM kill. -To set Node-agent concurrency configurations, a configMap should be created manually. The configMap should be in the same namespace where Velero is installed. 
If multiple Velero instances are installed in different namespaces, there should be one configMap in each namespace which applies to node-agent in that namespace only. The name of the configMap should be specified in the node-agent server parameter ```--node-agent-config```.
-Node-agent server checks these configurations at startup time. Therefore, you could edit this configMap any time, but in order to make the changes effective, node-agent server needs to be restarted.
+To set Node-agent concurrency configurations, a ConfigMap should be created manually. The ConfigMap should be in the same namespace where Velero is installed. If multiple Velero instances are installed in different namespaces, there should be one ConfigMap in each namespace which applies to node-agent in that namespace only. The name of the ConfigMap should be specified in the node-agent server parameter ```--node-agent-configmap```.
+Node-agent server checks these configurations at startup time. Therefore, you could edit this ConfigMap any time, but in order to make the changes effective, node-agent server needs to be restarted.
+
+Users can also specify the ConfigMap name at installation time via the Velero CLI:
+`velero install --node-agent-configmap=`
 
 ### Global concurrent number
-You can specify a concurrent number that will be applied to all nodes if the per-node number is not specified. This number is set through ```globalConfig``` field in ```loadConcurrency```.
-The number starts from 1 which means there is no concurrency, only one load is allowed. There is no roof limit. If this number is not specified or not valid, a hard-coded default value will be used, the value is set to 1.
+You can specify a concurrent number that will be applied to all nodes if the per-node number is not specified. This number is set through the ```globalConfig``` field in ```loadConcurrency```.
+The number starts from 1, which means there is no concurrency and only one load is allowed. There is no upper limit.
If this number is not specified or not valid, a hard-coded default value will be used, the value is set to 1. ### Per-node concurrent number You can specify different concurrent number per node, for example, you can set 3 concurrent instances in Node-1, 2 instances in Node-2 and 1 instance in Node-3. -The range of Per-node concurrent number is the same with Global concurrent number. Per-node concurrent number is preferable to Global concurrent number, so it will overwrite the Global concurrent number for that node. +The range of Per-node concurrent number is the same with Global concurrent number. Per-node concurrent number is preferable to Global concurrent number, so it will overwrite the Global concurrent number for that node. -Per-node concurrent number is implemented through ```perNodeConfig``` field in ```loadConcurrency```. +Per-node concurrent number is implemented through ```perNodeConfig``` field in ```loadConcurrency```. ```perNodeConfig``` is a list of ```RuledConfigs``` each item of which matches one or more nodes by label selectors and specify the concurrent number for the matched nodes. Here is an example of the ```perNodeConfig``: ``` "nodeSelector: kubernetes.io/hostname=node1; number: 3" "nodeSelector: beta.kubernetes.io/instance-type=Standard_B4ms; number: 5" ``` -The first element means the node with host name ```node1``` gets the Per-node concurrent number of 3. -The second element means all the nodes with label ```beta.kubernetes.io/instance-type``` of value ```Standard_B4ms``` get the Per-node concurrent number of 5. -At least one node is expected to have a label with the specified ```RuledConfigs``` element (rule). If no node is with this label, the Per-node rule makes no effect. -If one node falls into more than one rules, e.g., if node1 also has the label ```beta.kubernetes.io/instance-type=Standard_B4ms```, the smallest number (3) will be used. 
+The first element means the node with hostname ```node1``` gets the Per-node concurrent number of 3.
+The second element means all the nodes with label ```beta.kubernetes.io/instance-type``` of value ```Standard_B4ms``` get the Per-node concurrent number of 5.
+At least one node is expected to have a label matching the specified ```RuledConfigs``` element (rule). If no node has this label, the Per-node rule has no effect.
+If one node matches more than one rule, e.g., if node1 also has the label ```beta.kubernetes.io/instance-type=Standard_B4ms```, the smallest number (3) will be used.
 
 ### Sample
-A sample of the complete configMap is as below:
+A sample of the complete ConfigMap is as below:
 ```json
 {
     "loadConcurrency": {
@@ -58,23 +61,21 @@ A sample of the complete configMap is as below:
     }
 }
 ```
-To create the configMap, save something like the above sample to a json file and then run below command:
+To create the ConfigMap, save something like the above sample to a json file and then run the command below:
 ```
-kubectl create cm node-agent-config -n velero --from-file=
+kubectl create cm node-agent-configmap -n velero --from-file=
 ```
-To provide the configMap to node-agent, edit the node-agent daemonset and add the ```- --node-agent-config``` argument to the spec:
-1. Open the node-agent daemonset spec
+To provide the ConfigMap to node-agent, edit the node-agent daemonset and add the ```- --node-agent-configmap``` argument to the spec:
+1. Open the node-agent daemonset spec
 ```
 kubectl edit ds node-agent -n velero
 ```
-2. Add ```- --node-agent-config``` to ```spec.template.spec.containers```
+2.
Add ```- --node-agent-configmap``` to ```spec.template.spec.containers```
 ```
 spec:
   template:
     spec:
       containers:
       - args:
-        - --node-agent-config=
+        - --node-agent-configmap=
 ```
-
-
diff --git a/site/content/docs/main/repository-maintenance.md b/site/content/docs/main/repository-maintenance.md
index 5e0c4bac4e..8c712a9d7c 100644
--- a/site/content/docs/main/repository-maintenance.md
+++ b/site/content/docs/main/repository-maintenance.md
@@ -9,11 +9,14 @@ Before v1.14.0, Velero performs periodic maintenance on the repository within Ve
 For repository maintenance jobs, there's no limit on resources by default. You could configure the job resource limitation based on target data to be backed up.
 
-From v1.15 and on, Velero introduces a new ConfigMap, specified by `velero server --repo-maintenance-job-config` parameter, to set repository maintenance Job configuration, including Node Affinity and resources. The old `velero server` parameters ( `--maintenance-job-cpu-request`, `--maintenance-job-mem-request`, `--maintenance-job-cpu-limit`, `--maintenance-job-mem-limit`, and `--keep-latest-maintenance-jobs`) introduced in v1.14 are deprecated, and will be deleted in v1.17.
+From v1.15 and on, Velero introduces a new ConfigMap, specified by the `velero server --repo-maintenance-job-configmap` parameter, to set repository maintenance Job configuration, including Node Affinity and resources. The old `velero server` parameters (`--maintenance-job-cpu-request`, `--maintenance-job-mem-request`, `--maintenance-job-cpu-limit`, `--maintenance-job-mem-limit`, and `--keep-latest-maintenance-jobs`) introduced in v1.14 are deprecated, and will be deleted in v1.17.
+
+Users can also specify the ConfigMap name at installation time via the Velero CLI:
+`velero install --repo-maintenance-job-configmap=`
 
 ## Settings
 ### Resource Limitation and Node Affinity
-Those are specified by the ConfigMap specified by `velero server --repo-maintenance-job-config` parameter.
+Those are set in the ConfigMap specified by the `velero server --repo-maintenance-job-configmap` parameter.
 
 This ConfigMap content is a Map.
 If there is a key value as `global` in the map, the key's value is applied to all BackupRepositories maintenance jobs that cannot find their own specific configuration in the ConfigMap.
@@ -55,7 +58,7 @@ It's possible that the users want to choose nodes that match condition A or cond
 For example, the user want to let the nodes is in a specified machine type or the nodes locate in the us-central1-x zones to run the job.
 This can be done by adding multiple entries in the `LoadAffinity` array.
 
-The sample of the ```repo-maintenance-job-config``` ConfigMap for the above scenario is as below:
+A sample of the ```repo-maintenance-job-configmap``` ConfigMap for the above scenario is shown below:
 ``` bash
cat < repo-maintenance-job-config.json
{