Unable to take CSI snapshot with Velero and having a hard time finding information about how to do it #8542

Open
pew-x2 opened this issue Dec 22, 2024 · 11 comments
Labels
Area/CSI · kind/question · Needs info

Comments

@pew-x2

pew-x2 commented Dec 22, 2024

What steps did you take and what happened:
I'm failing to understand how to take a snapshot of a PersistentVolume using CSI in Velero v1.15.

What did you expect to happen:
I expected that running a backup would produce a snapshot backing up the files of the persistent volume.

  • features: EnableCSI is set in values.yaml
  • settings for volumeSnapshotLocation in values.yaml look like this:
  volumeSnapshotLocation:
    # name is the name of the volume snapshot location where snapshots are being taken. Required.
  - name: volume-snapshot-location-1
    # provider is the name for the volume snapshot provider.
    provider: aws
    credential:
      # name of the secret used by this volumeSnapshotLocation.
      name: velero-secret
      # name of key that contains the secret data to be used.
      key: credentials
    # Additional provider-specific configuration. See link above
    # for details of required/optional fields for your provider.
    config: # {}
      region: None
  #    region:
  #    apiTimeout:
  #    resourceGroup:
  #    The ID of the subscription where volume snapshots should be stored, if different from the cluster's subscription. If specified, also requires `configuration.volumeSnapshotLocation.config.resourceGroup` to be set. (Azure only)
  #    subscriptionId:
  #    incremental:
  #    snapshotLocation:
  #    project:

The following information will help us better understand what's going on:

This is the log I get when I try to run the backup:

time="2024-12-22T14:44:37Z" level=info msg="Backing up item" backup=backup/test logSource="pkg/backup/item_backupper.go:184" name=db-backup-stage-pv namespace= resource=persistentvolumes
time="2024-12-22T14:44:37Z" level=info msg="Executing takePVSnapshot" backup=backup/test logSource="pkg/backup/item_backupper.go:549" name=db-backup-stage-pv namespace= resource=persistentvolumes
time="2024-12-22T14:44:38Z" level=info msg="performing snapshot action for pv %!s(MISSING) as the snapshotVolumes is not set to false" backup=backup/test logSource="internal/volumehelper/volume_policy_helper.go:131"
time="2024-12-22T14:44:38Z" level=info msg="label \"topology.kubernetes.io/zone\" is not present on PersistentVolume, checking deprecated label..." backup=backup/test logSource="pkg/backup/item_backupper.go:608" name=db-backup-stage-pv namespace= persistentVolume=db-backup-stage-pv resource=persistentvolumes
time="2024-12-22T14:44:38Z" level=info msg="label \"failure-domain.beta.kubernetes.io/zone\" is not present on PersistentVolume" backup=backup/test logSource="pkg/backup/item_backupper.go:612" name=db-backup-stage-pv namespace= persistentVolume=db-backup-stage-pv resource=persistentvolumes
time="2024-12-22T14:44:38Z" level=info msg="zone info not available in nodeAffinity requirements" backup=backup/test logSource="pkg/backup/item_backupper.go:617" name=db-backup-stage-pv namespace= persistentVolume=db-backup-stage-pv resource=persistentvolumes
time="2024-12-22T14:44:38Z" level=warning msg="No volume ID returned by volume snapshotter for persistent volume" backup=backup/test logSource="pkg/backup/item_backupper.go:641" name=db-backup-stage-pv namespace= persistentVolume=db-backup-stage-pv resource=persistentvolumes volumeSnapshotLocation=volume-snapshot-location-1
time="2024-12-22T14:44:38Z" level=info msg="Persistent volume is not a supported volume type for Velero-native volumeSnapshotter snapshot, skipping." backup=backup/test logSource="pkg/backup/item_backupper.go:653" name=db-backup-stage-pv namespace= persistentVolume=db-backup-stage-pv resource=persistentvolumes
time="2024-12-22T14:44:38Z" level=info msg="Backed up 1 items out of an estimated total of 1 (estimate will change throughout the backup)" backup=backup/test logSource="pkg/backup/backup.go:499" name=db-backup-stage-pv namespace= progress= resource=persistentvolumes
time="2024-12-22T14:44:38Z" level=info msg="Summary for skipped PVs: [{\"name\":\"db-backup-stage-pv\",\"reasons\":[{\"approach\":\"volumeSnapshot\",\"reason\":\"no applicable volumesnapshotter found\"}]}]" backup=backup/test logSource="pkg/backup/backup.go:542"
time="2024-12-22T14:44:38Z" level=info msg="Backed up a total of 1 items" backup=backup/test logSource="pkg/backup/backup.go:546" progress=

Here are the snapshot class and storage class:

---

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-hostpath-sc
provisioner: hostpath.csi.k8s.io
reclaimPolicy: Delete
volumeBindingMode: Immediate

---

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-hostpath-sc
  labels:
    velero.io/csi-volumesnapshot-class: "true"
driver: hostpath.csi.k8s.io
deletionPolicy: Delete

Anything else you would like to add:

Environment:

  • Velero version (use velero version): v1.15.0
  • Velero features (use velero client config get features): "" returned
  • Kubernetes version (use kubectl version): v1.30.5
  • Kubernetes installer & version: docker desktop
  • Cloud provider or hardware configuration: none
  • OS (e.g. from /etc/os-release): Mac OSX 14.6.1

Vote on this issue!

This is an invitation to the Velero community to vote on issues; you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

  • 👍 for "I would like to see this bug fixed as soon as possible"
  • 👎 for "There are more important bugs to focus on right now"
@reasonerjt
Contributor

I haven't tried the hostpath CSI driver. It seems the CSI plugin was skipped during your backup for some reason.
Please collect the debug bundle via velero debug --backup xxxx and attach it to this issue.

@reasonerjt reasonerjt added Needs info Waiting for information Area/CSI Related to Container Storage Interface support labels Dec 22, 2024
@pew-x2
Author

pew-x2 commented Dec 22, 2024

Here is output from velero debug --backup test -n backup

bundle-2024-12-22-20-31-30.tar.gz

@reasonerjt
Contributor

@pew-x2 Could you retry by installing velero and your workload in different namespaces?

@pew-x2
Author

pew-x2 commented Dec 23, 2024

I cleared Kubernetes and reinstalled, and I get the same error.
Attaching logs.

  1. Installed the CSI snapshot CRDs and the hostpath CSI driver:
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/release-8.2/client/config/crd/snapshot.storage.k8s.io_volumesnapshotclasses.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/release-8.2/client/config/crd/snapshot.storage.k8s.io_volumesnapshotcontents.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/release-8.2/client/config/crd/snapshot.storage.k8s.io_volumesnapshots.yaml

git clone https://github.com/kubernetes-csi/csi-driver-host-path.git
cd csi-driver-host-path
deploy/kubernetes-latest/deploy.sh
  2. Installed velero and seaweedfs in the velero namespace.
  3. Installed the workload (the persistent volume I'm trying to back up) in the dev namespace.
  4. Ran the backup with:
velero backup create test --selector type=db-backup --include-resources pvc,pv,pod -n velero

Output from

velero debug --backup test

bundle-2024-12-23-09-40-06.tar.gz

Are storage classes and snapshot classes namespaced, or are they global?

Here is the last part of the log

time="2024-12-23T08:32:59Z" level=info msg="Processing item" backup=velero/test logSource="pkg/backup/backup.go:439" name=db-backup-stage-pv namespace= progress= resource=persistentvolumes
time="2024-12-23T08:32:59Z" level=info msg="adding persistentvolumes /db-backup-stage-pv to ItemBlock" backup=velero/test logSource="pkg/backup/itemblock.go:63"
time="2024-12-23T08:32:59Z" level=info msg="Backing Up Item Block including persistentvolumes /db-backup-stage-pv (1 items in block)" backup=velero/test logSource="pkg/backup/backup.go:476"
time="2024-12-23T08:32:59Z" level=info msg="Backing up item" backup=velero/test logSource="pkg/backup/item_backupper.go:184" name=db-backup-stage-pv namespace= resource=persistentvolumes
time="2024-12-23T08:32:59Z" level=info msg="Executing takePVSnapshot" backup=velero/test logSource="pkg/backup/item_backupper.go:549" name=db-backup-stage-pv namespace= resource=persistentvolumes
time="2024-12-23T08:32:59Z" level=info msg="performing snapshot action for pv %!s(MISSING) as the snapshotVolumes is not set to false" backup=velero/test logSource="internal/volumehelper/volume_policy_helper.go:131"
time="2024-12-23T08:32:59Z" level=info msg="label \"topology.kubernetes.io/zone\" is not present on PersistentVolume, checking deprecated label..." backup=velero/test logSource="pkg/backup/item_backupper.go:608" name=db-backup-stage-pv namespace= persistentVolume=db-backup-stage-pv resource=persistentvolumes
time="2024-12-23T08:32:59Z" level=info msg="label \"failure-domain.beta.kubernetes.io/zone\" is not present on PersistentVolume" backup=velero/test logSource="pkg/backup/item_backupper.go:612" name=db-backup-stage-pv namespace= persistentVolume=db-backup-stage-pv resource=persistentvolumes
time="2024-12-23T08:32:59Z" level=info msg="zone info not available in nodeAffinity requirements" backup=velero/test logSource="pkg/backup/item_backupper.go:617" name=db-backup-stage-pv namespace= persistentVolume=db-backup-stage-pv resource=persistentvolumes
time="2024-12-23T08:32:59Z" level=warning msg="No volume ID returned by volume snapshotter for persistent volume" backup=velero/test logSource="pkg/backup/item_backupper.go:641" name=db-backup-stage-pv namespace= persistentVolume=db-backup-stage-pv resource=persistentvolumes volumeSnapshotLocation=volume-snapshot-location-1
time="2024-12-23T08:32:59Z" level=info msg="Persistent volume is not a supported volume type for Velero-native volumeSnapshotter snapshot, skipping." backup=velero/test logSource="pkg/backup/item_backupper.go:653" name=db-backup-stage-pv namespace= persistentVolume=db-backup-stage-pv resource=persistentvolumes
time="2024-12-23T08:32:59Z" level=info msg="Backed up 1 items out of an estimated total of 1 (estimate will change throughout the backup)" backup=velero/test logSource="pkg/backup/backup.go:499" name=db-backup-stage-pv namespace= progress= resource=persistentvolumes
time="2024-12-23T08:32:59Z" level=info msg="Summary for skipped PVs: [{\"name\":\"db-backup-stage-pv\",\"reasons\":[{\"approach\":\"volumeSnapshot\",\"reason\":\"no applicable volumesnapshotter found\"}]}]" backup=velero/test logSource="pkg/backup/backup.go:542"
time="2024-12-23T08:32:59Z" level=info msg="Backed up a total of 1 items" backup=velero/test logSource="pkg/backup/backup.go:546" progress=

@kaovilai
Member

Thanks for the recreate. I believe you missed the step to label the VolumeSnapshotClass.

Your recreate without VSC labeling would only work if #8294 is solved.
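
For reference, a VolumeSnapshotClass labeled the way the Velero CSI plugin looks for would be shaped roughly like this (a minimal sketch; the class name and driver here simply mirror the manifests above):

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-hostpath-sc
  labels:
    # Velero selects the VolumeSnapshotClass for a given driver via this label
    velero.io/csi-volumesnapshot-class: "true"
driver: hostpath.csi.k8s.io
deletionPolicy: Delete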

@kaovilai
Member

kaovilai commented Dec 23, 2024

Please help us improve the docs in the meantime. Where would you expect the instruction to add the label to live, if not here?

@pew-x2
Author

pew-x2 commented Dec 23, 2024

This is probably caused by my insufficient knowledge about snapshots, but I have the impression that I've installed a custom snapshot class which is labeled as you are suggesting. Could it be that I've added the VolumeSnapshotClass to the wrong namespace? I had the impression that they were not namespaced.
Below is the definition of the class:

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-hostpath-sc
  labels:
    velero.io/csi-volumesnapshot-class: "true" <----- isn't this what you are suggesting?
driver: hostpath.csi.k8s.io
deletionPolicy: Delete

I didn't add that to the later description since I thought it was implied.
Regarding the documentation: I found the label and added it because of the documentation, so I think that is a good place to have it. Maybe it would be a good idea to have a "Hello World" guide for setting up a backup and a snapshot, to make it easier to understand the different concepts.

But I'm still having trouble understanding what the "native snapshotter" is and how to trigger it (if I have now actually added the label correctly).

@kaovilai
Member

"native snapshotter" in velero context is using VolumeSnapshotLocation defined using cloud storage plugins such as https://github.com/vmware-tanzu/velero-plugin-for-aws/blob/main/volumesnapshotlocation.md which does not use k8s CSI snapshot apis but "natively" trigger calls to aws using aws SDK functions implemented in the plugin.

@kaovilai
Member

Can you give us the YAML of the PVC and PV?

Relevant code sections

// #4758 Do not take snapshot for CSI PV to avoid duplicated snapshotting, when CSI feature is enabled.
if features.IsEnabled(velerov1api.CSIFeatureFlag) && pv.Spec.CSI != nil {
    log.Infof("Skipping snapshot of persistent volume %s, because it's handled by CSI plugin.", pv.Name)
    return nil
}

@pew-x2
Author

pew-x2 commented Dec 24, 2024

values.yaml

This is how they are added in Helm. I'm hoping this is a fairly basic pattern, but I'm pasting in the Helm template below anyway in case it's relevant.

  volumeMounts:
  - name: db-backup-stage-pv
    mountPath: "/tmp/backup"

  volumes:
  - name: db-backup-stage-pv
    persistentVolumeClaim:
      claimName: db-backup-stage-pv-claim

Part of the Helm template where the values are used

          resources:
            {{- toYaml .Values.db.resources | nindent 12 }}
          {{- with .Values.db.volumeMounts }}
          volumeMounts:
            {{- toYaml . | nindent 12 }}
          {{- end }}
      {{- with .Values.db.volumes }}
      volumes:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.db.nodeSelector }}

StorageClass, VolumeSnapshotClass, PV and PVC

---

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-hostpath-sc
provisioner: hostpath.csi.k8s.io
reclaimPolicy: Delete
volumeBindingMode: Immediate

---

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-hostpath-sc
  labels:
    velero.io/csi-volumesnapshot-class: "true"
driver: hostpath.csi.k8s.io
deletionPolicy: Delete

---

apiVersion: v1
kind: PersistentVolume
metadata:
  name: db-backup-stage-pv
  labels:
    type: db-backup
spec:
  storageClassName: csi-hostpath-sc
  persistentVolumeReclaimPolicy: Recycle
  capacity:
    storage: 250Mi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/mnt/pwrmap/data"

---

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-backup-stage-pv-claim
  annotations:
    velero.io/csi-volumesnapshot-class: "csi-hostpath-sc"
spec:
  storageClassName: csi-hostpath-sc
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 200Mi
  selector:
    matchLabels:
      type: db-backup

@kaovilai
Member

  1. https://github.com/kubernetes-csi/csi-driver-host-path says: "This driver is just a demo implementation and is used for CI testing. This has many fake implementations and other non-standard best practices, and should not be used as an example of how to write a real driver." I would advise the same and ask that you refrain from using this in production.
  2. The PV fails to be identified as a CSI volume by Velero:
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: db-backup-stage-pv
      labels:
        type: db-backup
    spec:
      storageClassName: csi-hostpath-sc
      persistentVolumeReclaimPolicy: Recycle
      capacity:
        storage: 250Mi
      accessModes:
        - ReadWriteOnce
      hostPath:
        path: "/mnt/pwrmap/data"
    
    It fails several checks Velero currently uses:
    i. The PV lacks .spec.csi and does not carry the relevant annotation pv.kubernetes.io/provisioned-by:
    func isProvisionedByCSI(log logrus.FieldLogger, pv *corev1api.PersistentVolume, kbClient client.Client) (bool, error) {
        if pv.Spec.CSI != nil {
            return true, nil
        }
        // Although the pv.Spec.CSI is nil, the volume could be provisioned by a CSI driver when enabling the CSI migration
        // Refer to https://github.com/vmware-tanzu/velero/issues/4496 for more details
        if pv.Annotations != nil {
            driverName := pv.Annotations[KubeAnnDynamicallyProvisioned]
            migratedDriver := pv.Annotations[KubeAnnMigratedTo]
            if len(driverName) > 0 || len(migratedDriver) > 0 {
                list := &storagev1api.CSIDriverList{}
                if err := kbClient.List(context.TODO(), list); err != nil {
                    return false, err
                }
                for _, driver := range list.Items {
                    if driverName == driver.Name || migratedDriver == driver.Name {
                        log.Debugf("the annotation %s or %s equals to %s indicates the volume is provisioned by a CSI driver", KubeAnnDynamicallyProvisioned, KubeAnnMigratedTo, driver.Name)
                        return true, nil
                    }
                }
            }
        }
        return false, nil
    }

    if pv.Spec.PersistentVolumeSource.CSI == nil {
        p.log.Infof(
            "Skipping PVC %s/%s, associated PV %s is not a CSI volume",
            pvc.Namespace, pvc.Name, pv.Name)
        kubeutil.AddAnnotations(
            &pvc.ObjectMeta,
            map[string]string{
                velerov1api.SkippedNoCSIPVAnnotation: "true",
            })
    }

    It has been seen with this CSI driver that the PV should contain at least the annotation. I would ask you to find out why the annotation is missing.

Per the demo it should be implementable, but we need to get some answers first.
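
For comparison, a dynamically provisioned CSI PV would normally carry both the annotation and spec.csi, roughly like this (a sketch; the name and volumeHandle are illustrative values a provisioner would generate):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-1234abcd
  annotations:
    # set by the external-provisioner when the driver creates the volume
    pv.kubernetes.io/provisioned-by: hostpath.csi.k8s.io
spec:
  capacity:
    storage: 250Mi
  accessModes:
    - ReadWriteOnce
  storageClassName: csi-hostpath-sc
  persistentVolumeReclaimPolicy: Delete
  csi:
    driver: hostpath.csi.k8s.io
    volumeHandle: 42d7b0b4-example

A statically created hostPath PV like the one in this issue never passes through the CSI provisioner, so neither the annotation nor spec.csi gets set on it.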
