restic-wait init container not inserted during restic restore #6471
Comments
@ameerdtm I'm a little confused by this statement. Can you provide further clarification?
Taking a quick look at the debug information you provided, I notice that the restores were created from a
Yes, a restore consistently fails for a specific namespace, but seems to work for the rest. The PartiallyFailed results are not desired, and they also increased greatly after the 1.24.3 upgrade. Looking further, the nightly scheduled (schedules.velero.io) backup does not appear to pick up this PVC, and it should. Below are the errors from the PartiallyFailed backup:
I see a few "timed out waiting for all PodVolumeBackups to complete" issues in the past, and that seems to be a common thread here. I have turned on debug logging on my restic daemonset, but I get no errors regarding the timeout. Is there any way to get more detail on what is happening here?
@ameerdtm There won't be any errors on the daemonset related to the timeout. The timeout just means that during velero backup processing, the timeout was reached without a completed PVB. This means one of two things:
Either way, PVB status is an important thing to check here. If the only problem is that restic is taking too long, then increasing the timeout may allow your backups to complete successfully.
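For anyone following along, here is a minimal sketch of one way to check PVB status for a given backup. It assumes Velero is installed in the `velero` namespace (override with the `VELERO_NS` variable otherwise) and relies on the `velero.io/backup-name` label that Velero sets on PodVolumeBackups. The function name `pvb_status` is made up for illustration and is not a Velero command.

```shell
#!/bin/sh
# Sketch: show the phase of every PodVolumeBackup belonging to one Velero backup.
# Assumption: Velero runs in the "velero" namespace unless VELERO_NS is set.
# pvb_status is a hypothetical helper name, not part of the velero CLI.
pvb_status() {
  backup_name="$1"
  kubectl -n "${VELERO_NS:-velero}" get podvolumebackups \
    -l "velero.io/backup-name=${backup_name}" \
    -o custom-columns='NAME:.metadata.name,NODE:.spec.node,POD:.spec.pod.name,VOLUME:.spec.volume,PHASE:.status.phase,MESSAGE:.status.message'
}

# Usage: pvb_status <backupname>
```

A PVB that never leaves its initial phase, or that sits in progress with no message, points at a different problem than one that reports a failure message, so the PHASE and MESSAGE columns are the ones to look at first.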
The amount of data for most of these PVs is small, a few GB at most. The backups seem to take four hours, which looks to me like a restic backup gets stuck without progressing and then eventually times out. That's the odd thing: even with both the velero and restic logs set to debug, nothing tells me whether it can't access the bucket or something else is wrong. Any ideas on how I can see more in this situation?
@ameerdtm Therefore, here are some suggestions:
I upgraded to v1.11.0 and things seem to have improved. I will need a few days to confirm that the new version resolved all the issues. Thanks for the suggestions.
This issue is resolved. For anyone upgrading Kubernetes from 1.23 to 1.24: Velero v1.9.1 went from working fine to timing out while waiting for restic PodVolumeBackups. Even with debugging on, there was no indication as to why it wasn't progressing. Upgrading to v1.11.0 works much better, although I am still running into 8-10 hour backups for some larger clusters.
closing
Describe the problem/challenge you have
We run a nightly backup and restore of a namespace. This was working without issue on Kubernetes 1.23.9 and Velero version v1.9.1. When we upgraded to Kubernetes 1.24.3, the jobs started failing because the data in the PVC was not restored via restic.
What did you expect to happen:
Data in the restic restore to be available to the container
The following information will help us better understand what's going on:
If you are using velero v1.7.0+:
Please use `velero debug --backup <backupname> --restore <restorename>` to generate the support bundle and attach it to this issue. For more options, please refer to `velero debug --help`.
bundle-2023-07-06-15-38-35.tar.gz
If you are using earlier versions:
Please provide the output of the following commands (Pasting long output into a GitHub gist or other pastebin is fine.)
- `kubectl logs deployment/velero -n velero`
- `velero backup describe <backupname>` or `kubectl get backup/<backupname> -n velero -o yaml`
- `velero backup logs <backupname>`
- `velero restore describe <restorename>` or `kubectl get restore/<restorename> -n velero -o yaml`
- `velero restore logs <restorename>`
Anything else you would like to add:
In the backup logs, the PVC is identified and successfully backed up via restic. When the restore runs, none of the restic information is listed in the describe output, there is no restic-wait init container in the restored deployment, and an empty volume is attached to the container. This happens consistently for this namespace. For other namespaces, I can restore from the same backup: the PVC is restored and the restic-wait init container exists.
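The symptoms described above can be checked with a short sketch like the following. It assumes Velero is installed in the `velero` namespace (override with `VELERO_NS`) and uses the `velero.io/restore-name` label that Velero sets on PodVolumeRestores. The function name `restore_check` is hypothetical and only for illustration.

```shell
#!/bin/sh
# Sketch: inspect the restic side of a restore.
# Assumption: Velero runs in the "velero" namespace unless VELERO_NS is set.
# restore_check is a hypothetical helper name, not part of the velero CLI.
restore_check() {
  restore_name="$1"
  target_ns="$2"

  # PodVolumeRestores created for this restore. An empty list would be
  # consistent with the restic restore never being set up for the pod.
  kubectl -n "${VELERO_NS:-velero}" get podvolumerestores \
    -l "velero.io/restore-name=${restore_name}"

  # Init container names for each pod in the restored namespace.
  # restic-wait is expected on pods whose volumes are restored via restic.
  kubectl -n "${target_ns}" get pods \
    -o custom-columns='POD:.metadata.name,INIT_CONTAINERS:.spec.initContainers[*].name'
}

# Usage: restore_check <restorename> <namespace>
```

Running it against both the failing namespace and one that restores correctly gives a side-by-side view of where the two restores diverge.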
Environment:
- Velero version (use `velero version`): v1.9.1
- Velero features (use `velero client config get features`): features:
- Kubernetes version (use `kubectl version`): v1.24.3
- OS (e.g. from `/etc/os-release`): Ubuntu 20.04.4 LTS

Vote on this issue!
This is an invitation to the Velero community to vote on issues, you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.