Metrics to show the backup status other than "Completed" #7239
Comments
No, it's not expected. Thanks for pointing it out; we will fix it in the next version.
Hi @Shuanglu, upon further consideration, I think it may not be necessary to update the Velero metrics in this scenario. In the case you describe, the unfinished backups are marked as |
Closing this, as we plan to leave the current behaviour as-is. Feel free to reopen it if you have any doubts.
Thanks. I'll check whether any other metrics can be used to monitor this. It has been a while and I don't remember the details of the behavior.
@Shuanglu did you find a better way to catch this failure scenario with metrics? The easiest thing that comes to mind is to check the logs and generate alerts based on error messages, but using metrics would be a better approach here.
Nope. We actually changed the monitor to watch for successful backups within the period; if there is a mismatch, it alerts us.
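For anyone who wants to try the same workaround, here is a minimal sketch of that kind of check. It assumes the Backup objects have already been fetched as dictionaries (for example from `kubectl get backups -n velero -o json`) and that "mismatch" means fewer `Completed` backups than expected within the window. The function names, the window, and the expected count are all hypothetical and are not part of Velero or of the monitor described above.

```python
from datetime import datetime, timedelta, timezone


def successful_backups_in_window(backups, now, window):
    """Return backups with phase Completed whose completionTimestamp is within the window."""
    cutoff = now - window
    recent = []
    for backup in backups:
        status = backup.get("status", {})
        if status.get("phase") != "Completed":
            continue  # Failed, PartiallyFailed, InProgress, etc. do not count
        ts = status.get("completionTimestamp")
        if ts is None:
            continue
        # Assumes Kubernetes-style RFC 3339 UTC timestamps, e.g. 2024-01-02T03:00:00Z
        completed_at = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
        if completed_at >= cutoff:
            recent.append(backup)
    return recent


def should_alert(backups, now, window, expected):
    """Alert when fewer successful backups than expected finished in the window."""
    return len(successful_backups_in_window(backups, now, window)) < expected
```

A backup that was marked `Failed` at server start never counts as successful, so it shows up as a shortfall even though the failure counters stay at 0.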
What steps did you take and what happened:
We have Velero deployed with Helm chart 4.1.3. Recently it started to hit OOM when it attempts to back up the cluster, and sometimes it is evicted due to autoscaler/node pressure. After these incidents, the `backup` status became `phase: Failed` and `failureReason: get a backup with status "InProgress" during the server starting, mark it as "Failed"` rather than `Completed`, and no backup files were uploaded to our remote storage.
It looks like the Velero native metrics `backupFailureTotal` and `backupPartialFailureTotal` are always 0 in this case. Is this expected?
Previously this was deployed with Helm 2; we recently migrated to Helm 3 and it started to OOM very frequently. Is there any parameter that might help avoid this?
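Since the counters stay at 0 here, one way to see what actually happened is to tally the `phase` of the Backup objects directly. The snippet below is only a sketch: it assumes the JSON comes from `kubectl get backups -n velero -o json`, and `count_backup_phases` is a hypothetical helper, not something Velero ships.

```python
import json
from collections import Counter


def count_backup_phases(backup_list_json):
    """Count Backup objects by status.phase from a kubectl JSON list document."""
    doc = json.loads(backup_list_json)
    phases = Counter()
    for item in doc.get("items", []):
        # Objects without a status yet are grouped under "Unknown" (a label chosen for this sketch)
        phase = item.get("status", {}).get("phase", "Unknown")
        phases[phase] += 1
    return phases
```

A non-zero `Failed` count alongside `backupFailureTotal` at 0 would match the behavior described above.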
What did you expect to happen:
Metrics to show the backup status other than `Completed`.
The following information will help us better understand what's going on:
If you are using velero v1.7.0+:
Please use `velero debug --backup <backupname> --restore <restorename>` to generate the support bundle and attach it to this issue. For more options, please refer to `velero debug --help`.
If you are using earlier versions:
Please provide the output of the following commands (pasting long output into a GitHub gist or other pastebin is fine):
`kubectl logs deployment/velero -n velero`
`velero backup describe <backupname>` or `kubectl get backup/<backupname> -n velero -o yaml`
`velero backup logs <backupname>`
`velero restore describe <restorename>` or `kubectl get restore/<restorename> -n velero -o yaml`
`velero restore logs <restorename>`
Anything else you would like to add:
Environment:
Velero version (`velero version`): 1.10.1
Velero features (`velero client config get features`):
Kubernetes version (`kubectl version`): 1.27
OS (`/etc/os-release`):
Vote on this issue!
This is an invitation to the Velero community to vote on issues, you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.