Journalized activity recorder for backup and restore #6606
I think this would also be a great change for third-party data movers, giving them a common way to report information to the user during backup and restore. Personally, as a Kubernetes user, I also love the UX of this; I am used to getting this kind of information with kubectl describe.
To cover third-party data movers, one option is to provide this journalized event mechanism as a generic part of the Velero backup/restore workflow, so that the events stay attached to the Velero backup/restore regardless of which module generates them.
One thing that may hinder the proposal to use the Kubernetes Event mechanism is:
As a result, if we store the backup and restore events as Kubernetes Events, they will be cleared after 1 hour, which is not enough for long-running backups. This means:
We would then need to evaluate whether this is still simpler than creating a dedicated event mechanism in Velero.
I think there are two concerns here:
For the first, events should be used because they alert the user to what is happening. For 2, the events are stored in the audit log, which IIRC would be a place to point users to; alternatively, we can create a new log file that TEEs the event records and saves them in the backup repository. IDK, does that make sense?
Personally, if the events only last 1 hour, I think they lose much of their value even for 1: users will not check the events promptly during the backup, especially for scheduled backups.
Hm, that sounds like a different use case to me, TBH. When I create a backup, you can tell me <we have done X, we are doing Y> and keep this info coming (you can see "got event eight times over the last 5 min"). This lets me know that things are being worked on. It sounds like you are focused on the case where I come in on Monday morning and find that my backup, which was supposed to run on Sunday at 8 pm or so, has failed. There I agree that a journaled log in the backup (like the TEE approach I talked about) would be useful. It sounds like you just disagree that the first use case is relevant or needed?
I think it is less valuable if it can only support the first case you mentioned, because:
Let me discuss this within the team and address:
I disagree with
This is how kube events work. It is well known and works for long-running Pods, PVs, PVCs, Jobs, etc. Please consider making it easier for users to debug with normal k8s tooling rather than with something special. I agree on something special for the second case, as there is no other option. And as stated, just adding a call to EventRecorder whenever you add a journal log entry adds minimal complexity. I also can't entirely agree that the only way someone uses this is through scheduled backups. We have many use cases where users watch their backups, and this would be very helpful there.
I also think that we should have this conversation in the open; can we add it to the next community meeting instead?
Sure, let's try to reach more people and hear more voices. To summarize my personal opinion: if the solution can meet both 1 and 2, I will fully vote for it. If it only meets 1, I am not confident in its value. My understanding may be wrong, so let's see more comments from others.
++ love the idea
At present, for a backup or restore, users need to collect information from multiple places (e.g., various CRs and various logs) to tell what exactly has been done. In other words, critical information is not listed centrally, in a journal style, for Velero backups and restores.
Moreover, the information in the logs is getting increasingly complicated.
One possible solution is to use the Event mechanism: