Create and update duplicate archives #199

Open
tasket opened this issue May 22, 2024 · 1 comment
Labels
enhancement New feature or request
Comments

tasket commented May 22, 2024

Add an archive duplication feature to create backups of archives in another location.

Problem

Although commands like rsync -aH --delete may be considered sufficient for making intact duplicates, there are a couple of drawbacks:

  1. rsync, cp, etc. aren't aware of the distinction between data and metadata in Wyng archives, so any carelessness when running these tools (such as not heeding error conditions) could result in a seemingly intact archive that is actually corrupt. Although Wyng provides multiple ways to check for errors, relying on after-the-fact verification is less preventative than avoiding corruption in the first place.

  2. Traditional tools offer no way to select which volumes or sessions within an archive to duplicate. Users may want to prioritize only certain volumes or sessions for their backup-of-a-backup.

Solution

A Wyng duplication function could make and refresh duplicate archives using the same safety patterns (data first, metadata last) employed when creating original archives. It would also be possible to add some level of selectivity (per volume, per session, etc.) at some point.

It could be of further help if the duplicate archive were marked as having a special status, and possibly given a different UUID; this would avoid the temptation of absentmindedly backing up to two or more copies while thinking they are the same archive.
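As a rough illustration of the data-first, metadata-last pattern, the sketch below copies volume data before committing metadata. The on-disk layout here (Vol_* directories for data, a single archive.ini metadata file) is hypothetical and only stands in for Wyng's actual structure:

```python
import os
import shutil

def duplicate_archive(src, dst):
    """Sketch of a data-first, metadata-last duplication pass.
    Layout is illustrative: volume data under Vol_* dirs, archive
    metadata in a single 'archive.ini' file."""
    os.makedirs(dst, exist_ok=True)

    # 1. Data first: copy the bulky chunk data for each volume.
    for name in sorted(os.listdir(src)):
        if name.startswith("Vol_"):
            shutil.copytree(os.path.join(src, name),
                            os.path.join(dst, name), dirs_exist_ok=True)

    # 2. Metadata last: stage to a temp name, then rename into place,
    # so an interrupted run never leaves metadata referencing data
    # that was not fully copied.
    tmp = os.path.join(dst, "archive.ini.tmp")
    shutil.copyfile(os.path.join(src, "archive.ini"), tmp)
    os.replace(tmp, os.path.join(dst, "archive.ini"))
```

Because the metadata rename happens last and atomically, a crash mid-run leaves the duplicate either without metadata or with metadata that matches fully-copied data, never in between.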

Notes

  • Encryption keys could be shared between the two archives. If they are not, then duplication would involve a re-encryption process.
  • The duplication function would have to hold two unique, coexisting instances of the Destination class, and probably ArchiveSet as well. Some currently 'independent' helper functions may have to be moved into those classes to raise their effective encapsulation.
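One possible shape for the dual-destination plumbing is sketched below. The class and method names are illustrative placeholders only, not Wyng's actual internals; the point is simply that two instances must coexist and clean up independently:

```python
class Destination:
    """Placeholder for a Destination-like class; real interface differs."""
    def __init__(self, url):
        self.url = url
        self.conn = None

    def open(self):
        # Placeholder for establishing a real connection/session.
        self.conn = "session:" + self.url

    def close(self):
        self.conn = None

def duplicate(src_url, dst_url):
    # Two unique, coexisting instances: one for the source archive,
    # one for the duplicate.  Helper functions that currently assume a
    # single destination would become methods on the class instead.
    src, dst = Destination(src_url), Destination(dst_url)
    src.open(); dst.open()
    try:
        return src.conn, dst.conn
    finally:
        src.close(); dst.close()
```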

Related

#140
#175
#184

tasket commented Feb 8, 2025

Some observations from another issue:

  • When updating a duplicate archive, rsync doesn't handle pruning changes efficiently
  • It can be helped by performing a raw data-level merge of the pruned session dirs (see concept below)
  • This suggests that efficient updates to duplicate archives can be done without authentication

Concept:

With the following difference between Src archive and Dest...

Src Vol_a12345/             Dest Vol_a12345/
   S_20250104-000001/           S_20250104-000001/
                                S_20250112-000001/
                                S_20250115-000001/
   S_20250123-000001/           S_20250123-000001/

...on Dest do something like:

cd Vol_a12345
cp -al S_20250112-000001/* S_20250104-000001
rm -r S_20250112-000001
cp -al S_20250115-000001/* S_20250104-000001
rm -r S_20250115-000001

Follow up with the usual raw sync of the archive dir using rsync or similar. I'm not sure cp -al is exactly appropriate, but I think you get the idea. Python might be a better way to script it, since you could then use mv/rename operations without creating links.
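A Python sketch of that merge, using os.replace (mv semantics, so no links are created). It assumes sessions are directories of chunk files under the volume dir, and follows the same order as the cp -al sequence above: pruned sessions fold into the nearest older shared session, oldest first, so newer chunks override older ones:

```python
import os

def _merge_dir(src, dst):
    """Move every entry of src into dst (entries from src override any
    same-named entries in dst), then remove the emptied src directory."""
    os.makedirs(dst, exist_ok=True)
    for name in os.listdir(src):
        s, d = os.path.join(src, name), os.path.join(dst, name)
        if os.path.isdir(s) and not os.path.islink(s):
            _merge_dir(s, d)
        else:
            os.replace(s, d)   # like mv: overwrites, creates no links
    os.rmdir(src)

def merge_pruned_sessions(vol_dir, src_sessions, dest_sessions):
    """Fold sessions pruned on Src (i.e. present on Dest only) into the
    nearest older session both sides still share, oldest first."""
    shared = sorted(set(src_sessions) & set(dest_sessions))
    for ses in sorted(set(dest_sessions) - set(src_sessions)):
        older = [s for s in shared if s < ses]
        if not older:
            continue  # no shared older session; leave for the raw sync
        _merge_dir(os.path.join(vol_dir, ses),
                   os.path.join(vol_dir, older[-1]))
```

Applied to the example above, S_20250112 merges into S_20250104, then S_20250115 merges into S_20250104, leaving only S_20250104 and S_20250123 on Dest, matching Src.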
