
Using benji to backup hundreds of database servers #97

Closed · MannerMan opened this issue Sep 23, 2020 · 9 comments
Labels: user support (General question or user support)

@MannerMan

Hi,

I'm evaluating using Benji for backing up hundreds of database servers (virtual machines) using LVM snapshots. Since we have a special data model, it results in several million files in the database storage directory on each server. This is causing problems (memory usage and very long run-times) for 'native' backup tools we have been looking at. However, since benji is block-based, we're getting very promising performance on both backup and restore in our tests.

My question here is mostly about how we should implement benji at a large scale. The docs are rather sparse on this (https://benji-backup.me/configuration.html#multiple-instance-installations).

Our plan is to have one S3 bucket per server, which contains the LVM block-backup of that specific server. My current train of thought is to have a central benji postgres db and have the benji instances on all servers share that database. Since we have a different S3 bucket for every server, that means a new 'storage id' for every server. This should not be a big deal, since we have automation tools that would take care of that for us.

My question is basically: does this sound like a good way to implement benji at this scale? I guess another approach would be to have separate postgres schemas for each benji instance on every server. We would like to keep the backups of individual servers in separate S3 buckets, or at least in different folders inside a shared bucket.
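
For context, the per-server flow we're testing looks roughly like this (only a sketch; the volume, snapshot and version names are examples, and the exact form of the file: source URL and the backup arguments should be checked against the benji docs):

# Create a temporary LVM snapshot of the database volume (example names)
lvcreate --snapshot --size 10G --name pgdata-snap /dev/vg0/pgdata

# Back up the snapshot block device with benji, labelled with the server name
benji backup -l server=$(hostname) file:/dev/vg0/pgdata-snap "$(hostname)-pgdata"

# Remove the snapshot once the backup has completed
lvremove -y /dev/vg0/pgdata-snap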

@elemental-lf self-assigned this Sep 24, 2020
@elemental-lf added the 'user support' label Sep 24, 2020
@elemental-lf
Owner

Deduplication is per storage, so if you had a separate S3 bucket per server, deduplication would only be performed on the data coming from that server. If you want to go that route I'd suggest using one Benji database for all servers and then automatically generating a Benji configuration per server, consisting of a common part (transforms, ios, database credentials and such) and one storage definition with a unique name and bucket configuration. I have not tested this, but Benji should be able to cope with the fact that not all configuration files include all storages, as long as the name is unique. If this does not work for some reason it could easily be fixed, I think. You should also be able to generate the transform definitions per server (for example to assign different encryption keys). I'd suggest using unique names here too to prevent a mix-up.
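
To illustrate, a generated per-server configuration might then look roughly like this (a sketch only: the storage name, storageId, bucket and credential placeholders are made up, and the exact key names should be verified against the benji.yaml reference in the docs):

configurationVersion: '1'
databaseEngine: postgresql://benji:secret@db-host/benji   # common: shared metadata database
defaultStorage: storage-server-042                        # unique per server
storages:
  - name: storage-server-042                              # unique name ...
    storageId: 42                                         # ... and unique id
    module: s3
    configuration:
      bucketName: benji-server-042                        # one bucket per server
      # credentials / endpoint for the object store go here
transforms:                                               # common part
  - name: zstd
    module: zstd
    configuration:
      level: 1
ios:                                                      # common part
  - name: file
    module: file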

My suggestion, based on the limited information on your use case, would be to use one storage for all servers to get the maximum benefit from deduplication. You can use labels to add extra information like server name or customer id and then use benji storage-usage to calculate a usage metric based on these labels. This metric does not include the overhead from the metadata objects or encryption and is calculated from the uncompressed block size, so it does not represent the real usage in the S3 bucket. See the documentation for more information on how the figures are calculated.
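
As a rough sketch of that approach (the label names are placeholders, and the filter-expression syntax for storage-usage is written from memory of the docs and should be double-checked):

# Label each backup with the server (or customer) it belongs to
benji backup -l server=db-042 file:/dev/vg0/pgdata-snap db-042-pgdata

# Later, estimate the (uncompressed, deduplicated) space usage attributable to that server
benji storage-usage 'labels["server"] == "db-042"'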

@MannerMan
Author

Deduplication is per storage, so if you had a separate S3 bucket per server, deduplication would only be performed on the data coming from that server.

Yes, we're aware of this and it should not be a big issue for us; our motivation for keeping the storage of each server isolated is to limit the impact of human mistakes, corruption due to misconfiguration, etc.

If you want to go that route I'd suggest using one Benji database for all servers and then automatically generating a Benji configuration per server, consisting of a common part (transforms, ios, database credentials and such) and one storage definition with a unique name and bucket configuration. I have not tested this, but Benji should be able to cope with the fact that not all configuration files include all storages, as long as the name is unique. If this does not work for some reason it could easily be fixed, I think.

Indeed, this is how we have configured it in our testing: two servers with benji instances, a common configuration except for the storage backends (which have unique ids and names), and a shared benji postgresql database. We have a separate recovery node that has all storages included in its benji config, so we can recover any server from that node. Seems to work flawlessly!

You should also be able to generate the transform definitions per server (for example to assign different encryption keys). I'd suggest using unique names here too to prevent a mix-up.

We currently do not require encryption, but have one common transform definition like so:

transforms:
  - name: zstd
    module: zstd
    configuration:
      level: 1

Should we use unique names here as well, even though we don't encrypt?

My suggestion, based on the limited information on your use case, would be to use one storage for all servers to get the maximum benefit from deduplication.

The database systems we want to back up are Postgres servers, and our data model is such that every tenant in the system has their own database schema. We have 4000 tenant schemas on every server, and every schema has the same table/index structure. This is why we have so many files on disk, since every table and index results in a new file. The gains we get from benji deduplication are already huge: a 72 GB instance only used up 4.9 GB in S3 - very impressive 👍
Having all instances in the same bucket would of course give even greater benefits, but it's a trade-off we'll probably make for the reasons mentioned above.

You can use labels to add extra information like server name or customer id and then use benji storage-usage to calculate a usage metric based on these labels. This metric does not include the overhead from the metadata objects or encryption and is calculated from the uncompressed block size, so it does not represent the real usage in the S3 bucket. See the documentation for more information on how the figures are calculated.

We do indeed do this by passing the hostname as a label (benji backup -l server=$hostname), so we can filter and list backups for specific servers. Thank you for clarifying the storage usage!

I do have a few other questions:

  • The docs mention that benji is nearing beta quality; is there any estimate of when a stable/rc release could happen?
  • Are changes that could break the storage format (i.e. old backups would be invalid with a new version of benji) still happening?
  • Do you see any problems or have any other remarks or questions about our implementation?

@elemental-lf
Owner

Thanks for telling more about your use case and your experience so far.

Should we use unique names here as well, even though we don't encrypt?

No. Unique names would only make sense if the actual configuration of the transforms differs, like different encryption keys.

I do have a few other questions:

* The docs mention that benji is nearing beta quality; is there any estimate of when a stable/rc release could happen?

I had hoped to get more feedback from users (see #67) to better assess how many users there actually are and how they are using Benji, and to build more confidence in the stability of the code base. To actually answer your question: there currently is no estimate.

* Are changes that could break the storage format (i.e. old backups would be invalid with a new version of benji) still happening?

For the database I've provided automatic migrations from the beginning, and this has worked out quite well, I think. The structure of the object metadata hasn't seen any significant changes for quite a while, and Benji can still read the older versions. The same holds true for the format of the exported metadata, for which there are currently four different revisions.

So apart from bugs it should actually be possible to upgrade any released version to any later released version. Downgrades are another matter and are currently only possible in a limited number of circumstances. We could provide code for automated downgrades of the database schema, but I'm not sure it is worth the effort, even for stable releases.

I'm planning on continuing to provide backwards compatibility. So even if there are changes to the data structures Benji will be able to read the old versions and the database schema will be migrated automatically.

* Do you see any problems or have any other remarks or questions about our implementation?

There are other disadvantages to using so many different storages, apart from losing the space savings of global deduplication:

  • Increased network usage and time for writing blocks that would have been deduplicated
  • Increased network usage and time for reading blocks during (batch) deep-scrubbing

I still understand your reasoning; I just wanted to add these two points as they came to mind and could be an issue in the long run.

@MannerMan
Author

No. Unique names would only make sense if the actual configuration of the transforms differs, like different encryption keys.

Great! 👍

I had hoped to get more feedback from users (see #67) to better assess how many users there actually are and how they are using Benji, and to build more confidence in the stability of the code base. To actually answer your question: there currently is no estimate.

Alright, understandable. If we end up using benji, I'll be sure to drop a comment in #67. Like I said, we'll have an extensive deployment, and we generally do several recovery operations per week to look at previous database states for various reasons, so it should help build confidence (provided it works well!).

For the database I've provided automatic migrations from the beginning, and this has worked out quite well, I think. The structure of the object metadata hasn't seen any significant changes for quite a while, and Benji can still read the older versions. The same holds true for the format of the exported metadata, for which there are currently four different revisions.

So apart from bugs it should actually be possible to upgrade any released version to any later released version. Downgrades are another matter and are currently only possible in a limited number of circumstances. We could provide code for automated downgrades of the database schema, but I'm not sure it is worth the effort, even for stable releases.

I'm planning on continuing to provide backwards compatibility. So even if there are changes to the data structures Benji will be able to read the old versions and the database schema will be migrated automatically.

As long as benji is 'forward-compatible', it will not be a problem for us. Our concern was that a benji release might break the previous format, forcing us to 'reset' the backups and start over.

There are other disadvantages to using so many different storages, apart from losing the space savings of global deduplication:

* Increased network usage and time for writing blocks that would have been deduplicated

* Increased network usage and time for reading blocks during (batch) deep-scrubbing

I still understand your reasoning; I just wanted to add these two points as they came to mind and could be an issue in the long run.

We're still debating internally whether we should go with one or several buckets. You see no problem with performance degradation by going with a single bucket? Even though benji processes blocks and not files, there will likely be a lot of block objects in a single combined bucket. Won't deduplication performance go down as there are more and more blocks to check? Quickly counting, the raw data on disk is somewhere around ~20 TB in ~675,000,000 files, combined.

Two more questions!

  • I see you have a pyinstaller spec file provided in the repo; is it supported to run benji 'compiled' into a pyinstaller binary?
  • In our benji PoC backup script, we do a deep-scrub directly after completing a full backup, to verify it. Is this unnecessary or good practice?

@elemental-lf
Owner

We're still debating internally whether we should go with one or several buckets. You see no problem with performance degradation by going with a single bucket? Even though benji processes blocks and not files, there will likely be a lot of block objects in a single combined bucket. Won't deduplication performance go down as there are more and more blocks to check? Quickly counting, the raw data on disk is somewhere around ~20 TB in ~675,000,000 files, combined.

I'd think that it mainly depends on how the object store handles large numbers of objects in the same bucket, and I'd assume that services like S3 are optimized to work well in such a scenario. I'm not completely sure about the database. It is going to be smaller with a unified storage due to deduplication, but the database optimizer might work better with multiple storages.

You could consider splitting your database deployments into groups where each group uses one storage. That would get you the benefit of better deduplication and you'd still be safer from human error or software failure.
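
In the shared configuration that could boil down to a handful of group storages instead of one per server, roughly like this (names, ids and buckets are placeholders):

storages:
  - name: storage-group-a          # e.g. db servers 1-50
    storageId: 1
    module: s3
    configuration:
      bucketName: benji-group-a
  - name: storage-group-b          # e.g. db servers 51-100
    storageId: 2
    module: s3
    configuration:
      bucketName: benji-group-b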

Would it be possible to test how much you would benefit from deduplication? Maybe we're discussing a non-issue.

* I see you have a pyinstaller spec file provided in the repo; is it supported to run benji 'compiled' into a pyinstaller binary?

That was an experiment of mine. Last time I tried it out it worked. But the topic of how to bundle (or externally provide) Ceph's Python modules is still unsolved and I haven't invested any work into that yet; it might just work. I'm still interested in pyinstaller as it would provide a way to easily distribute Benji.

* In our benji PoC backup script, we do a deep-scrub directly after completing a full backup, to verify it. Is this unnecessary or good practice?

Some would consider it good practice, especially if you're using the option to directly compare the backup to the source snapshot. But your backups will take longer that way and generate more I/O (even more so if you're comparing to the source snapshot). So as usual the answer is: it depends.
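
As a sketch of the two variants (the version uid is a placeholder, and the exact spelling of the option for comparing against the source should be checked in benji deep-scrub --help):

# Plain deep-scrub: re-reads every block from the object store and verifies checksums
benji deep-scrub <version_uid>

# Deep-scrub that additionally compares the blocks against the still-present source
# snapshot (option spelling assumed; this adds read I/O on the source as noted above)
benji deep-scrub --source file:/dev/vg0/pgdata-snap <version_uid>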

@MannerMan
Author

I'd think that it mainly depends on how the object store handles large numbers of objects in the same bucket, and I'd assume that services like S3 are optimized to work well in such a scenario. I'm not completely sure about the database. It is going to be smaller with a unified storage due to deduplication, but the database optimizer might work better with multiple storages.

We'll likely be using the https://min.io local S3 service, with XFS as the backend file system. I interpret your answer as essentially: if the S3 storage is fine, benji should be too?

You could consider splitting your database deployments into groups where each group uses one storage. That would get you the benefit of better deduplication and you'd still be safer from human error or software failure.

Heh, we came up with that idea internally as well 👍

Would it be possible to test how much you would benefit from deduplication? Maybe we're discussing a non-issue.

Yes, we're in a PoC phase right now so that will be one of the things we'll test, for sure!

That was an experiment of mine. Last time I tried it out it worked. But the topic of how to bundle (or externally provide) Ceph's Python modules is still unsolved and I haven't invested any work into that yet; it might just work. I'm still interested in pyinstaller as it would provide a way to easily distribute Benji.

For us, missing Ceph support is not an issue since we'll be backing up LVM volumes and storing that data in S3. I'll be sure to test the pyinstaller spec and submit improvements, should we find any!

Some would consider it good practice, especially if you're using the option to directly compare the backup to the source snapshot. But your backups will take longer that way and generate more I/O (even more so if you're comparing to the source snapshot). So as usual the answer is: it depends.

We'll probably keep the scrubbing for now then; if we see I/O utilization problems we can omit it later or run it at a later point in time.

Thank you for your helpful advice and replies! Like I said above, we're in the proof-of-concept stage, building some tooling around benji to fit our setup, and it continues to impress us.

Some numbers for you:
pgbackrest 2.30 (the native postgres backup tool) takes 16 hours to perform a backup of our test server, and around 32 hours to do a recovery. The backup size is about 40 GB.
Benji takes 10 minutes to do a full backup, 8 minutes to scrub, and 5 minutes to recover from scratch onto a new server; the backup size ends up at around 4.7 GB :)

@elemental-lf
Owner

We'll likely be using the https://min.io local S3 service, with XFS as the backend file system. I interpret your answer as essentially: if the S3 storage is fine, benji should be too?

Yes.

That was an experiment of mine. Last time I tried it out it worked. But the topic of how to bundle (or externally provide) Ceph's Python modules is still unsolved and I haven't invested any work into that yet; it might just work. I'm still interested in pyinstaller as it would provide a way to easily distribute Benji.

For us, missing Ceph support is not an issue since we'll be backing up LVM volumes and storing that data in S3. I'll be sure to test the pyinstaller spec and submit improvements, should we find any!

Please do. I quite like the idea of distributing Benji as a single "binary" as it simplifies installation immensely. (Of course I know that there are also disadvantages).

As you mention LVM, there is optimization potential in this area, which could be quite substantial if we consider how much Ceph's snapshot diffs help speed up backups. See #59.

Some numbers for you:

Thanks!

@MannerMan
Author

@elemental-lf

Update: we have deployed benji to our development environment (15 db servers) and have had it running for a few weeks. After some internal debate we ended up using a single bucket for benji per environment (so one dedicated bucket for dev, one for test and one for prod). Since we have a secondary backup system in place as well, we consider the odds of losing both systems to a human mistake to be very low. So we should get some good numbers on dedup, as you suggested :)

We have run into one minor issue so far, see #101

Please do. I quite like the idea of distributing Benji as a single "binary" as it simplifies installation immensely. (Of course I know that there are also disadvantages).

I had some trouble building a standalone benji binary with pyinstaller but got it working in the end with some changes to the .spec file; I'll try to get around to creating a PR with those fixes. Note: since we don't use Ceph, I did not look into bundling those modules.
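
For reference, the build itself is just the standard pyinstaller invocation against the spec file in the repo (the spec file name is assumed here; our .spec changes are not shown):

pip install pyinstaller
# Build a self-contained bundle from the provided spec file
pyinstaller benji.spec
# The resulting executable ends up under dist/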

As you mention LVM, there is optimization potential in this area, which could be quite substantial if we consider how much Ceph's snapshot diffs help speed up backups. See #59.

Cool, would love to see that implemented, even though for us benji is definitely fast enough as it is :)

@elemental-lf
Owner

Thanks for the update! I'm going to close this issue now. I'll take a look at #101, and I'm looking forward to the PR; any fixes to the spec file are definitely welcome.
