This tool will turn AWS EBS volume snapshots into temporary EBS volumes, tar them up, compress
them with LZ4, and upload them to Amazon S3 for you. You can also opt to create an image of the
entire volume by using dd
, instead of using tar
.
Once stored on S3, you could add an S3 Lifecycle Rule to the S3 bucket to automatically migrate the snapshots into Glacier.
This tool is only intended to run on Linux, and has only been tested on Ubuntu 16.04, Amazon Linux 2017.03 and Amazon Linux 2 2017.12.
This tool must be run on an EC2 instance, and can only operate on snapshots within the same region as the instance.
This is a Node.js application, so if you don't have it installed already, install node (at least version 6.0.0 LTS or newer) and npm:
# Ubuntu 16.04
curl -sL https://deb.nodesource.com/setup_6.x | sudo -E bash -
sudo apt-get install -y nodejs
# Amazon Linux
curl -sL https://rpm.nodesource.com/setup_6.x | sudo -E bash -
sudo yum install -y nodejs
The "lz4" command-line compression tool will be used to compress the tars, so make sure you have it available:
# Ubuntu 16.04
sudo apt-get install liblz4-tool
# Amazon Linux
sudo yum install lz4
# We'll also need git for installation:
sudo yum install git
Now you can fetch and install snap-to-s3 from NPM:
sudo npm install -g snap-to-s3
Or if you download snap-to-s3
from its GitHub repository,
you can install that version instead from the repository root:
npm install # Fetch dependencies
npm link # Link this installation to your $PATH
Now it'll be on your $PATH, so you can run it like so:
sudo snap-to-s3 --help
In order to mount and unmount volumes, and read all files for backup, snap-to-s3
will need
to be run as root or with sudo
.
This tool requires access to the instance metadata service
at http://169.254.169.254:80/
, so ensure that your instance does not have a firewall policy
that blocks access to this.
This tool needs to create volumes from snapshots, perform uploads to S3, attach and detach volumes to/from instances, delete volumes, and add and delete tags. For snapshot validation, it also needs to read objects from S3.
You can grant these permissions by attaching an IAM Role
to your instance with the following policy attached. Don't forget to update
the bucket names in that policy with the actual name of the S3 bucket you'll be uploading to.
snap-to-s3
will then be able to use that policy automatically with no further configuration.
If you're not using an IAM Instance Role to give permissions to snap-to-s3, you can grant those permissions to an IAM user and provide an AWS Access Key ID / Secret Access Key pair for that user instead, follow these instructions.
This tool works for me and for my use-case, and I'm happy if it works for you too, but you might be doing something that I didn't expect. There is a definite potential for data-loss if something goes wrong.
I haven't tested it at all with LVM
or mdadm
, and it's likely to fail horribly. Only use
with disks with regular partitions on them (or disks with no partition table, i.e. only one
normal formatted filesystem).
It's better that the temporary EBS volumes created from the snapshots don't automatically get
mounted read-write by your instance's /etc/fstab
, since this might cause changes to the
files on the volume during upload. For dd
backups, this is much more critical.
snap-to-s3
will tag temporary EBS volumes it creates from snapshots with a tag called
snap-to-s3
(this is configurable). Accordingly, it will assume that any EBS volume with this
tag is one of its volumes that it can do whatever it likes with (including deleting it).
The fidelity of the backup depends on how well tar
is able to preserve your files.
snap-to-s3
calls your system's tar
using the default options. If you have some unusual
files (odd extended file attributes, long path names, special characters in filenames) you
may find that a tar backup isn't perfect. You can use the --dd
option instead to just image
the entire volume, but note that this will include data from deleted files in the "free"
portion of the drive, and so will increase the backup size considerably for non-full volumes.
Warnings from tar
will be printed to the screen, but otherwise ignored by snap-to-s3
unless tar
returns a non-zero exit code. In practice, the only warnings I've seen have come
from snapshots taken of a running operating system's root disk, where tar
will note that it
is ignoring unix socket files like /var/spool/postfix/public/flush
(this is a good thing).
Note that snapshots will not be deleted for you even after copying them to S3, so you have
the opportunity to verify the snapshot was transferred correctly before removing it
yourself. Upload validation can be performed by snap-to-s3
using the --validate
option,
or you could do it yourself manually.
For a manual validation, you could use the --keep-temp-volumes
option to retain the
temporary volume after migration, and run
find . -type f -exec md5sum {} \; | sort -k 2 | md5sum
in that directory to compute a
signature for the files in the volume. Then in a different directory, you can download and
untar the snapshot you just uploaded to S3 (e.g. using the AWS CLI
like aws s3 cp "s3://backups.example.com/vol-xxx/snap-xxx.tar.lz4" - | lz4 -d | tar -x
),
compute the same signature over the files you unpacked, and ensure the signatures match.
THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
A typical migration command looks like this:
sudo snap-to-s3 --migrate --all --bucket backups.example.com
This will search for all snapshots in the current region which have a tag called "snap-to-s3" with the value "migrate" (you need to tag these yourself beforehand). The snapshots will be turned into temporary EBS volumes and attached to your instance. Each partition of those volumes will be separately tar'd and compressed with lz4 before being uploaded to the bucket that you've specified. The temporary EBS volume will then be detached and deleted. Finally, the snapshot will be tagged with "migrated".
The resulting S3 objects will have locations like:
s3://backups.example.com/vol-xxx/2017-01-01T00:00:00+00:00 snap-xxx.1.tar.lz4
s3://backups.example.com/vol-xxx/2017-01-01T00:00:00+00:00 snap-xxx.2.tar.lz4
s3://backups.example.com/vol-xxx/2017-05-01T00:00:00+00:00 snap-yyy.tar.lz4
Metadata is added to the files on S3 with details of the snapshot that it was created from, and tags that were applied to the snapshot are copied over (with some substitutions for illegal characters).
Note that you need to create the bucket beforehand (it's not created for you), and you should create it in the same region as your snapshots in order to eliminate AWS's inter-region transfer fees.
If you want to make sure that the snapshot was uploaded to S3 correctly, you can use the "--validate" option. This option can either be added at the same time as you perform your --migrate:
sudo snap-to-s3 --migrate --validate --all --bucket backups.example.com
Or it can be done in a separate invocation after migration using just the "--validate" option.
sudo snap-to-s3 --validate --all --bucket backups.example.com
If you want to validate it using a separate invocation, you can speed up validation massively by providing the "--keep-temp-volumes" option when you perform the --migrate.
The previously-uploaded tar will be downloaded from S3, unpacked, and MD5 hashes will be computed of all of the files in it. This uses a streaming approach, so no extra disk space is needed for temporary files. For dd images, a hash of the entire raw volume is taken instead.
At the same time, the snapshot being verified will be turned into a temporary EBS volume, mounted, and MD5 hashes will be computed of all of its files.
Finally, when both processes are complete, you'll be told if there are any files missing from the S3 copy of the snapshot, and if any of the file hashes differ. If validation was successful, the snapshot will be tagged with the value "validated", and the temporary EBS volume will be detached and deleted.
Note that the tar validation only compares the hashes of the content of regular files. Special files like symlinks are not checked at all, and neither are attributes like file permissions.
snap-to-s3
doesn't perform snapshot restorations itself, but you can do this with the
AWS CLI.
To restore a tar, check the metadata on the archive on S3 to find the
"x-amz-meta-uncompressed-size" header, this will give you a hint about how large of an EBS
volume you'll need to create to hold the volume (you'll need somewhat more space than this in
order to hold filesystem metadata). Create a volume of that size, attach it to the instance,
create a filesystem on it with mkfs
, mount it somewhere useful, and enter that directory. Now
you can download and extract the tar from S3 like so:
aws s3 cp "s3://backups.example.com/vol-xxx/2017-01-01 snap-xxx.tar.lz4" - | lz4 -d | sudo tar -x
If you're restoring an image that was created with dd, create and attach an EBS volume at least
as large as the "x-amz-meta-snapshot-volumesize" field indicates. If you attached it at
/dev/xvdf
(for example), then you could restore the snapshot like so:
aws s3 cp "s3://backups.example.com/vol-xxx/2017-01-01 snap-xxx.img.lz4" - | lz4 -d | sudo dd bs=1M of=/dev/xvdf
snap-to-s3
can examine an Amazon Cost and Usage report to show you a per-volume and
per-snapshot breakdown of your EBS snapshot charges, which you can use to identify snapshots
suitable for migrating to S3/Glacier.
From your Amazon billing dashboard, go to the Reports section, and create a new report. Choose "daily" for the time period, tick the "Include Resource IDs" box, choose GZip compression, and select an S3 bucket to store the report in. Around 24 hours later, you should have a .csv.gz report in that bucket to analyze. Download it to your instance.
You can pass that file to snap-to-s3
using any of these styles:
snap-to-s3 --analyze costreport-1.csv
snap-to-s3 --analyze costreport-1.csv.gz
aws s3 cp "s3://cost-reports.example.com/20170501-20170601/x-x-x-x-x/costreport-1.csv.gz" - | snap-to-s3 --analyze
snap-to-s3
will summarize the billing data in the report, then combine it with information about
your current volumes and snapshots (DescribeVolumes and DescribeSnapshots).
Here's an example output. The effective size of each snapshot is shown next to it, this is the amount of data in the snapshot that differs from the previous snapshot, which is the size you are billed for:
Region us-west-2 ($166.16/month for 24 snapshots)
vol-xxx (500GB, MySQL Slave DB): 3016 GB total, $151/month for 16 snapshots, average snapshot change 32%
snap-xxx 2016-11-01 448.7 GB
snap-xxx 2016-12-01 261.7 GB (52%)
snap-xxx 2017-01-01 301.5 GB (60%)
snap-xxx 2017-02-01 275.4 GB (55%)
snap-xxx 2017-03-01 250.5 GB (50%)
snap-xxx 2017-04-01 279.3 GB (56%)
snap-xxx 2017-05-01 320.6 GB (64%)
snap-xxx 2017-05-17 218.1 GB (44%)
snap-xxx 2017-05-18 90.8 GB (18%)
snap-xxx 2017-05-19 85.2 GB (17%)
snap-xxx 2017-05-20 89.4 GB (18%)
snap-xxx 2017-05-21 93.2 GB (19%)
snap-xxx 2017-05-22 92.6 GB (19%)
snap-xxx 2017-05-23 82.8 GB (17%)
snap-xxx 2017-05-24 87.1 GB (17%)
snap-xxx 2017-05-25 39.5 GB (7.9%)
vol-xxx (20GB, deleted): 5 GB total, $0.247/month for 6 snapshots, average snapshot change 2.1%
snap-xxx 2015-04-27 2.4 GB
snap-xxx 2015-05-04 0.1 GB (0.43%)
snap-xxx 2015-06-01 0.1 GB (0.56%)
snap-xxx 2015-06-29 0.1 GB (0.62%)
snap-xxx 2016-04-25 2.2 GB (11%)
snap-xxx 2016-09-08 0.0 GB (0.069%)
In this case, the older snapshots of the first volume change a lot, so the delta encoding scheme of EBS snapshots isn't saving us very much. These snapshots are a great candidate to move to S3 or Glacier. Whereas the second set of snapshots change by nearly nothing, so S3/Glacier will be more expensive.
Here's the full options list:
Migrate snapshots to S3
--migrate Migrate EBS snapshots to S3
--all Migrate all snapshots whose tag is set to "migrate"
--one ... or migrate any one snapshot whose tag is set to "migrate"
--snapshots SnapshotId ... ... or provide an explicit list of snapshots to migrate (tags are ignored)
--upload-streams num Number of simultaneous streams to send to S3 (increases upload speed and
memory usage, default: 4)
--compression-level level LZ4 compression level (1-9, default: 1)
--dd Use dd to create a raw image of the entire volume, instead of tarring up the
files of each partition
--sse mode Enables server-side encryption, valid modes are AES256 and aws:kms
--sse-kms-key-id id KMS key ID to use for aws:kms encryption, if not using the S3 master KMS key
Validate uploaded snapshots
--validate Validate uploaded snapshots from S3 against the original EBS snapshots (can
be combined with --migrate)
--all Validate all snapshots whose tag is set to "migrated"
--one ... or validate any one snapshot whose tag is set to "migrated"
--snapshots SnapshotId ... ... or provide an explicit list of snapshots to validate (tags are ignored)
Analyze AWS Cost and Usage reports
--analyze filename Analyze an AWS Cost and Usage report to find opportunities for savings
Common options
--help Show this page
--tag name Name of tag you have used to mark snapshots for migration, and to mark
created EBS temporary volumes (default: snap-to-s3)
--bucket name S3 bucket to upload to (required)
--mount-point path Temporary volumes will be mounted here, created if it doesn't already exist
(default: /mnt)
--keep-temp-volumes Don't delete temporary volumes after we're done with them
--volume-type type Volume type to use for temporary EBS volumes (suggest standard or gp2,
default: standard)
The snapshot migration rate that snap-to-s3
achieves seems to be largely limited by how fast
EC2 will turn a snapshot into a completely-readable volume (i.e. the rate they can copy blocks
from their own private S3 snapshot storage into EBS). For me a simple dd
from an EBS volume
(freshly created from a snapshot) to /dev/zero
averages a transfer rate of 3.5MiB/s, which
gives an expected 24 hours to upload a 300GiB snapshot with snap-to-s3
.
If your migration rate is being limited by this, you'll notice it as a high iowait percentage
in "top" and low "volume idle" percentages in the EC2 console. In this situation you can
increase your effective snapshot upload rate by running multiple instances of snap-to-s3
at
the same time to upload multiple snapshots in parallel. This allows you to scale your upload
rate nearly linearly with the number of snapshots being uploaded, until other limits are
reached like network speed and CPU usage.
lz4 consumes the largest portion of the CPU time. If you're on a t2-series instance, you'll
probably want to use the least amount of compression (--compression-level 1
), which is the
default. Faster instances can afford to use up to level 9.
Node will consume the most memory. Of that memory, the S3 upload process will use at least
part_size * num_upload_streams
bytes to buffer the upload.
The part size is set automatically based on the uncompressed size of the data being uploaded,
it is approximately uncompressed_size / 9000
, and is always at least 5MB.
The number of upload streams defaults to 4. This helps to overcome the effective TCP speed
limits you would run in to when using a single TCP connection, and reduces the impact of the
latency of starting the upload of the next part. You can change the number of upload streams
with the --upload-streams
option.
Here's the minimum amount of memory that would be consumed with various volume sizes (for 100% full volumes):
Volume size | Memory with streams = 1 | Memory with streams = 4 |
---|---|---|
1GB | 5MB | 20MB |
40GB | 5MB | 20MB |
100GB | 11MB | 46MB |
200GB | 23MB | 91MB |
400GB | 46MB | 180MB |
800GB | 91MB | 360MB |
1600GB | 180MB | 730MB |
3200GB | 360MB | 1500MB |
On top of this, inefficiencies in Node's memory management (especially the garbage collector) will likely require a factor more memory, and there's a fixed-size overhead of around 100MB. Test it out with your snapshots/Node version if memory is tight.
At the time of writing and in the region I use, EBS snapshots were charged at $0.05/GB-month, S3 Standard at $0.023/GB-month, S3 Infrequent-Access at $0.0125/GB-month, and Glacier at $0.004/GB-month, so migrating snapshots to S3 or Glacier could potentially save you on your monthly storage bill.
However, keep in mind that EBS snapshots are incremental; if you have two snapshots of the same volume, the second snapshot will only require enough storage space to hold the blocks that changed since the previous snapshot. In contrast, snapshots pushed to S3 or Glacier with this tool are full backups, not incremental.
This means that the cost difference between EBS snapshots and S3/Glacier will depend on how much your successive snapshots differ.
If you have many snapshots of the same volume (i.e., in the limit as the number of snapshots reaches infinity), and your volume changes by more than 46% between successive snapshots, S3 Standard is cheaper than EBS snapshots. For S3 Infrequent Access, the breakeven point is at 25%.
For Glacier, the breakeven point comes much sooner, with volumes changing more than 8% between snaps being cheaper to store on Glacier.
If LZ4 achieves a 2:1 compression ratio on your data, this breakeven point is correspondingly halved (i.e. Glacier would break-even at 4% change between snapshots).
For a smaller number of snapshots, the breakeven point is reached earlier. Let's say that you are slowly outgrowing your volume sizes, so you create a new volume each year, and you want to retain one snapshot per month (12 snapshots for the lifetime of the volume). Here's the price of S3 and Glacier as a percentage of the cost of EBS snapshots.
Change between snapshots | S3 Std | S3 IA | Glacier |
---|---|---|---|
0% | 550% | 300% | 96% |
1% | 500% | 270% | 86% |
2% | 450% | 250% | 79% |
3% | 420% | 230% | 72% |
4% | 380% | 210% | 67% |
5% | 360% | 190% | 62% |
10% | 260% | 140% | 46% |
15% | 210% | 110% | 36% |
20% | 170% | 94% | 30% |
30% | 130% | 70% | 22% |
40% | 102% | 56% | 18% |
50% | 85% | 46% | 15% |
75% | 60% | 32% | 10% |
100% | 46% | 25% | 8.0% |
Notice that if you only have 12 snapshots of a given volume, Glacier is always cheaper than EBS snapshots, no matter how much each snapshot changes compared to the previous one.
Here's the same situation if you achieve a 2:1 compression ratio using LZ4:
Change between snapshots | S3 Std 2:1 | S3 IA 2:1 | Glacier 2:1 |
---|---|---|---|
0% | 280% | 150% | 48% |
1% | 250% | 140% | 43% |
2% | 230% | 120% | 39% |
3% | 210% | 110% | 36% |
4% | 190% | 104% | 33% |
5% | 180% | 97% | 31% |
10% | 130% | 71% | 23% |
15% | 104% | 57% | 18% |
20% | 86% | 47% | 15% |
30% | 64% | 35% | 11% |
40% | 51% | 28% | 8.9% |
50% | 42% | 23% | 7.4% |
75% | 30% | 16% | 5.2% |
100% | 23% | 12.5% | 4.0% |
If you want to play with the parameters that generated this table you can do so using this spreadsheet on Google Docs. You'll need to download it or copy it to your own account to be able to edit the fields.
S3 Infrequent Access archives have their storage costs charged based on a minimum 30-day lifetime, even if you delete them or migrate them to Glacier before then.
Glacier archives have their storage costs charged based on a minimum 90-day lifetime, even if you delete them sooner.
Don't forget that restoring from Glacier takes longer and costs much more than from EBS or S3. It's mostly suitable for archival backups.
There are several costs involved in using snap-to-s3
. Here are some of the costs that I
consider significant in my use-case. Your use-case may vary:
Pushing snapshots to S3 will require the use of an EC2 instance for some hours (prices vary, especially depending on how many snapshots you upload at once). You'll pay EBS storage and I/O costs for the temporary volumes created from the snapshots while they are uploading. Uploads to S3 are made in at most 9000 parts, which requires 9000 S3 PutObject calls (so $0.045 per volume if charged at $0.005/1000 PUTs).
Avoid uploading to a S3 bucket in a different region, since it will incur inter-region transfer costs for both upload and download.
There are other per-request costs that should only become significant if you are migrating thousands of snapshots. Please read the relevant Amazon documentation for details.
Occasionally, EC2 fails to properly attach a volume to the instance, and it gets stuck in the "attaching" state. You'll get this error message on the command line:
[snap-xxx] An error occurred, tagging snapshot with "migrate" so it can be retried later
[snap-xxx] Timed out waiting for vol-xxx to appear in /dev
On the EC2 web console, use the "force detach" option on the volume, then reattach it on a different mount-point and re-run snap-to-s3.
If you end up "poisoning" too many mount-points with this problem, you may need to stop and start your instance in order to clear them.
If Ctrl+C is pressed while an upload is in progress to S3 (sending a SIGINT), the multipart-upload to S3 is cleanly aborted.
If the process gets SIGINT at some other time, the snapshot that was currently being uploaded
will likely still have its tag set to "migrating", which will prevent it from being migrated
again when calling snap-to-s3 --migrate --all
.
You can either manually change that snapshot tag in the EC2 web console to "migrate"
before trying again, or you can explicitly pass the snapshot id to the --snapshot
argument
which will ignore the "migrating" tag for you.
If snap-to-s3
receives a SIGHUP
from your SSH session dropping, it will be killed.
Consider running snap-to-s3 in a screen
/tmux
session, or with
nohup.
If something really weird goes wrong, (e.g. process receives SIGKILL due to out of memory condition) a half-completed multipart upload might be left on S3, which will continue to incur storage charges. These incomplete uploads do not appear in your bucket as objects, or in the S3 web console.
You can use abort-incomplete-multipart to remove those leftovers. Another option is to add a lifecycle policy to the S3 bucket which automatically deletes incomplete multipart uploads after X days (where X is comfortably longer than the longest snapshot upload time you expect with snap-to-s3).