Vault Raft Snapshot Agent is a Go binary that takes periodic snapshots of a Vault HA cluster using the integrated raft storage backend. It can store the snapshots locally or upload them to a remote storage backend like AWS S3 as a backup in case of system failure or user error. The agent automates Vault's standard manual backup procedure for a single Vault cluster or for clusters with disaster recovery.

In case of failure, just follow the standard restore procedure for your cluster type, using the last snapshot the agent created in your backup storage.
If you're running on kubernetes, you can use the provided Helm-Charts to install Vault Raft Snapshot Agent into your cluster.
You can run the agent with the supplied container-image, e.g. via docker:

```shell
docker run -v "<path to snapshots.json>:/etc/vault.d/snapshots.json" ghcr.io/argelbargel/vault-raft-snapshot-agent:latest
```
If your storage uses self-signed certificates (e.g. a self-hosted s3), you can add your certificates by mounting them at /tmp/certs and running the container:

```shell
docker run -v "<path to snapshots.json>:/etc/vault.d/snapshots.json" -v "<path to your certificates>:/tmp/certs" ghcr.io/argelbargel/vault-raft-snapshot-agent:latest
```

Upon startup the container will add these certificates to its certificate-store automatically.
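For longer-running setups, the same container can be managed e.g. via docker compose. The following is an illustrative sketch; the file name, host paths and read-only flags are assumptions:

```yaml
# hypothetical docker-compose.yml
services:
  vault-raft-snapshot-agent:
    image: ghcr.io/argelbargel/vault-raft-snapshot-agent:latest
    volumes:
      - ./snapshots.json:/etc/vault.d/snapshots.json:ro
      # self-signed certificates mounted here are added
      # to the container's certificate-store on startup
      - ./certs:/tmp/certs:ro
```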
If you want to use the plain binary, the recommended way of running this daemon is using systemd, since it handles restarts and failure scenarios quite well. To learn more about systemd, check out this article. To begin, create the following file at /etc/systemd/system/snapshot.service:
```ini
[Unit]
Description="An Open Source Snapshot Service for Raft"
Documentation=https://github.com/Argelbargel/vault-raft-snapshot-agent/
Requires=network-online.target
After=network-online.target
ConditionFileNotEmpty=/etc/vault.d/snapshots.json

[Service]
Type=simple
User=vault
Group=vault
ExecStart=/usr/local/bin/vault-raft-snapshot-agent
ExecReload=/usr/local/bin/vault-raft-snapshot-agent
KillMode=process
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
```
Your configuration is assumed to exist at /etc/vault.d/snapshots.json and the actual daemon binary at /usr/local/bin/vault-raft-snapshot-agent.
Then just run:

```shell
sudo systemctl enable snapshot
sudo systemctl start snapshot
```
If your configuration is correct and Vault is running on the same host as the agent, you will see either `Not running on leader node, skipping.` or `Successfully created <type> snapshot to <location>`, depending on whether the daemon runs on the leader's host or not.
Most of the agent's configuration is done via its configuration-file or environment variables. The location of a custom configuration-file and the logging options are specified via the command-line:
Vault Raft Snapshot Agent uses go's slog package to provide structured logging capabilities. Log format `text` uses TextHandler, `json` uses JSONHandler. If no log format or `default` is specified, the default log format is used, which outputs the timestamp followed by the message, followed by additional key=value-pairs if any are present.
You can specify most command-line options via environment-variables:

| Environment variable | Corresponding command-line-option |
|---|---|
| `VRSA_CONFIG_FILE=<file>` | `--config-file` |
| `VRSA_LOG_FORMAT=<format>` | `--log-format` |
| `VRSA_LOG_LEVEL=<level>` | `--log-level` |
| `VRSA_LOG_OUTPUT=<output>` | `--log-output` |
Additionally, Vault Raft Snapshot Agent supports static configuration via environment variables alongside its configuration file:

- for setting the address of the vault-server you can use `VAULT_ADDR`.
- any other configuration option can be set by prefixing `VRSA_` to the upper-cased path to the key and replacing `.` with `_`. For example `VRSA_SNAPSHOTS_FREQUENCY=<value>` configures the snapshot-frequency and `VRSA_VAULT_AUTH_TOKEN=<value>` configures the token authentication for vault.
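The mapping rule from configuration key to environment variable can be sketched as a one-liner (illustrative, not the agent's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// envKey maps a configuration key to the corresponding
// VRSA_-prefixed environment variable: upper-case the
// path and replace every "." with "_".
func envKey(configPath string) string {
	return "VRSA_" + strings.ToUpper(strings.ReplaceAll(configPath, ".", "_"))
}

func main() {
	fmt.Println(envKey("snapshots.frequency")) // VRSA_SNAPSHOTS_FREQUENCY
	fmt.Println(envKey("vault.auth.token"))    // VRSA_VAULT_AUTH_TOKEN
}
```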
In contrast to values specified via the configuration file, these environment variables are read once at startup only, and the configuration will not be reloaded when their values change, except those specified as external property sources/Secret below, which always reflect the currently configured value.

Options specified via environment-variables take precedence over the values specified in the configuration file - even those specified as external property sources/Secret!
Vault Raft Snapshot Agent uses viper as configuration-backend, so you can write your configuration in either json, yaml or toml.
The Agent monitors the configuration-file for changes and reloads the configuration automatically when the file changes.
```yaml
vault:
  nodes:
    urls:
      # Url of the (leading) vault-server
      - https://vault-server:8200
  auth:
    # configures kubernetes auth
    kubernetes:
      role: "test-role"
snapshots:
  # configures how often snapshots are made, default 1h
  frequency: "4h"
  # configures how many snapshots are retained, default 0
  retain: 10
  storages:
    # configures local storage of snapshots
    local:
      path: /snapshots
```
(for a complete example with all configuration-options see complete.yaml)
Vault Raft Snapshot Agent allows you to specify dynamic sources for properties containing secrets which you either do not want to write into the configuration file or which might change while the agent is running. For these properties you may specify either an environment variable as source using `env://<variable-name>` or a file containing the value for the secret using `file://<file-path>`, where `<file-path>` may be either an absolute path or a path relative to the configuration file. Any value not prefixed with `env://` or `file://` will be used as-is.
Dynamic properties are validated at startup only, so if e.g. you delete the source-file for a property required to authenticate with vault or connect to a remote storage while the agent is running, the next login to vault or upload to that storage will fail (gracefully)!
```yaml
vault:
  nodes:
    urls:
      - <http(s)-urls to vault-cluster nodes>
      - ...
    autoDetectLeader: true
  insecure: <true|false>
  timeout: <duration>
```
| Key | Type | Required/Default | Description |
|---|---|---|---|
| `nodes.urls` | List of URLs | required | specifies at least one url to a vault-server |
| `nodes.autoDetectLeader` | Boolean | false | if true, the agent asks the nodes for the url of the leader; otherwise it tries the given urls until it finds the leader node |
| `insecure` | Boolean | false | specifies whether insecure https connections are allowed or not. Set to true when you use self-signed certificates |
| `timeout` | Duration | 60s | timeout for the vault-http-client; increase for large raft databases (and increase `snapshots.timeout` accordingly!) |
It is recommended to specify only a single url in `vault.nodes.urls` which always points to the current leader (e.g. to `http(s)://vault-active.<vault-namespace>.svc.cluster.local:<vault-server service-port>` when using the vault-helm chart) and to disable the automatic leader detection by not specifying `nodes.autoDetectLeader` or setting it to `false`.

If automatic leader detection is enabled, the response of vault's /sys/leader-api-endpoint must return a `leaderAddress` reachable by the agent.

If you specify multiple urls in `vault.nodes.urls` without enabling `vault.nodes.autoDetectLeader`, the agent contacts each node until one reports that it is the current leader.
To allow Vault Raft Snapshot Agent to take snapshots, you must add a policy that allows read-access to the snapshot-apis. This involves the following:

1. `vault login` with an admin user.
2. Create the following policy: `vault policy write snapshots ./my_policies/snapshots.hcl`, where `snapshots.hcl` is:

```hcl
path "/sys/storage/raft/snapshot"
{
  capabilities = ["read"]
}
```

The above policy is the minimum required to be able to generate snapshots. It must be associated with the app- or kubernetes-role you specify in your configuration (see below).
Only one of the following authentication options should be specified. If multiple options are specified, one of them is used with the following priority: `approle`, `aws`, `azure`, `gcp`, `kubernetes`, `ldap`, `token`, `userpass`. If no option is specified, Vault Raft Snapshot Agent tries to access vault unauthenticated (which should fail outside of test- or development-environments).

Vault Raft Snapshot Agent automatically renews the authentication when it expires.
Authentication via AppRole (see the Vault docs):

```yaml
vault:
  auth:
    approle:
      role: "<role-id>"
      secret: "<secret-id>"
```
| Key | Type | Required/Default | Description |
|---|---|---|---|
| `role` | Secret | required | specifies the role_id used to call the Vault API. See the authentication steps below |
| `secret` | Secret | required | specifies the secret_id used to call the Vault API |
| `path` | String | approle | specifies the backend-name used to select the login-endpoint (`auth/<path>/login`) |
To allow the AppRole access to the snapshots, you should run the following commands on your vault-cluster:

```shell
vault write auth/<path>/role/snapshot token_policies=snapshots
vault read auth/<path>/role/snapshot/role-id
# prints the role-id and its metadata
vault write -f auth/<path>/role/snapshot/secret-id
# prints the secret-id and its metadata
```
Uses AWS for authentication (see the Vault docs):

```yaml
vault:
  auth:
    aws:
      role: "<role>"
```
| Key | Type | Required/Default | Description |
|---|---|---|---|
| `role` | String | required | specifies the role used to call the Vault API. See the authentication steps below |
| `ec2Nonce` | Secret | | enables EC2 authentication and sets the required nonce |
| `ec2SignatureType` | String | pkcs7 | changes the signature-type for EC2 authentication; valid values are `identity`, `pkcs7` and `rs2048` |
| `iamServerIdHeader` | String | | specifies the server-id-header when using the IAM authentication type |
| `region` | Secret | env://AWS_DEFAULT_REGION | specifies the aws region to use |
| `path` | String | aws | specifies the backend-name used to select the login-endpoint (`auth/<path>/login`) |
AWS authentication uses the IAM authentication type by default unless `ec2Nonce` is set. The credentials for IAM authentication must be provided via environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN` or `AWS_SHARED_CREDENTIALS_FILE`; `AWS_SHARED_CREDENTIALS_FILE` must be specified as an absolute path).
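When running the agent under systemd as described above, one way to supply these credentials is a drop-in with `Environment=` directives. The drop-in file name and values below are illustrative assumptions:

```ini
# hypothetical drop-in: /etc/systemd/system/snapshot.service.d/aws-credentials.conf
[Service]
Environment=AWS_ACCESS_KEY_ID=<access-key-id>
Environment=AWS_SECRET_ACCESS_KEY=<secret-access-key>
# alternatively point to a shared credentials file (absolute path required)
Environment=AWS_SHARED_CREDENTIALS_FILE=/etc/vault.d/aws-credentials
```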
To allow access to the snapshots, you should run the following commands on your vault-cluster:

```shell
# for EC2 authentication
vault write auth/<path>/role/<role> auth_type=ec2 bound_ami_id=<ami-id> policies=snapshots max_ttl=500h
# for IAM authentication
vault write auth/<path>/role/<role> auth_type=iam bound_iam_principal_arn=<principal-arn> policies=snapshots max_ttl=500h
```
Authentication using Azure (see the Vault docs):

```yaml
vault:
  auth:
    azure:
      role: "<role-id>"
```
| Key | Type | Required/Default | Description |
|---|---|---|---|
| `role` | String | required | specifies the role used to call the Vault API. See the authentication steps below |
| `resource` | String | | optional azure resource |
| `path` | String | azure | specifies the backend-name used to select the login-endpoint (`auth/<path>/login`) |
To allow access to the snapshots, you should run the following commands on your vault-cluster:

```shell
vault write auth/<path>/role/<role> \
    policies="snapshots" \
    bound_subscription_ids=<subscription-ids> \
    bound_resource_groups=<resource-group>
```
Authentication using Google Cloud GCE or IAM authentication (see the Vault docs):

```yaml
vault:
  auth:
    gcp:
      role: "<role>"
```
| Key | Type | Required/Default | Description |
|---|---|---|---|
| `role` | String | required | specifies the role used to call the Vault API. See the authentication steps below |
| `serviceAccountEmail` | String | | activates IAM authentication and specifies the service-account to use |
| `path` | String | gcp | specifies the backend-name used to select the login-endpoint (`auth/<path>/login`) |
Google Cloud authentication uses the GCE authentication type by default unless `serviceAccountEmail` is set.
To allow access to the snapshots, you should run the following commands on your vault-cluster:

```shell
# for IAM authentication type
vault write auth/<path>/role/<role> \
    type="iam" \
    policies="snapshots" \
    bound_service_accounts="<service-account-email>"

# for GCE authentication type
vault write auth/<path>/role/<role> \
    type="gce" \
    policies="snapshots" \
    bound_projects="<projects>" \
    bound_zones="<zones>" \
    bound_labels="<labels>" \
    bound_service_accounts="<service-account-email>"
```
To enable Kubernetes authentication mode, you should follow the steps from the Vault docs and create the appropriate policies and roles.

```yaml
vault:
  auth:
    kubernetes:
      role: "test"
```
| Key | Type | Required/Default | Description |
|---|---|---|---|
| `role` | String | required | specifies vault k8s auth role |
| `jwtToken` | Secret | file:///var/run/secrets/kubernetes.io/serviceaccount/token | specifies the JWT-Token for the kubernetes service-account; must resolve to a non-empty value |
| `path` | String | kubernetes | specifies the backend-name used to select the login-endpoint (`auth/<path>/login`) |
To allow kubernetes access to the snapshots, you should run the following commands on your vault-cluster:

```shell
kubectl -n <your-vault-namespace> exec -it <vault-pod-name> -- vault write auth/<path>/role/<kubernetes.role> bound_service_account_names='*' bound_service_account_namespaces=<namespace of your vault-raft-snapshot-agent-pod> policies=snapshots ttl=24h
```

Depending on your setup you can restrict access to specific service-account-names and/or namespaces.
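If your service-account token is mounted somewhere other than the default location (e.g. as a projected volume), the `jwtToken` property can point there using the `file://` source scheme described above. The path below is a hypothetical example:

```yaml
vault:
  auth:
    kubernetes:
      role: "test"
      # hypothetical projected-volume path; the default is
      # file:///var/run/secrets/kubernetes.io/serviceaccount/token
      jwtToken: file:///var/run/secrets/tokens/vault-token
```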
Authentication using LDAP (see the Vault docs):

```yaml
vault:
  auth:
    ldap:
      role: "test"
```
| Key | Type | Required/Default | Description |
|---|---|---|---|
| `username` | Secret | required | the username |
| `password` | Secret | required | the password |
| `path` | String | ldap | specifies the backend-name used to select the login-endpoint (`auth/<path>/login`) |
To allow access to the snapshots, you should run the following commands on your vault-cluster:

```shell
# allow access for a specific user
vault write auth/<path>/users/<username> policies=snapshots
# allow access based on group
vault write auth/<path>/groups/<group> policies=snapshots
```
Authentication using a token:

```yaml
vault:
  auth:
    token: <token>
```
| Key | Type | Required/Default | Description |
|---|---|---|---|
| `token` | Secret | required | specifies the token used to log in |
Authentication using username and password (see the Vault docs):

```yaml
vault:
  auth:
    userpass:
      username: "<username>"
      password: "<password>"
```
| Key | Type | Required/Default | Description |
|---|---|---|---|
| `username` | Secret | required | the username |
| `password` | Secret | required | the password |
| `path` | String | userpass | specifies the backend-name used to select the login-endpoint (`auth/<path>/login`) |
To allow access to the snapshots, you should run the following commands on your vault-cluster:

```shell
vault write auth/<path>/users/<username> \
    password=<password> \
    policies=snapshots
```
```yaml
snapshots:
  frequency: <duration>
  timeout: <duration>
  retain: <int>
  namePrefix: <prefix>
  nameSuffix: <suffix>
  timestampFormat: <format>
```
| Key | Type | Required/Default | Description |
|---|---|---|---|
| `frequency` | Duration | 1h | how often to run the snapshot agent |
| `retain` | Integer | 0 | the number of snapshots to retain. For example, if you set `retain: 2`, the two most recent snapshots will be kept in storage. 0 means all snapshots will be retained |
| `timeout` | Duration | 60s | timeout for creating snapshots |
| `namePrefix` | String | raft-snapshot- | prefix of the uploaded snapshots |
| `nameSuffix` | String | .snap | suffix/extension of the uploaded snapshots |
| `timestampFormat` | Go Time.Format Layout-String | 2006-01-02T15-04-05Z-0700 | timestamp-format for the uploaded snapshots' timestamp; you can test your layout-string at the Go Playground |
The name of the snapshots is created by concatenating `namePrefix`, the timestamp formatted according to `timestampFormat`, and `nameSuffix`; e.g. the defaults would generate `raft-snapshot-2023-09-01T15-30-00Z+0200.snap` for a snapshot taken at 15:30:00 on 09/01/2023 when the timezone is CEST (GMT + 2h).
These options can be overridden for a specific storage:

```yaml
snapshots:
  frequency: 1h
  retain: 24
  storages:
    local:
      path: /snapshots
    aws:
      frequency: 24h
      retain: 365
      timestampFormat: 2006-01-02
      #...
```
In this example the agent would take and store a snapshot to the local storage every hour, retaining 24 snapshots, and store a daily snapshot on the aws remote storage, retaining the last 365 snapshots with an appropriately shorter timestamp.

Note: as the agent uses the default frequency in case of failures, you should always configure the shorter frequency in the defaults and specify longer frequencies for specific storages if required!
Note that if you specify more than one storage option, all specified storages will be written to. For example, specifying `local` and `aws` will write to both locations. When using multiple remote storages, increase the timeout allowed via `snapshots.timeout` for larger raft databases. Each storage option can be specified exactly once; it is currently not possible to e.g. upload to multiple aws regions by specifying multiple `aws`-storage-options.
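For example, a configuration writing to two remote storages might raise the global timeout like this (the timeout value is an illustrative assumption; tune it to your database size):

```yaml
snapshots:
  # allow more time for uploading a large raft database
  # to multiple remote storages (illustrative value)
  timeout: 5m
  storages:
    aws:
      bucket: <bucket>
    azure:
      container: <container>
```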
Uploads snapshots to an AWS S3 storage bucket. This storage uses the AWS Go SDK. Use this storage for S3 services that use an AWS S3-API compatible addressing-scheme (e.g. `https://<bucket>-<endpoint>`). For other S3 implementations, try the generic s3 storage.

```yaml
snapshots:
  storages:
    aws:
      bucket: <bucket>
```
| Key | Type | Required/Default | Description |
|---|---|---|---|
| `bucket` | String | required | bucket to store snapshots in |
| `accessKeyId` | Secret | env://AWS_ACCESS_KEY_ID | specifies the access key |
| `accessKey` | Secret | env://AWS_SECRET_ACCESS_KEY | specifies the secret access key; must resolve to non-empty value if accessKeyId resolves to a non-empty value |
| `sessionToken` | Secret | env://AWS_SESSION_TOKEN | specifies the session token |
| `region` | Secret | env://AWS_DEFAULT_REGION | S3 region if it is required |
| `keyPrefix` | String | | prefix to store s3 snapshots in |
| `endpoint` | Secret | env://AWS_ENDPOINT_URL | S3 compatible storage endpoint (ex: http://127.0.0.1:9000) |
| `useServerSideEncryption` | Boolean | false | set to true to turn on AWS' AES256 encryption. AWS KMS keys are not currently supported |
| `forcePathStyle` | Boolean | false | needed if your S3-compatible storage supports only path-style addressing, or if you would like to use S3's FIPS endpoint |
Any common snapshot configuration option overrides the global snapshot-configuration.
Uploads snapshots to an Azure Blob Storage container:

```yaml
snapshots:
  storages:
    azure:
      container: <container>
```
| Key | Type | Required/Default | Description |
|---|---|---|---|
| `container` | String | required | the name of the blob container to write to |
| `accountName` | Secret | env://AZURE_STORAGE_ACCOUNT | the account name of the storage account; must resolve to non-empty value |
| `accountKey` | Secret | env://AZURE_STORAGE_KEY | the account key of the storage account; must resolve to non-empty value |
| `cloudDomain` | String | blob.core.windows.net | domain of the cloud-service to use |
Any common snapshot configuration option overrides the global snapshot-configuration.
Uploads snapshots into a Google Cloud storage bucket:

```yaml
snapshots:
  storages:
    gcp:
      bucket: <bucket>
```
| Key | Type | Required/Default | Description |
|---|---|---|---|
| `bucket` | String | required | the Google Storage Bucket to write to. Auth is expected to be default machine credentials |
Any common snapshot configuration option overrides the global snapshot-configuration.
Stores snapshots on the local filesystem:

```yaml
snapshots:
  storages:
    local:
      path: <path>
```
| Key | Type | Required/Default | Description |
|---|---|---|---|
| `path` | String | required | fully qualified path, not including the file name, where the snapshot should be written, e.g. /raft/snapshots |
Any common snapshot configuration option overrides the global snapshot-configuration.
Uploads snapshots to an OpenStack Swift Object Storage container:

```yaml
snapshots:
  storages:
    swift:
      container: <container>
      authUrl: <auth-url>
```
| Key | Type | Required/Default | Description |
|---|---|---|---|
| `container` | String | required | the name of the container to write to |
| `authUrl` | URL | required | the auth-url to authenticate against |
| `username` | Secret | env://SWIFT_USERNAME | the username used for authentication; must resolve to non-empty value |
| `apiKey` | Secret | env://SWIFT_API_KEY | the api-key used for authentication; must resolve to non-empty value |
| `region` | Secret | env://SWIFT_REGION | optional region to use, e.g. "LON" or "ORD" |
| `domain` | URL | | optional user's domain name |
| `tenantId` | String | | optional id of the tenant |
| `timeout` | Duration | 60s | timeout for snapshot-uploads |
Any common snapshot configuration option overrides the global snapshot-configuration.
Uploads snapshots to any S3-compatible server. This storage uses the MinIO Go Client SDK. If your self-hosted S3-server does not support the default addressing-scheme of AWS S3, this storage might still work.

```yaml
snapshots:
  storages:
    s3:
      endpoint: <endpoint>
      bucket: <bucket>
```
| Key | Type | Required/Default | Description |
|---|---|---|---|
| `endpoint` | String | required | S3 compatible storage endpoint (ex: my-storage.example.com) |
| `bucket` | String | required | bucket to store snapshots in |
| `accessKeyId` | Secret | env://S3_ACCESS_KEY_ID | specifies the access key |
| `accessKey` | Secret | env://S3_SECRET_ACCESS_KEY | specifies the secret access key; must resolve to non-empty value if accessKeyId resolves to a non-empty value |
| `sessionToken` | Secret | env://S3_SESSION_TOKEN | specifies the session token |
| `region` | Secret | | S3 region if it is required |
| `insecure` | Boolean | false | whether to connect via http (true) or https (false) |
| `skipSSLVerify` | Boolean | false | whether to disable SSL certificate validation (true) or not |
Any common snapshot configuration option overrides the global snapshot-configuration.
- Source code is licensed under MIT
- Vault Raft Snapshot Agent was originally developed by @Lucretius
- contains improvements done by @F21
- enhancements for azure-uploader by @vikramhansawat
- support for additional authentication methods based on code from @alexeiser
- support for Openstack Swift Storage based on code from @Pyjou