
An agent which creates periodic snapshots of Vault's integrated raft storage and stores them locally or in remote storages as backup in case of system failure or user errors.


Argelbargel/vault-raft-snapshot-agent

 
 


Vault Raft Snapshot Agent

Vault Raft Snapshot Agent is a Go binary that takes periodic snapshots of a Vault HA cluster using the integrated raft storage backend. It can store the snapshots locally or upload them to a remote storage backend like AWS S3 as a backup in case of system failure or user error. The agent automates Vault's manual standard backup procedure for a single Vault cluster or for clusters with disaster recovery.

Restoring a Snapshot

In case of failure just follow the standard restore procedure for your cluster type using the last snapshot created by the agent from your backup storage.

Running

Helm-Chart

If you're running on kubernetes, you can use the provided Helm-Charts to install Vault Raft Snapshot Agent into your cluster.

Container-Image

You can run the agent with the supplied container-image, e.g. via docker:

docker run -v "<path to snapshots.json>:/etc/vault.d/snapshots.json" ghcr.io/argelbargel/vault-raft-snapshot-agent:latest

Add self-signed-certificates

If your storage uses self-signed certificates (e.g. self-hosted S3), you can add your certificates by mounting them at /tmp/certs when running the container:

docker run -v "<path to snapshots.json>:/etc/vault.d/snapshots.json" -v "<path to your certificates>:/tmp/certs" ghcr.io/argelbargel/vault-raft-snapshot-agent:latest

Upon startup the container adds these certificates to its certificate store automatically.

systemd-service

If you want to use the plain binary, the recommended way of running this daemon is using systemctl, since it handles restarts and failure scenarios quite well. To learn more about systemctl, check out this article. To begin, create the following file at /etc/systemd/system/snapshot.service:

[Unit]
Description="An Open Source Snapshot Service for Raft"
Documentation=https://github.com/Argelbargel/vault-raft-snapshot-agent/
Requires=network-online.target
After=network-online.target
ConditionFileNotEmpty=/etc/vault.d/snapshots.json

[Service]
Type=simple
User=vault
Group=vault
ExecStart=/usr/local/bin/vault-raft-snapshot-agent
ExecReload=/usr/local/bin/vault-raft-snapshot-agent
KillMode=process
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target

Your configuration is assumed to exist at /etc/vault.d/snapshots.json and the actual daemon binary at /usr/local/bin/vault-raft-snapshot-agent.

Then just run:

sudo systemctl enable snapshot
sudo systemctl start snapshot

If your configuration is right and Vault is running on the same host as the agent, you will see one of the following messages, depending on whether the daemon runs on the leader's host:

Not running on leader node, skipping.
Successfully created <type> snapshot to <location>

Command-Line Options and Logging

Most of the agent's configuration is done via its configuration file or environment variables. The location of a custom configuration file and the logging behaviour are specified via the command line:

| Long option | Short option | Description |
| --- | --- | --- |
| --config <file> | -c <file> | load configuration from <file>; if not specified, searches for snapshots.json, snapshots.toml or snapshots.yaml in /etc/vault.d or the current working directory |
| --log-format <format> | -f <format> | format for log-output; possible values are default, json, text (default: default) |
| --log-level <level> | -l <level> | log-level; possible values are debug, info, warn or error (default: info) |
| --log-output <output> | -o <output> | output-target for logs; possible values are stderr, stdout or <path-to-logfile> (default: stderr) |
| --help | -h | show help |
| --version | -v | prints version-information and exits |

Structured Logging

Vault Raft Snapshot Agent uses Go's slog package to provide structured logging. The log format text uses slog's TextHandler and json uses its JSONHandler. If no log format is specified, or if default is specified, the agent uses its default format, which outputs the timestamp, followed by the message, followed by additional key=value pairs if any are present.
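The mapping from format name to slog handler can be sketched as follows. This is a minimal illustration and not the agent's actual code: the function name newLogger is hypothetical, it requires Go 1.21 or newer, and the agent's own default format is deliberately left out because its implementation is not described here.

```go
package main

import (
	"fmt"
	"io"
	"log/slog"
	"os"
)

// newLogger is a hypothetical helper that builds a logger for the
// "text" and "json" formats using the handlers from Go's log/slog package.
// The agent's custom "default" format is not reproduced in this sketch.
func newLogger(format string, w io.Writer) (*slog.Logger, error) {
	opts := &slog.HandlerOptions{Level: slog.LevelInfo}
	switch format {
	case "text":
		return slog.New(slog.NewTextHandler(w, opts)), nil
	case "json":
		return slog.New(slog.NewJSONHandler(w, opts)), nil
	default:
		return nil, fmt.Errorf("format %q is not covered by this sketch", format)
	}
}

func main() {
	// Emit the same record once per format to compare the output shapes.
	for _, format := range []string{"text", "json"} {
		logger, err := newLogger(format, os.Stdout)
		if err != nil {
			fmt.Println(err)
			continue
		}
		logger.Info("snapshot created", "storage", "local")
	}
}
```

Running it prints one key=value line from the TextHandler and one JSON object from the JSONHandler, each carrying the time, level, message and the storage attribute.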

Environment variables

You can specify most command-line options via environment-variables:

| Environment variable | Corresponding command-line-option |
| --- | --- |
| VRSA_CONFIG_FILE=<file> | --config-file |
| VRSA_LOG_FORMAT=<format> | --log-format |
| VRSA_LOG_LEVEL=<level> | --log-level |
| VRSA_LOG_OUTPUT=<output> | --log-output |

Additionally Vault Raft Snapshot Agent supports static configuration via environment variables alongside its configuration file:

In contrast to values specified via the configuration file, these environment variables are read once at startup only and the configuration will not be reloaded when their values change, except those specified as external property sources/Secret below which always reflect the currently configured value.

Options specified via environment variables take precedence over the values specified in the configuration file, even over those specified as external property sources/Secret!

Configuration

Vault Raft Snapshot Agent uses viper as configuration-backend, so you can write your configuration in either json, yaml or toml.

The Agent monitors the configuration-file for changes and reloads the configuration automatically when the file changes.

Example configuration (yaml)

vault:
  nodes:
    urls:
      # Url of the (leading) vault-server
      - https://vault-server:8200
  auth:
    # configures kubernetes auth
    kubernetes:
      role: "test-role"
snapshots:
  # configures how often snapshots are made, default 1h
  frequency: "4h"
  # configures how many snapshots are retained, default 0
  retain: 10
  storages:
    # configures local storage of snapshots
    local:
      path: /snapshots

(for a complete example with all configuration-options see complete.yaml)

Secrets and external property-sources

Vault Raft Snapshot Agent allows you to specify dynamic sources for properties containing secrets which you either do not want to write into the configuration file or which might change while the agent is running. For these properties you may specify either an environment variable as source using env://<variable-name> or a file containing the value of the secret using file://<file-path>, where <file-path> may be either an absolute path or a path relative to the configuration file. Any value not prefixed with env:// or file:// is used as is.

Dynamic properties are validated at startup only, so if e.g. you delete the source-file for a property required to authenticate with vault or connect to a remote storage while the agent is running, the next login to vault or upload to that storage will fail (gracefully)!
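The resolution rules for such properties can be sketched in Go as follows. This is an illustration under stated assumptions, not the agent's implementation: the function name resolveSecret and its configDir parameter are hypothetical, and treating an unset environment variable as an error is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// resolveSecret is a hypothetical helper mimicking the documented rules:
// env://<variable-name> reads an environment variable, file://<file-path>
// reads a file (relative paths are resolved against configDir, the directory
// of the configuration file), and any other value is used as is.
func resolveSecret(value, configDir string) (string, error) {
	switch {
	case strings.HasPrefix(value, "env://"):
		name := strings.TrimPrefix(value, "env://")
		v, ok := os.LookupEnv(name)
		if !ok {
			// Assumption of this sketch: an unset variable is an error.
			return "", fmt.Errorf("environment variable %q is not set", name)
		}
		return v, nil
	case strings.HasPrefix(value, "file://"):
		path := strings.TrimPrefix(value, "file://")
		if !filepath.IsAbs(path) {
			path = filepath.Join(configDir, path)
		}
		data, err := os.ReadFile(path)
		if err != nil {
			return "", fmt.Errorf("reading secret file: %w", err)
		}
		return string(data), nil
	default:
		return value, nil
	}
}

func main() {
	// Only the length is printed so that no secret ends up in the output.
	for _, v := range []string{"plain-token", "env://HOME", "file://token.txt"} {
		secret, err := resolveSecret(v, "/etc/vault.d")
		if err != nil {
			fmt.Println("unresolved:", err)
			continue
		}
		fmt.Printf("%s resolved to a value of length %d\n", v, len(secret))
	}
}
```

Because the source is read each time the function is called, a changed file or variable is picked up on the next call, which is the behaviour the dynamic sources are meant to provide.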

Vault configuration

vault:
  nodes:
    urls:
      -  <http(s)-urls to vault-cluster nodes>
      - ...
    autoDetectLeader: true
  insecure: <true|false>
  timeout: <duration>

| Key | Type | Required/Default | Description |
| --- | --- | --- | --- |
| nodes.urls | List of URL | required | specifies at least one url to a vault-server |
| nodes.autoDetectLeader | Boolean | false | if true the agent will ask the nodes for the url to the leader. Otherwise it will try the given urls until it finds the leader node |
| insecure | Boolean | false | specifies whether insecure https connections are allowed or not. Set to true when you use self-signed certificates |
| timeout | Duration | 60s | timeout for the vault-http-client; increase for large raft databases (and increase snapshots.timeout accordingly!) |

Vault Leader-Detection

It is recommended to specify only a single url in vault.nodes.urls which always points to the current leader (e.g. http(s)://vault-active.<vault-namespace>.svc.cluster.local:<vault-server service-port> when using the vault-helm chart) and to disable automatic leader detection by omitting nodes.autoDetectLeader or setting it to false. If automatic leader detection is enabled, the response of Vault's /sys/leader API endpoint must contain a leaderAddress that is reachable by the agent. If you specify multiple urls in vault.nodes.urls without enabling vault.nodes.autoDetectLeader, the agent contacts each node until one reports that it is the current leader.
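The two lookup strategies can be illustrated with a small Go sketch. It is not the agent's code: findLeader, queryLeader and leaderStatus are hypothetical names, and the sketch assumes that the is_self and leader_address fields of Vault's GET /v1/sys/leader response are what drive the decision.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"strings"
)

// leaderStatus holds the two fields of Vault's sys/leader response used here.
type leaderStatus struct {
	IsSelf        bool   `json:"is_self"`
	LeaderAddress string `json:"leader_address"`
}

// queryLeader asks a single node for its view of the cluster leadership.
func queryLeader(client *http.Client, nodeURL string) (leaderStatus, error) {
	var status leaderStatus
	resp, err := client.Get(strings.TrimRight(nodeURL, "/") + "/v1/sys/leader")
	if err != nil {
		return status, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return status, fmt.Errorf("unexpected status %s", resp.Status)
	}
	err = json.NewDecoder(resp.Body).Decode(&status)
	return status, err
}

// findLeader tries the urls in order. A node reporting itself as leader wins.
// With autoDetect enabled, the leader address reported by the first reachable
// node is returned instead, so that address must be reachable by the caller.
func findLeader(urls []string, autoDetect bool, query func(string) (leaderStatus, error)) (string, error) {
	for _, u := range urls {
		status, err := query(u)
		if err != nil {
			continue // node unreachable, try the next one
		}
		if status.IsSelf {
			return u, nil
		}
		if autoDetect && status.LeaderAddress != "" {
			return status.LeaderAddress, nil
		}
	}
	return "", errors.New("no leader found among the configured urls")
}

func main() {
	// Canned responses stand in for real nodes so the example runs offline.
	canned := map[string]leaderStatus{
		"https://vault-0.example:8200": {IsSelf: false, LeaderAddress: "https://vault-1.example:8200"},
		"https://vault-1.example:8200": {IsSelf: true, LeaderAddress: "https://vault-1.example:8200"},
	}
	query := func(u string) (leaderStatus, error) {
		status, ok := canned[u]
		if !ok {
			return leaderStatus{}, errors.New("node unreachable")
		}
		return status, nil
	}
	urls := []string{"https://vault-0.example:8200", "https://vault-1.example:8200"}
	for _, autoDetect := range []bool{false, true} {
		leader, err := findLeader(urls, autoDetect, query)
		fmt.Println("autoDetect:", autoDetect, "leader:", leader, "error:", err)
	}
}
```

To query real nodes, pass a closure that calls queryLeader with an http.Client configured with a suitable timeout in place of the canned query function.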

Vault authentication

To allow Vault Raft Snapshot Agent to take snapshots, you must add a policy that allows read-access to the snapshot-apis. This involves the following:

  1. Run vault login with an admin user.
  2. Create the following policy with vault policy write snapshots ./my_policies/snapshots.hcl, where snapshots.hcl is:

path "/sys/storage/raft/snapshot" {
  capabilities = ["read"]
}

The above policy is the minimum required policy to be able to generate snapshots. This policy must be associated with the app- or kubernetes-role you specify in your configuration (see below).

Only one of the following authentication options should be specified. If multiple options are specified, one of them is used according to the following priority: approle, aws, azure, gcp, kubernetes, ldap, token, userpass. If no option is specified, Vault Raft Snapshot Agent tries to access Vault unauthenticated (which should fail outside of test or development environments).
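The priority rule amounts to picking the first configured method from a fixed list. The following Go sketch illustrates this; the names authPriority and selectAuthMethod are hypothetical and the code is not taken from the agent.

```go
package main

import "fmt"

// authPriority lists the authentication methods in the documented order.
var authPriority = []string{
	"approle", "aws", "azure", "gcp", "kubernetes", "ldap", "token", "userpass",
}

// selectAuthMethod returns the first configured method in priority order.
// An empty result means no method is configured, i.e. unauthenticated access.
func selectAuthMethod(configured map[string]bool) string {
	for _, method := range authPriority {
		if configured[method] {
			return method
		}
	}
	return ""
}

func main() {
	// Both token and kubernetes are configured; kubernetes ranks higher.
	configured := map[string]bool{"token": true, "kubernetes": true}
	fmt.Println("selected:", selectAuthMethod(configured))
}
```

With both token and kubernetes configured, the sketch selects kubernetes, which is why only one method should be configured to avoid surprises.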

Vault Raft Snapshot Agent automatically renews the authentication when it expires.

AppRole authentication

Authentication via AppRole (see the Vault docs)

Minimal configuration
vault:
  auth:
    approle:
      role: "<role-id>"
      secret: "<secret-id>"

Configuration options

| Key | Type | Required/Default | Description |
| --- | --- | --- | --- |
| role | Secret | required | specifies the role_id used to call the Vault API. See the authentication steps below |
| secret | Secret | required | specifies the secret_id used to call the Vault API. |
| path | String | approle | specifies the backend-name used to select the login-endpoint (auth/<path>/login) |

To allow the App-Role access to the snapshots you should run the following commands on your vault-cluster:

vault write auth/<path>/role/snapshot token_policies=snapshots
vault read auth/<path>/role/snapshot/role-id
# prints role-id and meta-data
vault write -f auth/<path>/role/snapshot/secret-id
# prints the secret-id and its metadata

AWS authentication

Uses AWS for authentication (see the Vault docs).

Minimal configuration
vault:
  auth:
    aws:
      role: "<role>"

Configuration options

| Key | Type | Required/Default | Description |
| --- | --- | --- | --- |
| role | String | required | specifies the role used to call the Vault API. See the authentication steps below |
| ec2Nonce | Secret | | enables EC2 authentication and sets the required nonce |
| ec2SignatureType | String | pkcs7 | changes the signature-type for EC2 authentication; valid values are identity, pkcs7 and rs2048 |
| iamServerIdHeader | String | | specifies the server-id-header when using IAM authentication type |
| region | Secret | env://AWS_DEFAULT_REGION | specifies the aws region to use. |
| path | String | aws | specifies the backend-name used to select the login-endpoint (auth/<path>/login) |

AWS authentication uses the IAM authentication type by default unless ec2Nonce is set. The credentials for IAM authentication must be provided via environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN or AWS_SHARED_CREDENTIALS_FILE; AWS_SHARED_CREDENTIALS_FILE must be specified as an absolute path).

To allow the access to the snapshots you should run the following commands on your vault-cluster:

# for AWS EC2 authentication
vault write auth/<path>/role/<role> auth_type=ec2 bound_ami_id=<ami-id> policies=snapshots max_ttl=500h

# for IAM authentication
vault write auth/<path>/role/<role> auth_type=iam bound_iam_principal_arn=<principal-arn> policies=snapshots max_ttl=500h

Azure authentication

Authentication using Azure (see the Vault docs).

Minimal configuration
vault:
  auth:
    azure:
      role: "<role-id>"

Configuration options

| Key | Type | Required/Default | Description |
| --- | --- | --- | --- |
| role | String | required | specifies the role used to call the Vault API. See the authentication steps below |
| resource | String | | optional azure resource |
| path | String | azure | specifies the backend-name used to select the login-endpoint (auth/<path>/login) |

To allow the access to the snapshots you should run the following commands on your vault-cluster:

vault write auth/<path>/role/<role> \
    policies="snapshots" \
    bound_subscription_ids=<subscription-ids> \
    bound_resource_groups=<resource-group>

Google Cloud authentication

Authentication using Google Cloud GCE or IAM authentication (see the Vault docs).

Minimal configuration
vault:
  auth:
    gcp:
      role: "<role>"

Configuration options

| Key | Type | Required/Default | Description |
| --- | --- | --- | --- |
| role | String | required | specifies the role used to call the Vault API. See the authentication steps below |
| serviceAccountEmail | String | | activates IAM authentication and specifies the service-account to use |
| path | String | gcp | specifies the backend-name used to select the login-endpoint (auth/<path>/login) |

Google Cloud authentication uses the GCE authentication type by default unless serviceAccountEmail is set.

To allow the access to the snapshots you should run the following commands on your vault-cluster:

# for IAM authentication type
vault write auth/<path>/role/<role> \
    type="iam" \
    policies="snapshots" \
    bound_service_accounts="<service-account-email>"

# for GCE authentication type
vault write auth/<path>/role/<role> \
    type="gce" \
    policies="snapshots" \
    bound_projects="<projects>" \
    bound_zones="<zones>" \
    bound_labels="<labels>" \
    bound_service_accounts="<service-account-email>"

Kubernetes authentication

To enable Kubernetes authentication mode, you should follow the steps from the Vault docs and create the appropriate policies and roles.

Minimal configuration
vault:
  auth:
    kubernetes:
      role: "test"

Configuration options

| Key | Type | Required/Default | Description |
| --- | --- | --- | --- |
| role | String | required | specifies vault k8s auth role |
| jwtToken | Secret | file:///var/run/secrets/kubernetes.io/serviceaccount/token | specifies the JWT-Token for the kubernetes service-account, must resolve to a non-empty value |
| path | String | kubernetes | specifies the backend-name used to select the login-endpoint (auth/<path>/login) |

To allow kubernetes access to the snapshots you should run the following commands on your vault-cluster:

kubectl -n <your-vault-namespace> exec -it <vault-pod-name> -- vault write auth/<path>/role/<kubernetes.role> bound_service_account_names="*" bound_service_account_namespaces=<namespace of your vault-raft-snapshot-agent-pod> policies=snapshots ttl=24h

Depending on your setup you can restrict access to specific service-account-names and/or namespaces.

LDAP authentication

Authentication using LDAP (see the Vault docs).

Minimal configuration
vault:
  auth:
    ldap:
      username: "<username>"
      password: "<password>"

Configuration options

| Key | Type | Required/Default | Description |
| --- | --- | --- | --- |
| username | Secret | required | the username |
| password | Secret | required | the password |
| path | String | ldap | specifies the backend-name used to select the login-endpoint (auth/<path>/login) |

To allow access to the snapshots you should run the following commands on your vault-cluster:

# allow access for a specific user
vault write auth/<path>/users/<username> policies=snapshots

# allow access based on group
vault write auth/<path>/groups/<group> policies=snapshots

Token authentication

Minimal configuration
vault:
  auth:
    token: <token>

Configuration options

| Key | Type | Required/Default | Description |
| --- | --- | --- | --- |
| token | Secret | required | specifies the token used to log in |

User and Password authentication

Authentication using username and password (see the Vault docs).

Minimal configuration
vault:
  auth:
    userpass:
      username: "<username>"
      password: "<password>"

Configuration options

| Key | Type | Required/Default | Description |
| --- | --- | --- | --- |
| username | Secret | required | the username |
| password | Secret | required | the password |
| path | String | userpass | specifies the backend-name used to select the login-endpoint (auth/<path>/login) |

To allow access to the snapshots you should run the following commands on your vault-cluster:

vault write auth/<path>/users/<username> \
    password=<password> \
    policies=snapshots

Snapshot configuration

snapshots:
  frequency: <duration>
  timeout: <duration>
  retain: <int>
  namePrefix: <prefix>
  nameSuffix: <suffix>
  timestampFormat: <format>

Configuration options

| Key | Type | Required/Default | Description |
| --- | --- | --- | --- |
| frequency | Duration | 1h | how often to run the snapshot agent |
| retain | Integer | 0 | the number of snapshots to retain. For example, if you set retain: 2, the two most recent snapshots will be kept in storage. 0 means all snapshots will be retained |
| timeout | Duration | 60s | timeout for creating snapshots |
| namePrefix | String | raft-snapshot- | prefix of the uploaded snapshots |
| nameSuffix | String | .snap | suffix/extension of the uploaded snapshots |
| timestampFormat | Go Time.Format Layout-String | 2006-01-02T15-04-05Z-0700 | timestamp-format for the uploaded snapshots' timestamp; you can test your layout-string at the Go Playground |
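The effect of retain can be sketched as follows: keep the most recent snapshots and treat the rest as expired. This is an illustration only; the snapshot type and the expired function are hypothetical, and how the agent determines a snapshot's age is not specified here (the sketch uses a Unix timestamp).

```go
package main

import (
	"fmt"
	"sort"
)

// snapshot is a hypothetical record of a stored snapshot.
type snapshot struct {
	Name    string
	Created int64 // creation time as Unix seconds (assumption of this sketch)
}

// expired returns the snapshots that would be deleted so that only the
// `retain` most recent ones remain. retain <= 0 keeps every snapshot.
func expired(snapshots []snapshot, retain int) []snapshot {
	if retain <= 0 || len(snapshots) <= retain {
		return nil
	}
	// Sort a copy from newest to oldest so the input stays untouched.
	sorted := make([]snapshot, len(snapshots))
	copy(sorted, snapshots)
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].Created > sorted[j].Created
	})
	return sorted[retain:]
}

func main() {
	stored := []snapshot{
		{Name: "raft-snapshot-a.snap", Created: 100},
		{Name: "raft-snapshot-c.snap", Created: 300},
		{Name: "raft-snapshot-b.snap", Created: 200},
	}
	for _, s := range expired(stored, 2) {
		fmt.Println("would delete:", s.Name)
	}
}
```

With retain: 2 and three stored snapshots, only the oldest one is reported for deletion.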

The name of a snapshot is created by concatenating namePrefix, the timestamp formatted according to timestampFormat, and nameSuffix. For example, the defaults generate raft-snapshot-2023-09-01T15-30-00Z+0200.snap for a snapshot taken at 15:30:00 on September 1, 2023 when the timezone is CEST (UTC+2).
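The naming scheme can be reproduced with Go's time.Format to check a custom layout before putting it into the configuration. The helper names snapshotName and exampleSnapshotTime are hypothetical and exist only for this sketch.

```go
package main

import (
	"fmt"
	"time"
)

// snapshotName concatenates prefix, formatted timestamp and suffix.
func snapshotName(prefix, layout, suffix string, t time.Time) string {
	return prefix + t.Format(layout) + suffix
}

// exampleSnapshotTime returns 15:30:00 on 2023-09-01 in a fixed UTC+2 zone,
// matching the example in the text.
func exampleSnapshotTime() time.Time {
	cest := time.FixedZone("CEST", 2*60*60)
	return time.Date(2023, time.September, 1, 15, 30, 0, 0, cest)
}

func main() {
	taken := exampleSnapshotTime()
	// Default layout: the "Z" is a literal, "-0700" is the numeric zone offset.
	fmt.Println(snapshotName("raft-snapshot-", "2006-01-02T15-04-05Z-0700", ".snap", taken))
	// A date-only layout, e.g. for daily snapshots.
	fmt.Println(snapshotName("raft-snapshot-", "2006-01-02", ".snap", taken))
}
```

In the default layout the Z is emitted literally and -0700 renders the zone offset, which is why the example name contains Z+0200.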

These options can be overridden for a specific storage:

snapshots:
  frequency: 1h
  retain: 24
  storages:
    local:
      path: /snapshots
    aws:
      frequency: 24h
      retain: 365
      timestampFormat: 2006-01-02
      #...

In this example the agent takes a snapshot every hour and stores it in the local storage, retaining 24 snapshots. It also stores a daily snapshot on the aws remote storage, retaining the last 365 snapshots with an appropriately shorter timestamp.
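How a per-storage override combines with the global defaults can be sketched like this. The types settings and overrides and the functions effective and ptr are hypothetical names for illustration; the agent's real configuration structures may differ.

```go
package main

import "fmt"

// settings holds the snapshot options that can be overridden per storage.
type settings struct {
	Frequency       string
	Retain          int
	TimestampFormat string
}

// overrides holds per-storage values; a nil field inherits the default.
type overrides struct {
	Frequency       *string
	Retain          *int
	TimestampFormat *string
}

// effective merges the global defaults with the overrides of one storage.
func effective(defaults settings, o overrides) settings {
	result := defaults
	if o.Frequency != nil {
		result.Frequency = *o.Frequency
	}
	if o.Retain != nil {
		result.Retain = *o.Retain
	}
	if o.TimestampFormat != nil {
		result.TimestampFormat = *o.TimestampFormat
	}
	return result
}

// ptr returns a pointer to v, which keeps the override literals short.
func ptr[T any](v T) *T { return &v }

func main() {
	defaults := settings{Frequency: "1h", Retain: 24, TimestampFormat: "2006-01-02T15-04-05Z-0700"}
	aws := overrides{Frequency: ptr("24h"), Retain: ptr(365), TimestampFormat: ptr("2006-01-02")}

	fmt.Printf("local: %+v\n", effective(defaults, overrides{}))
	fmt.Printf("aws:   %+v\n", effective(defaults, aws))
}
```

The local storage specifies no overrides and therefore inherits every default, while the aws storage replaces all three values.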

Note: as the agent uses the default frequency in case of failures, you should always configure the shorter frequency in the defaults and specify longer frequencies for specific storages if required!

Storage configuration

Note that if you specify more than one storage option, all specified storages will be written to. For example, specifying local and aws will write to both locations. When using multiple remote storages, increase the timeout allowed via snapshots.timeout for larger raft databases. Each option can be specified exactly once; it is currently not possible to e.g. upload to multiple aws regions by specifying multiple aws-storage-options.

AWS S3 Storage

Uploads snapshots to an AWS S3 storage bucket. This storage uses the AWS Go SDK. Use this storage for S3 services that use an AWS S3-API compatible addressing-scheme (e.g. https://<bucket>-<endpoint>). For other S3 implementations, try the generic s3 storage.

Minimal Configuration
snapshots:
  storages:
    aws:
      bucket: <bucket>

Configuration Options

| Key | Type | Required/Default | Description |
| --- | --- | --- | --- |
| bucket | String | required | bucket to store snapshots in |
| accessKeyId | Secret | env://AWS_ACCESS_KEY_ID | specifies the access key |
| accessKey | Secret | env://AWS_SECRET_ACCESS_KEY | specifies the secret access key; must resolve to non-empty value if accessKeyId resolves to a non-empty value |
| sessionToken | Secret | env://AWS_SESSION_TOKEN | specifies the session token |
| region | Secret | env://AWS_DEFAULT_REGION | S3 region if it is required |
| keyPrefix | String | | prefix to store s3 snapshots in |
| endpoint | Secret | env://AWS_ENDPOINT_URL | S3 compatible storage endpoint (ex: http://127.0.0.1:9000) |
| useServerSideEncryption | Boolean | false | set to true to turn on AWS' AES256 encryption. AWS KMS keys are not currently supported |
| forcePathStyle | Boolean | false | needed if your S3-compatible storage supports only path-style addressing, or if you would like to use S3's FIPS endpoint |

Any common snapshot configuration option overrides the global snapshot-configuration.

Azure Storage

Uploads snapshots to an Azure Blob Storage container.

Minimal Configuration
snapshots:
  storages:
    azure:
      container: <container>

Configuration Options

| Key | Type | Required/Default | Description |
| --- | --- | --- | --- |
| container | String | required | the name of the blob container to write to |
| accountName | Secret | env://AZURE_STORAGE_ACCOUNT | the account name of the storage account; must resolve to non-empty value |
| accountKey | Secret | env://AZURE_STORAGE_KEY | the account key of the storage account; must resolve to non-empty value |
| cloudDomain | String | blob.core.windows.net | domain of the cloud-service to use |

Any common snapshot configuration option overrides the global snapshot-configuration.

Google Cloud Storage

Uploads snapshots into a Google Cloud storage bucket.

Minimal Configuration
snapshots:
  storages:
    gcp:
      bucket: <bucket>

Configuration Options

| Key | Type | Required/Default | Description |
| --- | --- | --- | --- |
| bucket | String | required | the Google Storage Bucket to write to. Auth is expected to be default machine credentials |

Any common snapshot configuration option overrides the global snapshot-configuration.

Local Storage

Minimal Configuration
snapshots:
  storages:
    local:
      path: <path>

Configuration Options

| Key | Type | Required/Default | Description |
| --- | --- | --- | --- |
| path | String | required | fully qualified path, not including the file name, where the snapshots should be written, e.g. /raft/snapshots |

Any common snapshot configuration option overrides the global snapshot-configuration.

Openstack Swift Storage

Uploads snapshots to an Openstack Swift Object Storage container.

Minimal Configuration
snapshots:
  storages:
    swift:
      container: <container>
      authUrl: <auth-url>

Configuration Options

| Key | Type | Required/Default | Description |
| --- | --- | --- | --- |
| container | String | required | the name of the container to write to |
| authUrl | URL | required | the auth-url to authenticate against |
| username | Secret | env://SWIFT_USERNAME | the username used for authentication; must resolve to non-empty value |
| apiKey | Secret | env://SWIFT_API_KEY | the api-key used for authentication; must resolve to non-empty value |
| region | Secret | env://SWIFT_REGION | optional region to use eg "LON", "ORD" |
| domain | URL | | optional user's domain name |
| tenantId | String | | optional id of the tenant |
| timeout | Duration | 60s | timeout for snapshot-uploads |

Any common snapshot configuration option overrides the global snapshot-configuration.

Generic/MinIO S3 Storage

Uploads snapshots to any S3-compatible server. This storage uses the MinIO Go Client SDK. If your self-hosted S3-server does not support the default addressing-scheme of AWS S3, then this storage might still work.

Minimal Configuration
snapshots:
  storages:
    s3:
      endpoint: <endpoint>
      bucket: <bucket>

Configuration Options

| Key | Type | Required/Default | Description |
| --- | --- | --- | --- |
| endpoint | String | required | S3 compatible storage endpoint (ex: my-storage.example.com) |
| bucket | String | required | bucket to store snapshots in |
| accessKeyId | Secret | env://S3_ACCESS_KEY_ID | specifies the access key |
| accessKey | Secret | env://S3_SECRET_ACCESS_KEY | specifies the secret access key; must resolve to non-empty value if accessKeyId resolves to a non-empty value |
| sessionToken | Secret | env://S3_SESSION_TOKEN | specifies the session token |
| region | Secret | | S3 region if it is required |
| insecure | Boolean | false | whether to connect using https (false) or not (true) |
| skipSSLVerify | Boolean | false | whether to disable SSL certificate validation (true) or not (false) |

Any common snapshot configuration option overrides the global snapshot-configuration.

License

  • Source code is licensed under MIT

Contributors

  • Vault Raft Snapshot Agent was originally developed by @Lucretius
  • contains improvements done by @F21
  • enhancements for azure-uploader by @vikramhansawat
  • support for additional authentication methods based on code from @alexeiser
  • support for Openstack Swift Storage based on code from @Pyjou

