
Databases and Storage

Process-ing edited this page Apr 13, 2024 · 7 revisions


To meet our storage needs, we will use a two-tier storage system combining flash and HDD storage. Flash storage will be used for all applications requiring high-speed storage, such as our distributed PostgreSQL and MongoDB databases. HDD storage will be used for applications that require large storage capacity.

SSD Storage

All nodes are equipped with SSDs used for the OS and pods.

In the K8s context, SSD storage is used for the distributed databases and for the distributed block storage system (currently Longhorn), which provides general-purpose, reliable distributed storage.

HDD Storage

HDD storage is concentrated in the storage node.

This node is responsible for holding large quantities of less frequently accessed data, like backups and long-term file storage (drive). This storage will be exposed through an NFS server, accessible by the other nodes in the cluster and available to pods with these storage requirements.

Data kept in HDD storage will not be backed up to offsite storage, as it is considered less critical. However, the storage node will have a RAID ? configuration to ensure some level of data integrity in case of disk failure.
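As an illustration of how pods could consume this NFS share, a statically provisioned PersistentVolume/PersistentVolumeClaim pair might look like the following sketch (the server address, export path, and sizes are placeholders, not the actual cluster values):

```yaml
# Hypothetical static PV backed by the storage node's NFS export
apiVersion: v1
kind: PersistentVolume
metadata:
  name: hdd-archive-pv
spec:
  capacity:
    storage: 500Gi
  accessModes:
    - ReadWriteMany          # NFS allows many nodes to mount the share
  nfs:
    server: 10.0.0.10        # storage node address (assumed)
    path: /exports/archive   # export path (assumed)
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hdd-archive-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""       # bind to the pre-created PV, not a provisioner
  resources:
    requests:
      storage: 500Gi
```

Pods with large, infrequently accessed data can then mount `hdd-archive-pvc` like any other claim.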

Databases

The cluster hosts PostgreSQL and MongoDB instances. These instances are distributed and are backed up to the storage node and to offsite storage.

PostgreSQL

Distributed PostgreSQL database using CloudNativePG (referred to as CNPG from now on).

Configuration

To configure PostgreSQL after setting up the cluster, just run the script deploy-cnpg-dev.sh (for development) or deploy-cnpg-prod.sh (for production) from the root of the repository. For production, you may want to set, at the beginning of the script, the port on which you want to expose the service. The scripts do the following:

  1. Install the CNPG operator manifest
  2. Wait for the operator to be available
  3. Create a CNPG cluster in a new namespace pg, with a database tts-db with users tts (owner) and ni (superuser).
  4. Wait a hard-coded amount of time for the first pod to be created (at the time of writing, kubectl does not support waiting for a resource that does not exist yet)
  5. Wait for the rest of the pods to be ready
  6. In the case of the development script, port-forward the specified local port to the service's port
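For reference, the Cluster resource created in step 3 might look roughly like the sketch below. The cluster name, instance count, and storage size are assumptions, and the setup of the ni superuser is omitted; check the script for the actual manifest.

```yaml
# Hypothetical CNPG Cluster manifest matching the script's description
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-cluster        # name assumed
  namespace: pg
spec:
  instances: 3            # assumed replica count
  bootstrap:
    initdb:
      database: tts-db
      owner: tts
  storage:
    size: 10Gi            # assumed
```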

From there, you can connect to the database using the following command:

psql -h localhost -p <port> -U <user> tts-db

MongoDB

To deploy MongoDB, we use the MongoDB Community Kubernetes Operator

Configuration

You can configure MongoDB much like PostgreSQL: just run deploy-mongodb-dev.sh for development or deploy-mongodb-prod.sh for production, again from the root of the repository. In development, the local port can be specified at the beginning of the script. The scripts do the following:

  1. Add the "MongoDB Helm Charts for Kubernetes" repository to Helm
  2. Install the "Custom Resource Definitions" and the "Community Operator" in a new namespace mongodb
  3. Deploy the replica set, with a user ni
  4. Wait a hard-coded amount of time for the first pod to be created (at the time of writing, kubectl does not support waiting for a resource that does not exist yet)
  5. Wait for the rest of the pods to be ready
  6. In the case of the development script, port-forward the specified local port to the service's port
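The replica set deployed in step 3 might look roughly like this MongoDBCommunity resource (member count, version, and secret names are assumptions; check the script and chart values for the actual configuration):

```yaml
# Hypothetical MongoDBCommunity manifest matching the script's description
apiVersion: mongodbcommunity.mongodb.com/v1
kind: MongoDBCommunity
metadata:
  name: mongodb-replica-set     # name assumed
  namespace: mongodb
spec:
  members: 3                    # assumed replica count
  type: ReplicaSet
  version: "6.0.5"              # assumed
  security:
    authentication:
      modes: ["SCRAM"]
  users:
    - name: ni
      db: admin
      passwordSecretRef:
        name: ni-password       # secret name assumed
      roles:
        - name: root            # role assumed
          db: admin
      scramCredentialsSecretName: ni-scram   # assumed
```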

From there, you can connect to the database using the following command:

mongosh --port <port> --username <user> --password <pass>

Longhorn - Distributed flash block storage

Longhorn is a distributed block storage system for Kubernetes that provides features like snapshots and backups. By replicating data across multiple nodes, Longhorn ensures data availability and redundancy even in case of node failure.

Configuration

After making sure all prerequisites are met (the deploy ansible playbook should ensure this) and that a secret with the backup credentials has been created (instructions here), Longhorn can be installed using the script available at services/storage/longhorn/deploy.sh. This script receives as an argument a values file with the desired configuration for the Longhorn installation. The values file should be based on the dev-values.yaml file available at services/storage/longhorn/dev-values.yaml.
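A minimal values file for such an installation might set the backup target and credentials along these lines (bucket, secret name, and replica count are placeholders; see dev-values.yaml for the real baseline):

```yaml
# Sketch of a Longhorn Helm values file, assuming an S3-compatible backup target
defaultSettings:
  backupTarget: s3://longhorn-backups@auto/             # placeholder bucket
  backupTargetCredentialSecret: longhorn-backup-secret  # placeholder secret name
persistence:
  defaultClassReplicaCount: 3                           # assumed
```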

Managing nodes and disks

New nodes are added without any tags and with default disk configuration. You should use the provided k8s node annotations to specify the desired tags and disk configuration.

Warning

Beware that some configurations require Longhorn to be launched with custom settings and that, after initial setup, configurations are not synchronized with the k8s node annotations. This means that to change tags and disk configurations you will need to use the Longhorn UI or directly edit the Longhorn node CRD (lhn).

Managing volumes

Volumes can be created according to the configuration specified by a StorageClass or in the Longhorn UI. They can be provisioned on demand by a PVC, or pre-created and then bound to a PV and PVC. Volumes can be created with kubectl or with the Longhorn UI.
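The on-demand path is the usual one: a PVC that references a Longhorn Storage Class, as in this sketch (the claim name and size are placeholders; longhorn-locality is one of the predefined classes described below):

```yaml
# Hypothetical PVC that lets Longhorn provision a volume on demand
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc                     # name assumed
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn-locality   # predefined Longhorn class
  resources:
    requests:
      storage: 5Gi                      # assumed
```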

Ensuring workloads run in nodes hosting a copy of the volume

The Storage Class parameter parameters.dataLocality controls whether Longhorn tries to keep a replica of the volume in the same node as the workload using it. This is useful for workloads that require high-speed storage, like databases and other high-throughput applications.

Available options are:

  • dataLocality: "disabled" - This is the default option. There may or may not be a replica on the same node as the attached volume (workload).

  • dataLocality: "best-effort" - This option instructs Longhorn to try to keep a replica on the same node as the attached volume (workload). Longhorn will not stop the volume, even if it cannot keep a replica local to the attached volume (workload) due to an environment limitation, e.g. not enough disk space, incompatible disk tags, etc.

  • dataLocality: "strict-local" - This option forces Longhorn to keep the only replica on the same node as the attached volume (workload), and therefore offers higher IOPS and lower latency.

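The options above are set in the Storage Class definition, along these lines (the class name and replica count are hypothetical):

```yaml
# Sketch of a Longhorn Storage Class using dataLocality
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-best-effort   # name hypothetical
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "2"        # assumed
  dataLocality: "best-effort"  # keep a replica near the workload when possible
```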

Auto-balancing

The Storage Class parameter replicaAutoBalance controls where Longhorn keeps the requested replicas of a volume. This helps ensure that all replicas will not end up on the same node, which would make the volume unavailable if that node fails.

Available options are:

  • replicaAutoBalance: "disabled" - This is the default option. No replica auto-balancing is done.

  • replicaAutoBalance: "least-effort" - This option instructs Longhorn to rebalance replicas just enough to achieve minimal redundancy across nodes.

  • replicaAutoBalance: "best-effort" - This option instructs Longhorn to rebalance replicas to achieve an even spread across nodes.

Exclude volume from backups

To exclude a volume from backups, its Storage Class parameter parameters.recurringJobSelector should not include recurring jobs or recurring job groups with backup tasks.
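For example, a Storage Class whose selector only references a snapshot group (the group name here is hypothetical) would get snapshots but never backups:

```yaml
# Storage Class parameters fragment: snapshots only, no backup jobs selected
parameters:
  dataLocality: "disabled"
  recurringJobSelector: '[{"name":"snap", "isGroup":true}]'  # group name assumed
```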

An intro to recurring jobs

Recurring jobs are a way to run tasks periodically in Longhorn. These tasks can run standalone or as part of a recurring job group. Groups that apply to a volume are defined by the Storage Class parameter parameters.recurringJobSelector, as previously mentioned.

Snapshots and backups are done by recurring jobs. Snapshots are taken by snapshot jobs, and backups are done by backup jobs. Other tasks are available.

Predefined Storage Classes

Some Storage Classes with different settings are available at services/storage/longhorn/storageClasses/. Snapshots are enabled for all of them; backups are only enabled when specified.

  • longhorn-strict-local - Storage Class with dataLocality: "strict-local". Backups are enabled. strict-local keeps only one replica per volume, working as a high-speed storage solution but with no redundancy, like the common local-path provisioner.
  • longhorn-strict-local-retain - Storage Class with dataLocality: "strict-local" and reclaimPolicy: "Retain". Backups are enabled. strict-local keeps only one replica per volume, working as a high-speed storage solution but with no redundancy, like the common local-path provisioner.
  • longhorn-locality - Storage Class with dataLocality: "best-effort" and replicaAutoBalance: "least-effort". Backups are enabled.
  • longhorn-retain - Storage Class with replicaAutoBalance: "least-effort" and reclaimPolicy: "Retain". Backups are enabled.
  • longhorn-locality-retain - Storage Class with dataLocality: "best-effort", replicaAutoBalance: "least-effort" and reclaimPolicy: "Retain". Backups are enabled.
  • longhorn-locality-no-backup - Storage Class with dataLocality: "best-effort" and replicaAutoBalance: "least-effort".
  • longhorn-retain-no-backup-retain - Storage Class with dataLocality: "best-effort", replicaAutoBalance: "least-effort" and reclaimPolicy: "Retain".

All classes have fake storage classes with matching names available at services/storage/longhorn/storageClasses/fakeDevClasses/. These do not implement any Longhorn behavior and use local-path provisioning, but they keep the same retention policies.

Backups

Longhorn provides backup mechanisms for volumes using many different strategies.

Snapshots

A snapshot in Longhorn captures the state of a volume at the time it is created. Each snapshot only stores changes that overwrite data from earlier snapshots, so a chain of snapshots is needed to represent the full state of the volume. Volumes can be restored from a snapshot.

The snapshot recurring job is defined to run every day at 00:00 and 12:00. It will keep 6 snapshots of the volume, corresponding to the last 3 days.
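Expressed as a Longhorn RecurringJob resource, that schedule might look like this (the resource name, group, and concurrency are assumptions):

```yaml
# Sketch of the twice-daily snapshot recurring job
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: snapshot-twice-daily   # name assumed
  namespace: longhorn-system
spec:
  cron: "0 0,12 * * *"   # every day at 00:00 and 12:00
  task: snapshot
  groups: ["default"]    # group name assumed
  retain: 6              # keep the last 3 days of snapshots
  concurrency: 2         # assumed
```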

Backups per se

Offsite backups can be made using different storage providers. The most interesting for our use-case are NFS and S3.

We are currently planning to use R2 (Cloudflare's S3-compatible object storage) as the backup storage for Longhorn. It is a cheap and reliable solution with no egress fees.
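Since R2 is S3-compatible, the credential secret Longhorn consumes would presumably carry the access keys plus a custom endpoint, roughly as below (all names and values are placeholders):

```yaml
# Sketch of a backup credential secret for an S3-compatible endpoint such as R2
apiVersion: v1
kind: Secret
metadata:
  name: longhorn-backup-secret   # placeholder secret name
  namespace: longhorn-system
stringData:
  AWS_ACCESS_KEY_ID: "<r2-access-key-id>"
  AWS_SECRET_ACCESS_KEY: "<r2-secret-access-key>"
  AWS_ENDPOINTS: "https://<account-id>.r2.cloudflarestorage.com"
```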

Another option would be to use an NFS server to store backups on the HDD storage node, and then back up this data to offsite storage. This would enable faster recovery, but would be overkill for now.

The backup recurring job is defined to run every Sunday at 03:00. It will keep 12 backups of the volume, corresponding to the last 3 months.
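Analogously to the snapshot job, the weekly backup job might be defined like this (again, the resource name, group, and concurrency are assumptions):

```yaml
# Sketch of the weekly backup recurring job
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: backup-weekly     # name assumed
  namespace: longhorn-system
spec:
  cron: "0 3 * * 0"       # every Sunday at 03:00
  task: backup
  groups: ["default"]     # group name assumed
  retain: 12              # keep the last 3 months of weekly backups
  concurrency: 1          # assumed
```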