feat: Periodically VACUUM SQLite Database to Reclaim Disk Space? #3113

@raweber42

Description

What happened?

The SQLite database used by vcluster grows excessively over time and does not release disk space after data is deleted. In my case, I observed the database file growing to over 10 GB. This can lead to significant disk space consumption for long-running vclusters, potentially causing "disk full" errors and instability. I had to increase controlPlane.statefulSet.persistence.volumeClaim.size several times; whenever the PVC reaches its limit, this leads to downtime of the whole vCluster.

What did you expect to happen?

I expect vcluster to manage its internal database storage efficiently, preventing it from growing indefinitely and reclaiming unused space when possible.

How can we reproduce it (as minimally and precisely as possible)?

  1. Deploy a vcluster using the default SQLite backend.

  2. Use the cluster for a while, creating and deleting many resources (e.g., pods, deployments, configmaps). We used Review Environments on GitLab extensively.

  3. Observe the size of the SQLite database file in the vcluster pod's persistent volume. Notice that the file size only increases, even after resources are deleted.

Anything else we need to know?

This behavior is characteristic of SQLite: pages freed by deleted data are marked for reuse within the database file, but the space is never returned to the operating system, so the file only ever grows. (Explanation by Gemini, fyi.)
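This free-page bookkeeping is visible through SQLite's pragmas. As a minimal, self-contained sketch (Python's stdlib sqlite3; the throwaway table and row counts are made up for illustration), freelist_count vs. page_count shows how much of the file is dead space; running the same two pragmas against the vcluster data file would give the reclaimable estimate there:

```python
import os
import sqlite3
import tempfile

# Build a throwaway database, fill it, then delete everything.
db = os.path.join(tempfile.mkdtemp(), "demo.db")
con = sqlite3.connect(db)
con.execute("CREATE TABLE kv (k INTEGER PRIMARY KEY, v BLOB)")
con.executemany("INSERT INTO kv (v) VALUES (?)", ((b"x" * 1024,) for _ in range(2000)))
con.commit()
con.execute("DELETE FROM kv")
con.commit()

# Freed pages stay inside the file on the freelist; none go back to the OS.
page_size = con.execute("PRAGMA page_size").fetchone()[0]
free_pages = con.execute("PRAGMA freelist_count").fetchone()[0]
total_pages = con.execute("PRAGMA page_count").fetchone()[0]
print(f"{free_pages} of {total_pages} pages are free "
      f"(~{free_pages * page_size // 1024} KiB reclaimable by VACUUM)")
con.close()
```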

One solution is to run the SQLite VACUUM command, which rebuilds the database file and reclaims the unused space. The results are dramatic. I was able to shrink a database from 2.1 GB down to just 15 MB:

Before:

$ ls -lh platform-platform-data.db
-rw-r--r--@ 1 me  staff   2.1G Aug 20 17:33 platform-platform-data.db

After running VACUUM:

$ sqlite3 platform-platform-data.db 'VACUUM;'

$ ls -lh platform-platform-data.db
-rw-r--r--@ 1 me  staff    15M Aug 20 17:34 platform-platform-data.db
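The same before/after effect is easy to reproduce outside vcluster. A minimal sketch with Python's stdlib sqlite3 (throwaway file; row counts are illustrative): the file stays at full size after the DELETE and only shrinks once VACUUM rewrites it:

```python
import os
import sqlite3
import tempfile

db = os.path.join(tempfile.mkdtemp(), "demo.db")
con = sqlite3.connect(db)
con.execute("CREATE TABLE kv (k INTEGER PRIMARY KEY, v BLOB)")
con.executemany("INSERT INTO kv (v) VALUES (?)", ((b"x" * 1024,) for _ in range(10_000)))
con.commit()

con.execute("DELETE FROM kv")
con.commit()
before = os.path.getsize(db)  # still ~10 MB: deleted pages are only marked free

con.execute("VACUUM")         # rebuilds the file and truncates it
after = os.path.getsize(db)
print(f"before VACUUM: {before} bytes, after: {after} bytes")
con.close()
```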

Some possible solutions (I don't know whether these are feasible):

  1. Run VACUUM on startup: this would ensure the database is compacted each time the vcluster restarts. The main drawback is a potential increase in startup time.

  2. Keep things as they are and let users run VACUUM manually: we could document this workaround. It's a last-resort option imho, but it would still help people who hit this problem.
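For completeness, SQLite also offers PRAGMA auto_vacuum = FULL, which truncates the file as pages are freed, at the cost of fragmentation and some write overhead. It would have to be enabled by vcluster itself (and on an existing file it only takes effect after a one-time VACUUM), so the following is just a sketch of the mechanism, not a claim about vcluster's current behavior:

```python
import os
import sqlite3
import tempfile

db = os.path.join(tempfile.mkdtemp(), "demo.db")
con = sqlite3.connect(db)
con.execute("PRAGMA auto_vacuum = FULL")  # must be set before pages exist...
con.execute("VACUUM")                     # ...or applied to an existing file via VACUUM
con.execute("CREATE TABLE kv (k INTEGER PRIMARY KEY, v BLOB)")
con.executemany("INSERT INTO kv (v) VALUES (?)", ((b"x" * 1024,) for _ in range(10_000)))
con.commit()
grown = os.path.getsize(db)

con.execute("DELETE FROM kv")
con.commit()                              # with FULL auto_vacuum, the commit truncates the file
shrunk = os.path.getsize(db)
print(f"after insert: {grown} bytes, after delete: {shrunk} bytes")
con.close()
```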

Host cluster Kubernetes version

$ kubectl version
Client Version: v1.33.1
Kustomize Version: v5.6.0
Server Version: v1.31.6+k3s1

vcluster version

$ vcluster --version
vcluster version 0.27.0

VCluster Config

# My vcluster.yaml / values.yaml here
_vclusterValues = lambda vcluster {
    values = {
        sync = {
            fromHost.secrets = {
                    enabled = True
                    mappings.byName = {
                        "XXXXX" = "XXXXX"
                    }
            }
        }
        controlPlane = {
            distro = {
                k3s = {
                    enabled = True
                    image.tag = "v1.31.6-k3s1"
                }
            }
            backingStore = {
                database = {
                    embedded = {enabled: True}
                }
            }
            service = {
                enabled = True
                spec = {
                    $type = "LoadBalancer"
                }
                annotations = {"lbipam.cilium.io/ips": vcluster.externalIP}
            }
            statefulSet = {
                imagePullPolicy = ""
                image = {
                    registry = "ghcr.io"
                    repository = "loft-sh/vcluster-oss"
                }
                resources = {
                    limits = {
                        "ephemeral-storage" = "8Gi"
                        memory = "4Gi"
                    }
                    requests = {
                        "ephemeral-storage" = "400Mi"
                        cpu = "200m"
                        memory = "256Mi"
                    }
                }
                highAvailability.replicas = 1
                persistence = {
                    volumeClaim = {
                        enabled = "auto"
                        size = "12Gi" # TODO: Investigate why this is growing so fast
                        storageClass = "longhorn-default"
                    }
                }
            }
            proxy = {extraSANs = ["${vcluster.name}.clusters.cistec.io", vcluster.externalIP]}
        }
        policies = {
            resourceQuota.enabled = False
            limitRange.enabled = False
        }
        exportKubeConfig = {
            context = "${vcluster.name}-vcluster"
            server = "https://${vcluster.name}.clusters.cistec.io"
            secret = {
                name = "${vcluster.name}-vcluster-kubeconfig"
                namespace = vcluster.name
            }
        }
        telemetry.enabled = False
    }
}
