What happened?
The SQLite database used by vcluster grows excessively over time and does not release disk space after data is deleted. In my case, I observed the database file growing to over 10 GB. This can lead to significant disk space consumption for long-running vclusters, potentially causing "disk full" errors and instability. I had to increase controlPlane.statefulSet.persistence.volumeClaim.size several times; whenever the PVC reaches its limit, the whole vCluster goes down.
What did you expect to happen?
I expect vcluster to manage its internal database storage efficiently, preventing it from growing indefinitely and reclaiming unused space when possible.
How can we reproduce it (as minimally and precisely as possible)?
1. Deploy a vcluster using the default SQLite backend.
2. Use the cluster for a while, creating and deleting many resources (e.g., pods, deployments, ConfigMaps). We used GitLab Review Environments extensively.
3. Observe the size of the SQLite database file in the vcluster pod's persistent volume. Notice that the file size only increases, even after resources are deleted (a command sketch follows the list).
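For step 3, something along these lines should work; the /data mount path is an assumption about the pod layout, and the image may or may not ship the tooling:
$ kubectl exec -n <vcluster-namespace> <vcluster-pod> -- find /data -name '*.db' -exec ls -lh {} +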
Anything else we need to know?
This behavior is characteristic of SQLite: pages freed by deleting data are marked for reuse within the database file but are not returned to the operating system, which causes the file size to bloat. (Explanation by Gemini, FYI.) For more context, also see here.
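To confirm that the bloat really is free pages rather than live data, SQLite's standard pragmas can be compared; freelist_count multiplied by page_size is roughly the number of bytes VACUUM can give back:
$ sqlite3 platform-platform-data.db 'PRAGMA page_count; PRAGMA freelist_count; PRAGMA page_size;'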
One solution is to run the SQLite VACUUM command, which rebuilds the database file and reclaims the unused space. The results are dramatic. I was able to shrink a database from 2.1 GB down to just 15 MB:
Before:
$ ls -lh platform-platform-data.db
-rw-r--r--@ 1 me staff 2.1G Aug 20 17:33 platform-platform-data.db
After running VACUUM:
$ sqlite3 platform-platform-data.db 'VACUUM;'
$ ls -lh platform-platform-data.db
-rw-r--r--@ 1 me staff 15M Aug 20 17:34 platform-platform-data.db
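Note that VACUUM takes an exclusive lock and needs temporary free space roughly equal to the database's live data, so it should not run against a database that is in use. As a related option (untested with vcluster), SQLite's auto_vacuum pragma can return free pages to the OS incrementally; enabling it on an existing database only takes effect after one full VACUUM:
$ sqlite3 platform-platform-data.db 'PRAGMA auto_vacuum=INCREMENTAL; VACUUM;'
$ sqlite3 platform-platform-data.db 'PRAGMA incremental_vacuum;'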
Some possible solutions (I don't know whether these are feasible):
- Run VACUUM on startup: this would ensure the database is compacted each time the vcluster restarts. The main drawback is a potential increase in startup time.
- Keep things as they are and let users run VACUUM manually: my approach above could be added to the docs. This is the last-resort solution imho, but it would still help people hitting this problem (a rough sketch follows below).
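For the manual route, a rough sketch of what I have in mind; everything here is an assumption (the image must ship sqlite3, /data/state.db is a placeholder for whatever .db file you actually find on the persistent volume, and nothing should be writing to the database while VACUUM runs, so ideally do this in a maintenance window):
$ kubectl exec -n <vcluster-namespace> <vcluster-pod> -- sqlite3 /data/state.db 'VACUUM;'
Keep in mind that VACUUM writes a compacted copy of the database before swapping it in, which matters when the PVC is already nearly full.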
Host cluster Kubernetes version
$ kubectl version
Client Version: v1.33.1
Kustomize Version: v5.6.0
Server Version: v1.31.6+k3s1
vcluster version
$ vcluster --version
vcluster version 0.27.0
VCluster Config
# My vcluster.yaml / values.yaml here
_vclusterValues = lambda vcluster {
    values = {
        sync = {
            fromHost.secrets = {
                enabled = True
                mappings.byName = {
                    "XXXXX" = "XXXXX"
                }
            }
        }
        controlPlane = {
            distro = {
                k3s = {
                    enabled = True
                    image.tag = "v1.31.6-k3s1"
                }
            }
            backingStore = {
                database = {
                    embedded = {enabled: True}
                }
            }
            service = {
                enabled = True
                spec = {
                    $type = "LoadBalancer"
                }
                annotations = {"lbipam.cilium.io/ips": vcluster.externalIP}
            }
            statefulSet = {
                imagePullPolicy = ""
                image = {
                    registry = "ghcr.io"
                    repository = "loft-sh/vcluster-oss"
                }
                resources = {
                    limits = {
                        "ephemeral-storage" = "8Gi"
                        memory = "4Gi"
                    }
                    requests = {
                        "ephemeral-storage" = "400Mi"
                        cpu = "200m"
                        memory = "256Mi"
                    }
                }
                highAvailability.replicas = 1
                persistence = {
                    volumeClaim = {
                        enabled = "auto"
                        size = "12Gi" # TODO: Investigate why this is growing so fast
                        storageClass = "longhorn-default"
                    }
                }
            }
            proxy = {extraSANs = ["${vcluster.name}.clusters.cistec.io", vcluster.externalIP]}
        }
        policies = {
            resourceQuota.enabled = False
            limitRange.enabled = False
        }
        exportKubeConfig = {
            context = "${vcluster.name}-vcluster"
            server = "https://${vcluster.name}.clusters.cistec.io"
            secret = {
                name = "${vcluster.name}-vcluster-kubeconfig"
                namespace = vcluster.name
            }
        }
        telemetry.enabled = False
    }
}