Volume Management #8367

Open
6 of 11 tasks
Tracked by #8484 ...
smira opened this issue Feb 26, 2024 · 48 comments

@smira
Member

smira commented Feb 26, 2024

Closely related: #8016

Problem Statement

Talos Linux is not flexible in the way it manages volumes: it occupies the whole system disk, creating an EPHEMERAL partition that covers 99% of the disk space. User disk management is fragile, requires extra steps to work properly (mounting into the kubelet), doesn’t support wiping disks, etc. Talos does not properly detect various partition types, which can lead to wiping user data (e.g. Ceph BlueStore).

The following requests from users/customers can’t be addressed in the current design:

  • running Talos network booted (i.e. with STATE / EPHEMERAL on tmpfs)
  • running Talos installed e.g. to the SBC’s SD card, but mounting /var from an NVMe/SSD
  • Azure VMs with directly attached NVMes will always install Talos to the network volume, so locally attached NVMe can’t be used for e.g. containerd state
  • splitting etcd data directory to a separate disk
  • performing wipe operations on the contents of /var:
    • wiping system directories (e.g. containerd state)
    • wiping user data (e.g. /var/mnt/foo)
  • disk encryption of the user disks (volumes)
  • read-only user disks mounts (e.g. mounting my precious photo archive to the Talos machine, making sure that Talos never touches the contents)
  • volume/disk management operations without reboot:
    • wiping user disks/volumes
    • wiping system areas
  • creating additional user-managed partitions on the system disk:
    • /data
    • swap space
  • container image cache storage
  • some log storage persistent across reboots (e.g. storing installation logs during staged upgrades)

The proposed design provides an option to solve the issues mentioned above.

Groundwork

Before we move on to volume management operations, some groundwork needs to be done to improve block device management:

  • Talos should quickly and reliably detect various filesystem/partition types, including the most common ones and those used in Kubernetes clusters. At a minimum, detecting a filesystem/partition prevents the disk from being considered empty and eligible for allocation.
  • Block devices/partitions should be presented as resources in a way that allows rendering them as a tree, presenting the user with a view of available block devices, their types, partitions, filesystem types, available space, etc.
  • Talos should detect and reliably show information that matches standard Linux tools, e.g. blkid, to allow easier identification of storage objects.

Installation Process

Talos installation should do the bare minimum to make sure that Talos can be booted from the disk, without touching anything that is not strictly required to boot Talos. This might include installing Talos without a machine configuration.

So the install should only touch the following partitions/objects:

  • BOOT / EFI partitions (boot assets, boot loader itself)
  • META partition
  • bootloader-specific stuff, e.g. writing to the MBR

Any management of storage/volumes should be deferred to Talos running on the host (i.e. creating /var, /system/state, etc.).

Volumes

Let’s introduce a new concept of volumes, which addresses the issues mentioned above and allows us to take storage management to the next level.

There are two kinds of volumes:

  • system volumes (i.e. required by Talos, and Talos provides default configuration for them if no other configuration is available)
  • user volumes (configured and managed by users, optional)

Every volume has several key features:

  • lookup: Talos can find the volume by some selector, or say that the volume is not available
  • provisioning (optional): if the volume is not available, Talos can provision the volume (e.g. create a partition), so that it can be looked up, and it becomes available
  • parent: if the volume has a parent volume, it creates a dependency relationship between them
  • mount path: might create another dependency on the volume which provides the mount path

Volumes support a basic set of operations:

  • Mount/Unmount
  • Wipe (requires unmounting first)
  • Destroy (implies Wipe, but removes provisioned volumes)

Volume types:

  • disk (e.g. use the whole disk)
  • partition (allocate a partition on the disk)
  • subdirectory (a sub-path on another volume)

Volume formats:

  • filesystem (or none)
  • encryption (or none)

Volume additional options:

  • criticality (the volume must be available before pods can be started)
  • mounted into the kubelet (see the snippet below)
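
For illustration: the "mounted into the kubelet" option corresponds to what is done manually today via machine.kubelet.extraMounts. The snippet below is taken from a working example later in this thread (the /var/lib/longhorn path is just that example's choice):

machine:
  kubelet:
    extraMounts:
      - destination: /var/lib/longhorn # path as seen inside the kubelet container
        type: bind                     # bind-mount of a host path
        source: /var/lib/longhorn      # host path backed by a user partition
        options:
          - rbind
          - rshared
          - rw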

System Volumes

As of today, Talos implicitly has the following volumes:

Name               Lookup                                  Provisioning                                                                Format
STATE              a partition with the label STATE        create a partition on the system disk of size X MiB                         xfs, optionally encrypted
EPHEMERAL          a partition with the label EPHEMERAL    create a partition on the system disk which occupies all remaining space    xfs, optionally encrypted
etcd data          -                                       subdirectory of EPHEMERAL, /var/lib/etcd                                    -
containerd data    -                                       subdirectory of EPHEMERAL, /var/lib/containerd                              -
kubelet data       -                                       subdirectory of EPHEMERAL, /var/lib/kubelet                                 -
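
For illustration only, here is a sketch of what the implicit EPHEMERAL defaults could look like if they were expressed as an explicit VolumeConfig document. The field names are taken from the VolumeConfig examples that appear later in this thread; the exact values shown here are assumptions:

apiVersion: v1alpha1
kind: VolumeConfig
name: EPHEMERAL
provisioning:
  diskSelector:
    match: system_disk # assumption: select the system disk, as in a later example in this thread
  grow: true           # assumption: grow to occupy all remaining space, matching the table above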

Volume Lifecycle

Talos services can express their dependencies on volumes. For example, the kubelet service can only be started when the kubelet data volume is available. In the same way, if the kubelet data volume is going to be unmounted, the kubelet should be stopped first.

The boot process should naturally stop when a required volume is not available. E.g. Talos maintenance mode implies that boot can’t proceed as long as the volume configuration is not available.

Volume Configuration

System volumes have implicit configuration, which is applied as long as v1alpha1.Config is applied to the machine. Some properties are configurable in v1alpha1.Config, e.g. disk encryption. If an explicit volume configuration is provided, Talos uses that.

For example, if the user configures EPHEMERAL to be tmpfs of size 10 GiB, it will be created on each boot as instructed.

Users might provide configuration for user volumes (similar to the user disks feature today), which might be critical for pods to be started; additionally, e.g. extension services might declare a dependency on additional volumes.

Some system volumes might be optional, i.e. only created when configured by the user - for example, the container image cache.
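
As a sketch of an explicit override (based on the VolumeConfig documents shown in the comments below), a user could constrain EPHEMERAL to a fixed size on an NVMe disk; the selector expression and size are just examples:

apiVersion: v1alpha1
kind: VolumeConfig
name: EPHEMERAL
provisioning:
  diskSelector:
    match: disk.transport == "nvme" # example selector taken from a later comment
  maxSize: 100GiB                   # example cap on the EPHEMERAL partition size
  grow: false                       # do not grow to fill the disk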

Upgrades and Wiping

Talos Linux upgrades should not wipe anything by default, and wiping should be an additional operation which can be done without an upgrade, or can optionally be combined with an upgrade.

The upgrade itself should only modify boot assets/the boot loader, i.e. ensure that the new version of Talos Linux can be booted from the disk device.

Wiping is volume-based, examples:

  • I want to wipe EPHEMERAL, which implies wiping all volumes which have EPHEMERAL as a parent (e.g. subdirectory volume of /var/lib/etcd); all services which depend on EPHEMERAL or its children should be stopped, but reboot is not necessary, as the EPHEMERAL will be re-provisioned after the wipe
  • I want to wipe etcd data, which in the default configuration implies leaving etcd, stopping etcd services, performing rm -rf /var/lib/etcd, and re-starting etcd join process

Notes

As pointed out by @utkuozdemir, EPHEMERAL might be a bad name given that the partition is not supposed to be force-wiped by default.

@smira
Member Author

smira commented Mar 12, 2024

This feature is going to be shifted to Talos 1.8.0 (only first bits might appear in Talos 1.7.0).

Talos 1.8.0 will be released as soon as this feature is ready.

@runningman84

Some software like longhorn might not respect limits and fill the whole disk. It would be great if a misbehaving pod cannot destroy etcd or other core parts of talos by just claiming all the available disk space.

@andrewrynhard
Member

Some software like longhorn might not respect limits and fill the whole disk. It would be great if a misbehaving pod cannot destroy etcd or other core parts of talos by just claiming all the available disk space.

This is good to know. I always liked the separation of Talos and it using a dedicated disk to prevent unknowns/complications like this. Any ideas on how we could impose those limitations?

@runningman84

runningman84 commented May 9, 2024

From my point of view, something like LVM and partitions for each part would help. I used a similar setup in k3s and never had issues like this.

LVM would also make the encryption part easy because you only have to encrypt one device…

@bplein

bplein commented May 9, 2024

Allow the choice of any block device.

A partition is also a block device. People could partition a single SSD with sufficient space for Talos and then an additional partition for general use. Filling up the general use partition isn’t going to affect the Talos partition(s)

@PeterFalken

Allow the choice of any block device.

A partition is also a block device. People could partition a single SSD with sufficient space for Talos and then an additional partition for general use. Filling up the general use partition isn’t going to affect the Talos partition(s)

This would be similar to what the newer ESXi installer does when using the systemMediaSize option: it allows the installer to create the system & OS partitions at the beginning of the disk while leaving free space at the end of the disk.

@runningman84

I think at minimum we would need two partitions or LVM volumes:
Talos (etcd and other stuff)
General purpose (container data and so on)

It would be great if we could have an option to say: okay, we also need 100 GB of Longhorn space and 50 GB of local-path space. Those are just examples; we would just need a volume size and a mount path. All remaining space could be assigned to the general-purpose partition. The default setting should be to use all space.

With something like LVM we could also allow fixing the general volume to a specific size and leaving the remaining space unused. That would allow for expansion of other volumes, or ensure that all nodes are the same even if one has a bigger disk.

@cortopy

cortopy commented May 18, 2024

This feature is going to be shifted to Talos 1.8.0 (only first bits might appear in Talos 1.7.0).

Talos 1.8.0 will be released as soon as this feature is ready.

Thank you for clarifying @smira! If I set up a cluster with 1.7 today, will there be a migration path in 1.8 to have talos managing disks as proposed in this issue?

@smira
Member Author

smira commented May 20, 2024

Thank you for clarifying @smira! If I set up a cluster with 1.7 today, will there be a migration path in 1.8 to have talos managing disks as proposed in this issue?

Talos is always backwards compatible, so an upgrade to 1.8 will always work. You would be able to start using volume management features, but some of them (e.g. shrinking /var) might require a wipe of some volumes.

@laibe

laibe commented May 28, 2024

OpenEBS had a component called ndm (node-disk-manager) that was quite handy for managing block devices. HostPath and OS disks could be excluded with filters, e.g.:

    filterconfigs:
      - key: os-disk-exclude-filter
        name: os disk exclude filter
        state: true
        exclude: "/,/etc/hosts,/boot,/var/mnt/openebs/nvme-hostpath-xfs"

This was used by the localpv-device StorageClass, letting you assign a whole block device to a pod. Unfortunately they have stopped supporting ndm and localpv-device with the release of OpenEBS 4.0.

It would be great if talos had a similar feature!

smira added a commit to smira/talos that referenced this issue Jun 12, 2024
smira added a commit to smira/talos that referenced this issue Jul 8, 2024
@chr0n1x

chr0n1x commented Jul 8, 2024

this is incredibly exciting, happy to give it a whirl once you get an RC/beta or something @smira . thank you!

smira added a commit to smira/talos that referenced this issue Jul 9, 2024
This is early WIP.

See siderolabs#8367

Signed-off-by: Andrey Smirnov <[email protected]>
@PrivatePuffin

@smira TopoLVM requires a pvcreate and vgcreate to be able to allocate remaining free disk space.

What I get from this issue is that we would at least be able to allocate the system disk with free space remaining, which is already 99% of the way there!

Does it also allow us to use LVM, using pvcreate and vgcreate, to consume the rest of the disk space?
Note: not a lvm specialist at all.

@threeseed

Was asked to add some notes about software RAID for on-premises, i.e. metal. Not looking for a response, just wanted to add another viewpoint.

  1. I appreciate that the goal of Kubernetes is for disposable hardware. But that’s not the reality these days where GPUs are expensive and hard to come by. So for us we would rather not run workloads as opposed to spend another 20k just to have a backup GPU server sitting there idle in case of failure. So any improvement in per-node reliability is a big deal for us.

  2. Hardware RAID is not an option for Intel hardware as VROC/DST only works with Intel SSDs. And using third-party cards is difficult because the format is often proprietary. So if you can’t easily get a replacement card, your data is lost.

  3. Using OpenEBS, Ceph etc is an issue because we aren’t looking to replicate data across multiple machines. This cripples performance where our data sizes are large and we are sensitive to latency. We just want the data on our node to be more available. Also it’s not exactly a simple piece of software to operate.

@PrivatePuffin

PrivatePuffin commented Nov 22, 2024

@nadenf I dont get what you aim for here with this response.
It adds nothing to the issue, which is just about adding volume management.

  1. Completely irrelevant to this issue about volume management.
  2. Ok, so what does this have to do with hardware raid? Nothing at all.
    3.a. OpenEBS LocalPV is not a replicated form of storage.
    3.b. OpenEBS ALSO offers a ZFS backend.
    3.c. This issue has absolutely NOTHING to do with availability in itself.

You're sharing a viewpoint that is irrelevant to the issue. As it's already been decided to add this feature, your opinion or implementation of it (aka viewpoint) is of absolutely zero relevance.

Beyond that, just because you say you're not looking for responses, posting off-topic crap like this (which even contains wrong information) is always going to get the inherent response of "stfu".

You're pushing yourself into the notifications of all 14 people watching this issue with information that is of zero relevance. It has as much relevance to this issue as sharing your favorite recipe for fried kangaroo.

@threeseed

@nadenf I dont get what you aim for here with this response.
It adds nothing to the issue, which is just about adding volume management.

I was asked to add it here. By Steve Francis, the CEO of Sidero Labs.

And perhaps the thinking is that, since a RAID is a form of volume, the design, whilst not supporting software RAID, could at least be done in a way that does not prevent it from being added in the future.

@PrivatePuffin

I was asked to add it here. By Steve Francis, the CEO of Sidero Labs.

I bet you also emailed him some off-topic crap out-of-the-blue like you did here.

And perhaps the thinking is that, since a RAID is a form of volume, the design, whilst not supporting software RAID, could at least be done in a way that does not prevent it from being added in the future.

In that case, freaking say that.
Instead of rambling on about all sorts of things that have nothing to do with the issue.

@steverfrancis
Collaborator

@PrivatePuffin, I (Sidero CEO) did ask @nadenf to post to this epic.

Looking closer, it is the wrong epic - it's volume, not disk/partition management, but that is on me.

Posting why a user wants a feature, without asking for a specific response, is very valuable to us.

The reason Talos has such limited volume/partition/disk management features is that it has a very architecturally pure design - and in the pure design, it is best if Talos owns the entire disk, which it can completely wipe/erase/repartition on any upgrade, all other disk management is done on other disks, and if things go awry, you add another node (cattle, not pets).

This doesn't work in real life, but our product team appreciates reminders as to why, which is exactly the context that was posted.

While I appreciate your thoughts to keep issues relevant, your impolite and aggressive tone is completely inappropriate.

To quote yourself, you could have just said:

"I dont get what you aim for here with this response.
It adds nothing to the issue, which is just about adding volume management."

Instead, you, @PrivatePuffin, are "rambling on about all sorts of things that have nothing to do with the issue" or the point you are trying to make, and further are being rude and leaving a bad taste in the community.
And then you exacerbate the issue of "pushing yourself into the notifications of all 14 people" by adding another snide rebuttal.

@0dragosh

Just wanted to pop in and say the entire community loves Talos and just ignore the one butthead.

Keep up the good work guys!

@absolutejam

Given the new implementation and support for configuring the EPHEMERAL volume, is it now possible to split a disk between machine.disks and VolumeConfig? If I configure machine.disks[0].partitions[0].size somewhat smaller than the associated NVMe drive, and then configure a VolumeConfig with diskSelector targeting the same drive, will it automagically add and use a second partition for the EPHEMERAL volume? (Use case: getting the most out of internal NVMe in a homelab Turing Pi 2 cluster with Longhorn…)

Did you have any joy with this approach @isometry? Would love to see an example if so!

@maxpain
Contributor

maxpain commented Dec 1, 2024

It would be helpful to create custom partitions.
In my case, my blades have two HDDs for Ceph and only one M.2 NVMe for Talos Linux. I want to use this SSD not only for Talos but also for Ceph metadata, but it requires a dedicated block device (it's impossible to use hostPath).

@maxpain
Contributor

maxpain commented Dec 3, 2024

Any workarounds for now?
I set a fixed size for EPHEMERAL volume using VolumeConfig:

apiVersion: v1alpha1
kind: VolumeConfig
name: EPHEMERAL
provisioning:
  maxSize: 30GiB
  grow: false

And then manually created a custom volume for Ceph metadata:

(screenshot omitted)

How bad is this as a temporary solution? I even tried talosctl upgrade, and it works.

@smira
Member Author

smira commented Dec 3, 2024

It's designed to work this way. Keep in mind that talosctl reset without specific labels will still blow up the whole disk, but we are working on a solution for that as well.

@maxpain
Contributor

maxpain commented Dec 15, 2024

It would be helpful to create custom partitions. In my case, my blades have two HDDs for Ceph and only one M.2 NVMe for Talos Linux. I want to use this SSD not only for Talos but also for Ceph metadata, but it requires a dedicated block device (it's impossible to use hostPath).

Unfortunately, Ceph doesn't support the use of partitions for OSD:

rookcmd: failed to configure devices: failed to initialize osd: Partition device nvme0n1p7 can not be specified as metadataDevice in the global OSD configuration or in the node level OSD configuration

It would be cool to have a way to manage NVMe namespaces in the machine configuration.

@davralin

davralin commented Dec 15, 2024

It would be helpful to create custom partitions. In my case, my blades have two HDDs for Ceph and only one M.2 NVMe for Talos Linux. I want to use this SSD not only for Talos but also for Ceph metadata, but it requires a dedicated block device (it's impossible to use hostPath).

Unfortunately, Ceph doesn't support the use of partitions for OSD:


rookcmd: failed to configure devices: failed to initialize osd: Partition device nvme0n1p7 can not be specified as metadataDevice in the global OSD configuration or in the node level OSD configuration

It would be cool to have a way to manage NVMe namespaces in the machine configuration.

I have previously used partitions for rook, so I know this used to work, but that was with kubeadm-clusters, so things might have changed since I switched to talos 2 years ago.

That being said, this mentions partitions several times as something one can use.

Edit: I would try wiping the partition if you are sure it's ready for rook

@seang96

seang96 commented Dec 15, 2024

My cluster runs NixOS right now, waiting for this feature in Talos. I specify the disk UUID for Ceph, though, not the partition name, and it works fine.

@maxpain
Contributor

maxpain commented Dec 15, 2024

That being said, this mentions partitions several times as something one can use.

Hmm.

(screenshot omitted)

@halittiryaki

After shrinking the EPHEMERAL volume, how can I mount the remaining disk space as a separate partition to e.g. /var/mnt/persistent?

apiVersion: v1alpha1
kind: VolumeConfig
name: EPHEMERAL
provisioning:
  diskSelector:
    match: system_disk
  minSize: 200GB
  maxSize: 200GB
  grow: false

The disks section in the machine config is only able to select devices.
So the only option currently would be to use static pods/DaemonSets that call mount for a manually created partition on the remaining space?

@halittiryaki

Any workarounds for now? I set a fixed size for EPHEMERAL volume using VolumeConfig:

apiVersion: v1alpha1
kind: VolumeConfig
name: EPHEMERAL
provisioning:
  maxSize: 30GiB
  grow: false

And then manually created a custom volume for Ceph metadata:

(screenshot omitted)

How bad is this as a temporary solution? I even tried talosctl upgrade, and it works.

did you manage to manually create and bind the volume with talos?

@maxpain
Contributor

maxpain commented Dec 15, 2024

did you manage to manually create and bind the volume with talos?

No, each node in my cluster has two HDDs for Ceph data and one NVMe SSD for Talos and Ceph metadata.
I decided to use NVMe namespaces instead. It allows me to have two or more "virtual" block devices (/dev/nvme0n1 and /dev/nvme0n2) instead of partitions, so I no longer need VolumeConfig.

@isometry

Did you have any joy with this approach @isometry? Would love to see an example if so!

Yes, I've (only just) finished testing this successfully.

I applied the following configuration snippets to a freshly reset Turing RK1 with Talos v1.9.0 installed to the MMC, and with a single, unpartitioned 1TB NVMe drive attached at /dev/nvme0n1:

machine:
  # ...elided...
  disks:
    - device: /dev/nvme0n1
      partitions:
        - mountpoint: /var/lib/longhorn
          size: 800GB
  # ...elided...
  kubelet:
    extraMounts:
      - destination: /var/lib/longhorn # Destination is the absolute path where the mount will be placed in the container.
        type: bind # Type specifies the mount kind.
        source: /var/lib/longhorn # Source specifies the source path of the mount.
        # Options are fstab style mount options.
        options:
          - rbind
          - rshared
          - rw
  # ...elided...
---
apiVersion: v1alpha1
kind: VolumeConfig
name: EPHEMERAL
provisioning:
  diskSelector:
    match: disk.transport == "nvme"
  maxSize: 128GB
  grow: false

This results in the following volume configuration:

$ talosctl get volumestatus
NODE         NAMESPACE   TYPE           ID               VERSION   PHASE   LOCATION         SIZE
turingpi-1   runtime     VolumeStatus   /dev/nvme0n1-1   2         ready   /dev/nvme0n1p2   800 GB
turingpi-1   runtime     VolumeStatus   EPHEMERAL        1         ready   /dev/nvme0n1p1   128 GB
turingpi-1   runtime     VolumeStatus   META             2         ready   /dev/mmcblk0p4   1.0 MB
turingpi-1   runtime     VolumeStatus   STATE            3         ready   /dev/mmcblk0p5   105 MB

$ talosctl mounts | egrep 'NODE|nvme'
NODE         FILESYSTEM       SIZE(GB)   USED(GB)   AVAILABLE(GB)   PERCENT USED   MOUNTED ON
turingpi-1   /dev/nvme0n1p1   127.93     5.44       122.50          4.25%          /var
turingpi-1   /dev/nvme0n1p2   871.78     16.76      855.01          1.92%          /var/lib/longhorn

I was a little surprised that the EPHEMERAL volume is partitioned before the machine disk, but it's fine. I also tested with multiple machine.disks[].partitions[], and this works much the same.

Longhorn is happily running atop the /var/lib/longhorn mount.

@isometry

isometry commented Dec 27, 2024

Warning: following my previously described configuration, local storage is broken by an upgrade (to v1.9.1) :-/

$ talosctl -n 10.0.88.11 get volumestatus
NODE         NAMESPACE   TYPE           ID               VERSION   PHASE   LOCATION         SIZE
turingpi-1   runtime     VolumeStatus   /dev/nvme0n1-1   1         ready   /dev/nvme0n1p1   128 GB
turingpi-1   runtime     VolumeStatus   EPHEMERAL        1         ready   /dev/nvme0n1p1   128 GB
turingpi-1   runtime     VolumeStatus   META             2         ready   /dev/mmcblk0p4   1.0 MB
turingpi-1   runtime     VolumeStatus   STATE            2         ready   /dev/mmcblk0p5   105 MB

I suspect that it is necessary to pre-partition the NVMe for all machine.disks[].partitions[] before applying any VolumeConfig patches to avoid this... Will try this now.

UPDATE: Confirmed. After pre-creating the appropriate machine.disks partitions, things work as I expect.

I personally did this via:

  1. talosctl reset;
  2. re-imaging v1.9.0 without the VolumeConfig patch but with a size set on the machine.disks partition;
  3. talosctl reset --wipe-mode system-disk;
  4. re-imaging v1.9.0 with the VolumeConfig patch;
  5. upgrading to v1.9.1 to confirm that the previous regression was addressed.

@smira
Member Author

smira commented Dec 27, 2024

User disks (.machine.disks) are not compatible with random placement of the EPHEMERAL partition. They will eventually be deprecated and replaced with a better mechanism.

@Blarc
Contributor

Blarc commented Dec 28, 2024

I might be a bit too early, and this might not be implemented yet, or I might have completely misunderstood how this works, but I tried following what @isometry wrote:

(quoting @isometry's full machine.disks + VolumeConfig configuration and results from the comment above)

But I always get:

192.168.1.16: user: warning: [2024-12-28T18:25:29.291534284Z]: [talos] volume status {"component": "controller-runtime", "controller": "block.VolumeManagerController", "volume": "/dev/sda-1", "phase": "failed -> failed", "error": "filesystem type mismatch: vfat != xfs", "location": "/dev/sda1", "parentLocation": "/dev/sda"}

The EPHEMERAL is correctly sized to 100Gb, but the following machine mount does not work:

machine:
  disks:
      - device: /dev/sda
        partitions:
          - mountpoint: /var/local
            size: 400GB

@isometry

I might be a bit too early, and this might not be implemented yet, or I completely miss understood how this works, but I tried following to what @isometry wrote:
...
The EPHEMERAL is correctly sized to 100Gb, but the following machine mount does not work:
...

@Blarc check my follow-up warning. Ensure that your target disk has a GPT label and a single 400GB partition (which will be used for your machine.disks configuration) before you apply the initial Talos configuration (I personally achieved this by using a debug container on an already initialised node, then resetting just system disks). During initialisation, the VolumeConfig doc should create a second EPHEMERAL partition according to your spec. Note that you'll need to fully wipe/reset the system first to ensure the existing EPHEMERAL partition is removed; you do not want it as the first partition on your disk as that will collide with your machine.disks config when you upgrade.

@Blarc
Contributor

Blarc commented Dec 29, 2024

(quoting @isometry's advice from the comment above)

Is there any chance you could provide an example of how to ensure the disk has a GPT label?

@isometry

Is there any chance you could provide an example of how to ensure the disk has a GPT label?

I don't think GPT or not matters. Just make sure that your disk contains a single 400GB partition before you apply your configuration (most easily done by resetting the host, applying just the machine.disks configuration, then resetting only the system disk, and applying with the VolumeConfig doc next time).

@rserbitar

rserbitar commented Jan 1, 2025

Hi! So is there any documentation (except the template in the .yaml config file) on how to use machine.disks and how it interacts with the default EPHEMERAL? Or is this currently in a testing state anyway? Setting Talos up in the way @isometry recommends (meaning requiring a certain order of commands to initialize) is kind of exactly what Talos wants to avoid, right?

Also, if I set machine.disks like this:

    disks:
        - device: /dev/nvme0n1 # The name of the disk to use.
          # A list of partitions to create on the disk.
          partitions:
            - mountpoint: /var/lib/longhorn # Where to mount the partition.
              
              # # The size of partition: either bytes or human readable representation. If `size:` is omitted, the partition is sized to occupy the full disk.

              # # Human readable representation.
              size: 400GB
              # # Precise value in bytes.
              # size: 1073741824

I get the following:

NODE            NAMESPACE   TYPE           ID               VERSION   PHASE    LOCATION         SIZE
192.168.2.136   runtime     VolumeStatus   /dev/nvme0n1-1   1         failed   /dev/nvme0n1p1   105 MB
192.168.2.136   runtime     VolumeStatus   EPHEMERAL        1         ready    /dev/nvme0n1p6   511 GB
192.168.2.136   runtime     VolumeStatus   META             2         ready    /dev/nvme0n1p4   1.0 MB
192.168.2.136   runtime     VolumeStatus   STATE            2         ready    /dev/nvme0n1p5   105 MB

(not the proper size, and the phase is failed).

Same result if I set the EPHEMERAL volume to e.g. 40 GB using a VolumeConfig entry.

@isometry

@rserbitar "quite". From what I can tell, the machine disks mechanism is strictly not currently appropriate for trying to create or manage non-system partitions on the system disk. The longhorn/ephemeral on NVMe configuration I found to work is sub-optimal, as you say, in that it requires multiple configuration passes, and only works at all when you have (at least) a second disk for machine disks (I'm using the Turing RK1 embedded MMC for Talos system/meta/state, on the understanding that these should be almost read-only during normal operation).

@smira smira added this to Product Jan 17, 2025
@smira smira moved this to Designed in Product Jan 17, 2025