The terraform-virtualbox-talos module can be used to build a Talos-based, fully compliant Kubernetes cluster on Oracle VirtualBox, without any manual intervention: just provide the VM names and Talos version, or customize the default options to match your needs, and in a few minutes you can start using your Kubernetes cluster, whether as a Cluster API (CAPI) bootstrap provider or a GitOps bootstrap cluster, to name just a few use cases.
When using the module, only the following options have no default values and must be provided:
- Control plane VM names (at least one)
- Talos version
If you encounter any failures during the build or teardown, it's completely safe to run the terraform apply or terraform destroy command again, as the steps are designed to be idempotent.
- Terraform 0.13.x or greater
- VirtualBox 5.x or greater
- curl or wget, for downloading Talos components
- jq, for JSON processing
Please see the Terraform Registry for full documentation of all the input and output variables.
```hcl
module "talos" {
  source  = "masoudbahar/talos/virtualbox"
  version = ">= 0.1.0"

  controlplane_nodes = ["kube"]
  talos_version      = "v0.8.0"
}
```
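A typical workflow, assuming the module block above is saved in a main.tf:

```shell
# Download the module and required providers, then build the cluster.
# Both commands can be safely re-run, as the steps are idempotent.
terraform init
terraform apply
```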
Please see the examples section below for more customizations and various use cases.
A current complication when initializing a Talos VM on VirtualBox or QEMU is that the VM's IP address, which is needed for Talos instance configuration, is not known in advance. The Talos team has created a clever solution, called maintenance mode, which pauses the installation as soon as the IP is assigned. The user can then use the Talos CLI to generate init.yaml for the first control plane node, controlplane.yaml for the other control plane nodes (when creating an HA Kubernetes cluster), and join.yaml for the cluster's worker nodes, and apply the configurations to the waiting VMs, one by one.
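The maintenance-mode workflow that the module automates can be sketched with the Talos CLI; the cluster name, endpoint, and node IP below are placeholders:

```shell
# Generate init.yaml, controlplane.yaml, join.yaml, and talosconfig
# for a hypothetical cluster endpoint.
talosctl gen config my-cluster https://172.27.0.10:6443

# Apply the configuration to a VM waiting in maintenance mode
# (--insecure is used because the node has no certificates yet).
talosctl apply-config --insecure --nodes 172.27.0.10 --file init.yaml
```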
This module builds on top of Talos maintenance mode and eliminates the manual intervention steps by:
Generating the VirtualBox VM specs
- VirtualBox VBoxManage commands, wrapped in a Terraform template, are used to build VMs with sensible specs, primarily generated by Terraform, such as the MAC addresses of the network interfaces, which are needed for finding the VM IPs.
- Control plane and worker nodes can have different VM specs (e.g. CPU, RAM, and disk), if there is a need to alter the defaults.
- The first network interface (VirtualBox: nic1, Talos: eth0) is bridged; it has access to the Internet and makes the cluster accessible on the local network.
- The second (VirtualBox: nic2, Talos: eth1) is a hostonly interface, which receives its IP from a VirtualBox DHCP server (built by this module, if one doesn't exist). It can be queried programmatically to obtain the IP address needed for generating the Talos node configuration and triggering the build.
- The third, which is optional but highly recommended for HA clusters, is a NAT interface, configured by this module to proxy DNS queries through the host (i.e. your laptop/desktop), giving access to static entries and nameservers stored in /etc/hosts. It should rarely be needed; if used, it is configured as the first interface, making the bridged and hostonly interfaces the second and third.
A notable benefit of this configuration is that MetalLB's layer 2 mode can be used with one address pool from the hostonly network and another from the local network, for different use cases.
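As a rough sketch of what the module's Terraform-wrapped VBoxManage template does, the following builds and starts a single VM; the VM name, host adapter names, MAC address, and sizes are all hypothetical:

```shell
# Create and register the VM, and set basic specs.
VBoxManage createvm --name kube --ostype Linux26_64 --register
VBoxManage modifyvm kube --cpus 2 --memory 2048 --boot1 dvd

# Attach a bridged interface (nic1) and a hostonly interface (nic2);
# the MAC address is set explicitly so the module can find the VM's IP later.
VBoxManage modifyvm kube --nic1 bridged --bridgeadapter1 en0
VBoxManage modifyvm kube --nic2 hostonly --hostonlyadapter2 vboxnet0 \
  --macaddress2 0800271A2B3C

# Attach a disk and the Talos ISO, then boot headless into maintenance mode.
VBoxManage createmedium disk --filename kube.vdi --size 8192
VBoxManage storagectl kube --name SATA --add sata
VBoxManage storageattach kube --storagectl SATA --port 0 --type hdd --medium kube.vdi
VBoxManage storageattach kube --storagectl SATA --port 1 --type dvddrive --medium talos-amd64.iso
VBoxManage startvm kube --type headless
```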
Configuring a dedicated hostonly network
- While existing VirtualBox hostonly networks may be present, a dedicated hostonly network (default: 172.27.0.0/16) is set up to ensure proper DHCP configuration and IP address availability, and is used for all Talos clusters. A DHCP server is configured for this network as well.
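The hostonly network and DHCP server setup can be sketched as follows; the interface name and address ranges are illustrative, not necessarily what the module uses:

```shell
# Create a hostonly interface (VirtualBox names it, e.g. vboxnet0)
# and give the host an address on the dedicated network.
VBoxManage hostonlyif create
VBoxManage hostonlyif ipconfig vboxnet0 --ip 172.27.0.1 --netmask 255.255.0.0

# Add and enable a DHCP server for the hostonly network.
VBoxManage dhcpserver add --ifname vboxnet0 --ip 172.27.0.100 \
  --netmask 255.255.0.0 --lowerip 172.27.0.101 --upperip 172.27.255.254 --enable
```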
- Terraform templates are used to create sensible, flexible, yet opinionated YAML configuration files for each Talos VM. Talos uses Ed25519 keys for server administration (machine and admin certificates), and RSA 4096 keys for Kubernetes and etcd. Since Terraform still doesn't support the Ed25519 key format, the Talos CLI is used to generate the certificates.
- A Terraform template is also used to generate talosconfig, which is used for managing the cluster VMs.
There are four timers that can be used to fine-tune the cluster build:
- A fixed timer creates a 3-second delay between configuring and starting each VirtualBox VM. Its main purpose is to avoid overwhelming the VirtualBox CLI (VBoxManage), as Terraform runs operations in parallel.
- A user-controlled timer (ip_assignment_wait) controls how long to wait for the Talos VMs to enter maintenance mode before querying each VM's hostonly IP, which is needed for generating the Talos configurations; the default is 20s.
- Another user-controlled timer (os_install_wait) controls how long execution should pause before updating kubeconfig and wrapping up the build; the default is 4m.
- The last user-controlled timer (apply_config_wait) can be useful for users who want to space out Talos node installation and configuration; the default is 0s. While limiting Terraform's parallelism is possible, doing so would affect the entire build rather than just certain steps, so it's not used.
- After the cluster is built, the Talos CLI is used to update kubeconfig (~/.kube/config), so the Kubernetes cluster can be used right away.
- A configuration option, off by default, can be used to allow workload scheduling on control plane nodes. This is particularly useful when building single-node Kubernetes clusters.
- If not already present, the Talos ISO and CLI tool are downloaded for the selected Talos version, so users don't have to provide them when using the module.
- Optionally, if the user password is provided, static entries for the Kubernetes cluster and Talos VMs are added to /etc/hosts, which makes everything just work as soon as the build is done (please see caveats for further details).
- Running terraform destroy tears down the VMs, cleans up kubeconfig, removes static entries from /etc/hosts (if added), and can optionally remove the hostonly network and its dedicated DHCP server. Note that by default the hostonly network configuration is preserved, as it's used for all Talos clusters built with this module.
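The user-controlled timers described above are set on the module block. This is a sketch with illustrative values; only the variable names ip_assignment_wait, os_install_wait, and apply_config_wait are taken from the documentation above, and the value format assumes the same notation as the stated defaults (20s, 4m, 0s):

```hcl
module "talos" {
  source  = "masoudbahar/talos/virtualbox"
  version = ">= 0.1.0"

  controlplane_nodes = ["kube"]
  talos_version      = "v0.8.0"

  # Illustrative values; tune these to your host and network performance.
  ip_assignment_wait = "30s" # wait for maintenance mode before querying VM IPs
  os_install_wait    = "5m"  # pause before updating kubeconfig
  apply_config_wait  = "10s" # space out node installation/configuration
}
```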
The module can be used to build a single- or multi-node Kubernetes cluster, with either 1 or 3 control plane nodes and 0 or more worker nodes. Nevertheless, for proper performance, users are encouraged to choose specs (e.g. RAM, CPU, number of nodes) that the host, most likely a laptop, can reasonably support, and to experiment with the timers, setting them based on the performance of their network and hardware.
- Single node cluster, with workload scheduling enabled
- Simple cluster, with one control plane and two worker nodes
- HA cluster, with three control plane nodes and one worker node (please see caveats)
VM names used in the examples are native names of some of the Islands of the Canadian Arctic Archipelago.
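An HA configuration might look like the following sketch; the VM names here are placeholders (the examples use island names, as noted above), and the worker_nodes variable name is an assumption, so please check the Terraform Registry documentation for the actual input names:

```hcl
module "talos_ha" {
  source  = "masoudbahar/talos/virtualbox"
  version = ">= 0.1.0"

  # Placeholder names; "worker_nodes" is assumed, not confirmed.
  controlplane_nodes = ["cp1", "cp2", "cp3"]
  worker_nodes       = ["worker1"]
  talos_version      = "v0.8.0"
}
```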
The module relies on a number of bash/zsh shell scripts to tie together Terraform, Talos, and VirtualBox, and as such is compatible with Linux and macOS. It has been tested most extensively on macOS, using the stock bash, the latest version of bash, zsh, and sh, to ensure compatibility with those shells. If you encounter any issues on Linux, please open an issue or contribute a pull request.
- The templates used for VM configurations do not support image registry mirrors. They also don't support adding custom files (e.g. a custom CA) to the VMs' filesystems.
- Merging the generated talosconfig with the default Talos configuration file (~/.talos/config) doesn't work from within Terraform; after the installation, run the talosctl config merge /pathTo/talosconfig command (replacing /pathTo with the actual path) to merge them.
- While providing the user password (needed for adding/removing static entries in the /etc/hosts file) ensures the cluster can be used right away, Terraform doesn't allow the password to be marked as sensitive, due to its serialization requirements. This means the password can be seen in the module execution logs, as well as in the Terraform state file. If the password is not provided, please make sure to update the /etc/hosts file manually, as kubeconfig is configured with the cluster URL.
- Building HA clusters is prone to etcd cluster initialization errors, probably caused by running in an environment that is constrained from a resource and networking perspective; so it's not recommended.
Contributions to this repo are very welcome. If you find a bug or want to improve the module, please open an issue or a pull request, or provide feedback.