The errors continue; however, I can connect and apply a config, after which there are more errors along the lines of:
[talos] 2024/12/15 14:18:57 service[trustd](Waiting): Error running Containerd(trustd), going to restart forever: failed to create task: "trustd": failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: can't get final child's PID from pipe: EOF: unknown
[talos] 2024/12/15 14:18:57 service[apid](Waiting): Error running Containerd(apid), going to restart forever: failed to create task: "apid": failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: can't get final child's PID from pipe: EOF: unknown
Here is a simple reproduction of what seems to be the core problem; interestingly, it works under Docker Desktop (which also uses the Apple Virtualization framework):
Checking the podman machine (backing vm) points to the cause:
m00m00:talos damian$ podman machine ssh
Connecting to vm podman-machine-default. To close connection, use `~.` or `exit`
Fedora CoreOS 39.20240322.2.0
Tracker: https://github.com/coreos/fedora-coreos-tracker
Discuss: https://discussion.fedoraproject.org/tag/coreos
Last login: Sun Dec 15 14:37:54 2024 from 192.168.127.1
core@localhost:~$ podman ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
2980a10f025e docker.io/library/ubuntu:latest bash 9 seconds ago Up 9 seconds musing_ramanujan
core@localhost:~$ sudo podman ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
core@localhost:~$
Apparently we're not running as root, so --privileged isn't doing too much!
Checking the machine config, and then the podman-machine-init.1 man page, shows that this is the default behaviour on creation.
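The rootless state can also be checked without SSHing into the machine. The following is a minimal sketch, not an official check: it assumes the installed podman supports `podman info --format '{{.Host.Security.Rootless}}'`, and `explain_rootless` is a hypothetical helper name introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: turn the value of podman's Host.Security.Rootless
# field ("true" or "false") into a short piece of advice.
explain_rootless() {
  case "$1" in
    true)  echo "rootless: --privileged is limited; consider podman machine set --rootful=true" ;;
    false) echo "rootful: privileged containers should behave as they do on a Linux host" ;;
    *)     echo "unknown: could not query podman" ;;
  esac
}

# Query the active podman connection, if podman is installed and reachable.
# Assumption: this format string is supported by the installed podman version.
if command -v podman >/dev/null 2>&1; then
  status=$(podman info --format '{{.Host.Security.Rootless}}' 2>/dev/null) || status=""
else
  status=""
fi

explain_rootless "$status"
```

If this reports a rootless connection, the machine is most likely in the default state described above.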
Solution
Switching the machine config to run as root:
m00m00:talos damian$ podman machine stop podman-machine-default
Machine "podman-machine-default" stopped successfully
m00m00:talos damian$ podman machine set --rootful=true podman-machine-default
m00m00:talos damian$ podman machine start
Starting machine "podman-machine-default"
API forwarding listening on: /var/run/docker.sock
Docker API clients default to this address. You do not need to set DOCKER_HOST.
Machine "podman-machine-default" started successfully
This allows everything to work as expected:
m00m00:talos damian$ podman run --rm -ti --name talos-dev --hostname talos-dev --read-only --privileged --security-opt seccomp=unconfined --mount type=tmpfs,destination=/run --mount type=tmpfs,destination=/system --mount type=tmpfs,destination=/tmp --mount type=volume,destination=/system/state --mount type=volume,destination=/var --mount type=volume,destination=/etc/cni --mount type=volume,destination=/etc/kubernetes --mount type=volume,destination=/usr/libexec/kubernetes --mount type=volume,destination=/opt -e PLATFORM=container -p 50000:50000 -p 50001:50001 -p 6443:6443 ghcr.io/siderolabs/talos:v1.8.4
Trying to pull ghcr.io/siderolabs/talos:v1.8.4...
Getting image source signatures
Copying blob sha256:2046317f30354cab0f464d89c2bd0a9a220a95bc290fd86947b6b7bb6a5d6d9d
Copying config sha256:025f2cdd84d5bc5f56c2618924187a9c32236cc04fb8734c570ea3228226921d
Writing manifest to image destination
2024/12/15 14:45:56 limited GOMAXPROCS to 4
[talos] 2024/12/15 14:45:56 initialize sequence: 4 phase(s)
[talos] 2024/12/15 14:45:56 phase systemRequirements (1/4): 2 tasks(s)
[talos] 2024/12/15 14:45:56 task initVolumeLifecycle (2/2): starting
[talos] 2024/12/15 14:45:56 task setupSystemDirectory (1/2): starting
[talos] 2024/12/15 14:45:56 task setupSystemDirectory (1/2): done, 240.751µs
[talos] 2024/12/15 14:45:56 task initVolumeLifecycle (2/2): done, 72.875µs
[talos] 2024/12/15 14:45:56 phase systemRequirements (1/4): done, 335.335µs
[talos] 2024/12/15 14:45:56 phase etc (2/4): 3 tasks(s)
[talos] 2024/12/15 14:45:56 task setUserEnvVars (3/3): starting
[talos] 2024/12/15 14:45:56 task setUserEnvVars (3/3): done, 5.75µs
[talos] 2024/12/15 14:45:56 task createOSReleaseFile (2/3): starting
[talos] 2024/12/15 14:45:56 task CreateSystemCgroups (1/3): starting
[talos] task CreateSystemCgroups (1/3): 2024/12/15 14:45:56 using cgroups root: /
[talos] 2024/12/15 14:45:56 task createOSReleaseFile (2/3): done, 138µs
[talos] 2024/12/15 14:45:56 pre-created iptables-nft table 'mangle'/'KUBE-IPTABLES-HINT' {"component": "controller-runtime", "controller": "network.NfTablesChainController"}
[talos] 2024/12/15 14:45:56 nftables chains updated {"component": "controller-runtime", "controller": "network.NfTablesChainController", "chains": []}
[talos] 2024/12/15 14:45:56 node identity established {"component": "controller-runtime", "controller": "cluster.NodeIdentityController", "node_id": "QPuJpXfV0InHM2BuRO1eeING3dVC2eXmaSeBH3xZjQKD"}
[talos] 2024/12/15 14:45:56 task CreateSystemCgroups (1/3): done, 5.850726ms
[talos] 2024/12/15 14:45:56 phase etc (2/4): done, 5.922893ms
[talos] 2024/12/15 14:45:56 phase machined (3/4): 2 tasks(s)
[talos] 2024/12/15 14:45:56 task startContainerd (2/2): starting
[talos] 2024/12/15 14:45:56 service[containerd](Starting): Starting service
[talos] 2024/12/15 14:45:56 service[containerd](Preparing): Running pre state
[talos] 2024/12/15 14:45:56 service[containerd](Preparing): Creating service runner
[talos] 2024/12/15 14:45:56 task startMachined (1/2): starting
[talos] 2024/12/15 14:45:56 TPM device is not available, skipping PCR extension
[talos] 2024/12/15 14:45:56 service[machined](Starting): Starting service
[talos] 2024/12/15 14:45:56 service[machined](Preparing): Running pre state
[talos] 2024/12/15 14:45:56 service[machined](Preparing): Creating service runner
[talos] 2024/12/15 14:45:56 service[containerd](Running): Process Process(["/bin/containerd" "--address" "/system/run/containerd/containerd.sock" "--state" "/system/run/containerd" "--root" "/system/var/lib/containerd"]) started with PID 10
[talos] 2024/12/15 14:45:56 service[machined](Running): Service started as goroutine
[talos] 2024/12/15 14:45:56 setting time servers {"component": "controller-runtime", "controller": "network.TimeServerSpecController", "addresses": ["time.cloudflare.com"]}
[talos] 2024/12/15 14:45:56 setting time servers {"component": "controller-runtime", "controller": "network.TimeServerSpecController", "addresses": ["time.cloudflare.com"]}
[talos] 2024/12/15 14:45:56 setting resolvers {"component": "controller-runtime", "controller": "network.ResolverSpecController", "resolvers": ["192.168.127.1"]}
[talos] 2024/12/15 14:45:56 setting resolvers {"component": "controller-runtime", "controller": "network.ResolverSpecController", "resolvers": ["192.168.127.1"]}
[talos] 2024/12/15 14:45:57 service[machined](Running): Health check successful
[talos] 2024/12/15 14:45:57 task startMachined (1/2): done, 1.001872322s
[talos] 2024/12/15 14:45:57 service[containerd](Running): Health check successful
[talos] 2024/12/15 14:45:57 task startContainerd (2/2): done, 1.006667169s
[talos] 2024/12/15 14:45:57 phase machined (3/4): done, 1.006695212s
[talos] 2024/12/15 14:45:57 phase config (4/4): 1 tasks(s)
[talos] 2024/12/15 14:45:57 task loadConfig (1/1): starting
[talos] 2024/12/15 14:45:57 downloading config {"component": "controller-runtime", "controller": "config.AcquireController", "platform": "container"}
[talos] 2024/12/15 14:45:57 fetching machine config from: USERDATA environment variable
[talos] 2024/12/15 14:45:57 entering maintenance service {"component": "controller-runtime", "controller": "config.AcquireController"}
[talos] 2024/12/15 14:45:57 this machine is reachable at: {"component": "controller-runtime", "controller": "runtime.MaintenanceServiceController"}
[talos] 2024/12/15 14:45:57 10.88.0.2 {"component": "controller-runtime", "controller": "runtime.MaintenanceServiceController"}
[talos] 2024/12/15 14:45:57 server certificate issued {"component": "controller-runtime", "controller": "runtime.MaintenanceServiceController", "fingerprint": "/gNom5EfdIzD0/BzDVuK54gVsUNnWjUpR0BJ0NPwMLw="}
[talos] 2024/12/15 14:45:57 upload configuration using talosctl: {"component": "controller-runtime", "controller": "runtime.MaintenanceServiceController"}
[talos] 2024/12/15 14:45:57 talosctl apply-config --insecure --nodes 10.88.0.2 --file <config.yaml> {"component": "controller-runtime", "controller": "runtime.MaintenanceServiceController"}
[talos] 2024/12/15 14:45:57 or apply configuration using talosctl interactive installer: {"component": "controller-runtime", "controller": "runtime.MaintenanceServiceController"}
[talos] 2024/12/15 14:45:57 talosctl apply-config --insecure --nodes 10.88.0.2 --mode=interactive {"component": "controller-runtime", "controller": "runtime.MaintenanceServiceController"}
[talos] 2024/12/15 14:45:57 optionally with node fingerprint check: {"component": "controller-runtime", "controller": "runtime.MaintenanceServiceController"}
[talos] 2024/12/15 14:45:57 talosctl apply-config --insecure --nodes 10.88.0.2 --cert-fingerprint '/gNom5EfdIzD0/BzDVuK54gVsUNnWjUpR0BJ0NPwMLw=' --file <config.yaml> {"component": "controller-runtime", "controller": "runtime.MaintenanceServiceController"}
2024/12/15 14:46:15 OK [/machine.MachineService/ApplyConfiguration] 1.373208ms unary Success (:authority=127.0.0.1:50000;content-type=application/grpc;grpc-accept-encoding=gzip;runtime=Talos;user-agent=grpc-go/1.66.3)
[talos] 2024/12/15 14:46:15 assigned address {"component": "controller-runtime", "controller": "network.AddressSpecController", "address": "169.254.116.108/32", "link": "lo"}
[talos] 2024/12/15 14:46:15 kubeprism KubePrism is enabled {"component": "controller-runtime", "controller": "k8s.KubePrismController", "endpoint": "127.0.0.1:7445"}
[talos] 2024/12/15 14:46:15 leaving maintenance service {"component": "controller-runtime", "controller": "config.AcquireController"}
[talos] 2024/12/15 14:46:15 machine config loaded successfully {"component": "controller-runtime", "controller": "config.AcquireController", "sources": ["maintenance"]}
[talos] 2024/12/15 14:46:15 task loadConfig (1/1): done, 18.235988602s
[talos] 2024/12/15 14:46:15 phase config (4/4): done, 18.236003768s
[talos] 2024/12/15 14:46:15 initialize sequence: done: 19.249005207s
[talos] 2024/12/15 14:46:15 install sequence: 0 phase(s)
[talos] 2024/12/15 14:46:15 install sequence: done: 1.458µs
[talos] 2024/12/15 14:46:15 service[apid](Starting): Starting service
[talos] 2024/12/15 14:46:15 service[apid](Waiting): Waiting for service "containerd" to be "up", api certificates
[talos] 2024/12/15 14:46:15 created dns upstream {"component": "controller-runtime", "controller": "network.DNSUpstreamController", "addr": "192.168.127.1", "idx": 0}
[talos] 2024/12/15 14:46:15 updated dns server nameservers {"component": "dns-resolve-cache", "addrs": ["192.168.127.1:53"]}
[talos] 2024/12/15 14:46:15 boot sequence: 10 phase(s)
[talos] 2024/12/15 14:46:15 phase saveConfig (1/10): 1 tasks(s)
[talos] 2024/12/15 14:46:15 service[apid](Preparing): Running pre state
[talos] 2024/12/15 14:46:15 service[apid](Preparing): Creating service runner
[talos] 2024/12/15 14:46:15 service[kubelet](Starting): Starting service
[talos] 2024/12/15 14:46:15 service[kubelet](Waiting): Waiting for service "cri" to be "up", time sync, network
[talos] 2024/12/15 14:46:15 task saveConfig (1/1): starting
[talos] 2024/12/15 14:46:15 task saveConfig (1/1): done, 100.208µs
[talos] 2024/12/15 14:46:15 phase saveConfig (1/10): done, 2.702875ms
[talos] 2024/12/15 14:46:15 phase memorySizeCheck (2/10): 1 tasks(s)
[talos] 2024/12/15 14:46:15 task memorySizeCheck (1/1): starting
[talos] task memorySizeCheck (1/1): 2024/12/15 14:46:15 skipping memory size check in the container
[talos] 2024/12/15 14:46:15 task memorySizeCheck (1/1): done, 8.458µs
[talos] 2024/12/15 14:46:15 phase memorySizeCheck (2/10): done, 18.708µs
[talos] 2024/12/15 14:46:15 phase diskSizeCheck (3/10): 1 tasks(s)
[talos] 2024/12/15 14:46:15 task diskSizeCheck (1/1): starting
[talos] task diskSizeCheck (1/1): 2024/12/15 14:46:15 skipping disk size check in the container
[talos] 2024/12/15 14:46:15 task diskSizeCheck (1/1): done, 3.709µs
[talos] 2024/12/15 14:46:15 phase diskSizeCheck (3/10): done, 9.125µs
[talos] 2024/12/15 14:46:15 phase env (4/10): 2 tasks(s)
[talos] 2024/12/15 14:46:15 task waitForCARoots (2/2): starting
[talos] 2024/12/15 14:46:15 task setUserEnvVars (1/2): starting
[talos] 2024/12/15 14:46:15 task setUserEnvVars (1/2): done, 12.541µs
[talos] 2024/12/15 14:46:15 task waitForCARoots (2/2): done, 32.916µs
[talos] 2024/12/15 14:46:15 phase env (4/10): done, 41.792µs
[talos] 2024/12/15 14:46:15 phase dbus (5/10): 1 tasks(s)
[talos] 2024/12/15 14:46:15 task startDBus (1/1): starting
[talos] 2024/12/15 14:46:15 task startDBus (1/1): done, 894.292µs
[talos] 2024/12/15 14:46:15 phase dbus (5/10): done, 912.334µs
[talos] 2024/12/15 14:46:15 phase sharedFilesystems (6/10): 1 tasks(s)
[talos] 2024/12/15 14:46:15 task setupSharedFilesystems (1/1): starting
[talos] 2024/12/15 14:46:15 task setupSharedFilesystems (1/1): done, 18.25µs
[talos] 2024/12/15 14:46:15 phase sharedFilesystems (6/10): done, 40.583µs
[talos] 2024/12/15 14:46:15 phase var (7/10): 1 tasks(s)
[talos] 2024/12/15 14:46:15 task setupVarDirectory (1/1): starting
[talos] 2024/12/15 14:46:15 task setupVarDirectory (1/1): done, 161.292µs
[talos] 2024/12/15 14:46:15 phase var (7/10): done, 176.125µs
[talos] 2024/12/15 14:46:15 phase userSetup (8/10): 1 tasks(s)
[talos] 2024/12/15 14:46:15 task writeUserFiles (1/1): starting
[talos] 2024/12/15 14:46:15 task writeUserFiles (1/1): done, 4.375µs
[talos] 2024/12/15 14:46:15 phase userSetup (8/10): done, 10.208µs
[talos] 2024/12/15 14:46:15 phase extendPCRStartAll (9/10): 1 tasks(s)
[talos] 2024/12/15 14:46:15 task extendPCRStartAll (1/1): starting
[talos] 2024/12/15 14:46:15 TPM device is not available, skipping PCR extension
[talos] 2024/12/15 14:46:15 task extendPCRStartAll (1/1): done, 17.125µs
[talos] 2024/12/15 14:46:15 phase extendPCRStartAll (9/10): done, 25.625µs
[talos] 2024/12/15 14:46:15 phase startEverything (10/10): 1 tasks(s)
[talos] 2024/12/15 14:46:15 task startAllServices (1/1): starting
[talos] 2024/12/15 14:46:15 service[cri](Starting): Starting service
[talos] 2024/12/15 14:46:15 service[cri](Waiting): Waiting for network
[talos] 2024/12/15 14:46:15 service[cri](Preparing): Running pre state
[talos] 2024/12/15 14:46:15 service[cri](Preparing): Creating service runner
[talos] 2024/12/15 14:46:15 service[trustd](Starting): Starting service
[talos] 2024/12/15 14:46:15 service[trustd](Waiting): Waiting for service "containerd" to be "up", time sync, network
[talos] 2024/12/15 14:46:15 service[etcd](Starting): Starting service
[talos] 2024/12/15 14:46:15 service[etcd](Waiting): Waiting for service "cri" to be "up", time sync, network, etcd spec
[talos] task startAllServices (1/1): 2024/12/15 14:46:15 waiting for 7 services
[talos] task startAllServices (1/1): 2024/12/15 14:46:15 service "apid" to be "up", service "containerd" to be "up", service "cri" to be "up", service "etcd" to be "up", service "kubelet" to be "up", service "machined" to be "up", service "trustd" to be "up"
[talos] 2024/12/15 14:46:15 service[trustd](Preparing): Running pre state
[talos] 2024/12/15 14:46:15 service[cri](Running): Process Process(["/bin/containerd" "--address" "/run/containerd/containerd.sock" "--config" "/etc/cri/containerd.toml"]) started with PID 30
[talos] 2024/12/15 14:46:15 service[trustd](Preparing): Creating service runner
[talos] 2024/12/15 14:46:15 service[trustd](Running): Started task trustd (PID 92) for container trustd
[talos] 2024/12/15 14:46:15 service[apid](Running): Started task apid (PID 91) for container apid
[talos] 2024/12/15 14:46:16 service[kubelet](Waiting): Waiting for service "cri" to be "up"
[talos] 2024/12/15 14:46:16 service[etcd](Waiting): Waiting for service "cri" to be "up"
[talos] 2024/12/15 14:46:16 service[trustd](Running): Health check successful
[talos] 2024/12/15 14:46:16 service[apid](Running): Health check successful
[talos] 2024/12/15 14:46:16 service[cri](Running): Health check successful
[talos] 2024/12/15 14:46:16 service[etcd](Preparing): Running pre state
[talos] 2024/12/15 14:46:16 service[kubelet](Preparing): Running pre state
[talos] 2024/12/15 14:46:22 etcd is waiting to join the cluster, if this node is the first node in the cluster, please run `talosctl bootstrap` against one of the following IPs:
[talos] 2024/12/15 14:46:22 [10.88.0.2]
[talos] 2024/12/15 14:46:23 service[kubelet](Preparing): Creating service runner
[talos] 2024/12/15 14:46:23 service[kubelet](Running): Started task kubelet (PID 186) for container kubelet
The talosctl cluster create helper also works:
m00m00:talos damian$ talosctl cluster create
validating CIDR and reserving IPs
generating PKI and tokens
downloading ghcr.io/siderolabs/talos:v1.8.3
creating network talos-default
creating controlplane nodes
creating worker nodes
renamed talosconfig context "talos-default" -> "talos-default-2"
waiting for API
bootstrapping cluster
waiting for etcd to be healthy: OK
waiting for etcd members to be consistent across nodes: OK
waiting for etcd members to be control plane nodes: OK
waiting for apid to be ready: OK
waiting for all nodes memory sizes: OK
waiting for all nodes disk sizes: OK
waiting for no diagnostics: OK
waiting for kubelet to be healthy: OK
waiting for all nodes to finish boot sequence: OK
waiting for all k8s nodes to report: OK
waiting for all control plane static pods to be running: OK
waiting for all control plane components to be ready: OK
waiting for all k8s nodes to report ready: OK
waiting for kube-proxy to report ready: OK
waiting for coredns to report ready: OK
waiting for all k8s nodes to report schedulable: OK
merging kubeconfig into "/Users/damian/.kube/config"
PROVISIONER docker
NAME talos-default
NETWORK NAME talos-default
NETWORK CIDR 10.5.0.0/24
NETWORK GATEWAY 10.5.0.1
NETWORK MTU 1500
KUBERNETES ENDPOINT https://127.0.0.1:53343
NODES:
NAME TYPE IP CPU RAM DISK
/talos-default-controlplane-1 controlplane 10.5.0.2 2.00 2.1 GB -
/talos-default-worker-1 worker 10.5.0.3 2.00 2.1 GB -
💃🏼
Proposal
Add some form of note about the required configuration, similar to the existing note about the docker socket.
While this is perhaps not officially supported, it could prevent some future headaches.
The default podman setup on OSX does not allow Talos to run out of the box.
Trying the docker quickstart results in an unhelpful timeout:
Debugging
Trying this on a Linux host, everything just works as expected.
Manually spinning up a single container (under podman on OSX Sonoma) starts to reveal the issue:
Currently some of the docs explicitly specify Docker 18.03+ as a pre-requisite (https://www.talos.dev/v1.8/talos-guides/install/local-platforms/docker/), while others just specify docker (https://www.talos.dev/v1.8/introduction/quickstart/), which could be read as implying that podman qualifies as 'mostly docker compatible'.
If acceptable, something similar to main...DamianZaremba:feature/add-podman-note-on-quickstart would probably suffice.