
Unable to create a local talos cluster in version 1.8.3 - related to #9431 #9902

Open
ja-softdevel opened this issue Dec 9, 2024 · 9 comments

ja-softdevel commented Dec 9, 2024

Bug Report

talosctl cluster create fails with the error message "context deadline exceeded"

Description

Running talosctl cluster create ends with a "context deadline exceeded" error.

I do see an etcd image pull error in the control plane logs. I saw the same error in the logs for the referenced issue #9431.

Suggestion

I know there was a commit to resolve this issue, but it landed in v1.9.
I suggest the docs be updated to reference v1.9:

The following are the requirements for running Talos in Docker:

Docker 18.03 or greater
a recent version of [talosctl](https://github.com/siderolabs/talos/releases) <<< the required version should be mentioned here

I will pull the talosctl asset for release v1.9.0-beta.0 and try it.
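
For anyone following along, fetching a specific client release looks roughly like this (a sketch, assuming the talosctl-linux-amd64 asset name that Talos releases publish for Linux on amd64):

$ curl -sL -o talosctl https://github.com/siderolabs/talos/releases/download/v1.9.0-beta.0/talosctl-linux-amd64
$ chmod +x talosctl
$ ./talosctl version --client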

Environment

talosctl version
Client:
    Tag:         v1.8.3
    SHA:         6494aced
    Built:
    Go version:  go1.22.9
    OS/Arch:     linux/amd64
docker version
Client: Docker Engine - Community 
  Version:           27.3.1 
  API version:       1.47 
  Go version:        go1.22.7 
  Git commit:        ce12230 
  Built:             Fri Sep 20 11:41:00 2024 
  OS/Arch:           linux/amd64 
  Context:           default
Server: Docker Engine - Community 
  Engine:  
    Version:          27.3.1  
    API version:      1.47 (minimum version 1.24)  
    Go version:       go1.22.7  
    Git commit:       41ca978  
    Built:            Fri Sep 20 11:41:00 2024  
    OS/Arch:          linux/amd64  
    Experimental:     false 
  containerd:  
    Version:          1.7.24  
    GitCommit:        88bf19b2105c8b17560993bee28a01ddc2f97182 
  runc:  
    Version:          1.2.2  
    GitCommit:        v1.2.2-0-g7cb3632 
  docker-init:  
    Version:          0.19.0  
    GitCommit:        de40ad0
kubectl version
    Client Version: v1.31.3
    Kustomize Version: v5.4.2

Logs

talos-controlplane

2024/12/09 14:46:11 limited GOMAXPROCS to 4
[talos] 2024/12/09 14:46:11 initialize sequence: 4 phase(s)
[talos] 2024/12/09 14:46:11 phase systemRequirements (1/4): 2 tasks(s)
[talos] 2024/12/09 14:46:11 task setupSystemDirectory (1/2): starting
[talos] 2024/12/09 14:46:11 task initVolumeLifecycle (2/2): starting
[talos] 2024/12/09 14:46:11 task initVolumeLifecycle (2/2): done, 100.917µs
[talos] 2024/12/09 14:46:11 task setupSystemDirectory (1/2): done, 81.062µs
[talos] 2024/12/09 14:46:11 phase systemRequirements (1/4): done, 204.82µs
[talos] 2024/12/09 14:46:11 phase etc (2/4): 3 tasks(s)
[talos] 2024/12/09 14:46:11 task setUserEnvVars (3/3): starting
[talos] 2024/12/09 14:46:11 task setUserEnvVars (3/3): done, 13.371µs
[talos] 2024/12/09 14:46:11 task CreateSystemCgroups (1/3): starting
[talos] 2024/12/09 14:46:11 task createOSReleaseFile (2/3): starting
[talos] task CreateSystemCgroups (1/3): 2024/12/09 14:46:11 using cgroups root: /
[talos] 2024/12/09 14:46:11 task createOSReleaseFile (2/3): done, 231.203µs
[talos] 2024/12/09 14:46:11 node identity established {"component": "controller-runtime", "controller": "cluster.NodeIdentityController", "node_id": "WmmhUKydFPfynFzvAtTjCr20kHMdqLgFUC7EtPOuCLM"}
[talos] 2024/12/09 14:46:11 pre-created iptables-nft table 'mangle'/'KUBE-IPTABLES-HINT' {"component": "controller-runtime", "controller": "network.NfTablesChainController"}
[talos] 2024/12/09 14:46:11 nftables chains updated {"component": "controller-runtime", "controller": "network.NfTablesChainController", "chains": []}
[talos] 2024/12/09 14:46:11 setting resolvers {"component": "controller-runtime", "controller": "network.ResolverSpecController", "resolvers": ["127.0.0.11"]}
[talos] 2024/12/09 14:46:11 setting resolvers {"component": "controller-runtime", "controller": "network.ResolverSpecController", "resolvers": ["127.0.0.11"]}
[talos] 2024/12/09 14:46:11 setting time servers {"component": "controller-runtime", "controller": "network.TimeServerSpecController", "addresses": ["time.cloudflare.com"]}
[talos] 2024/12/09 14:46:11 setting time servers {"component": "controller-runtime", "controller": "network.TimeServerSpecController", "addresses": ["time.cloudflare.com"]}
[talos] 2024/12/09 14:46:11 setting time servers {"component": "controller-runtime", "controller": "network.TimeServerSpecController", "addresses": ["time.cloudflare.com"]}
[talos] 2024/12/09 14:46:11 task CreateSystemCgroups (1/3): done, 24.755902ms
[talos] 2024/12/09 14:46:11 phase etc (2/4): done, 24.800894ms
[talos] 2024/12/09 14:46:11 phase machined (3/4): 2 tasks(s)
[talos] 2024/12/09 14:46:11 task startContainerd (2/2): starting
[talos] 2024/12/09 14:46:11 task startMachined (1/2): starting
[talos] 2024/12/09 14:46:11 service[containerd](Starting): Starting service
[talos] 2024/12/09 14:46:11 service[containerd](Preparing): Running pre state
[talos] 2024/12/09 14:46:11 service[containerd](Preparing): Creating service runner
[talos] 2024/12/09 14:46:11 service[containerd](Running): Process Process(["/bin/containerd" "--address" "/system/run/containerd/containerd.sock" "--state" "/system/run/containerd" "--root" "/system/var/lib/containerd"]) started with PID 23
[talos] 2024/12/09 14:46:11 service[machined](Starting): Starting service
[talos] 2024/12/09 14:46:11 service[machined](Preparing): Running pre state
[talos] 2024/12/09 14:46:11 service[machined](Preparing): Creating service runner
[talos] 2024/12/09 14:46:11 service[machined](Running): Service started as goroutine
[talos] 2024/12/09 14:46:12 service[containerd](Running): Health check successful
[talos] 2024/12/09 14:46:12 task startContainerd (2/2): done, 1.002103778s
[talos] 2024/12/09 14:46:12 service[machined](Running): Health check successful
[talos] 2024/12/09 14:46:12 task startMachined (1/2): done, 1.020695649s
[talos] 2024/12/09 14:46:12 phase machined (3/4): done, 1.020756332s
[talos] 2024/12/09 14:46:12 phase config (4/4): 1 tasks(s)
[talos] 2024/12/09 14:46:12 task loadConfig (1/1): starting
[talos] 2024/12/09 14:46:12 downloading config {"component": "controller-runtime", "controller": "config.AcquireController", "platform": "container"}
[talos] 2024/12/09 14:46:12 fetching machine config from: USERDATA environment variable
[talos] 2024/12/09 14:46:12 machine config loaded successfully {"component": "controller-runtime", "controller": "config.AcquireController", "sources": ["container"]}
[talos] 2024/12/09 14:46:12 task loadConfig (1/1): done, 2.961318ms
[talos] 2024/12/09 14:46:12 phase config (4/4): done, 3.027341ms
[talos] 2024/12/09 14:46:12 initialize sequence: done: 1.048849751s
[talos] 2024/12/09 14:46:12 install sequence: 0 phase(s)
[talos] 2024/12/09 14:46:12 install sequence: done: 3.767µs
[talos] 2024/12/09 14:46:12 service[apid](Starting): Starting service
[talos] 2024/12/09 14:46:12 service[apid](Waiting): Waiting for service "containerd" to be "up", api certificates
[talos] 2024/12/09 14:46:12 boot sequence: 10 phase(s)
[talos] 2024/12/09 14:46:12 phase saveConfig (1/10): 1 tasks(s)
[talos] 2024/12/09 14:46:12 task saveConfig (1/1): starting
[talos] 2024/12/09 14:46:12 kubeprism KubePrism is enabled {"component": "controller-runtime", "controller": "k8s.KubePrismController", "endpoint": "127.0.0.1:7445"}
[talos] 2024/12/09 14:46:12 task saveConfig (1/1): done, 193.738µs
[talos] 2024/12/09 14:46:12 phase saveConfig (1/10): done, 249.345µs
[talos] 2024/12/09 14:46:12 phase memorySizeCheck (2/10): 1 tasks(s)
[talos] 2024/12/09 14:46:12 task memorySizeCheck (1/1): starting
[talos] task memorySizeCheck (1/1): 2024/12/09 14:46:12 skipping memory size check in the container
[talos] 2024/12/09 14:46:12 task memorySizeCheck (1/1): done, 27.626µs
[talos] 2024/12/09 14:46:12 phase memorySizeCheck (2/10): done, 39.733µs
[talos] 2024/12/09 14:46:12 phase diskSizeCheck (3/10): 1 tasks(s)
[talos] 2024/12/09 14:46:12 task diskSizeCheck (1/1): starting
[talos] task diskSizeCheck (1/1): 2024/12/09 14:46:12 skipping disk size check in the container
[talos] 2024/12/09 14:46:12 task diskSizeCheck (1/1): done, 7.916µs
[talos] 2024/12/09 14:46:12 phase diskSizeCheck (3/10): done, 18.262µs
[talos] 2024/12/09 14:46:12 phase env (4/10): 2 tasks(s)
[talos] 2024/12/09 14:46:12 task waitForCARoots (2/2): starting
[talos] 2024/12/09 14:46:12 task setUserEnvVars (1/2): starting
[talos] 2024/12/09 14:46:12 task setUserEnvVars (1/2): done, 22.545µs
[talos] 2024/12/09 14:46:12 task waitForCARoots (2/2): done, 119.257µs
[talos] 2024/12/09 14:46:12 phase env (4/10): done, 131.038µs
[talos] 2024/12/09 14:46:12 phase dbus (5/10): 1 tasks(s)
[talos] 2024/12/09 14:46:12 task startDBus (1/1): starting
[talos] 2024/12/09 14:46:12 task startDBus (1/1): done, 326.565µs
[talos] 2024/12/09 14:46:12 phase dbus (5/10): done, 399.448µs
[talos] 2024/12/09 14:46:12 phase sharedFilesystems (6/10): 1 tasks(s)
[talos] 2024/12/09 14:46:12 task setupSharedFilesystems (1/1): starting
[talos] 2024/12/09 14:46:12 task setupSharedFilesystems (1/1): done, 25.158µs
[talos] 2024/12/09 14:46:12 phase sharedFilesystems (6/10): done, 56.401µs
[talos] 2024/12/09 14:46:12 phase var (7/10): 1 tasks(s)
[talos] 2024/12/09 14:46:12 task setupVarDirectory (1/1): starting
[talos] 2024/12/09 14:46:12 task setupVarDirectory (1/1): done, 369.631µs
[talos] 2024/12/09 14:46:12 phase var (7/10): done, 386.082µs
[talos] 2024/12/09 14:46:12 phase userSetup (8/10): 1 tasks(s)
[talos] 2024/12/09 14:46:12 task writeUserFiles (1/1): starting
[talos] 2024/12/09 14:46:12 task writeUserFiles (1/1): done, 5.861µs
[talos] 2024/12/09 14:46:12 phase userSetup (8/10): done, 17.297µs
[talos] 2024/12/09 14:46:12 phase extendPCRStartAll (9/10): 1 tasks(s)
[talos] 2024/12/09 14:46:12 task extendPCRStartAll (1/1): starting
[talos] 2024/12/09 14:46:12 assigned address {"component": "controller-runtime", "controller": "network.AddressSpecController", "address": "169.254.116.108/32", "link": "lo"}
[talos] 2024/12/09 14:46:12 created dns upstream {"component": "controller-runtime", "controller": "network.DNSUpstreamController", "addr": "127.0.0.11", "idx": 0}
[talos] 2024/12/09 14:46:12 updated dns server nameservers {"component": "dns-resolve-cache", "addrs": ["127.0.0.11:53"]}
[talos] 2024/12/09 14:46:12 task extendPCRStartAll (1/1): done, 20.498933ms
[talos] 2024/12/09 14:46:12 phase extendPCRStartAll (9/10): done, 20.679263ms
[talos] 2024/12/09 14:46:12 phase startEverything (10/10): 1 tasks(s)
[talos] 2024/12/09 14:46:12 task startAllServices (1/1): starting
[talos] 2024/12/09 14:46:12 service[cri](Starting): Starting service
[talos] 2024/12/09 14:46:12 service[cri](Waiting): Waiting for network
[talos] 2024/12/09 14:46:12 service[cri](Preparing): Running pre state
[talos] 2024/12/09 14:46:12 service[cri](Preparing): Creating service runner
[talos] 2024/12/09 14:46:12 service[trustd](Starting): Starting service
[talos] 2024/12/09 14:46:12 service[trustd](Waiting): Waiting for service "containerd" to be "up", time sync, network
[talos] 2024/12/09 14:46:12 service[cri](Running): Process Process(["/bin/containerd" "--address" "/run/containerd/containerd.sock" "--config" "/etc/cri/containerd.toml"]) started with PID 52
[talos] 2024/12/09 14:46:12 service[etcd](Starting): Starting service
[talos] 2024/12/09 14:46:12 service[etcd](Waiting): Waiting for service "cri" to be "up", time sync, network, etcd spec
[talos] task startAllServices (1/1): 2024/12/09 14:46:12 waiting for 7 services
[talos] task startAllServices (1/1): 2024/12/09 14:46:12 service "apid" to be "up", service "containerd" to be "up", service "cri" to be "up", service "etcd" to be "up", service "kubelet" to be "up", service "machined" to be "up", service "trustd" to be "up"
[talos] 2024/12/09 14:46:12 service[trustd](Preparing): Running pre state
[talos] 2024/12/09 14:46:12 service[trustd](Preparing): Creating service runner
[talos] 2024/12/09 14:46:12 service[apid](Preparing): Running pre state
[talos] 2024/12/09 14:46:12 service[apid](Preparing): Creating service runner
[talos] 2024/12/09 14:46:12 service[kubelet](Starting): Starting service
[talos] 2024/12/09 14:46:12 service[kubelet](Waiting): Waiting for service "cri" to be "up", time sync, network
[talos] 2024/12/09 14:46:12 service[apid](Running): Started task apid (PID 118) for container apid
[talos] 2024/12/09 14:46:12 service[trustd](Running): Started task trustd (PID 119) for container trustd
[talos] 2024/12/09 14:46:13 bootstrap request received
[talos] 2024/12/09 14:46:13 service[etcd](Waiting): Waiting for service "cri" to be "up"
[talos] 2024/12/09 14:46:13 service[cri](Running): Health check successful
[talos] 2024/12/09 14:46:13 service[etcd](Preparing): Running pre state
[talos] 2024/12/09 14:46:13 service[kubelet](Preparing): Running pre state
[talos] 2024/12/09 14:46:13 service[etcd](Failed): Failed to run pre stage: failed to pull image "gcr.io/etcd-development/etcd:v3.5.16": 1 error(s) occurred:
	failed to pull image "gcr.io/etcd-development/etcd:v3.5.16": context canceled
[talos] 2024/12/09 14:46:13 service[etcd](Finished): Bootstrap requested
[talos] 2024/12/09 14:46:13 service[etcd](Starting): Starting service
[talos] 2024/12/09 14:46:13 service[etcd](Waiting): Waiting for service "cri" to be "up", time sync, network, etcd spec
[talos] 2024/12/09 14:46:13 service[etcd](Preparing): Running pre state
[talos] 2024/12/09 14:46:13 service[apid](Running): Health check successful
[talos] 2024/12/09 14:46:13 service[trustd](Running): Health check successful
[talos] 2024/12/09 14:46:17 service[etcd](Preparing): Creating service runner
[talos] 2024/12/09 14:46:17 service[etcd](Running): Started task etcd (PID 251) for container etcd
[talos] 2024/12/09 14:46:22 service[etcd](Running): Health check successful
[talos] 2024/12/09 14:46:22 rendered new static pod {"component": "controller-runtime", "controller": "k8s.StaticPodServerController", "id": "kube-apiserver"}
[talos] 2024/12/09 14:46:22 rendered new static pod {"component": "controller-runtime", "controller": "k8s.StaticPodServerController", "id": "kube-controller-manager"}
[talos] 2024/12/09 14:46:22 rendered new static pod {"component": "controller-runtime", "controller": "k8s.StaticPodServerController", "id": "kube-scheduler"}
[talos] task startAllServices (1/1): 2024/12/09 14:46:27 service "kubelet" to be "up"
[talos] 2024/12/09 14:46:27 kubernetes endpoint watch error {"component": "controller-runtime", "controller": "k8s.EndpointController", "error": "failed to list *v1.Endpoints: Get \"https://10.5.0.2:6443/api/v1/namespaces/default/endpoints?fieldSelector=metadata.name%3Dkubernetes&limit=500&resourceVersion=0\": dial tcp 10.5.0.2:6443: connect: connection refused"}
[talos] 2024/12/09 14:46:30 service[kubelet](Preparing): Creating service runner
[talos] 2024/12/09 14:46:30 service[kubelet](Running): Started task kubelet (PID 311) for container kubelet
[talos] 2024/12/09 14:46:32 service[kubelet](Running): Health check successful
[talos] 2024/12/09 14:46:32 task startAllServices (1/1): done, 19.86532006s
[talos] 2024/12/09 14:46:32 phase startEverything (10/10): done, 19.865380081s
[talos] 2024/12/09 14:46:32 boot sequence: done: 19.887453828s
[talos] 2024/12/09 14:46:48 kubernetes endpoint watch error {"component": "controller-runtime", "controller": "k8s.EndpointController", "error": "failed to list *v1.Endpoints: Get \"https://10.5.0.2:6443/api/v1/namespaces/default/endpoints?fieldSelector=metadata.name%3Dkubernetes&limit=500&resourceVersion=0\": dial tcp 10.5.0.2:6443: connect: connection refused"}
[talos] 2024/12/09 14:47:06 controller failed {"component": "controller-runtime", "controller": "k8s.NodeApplyController", "error": "1 error(s) occurred:\n\ttimeout"}
[talos] 2024/12/09 14:47:16 controller failed {"component": "controller-runtime", "controller": "k8s.ManifestApplyController", "error": "error creating mapping for object /v1/Secret/bootstrap-token-ylag20: Get \"https://127.0.0.1:7445/api?timeout=32s\": EOF"}
[talos] 2024/12/09 14:47:19 controller failed {"component": "controller-runtime", "controller": "k8s.NodeApplyController", "error": "1 error(s) occurred:\n\ttimeout"}
[talos] 2024/12/09 14:47:28 controller failed {"component": "controller-runtime", "controller": "k8s.ManifestApplyController", "error": "error creating mapping for object /v1/Secret/bootstrap-token-ylag20: Get \"https://127.0.0.1:7445/api?timeout=32s\": EOF"}
[talos] 2024/12/09 14:47:31 controller failed {"component": "controller-runtime", "controller": "k8s.NodeApplyController", "error": "1 error(s) occurred:\n\ttimeout"}
[talos] 2024/12/09 14:47:33 created /v1/Secret/bootstrap-token-ylag20 {"component": "controller-runtime", "controller": "k8s.ManifestApplyController"}
[talos] 2024/12/09 14:47:33 created rbac.authorization.k8s.io/v1/ClusterRoleBinding/system-bootstrap-approve-node-client-csr {"component": "controller-runtime", "controller": "k8s.ManifestApplyController"}
[talos] 2024/12/09 14:47:33 created rbac.authorization.k8s.io/v1/ClusterRoleBinding/system-bootstrap-node-bootstrapper {"component": "controller-runtime", "controller": "k8s.ManifestApplyController"}
[talos] 2024/12/09 14:47:33 created rbac.authorization.k8s.io/v1/ClusterRoleBinding/system-bootstrap-node-renewal {"component": "controller-runtime", "controller": "k8s.ManifestApplyController"}
[talos] 2024/12/09 14:47:33 created rbac.authorization.k8s.io/v1/ClusterRole/flannel {"component": "controller-runtime", "controller": "k8s.ManifestApplyController"}
[talos] 2024/12/09 14:47:33 created rbac.authorization.k8s.io/v1/ClusterRoleBinding/flannel {"component": "controller-runtime", "controller": "k8s.ManifestApplyController"}
[talos] 2024/12/09 14:47:34 created /v1/ServiceAccount/flannel {"component": "controller-runtime", "controller": "k8s.ManifestApplyController"}
[talos] 2024/12/09 14:47:34 created /v1/ConfigMap/kube-flannel-cfg {"component": "controller-runtime", "controller": "k8s.ManifestApplyController"}
[talos] 2024/12/09 14:47:35 created apps/v1/DaemonSet/kube-flannel {"component": "controller-runtime", "controller": "k8s.ManifestApplyController"}
[talos] 2024/12/09 14:47:35 created apps/v1/DaemonSet/kube-proxy {"component": "controller-runtime", "controller": "k8s.ManifestApplyController"}
[talos] 2024/12/09 14:47:35 created /v1/ServiceAccount/kube-proxy {"component": "controller-runtime", "controller": "k8s.ManifestApplyController"}
[talos] 2024/12/09 14:47:36 created rbac.authorization.k8s.io/v1/ClusterRoleBinding/kube-proxy {"component": "controller-runtime", "controller": "k8s.ManifestApplyController"}
[talos] 2024/12/09 14:47:36 created /v1/ServiceAccount/coredns {"component": "controller-runtime", "controller": "k8s.ManifestApplyController"}
[talos] 2024/12/09 14:47:37 created rbac.authorization.k8s.io/v1/ClusterRoleBinding/system:coredns {"component": "controller-runtime", "controller": "k8s.ManifestApplyController"}
[talos] 2024/12/09 14:47:37 created rbac.authorization.k8s.io/v1/ClusterRole/system:coredns {"component": "controller-runtime", "controller": "k8s.ManifestApplyController"}
[talos] 2024/12/09 14:47:37 created /v1/ConfigMap/coredns {"component": "controller-runtime", "controller": "k8s.ManifestApplyController"}
[talos] 2024/12/09 14:47:38 created apps/v1/Deployment/coredns {"component": "controller-runtime", "controller": "k8s.ManifestApplyController"}
[talos] 2024/12/09 14:47:38 created /v1/Service/kube-dns {"component": "controller-runtime", "controller": "k8s.ManifestApplyController"}
[talos] 2024/12/09 14:47:38 controller failed {"component": "controller-runtime", "controller": "k8s.NodeApplyController", "error": "1 error(s) occurred:\n\terror getting node: nodes \"talos-default-controlplane-1\" not found"}
[talos] 2024/12/09 14:47:39 created /v1/ConfigMap/kubeconfig-in-cluster {"component": "controller-runtime", "controller": "k8s.ManifestApplyController"}
[talos] 2024/12/09 14:47:43 controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: an error on the server (\"Authorization error (user=apiserver-kubelet-client, verb=get, resource=nodes, subresource=proxy)\") has prevented the request from succeeding"}
[talos] 2024/12/09 14:47:49 controller failed {"component": "controller-runtime", "controller": "k8s.NodeApplyController", "error": "1 error(s) occurred:\n\terror getting node: nodes \"talos-default-controlplane-1\" not found"}
[talos] 2024/12/09 14:48:07 machine is running and ready {"component": "controller-runtime", "controller": "runtime.MachineStatusController"}

talos-worker

2024/12/09 14:46:11 limited GOMAXPROCS to 4
[talos] 2024/12/09 14:46:11 initialize sequence: 4 phase(s)
[talos] 2024/12/09 14:46:11 phase systemRequirements (1/4): 2 tasks(s)
[talos] 2024/12/09 14:46:11 task initVolumeLifecycle (2/2): starting
[talos] 2024/12/09 14:46:11 task setupSystemDirectory (1/2): starting
[talos] 2024/12/09 14:46:11 task initVolumeLifecycle (2/2): done, 95.447µs
[talos] 2024/12/09 14:46:11 task setupSystemDirectory (1/2): done, 99.277µs
[talos] 2024/12/09 14:46:11 phase systemRequirements (1/4): done, 229.169µs
[talos] 2024/12/09 14:46:11 phase etc (2/4): 3 tasks(s)
[talos] 2024/12/09 14:46:11 task setUserEnvVars (3/3): starting
[talos] 2024/12/09 14:46:11 task setUserEnvVars (3/3): done, 8.812µs
[talos] 2024/12/09 14:46:11 task CreateSystemCgroups (1/3): starting
[talos] 2024/12/09 14:46:11 task createOSReleaseFile (2/3): starting
[talos] task CreateSystemCgroups (1/3): 2024/12/09 14:46:11 using cgroups root: /
[talos] 2024/12/09 14:46:11 task createOSReleaseFile (2/3): done, 170.9µs
[talos] 2024/12/09 14:46:11 node identity established {"component": "controller-runtime", "controller": "cluster.NodeIdentityController", "node_id": "9d5WC8UapdaeDdbKkfZ1GhMLUTbYvY7HFDbsVLe0fKDD"}
[talos] 2024/12/09 14:46:11 pre-created iptables-nft table 'mangle'/'KUBE-IPTABLES-HINT' {"component": "controller-runtime", "controller": "network.NfTablesChainController"}
[talos] 2024/12/09 14:46:11 nftables chains updated {"component": "controller-runtime", "controller": "network.NfTablesChainController", "chains": []}
[talos] 2024/12/09 14:46:11 setting resolvers {"component": "controller-runtime", "controller": "network.ResolverSpecController", "resolvers": ["127.0.0.11"]}
[talos] 2024/12/09 14:46:11 setting time servers {"component": "controller-runtime", "controller": "network.TimeServerSpecController", "addresses": ["time.cloudflare.com"]}
[talos] 2024/12/09 14:46:11 setting time servers {"component": "controller-runtime", "controller": "network.TimeServerSpecController", "addresses": ["time.cloudflare.com"]}
[talos] 2024/12/09 14:46:11 setting resolvers {"component": "controller-runtime", "controller": "network.ResolverSpecController", "resolvers": ["127.0.0.11"]}
[talos] 2024/12/09 14:46:11 task CreateSystemCgroups (1/3): done, 25.693653ms
[talos] 2024/12/09 14:46:11 phase etc (2/4): done, 25.779698ms
[talos] 2024/12/09 14:46:11 phase machined (3/4): 2 tasks(s)
[talos] 2024/12/09 14:46:11 task startContainerd (2/2): starting
[talos] 2024/12/09 14:46:11 task startMachined (1/2): starting
[talos] 2024/12/09 14:46:11 service[containerd](Starting): Starting service
[talos] 2024/12/09 14:46:11 service[containerd](Preparing): Running pre state
[talos] 2024/12/09 14:46:11 service[containerd](Preparing): Creating service runner
[talos] 2024/12/09 14:46:11 service[containerd](Running): Process Process(["/bin/containerd" "--address" "/system/run/containerd/containerd.sock" "--state" "/system/run/containerd" "--root" "/system/var/lib/containerd"]) started with PID 23
[talos] 2024/12/09 14:46:11 service[machined](Starting): Starting service
[talos] 2024/12/09 14:46:11 service[machined](Preparing): Running pre state
[talos] 2024/12/09 14:46:11 service[machined](Preparing): Creating service runner
[talos] 2024/12/09 14:46:11 service[machined](Running): Service started as goroutine
[talos] 2024/12/09 14:46:12 service[containerd](Running): Health check successful
[talos] 2024/12/09 14:46:12 task startContainerd (2/2): done, 1.001695571s
[talos] 2024/12/09 14:46:12 service[machined](Running): Health check successful
[talos] 2024/12/09 14:46:12 task startMachined (1/2): done, 1.019956054s
[talos] 2024/12/09 14:46:12 phase machined (3/4): done, 1.020041411s
[talos] 2024/12/09 14:46:12 phase config (4/4): 1 tasks(s)
[talos] 2024/12/09 14:46:12 task loadConfig (1/1): starting
[talos] 2024/12/09 14:46:12 downloading config {"component": "controller-runtime", "controller": "config.AcquireController", "platform": "container"}
[talos] 2024/12/09 14:46:12 fetching machine config from: USERDATA environment variable
[talos] 2024/12/09 14:46:12 machine config loaded successfully {"component": "controller-runtime", "controller": "config.AcquireController", "sources": ["container"]}
[talos] 2024/12/09 14:46:12 task loadConfig (1/1): done, 999.421µs
[talos] 2024/12/09 14:46:12 phase config (4/4): done, 1.011246ms
[talos] 2024/12/09 14:46:12 initialize sequence: done: 1.047083176s
[talos] 2024/12/09 14:46:12 install sequence: 0 phase(s)
[talos] 2024/12/09 14:46:12 install sequence: done: 1.691µs
[talos] 2024/12/09 14:46:12 service[apid](Starting): Starting service
[talos] 2024/12/09 14:46:12 service[apid](Waiting): Waiting for service "containerd" to be "up", api certificates
[talos] 2024/12/09 14:46:12 boot sequence: 10 phase(s)
[talos] 2024/12/09 14:46:12 phase saveConfig (1/10): 1 tasks(s)
[talos] 2024/12/09 14:46:12 task saveConfig (1/1): starting
[talos] 2024/12/09 14:46:12 task saveConfig (1/1): done, 68.708µs
[talos] 2024/12/09 14:46:12 phase saveConfig (1/10): done, 81.815µs
[talos] 2024/12/09 14:46:12 phase memorySizeCheck (2/10): 1 tasks(s)
[talos] 2024/12/09 14:46:12 task memorySizeCheck (1/1): starting
[talos] task memorySizeCheck (1/1): 2024/12/09 14:46:12 skipping memory size check in the container
[talos] 2024/12/09 14:46:12 task memorySizeCheck (1/1): done, 8.118µs
[talos] 2024/12/09 14:46:12 phase memorySizeCheck (2/10): done, 20.837µs
[talos] 2024/12/09 14:46:12 phase diskSizeCheck (3/10): 1 tasks(s)
[talos] 2024/12/09 14:46:12 task diskSizeCheck (1/1): starting
[talos] task diskSizeCheck (1/1): 2024/12/09 14:46:12 skipping disk size check in the container
[talos] 2024/12/09 14:46:12 task diskSizeCheck (1/1): done, 4.154µs
[talos] 2024/12/09 14:46:12 phase diskSizeCheck (3/10): done, 9.728µs
[talos] 2024/12/09 14:46:12 phase env (4/10): 2 tasks(s)
[talos] 2024/12/09 14:46:12 task waitForCARoots (2/2): starting
[talos] 2024/12/09 14:46:12 task setUserEnvVars (1/2): starting
[talos] 2024/12/09 14:46:12 task setUserEnvVars (1/2): done, 5.849µs
[talos] 2024/12/09 14:46:12 task waitForCARoots (2/2): done, 72.787µs
[talos] 2024/12/09 14:46:12 phase env (4/10): done, 81.498µs
[talos] 2024/12/09 14:46:12 phase dbus (5/10): 1 tasks(s)
[talos] 2024/12/09 14:46:12 task startDBus (1/1): starting
[talos] 2024/12/09 14:46:12 kubeprism KubePrism is enabled {"component": "controller-runtime", "controller": "k8s.KubePrismController", "endpoint": "127.0.0.1:7445"}
[talos] 2024/12/09 14:46:12 task startDBus (1/1): done, 259.855µs
[talos] 2024/12/09 14:46:12 phase dbus (5/10): done, 269.503µs
[talos] 2024/12/09 14:46:12 phase sharedFilesystems (6/10): 1 tasks(s)
[talos] 2024/12/09 14:46:12 assigned address {"component": "controller-runtime", "controller": "network.AddressSpecController", "address": "169.254.116.108/32", "link": "lo"}
[talos] 2024/12/09 14:46:12 task setupSharedFilesystems (1/1): starting
[talos] 2024/12/09 14:46:12 task setupSharedFilesystems (1/1): done, 28.453µs
[talos] 2024/12/09 14:46:12 phase sharedFilesystems (6/10): done, 40.846µs
[talos] 2024/12/09 14:46:12 phase var (7/10): 1 tasks(s)
[talos] 2024/12/09 14:46:12 task setupVarDirectory (1/1): starting
[talos] 2024/12/09 14:46:12 task setupVarDirectory (1/1): done, 307.458µs
[talos] 2024/12/09 14:46:12 phase var (7/10): done, 382.974µs
[talos] 2024/12/09 14:46:12 phase userSetup (8/10): 1 tasks(s)
[talos] 2024/12/09 14:46:12 task writeUserFiles (1/1): starting
[talos] 2024/12/09 14:46:12 task writeUserFiles (1/1): done, 6.837µs
[talos] 2024/12/09 14:46:12 phase userSetup (8/10): done, 42.225µs
[talos] 2024/12/09 14:46:12 phase extendPCRStartAll (9/10): 1 tasks(s)
[talos] 2024/12/09 14:46:12 task extendPCRStartAll (1/1): starting
[talos] 2024/12/09 14:46:12 created dns upstream {"component": "controller-runtime", "controller": "network.DNSUpstreamController", "addr": "127.0.0.11", "idx": 0}
[talos] 2024/12/09 14:46:12 updated dns server nameservers {"component": "dns-resolve-cache", "addrs": ["127.0.0.11:53"]}
[talos] 2024/12/09 14:46:12 task extendPCRStartAll (1/1): done, 19.774601ms
[talos] 2024/12/09 14:46:12 phase extendPCRStartAll (9/10): done, 19.789737ms
[talos] 2024/12/09 14:46:12 phase startEverything (10/10): 1 tasks(s)
[talos] 2024/12/09 14:46:12 task startAllServices (1/1): starting
[talos] 2024/12/09 14:46:12 service[cri](Starting): Starting service
[talos] 2024/12/09 14:46:12 service[cri](Waiting): Waiting for network
[talos] 2024/12/09 14:46:12 service[cri](Preparing): Running pre state
[talos] task startAllServices (1/1): 2024/12/09 14:46:12 waiting for 5 services
[talos] 2024/12/09 14:46:12 service[cri](Preparing): Creating service runner
[talos] task startAllServices (1/1): 2024/12/09 14:46:12 service "apid" to be "up", service "containerd" to be "up", service "cri" to be "up", service "kubelet" to be "up", service "machined" to be "up"
[talos] 2024/12/09 14:46:12 service[cri](Running): Process Process(["/bin/containerd" "--address" "/run/containerd/containerd.sock" "--config" "/etc/cri/containerd.toml"]) started with PID 50
[talos] 2024/12/09 14:46:12 service[kubelet](Starting): Starting service
[talos] 2024/12/09 14:46:12 service[kubelet](Waiting): Waiting for service "cri" to be "up", time sync, network
[talos] 2024/12/09 14:46:12 service[apid](Preparing): Running pre state
[talos] 2024/12/09 14:46:12 service[apid](Preparing): Creating service runner
[talos] 2024/12/09 14:46:12 service[apid](Running): Started task apid (PID 96) for container apid
[talos] 2024/12/09 14:46:13 service[cri](Running): Health check successful
[talos] 2024/12/09 14:46:13 service[kubelet](Preparing): Running pre state
[talos] 2024/12/09 14:46:13 service[apid](Running): Health check successful
[talos] task startAllServices (1/1): 2024/12/09 14:46:27 service "kubelet" to be "up"
[talos] 2024/12/09 14:46:31 service[kubelet](Preparing): Creating service runner
[talos] 2024/12/09 14:46:31 service[kubelet](Running): Started task kubelet (PID 196) for container kubelet
[talos] 2024/12/09 14:46:33 service[kubelet](Running): Health check successful
[talos] 2024/12/09 14:46:33 task startAllServices (1/1): done, 20.879359245s
[talos] 2024/12/09 14:46:33 phase startEverything (10/10): done, 20.87937547s
[talos] 2024/12/09 14:46:33 boot sequence: done: 20.900126609s
[talos] 2024/12/09 14:48:04 machine is running and ready {"component": "controller-runtime", "controller": "runtime.MachineStatusController"}

smira (Member) commented Dec 9, 2024

If you hit an error, please submit a full bug report, including the command you're running, its output, and the full error message.

Also, you can grab a talosctl support bundle.
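
For reference, a bundle can be collected from a node of the local cluster with something like:

$ talosctl -n 10.5.0.2 support

which writes a support.zip to the current directory.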

ja-softdevel (Author) commented Dec 9, 2024

@smira

I switched versions and ran talosctl cluster create, and it failed again with the "context deadline exceeded" error.

support.zip

talosctl version
Client:        
    Tag:         v1.9.0-beta.0        
    SHA:         580805ba        
    Built:        
    Go version:  go1.23.3        
    OS/Arch:     linux/amd64
Server:
nodes are not set for the command: please use `--nodes` flag or configuration file to set the nodes to run the command against

Adding the --nodes flag:

talosctl version --nodes 10.5.0.2 
Client:        
    Tag:         v1.9.0-beta.0        
    SHA:         580805ba        
    Built:        
    Go version:  go1.23.3        
    OS/Arch:     linux/amd64
Server:        
    NODE:        10.5.0.2        
    Tag:         v1.9.0-beta.0        
    SHA:         580805ba        
    Built:        
    Go version:  go1.23.3        
    OS/Arch:     linux/amd64        
    Enabled:     RBAC

smira (Member) commented Dec 9, 2024

There's still neither the output nor the command you ran. The cluster status is okay according to the logs.

ja-softdevel (Author) commented Dec 9, 2024

The docs say "Once the above finishes successfully, your talosconfig ( ~/.talos/config ) and kubeconfig ( ~/.kube/config ) will be configured to point to the new cluster." This does not happen; I assume the "context deadline exceeded" error causes the program to exit before creating the files.

I did notice that with v1.9.0-beta.0 a new subfolder is created under ~/.talos/clusters, but this folder is empty.

ja-softdevel (Author) commented

The command:

$ talosctl cluster create
validating CIDR and reserving IPs
generating PKI and tokens
creating state directory in "/home/jason/.talos/clusters/talos-default"
creating network talos-default
creating controlplane nodes
creating worker nodes
renamed talosconfig context "talos-default" -> "talos-default-3"
waiting for API
bootstrapping cluster
waiting for etcd to be healthy: OK
waiting for etcd members to be consistent across nodes: OK
waiting for etcd members to be control plane nodes: OK
waiting for apid to be ready: OK
waiting for all nodes memory sizes: OK
waiting for all nodes disk sizes: OK
waiting for no diagnostics: OK
waiting for kubelet to be healthy: OK
waiting for all nodes to finish boot sequence: OK
waiting for all k8s nodes to report: OK
waiting for all control plane static pods to be running: OK
waiting for all control plane components to be ready: OK
waiting for all k8s nodes to report ready: OK
waiting for kube-proxy to report ready: OK
◱ waiting for coredns to report ready: no ready pods found for namespace "kube-system" and label selector "k8s-app=kube-dns"
context deadline exceeded

smira (Member) commented Dec 9, 2024

      message: '0/2 nodes are available: 2 node(s) had untolerated taint {node.kubernetes.io/disk-pressure:
        }. preemption: 0/2 nodes are available: 2 Preemption is not helpful for scheduling.'

The problem is that under Docker, the kubelet sees your host disk (wherever /var/lib/docker lives) as the Kubernetes disk and runs its free-space check against it.

Even though there may be enough space for Kubernetes, the kubelet can decide that your /var is too full and refuse to schedule pods.

You can either use QEMU, which isolates the cluster from the host, or free up some space on your host.
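
A quick way to confirm this (a sketch, assuming the cluster's kubeconfig has already been merged) is to inspect the node taints and conditions:

$ kubectl get nodes -o wide
$ kubectl describe node talos-default-controlplane-1 | grep -E 'Taints|DiskPressure'

If DiskPressure is True, or the node.kubernetes.io/disk-pressure taint is present, the kubelet is rejecting pods because of host disk usage.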

ja-softdevel (Author) commented

I cleaned up some space on the host by removing some old Docker images; free space went from 84 GB to 214 GB.
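
(For reference, unused images can be removed and Docker disk usage inspected with something like the following; these are generic cleanup commands, not necessarily the exact ones used here:)

$ docker image prune -a
$ docker system df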

talosctl cluster create still fails to complete cleanly.

Using QEMU defeats the purpose of being able to just spin up Talos in Docker.

smira (Member) commented Dec 9, 2024

You can do talosctl -n 10.5.0.2 kubeconfig and investigate yourself with kubectl.
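
For example (the namespace and label selector come from the error above):

$ talosctl -n 10.5.0.2 kubeconfig
$ kubectl -n kube-system get pods -l k8s-app=kube-dns
$ kubectl -n kube-system describe pods -l k8s-app=kube-dns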

ja-softdevel (Author) commented

I investigated it myself by reverting to v1.6.8. This worked without issues.

Obviously, I'm not the only one who has reported this issue. The command talosctl cluster create seems to fail in version 1.8 and newer; I did not test version 1.7.

Also, since the command completed successfully, the talos and kube configs were exported correctly.

$ talosctl version --nodes 10.5.0.2
Client:        
  Tag:         v1.6.8        
  SHA:         26c13c8f        
  Built:        
  Go version:  go1.21.12 X:loopvar        
  OS/Arch:     linux/amd64
Server:        
  NODE:        10.5.0.2        
  Tag:         v1.6.8        
  SHA:         26c13c8f        
  Built:        
  Go version:  go1.21.12 X:loopvar        
  OS/Arch:     linux/amd64        
  Enabled:     RBAC
$ talosctl cluster create
validating CIDR and reserving IPs
generating PKI and tokens
downloading ghcr.io/siderolabs/talos:v1.6.8
creating network talos-default
creating controlplane nodes
creating worker nodes
waiting for API
bootstrapping cluster
waiting for etcd to be healthy: OK
waiting for etcd members to be consistent across nodes: OK
waiting for etcd members to be control plane nodes: OK
waiting for apid to be ready: OK
waiting for all nodes memory sizes: OK
waiting for all nodes disk sizes: OK
waiting for kubelet to be healthy: OK
waiting for all nodes to finish boot sequence: OK
waiting for all k8s nodes to report: OK
waiting for all k8s nodes to report ready: OK
waiting for all control plane static pods to be running: OK
waiting for all control plane components to be ready: OK
waiting for kube-proxy to report ready: OK
waiting for coredns to report ready: OK
waiting for all k8s nodes to report schedulable: OK
merging kubeconfig into "/home/jason/.kube/config"
PROVISIONER       docker
NAME              talos-default
NETWORK NAME      talos-default
NETWORK CIDR      10.5.0.0/24
NETWORK GATEWAY   10.5.0.1
NETWORK MTU       1500
NODES:
NAME                            TYPE           IP         CPU    RAM      DISK
/talos-default-controlplane-1   controlplane   10.5.0.2   2.00   2.1 GB   -
/talos-default-worker-1         worker         10.5.0.3   2.00   2.1 GB   -
$ kubectl get pods --all-namespaces
NAMESPACE     NAME                                                   READY   STATUS    RESTARTS      AGE
kube-system   coredns-85b955d87b-2rlhk                               1/1     Running   0             14m
kube-system   coredns-85b955d87b-68f7t                               1/1     Running   0             14m
kube-system   kube-apiserver-talos-default-controlplane-1            1/1     Running   0             14m
kube-system   kube-controller-manager-talos-default-controlplane-1   1/1     Running   2 (15m ago)   13m
kube-system   kube-flannel-t5bsr                                     1/1     Running   0             14m
kube-system   kube-flannel-vzjdm                                     1/1     Running   0             14m
kube-system   kube-proxy-2vk2g                                       1/1     Running   0             14m
kube-system   kube-proxy-xfrnf                                       1/1     Running   0             14m
kube-system   kube-scheduler-talos-default-controlplane-1            1/1     Running   3 (15m ago)   13m
