rke2-server failing to start #7011

cdmx1 · 2024-10-11T10:16:14Z

Environmental Info:
Rancher 1.29.9 on Amazon Linux 2, using RKE2.
RKE2 Version: rke2 version v1.30.4+rke2r1 (9517eea)

Node(s) CPU architecture, OS, and Version:
Linux ip-30-0-1xx-1xx.ec2.internal 5.10.xxx-xxx.879.amzn2.x86_64 #1 SMP Tue Sep 24 01:40:52 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:
1 master node / 1 worker node

Describe the bug:
Rancher agent enrolment gets stuck during the setup process with messages like:

Reconciling
Waiting for probes: kube-apiserver, kube-controller-manager, kube-scheduler

Steps To Reproduce:
Installed Rancher agent using the command provided in the Rancher UI for agent enrollment.
curl -fL https://rancher.example.in/system-agent-install.sh | sudo sh -s - --server https://rancher.example.in --label 'cattle.io/os=linux' --token xxxxxxxxxxxx --etcd --controlplane --worker
Agent enrollment gets stuck with the mentioned errors.

Expected behavior:
Rancher agent should successfully enroll into the RKE2 cluster, and the pods should start normally without networking errors.

Actual behavior:
Rancher agent enrollment gets stuck with the message:
Reconciling
Waiting for probes: kube-apiserver, kube-controller-manager, kube-scheduler

Error Log Details:

ERRO[0003] Sending HTTP 503 response to 127.0.0.1:49552: runtime core not ready
INFO[0003] Running kube-proxy ...
ERRO[0008] Kubelet exited: exit status 1
ERRO[0013] Kubelet exited: exit status 1
ERRO[0023] Kubelet exited: exit status 1
INFO[0020] Pod for etcd not synced (pod sandbox not found), retrying
{"level":"warn", "msg":"retrying of unary invoker failed", "error":"tls: failed to verify certificate: x509: certificate signed by unknown authority"}
INFO[0030] Failed to test data store connection: context deadline exceeded
ERRO[0033] Failed to check local etcd status for learner management: context deadline exceeded

The text was updated successfully, but these errors were encountered:

brandond · 2024-10-11T18:18:11Z

ERRO[0008] Kubelet exited: exit status 1
ERRO[0013] Kubelet exited: exit status 1
ERRO[0023] Kubelet exited: exit status 1

The kubelet is crashing. This is usually caused by use of incorrect kubelet-args in the cluster configuration, but it could be due to other things. Check the kubelet log at /var/lib/rancher/rke2/agent/logs/kubelet.log to see what the errors are.

brandond closed this as completed Oct 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rke2-server failing to start #7011

rke2-server failing to start #7011

cdmx1 commented Oct 11, 2024

brandond commented Oct 11, 2024 •

edited

Loading

rke2-server failing to start #7011

rke2-server failing to start #7011

Comments

cdmx1 commented Oct 11, 2024

brandond commented Oct 11, 2024 • edited Loading

brandond commented Oct 11, 2024 •

edited

Loading