Failing Bootstrap Initialization #2189
Replies: 4 comments
-
I'm converting this to a discussion, as this doesn't seem to be a bug.
-
Just to be clear: does the bootstrap machine itself stall while waiting for the API server, or is it the installer that stalls?
-
If I try to install 4.18, it fails waiting for the API server. If I try to install 4.19, the API server starts up, but several containers fail to download and start.
Here are the logs from a 4.19 install attempt...
.openshift_install.log from the jump server:
Here is the full log for the bootstrap server...
I believe the issue with 4.19 has to do with this log snippet, which I see multiple times...
May 06 02:32:29 bootstrap.thor.asguard.com kubelet.sh[3099]: E0506 02:32:29.634937 3099 log.go:32] "PullImage from image service failed" err="rpc error: code = Canceled desc = copying system image from manifest list: copying config: context canceled" image="quay.io/okd/scos-content@sha256:acf89b72f9e24c6ad355a3fe0f9dd5f5b8209927d4489e5471fc5583dc24c8c1"
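To see how often that pull failure repeats and which images are affected, something like the following could be run over a saved journal. This is only a rough sketch: `failed_pulls` is a made-up helper name, and it assumes the kubelet lines keep the `image="..."` field shown in the snippet above.

```shell
# Hypothetical helper: read a kubelet journal on stdin and print each image
# whose pull failed, prefixed by the number of failures, most frequent first.
failed_pulls() {
  grep 'PullImage from image service failed' \
    | sed -n 's/.*image="\([^"]*\)".*/\1/p' \
    | sort | uniq -c | sort -rn
}

# Intended use against a saved journal file (not run here):
#   failed_pulls < journal_bootstrap_current_boot.log
```

If a single image accounts for nearly all of the failures, that points at one large pull being cancelled repeatedly rather than at general registry connectivity.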
-
Hi. I believe I have the same issue. I'm trying to install 4.19 (note: I already successfully installed 4.8 months ago).
The bootstrap node starts, and the 6443 and 26623 frontends show green in HAProxy. Netcat reports the connection is OK. (56.3 - haproxy address, 56.10 - bootstrap node address)
When I SSH into the bootstrap node, it shows some problem with coreos-fix-selinux-labels.service and node-image-finish.service.
When I tried simply disabling SELinux and restarting to see what happens, node-image-pull.service started failing instead.
I tried to debug it with AI, and it points at some problems in the /usr/local/bin/node-image-pull.sh script: https://www.perplexity.ai/search/last-login-tue-may-20-07-23-55-9avgx6X9T5Kftp7irGhTIw
EDIT: I checked the bootstrap node's journalctl output from the start and found this, but when I tried to pull it manually it went fine.
Also, I see a new 4.19 version came out. I will try it out tomorrow.
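For anyone repeating the netcat check described above, here is a minimal sketch. `port_open` and `report_ports` are made-up names, and it assumes an `nc` that accepts `-z` and `-w` (OpenBSD netcat and Ncat both do); the host and ports are whatever your load balancer actually exposes.

```shell
# Hypothetical helper: succeed if a TCP connection to host $1, port $2
# opens within 3 seconds. Assumes nc supports -z (no data) and -w (timeout).
port_open() {
  nc -z -w 3 "$1" "$2" >/dev/null 2>&1
}

# Print "host:port open" or "host:port closed" for each port listed after the host.
report_ports() {
  host=$1; shift
  for port in "$@"; do
    if port_open "$host" "$port"; then
      echo "$host:$port open"
    else
      echo "$host:$port closed"
    fi
  done
}

# Intended use (placeholders, not run here):
#   report_ports <haproxy-address> <api-port> <machine-config-port>
```

An open port only shows that something accepts the connection on the load balancer; it does not prove that the backend behind it is healthy.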
-
Describe the bug
I have been trying for a while now to set up OKD for the first time, so I have been round and round with documents and Gemini trying to make sure this is not a ME problem. It still could be, but here is my setup:
Dell R730 Host
Proxmox 8.4.1
I have created 3 control plane nodes, 3 worker nodes, and 1 bootstrap node. All have 4 CPUs, 16 GB RAM, and 120 GB HDDs.
Version
I have tried 4.17, 4.18, and 4.19
I am attempting the UPI install on bare metal
How reproducible
Follow the instructions in the documentation
I created install-config.yaml, then ran the manifest and ignition creation commands on my jump server.
Ran the command: openshift-install coreos print-stream-json | grep '.iso[^.]'
Downloaded the ISO specified for each version I have tried (FCOS and SCOS).
Booted up the bootstrap VM and ran:
sudo coreos-installer install --ignition-url http://192.168.1.91:8000/bootstrap.ign /dev/sda --insecure-ignition
The OS installer does its thing and comes back completed.
I tell it to reboot, and it starts booting the installed OS.
Here it takes some time. Sometimes the initial step completes, where it says it needs to wait for the versions API; sometimes I never get the API. However, I never get past the step that waits 45 minutes for the bootstrap to complete. This is where it always fails.
The API pass/fail may depend on whether it is 4.18 or 4.19. I have done so many installs at this point that it's hard to tell. My most recent install was 4.18, and I will include the logs for it. In the morning I will try 4.19 again and include those logs.
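For the ISO lookup step above, a slightly stricter filter than the plain grep might look like this. It is a sketch only: `iso_urls` is a made-up helper, and it assumes the stream metadata stores each ISO location as a quoted http(s) URL.

```shell
# Hypothetical helper: print every distinct URL ending in .iso found on stdin.
# Signature URLs (.iso.sig) collapse onto the same .iso URL and are de-duplicated.
iso_urls() {
  grep -Eo 'https?://[^"]+\.iso' | sort -u
}

# Intended use on the jump server (not run here):
#   openshift-install coreos print-stream-json | iso_urls
```

The output may list one ISO per architecture, so it is worth confirming that the downloaded file matches the architecture of the VMs.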
Log bundle 4.18
install-config.yaml.txt
.openshift_install.log
journal_bootstrap_current_boot.log
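When reading through .openshift_install.log, it can help to cut it down to the error and fatal entries first. A rough sketch, assuming the installer writes lines of the form `time="..." level=... msg="..."`; `install_errors` is a made-up name.

```shell
# Hypothetical helper: print only error- and fatal-level entries from an
# installer log on stdin. Assumes each entry starts with time="..." level=...
install_errors() {
  grep -E '^time="[^"]*" level=(error|fatal) '
}

# Intended use on the jump server (not run here):
#   install_errors < .openshift_install.log
```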