Provisioning timeout #85

Open
jingvar opened this issue May 27, 2021 · 12 comments

Comments

@jingvar

jingvar commented May 27, 2021

I have run into a problem.
The provisioning timeout is too short for my environment.
I would appreciate seeing the configuration of an environment that works with the current timeouts.

@markgoddard
Member

Hi @jingvar. Can you provide more information about the error? Which task fails?

@jingvar
Author

jingvar commented May 31, 2021

I'm not sure about my first batch, but now I get:

TASK [Wait for the ironic node to become active] *******************************************************************************************************************************************************************************************
FAILED - RETRYING: Wait for the ironic node to become active (60 retries left).

fatal: [controller0 -> {{ hostvars[seed_host].ansible_host | default(seed_host) }}]: FAILED! => {"attempts": 60, "changed": false, "cmd": ["docker", "exec", "bifrost_deploy", "bash", "-c", " export OS_CLOUD=bifrost && export OS_BAREMETAL_API_VERSION=1.34 && export BIFROST_INVENTORY_SOURCE=ironic && ansible baremetal --connection local --inventory /etc/bifrost/inventory/ -e @/etc/bifrost/bifrost.yml -e @/etc/bifrost/dib.yml --limit controller0 -m command -a \"baremetal node show {{ inventory_hostname }} -f value -c provision_state\""], "delta": "0:00:12.900611", "end": "2021-05-31 17:23:39.951354", "rc": 0, "start": "2021-05-31 17:23:27.050743", "stderr": "", "stderr_lines": [], "stdout": "controller0 | CHANGED | rc=0 >>\nwait call-back", "stdout_lines": ["controller0 | CHANGED | rc=0 >>", "wait call-back"]}
fatal: [compute0 -> {{ hostvars[seed_host].ansible_host | default(seed_host) }}]: FAILED! => {"attempts": 60, "changed": false, "cmd": ["docker", "exec", "bifrost_deploy", "bash", "-c", " export OS_CLOUD=bifrost && export OS_BAREMETAL_API_VERSION=1.34 && export BIFROST_INVENTORY_SOURCE=ironic && ansible baremetal --connection local --inventory /etc/bifrost/inventory/ -e @/etc/bifrost/bifrost.yml -e @/etc/bifrost/dib.yml --limit compute0 -m command -a \"baremetal node show {{ inventory_hostname }} -f value -c provision_state\""], "delta": "0:00:13.568618", "end": "2021-05-31 17:23:41.981203", "rc": 0, "start": "2021-05-31 17:23:28.412585", "stderr": "", "stderr_lines": [], "stdout": "compute0 | CHANGED | rc=0 >>\nwait call-back", "stdout_lines": ["compute0 | CHANGED | rc=0 >>", "wait call-back"]}

(bifrost-deploy)[root@seed bifrost-9.0.2.dev21]# baremetal node list
+--------------------------------------+-------------+---------------+-------------+--------------------+-------------+
| UUID                                 | Name        | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+-------------+---------------+-------------+--------------------+-------------+
| 6ced105b-8119-4910-907b-126faca79bf1 | compute0    | None          | power on    | wait call-back     | False       |
| 52b05c6c-10e6-4255-ae6b-63e74ca944c7 | controller0 | None          | power on    | wait call-back     | False       |
+--------------------------------------+-------------+---------------+-------------+--------------------+-------------+
{"commands": [{"id": "ab0ec511-a31b-4a4b-8ac2-873072a9a757", "command_name": "get_deploy_steps", "command_params": {"node": {"id": 4, "uuid": "52b05c6c-10e6-4255-ae6b-63e74ca944c7", "name": "controller0", "chassis_id": null, "instance_uuid": null, "driver": "ipmi", "driver_info": {"ipmi_address": "192.168.33.4", "ipmi_port": "6230", "ipmi_username": "username", "ipmi_password": "******", "deploy_kernel": "http://192.168.33.5:8080/ipa.kernel", "deploy_ramdisk": "http://192.168.33.5:8080/ipa.initramfs"}, "driver_internal_info": {"deploy_boot_mode": "bios", "last_power_state_change": "2021-05-31T16:53:26.807751", "agent_secret_token": "******", "agent_url": "https://192.168.33.169:9999", "agent_version": "6.4.4.dev17", "agent_last_heartbeat": "2021-05-31T17:00:02.167829", "agent_verify_ca": "/var/lib/ironic/certificates/52b05c6c-10e6-4255-ae6b-63e74ca944c7.crt", "is_whole_disk_image": true, "deploy_steps": [{"step": "deploy", "priority": 100, "argsinfo": null, "interface": "deploy"}, {"step": "write_image", "priority": 80, "argsinfo": null, "interface": "deploy"}, {"step": "prepare_instance_boot", "priority": 60, "argsinfo": null, "interface": "deploy"}, {"step": "tear_down_agent", "priority": 40, "argsinfo": null, "interface": "deploy"}, {"step": "switch_to_tenant_network", "priority": 30, "argsinfo": null, "interface": "deploy"}, {"step": "boot_instance", "priority": 20, "argsinfo": null, "interface": "deploy"}], "deploy_step_index": 0}, "clean_step": {}, "deploy_step": {"step": "deploy", "priority": 100, "argsinfo": null, "interface": "deploy"}, "raid_config": {}, "target_raid_config": {}, "instance_info": {"image_checksum": "96fe772c5df8d0422c3dac67d58749ae", "image_disk_format": "qcow2", "image_source": "http://192.168.33.5:8080/deployment_image.qcow2", "configdrive": "******", "image_url": "http://192.168.33.5:8080/deployment_image.qcow2", "image_type": "whole-disk-image"}, "properties": {"cpu_arch": "x86_64", "cpus": "4", "memory_mb": "8192", "local_gb": 22, 
"capabilities": "cpu_aes:true,cpu_hugepages:true,cpu_hugepages_1g:true,boot_option:local", "root_device": {}, "vendor": "unknown"}, "reservation": "seed", "conductor_affinity": 1, "conductor_group": "", "power_state": "power on", "target_power_state": null, "provision_state": "deploying", "provision_updated_at": "2021-05-31T17:02:11.000000", "target_provision_state": "active", "maintenance": false, "maintenance_reason": null, "fault": null, "console_enabled": false, "last_error": null, "resource_class": "test-rc", "inspection_finished_at": null, "inspection_started_at": "2021-05-31T16:53:10.000000", "extra": {"pxe_interface_mac": "52:54:00:ff:90:2c", "system_vendor": {"manufacturer": "Red Hat", "product_name": "KVM"}}, "automated_clean": null, "protected": false, "protected_reason": null, "allocation_id": null, "bios_interface": "no-bios", "boot_interface": "ipxe", "console_interface": "no-console", "deploy_interface": "direct", "inspect_interface": "inspector", "management_interface": "ipmitool", "network_interface": "noop", "power_interface": "ipmitool", "raid_interface": "no-raid", "rescue_interface": "no-rescue", "storage_interface": "noop", "vendor_interface": "ipmitool", "traits": {"objects": []}, "owner": null, "lessee": null, "description": null, "retired": false, "retired_reason": null, "network_data": {}, "created_at": "2021-05-31T16:50:22.000000", "updated_at": "2021-05-31T17:02:12.541477"}, "ports": [{"id": 4, "uuid": "1bea11a9-4857-4d9d-b19f-61cc4c90e15a", "node_id": 4, "address": "52:54:00:ff:90:2c", "extra": {}, "local_link_connection": {"switch_id": "7a:15:b0:04:74:db", "switch_info": "brtenks0", "port_id": "p-contr0-0-br"}, "portgroup_id": null, "pxe_enabled": true, "internal_info": {}, "physical_network": "physnet1", "is_smartnic": false, "created_at": "2021-05-31T16:50:23.000000", "updated_at": "2021-05-31T16:50:34.000000"}]}, "command_status": "SUCCEEDED", "command_error": null, "command_result": {"deploy_steps": {"GenericHardwareManager": 
[{"step": "erase_devices_metadata", "priority": 0, "interface": "deploy", "reboot_requested": false}, {"step": "apply_configuration", "priority": 0, "interface": "raid", "reboot_requested": false, "argsinfo": {"raid_config": {"description": "The RAID configuration to apply.", "required": true}, "delete_existing": {"description": "Setting this to 'True' indicates to delete existing RAID configuration prior to creating the new configuration. Default value is 'True'.", "required": false}}}, {"step": "write_image", "priority": 0, "interface": "deploy", "reboot_requested": false}]}, "hardware_manager_version": {"generic_hardware_manager": "1.1"}}}, {"id": "aaadf52b-1550-43ac-b781-6934f00a7b2b", "command_name": "execute_deploy_step", "command_params": {"step": {"interface": "deploy", "step": "write_image", "args": {"image_info": {"id": "deployment_image.qcow2", "urls": ["http://192.168.33.5:8080/deployment_image.qcow2"], "disk_format": "qcow2", "container_format": null, "stream_raw_images": true, "checksum": "96fe772c5df8d0422c3dac67d58749ae", "node_uuid": "52b05c6c-10e6-4255-ae6b-63e74ca944c7"}, "configdrive": "http://192.168.33.5:8080/configdrive-52b05c6c-10e6-4255-ae6b-63e74ca944c7.iso.gz"}}, "node": {"id": 4, "uuid": "52b05c6c-10e6-4255-ae6b-63e74ca944c7", "name": "controller0", "chassis_id": null, "instance_uuid": null, "driver": "ipmi", "driver_info": {"ipmi_address": "192.168.33.4", "ipmi_port": "6230", "ipmi_username": "username", "ipmi_password": "******", "deploy_kernel": "http://192.168.33.5:8080/ipa.kernel", "deploy_ramdisk": "http://192.168.33.5:8080/ipa.initramfs"}, "driver_internal_info": {"deploy_boot_mode": "bios", "last_power_state_change": "2021-05-31T16:53:26.807751", "agent_secret_token": "******", "agent_url": "https://192.168.33.169:9999", "agent_version": "6.4.4.dev17", "agent_last_heartbeat": "2021-05-31T17:00:02.167829", "agent_verify_ca": "/var/lib/ironic/certificates/52b05c6c-10e6-4255-ae6b-63e74ca944c7.crt", "is_whole_disk_image": true, 
"deploy_steps": [{"step": "deploy", "priority": 100, "argsinfo": null, "interface": "deploy"}, {"step": "write_image", "priority": 80, "argsinfo": null, "interface": "deploy"}, {"step": "prepare_instance_boot", "priority": 60, "argsinfo": null, "interface": "deploy"}, {"step": "tear_down_agent", "priority": 40, "argsinfo": null, "interface": "deploy"}, {"step": "switch_to_tenant_network", "priority": 30, "argsinfo": null, "interface": "deploy"}, {"step": "boot_instance", "priority": 20, "argsinfo": null, "interface": "deploy"}], "deploy_step_index": 1, "hardware_manager_version": {"generic_hardware_manager": "1.1"}, "agent_cached_deploy_steps": {"deploy": [{"step": "erase_devices_metadata", "priority": 0, "interface": "deploy", "reboot_requested": false}, {"step": "write_image", "priority": 0, "interface": "deploy", "reboot_requested": false}], "raid": [{"step": "apply_configuration", "priority": 0, "interface": "raid", "reboot_requested": false, "argsinfo": {"raid_config": {"description": "The RAID configuration to apply.", "required": true}, "delete_existing": {"description": "Setting this to 'True' indicates to delete existing RAID configuration prior to creating the new configuration. 
Default value is 'True'.", "required": false}}}]}, "agent_cached_deploy_steps_refreshed": "2021-05-31 17:02:17.355828"}, "clean_step": {}, "deploy_step": {"step": "write_image", "priority": 80, "argsinfo": null, "interface": "deploy"}, "raid_config": {}, "target_raid_config": {}, "instance_info": {"image_checksum": "96fe772c5df8d0422c3dac67d58749ae", "image_disk_format": "qcow2", "image_source": "http://192.168.33.5:8080/deployment_image.qcow2", "configdrive": "******", "image_url": "http://192.168.33.5:8080/deployment_image.qcow2", "image_type": "whole-disk-image"}, "properties": {"cpu_arch": "x86_64", "cpus": "4", "memory_mb": "8192", "local_gb": 22, "capabilities": "cpu_aes:true,cpu_hugepages:true,cpu_hugepages_1g:true,boot_option:local", "root_device": {}, "vendor": "unknown"}, "reservation": "seed", "conductor_affinity": 1, "conductor_group": "", "power_state": "power on", "target_power_state": null, "provision_state": "deploying", "provision_updated_at": "2021-05-31T17:02:11.000000", "target_provision_state": "active", "maintenance": false, "maintenance_reason": null, "fault": null, "console_enabled": false, "last_error": null, "resource_class": "test-rc", "inspection_finished_at": null, "inspection_started_at": "2021-05-31T16:53:10.000000", "extra": {"pxe_interface_mac": "52:54:00:ff:90:2c", "system_vendor": {"manufacturer": "Red Hat", "product_name": "KVM"}}, "automated_clean": null, "protected": false, "protected_reason": null, "allocation_id": null, "bios_interface": "no-bios", "boot_interface": "ipxe", "console_interface": "no-console", "deploy_interface": "direct", "inspect_interface": "inspector", "management_interface": "ipmitool", "network_interface": "noop", "power_interface": "ipmitool", "raid_interface": "no-raid", "rescue_interface": "no-rescue", "storage_interface": "noop", "vendor_interface": "ipmitool", "traits": {"objects": []}, "owner": null, "lessee": null, "description": null, "retired": false, "retired_reason": null, "network_data": {}, 
"created_at": "2021-05-31T16:50:22.000000", "updated_at": "2021-05-31T17:02:17.520232"}, "ports": [{"id": 4, "uuid": "1bea11a9-4857-4d9d-b19f-61cc4c90e15a", "node_id": 4, "address": "52:54:00:ff:90:2c", "extra": {}, "local_link_connection": {"switch_id": "7a:15:b0:04:74:db", "switch_info": "brtenks0", "port_id": "p-contr0-0-br"}, "portgroup_id": null, "pxe_enabled": true, "internal_info": {}, "physical_network": "physnet1", "is_smartnic": false, "created_at": "2021-05-31T16:50:23.000000", "updated_at": "2021-05-31T16:50:34.000000"}], "deploy_version": {"generic_hardware_manager": "1.1"}}, "command_status": "RUNNING", "command_error": null, "command_result": null}]}

@jingvar
Author

jingvar commented May 31, 2021

After roughly an hour, both nodes became active:

(bifrost-deploy)[root@seed bifrost-9.0.2.dev21]# baremetal node list
+--------------------------------------+-------------+---------------+-------------+--------------------+-------------+
| UUID                                 | Name        | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+-------------+---------------+-------------+--------------------+-------------+
| 6ced105b-8119-4910-907b-126faca79bf1 | compute0    | None          | power on    | active             | False       |
| 52b05c6c-10e6-4255-ae6b-63e74ca944c7 | controller0 | None          | power on    | active             | False       |
+--------------------------------------+-------------+---------------+-------------+--------------------+-------------+
`ssh [email protected]
Activate the web console with: systemctl enable --now cockpit.socket

[centos@compute0 ~]$`
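
The same state check that the failing task runs can be polled by hand with a longer limit. A rough sketch, assuming it is run inside the bifrost_deploy container where the `baremetal` client is already configured; the function name, the 3600-second default, and the 10-second interval are illustrative choices, not values taken from kayobe:

```shell
# wait_for_active NODE [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Polls the node's provision_state until it reports "active".
# Returns 0 on active, 1 on "deploy failed", 2 when the timeout expires.
wait_for_active() {
    node="$1"
    timeout="${2:-3600}"   # illustrative default, not a kayobe value
    interval="${3:-10}"    # illustrative default
    elapsed=0
    while :; do
        # Same query as in the failing task output above.
        state="$(baremetal node show "$node" -f value -c provision_state)"
        echo "$node: $state"
        [ "$state" = "active" ] && return 0
        # Give up early if Ironic reports a failed deployment.
        [ "$state" = "deploy failed" ] && return 1
        [ "$elapsed" -ge "$timeout" ] && return 2
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
}
```

For example, `wait_for_active controller0 7200` would keep checking for up to two hours.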

@jingvar
Author

jingvar commented May 31, 2021

My env
CPU
64 cores, 2.4 GHz
Intel(R) Xeon(R) CPU E5-4640

Memory
512 GiB

Storage
rotary 7200rpm sata
ST2000DM008-2FR1

@jingvar
Author

jingvar commented Jun 1, 2021

Moved QCOWs to RAM (tmpfs) - same result

@markgoddard
Member

markgoddard commented Jun 2, 2021

The provisioning timeout (in seconds) may be set via wait_active_timeout, in any of the .yml files in config/src/kayobe-config/etc/kayobe
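
A minimal sketch of that change, assuming `overcloud.yml` as the target file (chosen only as an example of a `.yml` file in that directory) and 3600 seconds as an arbitrary value rather than a recommended default. The helper name is hypothetical:

```shell
# Directory named above; override KAYOBE_ETC if your checkout lives elsewhere.
KAYOBE_ETC="${KAYOBE_ETC:-config/src/kayobe-config/etc/kayobe}"

# set_wait_active_timeout SECONDS [FILE]
# Hypothetical helper: sets wait_active_timeout in FILE, replacing any
# earlier top-level setting so the variable is defined only once.
set_wait_active_timeout() {
    timeout="$1"
    file="${2:-$KAYOBE_ETC/overcloud.yml}"   # example file, an assumption
    if [ -f "$file" ]; then
        # grep -v exits non-zero when it prints nothing, hence "|| true".
        grep -v '^wait_active_timeout:' "$file" > "$file.tmp" || true
        mv "$file.tmp" "$file"
    fi
    printf 'wait_active_timeout: %s\n' "$timeout" >> "$file"
}
```

Running `set_wait_active_timeout 3600` and then re-running `kayobe overcloud provision` would show whether the longer limit is enough.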

@markgoddard
Member

Could you try increasing it and let us know if it works? We could update the default if so.

@jingvar
Author

jingvar commented Jun 3, 2021

There is another issue - Ironic seems broken.

@markgoddard
Member

How is it broken? It looks like it successfully provisioned your nodes (eventually).

@jingvar
Author

jingvar commented Jun 5, 2021

I redeployed the environment.
The controller0 and compute0 nodes are stuck at:

boot.ipxe : 404 bytes [script]
pxelinux.cfg/52-54-00-d4-90-8f... ok
http://192.168.33.5:8080//b7ac5f3e-4016-47e2-a39b-49fc6d66e4cc/deploy_kernel... ok
http://192.168.33.5:8080//b7ac5f3e-4016-47e2-a39b-49fc6d66e4cc/deploy_ramdisk... ok

(bifrost-deploy)[root@seed bifrost-9.0.2.dev21]# baremetal node list
+--------------------------------------+-------------+---------------+-------------+--------------------+-------------+
| UUID                                 | Name        | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+-------------+---------------+-------------+--------------------+-------------+
| 85093588-1e83-4dbc-b777-03502569b12c | compute0    | None          | power on    | wait call-back     | False       |
| b7ac5f3e-4016-47e2-a39b-49fc6d66e4cc | controller0 | None          | power on    | wait call-back     | False       |
+--------------------------------------+-------------+---------------+-------------+--------------------+-------------+

@jingvar
Author

jingvar commented Jun 5, 2021

kayobe overcloud provision failed, so I power-cycled the VM:
virsh destroy controller0
virsh start controller0

http://192.168.33.5:8080/ipa.initramfs... ok
Linux version 5.10.3-tinycore64 (root@box) (gcc (GCC) 10.2.0, GNU ld (GNU Binutils) 2.35.1) #2021 SMP Mon Dec 28 16:17:51 UTC 2020
Command line: ipa-inspection-callback-url=http://192.168.33.5:5050/v1/continue ipa-api-url=http://192.168.33.5:6385 systemd.journald.forward_to_console=yes BOOTIF=52:54:00:d4:90:8f nofb nomodeset vga=normal console=ttyS0 ipa-collect-lldp=1 ipa-inspection-collectors=default,logs,pci-devices ipa-inspection-benchmarks= ipa-insecure=1 initrd=ipa.initramfs
x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
BIOS-provided physical RAM map:
...

After that, IPA starts normally.

@jingvar
Author

jingvar commented Jun 7, 2021

I removed the files 52-54-00-07-57-0f and 52-54-00-c9-fe-16 from /httpboot/pxelinux.cfg/,
and the VMs were then successfully inspected and provisioned.

less 52-54-00-07-57-0f

#!ipxe

set attempts:int32 10
set i:int32 0

goto deploy

:deploy
imgfree
kernel http://192.168.33.5:8080//7c5bfb54-8283-4739-b173-5d9f7b4889bc/deploy_kernel selinux=0 troubleshoot=0 text systemd.journald.forward_to_console=yes ipa-insecure=1 ipa-insecure=1 ipa-collect-lldp=1 ipa-inspection-collectors=default,logs,pci-devices ipa-inspection-benchmarks= ipa-inspection-callback-url=http://192.168.33.5:5050/v1/continue ipa-api-url=http://192.168.33.5:6385 ipa-global-request-id=req-5062d291-4673-40be-9c3f-0925cf13c56a BOOTIF=${mac} initrd=deploy_ramdisk || goto retry

initrd http://192.168.33.5:8080//7c5bfb54-8283-4739-b173-5d9f7b4889bc/deploy_ramdisk || goto retry
boot

:retry
iseq ${i} ${attempts} && goto fail ||
inc i
echo No response, retrying in {i} seconds.
sleep ${i}
goto deploy

:fail
echo Failed to get a response after ${attempts} attempts
echo Powering off in 30 seconds.
sleep 30
poweroff

:boot_partition
imgfree
kernel no_kernel root={{ ROOT }} ro text systemd.journald.forward_to_console=yes ipa-insecure=1 ipa-insecure=1 ipa-collect-lldp=1 ipa-inspection-collectors=default,logs,pci-devices ipa-inspection-benchmarks= ipa-inspection-callback-url=http://192.168.33.5:5050/v1/continue ipa-api-url=http://192.168.33.5:6385 ipa-global-request-id=req-5062d291-4673-40be-9c3f-0925cf13c56a initrd=ramdisk || goto boot_partition
initrd no_ramdisk || goto boot_partition
boot

:boot_ramdisk
imgfree
kernel no_kernel root=/dev/ram0 text systemd.journald.forward_to_console=yes ipa-insecure=1 ipa-insecure=1 ipa-collect-lldp=1 ipa-inspection-collectors=default,logs,pci-devices ipa-inspection-benchmarks= ipa-inspection-callback-url=http://192.168.33.5:5050/v1/continue ipa-api-url=http://192.168.33.5:6385 ipa-global-request-id=req-5062d291-4673-40be-9c3f-0925cf13c56a  initrd=ramdisk || goto boot_ramdisk
initrd no_ramdisk || goto boot_ramdisk
boot

:boot_whole_disk
sanboot --no-describe
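
The manual cleanup described at the top of this comment could be scripted roughly as follows. This is only a sketch: the helper name is made up, the default directory is the /httpboot/pxelinux.cfg path mentioned above, and it assumes every per-MAC file there is a stale leftover from an earlier deployment, as appeared to be the case here. It only lists files unless `--delete` is passed:

```shell
# remove_stale_pxe_configs [DIR] [--delete]
# Lists per-MAC config files such as 52-54-00-07-57-0f in DIR and, with
# --delete, removes them. Files with other names are left alone.
remove_stale_pxe_configs() {
    dir="${1:-/httpboot/pxelinux.cfg}"
    mode="${2:-}"
    hex='[0-9a-f][0-9a-f]'
    # Unquoted $hex is expanded as a glob: six hex pairs joined by dashes.
    for f in "$dir"/$hex-$hex-$hex-$hex-$hex-$hex; do
        # Skip the literal pattern when nothing matches; -L also catches
        # dangling symlinks, which -e alone would miss.
        [ -e "$f" ] || [ -L "$f" ] || continue
        if [ "$mode" = "--delete" ]; then
            rm -f -- "$f"
            echo "removed $f"
        else
            echo "would remove $f"
        fi
    done
}
```

Calling `remove_stale_pxe_configs` first and `remove_stale_pxe_configs /httpboot/pxelinux.cfg --delete` only after checking the list keeps the destructive step explicit.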
