[Question] About forcibly stopping a virtual machine with fence_kubevirt. #615

Open
HideoYamauchi opened this issue Feb 20, 2025 · 3 comments

@HideoYamauchi
Contributor

Hi All,

The fence_kubevirt STONITH off operation performs a shutdown operation on the STONITH target.

The shutdown does not complete immediately, so failover takes a long time.

The virtctl management command has a forced stop function, so it seems possible to stop a virtual machine immediately.

I am not familiar with the KubeVirt API, but would it be possible to improve fence_kubevirt's STONITH off operation so that it stops the VM instantly, as virtctl can?
(Alternatively, it would be useful to have an option to stop it immediately.)

Best Regards,
Hideo Yamauchi.

@oalbrigt oalbrigt self-assigned this Feb 24, 2025
@oalbrigt
Collaborator

Can you test this patch (you might also need to update the openshift library for it to work)?
#616

@HideoYamauchi
Contributor Author

Hi @oalbrigt

Thank you for your comment and for providing the patch.

I'll take a little time to check the patch.
I'll let you know the results later.

Many thanks,
Hideo Yamauchi.

@HideoYamauchi
Contributor Author

Hi @oalbrigt

We checked the behavior in an OCP 4.17 environment, using both the fence_kubevirt bundled with RHEL 9.5 and a copy with the PR applied.
After causing a kernel panic on the VM targeted for STONITH, we executed fence_kubevirt off from the command line on the other node.

[root@rh95-01 ~]# diff /usr/sbin/fence_kubevirt /usr/sbin/fence_kubevirt_test
112c112,113
<     return conn.request('put', path, header_params={'accept': '*/*'})
---
>     #return conn.request('put', path, header_params={'accept': '*/*'})
>     return conn.request('put', path, header_params={'accept': '*/*'}, grace_period_seconds=0 if action == 'stop' else None)
[root@rh95-01 ~]#
@RHEL9.5 bundled version 
[root@rh95-01 ~]# time fence_kubevirt  --namespace ocp-virt -o off -n ocp-rh95-02 --disable-timeout=1
Success: Powered OFF

real    3m12.882s
user    0m0.567s
sys     0m0.052s

@PR applied version
[root@rh95-01 ~]# time fence_kubevirt_test  --namespace ocp-virt -o off -n ocp-rh95-02 --disable-timeout=1
Success: Powered OFF

real    3m4.811s
user    0m0.553s
sys     0m0.046s
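
To make the comparison concrete, the two `real` values above can be converted to seconds and subtracted. This is only a small sketch of the arithmetic; the helper name `parse_real` is an assumption made for illustration and is not part of fence_kubevirt.

```python
import re


def parse_real(value: str) -> float:
    """Convert a `time` duration such as '3m12.882s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Values copied from the two runs above.
bundled = parse_real("3m12.882s")  # RHEL 9.5 bundled fence_kubevirt
patched = parse_real("3m4.811s")   # fence_kubevirt_test with the PR applied

saved = bundled - patched
print(f"bundled: {bundled:.3f}s, patched: {patched:.3f}s, saved: {saved:.3f}s")
```

The difference is about 8 seconds out of roughly 193, from a single run of each version, so both runs still take a little over three minutes.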

Apparently, this forced stop is not effective in an OCP 4.17 environment.
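
For reference, the patched line in the diff above only changes the request when the action is 'stop'. The sketch below isolates that conditional expression using a stand-in connection object. `RecordingConn`, `send_action`, and the paths are hypothetical names for illustration, and the sketch says nothing about whether the real client forwards `grace_period_seconds` to the KubeVirt API.

```python
class RecordingConn:
    """Hypothetical stand-in for the API connection; it only records calls."""

    def __init__(self):
        self.calls = []

    def request(self, method, path, **kwargs):
        # Record the call instead of contacting any server.
        self.calls.append((method, path, kwargs))
        return kwargs


def send_action(conn, path, action):
    # Same expression as the patched line: 0 for 'stop', None otherwise.
    grace = 0 if action == 'stop' else None
    return conn.request('put', path, header_params={'accept': '*/*'},
                        grace_period_seconds=grace)


conn = RecordingConn()
stop_kwargs = send_action(conn, '/hypothetical/path/stop', 'stop')
start_kwargs = send_action(conn, '/hypothetical/path/start', 'start')

print(stop_kwargs['grace_period_seconds'])   # value passed for 'stop'
print(start_kwargs['grace_period_seconds'])  # value passed for other actions
```

If the client or the API version in use ignores this keyword, the stop request would behave the same as before, which would be consistent with the near-identical timings.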

We cannot yet test against the released OCP 4.18 because we have not been able to prepare an environment.

However, we may be able to check for OCP 4.18 in the near future, so please leave this issue and the fix PR open.

Best Regards,
Hideo Yamauchi.
