handle lost CBT case #258
base: master
It is legitimate for the CBT inside QEMU to be lost, e.g. when the node loses power or QEMU crashes. Dirty bitmaps are stored inside QCOW2 as an auto-clear feature and are removed on every start when the image is unclean. This is because CBT data is kept in memory and written to the image only during a normal QEMU shutdown.
This situation should be handled by the backup software. Backup chains should not be broken, and the only sensible way to handle this is to make a full backup and store it inside the incremental chain. Leaving the resolution to the end user would not be user-friendly.
The condition is rare, so we should simply create a full backup in this case. Handle this accordingly in the code.
Note: this requires a change in the extent handling. As CBT is not available, BLOCK_STATUS request and offset processing should be moved over to the libvirt.CONTEXT_BASE_ALLOCATION namespace.
Signed-off-by: Denis V. Lunev
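A minimal sketch of what the extent-handling change amounts to, assuming extents arrive as a flat list of alternating length and flag values (the shape libnbd's Python bindings pass to a block-status callback). The function name data_ranges is hypothetical and not the project's code. In a qemu:dirty-bitmap context bit 0 marks a changed range; once CBT is lost, the same loop has to be driven by the base:allocation context instead, where every range that is neither a hole nor known to read as zeroes must be copied, which is effectively a full backup.

```python
from typing import List, Tuple

# Flag bits from the NBD protocol specification.
NBD_STATE_HOLE = 1 << 0   # base:allocation: range is unallocated
NBD_STATE_ZERO = 1 << 1   # base:allocation: range reads as zeroes
NBD_STATE_DIRTY = 1 << 0  # qemu:dirty-bitmap:<name>: range changed since checkpoint


def data_ranges(offset: int, entries: List[int], use_cbt: bool) -> List[Tuple[int, int]]:
    """Hypothetical helper: return (offset, length) ranges whose data must be copied.

    entries alternates length, flags, length, flags, ... starting at offset.
    use_cbt=True interprets flags from a dirty-bitmap context,
    use_cbt=False interprets them from base:allocation (CBT lost).
    """
    if len(entries) % 2:
        raise ValueError("entries must contain (length, flags) pairs")
    ranges: List[Tuple[int, int]] = []
    pos = offset
    for i in range(0, len(entries), 2):
        length, flags = entries[i], entries[i + 1]
        if use_cbt:
            # Only ranges changed since the checkpoint are needed.
            wanted = bool(flags & NBD_STATE_DIRTY)
        else:
            # No CBT: copy everything that is allocated and not known-zero.
            wanted = not (flags & (NBD_STATE_HOLE | NBD_STATE_ZERO))
        if wanted:
            if ranges and ranges[-1][0] + ranges[-1][1] == pos:
                # Merge with the previous range when they are adjacent.
                ranges[-1] = (ranges[-1][0], ranges[-1][1] + length)
            else:
                ranges.append((pos, length))
        pos += length
    return ranges
```

Keeping one loop for both contexts means only the flag interpretation changes when the fallback kicks in; the rest of the copy path stays the same.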
Hi, it might make sense to extend the tests for this situation. How can I reproduce it? Thanks :)
Simply kill -9 the QEMU process running on an image that has CBT (i.e. a full backup was made some time ago), then start it again.
Gotcha, the current version at least bails out if an inconsistency is noticed:
It needs some more consideration. While I'm a fan of this idea, the current situation is:
From there, the user must clean up their checkpoints (remove them using virsh with the --metadata option).
The fallback to a full backup happens after all the checkpoint handling, which messes things up a little.
I will take a look this evening, but it seems reasonable.
I wonder if there is a libvirt API endpoint that could be used to detect the libvirt.VIR_ERR_CHECKPOINT_INCONSISTENT error before actually starting the backup operation. That way, a function could be introduced that checks consistency before any backup operation starts, which makes it easier to fall back to an incremental backup using base allocation data.
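Moving the consistency check in front of all checkpoint handling could look like the sketch below. The function choose_level and the is_consistent callback are hypothetical: is_consistent stands in for whatever detection is eventually used (for example redefining with VIR_DOMAIN_CHECKPOINT_CREATE_REDEFINE_VALIDATE, or the --size check discussed in this thread). The level strings mirror virtnbdbackup's inc/diff/full levels but are only illustrative here.

```python
from typing import Callable, Iterable


def choose_level(requested: str, checkpoints: Iterable[str],
                 is_consistent: Callable[[str], bool]) -> str:
    """Hypothetical pre-flight step: decide the effective backup level
    before any checkpoint is created, redefined or removed."""
    # Only levels that rely on an existing dirty bitmap can be affected.
    if requested not in ("inc", "diff"):
        return requested
    for name in checkpoints:
        if not is_consistent(name):
            # CBT lost (e.g. after a QEMU crash): the bitmap cannot be
            # trusted, so take a full backup and keep it in the same chain.
            return "full"
    return requested
```

Running this first avoids the ordering problem described earlier in the thread, where the fallback only happens after the checkpoints have already been processed.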
A QEMU API does exist; I will try to check with libvirt. Anyway, good point. FYI: it seems this code does the trick.
Yes, redefining the checkpoint with the libvirt.VIR_DOMAIN_CHECKPOINT_CREATE_REDEFINE_VALIDATE option is something that will trigger this error. There is already code for redefining the checkpoints (for the transient VM feature, where VMs are migrated between hosts) in: https://github.com/abbbi/virtnbdbackup/blob/master/libvirtnbdbackup/virt/checkpoint.py#L187 The function could be adjusted, or the code reused, so that during an incremental/differential backup it checks the consistency of all existing checkpoints and falls back accordingly. It also looks like some libvirt versions support retrieving the checkpoint XML with the --size option. If you request checkpoint-dumpxml --size and the bitmap is inconsistent, no size will be returned.
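A sketch of the --size check, assuming the checkpoint XML was fetched with size information (virsh checkpoint-dumpxml --size, or getXMLDesc() with libvirt.VIR_DOMAIN_CHECKPOINT_XML_SIZE in libvirt-python) and that libvirt reports the changed-data size as a size attribute on each <disk> element under <disks>, omitting it when the bitmap is inconsistent as described above. The function name bitmaps_consistent is hypothetical; it only parses XML, so it runs without libvirt.

```python
import xml.etree.ElementTree as ET


def bitmaps_consistent(checkpoint_xml: str) -> bool:
    """Hypothetical check: True if every bitmap-backed disk reports a size.

    Assumes the XML was requested with size information and that a missing
    size attribute means the dirty bitmap is inconsistent.
    """
    root = ET.fromstring(checkpoint_xml)
    # Only look at <domaincheckpoint>/<disks>/<disk>, not the disks of an
    # embedded <domain> definition.
    for disk in root.findall("./disks/disk"):
        if disk.get("checkpoint") != "bitmap":
            continue  # disk is not tracked by this checkpoint
        if disk.get("size") is None:
            return False
    return True
```

Such a predicate could serve as the consistency callback for a pre-flight check, with the redefine-with-validate approach as an alternative on libvirt versions that lack --size.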