Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: set certificate expiry day for kcpt #363

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

okozachenko1203
Copy link
Member

No description provided.

Comment on lines 49 to 55
kcpt.delete()
del kcpt.obj["metadata"]["uid"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@okozachenko1203 I'm concerned about this since we're deleting a resource and creating it again and if we somehow fail to recreate after the delete, the cluster can probably go into a pretty odd state.

Is there a way to avoid the delete by any chance or figure out a way to make this "reliable"? Also, I think maybe we need to introduce locking to this as well in case multiple delete at the same time, then a create conflict happens at the same time, and then all data is lost.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need to care about conflicts for both deletion and creation.
object.delete() will ignore ObjectDoesNotExist error and we do create via server-side apply.
https://codeberg.org/hjacobs/pykube-ng/src/commit/c1ac2c97587249e418d04d91f1297d4ffe3631c3/pykube/objects.py#L167-L180

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, i think we need lock because deletion and creation is done inside one function in a row to avoid the deletion of newly created kcpt by another replica.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

magnum_cluster_api/hacks.py Outdated Show resolved Hide resolved
@okozachenko1203
Copy link
Member Author

recheck

Comment on lines 45 to 67
# NOTE(mnaser): Since the KubeadmControlPlaneTemplate is immutable, we need to
# delete the object and re-create it.
kcpt.delete()

del kcpt.obj["metadata"]["uid"]
kcpt.obj["spec"]["template"]["spec"].setdefault("rolloutBefore", {})
kcpt.obj["spec"]["template"]["spec"]["rolloutBefore"][
"certificatesExpiryDays"
] = 21

# Use tenacity to wait for kcpt to be created
for attempt in Retrying(
retry=retry_if_result(lambda result: result is None),
stop=stop_after_delay(10),
wait=wait_fixed(1),
):
with attempt:
utils.kube_apply_patch(kcpt)
new_kcpt = objects.KubeadmControlPlaneTemplate.objects(
api, namespace="magnum-system"
).get(name=kcpt.obj["metadata"]["name"])
if not attempt.retry_state.outcome.failed:
attempt.retry_state.set_result(new_kcpt)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to try and use some sort of exception handling to recreate the KubeadmControlPlaneTemplate if it fails to go through the normal process, because we have no way of "recovering" it successfully if it deletes but doesn't get recreated cleanly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants