
add a cleanup loop for failed vm using configmap #59

Merged: 4 commits merged into k8snetworkplumbingwg:master from waiting_loop on Aug 29, 2019

Conversation

SchSeba (Collaborator) commented Aug 27, 2019

Before this PR, if VM creation failed because of a validation error from virt-api, we did not clean up the MAC allocation.

This PR introduces a cleanup loop into the system. When the create-VM mutating webhook is hit, we create the regular allocation but also mark the MAC address in a waiting configmap.

A waiting loop then checks whether the entry has been removed from the map; if it is still there after 30 seconds, we assume the object was never saved into etcd (no controller event arrived), so we release the allocation.

If we do get an event from the controller, we remove the MAC address from the configmap.

This PR also adds/removes some empty lines in the code, fixes import ordering, and improves the output when a functional test fails.
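
For illustration, a minimal sketch of the waiting loop described above. It assumes the configmap maps a MAC address to an RFC3339 allocation timestamp and that the standard time package is imported; vmWaitingCleanupLoop, the tick interval and releaseMacFromWaitingConfigMap are hypothetical names, only getOrCreateVmMacWaitMap appears in the PR's diff:

// Sketch of the cleanup loop described above (not the PR's exact code).
// Assumption: the configmap stores MAC address -> RFC3339 allocation
// timestamp; releaseMacFromWaitingConfigMap is a hypothetical helper.
func (p *PoolManager) vmWaitingCleanupLoop(waitingTime int) {
	ticker := time.NewTicker(3 * time.Second)
	defer ticker.Stop()

	for range ticker.C {
		waitMap, err := p.getOrCreateVmMacWaitMap()
		if err != nil {
			// Not fatal: the next iteration retries.
			continue
		}

		for macAddress, allocationTime := range waitMap {
			t, err := time.Parse(time.RFC3339, allocationTime)
			if err != nil {
				continue
			}
			// No controller event arrived inside the waiting window, so the
			// VM most likely failed virt-api validation and never reached
			// etcd: release the MAC allocation.
			if time.Since(t) > time.Duration(waitingTime)*time.Second {
				p.releaseMacFromWaitingConfigMap(macAddress)
			}
		}
	}
}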

@SchSeba force-pushed the waiting_loop branch 3 times, most recently from 5b390fc to 555458c on August 27, 2019 at 13:42
@SchSeba requested a review from phoracek on August 27, 2019 at 13:44
SchSeba (Collaborator, Author) commented Aug 27, 2019

@alonSadan please take a look

@SchSeba force-pushed the waiting_loop branch 2 times, most recently from cf696c9 to 123427c on August 28, 2019 at 11:44
@@ -49,18 +50,20 @@ type KubeMacPoolManager struct {
stopSignalChannel chan os.Signal
podNamespace string
podName string
waitingTime int
Member:
Please add a docstring explaining why we have this attribute

Collaborator (Author):
done
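
For illustration, the requested doc comment might read along these lines (the wording is mine, not necessarily what the PR added):

// waitingTime is how many seconds the cleanup loop waits for a controller
// event confirming the VM object reached etcd; once it elapses, the MAC
// recorded in the waiting configmap is released back to the pool.
waitingTime int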

@@ -31,6 +31,7 @@ const (
RuntimeObjectFinalizerName = "k8s.v1.cni.cncf.io/kubeMacPool"
networksAnnotation = "k8s.v1.cni.cncf.io/networks"
networksStatusAnnotation = "k8s.v1.cni.cncf.io/networks-status"
vmWaitConfigMapName = "kubemacpool-vm-configmap"
Member:
what is this CM good for?

Collaborator (Author):
This CM stores the allocated MAC addresses that are in the waiting state.

We need it to handle the case where our manager goes down while it still has such allocations, so that the new leader can continue from the same state.
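
As an illustration of that recovery path, a sketch of how a newly elected leader could rebuild its in-memory state from the configmap; the function name and the macPoolMap field / status value are illustrative, only getOrCreateVmMacWaitMap comes from the PR's diff:

// Sketch: after taking over leadership, reload the MAC addresses that are
// still in the waiting state so they cannot be handed out again; the cleanup
// loop keeps tracking them. Names here are illustrative, not the PR's exact
// identifiers.
func (p *PoolManager) initMacMapFromWaitingConfigMap() error {
	waitMap, err := p.getOrCreateVmMacWaitMap()
	if err != nil {
		return err
	}

	for macAddress := range waitMap {
		// Re-mark the address as allocated-but-unconfirmed in the in-memory
		// pool; the cleanup loop releases it if no controller event arrives.
		p.macPoolMap[macAddress] = "WaitingForCreation"
	}

	return nil
}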

Member:
Sounds dangerous synchronisation-wise, but that's not different from the rest of kubemacpool... :D

@@ -0,0 +1,15 @@
package names
Member:
could you introduce this change in a different commit please?

Collaborator (Author):
done

Expect(err).ToNot(HaveOccurred())
_, err = NewPoolManager(fakeClient, startPoolRangeEnv, endPoolRangeEnv, false)
Expect(err).ToNot(HaveOccurred())
createPoolManager("02:00:00:00:00:00", "02:FF:FF:FF:FF:FF")
Member:
could you please do this change in a separate commit?

Collaborator (Author):
done
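
The extracted helper presumably wraps the repeated fake-client / NewPoolManager setup; a rough sketch under that assumption (the real signature in the PR may differ, e.g. whether NewPoolManager takes parsed hardware addresses or raw strings; assumes Gomega's dot-import, "net", and client-go's fake package):

// Sketch of the extracted Ginkgo/Gomega test helper; the actual helper in
// the PR may differ.
func createPoolManager(startMacAddr, endMacAddr string) *PoolManager {
	fakeClient := fake.NewSimpleClientset()

	startPoolRange, err := net.ParseMAC(startMacAddr)
	Expect(err).ToNot(HaveOccurred())
	endPoolRange, err := net.ParseMAC(endMacAddr)
	Expect(err).ToNot(HaveOccurred())

	poolManager, err := NewPoolManager(fakeClient, startPoolRange, endPoolRange, false)
	Expect(err).ToNot(HaveOccurred())

	return poolManager
}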

@@ -333,6 +355,121 @@ func (p *PoolManager) revertAllocationOnVm(vmName string, allocations map[string
p.releaseMacAddressesFromInterfaceMap(allocations)
}

func (p *PoolManager) getOrCreateVmMacWaitMap() (map[string]string, error) {
Member:
Do you think it would make sense to keep these helpers in a separate module? This logic is not trivial and it would be easier to document there.

Collaborator (Author):
I would prefer to document it here, as we are going to have the same functions for pods in the future.

I will add a comment on top of every function.
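
For reference, a commented sketch of what getOrCreateVmMacWaitMap could look like, built from the configmap-client pattern visible in the diff; the create-on-missing handling is an assumption, and the usual corev1, metav1 and apierrors (k8s.io/apimachinery/pkg/api/errors) imports are assumed:

// getOrCreateVmMacWaitMap returns the data of the waiting configmap, creating
// an empty configmap first if it does not exist yet. Keys are MAC addresses
// allocated by the webhook but not yet confirmed by a controller event.
// (Sketch only; the PR's actual implementation may differ.)
func (p *PoolManager) getOrCreateVmMacWaitMap() (map[string]string, error) {
	configMap, err := p.kubeClient.CoreV1().ConfigMaps(names.MANAGER_NAMESPACE).Get(vmWaitConfigMapName, metav1.GetOptions{})
	if err != nil {
		if apierrors.IsNotFound(err) {
			// First use: create an empty configmap and start with an empty map.
			_, err = p.kubeClient.CoreV1().ConfigMaps(names.MANAGER_NAMESPACE).Create(
				&corev1.ConfigMap{ObjectMeta: metav1.ObjectMeta{
					Name:      vmWaitConfigMapName,
					Namespace: names.MANAGER_NAMESPACE}})
			return map[string]string{}, err
		}
		return nil, err
	}

	return configMap.Data, nil
}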

@SchSeba force-pushed the waiting_loop branch 2 times, most recently from 55e0310 to 4a7fef0 on August 29, 2019 at 09:37
phoracek (Member) left a comment:

I don't feel good about the state of kubemacpool. If I give it a +1, it is out of full faith in Sebastian. We have to summarize all the moving pieces of the controller, write them down, and try to come up with a better design. It has become a pile of synchronisation mechanisms and ping-pong with the Kubernetes API.


configMap, err := p.kubeClient.CoreV1().ConfigMaps(names.MANAGER_NAMESPACE).Get(vmWaitConfigMapName, metav1.GetOptions{})
if err != nil {
// TODO: should we exit here?
Member:
Do we need to resolve this?

Collaborator (Author):
We can remove this; the next loop iteration will try again.
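
In code terms, a small sketch of the agreed behaviour (the helper name is illustrative; the surrounding loop is sketched earlier in this conversation):

// Sketch: a read failure on the waiting configmap is treated as transient.
// Callers inside the cleanup loop skip the iteration when ok is false and
// simply retry on the next tick.
func (p *PoolManager) readWaitingConfigMap() (map[string]string, bool) {
	configMap, err := p.kubeClient.CoreV1().ConfigMaps(names.MANAGER_NAMESPACE).Get(vmWaitConfigMapName, metav1.GetOptions{})
	if err != nil {
		// No need to exit the manager here; the next loop iteration retries.
		return nil, false
	}
	return configMap.Data, true
}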

SchSeba (Collaborator, Author) commented Aug 29, 2019

Opened an issue to track the implementation: #62

alonSadan (Contributor):
looks good!

@SchSeba merged commit 99bb374 into k8snetworkplumbingwg:master on Aug 29, 2019