
This is not a real PR, it's a collection of TODOs #46

Draft · wants to merge 1 commit into main

Conversation

@flavio (Member) commented Mar 15, 2024

I'm leaving the PR as a draft because this is not intended to be merged.

These are the questions I got while reviewing the code base. The idea is to discuss them over GH and then create dedicated issues to track them (if needed).

There are also some small things that, if agreed, I can submit as proper PRs.

These are the questions I got while reviewing the code base. We can discuss them over GH and then create dedicated issues to track them (if needed).

Signed-off-by: Flavio Castelli <[email protected]>
// creation of one of the CronJobs fails. We create one CronJob per node,
// hence failing on node 5/10 will cause this reconciliation loop to
// exit immediately. However, the shim finalizer has not been set yet;
// that happens at the end of this `if` block.
@flavio (Member, Author):

I would propose changing `handleInstallShim` to return an array of errors, so that all the nodes get a chance to have the shim installed on them.

Then, of course, we need to ensure the finalizer is always set, even if there was an error.
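
For concreteness, a minimal sketch of that shape (sequential for clarity), assuming a per-node `deployJobOnNode` helper and Go 1.20's `errors.Join`; the signatures below are illustrative assumptions, not the repository's actual API:

    // Assumed imports: "context", "errors", "fmt"
    func (sr *ShimReconciler) handleInstallShim(ctx context.Context, shim *kwasmv1.Shim, nodes *corev1.NodeList) error {
        // Set the finalizer no matter how many per-node installs fail.
        defer func() {
            if err := sr.ensureFinalizerForShim(ctx, shim, KwasmOperatorFinalizer); err != nil {
                log.Error().Msgf("Failed to ensure finalizer: %s", err)
            }
        }()

        var errs []error
        for i := range nodes.Items {
            // Keep going on failure so every node still gets an install attempt.
            if err := sr.deployJobOnNode(ctx, shim, &nodes.Items[i]); err != nil {
                errs = append(errs, fmt.Errorf("node %q: %w", nodes.Items[i].Name, err))
            }
        }
        return errors.Join(errs...) // aggregated error; nil when all nodes succeeded
    }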

@tillknuesting commented Mar 16, 2024

I agree that calling `deployJobOnNode` should happen concurrently, and that potential errors should be aggregated and returned, to help implement strategies such as retries later on if needed.

Also, it would indeed make sense to call `ensureFinalizerForShim` even if an error happens:


    // Ensure the finalizer is set even if we return early on an error
    defer func() {
        if err := sr.ensureFinalizerForShim(ctx, &shimResource, KwasmOperatorFinalizer); err != nil {
            log.Error().Msgf("Failed to ensure finalizer: %s", err)
        }
    }()
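
A sketch of the concurrent variant with aggregated errors, assuming a `nodeList *corev1.NodeList` is in scope, that `deployJobOnNode` is safe to call from multiple goroutines, and Go 1.20's `errors.Join`; names follow the snippets in this thread, but the exact signature is an assumption:

    // Assumed imports: "errors", "fmt", "sync"
    var (
        wg   sync.WaitGroup
        mu   sync.Mutex
        errs []error
    )
    for i := range nodeList.Items {
        node := &nodeList.Items[i]
        wg.Add(1)
        go func() {
            defer wg.Done()
            // Each node is attempted regardless of failures elsewhere.
            if err := sr.deployJobOnNode(ctx, &shimResource, node); err != nil {
                mu.Lock()
                errs = append(errs, fmt.Errorf("node %q: %w", node.Name, err))
                mu.Unlock()
            }
        }()
    }
    wg.Wait()
    return errors.Join(errs...) // nil when every node succeeded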

internal/controller/shim_controller.go (two resolved review threads)
@@ -474,14 +493,15 @@ func (sr *ShimReconciler) handleDeleteShim(ctx context.Context, shim *kwasmv1.Sh
}

func (sr *ShimReconciler) getNodeListFromShimsNodeSelctor(ctx context.Context, shim *kwasmv1.Shim) (*corev1.NodeList, error) {
//TODO (flavio): probably eager optimization - do some pagination?
@flavio (Member, Author):

I know of clusters with hundreds of nodes.


Even with 5k nodes, it's likely less than 10 MB; hence, it's probably not worth it.
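
If pagination did turn out to be needed, the controller-runtime client already exposes the list `Limit`/`Continue` options; a sketch follows, with the caveat that the manager's cache-backed client does not honor continue tokens, so it would have to run against an uncached reader such as `mgr.GetAPIReader()` (the function name and page size here are illustrative):

    // Assumed imports: "context", corev1 "k8s.io/api/core/v1",
    // "k8s.io/apimachinery/pkg/labels", "sigs.k8s.io/controller-runtime/pkg/client"
    func listNodesPaginated(ctx context.Context, c client.Reader, selector labels.Selector) ([]corev1.Node, error) {
        var nodes []corev1.Node
        continueToken := ""
        for {
            list := &corev1.NodeList{}
            opts := []client.ListOption{
                client.MatchingLabelsSelector{Selector: selector},
                client.Limit(500),              // page size
                client.Continue(continueToken), // empty on the first request
            }
            if err := c.List(ctx, list, opts...); err != nil {
                return nil, err
            }
            nodes = append(nodes, list.Items...)
            continueToken = list.Continue
            if continueToken == "" { // no more pages
                return nodes, nil
            }
        }
    }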

internal/controller/shim_controller.go (two more resolved review threads)
@phyrog added the area/manager and kind/question (Further information is requested) labels Apr 1, 2024
@flavio mentioned this pull request Apr 19, 2024