
This is not a real PR, it's a collection of TODOs #46

Draft · wants to merge 1 commit into main

Conversation

@flavio (Member) commented Mar 15, 2024

I'm leaving the PR as a draft because this is not intended to be merged.

These are the questions I got while reviewing the code base. The idea is to discuss them over GH and then create dedicated issues to track them (if needed).

There are also some small things that, if agreed, I can submit as proper PRs.

These are the questions I got while reviewing the code base. We can discuss them over GH and then create dedicated issues to track them (if needed).

Signed-off-by: Flavio Castelli <[email protected]>
// creation of one of the CronJobs fails. We create one CronJob per node,
// hence failing on node 5/10 will cause this reconciliation loop to
// exit immediately. However, the shim finalizer has not been set yet;
// that happens at the end of this `if` block.
@flavio (Member, Author):

I would propose changing `handleInstallShim` to return an array of errors, so that all the nodes get a chance to have the shim installed on them.

Then, of course, we need to ensure the finalizer is always set, even if there was an error.
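
For concreteness, a minimal sketch of that shape (sequential for clarity), assuming a per-node `deployJobOnNode` helper and Go 1.20's `errors.Join`; the signatures below are illustrative assumptions, not the repository's actual API:

    // Assumed imports: "context", "errors", "fmt"
    func (sr *ShimReconciler) handleInstallShim(ctx context.Context, shim *kwasmv1.Shim, nodes *corev1.NodeList) error {
        // Set the finalizer no matter how many per-node installs fail.
        defer func() {
            if err := sr.ensureFinalizerForShim(ctx, shim, KwasmOperatorFinalizer); err != nil {
                log.Error().Msgf("Failed to ensure finalizer: %s", err)
            }
        }()

        var errs []error
        for i := range nodes.Items {
            // Keep going on failure so every node still gets an install attempt.
            if err := sr.deployJobOnNode(ctx, shim, &nodes.Items[i]); err != nil {
                errs = append(errs, fmt.Errorf("node %q: %w", nodes.Items[i].Name, err))
            }
        }
        return errors.Join(errs...) // aggregated error; nil when all nodes succeeded
    }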

@tillknuesting commented Mar 16, 2024

I agree that calling `deployJobOnNode` should happen concurrently, and that potential errors should be aggregated and returned, to help implement strategies such as retries later on if needed.

Also, it would indeed make sense to call `ensureFinalizerForShim` even if an error happens:


    // Ensure the finalizer is set even if we return early on an error
    defer func() {
        if err := sr.ensureFinalizerForShim(ctx, &shimResource, KwasmOperatorFinalizer); err != nil {
            log.Error().Msgf("Failed to ensure finalizer: %s", err)
        }
    }()
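
A sketch of the concurrent variant with aggregated errors, assuming a `nodeList *corev1.NodeList` is in scope, that `deployJobOnNode` is safe to call from multiple goroutines, and Go 1.20's `errors.Join`; names follow the snippets in this thread, but the exact signature is an assumption:

    // Assumed imports: "errors", "fmt", "sync"
    var (
        wg   sync.WaitGroup
        mu   sync.Mutex
        errs []error
    )
    for i := range nodeList.Items {
        node := &nodeList.Items[i]
        wg.Add(1)
        go func() {
            defer wg.Done()
            // Each node is attempted regardless of failures elsewhere.
            if err := sr.deployJobOnNode(ctx, &shimResource, node); err != nil {
                mu.Lock()
                errs = append(errs, fmt.Errorf("node %q: %w", node.Name, err))
                mu.Unlock()
            }
        }()
    }
    wg.Wait()
    return errors.Join(errs...) // nil when every node succeeded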

internal/controller/shim_controller.go (two resolved review threads)
@@ -474,14 +493,15 @@ func (sr *ShimReconciler) handleDeleteShim(ctx context.Context, shim *kwasmv1.Sh
}

func (sr *ShimReconciler) getNodeListFromShimsNodeSelctor(ctx context.Context, shim *kwasmv1.Shim) (*corev1.NodeList, error) {
//TODO (flavio): probably eager optimization - do some pagination?
@flavio (Member, Author):

I know of clusters with hundreds of nodes.


Even with 5k nodes, it's likely less than 10 MB; hence, it's probably not worth it.
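
If pagination did turn out to be needed, the controller-runtime client already exposes the list `Limit`/`Continue` options; a sketch follows, with the caveat that the manager's cache-backed client does not honor continue tokens, so it would have to run against an uncached reader such as `mgr.GetAPIReader()` (the function name and page size here are illustrative):

    // Assumed imports: "context", corev1 "k8s.io/api/core/v1",
    // "k8s.io/apimachinery/pkg/labels", "sigs.k8s.io/controller-runtime/pkg/client"
    func listNodesPaginated(ctx context.Context, c client.Reader, selector labels.Selector) ([]corev1.Node, error) {
        var nodes []corev1.Node
        continueToken := ""
        for {
            list := &corev1.NodeList{}
            opts := []client.ListOption{
                client.MatchingLabelsSelector{Selector: selector},
                client.Limit(500),              // page size
                client.Continue(continueToken), // empty on the first request
            }
            if err := c.List(ctx, list, opts...); err != nil {
                return nil, err
            }
            nodes = append(nodes, list.Items...)
            continueToken = list.Continue
            if continueToken == "" { // no more pages
                return nodes, nil
            }
        }
    }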

internal/controller/shim_controller.go (two more resolved review threads)
@phyrog added the area/manager and kind/question (Further information is requested) labels Apr 1, 2024
@flavio mentioned this pull request Apr 19, 2024