
[epic] Ensure no two ClusterExtensions manage the same underlying object when concurrent reconciles > 1 #1101

Open

everettraven opened this issue Aug 7, 2024 · 3 comments
Labels: epic, v1.x (Issues related to OLMv1 features that come after 1.0)

everettraven (Contributor) commented Aug 7, 2024

As mentioned in #736, Helm has built-in support for ensuring that the same resources are not managed by multiple Helm releases. That check is sufficient as long as reconciliation is never concurrent, but we will need an alternative solution that prevents race conditions once concurrent reconciliation is allowed.
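For context, the check Helm performs is a client-side comparison of its ownership metadata (the `meta.helm.sh/release-name` and `meta.helm.sh/release-namespace` annotations) against the release doing the install or upgrade. A minimal sketch of that kind of check (the function is illustrative, not Helm's actual code):

```go
package sketch

import (
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// ownedByRelease sketches the client-side ownership check Helm performs before
// adopting an existing object into a release: compare the meta.helm.sh/*
// annotations on the live object against the release being applied. Because
// this is a read-compare-write done by the client rather than an
// admission-time rule, two concurrent callers can both pass the check before
// either one writes.
func ownedByRelease(obj *unstructured.Unstructured, releaseName, releaseNamespace string) bool {
	annotations := obj.GetAnnotations()
	return annotations["meta.helm.sh/release-name"] == releaseName &&
		annotations["meta.helm.sh/release-namespace"] == releaseNamespace
}
```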

everettraven added the epic and v1.x labels on Aug 7, 2024
bentito (Contributor) commented Aug 8, 2024

Can you give a concrete example of "when concurrent reconciliation is allowed", including why it would be? It seems like we'd always want Helm's built-in support for ensuring the same resources are not managed by multiple Helm releases. If the possible concurrent manager of a resource is some operator, then maybe we need to surface Helm's locks as operator-controller's own and document, as a best practice, that operator authors should respect those locks?

joelanford (Member) commented Aug 8, 2024

It is simple to implement an admission policy that catches this situation generally during Kubernetes admission, rather than relying on a client to do it (which is what happens now).

Helm's built-in support is problematic for three reasons:

  1. It relies on Helm continuing to perform the check, and performing it in the same way.
  2. It suffers from race conditions because it is implemented in a client rather than during Kubernetes admission.
  3. It is not a general solution that we could apply in a potential future where OLM supports another lifecycling mechanism.

We may need to increase the concurrency of our reconciler for a variety of reasons. Today, reconciliation blocks while populating/updating the catalog cache and pulling bundle images. In the future, we may need to support Helm charts with hooks that block progression of install/upgrade/uninstall execution, which happens synchronously in the reconciler.

In order to scale to clusters with frequent ClusterExtension interactions, we will very likely need to handle ClusterExtension reconciles concurrently. As soon as we do that, Helm's guarantees disappear because we will be calling it concurrently.
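For illustration, making reconciliation concurrent is a single option on the controller-runtime builder. A minimal sketch (the reconciler and the ClusterExtension GVK wiring are placeholders, not operator-controller's actual setup; the API version in particular is assumed):

```go
package sketch

import (
	"context"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/controller"
)

// clusterExtensionReconciler stands in for the real ClusterExtension reconciler.
type clusterExtensionReconciler struct{}

func (r *clusterExtensionReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// ... resolve the bundle, pull the image, run the Helm install/upgrade ...
	return ctrl.Result{}, nil
}

// setup registers the controller with more than one worker. With
// MaxConcurrentReconciles > 1, two ClusterExtensions can be reconciled at the
// same time, and a client-side ownership check no longer serializes their writes.
func setup(mgr ctrl.Manager) error {
	ce := &unstructured.Unstructured{}
	ce.SetGroupVersionKind(schema.GroupVersionKind{
		Group:   "olm.operatorframework.io",
		Version: "v1alpha1", // assumed; use whatever version the installed API serves
		Kind:    "ClusterExtension",
	})
	return ctrl.NewControllerManagedBy(mgr).
		For(ce).
		WithOptions(controller.Options{MaxConcurrentReconciles: 2}).
		Complete(&clusterExtensionReconciler{})
}
```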

perdasilva self-assigned this on Aug 14, 2024
perdasilva (Contributor) commented

I'll take this over and introduce the VAP (ValidatingAdmissionPolicy). I'll see if I can find a way to test the race condition.
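A rough sketch of what such a policy could look like, expressed with the admissionregistration/v1 Go types (the olm.operatorframework.io/owner label key and the match-everything scope are assumptions for illustration, not a final design; the policy would also need a ValidatingAdmissionPolicyBinding to take effect):

```go
package sketch

import (
	admissionregistrationv1 "k8s.io/api/admissionregistration/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/utils/ptr"
)

// singleOwnerPolicy rejects any update that would change an object's
// (hypothetical) olm.operatorframework.io/owner label from one ClusterExtension
// to another. Because this is enforced during Kubernetes admission, it holds
// even when reconciles run concurrently.
var singleOwnerPolicy = admissionregistrationv1.ValidatingAdmissionPolicy{
	ObjectMeta: metav1.ObjectMeta{Name: "single-owner.olm.operatorframework.io"},
	Spec: admissionregistrationv1.ValidatingAdmissionPolicySpec{
		FailurePolicy: ptr.To(admissionregistrationv1.Fail),
		MatchConstraints: &admissionregistrationv1.MatchResources{
			ResourceRules: []admissionregistrationv1.NamedRuleWithOperations{{
				RuleWithOperations: admissionregistrationv1.RuleWithOperations{
					Operations: []admissionregistrationv1.OperationType{admissionregistrationv1.Update},
					Rule: admissionregistrationv1.Rule{
						APIGroups:   []string{"*"},
						APIVersions: []string{"*"},
						Resources:   []string{"*"},
					},
				},
			}},
		},
		Validations: []admissionregistrationv1.Validation{{
			// Allow the update if the live object has no owner label yet, or if
			// the incoming object keeps the same owner it already has.
			Expression: `!has(oldObject.metadata.labels) || !("olm.operatorframework.io/owner" in oldObject.metadata.labels) ||` +
				` (has(object.metadata.labels) && "olm.operatorframework.io/owner" in object.metadata.labels &&` +
				` object.metadata.labels["olm.operatorframework.io/owner"] == oldObject.metadata.labels["olm.operatorframework.io/owner"])`,
			Message: "this object is already managed by a different ClusterExtension",
		}},
	},
}
```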
