
[epic] Ensure no two ClusterExtensions manage the same underlying object when concurrent reconciles > 1 #1101

Open

everettraven opened this issue Aug 7, 2024 · 3 comments
Labels: epic, v1.x (Issues related to OLMv1 features that come after 1.0)

everettraven (Contributor) commented Aug 7, 2024

As mentioned in #736, Helm has built-in support for ensuring that the same resources are not managed by multiple Helm releases. That check is sufficient as long as reconciliation is never concurrent, but we will need an alternative solution that prevents race conditions once concurrent reconciliation is allowed.
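For context, the check Helm performs is a client-side comparison of its ownership metadata (the `meta.helm.sh/release-name` and `meta.helm.sh/release-namespace` annotations) against the release doing the install or upgrade. A minimal sketch of that kind of check (the function is illustrative, not Helm's actual code):

```go
package sketch

import (
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// ownedByRelease sketches the client-side ownership check Helm performs before
// adopting an existing object into a release: compare the meta.helm.sh/*
// annotations on the live object against the release being applied. Because
// this is a read-compare-write done by the client rather than an
// admission-time rule, two concurrent callers can both pass the check before
// either one writes.
func ownedByRelease(obj *unstructured.Unstructured, releaseName, releaseNamespace string) bool {
	annotations := obj.GetAnnotations()
	return annotations["meta.helm.sh/release-name"] == releaseName &&
		annotations["meta.helm.sh/release-namespace"] == releaseNamespace
}
```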

everettraven added the epic and v1.x labels on Aug 7, 2024
bentito (Contributor) commented Aug 8, 2024

Can you give a concrete example of "when concurrent reconciliation is allowed", including why it would be? It seems like we'd always want Helm's built-in support for ensuring the same resources are not managed by multiple Helm releases. If the possible concurrent manager of a resource is some operator, then maybe we need to surface Helm's locks as operator-controller's own and document, as a best practice, that operator authors should respect those locks?

joelanford (Member) commented Aug 8, 2024

It is simple to implement an admission policy that catches this situation generally during Kubernetes admission, rather than relying on a client to do it (which is what happens now).

Helm's built-in support is problematic for three reasons:

  1. It relies on Helm continuing to perform the check, and performing it in the same way.
  2. It suffers from race conditions because it is implemented in a client rather than during Kubernetes admission.
  3. It is not a general solution that we could apply in a potential future where OLM supports another lifecycling mechanism.

We may need to increase the concurrency of our reconciler for a variety of reasons. Today, reconciliation blocks while populating/updating the catalog cache and pulling bundle images. In the future, we may need to support Helm charts with hooks that block progression of install/upgrade/uninstall execution, which happens synchronously in the reconciler.

In order to scale to clusters with frequent ClusterExtension interactions, we will very likely need to handle ClusterExtension reconciles concurrently. As soon as we do that, Helm's guarantees disappear because we will be calling it concurrently.
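For illustration, making reconciliation concurrent is a single option on the controller-runtime builder. A minimal sketch (the reconciler and the ClusterExtension GVK wiring are placeholders, not operator-controller's actual setup; the API version in particular is assumed):

```go
package sketch

import (
	"context"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/controller"
)

// clusterExtensionReconciler stands in for the real ClusterExtension reconciler.
type clusterExtensionReconciler struct{}

func (r *clusterExtensionReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// ... resolve the bundle, pull the image, run the Helm install/upgrade ...
	return ctrl.Result{}, nil
}

// setup registers the controller with more than one worker. With
// MaxConcurrentReconciles > 1, two ClusterExtensions can be reconciled at the
// same time, and a client-side ownership check no longer serializes their writes.
func setup(mgr ctrl.Manager) error {
	ce := &unstructured.Unstructured{}
	ce.SetGroupVersionKind(schema.GroupVersionKind{
		Group:   "olm.operatorframework.io",
		Version: "v1alpha1", // assumed; use whatever version the installed API serves
		Kind:    "ClusterExtension",
	})
	return ctrl.NewControllerManagedBy(mgr).
		For(ce).
		WithOptions(controller.Options{MaxConcurrentReconciles: 2}).
		Complete(&clusterExtensionReconciler{})
}
```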

perdasilva self-assigned this on Aug 14, 2024
perdasilva (Contributor) commented

I'll take this over and introduce the VAP (ValidatingAdmissionPolicy). I'll see if I can find a way to test the race condition.
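A rough sketch of what such a policy could look like, expressed with the admissionregistration/v1 Go types (the olm.operatorframework.io/owner label key and the match-everything scope are assumptions for illustration, not a final design; the policy would also need a ValidatingAdmissionPolicyBinding to take effect):

```go
package sketch

import (
	admissionregistrationv1 "k8s.io/api/admissionregistration/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/utils/ptr"
)

// singleOwnerPolicy rejects any update that would change an object's
// (hypothetical) olm.operatorframework.io/owner label from one ClusterExtension
// to another. Because this is enforced during Kubernetes admission, it holds
// even when reconciles run concurrently.
var singleOwnerPolicy = admissionregistrationv1.ValidatingAdmissionPolicy{
	ObjectMeta: metav1.ObjectMeta{Name: "single-owner.olm.operatorframework.io"},
	Spec: admissionregistrationv1.ValidatingAdmissionPolicySpec{
		FailurePolicy: ptr.To(admissionregistrationv1.Fail),
		MatchConstraints: &admissionregistrationv1.MatchResources{
			ResourceRules: []admissionregistrationv1.NamedRuleWithOperations{{
				RuleWithOperations: admissionregistrationv1.RuleWithOperations{
					Operations: []admissionregistrationv1.OperationType{admissionregistrationv1.Update},
					Rule: admissionregistrationv1.Rule{
						APIGroups:   []string{"*"},
						APIVersions: []string{"*"},
						Resources:   []string{"*"},
					},
				},
			}},
		},
		Validations: []admissionregistrationv1.Validation{{
			// Allow the update if the live object has no owner label yet, or if
			// the incoming object keeps the same owner it already has.
			Expression: `!has(oldObject.metadata.labels) || !("olm.operatorframework.io/owner" in oldObject.metadata.labels) ||` +
				` (has(object.metadata.labels) && "olm.operatorframework.io/owner" in object.metadata.labels &&` +
				` object.metadata.labels["olm.operatorframework.io/owner"] == oldObject.metadata.labels["olm.operatorframework.io/owner"])`,
			Message: "this object is already managed by a different ClusterExtension",
		}},
	},
}
```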
