fix: cross-namespace resource traversal (#24379) #24728
Conversation
Forgive the AI-ness of the summary (it is very excited about this), but I suppose comparing text documents is in its wheelhouse. We've now written some extra benchmark tests for this use case. They could benefit from more executions on a stable, dedicated environment instead of a workstation to reduce variance, but profiling suggests acceptable behavior for small namespaces. It would probably also benefit from an annotation on the Application to enable or disable scanning of the destination namespace.

Cross-Namespace Hierarchy Traversal Performance Analysis

Executive Summary

This document summarizes the performance characteristics of the cross-namespace hierarchy traversal feature in ArgoCD's gitops-engine cluster cache.

Implementation Overview

The feature adds support for Kubernetes' native capability of cluster-scoped resources owning namespaced resources.

Key Design Decisions
Performance Results

Baseline Performance (No Cross-Namespace References)

When starting from namespaced resources with the orphaned-namespace parameter empty:
Conclusion: No performance regression for the common case; actually slightly improved.

Cross-Namespace Traversal (Starting from Namespaced Resources)

Testing with varying percentages of cross-namespace relationships:
Key Finding: The overhead of cross-namespace traversal is minimal when starting from namespaced resources.

Namespace Scanning Performance

When starting from cluster-scoped resources with namespace scanning enabled:

Scaling with Namespace Size (10% cross-namespace refs)
Scaling Characteristic: O(n), where n = namespace size, at ~35 ns per resource. For scale, at that per-resource cost a single traversal over a 10,000-resource namespace spends roughly 0.35 ms in the scan, and a 100,000-resource namespace roughly 3.5 ms.

Impact of Cross-Namespace Percentage (5,000 resources)
Key Insight: Namespace scanning performance scales with the percentage of cross-namespace references found during the scan.

Overkill Example: 25% Cross-Namespace References
Performance Hotspots

Baseline Case (No Namespace Scanning)

CPU profiling reveals the main costs when traversing without namespace scanning:
Namespace Scanning Case

When namespace scanning is enabled:
Code Locations of Most Expensive Operations

// Most expensive operations in buildGraphWithCrossNamespace:
// 21% of time - Building UID index for namespace resources
for _, node := range nsNodes { // line 1309
nodesByUID[node.Ref.UID] = append(nodesByUID[node.Ref.UID], node) // line 1310
}
// 16% of time - Allocating maps in graph structure
if _, ok := graph[uidNodeKey]; !ok { // line 1341
graph[uidNodeKey] = make(map[types.UID]*Resource) // line 1342
}
// 14% of time - Inserting resources into graph
graph[uidNodeKey][childNode.Ref.UID] = childNode // line 1346
// 8% of time - Computing resource keys
uidNodeKey := uidNode.ResourceKey() // line 1340
// Resource scanning to find candidates:
for _, resource := range nsNodes { // line 1170
for _, ownerRef := range resource.OwnerRefs { // line 1171
if ownerRef.UID != "" && clusterKeyUIDs[ownerRef.UID] { // line 1175
namespaceCandidates = append(namespaceCandidates, resource) // line 1186
}
}
}

Optimizations Applied
Production Readiness
Recommendations
Testing Methodology

Benchmarks were conducted using:
Test environment:
Conclusion

The cross-namespace hierarchy traversal feature successfully adds new functionality with:
The performance cost is notable, but for some workloads it would be worth the added functionality of properly tracking cluster-scoped parents of namespaced children.
refs := tc.cluster.resources[kube.GetResourceKey(pvc)].OwnerRefs
resource := tc.cluster.resources[kube.GetResourceKey(pvc)]
if resource == nil {
	return false // Resource not ready yet, keep retrying
The latest version of testify's Eventually does an immediate check instead of waiting for a tick, but that can cause nil refs, so we check.
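A self-contained sketch of that guard pattern (simplified stand-in types and names, not the actual test from this PR):

```go
package example

import (
	"sync"
	"testing"
	"time"

	"github.com/stretchr/testify/require"
)

type ownedResource struct{ OwnerRefs []string }

// TestEventuallyNilGuard illustrates the guard: recent testify versions
// evaluate the Eventually condition immediately (before the first tick),
// so the looked-up resource can still be nil; the condition returns false
// to keep retrying instead of dereferencing nil.
func TestEventuallyNilGuard(t *testing.T) {
	var mu sync.Mutex
	resources := map[string]*ownedResource{} // stands in for tc.cluster.resources

	// Simulate the cache catching up a little later.
	go func() {
		time.Sleep(50 * time.Millisecond)
		mu.Lock()
		resources["pvc"] = &ownedResource{OwnerRefs: []string{"cluster-parent"}}
		mu.Unlock()
	}()

	require.Eventually(t, func() bool {
		mu.Lock()
		resource := resources["pvc"]
		mu.Unlock()
		if resource == nil {
			return false // resource not ready yet, keep retrying
		}
		return len(resource.OwnerRefs) == 1
	}, 5*time.Second, 10*time.Millisecond)
}
```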
I don't think the current benchmarks really involve the new use case (resources owned by cluster-scoped resources). Can those benchmarks be updated so that the new code will be more likely to be hit?
Yeah, and I think there's a further optimization here in this impl. Stay tuned.
assert.Equal(t, 1, visitCount["child"], "child should be visited once")
}

func TestIterateHierarchyV2_NoDuplicatesCrossNamespace(t *testing.T) {
We had a problem in this impl where nodes were visited multiple times (once as roots in their own right and once from their parent) because the visited map was per-namespace. So we had to promote the visited map to be shared across namespaces.
We should watch this to make sure it isn't too unwieldy.
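A rough, self-contained sketch of the idea (simplified stand-in types, not the actual cluster cache code): one visited set is threaded through every namespace's traversal, so a resource already reached through a cross-namespace parent is skipped when its own namespace is processed.

```go
package main

import "fmt"

// key is a simplified stand-in for kube.ResourceKey.
type key struct{ namespace, name string }

// visitAll walks the nodes of every namespace but shares one visited set
// across all of them, so a resource is visited at most once even if it is
// reachable both as a root in its own namespace and as the child of a
// parent living in another namespace.
func visitAll(nodesByNamespace map[string][]key, visit func(key)) {
	visited := make(map[key]bool) // shared across namespaces, not per-namespace
	for _, nodes := range nodesByNamespace {
		for _, node := range nodes {
			if visited[node] {
				continue // already reached, e.g. via a cross-namespace owner
			}
			visited[node] = true
			visit(node)
		}
	}
}

func main() {
	nodes := map[string][]key{
		// The same child shows up under two traversal roots: once in its own
		// namespace and once because a parent in another namespace owns it.
		"ns-a": {{"ns-a", "child"}},
		"ns-b": {{"ns-a", "child"}},
	}
	count := 0
	visitAll(nodes, func(_ key) { count++ })
	fmt.Println("visits:", count) // prints 1, not 2
}
```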
return resources
}

func BenchmarkIterateHierarchyV2CrossNamespace(b *testing.B) {
This test obviously won't meaningfully compare against master, but it's useful to compare between our proposed implementations of cluster-parent-namespaced-child tracking.
gitops-engine/pkg/cache/cluster.go
Outdated
}

// processNamespaceHierarchy handles the hierarchy traversal for a single namespace
func (c *clusterCache) processNamespaceHierarchy(namespaceKeys []kube.ResourceKey, nsNodes map[kube.ResourceKey]*Resource, graph map[kube.ResourceKey]map[types.UID]*Resource, visited map[kube.ResourceKey]int, action func(resource *Resource, namespaceResources map[kube.ResourceKey]*Resource) bool) {
Just extracted this into a helper.
Codecov Report

❌ Patch coverage is
Additional details and impacted files

@@ Coverage Diff @@
## master #24728 +/- ##
==========================================
+ Coverage 60.80% 60.82% +0.02%
==========================================
Files 404 404
Lines 66217 66334 +117
==========================================
+ Hits 40261 40348 +87
- Misses 22712 22747 +35
+ Partials 3244 3239 -5

☔ View full report in Codecov by Sentry.
Force-pushed from 115ec5b to 664d4cc.
@crenshaw-dev I updated the above performance benchmarking notes. After some experimentation I discovered a flaw here that I think is unavoidable. We consciously avoid ever scanning the whole cluster to track down unresolved refs. My first pass required that all resources we want to traverse be present in the input keys. So I am considering a couple of approaches:
Maybe it's a default-disabled, per-Application annotation? Meanwhile, I went to great lengths not to touch the original path in the name of performance. That naturally results in some duplication. I could dig further into specific ways we could consolidate the old/new paths and any possible performance penalties we might incur for doing so.
Could you expand on this a bit? The way I'm reading it, "between objects in the manifest" means that, for example, ReplicaSets would not be identified as children of Deployments or Pods would not be identified as children of ReplicaSets (because the ReplicaSets and Pods are not in manifests). But my experience is that those relationships are discovered with the current implementation.
Force-pushed from 664d4cc to 26ee5dc.
// Update the graph for this owner to include the child.
if _, ok := graph[uidNode.ResourceKey()]; !ok {
	graph[uidNode.ResourceKey()] = make(map[types.UID]*Resource)
if _, ok := graph[uidNodeKey]; !ok {
Perf optimization; the lookup is expensive in the hot path and we do it multiple times.
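Restating the change in the diff above as a before/after sketch (names follow the surrounding code; the exact original lines may have differed slightly):

```go
// Before: the resource key was recomputed for every map operation in the hot path.
if _, ok := graph[uidNode.ResourceKey()]; !ok {
	graph[uidNode.ResourceKey()] = make(map[types.UID]*Resource)
}
graph[uidNode.ResourceKey()][childNode.Ref.UID] = childNode

// After: compute the key once and reuse it for the lookup, allocation, and insert.
uidNodeKey := uidNode.ResourceKey()
if _, ok := graph[uidNodeKey]; !ok {
	graph[uidNodeKey] = make(map[types.UID]*Resource)
}
graph[uidNodeKey][childNode.Ref.UID] = childNode
```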
// Check by UID if available
if ownerRef.UID != "" && clusterKeyUIDs[ownerRef.UID] {
	isChildOfOurKeys = true
} else if ownerRef.UID == "" {
Dumb defensive coding that we can probably remove, since this doesn't seem like it would ever be valid in the API spec?
Those are all namespace-scoped. If you had a cluster-scoped object with a namespaced child, it wouldn't be. In any case, I ended up pushing an implementation to scan the destination namespace. I think this should probably be enabled on a per-app basis, since it is fairly expensive. Benchmark results updated.
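A hypothetical sketch of what per-app gating could look like (the PR only floats the idea, e.g. as a default-disabled Application annotation; the option name and annotation key below are made up for illustration):

```go
package example

// hierarchyOptions is a hypothetical per-Application switch; the names here
// are invented for this sketch and are not part of the PR.
type hierarchyOptions struct {
	// ScanDestinationNamespace enables the extra pass over the destination
	// namespace that finds namespaced children of cluster-scoped parents.
	ScanDestinationNamespace bool
}

// optionsFromAnnotations derives the switch from Application annotations.
// The annotation key is made up for illustration.
func optionsFromAnnotations(annotations map[string]string) hierarchyOptions {
	return hierarchyOptions{
		ScanDestinationNamespace: annotations["argocd.argoproj.io/scan-destination-namespace"] == "true",
	}
}
```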
Force-pushed from 3cd946e to fe1b057.
… of namespaced children Signed-off-by: Jonathan Ogilvie <[email protected]>
…ionships based on input keys; defer to namespace scan Signed-off-by: Jonathan Ogilvie <[email protected]>
Force-pushed from fe1b057 to 14e2269.
Superseded by a better impl here: #24847
Fixes #24379.
This PR adds resource-tree (including UI) support for cluster-scoped parents owning namespaced children. This is a valid kube owner relationship which is sometimes manifested in projects like Crossplane, but in the current version we don't actually capture the children in the resource tree.
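For illustration only (not code from this PR), this is the shape of the relationship in question: a namespaced child carrying an ownerReference that points at a cluster-scoped parent, expressed with the Kubernetes Go types. The parent kind, names, and UID are made up.

```go
package example

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// namespacedChildOfClusterParent builds a namespaced ConfigMap whose owner is
// a cluster-scoped resource (here a hypothetical Crossplane-style composite,
// "example.org/v1, Kind=XDatabase"). Kubernetes allows this direction of
// ownership; the resource tree previously did not surface such children.
func namespacedChildOfClusterParent() *corev1.ConfigMap {
	return &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "db-connection",
			Namespace: "default",
			OwnerReferences: []metav1.OwnerReference{{
				APIVersion: "example.org/v1",
				Kind:       "XDatabase", // cluster-scoped parent
				Name:       "prod-db",
				UID:        "00000000-0000-0000-0000-000000000001", // made-up UID
			}},
		},
	}
}
```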
Checklist: