fix: cross-namespace resource traversal (#24379) #24728

jcogilvie · 2025-09-24T15:30:15Z

This PR adds resource-tree support (including in the UI) for cluster-scoped parents that own namespaced children. This is a valid Kubernetes owner relationship that shows up in projects like Crossplane, but the current version doesn't capture those children in the resource tree.

bunnyshell · 2025-09-24T15:30:19Z

❌ Preview Environment deleted from Bunnyshell

Available commands (reply to this comment):

🚀 /bns:deploy to deploy the environment

jcogilvie · 2025-09-24T15:36:28Z

Forgive the AI-ness of the summary (it is very excited about this), but I suppose comparing text documents is in its wheelhouse. We've now written some extra benchmark tests for this use case. They could benefit from more executions on a stable, dedicated environment instead of a workstation to reduce variance, but profiling suggests acceptable behavior for small namespaces. It would probably benefit from an annotation on the Application to enable or disable scanning of the destination namespace.

Cross-Namespace Hierarchy Traversal Performance Analysis

Executive Summary

This document summarizes the performance characteristics of the cross-namespace hierarchy traversal feature in ArgoCD's gitops-engine, which enables tracking owner references from cluster-scoped parents to namespaced children.

Implementation Overview

The feature adds support for Kubernetes' native capability of cluster-scoped resources owning namespaced resources (e.g., an operator owning Pods within a namespace). This relationship type was previously unsupported by ArgoCD.

Key Design Decisions

Dual-path architecture:
- Fast path: When no cross-namespace references exist (common case)
- Slow path: When cross-namespace references need resolution
Emergency fallback: Environment variable GITOPS_ENGINE_DISABLE_CLUSTER_SCOPED_PARENT_REFS to disable the feature if needed
Namespace scanning: Optional parameter to scan entire namespaces for orphaned resources

Performance Results

Baseline Performance (No Cross-Namespace References)

When starting from namespaced resources with the orphaned namespace parameter empty (""):

Scenario	Performance	Heap Allocations
Original (master)	45.02 ns/op	0 B/op, 0 allocs/op
With feature	44.25 ns/op	0 B/op, 0 allocs/op
Improvement	+1.7%	No change

Conclusion: No performance regression for the common case; it is even marginally faster, although a 1.7% difference may just be run-to-run variance on a workstation.

Cross-Namespace Traversal (Starting from Namespaced Resources)

Testing with varying percentages of cross-namespace relationships:

Cross-Namespace %	Performance	Notes
0%	44.25 ns/op	Baseline
1%	44.34 ns/op	Minimal overhead
5%	44.20 ns/op	Minimal overhead
10%	46.99 ns/op	~6% overhead
25%	44.23 ns/op	Minimal overhead

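Key Finding: The overhead of cross-namespace traversal is minimal when starting from namespaced resources. The ~6% at 10% is the only outlier, and since 25% shows no overhead it is most likely measurement noise.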

Namespace Scanning Performance

When starting from cluster-scoped resources with namespace scanning enabled:

Scaling with Namespace Size (10% cross-namespace refs)

Namespace Size	With Scanning	Without Scanning	Overhead
100 resources	13.7 µs	5.6 µs	2.4x
1,000 resources	42.5 µs	5.6 µs	7.6x
5,000 resources	172.7 µs	5.7 µs	30.3x
10,000 resources	346.2 µs	5.7 µs	60.7x
20,000 resources	711.3 µs	5.7 µs	124.8x

Scaling Characteristic: O(n) where n = namespace size, ~35 ns per resource

Impact of Cross-Namespace Percentage (5,000 resources)

Cross-Namespace %	Performance	Relative to 0%
0%	26.0 µs	1.0x (baseline)
5%	168.5 µs	6.5x
10%	172.7 µs	6.6x
25%	209.2 µs	8.0x
50%	267.7 µs	10.3x

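Key Insight: Scanning cost jumps as soon as any cross-namespace references are present (6.5x going from 0% to 5%), then grows more gradually with the percentage of cross-namespace references found during the scan (10.3x at 50%).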

Overkill Example: 25% Cross-Namespace References

Scenario	Performance	Use Case
Cluster-scoped, no scan	297.9 µs	Finding only cluster-scoped children
Cluster-scoped, with scan	779.9 µs	Finding all children including namespaced
Overhead	2.6x	-

Performance Hotspots

Baseline Case (No Namespace Scanning)

CPU profiling reveals the main costs when traversing without namespace scanning:

Map lookups (50%): runtime.mapaccess2 - checking resource existence
Hash computation (21%): aeshashbody - computing ResourceKey hashes
Graph building (15%): Building parent-child relationship graphs

Namespace Scanning Case

When namespace scanning is enabled (orphanedResourceNamespace != ""), additional hotspots emerge:

Graph construction map operations (45% of scan time):
- Line 1310: nodesByUID[node.Ref.UID] = append(...) - Building UID index (21% of graph build time)
- Line 1342: graph[uidNodeKey] = make(map[types.UID]*Resource) - Allocating child maps (16%)
- Line 1346: graph[uidNodeKey][childNode.Ref.UID] = childNode - Inserting into graph (14%)
- Line 1340: uidNode.ResourceKey() - Computing resource keys (8%)
Map lookups (20% of scan time):
- Line 1332: uidNodes, ok := nodesByUID[ownerRef.UID] - Looking up namespace-local parents
- Line 1335: uidNodes, ok = clusterNodesByUID[ownerRef.UID] - Looking up cluster-scoped parents
- Line 1344: r, ok := graph[uidNodeKey][childNode.Ref.UID] - Checking for existing entries
Resource scanning (18% of scan time): Lines 1170-1190
- Iterating through all namespace resources to find potential children
- Line 1175: UID lookups in clusterKeyUIDs map to check parent ownership
- Line 1186: Building namespaceCandidates slice
Memory allocations (10% of scan time):
- Line 1131: Building clusterNodesByUID index for cluster resources
- Lines 1156-1157: Creating lookup maps for cluster resource UIDs and names

Code Locations of Most Expensive Operations

// Most expensive operations in buildGraphWithCrossNamespace:

// 21% of time - Building UID index for namespace resources
for _, node := range nsNodes { // line 1309
	nodesByUID[node.Ref.UID] = append(nodesByUID[node.Ref.UID], node) // line 1310
}

// 8% of time - Computing resource keys (once per owner, reused below)
uidNodeKey := uidNode.ResourceKey() // line 1340

// 16% of time - Allocating maps in graph structure
if _, ok := graph[uidNodeKey]; !ok { // line 1341
	graph[uidNodeKey] = make(map[types.UID]*Resource) // line 1342
}

// 14% of time - Inserting resources into graph
graph[uidNodeKey][childNode.Ref.UID] = childNode // line 1346

// Resource scanning to find candidates:
for _, resource := range nsNodes { // line 1170
	for _, ownerRef := range resource.OwnerRefs { // line 1171
		if ownerRef.UID != "" && clusterKeyUIDs[ownerRef.UID] { // line 1175
			namespaceCandidates = append(namespaceCandidates, resource) // line 1186
			break // append each resource at most once, even if several owners match
		}
	}
}
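For readers without the diff at hand, here is a self-contained Go sketch of the graph-building steps profiled above: UID indexes for namespaced and cluster-scoped resources, owner lookup in both, and a child map per owner key, with the key computed once per owner. `Resource`, `ResourceKey`, `indexByUID`, and `buildGraph` are simplified, hypothetical stand-ins, not gitops-engine's actual types or functions.

```go
package main

import "fmt"

// ResourceKey and Resource are hypothetical, simplified stand-ins for
// gitops-engine's kube.ResourceKey and cache.Resource.
type ResourceKey struct {
	Group, Kind, Namespace, Name string
}

type Resource struct {
	Group, Kind, Namespace, Name string
	UID                          string
	OwnerUIDs                    []string // UIDs from metadata.ownerReferences
}

func (r *Resource) ResourceKey() ResourceKey {
	return ResourceKey{r.Group, r.Kind, r.Namespace, r.Name}
}

// indexByUID builds the UID -> resources index used to resolve owner refs.
func indexByUID(nodes []*Resource) map[string][]*Resource {
	idx := make(map[string][]*Resource, len(nodes))
	for _, n := range nodes {
		idx[n.UID] = append(idx[n.UID], n)
	}
	return idx
}

// buildGraph maps each owner's key to its namespaced children, keyed by child
// UID. Owners are looked up among namespace-local resources first, then among
// cluster-scoped ones.
func buildGraph(nsNodes, clusterNodes []*Resource) map[ResourceKey]map[string]*Resource {
	nodesByUID := indexByUID(nsNodes)
	clusterNodesByUID := indexByUID(clusterNodes)
	graph := make(map[ResourceKey]map[string]*Resource)
	for _, child := range nsNodes {
		for _, ownerUID := range child.OwnerUIDs {
			owners, ok := nodesByUID[ownerUID]
			if !ok {
				owners = clusterNodesByUID[ownerUID] // nil if the owner is unknown
			}
			for _, owner := range owners {
				ownerKey := owner.ResourceKey() // compute once, reuse below
				children, ok := graph[ownerKey]
				if !ok {
					children = make(map[string]*Resource)
					graph[ownerKey] = children
				}
				children[child.UID] = child
			}
		}
	}
	return graph
}

func main() {
	provider := &Resource{Group: "pkg.crossplane.io", Kind: "Provider", Name: "p", UID: "u-provider"}
	deploy := &Resource{Group: "apps", Kind: "Deployment", Namespace: "ns", Name: "d",
		UID: "u-deploy", OwnerUIDs: []string{"u-provider"}}
	graph := buildGraph([]*Resource{deploy}, []*Resource{provider})
	fmt.Println("children of provider:", len(graph[provider.ResourceKey()]))
}
```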

Optimizations Applied

Cached ResourceKey() calls: Reduced repeated hash computations in graph building
- Result: 10% performance improvement in namespace scanning scenarios
Conditional graph building: Only build cross-namespace graphs when needed
- Fast path uses simpler graph building without cross-namespace support

Production Readiness

✅ No performance regression for common cases
✅ Linear, predictable scaling for namespace scanning
✅ Zero heap allocations on the fast path (no cross-namespace references)
✅ Emergency disable mechanism via environment variable
✅ Well-understood performance characteristics

Recommendations

For typical workloads (<1,000 resources per namespace): use freely; overhead is minimal
For large namespaces (>10,000 resources): budget for ~350-700 µs of extra time per iteration (10,000-20,000 resources) when using namespace scanning
For extreme cases: Use GITOPS_ENGINE_DISABLE_CLUSTER_SCOPED_PARENT_REFS=1 to disable if needed

Testing Methodology

Benchmarks were conducted using:

Go's built-in benchmark framework
CPU profiling with pprof
Memory profiling to verify zero allocations
Parameterized tests with varying resource counts and cross-namespace percentages

Test environment:

Platform: darwin/arm64
CPU: Apple M2 Max
Go version: As per go.mod

Conclusion

The cross-namespace hierarchy traversal feature successfully adds new functionality with:

No measurable impact on existing use cases
Predictable, linear scaling for new namespace scanning capability
Production-ready performance characteristics
Clear escape hatch for edge cases

The namespace-scanning overhead is notable, but for some workloads it is worth paying for the added functionality of properly tracking cluster-scoped-to-namespaced ownership relationships, which brings ArgoCD into full alignment with Kubernetes' ownership model.

jcogilvie · 2025-09-24T15:52:34Z

gitops-engine/pkg/cache/cluster_test.go

-				refs := tc.cluster.resources[kube.GetResourceKey(pvc)].OwnerRefs
+				resource := tc.cluster.resources[kube.GetResourceKey(pvc)]
+				if resource == nil {
+					return false // Resource not ready yet, keep retrying


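The latest version of testify's assert.Eventually runs the condition immediately instead of waiting for the first tick. At that point the resource may not be in the cache yet, and dereferencing it would cause a nil pointer panic, so we check for nil first.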

crenshaw-dev · 2025-09-24T16:05:47Z

I don't think the current benchmarks really involve the new use case (resources owned by cluster-scoped resources). Can those benchmarks be updated so that the new code will be more likely to be hit?

jcogilvie · 2025-09-24T16:09:42Z

Yeah, and I think there's a further optimization here in this impl. Stay tuned.

jcogilvie · 2025-09-24T19:47:38Z

gitops-engine/pkg/cache/cluster_test.go

+	assert.Equal(t, 1, visitCount["child"], "child should be visited once")
+}
+
+func TestIterateHierarchyV2_NoDuplicatesCrossNamespace(t *testing.T) {


We had a problem in this impl where nodes were visited multiple times: once as a root in their own namespace and once as a child of their cluster-scoped parent, because the visited map was per-namespace. So we had to promote the visited map to be shared across namespaces.

We should watch this to make sure it isn't too unwieldy.
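A minimal Go sketch of the fix, assuming a simplified string-keyed graph: one `visited` set is created before the per-namespace loop and shared by every namespace, so a child reached through its cluster-scoped parent is skipped when its own namespace's roots come up. The PR's `visited` map is `map[kube.ResourceKey]int`; a `bool` is enough for this sketch, and `iterateHierarchy` here is hypothetical.

```go
package main

import "fmt"

// node is a hypothetical simplified resource holding its children's keys.
type node struct {
	children []string
}

// iterateHierarchy calls action once per node reachable from the roots of
// each namespace group. visited is shared by all groups, so a node already
// reached from another group is not visited again.
func iterateHierarchy(rootsByNamespace [][]string, nodes map[string]node, action func(key string)) {
	visited := make(map[string]bool)
	var visit func(key string)
	visit = func(key string) {
		if visited[key] {
			return
		}
		visited[key] = true // mark before recursing, which also stops cycles
		action(key)
		for _, child := range nodes[key].children {
			visit(child)
		}
	}
	for _, roots := range rootsByNamespace {
		for _, key := range roots {
			visit(key)
		}
	}
}

func main() {
	nodes := map[string]node{
		"cluster/provider": {children: []string{"ns-b/child"}},
		"ns-b/child":       {},
	}
	visitCount := map[string]int{}
	iterateHierarchy([][]string{{"cluster/provider"}, {"ns-b/child"}}, nodes,
		func(key string) { visitCount[key]++ })
	fmt.Println("child visits:", visitCount["ns-b/child"])
}
```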

jcogilvie · 2025-09-24T19:48:36Z

gitops-engine/pkg/cache/cluster_test.go

+	return resources
+}
+
+func BenchmarkIterateHierarchyV2CrossNamespace(b *testing.B) {


This test obviously won't meaningfully compare against master, but it's useful to compare between our proposed implementations of cluster-parent-namespaced-child tracking.

jcogilvie · 2025-09-24T19:49:58Z

gitops-engine/pkg/cache/cluster.go

+}
+
+// processNamespaceHierarchy handles the hierarchy traversal for a single namespace
+func (c *clusterCache) processNamespaceHierarchy(namespaceKeys []kube.ResourceKey, nsNodes map[kube.ResourceKey]*Resource, graph map[kube.ResourceKey]map[types.UID]*Resource, visited map[kube.ResourceKey]int, action func(resource *Resource, namespaceResources map[kube.ResourceKey]*Resource) bool) {


Just extracted this into a helper.

codecov · 2025-09-24T20:18:41Z

Codecov Report

❌ Patch coverage is 51.33333% with 73 lines in your changes missing coverage. Please review.
✅ Project coverage is 60.82%. Comparing base (1b973b8) to head (14e2269).
⚠️ Report is 17 commits behind head on master.

Files with missing lines	Patch %	Lines
gitops-engine/pkg/cache/cluster.go	49.65%	66 Missing and 7 partials ⚠️

@@            Coverage Diff             @@
##           master   #24728      +/-   ##
==========================================
+ Coverage   60.80%   60.82%   +0.02%     
==========================================
  Files         404      404              
  Lines       66217    66334     +117     
==========================================
+ Hits        40261    40348      +87     
- Misses      22712    22747      +35     
+ Partials     3244     3239       -5

jcogilvie · 2025-09-25T21:23:32Z

@crenshaw-dev I updated the above performance benchmarking notes. After some experimentation I discovered a flaw here that I think is unavoidable.

We consciously avoid ever scanning the whole cluster to track down unresolved refs. My first pass required that all resources we want to traverse be present in the keys []kube.ResourceKey argument. That works for tracking parent/child relationships between objects in the manifest, but not for dynamically-generated downstream resources (e.g. Crossplane providers creating Deployments downstream).

So I am considering a couple of approaches:

Add the cluster scan behind some kind of toggle to support that case. With that scan implemented, an iteration goes from about 50 ns to 1 ms(!)
Add a scan of only the Application's destination namespace, which costs significantly less but scales with the resource count in the namespace. This is what I went with for now.

Maybe it's a default-disabled, per-Application annotation?
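If the scan were gated per Application, the opt-in could be as small as the following Go sketch. The annotation key `example.com/scan-destination-namespace` and the helper `orphanedResourceNamespaceFor` are hypothetical; the idea is only that the `orphanedResourceNamespace` argument stays empty (the fast path) unless the Application opts in:

```go
package main

import (
	"fmt"
	"strconv"
)

// scanAnnotation is a hypothetical annotation key used only for illustration.
const scanAnnotation = "example.com/scan-destination-namespace"

// orphanedResourceNamespaceFor (hypothetical) returns the namespace to pass as
// orphanedResourceNamespace: the destination namespace when the annotation
// parses as true, otherwise "" so the fast path is kept (default-disabled).
func orphanedResourceNamespaceFor(appAnnotations map[string]string, destNamespace string) string {
	enabled, err := strconv.ParseBool(appAnnotations[scanAnnotation])
	if err != nil || !enabled {
		return ""
	}
	return destNamespace
}

func main() {
	annotations := map[string]string{scanAnnotation: "true"}
	fmt.Printf("%q\n", orphanedResourceNamespaceFor(annotations, "team-a"))
	fmt.Printf("%q\n", orphanedResourceNamespaceFor(nil, "team-a"))
}
```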

Meanwhile, I went to great lengths not to touch the original path in the name of performance. That naturally results in some duplication. I could dig further into specific ways we could consolidate the old/new paths and any possible performance penalties we might incur for doing so.

crenshaw-dev · 2025-09-26T17:29:19Z

We require that all resources that we want to traverse are present in keys []kube.ResourceKey. This will work for tracking parent/child relationships between objects in the manifest, but wouldn't work for dynamically-generated downstream resources

Could you expand on this a bit? The way I'm reading it, "between objects in the manifest" means that, for example, ReplicaSets would not be identified as children of Deployments or Pods would not be identified as children of ReplicaSets (because the ReplicaSets and Pods are not in manifests). But my experience is that those relationships are discovered with the current implementation.

jcogilvie · 2025-09-26T19:07:11Z

gitops-engine/pkg/cache/cluster.go

 					// Update the graph for this owner to include the child.
-					if _, ok := graph[uidNode.ResourceKey()]; !ok {
-						graph[uidNode.ResourceKey()] = make(map[types.UID]*Resource)
+					if _, ok := graph[uidNodeKey]; !ok {


Perf optimization: computing uidNode.ResourceKey() is expensive in the hot path and we were doing it multiple times, so it is now computed once and reused.

jcogilvie · 2025-09-26T19:17:00Z

gitops-engine/pkg/cache/cluster.go

+			// Check by UID if available
+			if ownerRef.UID != "" && clusterKeyUIDs[ownerRef.UID] {
+				isChildOfOurKeys = true
+			} else if ownerRef.UID == "" {


Probably-unneeded defensive coding that we can likely remove, since an ownerReference without a UID doesn't seem like it would ever be valid per the API spec?
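For reference, a self-contained Go sketch of the candidate filter without the `ownerRef.UID == ""` fallback branch: a namespaced resource qualifies if any owner UID belongs to the cluster-scoped keys, and it is appended at most once. `OwnerRef`, `Resource`, and `namespaceCandidates` are simplified, hypothetical stand-ins:

```go
package main

import "fmt"

// OwnerRef and Resource are hypothetical simplified stand-ins for
// metav1.OwnerReference and gitops-engine's cache.Resource.
type OwnerRef struct {
	Kind, Name, UID string
}

type Resource struct {
	Name      string
	OwnerRefs []OwnerRef
}

// namespaceCandidates returns the namespace resources with at least one owner
// whose UID is in clusterKeyUIDs. Owner refs with an empty UID never match,
// and each resource is appended at most once even if several owners match.
func namespaceCandidates(nsResources []*Resource, clusterKeyUIDs map[string]bool) []*Resource {
	var out []*Resource
	for _, r := range nsResources {
		for _, ref := range r.OwnerRefs {
			if ref.UID != "" && clusterKeyUIDs[ref.UID] {
				out = append(out, r)
				break
			}
		}
	}
	return out
}

func main() {
	clusterKeyUIDs := map[string]bool{"u-provider": true}
	nsResources := []*Resource{
		{Name: "deploy", OwnerRefs: []OwnerRef{{Kind: "Provider", Name: "p", UID: "u-provider"}}},
		{Name: "configmap"},
	}
	for _, r := range namespaceCandidates(nsResources, clusterKeyUIDs) {
		fmt.Println(r.Name)
	}
}
```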

jcogilvie · 2025-09-26T19:57:21Z

Could you expand on this a bit? The way I'm reading it, "between objects in the manifest" means that, for example, ReplicaSets would not be identified as children of Deployments or Pods would not be identified as children of ReplicaSets (because the ReplicaSets and Pods are not in manifests). But my experience is that those relationships are discovered with the current implementation.

Those are all namespace-scoped. A namespaced child of a cluster-scoped object, on the other hand, wouldn't be discovered.

In any case, I ended up pushing an implementation to scan the Application destination namespace.

I think this should probably be enabled on a per-app basis, since it is fairly expensive. Benchmark results updated.

jcogilvie · 2025-10-06T14:30:39Z

Superseded by a better impl here: #24847

jcogilvie requested a review from a team as a code owner September 24, 2025 15:30

jcogilvie changed the title ~~Fix: cross-namespace resource traversal~~ Fix: cross-namespace resource traversal (#24379) Sep 24, 2025

jcogilvie changed the title ~~Fix: cross-namespace resource traversal (#24379)~~ fix: cross-namespace resource traversal (#24379) Sep 24, 2025

jcogilvie commented Sep 24, 2025

jcogilvie marked this pull request as draft September 24, 2025 15:59

jcogilvie commented Sep 24, 2025

jcogilvie force-pushed the fix/cross-namespace-hierarchy-traversal-v2 branch 2 times, most recently from 115ec5b to 664d4cc September 25, 2025 21:19

jcogilvie marked this pull request as ready for review September 25, 2025 21:20

jcogilvie marked this pull request as draft September 25, 2025 21:42

jcogilvie force-pushed the fix/cross-namespace-hierarchy-traversal-v2 branch from 664d4cc to 26ee5dc September 26, 2025 17:48

jcogilvie commented Sep 26, 2025

jcogilvie force-pushed the fix/cross-namespace-hierarchy-traversal-v2 branch from 3cd946e to fe1b057 October 1, 2025 16:30

jcogilvie added 6 commits October 2, 2025 10:31

Support cross-namespace resource traversal for cluster-scoped parents of namespaced children

5b833d8

Signed-off-by: Jonathan Ogilvie <[email protected]>

Try app namespace scan

bbc83c1

Signed-off-by: Jonathan Ogilvie <[email protected]>

Improve performance of per-app namespace scan

cd17ebf

Signed-off-by: Jonathan Ogilvie <[email protected]>

Refactor benchmark tests to table-driven

5409ab1

Signed-off-by: Jonathan Ogilvie <[email protected]>

Clean up tests

54e4f66

Signed-off-by: Jonathan Ogilvie <[email protected]>

Orphan scan in its own function

d87577b

Signed-off-by: Jonathan Ogilvie <[email protected]>

jcogilvie added 2 commits October 2, 2025 10:31

Add purpose comment to test

8e3f1f2

Signed-off-by: Jonathan Ogilvie <[email protected]>

Remove usually redundant processing for cluster/ns parent/child relationships based on input keys; defer to namespace scan

14e2269

Signed-off-by: Jonathan Ogilvie <[email protected]>

jcogilvie force-pushed the fix/cross-namespace-hierarchy-traversal-v2 branch from fe1b057 to 14e2269 October 2, 2025 14:31

jcogilvie closed this Oct 6, 2025

Uh oh!

fix: cross-namespace resource traversal (#24379) #24728

fix: cross-namespace resource traversal (#24379) #24728

Conversation

jcogilvie commented Sep 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bunnyshell bot commented Sep 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

❌ Preview Environment deleted from Bunnyshell

Uh oh!

jcogilvie commented Sep 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Cross-Namespace Hierarchy Traversal Performance Analysis

Executive Summary

Implementation Overview

Key Design Decisions

Performance Results

Baseline Performance (No Cross-Namespace References)

Cross-Namespace Traversal (Starting from Namespaced Resources)

Namespace Scanning Performance

Scaling with Namespace Size (10% cross-namespace refs)

Impact of Cross-Namespace Percentage (5,000 resources)

Overkill Example: 25% Cross-Namespace References

Performance Hotspots

Baseline Case (No Namespace Scanning)

Namespace Scanning Case

Code Locations of Most Expensive Operations

Optimizations Applied

Production Readiness

Recommendations

Testing Methodology

Conclusion

Uh oh!

jcogilvie Sep 24, 2025

Choose a reason for hiding this comment

Uh oh!

crenshaw-dev commented Sep 24, 2025

Uh oh!

jcogilvie commented Sep 24, 2025

Uh oh!

jcogilvie Sep 24, 2025

Choose a reason for hiding this comment

Uh oh!

jcogilvie Sep 24, 2025

Choose a reason for hiding this comment

Uh oh!

jcogilvie Sep 24, 2025

Choose a reason for hiding this comment

Uh oh!

codecov bot commented Sep 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

jcogilvie commented Sep 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

crenshaw-dev commented Sep 26, 2025

Uh oh!

jcogilvie Sep 26, 2025

Choose a reason for hiding this comment

Uh oh!

jcogilvie Sep 26, 2025

Choose a reason for hiding this comment

Uh oh!

jcogilvie commented Sep 26, 2025

Uh oh!

jcogilvie commented Oct 6, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jcogilvie commented Sep 24, 2025 •

edited

Loading

bunnyshell bot commented Sep 24, 2025 •

edited

Loading

jcogilvie commented Sep 24, 2025 •

edited

Loading

codecov bot commented Sep 24, 2025 •

edited

Loading

jcogilvie commented Sep 25, 2025 •

edited

Loading