@kkk777-7 kkk777-7 commented Nov 16, 2025

What this PR does / why we need it:

Introduce Translator Context to improve GatewayAPI translator performance.

Currently, Gateway API translator methods each build many local maps to speed up lookups.
These can all be moved into a common preprocessing step, stored in the Translator context, and reused across methods.

In addition, each resource is currently retrieved by a linear search over the slices in resource.Resources.
For frequently accessed resources, maintaining a map should therefore reduce CPU time.
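To make the difference concrete, here is a minimal sketch of the two lookup styles. The `Service` type is a simplified stand-in for `corev1.Service`, and `key`, `findLinear`, and `buildServiceMap` are hypothetical names chosen for illustration; they are not taken from this PR.

```go
package main

import "fmt"

// Service is a simplified stand-in for corev1.Service (assumption for this sketch).
type Service struct {
	Namespace string
	Name      string
}

// key is a hypothetical composite map key; namespace plus name is unique per kind.
type key struct {
	Namespace string
	Name      string
}

// findLinear scans the slice on every call, so each lookup is O(n).
func findLinear(services []*Service, namespace, name string) *Service {
	for _, svc := range services {
		if svc.Namespace == namespace && svc.Name == name {
			return svc
		}
	}
	return nil
}

// buildServiceMap indexes the slice once so later lookups are O(1) on average.
func buildServiceMap(services []*Service) map[key]*Service {
	m := make(map[key]*Service, len(services))
	for _, svc := range services {
		m[key{svc.Namespace, svc.Name}] = svc
	}
	return m
}

func main() {
	services := []*Service{
		{Namespace: "default", Name: "web"},
		{Namespace: "default", Name: "api"},
	}
	index := buildServiceMap(services)

	// Both styles resolve to the same object; a missing key yields nil.
	fmt.Println(findLinear(services, "default", "api") == index[key{"default", "api"}])
	fmt.Println(index[key{"prod", "missing"}] == nil)
}
```

With n services and m lookups per translation, the linear style costs on the order of n*m comparisons, while the indexed style costs one pass to build plus m constant-time lookups.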

This PR's scope

  • ServiceMap
  • NamespaceMap
  • ServiceImportMap
  • BackendMap
  • SecretMap
  • ConfigMapMap
  • ClusterTrustBundleMap
  • EndpointSliceMap
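One way such maps could be built in a single preprocessing pass is sketched below. Everything here is an assumption for illustration: `TranslatorContext`, `NewTranslatorContext`, `indexBy`, and the local `Secret` and `ConfigMap` types are hypothetical, and only two of the listed maps are shown. The `Object` interface relies on the `GetNamespace` and `GetName` getters, which Kubernetes objects also provide.

```go
package main

import "fmt"

// Object is the minimal interface this sketch needs.
type Object interface {
	GetNamespace() string
	GetName() string
}

// NamespacedName is a local stand-in for a namespace/name key.
type NamespacedName struct {
	Namespace string
	Name      string
}

// indexBy builds a namespace/name index for any slice of objects.
// Cluster-scoped kinds simply have an empty namespace in the key.
func indexBy[T Object](items []T) map[NamespacedName]T {
	m := make(map[NamespacedName]T, len(items))
	for _, it := range items {
		m[NamespacedName{it.GetNamespace(), it.GetName()}] = it
	}
	return m
}

// Secret and ConfigMap are simplified stand-ins for the real API types.
type Secret struct{ Namespace, Name string }

func (s *Secret) GetNamespace() string { return s.Namespace }
func (s *Secret) GetName() string      { return s.Name }

type ConfigMap struct{ Namespace, Name string }

func (c *ConfigMap) GetNamespace() string { return c.Namespace }
func (c *ConfigMap) GetName() string      { return c.Name }

// TranslatorContext is a hypothetical container for the prebuilt indexes.
type TranslatorContext struct {
	SecretMap    map[NamespacedName]*Secret
	ConfigMapMap map[NamespacedName]*ConfigMap
}

// NewTranslatorContext runs the preprocessing step once per translation.
func NewTranslatorContext(secrets []*Secret, configMaps []*ConfigMap) *TranslatorContext {
	return &TranslatorContext{
		SecretMap:    indexBy(secrets),
		ConfigMapMap: indexBy(configMaps),
	}
}

func main() {
	ctx := NewTranslatorContext(
		[]*Secret{{Namespace: "default", Name: "tls-cert"}},
		[]*ConfigMap{{Namespace: "default", Name: "ca-bundle"}},
	)
	fmt.Println(ctx.SecretMap[NamespacedName{"default", "tls-cert"}].Name)
	fmt.Println(len(ctx.ConfigMapMap))
}
```

A generic helper keeps each additional map to a single line in the constructor, which is one way to limit the extra complexity of adding more resource kinds.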

The following items were considered initially, but the gains were minimal and they would add complexity, so they are out of scope. Ref: #7535 (comment)

  • Policy Target Gateway Map
  • Policy Target Route Map

Which issue(s) this PR fixes:

Fixes #6711

Release Notes: No

Benchmark Detail

gobench result

gatewayapi translator benchmark (vs latest main branch)

  • Execution time (sec/op) improves mainly on large workloads: about 45% lower (116.67 ms to 64.12 ms per op, roughly 1.8x faster).
    • large workload: HTTPRoute (2k), GRPCRoute (250), UDPRoute (100), ConfigMap (500), SP (500), BTP (500), EEP (500), Service (2k), EndpointSlice (2k)
  • No meaningful regression in memory usage or allocation counts for any workload (allocs/op changes stay within +0.06%).
                               │    main.txt    │               fix.txt                │
                               │     sec/op     │    sec/op      vs base               │
GatewayAPITranslator/small-10     2.012m ± 164%   2.188m ± 144%        ~ (p=0.132 n=6)
GatewayAPITranslator/medium-10    5.752m ±   6%   5.497m ±  35%        ~ (p=0.132 n=6)
GatewayAPITranslator/large-10    116.67m ±   5%   64.12m ±   9%  -45.04% (p=0.002 n=6)
geomean                           11.05m          9.171m         -17.02%

                               │    main.txt    │               fix.txt                │
                               │      B/op      │      B/op       vs base              │
GatewayAPITranslator/small-10    803.7Ki ± 277%   801.9Ki ± 278%       ~ (p=0.900 n=6)
GatewayAPITranslator/medium-10   3.018Mi ±   1%   3.006Mi ±   3%       ~ (p=0.065 n=6)
GatewayAPITranslator/large-10    23.35Mi ±   0%   23.32Mi ±   0%  -0.10% (p=0.002 n=6)
geomean                          3.810Mi          3.801Mi         -0.24%

                               │   main.txt    │               fix.txt               │
                               │   allocs/op   │   allocs/op    vs base              │
GatewayAPITranslator/small-10    12.89k ± 127%   12.90k ± 127%  +0.06% (p=0.035 n=6)
GatewayAPITranslator/medium-10   52.63k ±   0%   52.64k ±   0%  +0.03% (p=0.002 n=6)
GatewayAPITranslator/large-10    412.3k ±   0%   412.3k ±   0%       ~ (p=0.065 n=6)
geomean                          65.41k          65.43k         +0.03%

@codecov

codecov bot commented Nov 16, 2025

Codecov Report

❌ Patch coverage is 95.20548% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.21%. Comparing base (f609278) to head (25a048e).
⚠️ Report is 20 commits behind head on main.

Files with missing lines                       Patch %   Lines
internal/gatewayapi/securitypolicy.go          78.57%    4 Missing and 2 partials ⚠️
internal/gatewayapi/contexts.go                96.62%    2 Missing and 1 partial ⚠️
internal/gatewayapi/backendtlspolicy.go        90.00%    0 Missing and 2 partials ⚠️
internal/gatewayapi/backendtrafficpolicy.go    94.44%    0 Missing and 1 partial ⚠️
internal/gatewayapi/listener.go                94.11%    0 Missing and 1 partial ⚠️
internal/gatewayapi/route.go                   97.22%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7535      +/-   ##
==========================================
- Coverage   71.57%   71.21%   -0.37%     
==========================================
  Files         231      274      +43     
  Lines       42625    34898    -7727     
==========================================
- Hits        30507    24851    -5656     
+ Misses      10344     8256    -2088     
- Partials     1774     1791      +17     


@kkk777-7
Member Author

wait #7534

@kkk777-7 kkk777-7 marked this pull request as ready for review November 16, 2025 12:39
@kkk777-7 kkk777-7 requested a review from a team as a code owner November 16, 2025 12:39
@kkk777-7
Member Author

/retest

@shreealt
Contributor

shreealt commented Nov 16, 2025

@kkk777-7 you might wanna merge the main branch to this branch to include the disk space cleaner.

@zirain zirain force-pushed the perf-translator-context branch from 9527aa1 to ce0cfcc Compare November 16, 2025 13:01
@zhaohuabing
Member

The GetService in the resource.go can be deleted - it's no longer used.

func (r *Resources) GetService(namespace, name string) *corev1.Service {
	for _, svc := range r.Services {
		if svc.Namespace == namespace && svc.Name == name {
			return svc
		}
	}
	return nil
}

@kkk777-7 kkk777-7 force-pushed the perf-translator-context branch from f9b5a6a to 3d4c714 Compare November 18, 2025 17:05
@zhaohuabing
Member

zhaohuabing commented Nov 19, 2025

Thinking out loud: could we also add other resources (Namespace, Secret, ConfigMap, etc.) to the translator context to avoid linear lookups and improve CPU performance?

This could increase memory usage. The increase shouldn't be large, since we would only mirror the slices that are already in the resources using pointers, but we should benchmark both CPU and memory to ensure we don't introduce significant overhead.
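The memory argument rests on the map holding pointers to the objects already in the slice rather than copies. A minimal sketch of that property follows, using a simplified local `Secret` type and hypothetical `key` and `indexSecrets` names; none of these come from the PR.

```go
package main

import "fmt"

// Secret is a simplified stand-in for corev1.Secret (assumption for this sketch).
type Secret struct {
	Namespace, Name string
	Data            map[string][]byte
}

// key is a hypothetical namespace/name map key.
type key struct{ Namespace, Name string }

// indexSecrets stores the existing pointers; no Secret payload is copied.
func indexSecrets(secrets []*Secret) map[key]*Secret {
	m := make(map[key]*Secret, len(secrets))
	for _, s := range secrets {
		m[key{s.Namespace, s.Name}] = s
	}
	return m
}

func main() {
	secrets := []*Secret{{
		Namespace: "default",
		Name:      "tls",
		Data:      map[string][]byte{"tls.crt": make([]byte, 4096)},
	}}
	idx := indexSecrets(secrets)

	// The map entry and the slice element are the same object.
	fmt.Println(idx[key{"default", "tls"}] == secrets[0])

	// A change made through the slice is visible through the map.
	secrets[0].Data["tls.key"] = []byte("k")
	fmt.Println(len(idx[key{"default", "tls"}].Data))
}
```

The extra cost per entry is therefore the key (two string headers), one pointer, and the map's own bookkeeping, independent of how large the indexed object is.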

@zirain
Member

zirain commented Nov 19, 2025

Thinking out loud: could we also add other resources (Namespace, Secret, ConfigMap, etc.) to the translator context to avoid linear lookups and improve CPU performance?

we should do that.

@kkk777-7
Member Author

Thinking out loud: could we also add other resources (Namespace, Secret, ConfigMap, etc.) to the translator context to avoid linear lookups and improve CPU performance?

+1
Even with the addition of a 2k-entry service map, the memory increase was minimal, so I think the CPU performance benefit will be the bigger win!
I'll add these to the context and share the benchmark results :)

@zirain
Member

zirain commented Nov 19, 2025

GetEndpointSlicesForBackend is another bottleneck.
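Unlike the one-to-one maps, an EndpointSlice lookup is one-to-many: several slices can belong to one Service. The sketch below shows one way to pre-group them, assuming slices are matched to a Service through the well-known `kubernetes.io/service-name` label. The source does not show how GetEndpointSlicesForBackend matches slices, so treat the matching rule, the local `EndpointSlice` type, and the `groupByService` helper as assumptions.

```go
package main

import "fmt"

// serviceNameLabel is the well-known label linking an EndpointSlice to its Service.
const serviceNameLabel = "kubernetes.io/service-name"

// EndpointSlice is a simplified stand-in for discoveryv1.EndpointSlice.
type EndpointSlice struct {
	Namespace, Name string
	Labels          map[string]string
}

// key is a hypothetical namespace/name key identifying the owning Service.
type key struct{ Namespace, Name string }

// groupByService buckets slices by owning Service in one pass, so fetching
// all slices for a backend no longer requires scanning every slice.
func groupByService(slices []*EndpointSlice) map[key][]*EndpointSlice {
	m := make(map[key][]*EndpointSlice)
	for _, s := range slices {
		svc, ok := s.Labels[serviceNameLabel]
		if !ok {
			continue // slice is not associated with a Service
		}
		k := key{s.Namespace, svc}
		m[k] = append(m[k], s)
	}
	return m
}

func main() {
	slices := []*EndpointSlice{
		{Namespace: "default", Name: "web-abc", Labels: map[string]string{serviceNameLabel: "web"}},
		{Namespace: "default", Name: "web-def", Labels: map[string]string{serviceNameLabel: "web"}},
		{Namespace: "default", Name: "api-xyz", Labels: map[string]string{serviceNameLabel: "api"}},
	}
	groups := groupByService(slices)
	fmt.Println(len(groups[key{"default", "web"}]), len(groups[key{"default", "api"}]))
}
```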

zirain previously approved these changes Nov 20, 2025
@zirain zirain mentioned this pull request Nov 20, 2025
GatewayControllerName: string(rs.GatewayClass.Spec.ControllerName),
GatewayClassName: gwapiv1.ObjectName(rs.GatewayClass.Name),
GlobalRateLimitEnabled: true,
EndpointRoutingDisabled: false,
Member Author

TranslateGatewayAPIToXds sets EndpointRoutingDisabled: true,
so I added a new benchmark test.

- fqdn:
hostname: backend-v3.gateway-conformance-infra.svc.cluster.local
port: 8080
- apiVersion: gateway.envoyproxy.io/v1alpha1
@kkk777-7
Member Author

Hi @zirain, @zhaohuabing, got good results 🎉
In the large-scale environment, memory usage barely changed, and CPU time per op dropped by about 45% (roughly 1.8x faster).

                               │    main.txt    │               fix.txt                │
                               │     sec/op     │    sec/op      vs base               │
GatewayAPITranslator/small-10     2.012m ± 164%   2.188m ± 144%        ~ (p=0.132 n=6)
GatewayAPITranslator/medium-10    5.752m ±   6%   5.497m ±  35%        ~ (p=0.132 n=6)
GatewayAPITranslator/large-10    116.67m ±   5%   64.12m ±   9%  -45.04% (p=0.002 n=6)
geomean                           11.05m          9.171m         -17.02%

                               │    main.txt    │               fix.txt                │
                               │      B/op      │      B/op       vs base              │
GatewayAPITranslator/small-10    803.7Ki ± 277%   801.9Ki ± 278%       ~ (p=0.900 n=6)
GatewayAPITranslator/medium-10   3.018Mi ±   1%   3.006Mi ±   3%       ~ (p=0.065 n=6)
GatewayAPITranslator/large-10    23.35Mi ±   0%   23.32Mi ±   0%  -0.10% (p=0.002 n=6)
geomean                          3.810Mi          3.801Mi         -0.24%

                               │   main.txt    │               fix.txt               │
                               │   allocs/op   │   allocs/op    vs base              │
GatewayAPITranslator/small-10    12.89k ± 127%   12.90k ± 127%  +0.06% (p=0.035 n=6)
GatewayAPITranslator/medium-10   52.63k ±   0%   52.64k ±   0%  +0.03% (p=0.002 n=6)
GatewayAPITranslator/large-10    412.3k ±   0%   412.3k ±   0%       ~ (p=0.065 n=6)
geomean                          65.41k          65.43k         +0.03%

Development

Successfully merging this pull request may close these issues.

Preprocess context in Gateway API Translator

4 participants