Skip to content

Conversation

rockdaboot
Copy link
Contributor

  1. fileToMappings were allocated with more entries than required.
  2. The mappings array can now be collected by the GC. Previously, there were references to it and prevented timely GC'ing.
$ benchstat old.txt new.txt 
goos: linux
goarch: amd64
pkg: go.opentelemetry.io/ebpf-profiler/process
cpu: AMD Ryzen 7 8845HS w/ Radeon 780M Graphics
                             │   old.txt   │              new.txt               │
                             │   sec/op    │   sec/op     vs base               │
SystemProcess_GetMappings-16   50.42µ ± 7%   46.34µ ± 2%  -8.10% (p=0.000 n=10)

                             │   old.txt    │               new.txt                │
                             │     B/op     │     B/op      vs base                │
SystemProcess_GetMappings-16   9.619Ki ± 1%   6.826Ki ± 0%  -29.04% (p=0.000 n=10)

                             │   old.txt   │              new.txt               │
                             │  allocs/op  │ allocs/op   vs base                │
SystemProcess_GetMappings-16   11.000 ± 0%   9.000 ± 0%  -18.18% (p=0.000 n=10)

The quick&dirty benchmark code

func BenchmarkSystemProcess_GetMappings(b *testing.B) {
       pid := libpf.PID(os.Getpid())
       pr := New(pid, pid)

       b.ReportAllocs()
       b.ResetTimer()

       for b.Loop() {
               _, _, err := pr.GetMappings()
               if err != nil {
                       b.Fatal(err)
               }
       }
}

@rockdaboot rockdaboot requested review from a team as code owners August 20, 2025 18:03
Copy link
Contributor

@fabled fabled left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. fileToMappings were allocated with more entries than required.

I suppose this is the primary improvement.

  1. The mappings array can now be collected by the GC. Previously, there were references to it and prevented timely GC'ing.

Can you elaborate this. If GetMappings is called multiple times, both the slice and the map are updated. I don't think references should be leaked. The places where its used do not leak the pointer out. And the conversion from pointer to struct will introduce memory copies.

Did you test with just removing the map prealloc size? But it is possible that there is some unexpected interaction with the code generation and/or the GCs involved here that I cannot immediately see.

On a related side note, I will refactor most of the mappings related structures and code to use the unique package once #671 is merged. So this will likely get a remake soon. About half of that commit is already ready.

But happy to get this in first since it looks small and clean improvement.

Comment on lines 266 to 272
// The likeliness of an empty path or duplicate path is high, create a unique set of paths.
uniquePaths := make(map[string]int)
for idx, m := range mappings {
path := m.Path.String()
if path != "" {
uniquePaths[path] = idx
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already creating a map without preallocated size. Wouldn't it make sense to just make the fileToMapping map be without initial size? Or does the change from *Mapping to Mapping make this option be GC unfriendly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is to avoid over-committment of fileToMapping, which takes place due to re-allocation by factor of 2. E.g. 33 entries would re-allocate to 64 map entries (map internally).
To set the map size correctly (to 33 in the example), the cost for this is a temporary map uniquePaths, where the key (path) is already allocated and the value is something small (int).

Why make this effort at all? Because fileToMapping persists in memory until the process exits.

To show the effect on memory allocations, here is the comparison for executable with 972 unique path entries (an IntelliJIDEA process):
(451 iterations, 46 allocs/op)

         .          .    283:   sp.fileToMapping = make(map[string]Mapping)
         .          .    284:   for idx, m := range mappings {
         .          .    285:           path := m.Path.String()
         .          .    286:           if path != "" {
  145.69MB   145.69MB    287:                   sp.fileToMapping[path] = mappings[idx]
         .          .    288:           }
         .          .    289:   }

(463 iterations, 49 allocs/op)

        .          .    266:   uniquePaths := make(map[string]int)
         .          .    267:   for idx, m := range mappings {
         .          .    268:           path := m.Path.String()
         .          .    269:           if path != "" {
   49.02MB    49.02MB    270:                   uniquePaths[path] = idx
         .          .    271:           }
         .          .    272:   }
   67.47MB    67.47MB    275:   sp.fileToMapping = make(map[string]Mapping, len(uniquePaths))
         .          .    276:   for path, idx := range uniquePaths {
         .          .    277:           sp.fileToMapping[path] = mappings[idx]
         .          .    278:   }

The approach with uniquePaths reduces overall heap usage. The numbers include temporary allocs and persistent allocs.

The uniquePaths can be pre-allocated. Not sure, what the sweet spot is, but let's give it a try with uniquePaths pre-allocated for 1024 entries:
(456 iterations, 34 allocs/op)

   24.63MB    24.63MB    266:   uniquePaths := make(map[string]int, 1024)
         .          .    267:   for idx, m := range mappings {
         .          .    268:           path := m.Path.String()
         .          .    269:           if path != "" {
         .          .    270:                   uniquePaths[path] = idx
         .          .    271:           }
         .          .    272:   }
   64.27MB    64.27MB    275:   sp.fileToMapping = make(map[string]Mapping, len(uniquePaths))
         .          .    276:   for path, idx := range uniquePaths {
         .          .    277:           sp.fileToMapping[path] = mappings[idx]
         .          .    278:   }

So we can go down from 145MB to 64MB (for ~450 loop iterations!) in persistent heap usage. Of course, this is for an extreme executable. But it shows the positive effect of using the uniquePaths temporary map.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure whether we want to have 1024 integers pre-allocated or less. Most executables will have much less unique paths, so likely the sweet spot is something less like 64 or 128. Any preference here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why make this effort at all? Because fileToMapping persists in memory until the process exits.

Not exactly. The systemProcess instance is created on demand when mappings sync is requested by the ebpf component. Once the sync is done, there will be no reference to the process anymore, and the whole systemProcess can be reaped.

For me, with just doing:

diff --git a/process/process.go b/process/process.go
index c66afbe..3389850 100644
--- a/process/process.go
+++ b/process/process.go
@@ -263,7 +263,7 @@ func (sp *systemProcess) GetMappings() ([]Mapping, uint32, error) {
                }
        }
 
-       fileToMapping := make(map[string]*Mapping, len(mappings))
+       fileToMapping := make(map[string]*Mapping)
        for idx := range mappings {
                m := &mappings[idx]
                fileToMapping[m.Path.String()] = m

will yield:

goos: linux
goarch: amd64
pkg: go.opentelemetry.io/ebpf-profiler/process
cpu: 12th Gen Intel(R) Core(TM) i7-1265U
                             │   old.txt   │              new.txt               │
                             │   sec/op    │   sec/op     vs base               │
SystemProcess_GetMappings-12   18.49µ ± 2%   17.51µ ± 1%  -5.31% (p=0.000 n=10)

                             │   old.txt    │               new.txt                │
                             │     B/op     │     B/op      vs base                │
SystemProcess_GetMappings-12   3.977Ki ± 0%   2.389Ki ± 0%  -39.93% (p=0.000 n=10)

                             │   old.txt   │              new.txt               │
                             │  allocs/op  │ allocs/op   vs base                │
SystemProcess_GetMappings-12   10.000 ± 0%   8.000 ± 0%  -20.00% (p=0.000 n=10)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The systemProcess instance is created on demand when mappings sync is requested by the ebpf component. Once the sync is done, there will be no reference to the process anymore, and the whole systemProcess can be reaped.

Thanks, now I can see it. Didn't step up enough towards the caller and made a wrong assumption here. That makes the code changes much simpler.

@rockdaboot
Copy link
Contributor Author

rockdaboot commented Aug 21, 2025

  1. The mappings array can now be collected by the GC. Previously, there were references to it and prevented timely GC'ing.

Can you elaborate this. If GetMappings is called multiple times, both the slice and the map are updated. I don't think references should be leaked. The places where its used do not leak the pointer out. And the conversion from pointer to struct will introduce memory copies.

The caller of GetMappings() will iterate over the returned mappings - thereafter mappings can be GC'ed. But this is prevented by some references into that slice from within process.fileToMapping, which persists until the process exits (if I read and understand the code correctly).
So in order to GC mappings, we should make copies of the relatively small entries when creating fileToMapping (this PR does it).

Did you test with just removing the map prealloc size? But it is possible that there is some unexpected interaction with the code generation and/or the GCs involved here that I cannot immediately see.

See #715 (comment)

On a related side note, I will refactor most of the mappings related structures and code to use the unique package once #671 is merged. So this will likely get a remake soon. About half of that commit is already ready.

But happy to get this in first since it looks small and clean improvement.

👍

@rockdaboot rockdaboot force-pushed the improve-getmappings branch from e86c51b to 60d6052 Compare August 21, 2025 17:37
for idx := range mappings {
m := &mappings[idx]
fileToMapping[m.Path.String()] = m
sp.fileToMapping = make(map[string]*Mapping)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
sp.fileToMapping = make(map[string]*Mapping)
fileToMapping := make(map[string]*Mapping)

Could we keep this original code and temporary variable?
It looks a bit cleaner, and helps with the thinking that sp.fileToMapping is replaced in one assignment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants