-
Notifications
You must be signed in to change notification settings - Fork 332
Reduce memory consumption of systemProcess.getmappings #715
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- fileToMappings were allocated with more entries than required.
I suppose this is the primary improvement.
- The mappings array can now be collected by the GC. Previously, there were references to it and prevented timely GC'ing.
Can you elaborate this. If GetMappings
is called multiple times, both the slice and the map are updated. I don't think references should be leaked. The places where its used do not leak the pointer out. And the conversion from pointer to struct
will introduce memory copies.
Did you test with just removing the map prealloc size? But it is possible that there is some unexpected interaction with the code generation and/or the GCs involved here that I cannot immediately see.
On a related side note, I will refactor most of the mappings related structures and code to use the unique
package once #671 is merged. So this will likely get a remake soon. About half of that commit is already ready.
But happy to get this in first since it looks small and clean improvement.
process/process.go
Outdated
// The likeliness of an empty path or duplicate path is high, create a unique set of paths. | ||
uniquePaths := make(map[string]int) | ||
for idx, m := range mappings { | ||
path := m.Path.String() | ||
if path != "" { | ||
uniquePaths[path] = idx | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is already creating a map without preallocated size. Wouldn't it make sense to just make the fileToMapping
map be without initial size? Or does the change from *Mapping
to Mapping
make this option be GC unfriendly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is to avoid over-committment of fileToMapping
, which takes place due to re-allocation by factor of 2. E.g. 33 entries would re-allocate to 64 map entries (map internally).
To set the map size correctly (to 33 in the example), the cost for this is a temporary map uniquePaths
, where the key (path) is already allocated and the value is something small (int).
Why make this effort at all? Because fileToMapping
persists in memory until the process exits.
To show the effect on memory allocations, here is the comparison for executable with 972 unique path entries (an IntelliJIDEA process):
(451 iterations, 46 allocs/op)
. . 283: sp.fileToMapping = make(map[string]Mapping)
. . 284: for idx, m := range mappings {
. . 285: path := m.Path.String()
. . 286: if path != "" {
145.69MB 145.69MB 287: sp.fileToMapping[path] = mappings[idx]
. . 288: }
. . 289: }
(463 iterations, 49 allocs/op)
. . 266: uniquePaths := make(map[string]int)
. . 267: for idx, m := range mappings {
. . 268: path := m.Path.String()
. . 269: if path != "" {
49.02MB 49.02MB 270: uniquePaths[path] = idx
. . 271: }
. . 272: }
67.47MB 67.47MB 275: sp.fileToMapping = make(map[string]Mapping, len(uniquePaths))
. . 276: for path, idx := range uniquePaths {
. . 277: sp.fileToMapping[path] = mappings[idx]
. . 278: }
The approach with uniquePaths
reduces overall heap usage. The numbers include temporary allocs and persistent allocs.
The uniquePaths
can be pre-allocated. Not sure, what the sweet spot is, but let's give it a try with uniquePaths
pre-allocated for 1024 entries:
(456 iterations, 34 allocs/op)
24.63MB 24.63MB 266: uniquePaths := make(map[string]int, 1024)
. . 267: for idx, m := range mappings {
. . 268: path := m.Path.String()
. . 269: if path != "" {
. . 270: uniquePaths[path] = idx
. . 271: }
. . 272: }
64.27MB 64.27MB 275: sp.fileToMapping = make(map[string]Mapping, len(uniquePaths))
. . 276: for path, idx := range uniquePaths {
. . 277: sp.fileToMapping[path] = mappings[idx]
. . 278: }
So we can go down from 145MB to 64MB (for ~450 loop iterations!) in persistent heap usage. Of course, this is for an extreme executable. But it shows the positive effect of using the uniquePaths
temporary map.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure whether we want to have 1024 integers pre-allocated or less. Most executables will have much less unique paths, so likely the sweet spot is something less like 64 or 128. Any preference here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why make this effort at all? Because fileToMapping persists in memory until the process exits.
Not exactly. The systemProcess
instance is created on demand when mappings sync is requested by the ebpf component. Once the sync is done, there will be no reference to the process anymore, and the whole systemProcess
can be reaped.
For me, with just doing:
diff --git a/process/process.go b/process/process.go
index c66afbe..3389850 100644
--- a/process/process.go
+++ b/process/process.go
@@ -263,7 +263,7 @@ func (sp *systemProcess) GetMappings() ([]Mapping, uint32, error) {
}
}
- fileToMapping := make(map[string]*Mapping, len(mappings))
+ fileToMapping := make(map[string]*Mapping)
for idx := range mappings {
m := &mappings[idx]
fileToMapping[m.Path.String()] = m
will yield:
goos: linux
goarch: amd64
pkg: go.opentelemetry.io/ebpf-profiler/process
cpu: 12th Gen Intel(R) Core(TM) i7-1265U
│ old.txt │ new.txt │
│ sec/op │ sec/op vs base │
SystemProcess_GetMappings-12 18.49µ ± 2% 17.51µ ± 1% -5.31% (p=0.000 n=10)
│ old.txt │ new.txt │
│ B/op │ B/op vs base │
SystemProcess_GetMappings-12 3.977Ki ± 0% 2.389Ki ± 0% -39.93% (p=0.000 n=10)
│ old.txt │ new.txt │
│ allocs/op │ allocs/op vs base │
SystemProcess_GetMappings-12 10.000 ± 0% 8.000 ± 0% -20.00% (p=0.000 n=10)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The systemProcess instance is created on demand when mappings sync is requested by the ebpf component. Once the sync is done, there will be no reference to the process anymore, and the whole systemProcess can be reaped.
Thanks, now I can see it. Didn't step up enough towards the caller and made a wrong assumption here. That makes the code changes much simpler.
The caller of
See #715 (comment)
👍 |
e86c51b
to
60d6052
Compare
for idx := range mappings { | ||
m := &mappings[idx] | ||
fileToMapping[m.Path.String()] = m | ||
sp.fileToMapping = make(map[string]*Mapping) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sp.fileToMapping = make(map[string]*Mapping) | |
fileToMapping := make(map[string]*Mapping) |
Could we keep this original code and temporary variable?
It looks a bit cleaner, and helps with the thinking that sp.fileToMapping
is replaced in one assignment.
fileToMappings
were allocated with more entries than required.mappings
array can now be collected by the GC. Previously, there were references to it and prevented timely GC'ing.The quick&dirty benchmark code