chore: Memory improvements in dataobj page downloads #19301
Conversation
pkg/util/rangeio/rangeio.go
Outdated
for _, r := range optimized {
	putBytesBuffer(&r.Data)
}
I believe this still allocates because it's taking the address of the struct field and not actually the byte slice.
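For illustration, here's a minimal sketch of that allocation, assuming putBytesBuffer wraps a sync.Pool of *[]byte (the pool shape and Range fields are extrapolated from the diff, not the PR's actual code):

import "sync"

type Range struct {
	Offset int64
	Data   []byte
}

var byteSlicePool = sync.Pool{
	New: func() any { return new([]byte) },
}

func putBytesBuffer(b *[]byte) { byteSlicePool.Put(b) }

func releaseAll(optimized []Range) {
	for _, r := range optimized {
		// &r.Data points into the per-iteration copy r, so r escapes to
		// the heap: one allocation per range. The pointer handed to the
		// pool is also never the same *[]byte that came out of it, which
		// defeats the reuse the pool is meant to provide.
		putBytesBuffer(&r.Data)
	}
}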
As an alternative, you can:
- Use a pool of *bytes.Buffer where you use their raw byte slices.
- Update optimizedRanges to return a release func() which returns the original buffers back to the pool.
Then you'd have
optimized, release := optimizedRanges(cfg, ranges)
defer release()
This is what I did in my hack, and it worked out well.
The only issue is that it's a little annoying to get the slice from *bytes.Buffer; I had to do this:
size := coalescedChunks[i].Length

// TODO: Release buf back into the pool in the returned release func.
buf := bytesBufferPool.Get().(*bytes.Buffer)
buf.Reset()
buf.Grow(size)

out[i] = Range{
	Data:   buf.Bytes()[:size],
	Offset: coalescedChunks[i].Offset,
}
Nice, that's definitely cleaner. I played around with bytes.Buffer originally, but I was returning the byte slice too soon, so I wasn't able to track when it grew. Thanks for the tip.
Here are the new benchmark results:
goos: darwin
goarch: arm64
pkg: github.com/grafana/loki/v3/pkg/logql/bench
cpu: Apple M3 Max
│ before.txt │ after.txt │
│ sec/op │ sec/op vs base │
LogQL/query=sum_by_(cluster,_namespace)_(count_over_time({service_name="grafana",_env="prod",_region="us-west-2"}_|=_"level"_[1m0s]))/kind=metric/store=dataobj-engine-14 10.43 ± 5% 10.18 ± 3% -2.48% (p=0.019 n=10)
│ before.txt │ after.txt │
│ kilobytesProcessed │ kilobytesProcessed vs base │
LogQL/query=sum_by_(cluster,_namespace)_(count_over_time({service_name="grafana",_env="prod",_region="us-west-2"}_|=_"level"_[1m0s]))/kind=metric/store=dataobj-engine-14 52.87k ± 0% 52.87k ± 0% ~ (p=1.000 n=10) ¹
¹ all samples are equal
│ before.txt │ after.txt │
│ linesProcessed │ linesProcessed vs base │
LogQL/query=sum_by_(cluster,_namespace)_(count_over_time({service_name="grafana",_env="prod",_region="us-west-2"}_|=_"level"_[1m0s]))/kind=metric/store=dataobj-engine-14 6.233M ± 0% 6.233M ± 0% ~ (p=1.000 n=10) ¹
¹ all samples are equal
│ before.txt │ after.txt │
│ postFilterLines │ postFilterLines vs base │
LogQL/query=sum_by_(cluster,_namespace)_(count_over_time({service_name="grafana",_env="prod",_region="us-west-2"}_|=_"level"_[1m0s]))/kind=metric/store=dataobj-engine-14 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
¹ all samples are equal
│ before.txt │ after.txt │
│ B/op │ B/op vs base │
LogQL/query=sum_by_(cluster,_namespace)_(count_over_time({service_name="grafana",_env="prod",_region="us-west-2"}_|=_"level"_[1m0s]))/kind=metric/store=dataobj-engine-14 1022.1Mi ± 6% 730.1Mi ± 18% -28.57% (p=0.000 n=10)
│ before.txt │ after.txt │
│ allocs/op │ allocs/op vs base │
LogQL/query=sum_by_(cluster,_namespace)_(count_over_time({service_name="grafana",_env="prod",_region="us-west-2"}_|=_"level"_[1m0s]))/kind=metric/store=dataobj-engine-14 2.416M ± 0% 2.415M ± 0% ~ (p=0.796 n=10)
Looks good!
pkg/util/rangeio/rangeio.go
Outdated
	i += 2 // Skip over the range we just inserted.
}

usedBuffers := []*bytes.Buffer{}
nit: you could preallocate this to the target capacity (or length) since it'll have the same number of elements as out
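For example, a one-line sketch of that preallocation (assuming out is already sized to the coalesced range count):

usedBuffers := make([]*bytes.Buffer, 0, len(out))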
What this PR does / why we need it:
- A small optimization to use the page's compressed size when downloading page data (this didn't do much on the LogQL suite).
- Introduced a basic buffer pool mechanism for rangeio's optimized requests, which resulted in a 30-40% reduction in bytes allocated during the benchmarks.