benclive
Contributor

What this PR does / why we need it:
Uses the compressed page size when downloading page data. This is a small optimization and had little effect on the LogQL benchmark suite.
Adds a basic buffer pool to rangeio's optimized requests, which reduced bytes allocated during the benchmarks by roughly 30-40%.

benchstat before.txt after_pooling.txt
goos: darwin
goarch: arm64
pkg: github.com/grafana/loki/v3/pkg/logql/bench
cpu: Apple M3 Max
                                                                                                                                                                          │ before.txt │         after_pooling.txt         │
                                                                                                                                                                          │   sec/op   │   sec/op    vs base               │
LogQL/query=sum_by_(cluster,_namespace)_(count_over_time({service_name="grafana",_env="prod",_region="us-west-2"}_|=_"level"_[1m0s]))/kind=metric/store=dataobj-engine-14   8.850 ± 2%   8.586 ± 2%  -2.98% (p=0.000 n=10)

                                                                                                                                                                          │     before.txt     │           after_pooling.txt            │
                                                                                                                                                                          │ kilobytesProcessed │ kilobytesProcessed  vs base            │
LogQL/query=sum_by_(cluster,_namespace)_(count_over_time({service_name="grafana",_env="prod",_region="us-west-2"}_|=_"level"_[1m0s]))/kind=metric/store=dataobj-engine-14          52.87k ± 0%          52.87k ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal

                                                                                                                                                                          │   before.txt   │         after_pooling.txt          │
                                                                                                                                                                          │ linesProcessed │ linesProcessed  vs base            │
LogQL/query=sum_by_(cluster,_namespace)_(count_over_time({service_name="grafana",_env="prod",_region="us-west-2"}_|=_"level"_[1m0s]))/kind=metric/store=dataobj-engine-14      6.233M ± 0%      6.233M ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal

                                                                                                                                                                          │   before.txt    │          after_pooling.txt          │
                                                                                                                                                                          │ postFilterLines │ postFilterLines  vs base            │
LogQL/query=sum_by_(cluster,_namespace)_(count_over_time({service_name="grafana",_env="prod",_region="us-west-2"}_|=_"level"_[1m0s]))/kind=metric/store=dataobj-engine-14        0.000 ± 0%        0.000 ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal

                                                                                                                                                                          │   before.txt   │           after_pooling.txt           │
                                                                                                                                                                          │      B/op      │     B/op       vs base                │
LogQL/query=sum_by_(cluster,_namespace)_(count_over_time({service_name="grafana",_env="prod",_region="us-west-2"}_|=_"level"_[1m0s]))/kind=metric/store=dataobj-engine-14   1083.9Mi ± 16%   675.5Mi ± 17%  -37.68% (p=0.000 n=10)

                                                                                                                                                                          │ before.txt  │       after_pooling.txt       │
                                                                                                                                                                          │  allocs/op  │  allocs/op   vs base          │
LogQL/query=sum_by_(cluster,_namespace)_(count_over_time({service_name="grafana",_env="prod",_region="us-west-2"}_|=_"level"_[1m0s]))/kind=metric/store=dataobj-engine-14   2.416M ± 0%   2.415M ± 0%  ~ (p=0.184 n=10)
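
As a quick sanity check on the "vs base" column, the relative change can be recomputed from the rounded values printed above. This is only a sketch: `percentChange` is a hypothetical helper, and benchstat itself works from the underlying samples rather than the rounded figures, so small differences are possible.

```go
package main

import "fmt"

// percentChange returns the relative change from before to after, in percent.
// A negative result means the "after" value is smaller.
func percentChange(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	// Values copied from the rounded benchstat output above.
	fmt.Printf("B/op:   %.2f%%\n", percentChange(1083.9, 675.5))
	fmt.Printf("sec/op: %.2f%%\n", percentChange(8.850, 8.586))
}
```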

@benclive benclive requested a review from a team as a code owner September 25, 2025 11:20
@pull-request-size pull-request-size bot added size/M and removed size/S labels Sep 25, 2025
Comment on lines 271 to 273
for _, r := range optimized {
	putBytesBuffer(&r.Data)
}
Member

I believe this still allocates, because it takes the address of the struct field (`&r.Data`) rather than pooling the byte slice itself.

As an alternative, you can:

  • Use a pool of *bytes.Buffer where you use their raw byte slices
  • Update optimizedRanges to return a release func() which returns the original buffers back to the pool.

Then you'd have

optimized, release := optimizedRanges(cfg, ranges)
defer release() 

This is what I did in my hack, and it worked out well.

The only issue is that it's a little annoying to get the slice from *bytes.Buffer; I had to do this:

size := coalescedChunks[i].Length

// TODO: Release buf back into the
// pool in the returned release func
buf := bytesBufferPool.Get().(*bytes.Buffer)
buf.Reset()
buf.Grow(size) 

out[i] = Range{
	Data: buf.Bytes()[:size],
	Offset: coalescedChunks[i].Offset,
}
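
Putting the two suggestions together, a minimal sketch of the release-func pattern might look like the following. The `Range` and `chunk` types and the simplified `optimizedRanges` signature are assumptions made for illustration, not the actual rangeio code; only `sync.Pool` and `bytes.Buffer` are real standard library APIs.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// Range is a hypothetical stand-in for the range type used in the PR.
type Range struct {
	Data   []byte
	Offset int64
}

// chunk is a hypothetical stand-in for an entry in coalescedChunks.
type chunk struct {
	Offset int64
	Length int
}

// bytesBufferPool stores *bytes.Buffer so that Put does not need to box a
// slice header.
var bytesBufferPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// optimizedRanges hands out one pooled buffer per chunk and returns a
// release func that puts every buffer back into the pool.
func optimizedRanges(chunks []chunk) (out []Range, release func()) {
	out = make([]Range, len(chunks))
	used := make([]*bytes.Buffer, 0, len(chunks))

	for i, c := range chunks {
		buf := bytesBufferPool.Get().(*bytes.Buffer)
		buf.Reset()
		buf.Grow(c.Length) // Ensures capacity for at least c.Length bytes.

		out[i] = Range{
			// Bytes() has length 0 after Reset, so reslice up to the
			// capacity that Grow just reserved. Contents may be stale.
			Data:   buf.Bytes()[:c.Length],
			Offset: c.Offset,
		}
		used = append(used, buf)
	}

	release = func() {
		for _, buf := range used {
			bytesBufferPool.Put(buf)
		}
	}
	return out, release
}

func main() {
	ranges, release := optimizedRanges([]chunk{{Offset: 0, Length: 16}, {Offset: 64, Length: 32}})
	defer release()

	for _, r := range ranges {
		fmt.Println(r.Offset, len(r.Data))
	}
}
```

The caller must not touch any `Data` slice after calling `release`, since the backing buffer may be handed to another request.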

Contributor Author

Nice, that's definitely cleaner. I played around with bytes.Buffer originally, but I was returning the byte slice too soon, so I wasn't able to track when it grew. Thanks for the tip.

Here are the new benchmark results:

goos: darwin
goarch: arm64
pkg: github.com/grafana/loki/v3/pkg/logql/bench
cpu: Apple M3 Max
                                                                                                                                                                          │ before.txt │             after.txt             │
                                                                                                                                                                          │   sec/op   │   sec/op    vs base               │
LogQL/query=sum_by_(cluster,_namespace)_(count_over_time({service_name="grafana",_env="prod",_region="us-west-2"}_|=_"level"_[1m0s]))/kind=metric/store=dataobj-engine-14   10.43 ± 5%   10.18 ± 3%  -2.48% (p=0.019 n=10)

                                                                                                                                                                          │     before.txt     │               after.txt                │
                                                                                                                                                                          │ kilobytesProcessed │ kilobytesProcessed  vs base            │
LogQL/query=sum_by_(cluster,_namespace)_(count_over_time({service_name="grafana",_env="prod",_region="us-west-2"}_|=_"level"_[1m0s]))/kind=metric/store=dataobj-engine-14          52.87k ± 0%          52.87k ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal

                                                                                                                                                                          │   before.txt   │             after.txt              │
                                                                                                                                                                          │ linesProcessed │ linesProcessed  vs base            │
LogQL/query=sum_by_(cluster,_namespace)_(count_over_time({service_name="grafana",_env="prod",_region="us-west-2"}_|=_"level"_[1m0s]))/kind=metric/store=dataobj-engine-14      6.233M ± 0%      6.233M ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal

                                                                                                                                                                          │   before.txt    │              after.txt              │
                                                                                                                                                                          │ postFilterLines │ postFilterLines  vs base            │
LogQL/query=sum_by_(cluster,_namespace)_(count_over_time({service_name="grafana",_env="prod",_region="us-west-2"}_|=_"level"_[1m0s]))/kind=metric/store=dataobj-engine-14        0.000 ± 0%        0.000 ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal

                                                                                                                                                                          │  before.txt   │               after.txt               │
                                                                                                                                                                          │     B/op      │     B/op       vs base                │
LogQL/query=sum_by_(cluster,_namespace)_(count_over_time({service_name="grafana",_env="prod",_region="us-west-2"}_|=_"level"_[1m0s]))/kind=metric/store=dataobj-engine-14   1022.1Mi ± 6%   730.1Mi ± 18%  -28.57% (p=0.000 n=10)

                                                                                                                                                                          │ before.txt  │           after.txt           │
                                                                                                                                                                          │  allocs/op  │  allocs/op   vs base          │
LogQL/query=sum_by_(cluster,_namespace)_(count_over_time({service_name="grafana",_env="prod",_region="us-west-2"}_|=_"level"_[1m0s]))/kind=metric/store=dataobj-engine-14   2.416M ± 0%   2.415M ± 0%  ~ (p=0.796 n=10)

Member

@rfratto rfratto left a comment

Looks good!

i += 2 // Skip over the range we just inserted.
}

usedBuffers := []*bytes.Buffer{}
Member

nit: you could preallocate this to the target capacity (or length) since it'll have the same number of elements as out
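
A minimal sketch of that preallocation, assuming the number of buffers is known up front (here `n` stands in for `len(out)`, and `collectBuffers` is a hypothetical helper, not code from the PR):

```go
package main

import (
	"bytes"
	"fmt"
)

// collectBuffers gathers n buffers into a slice whose capacity is reserved
// up front, so append never has to reallocate the backing array.
func collectBuffers(n int) []*bytes.Buffer {
	usedBuffers := make([]*bytes.Buffer, 0, n)
	for i := 0; i < n; i++ {
		usedBuffers = append(usedBuffers, new(bytes.Buffer))
	}
	return usedBuffers
}

func main() {
	bufs := collectBuffers(8)
	fmt.Println(len(bufs), cap(bufs))
}
```

Preallocating by length (`make([]*bytes.Buffer, n)`) and assigning by index would work equally well; the capacity form keeps the existing `append` calls unchanged.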

Contributor

github-actions bot commented Oct 1, 2025

😢 zizmor failed with exit code 14.

error[cache-poisoning]: runtime artifacts potentially vulnerable to a cache poisoning attack
  --> ./.github/workflows/images.yml:44:7
   |
44 |       "uses": "actions/setup-node@v4"
   |       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cache enabled by default here
...
69 |       "uses": "docker/build-push-action@14487ce63c7a62a4a324b0bfb37086795e31c6c1"
   |       --------------------------------------------------------------------------- runtime artifacts usually published here
   |
   = note: audit confidence → Low

error[cache-poisoning]: runtime artifacts potentially vulnerable to a cache poisoning attack
   --> ./.github/workflows/images.yml:167:7
    |
167 |       "uses": "actions/setup-node@v4"
    |       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cache enabled by default here
...
192 |       "uses": "docker/build-push-action@14487ce63c7a62a4a324b0bfb37086795e31c6c1"
    |       --------------------------------------------------------------------------- runtime artifacts usually published here
    |
    = note: audit confidence → Low

error[cache-poisoning]: runtime artifacts potentially vulnerable to a cache poisoning attack
   --> ./.github/workflows/images.yml:290:7
    |
290 |       "uses": "actions/setup-node@v4"
    |       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cache enabled by default here
...
315 |       "uses": "docker/build-push-action@14487ce63c7a62a4a324b0bfb37086795e31c6c1"
    |       --------------------------------------------------------------------------- runtime artifacts usually published here
    |
    = note: audit confidence → Low

error[cache-poisoning]: runtime artifacts potentially vulnerable to a cache poisoning attack
   --> ./.github/workflows/images.yml:413:7
    |
413 |       "uses": "actions/setup-node@v4"
    |       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cache enabled by default here
...
438 |       "uses": "docker/build-push-action@14487ce63c7a62a4a324b0bfb37086795e31c6c1"
    |       --------------------------------------------------------------------------- runtime artifacts usually published here
    |
    = note: audit confidence → Low

error[cache-poisoning]: runtime artifacts potentially vulnerable to a cache poisoning attack
   --> ./.github/workflows/minor-release-pr.yml:220:7
    |
220 |       uses: "actions/setup-node@v4"
    |       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cache enabled by default here
...
248 |       uses: "docker/build-push-action@14487ce63c7a62a4a324b0bfb37086795e31c6c1"
    |       ------------------------------------------------------------------------- runtime artifacts usually published here
    |
    = note: audit confidence → Low

error[cache-poisoning]: runtime artifacts potentially vulnerable to a cache poisoning attack
   --> ./.github/workflows/minor-release-pr.yml:293:7
    |
293 |       uses: "actions/setup-node@v4"
    |       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cache enabled by default here
...
321 |       uses: "docker/build-push-action@14487ce63c7a62a4a324b0bfb37086795e31c6c1"
    |       ------------------------------------------------------------------------- runtime artifacts usually published here
    |
    = note: audit confidence → Low

error[cache-poisoning]: runtime artifacts potentially vulnerable to a cache poisoning attack
   --> ./.github/workflows/minor-release-pr.yml:366:7
    |
366 |       uses: "actions/setup-node@v4"
    |       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cache enabled by default here
...
394 |       uses: "docker/build-push-action@14487ce63c7a62a4a324b0bfb37086795e31c6c1"
    |       ------------------------------------------------------------------------- runtime artifacts usually published here
    |
    = note: audit confidence → Low

error[cache-poisoning]: runtime artifacts potentially vulnerable to a cache poisoning attack
   --> ./.github/workflows/minor-release-pr.yml:445:7
    |
445 |       uses: "actions/setup-node@v4"
    |       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cache enabled by default here
...
473 |       uses: "docker/build-push-action@14487ce63c7a62a4a324b0bfb37086795e31c6c1"
    |       ------------------------------------------------------------------------- runtime artifacts usually published here
    |
    = note: audit confidence → Low

error[cache-poisoning]: runtime artifacts potentially vulnerable to a cache poisoning attack
   --> ./.github/workflows/minor-release-pr.yml:518:7
    |
518 |       uses: "actions/setup-node@v4"
    |       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cache enabled by default here
...
546 |       uses: "docker/build-push-action@14487ce63c7a62a4a324b0bfb37086795e31c6c1"
    |       ------------------------------------------------------------------------- runtime artifacts usually published here
    |
    = note: audit confidence → Low

error[cache-poisoning]: runtime artifacts potentially vulnerable to a cache poisoning attack
   --> ./.github/workflows/minor-release-pr.yml:597:7
    |
597 |       uses: "actions/setup-node@v4"
    |       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cache enabled by default here
...
625 |       uses: "docker/build-push-action@14487ce63c7a62a4a324b0bfb37086795e31c6c1"
    |       ------------------------------------------------------------------------- runtime artifacts usually published here
    |
    = note: audit confidence → Low

error[cache-poisoning]: runtime artifacts potentially vulnerable to a cache poisoning attack
   --> ./.github/workflows/minor-release-pr.yml:676:7
    |
676 |       uses: "actions/setup-node@v4"
    |       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cache enabled by default here
...
704 |       uses: "docker/build-push-action@14487ce63c7a62a4a324b0bfb37086795e31c6c1"
    |       ------------------------------------------------------------------------- runtime artifacts usually published here
    |
    = note: audit confidence → Low

error[cache-poisoning]: runtime artifacts potentially vulnerable to a cache poisoning attack
   --> ./.github/workflows/minor-release-pr.yml:848:7
    |
848 |       uses: "actions/setup-node@v4"
    |       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cache enabled by default here
...
876 |       uses: "docker/build-push-action@14487ce63c7a62a4a324b0bfb37086795e31c6c1"
    |       ------------------------------------------------------------------------- runtime artifacts usually published here
    |
    = note: audit confidence → Low

error[cache-poisoning]: runtime artifacts potentially vulnerable to a cache poisoning attack
   --> ./.github/workflows/minor-release-pr.yml:927:7
    |
927 |       uses: "actions/setup-node@v4"
    |       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cache enabled by default here
...
955 |       uses: "docker/build-push-action@14487ce63c7a62a4a324b0bfb37086795e31c6c1"
    |       ------------------------------------------------------------------------- runtime artifacts usually published here
    |
    = note: audit confidence → Low

error[cache-poisoning]: runtime artifacts potentially vulnerable to a cache poisoning attack
    --> ./.github/workflows/patch-release-pr.yml:62:7
     |
  62 |         uses: "actions/setup-node@v4"
     |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cache enabled by default here
...
1093 | / "on":
1094 | |   push:
1095 | |     branches:
1096 | |     - "release-[0-9]+.[0-9]+.x"
     | |_______________________________- generally used when publishing artifacts generated at runtime
     |
     = note: audit confidence → Low

error[cache-poisoning]: runtime artifacts potentially vulnerable to a cache poisoning attack
    --> ./.github/workflows/patch-release-pr.yml:220:7
     |
 220 |         uses: "actions/setup-node@v4"
     |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cache enabled by default here
...
1093 | / "on":
1094 | |   push:
1095 | |     branches:
1096 | |     - "release-[0-9]+.[0-9]+.x"
     | |_______________________________- generally used when publishing artifacts generated at runtime
     |
     = note: audit confidence → Low

error[cache-poisoning]: runtime artifacts potentially vulnerable to a cache poisoning attack
    --> ./.github/workflows/patch-release-pr.yml:293:7
     |
 293 |         uses: "actions/setup-node@v4"
     |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cache enabled by default here
...
1093 | / "on":
1094 | |   push:
1095 | |     branches:
1096 | |     - "release-[0-9]+.[0-9]+.x"
     | |_______________________________- generally used when publishing artifacts generated at runtime
     |
     = note: audit confidence → Low

error[cache-poisoning]: runtime artifacts potentially vulnerable to a cache poisoning attack
    --> ./.github/workflows/patch-release-pr.yml:366:7
     |
 366 |         uses: "actions/setup-node@v4"
     |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cache enabled by default here
...
1093 | / "on":
1094 | |   push:
1095 | |     branches:
1096 | |     - "release-[0-9]+.[0-9]+.x"
     | |_______________________________- generally used when publishing artifacts generated at runtime
     |
     = note: audit confidence → Low

error[cache-poisoning]: runtime artifacts potentially vulnerable to a cache poisoning attack
    --> ./.github/workflows/patch-release-pr.yml:445:7
     |
 445 |         uses: "actions/setup-node@v4"
     |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cache enabled by default here
...
1093 | / "on":
1094 | |   push:
1095 | |     branches:
1096 | |     - "release-[0-9]+.[0-9]+.x"
     | |_______________________________- generally used when publishing artifacts generated at runtime
     |
     = note: audit confidence → Low

error[cache-poisoning]: runtime artifacts potentially vulnerable to a cache poisoning attack
    --> ./.github/workflows/patch-release-pr.yml:518:7
     |
 518 |         uses: "actions/setup-node@v4"
     |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cache enabled by default here
...
1093 | / "on":
1094 | |   push:
1095 | |     branches:
1096 | |     - "release-[0-9]+.[0-9]+.x"
     | |_______________________________- generally used when publishing artifacts generated at runtime
     |
     = note: audit confidence → Low

error[cache-poisoning]: runtime artifacts potentially vulnerable to a cache poisoning attack
    --> ./.github/workflows/patch-release-pr.yml:597:7
     |
 597 |         uses: "actions/setup-node@v4"
     |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cache enabled by default here
...
1093 | / "on":
1094 | |   push:
1095 | |     branches:
1096 | |     - "release-[0-9]+.[0-9]+.x"
     | |_______________________________- generally used when publishing artifacts generated at runtime
     |
     = note: audit confidence → Low

error[cache-poisoning]: runtime artifacts potentially vulnerable to a cache poisoning attack
    --> ./.github/workflows/patch-release-pr.yml:676:7
     |
 676 |         uses: "actions/setup-node@v4"
     |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cache enabled by default here
...
1093 | / "on":
1094 | |   push:
1095 | |     branches:
1096 | |     - "release-[0-9]+.[0-9]+.x"
     | |_______________________________- generally used when publishing artifacts generated at runtime
     |
     = note: audit confidence → Low

error[cache-poisoning]: runtime artifacts potentially vulnerable to a cache poisoning attack
    --> ./.github/workflows/patch-release-pr.yml:753:7
     |
 753 |         uses: "actions/setup-node@v4"
     |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cache enabled by default here
...
1093 | / "on":
1094 | |   push:
1095 | |     branches:
1096 | |     - "release-[0-9]+.[0-9]+.x"
     | |_______________________________- generally used when publishing artifacts generated at runtime
     |
     = note: audit confidence → Low

error[cache-poisoning]: runtime artifacts potentially vulnerable to a cache poisoning attack
    --> ./.github/workflows/patch-release-pr.yml:848:7
     |
 848 |         uses: "actions/setup-node@v4"
     |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cache enabled by default here
...
1093 | / "on":
1094 | |   push:
1095 | |     branches:
1096 | |     - "release-[0-9]+.[0-9]+.x"
     | |_______________________________- generally used when publishing artifacts generated at runtime
     |
     = note: audit confidence → Low

error[cache-poisoning]: runtime artifacts potentially vulnerable to a cache poisoning attack
    --> ./.github/workflows/patch-release-pr.yml:927:7
     |
 927 |         uses: "actions/setup-node@v4"
     |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cache enabled by default here
...
1093 | / "on":
1094 | |   push:
1095 | |     branches:
1096 | |     - "release-[0-9]+.[0-9]+.x"
     | |_______________________________- generally used when publishing artifacts generated at runtime
     |
     = note: audit confidence → Low

error[cache-poisoning]: runtime artifacts potentially vulnerable to a cache poisoning attack
    --> ./.github/workflows/patch-release-pr.yml:1003:7
     |
1003 |         uses: "actions/setup-node@v4"
     |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cache enabled by default here
...
1093 | / "on":
1094 | |   push:
1095 | |     branches:
1096 | |     - "release-[0-9]+.[0-9]+.x"
     | |_______________________________- generally used when publishing artifacts generated at runtime
     |
     = note: audit confidence → Low

error[cache-poisoning]: runtime artifacts potentially vulnerable to a cache poisoning attack
   --> ./.github/workflows/release.yml:44:7
    |
 44 |         uses: "actions/setup-node@v4"
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cache enabled by default here
...
435 | / "on":
436 | |   push:
437 | |     branches:
438 | |     - "release-[0-9]+.[0-9]+.x"
439 | |     - "k[0-9]+"
440 | |     - "main"
    | |____________- generally used when publishing artifacts generated at runtime
    |
    = note: audit confidence → Low

331 findings (15 ignored, 290 suppressed): 0 informational, 0 low, 0 medium, 26 high

@benclive benclive enabled auto-merge (squash) October 1, 2025 12:09
@benclive benclive merged commit c5a0a38 into main Oct 1, 2025
62 of 63 checks passed
@benclive benclive deleted the benclive/use-compressed-size-in-page-fetch branch October 1, 2025 12:13