MQE: add support for quantile_over_time #10629

Merged (9 commits) on Feb 18, 2025

charleskorn

What this PR does

This PR adds support in MQE for quantile_over_time.
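
For readers unfamiliar with the function: quantile_over_time(φ, v[range]) returns, per series, the φ-quantile of the float samples inside each range window. The sketch below is a standalone illustration of that calculation, not the code added by this PR. It assumes linear interpolation between the two nearest ranks, with NaN for an empty window and -Inf / +Inf for φ below 0 or above 1, which is my understanding of Prometheus' behaviour. The name `quantile` and the sample values are made up for the example.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// quantile returns the q-quantile of values using linear interpolation
// between the two nearest ranks. It sorts a copy, so the caller's slice is
// left untouched. (Illustrative helper, not MQE's implementation.)
func quantile(q float64, values []float64) float64 {
	if len(values) == 0 || math.IsNaN(q) {
		return math.NaN()
	}
	if q < 0 {
		return math.Inf(-1)
	}
	if q > 1 {
		return math.Inf(+1)
	}

	sorted := append([]float64(nil), values...)
	sort.Float64s(sorted)

	n := float64(len(sorted))
	rank := q * (n - 1) // fractional index into the sorted samples
	lower := math.Max(0, math.Floor(rank))
	upper := math.Min(n-1, lower+1)
	weight := rank - math.Floor(rank)

	return sorted[int(lower)]*(1-weight) + sorted[int(upper)]*weight
}

func main() {
	// Hypothetical float samples that fall inside one [1m] window.
	window := []float64{4, 1, 3, 2}
	fmt.Println(quantile(0.5, window)) // halfway between the 2nd and 3rd sorted samples
}
```

A range query repeats this once per step and per series, which is why the benchmarks in this PR vary both the number of series and the number of steps.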

Which issue(s) this PR fixes or relates to

#10067

Checklist

  • Tests updated.
  • [n/a] Documentation added.
  • [covered by Mimir Query Engine #10067] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • [n/a] about-versioning.md updated with experimental features.

@charleskorn charleskorn marked this pull request as ready for review February 12, 2025 05:57
@charleskorn charleskorn requested a review from a team as a code owner February 12, 2025 05:57
pkg/streamingpromql/engine_test.go (resolved)
@@ -179,7 +179,13 @@ func (m *FunctionOverRangeVector) NextSeries(ctx context.Context) (types.Instant

 func (m *FunctionOverRangeVector) emitAnnotation(generator types.AnnotationGenerator) {
 	metricName := m.metricNames.GetMetricNameForSeries(m.currentSeriesIndex)
-	m.Annotations.Add(generator(metricName, m.Inner.ExpressionPosition()))
+	pos := m.Inner.ExpressionPosition()

Perhaps types.EmitAnnotationFunc could take a parameter for the argument index?

The difficulty with this is that FunctionOverRangeVector doesn't know the original order of the arguments (inner operator first, or scalars first?), so I think what's here is the least confusing option.
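
To make the trade-off in this thread concrete, here is a small standalone sketch of an operator that keeps the inner expression's position and the scalar arguments' positions separately, and picks one when emitting an annotation. Everything in it is hypothetical: `Position`, `rangeFunction`, the `useFirstScalarArgPos` flag and the simplified `AnnotationGenerator` signature are stand-ins invented for illustration, and the merged code may well look different.

```go
package main

import "fmt"

// Position is a stand-in for an expression's character range in the query text.
type Position struct{ Start, End int }

// AnnotationGenerator is a simplified stand-in for the generator type in the
// diff above: it builds an annotation from a metric name and a position.
type AnnotationGenerator func(metricName string, pos Position) error

// rangeFunction is a hypothetical operator. It stores the inner (range vector)
// expression and the scalar arguments separately, so it cannot recover the
// order in which they appeared in the original query.
type rangeFunction struct {
	innerPos     Position
	scalarArgPos []Position
	// useFirstScalarArgPos is a hypothetical per-function flag: when set,
	// annotations point at the first scalar argument instead of the inner
	// expression.
	useFirstScalarArgPos bool
}

// annotationPosition chooses which part of the query an annotation refers to.
func (f *rangeFunction) annotationPosition() Position {
	if f.useFirstScalarArgPos && len(f.scalarArgPos) > 0 {
		return f.scalarArgPos[0]
	}
	return f.innerPos
}

func (f *rangeFunction) emitAnnotation(metricName string, gen AnnotationGenerator) error {
	return gen(metricName, f.annotationPosition())
}

func main() {
	// Illustrative offsets into: quantile_over_time(0.3, a_1[1m])
	f := &rangeFunction{
		innerPos:             Position{Start: 24, End: 31},
		scalarArgPos:         []Position{{Start: 19, End: 22}},
		useFirstScalarArgPos: true,
	}
	warn := func(metricName string, pos Position) error {
		return fmt.Errorf("warning for %q at %d:%d", metricName, pos.Start, pos.End)
	}
	fmt.Println(f.emitAnnotation("a_1", warn))
}
```

Passing an argument index to the generator instead, as suggested above, would need the operator to map that index back to either the inner expression or a scalar, which is exactly the ordering information it does not have.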

pkg/streamingpromql/operators/functions/range_vectors.go (outdated, resolved)
@charleskorn charleskorn enabled auto-merge (squash) February 14, 2025 06:21
@jhesketh jhesketh left a comment

Any chance of a benchmark for this one please? I'd also like to see the effect on other range-vector queries against main.

pkg/streamingpromql/engine_test.go (outdated, resolved)
@charleskorn

> Any chance of a benchmark for this one please? I'd also like to see the effect on other range-vector queries against main.

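tl;dr: MQE is faster than Prometheus' engine for quantile_over_time in every case benchmarked (4% to 57% lower sec/op, 37% by geomean) and allocates far less memory (87% fewer bytes per op by geomean), and there seems to be little impact on other functions over range vectors.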

quantile_over_time: MQE vs Prometheus
goos: darwin
goarch: arm64
pkg: github.com/grafana/mimir/pkg/streamingpromql/benchmarks
cpu: Apple M1 Pro
                                                                          │ Prometheus  │               Mimir                │
                                                                          │   sec/op    │   sec/op     vs base               │
Query/quantile_over_time(0.3,_a_1[1m]),_instant_query-10                    140.8µ ± 2%   135.0µ ± 1%   -4.14% (p=0.002 n=6)
Query/quantile_over_time(0.3,_a_1[1m]),_range_query_with_100_steps-10       182.6µ ± 2%   156.3µ ± 2%  -14.39% (p=0.002 n=6)
Query/quantile_over_time(0.3,_a_1[1m]),_range_query_with_1000_steps-10      540.3µ ± 4%   264.9µ ± 1%  -50.97% (p=0.002 n=6)
Query/quantile_over_time(0.3,_a_100[1m]),_instant_query-10                  514.9µ ± 2%   411.7µ ± 2%  -20.06% (p=0.002 n=6)
Query/quantile_over_time(0.3,_a_100[1m]),_range_query_with_100_steps-10     3.810m ± 1%   1.989m ± 1%  -47.79% (p=0.002 n=6)
Query/quantile_over_time(0.3,_a_100[1m]),_range_query_with_1000_steps-10    27.36m ± 1%   11.80m ± 1%  -56.86% (p=0.002 n=6)
Query/quantile_over_time(0.3,_a_2000[1m]),_instant_query-10                 5.707m ± 1%   4.433m ± 3%  -22.33% (p=0.002 n=6)
Query/quantile_over_time(0.3,_a_2000[1m]),_range_query_with_100_steps-10    63.39m ± 0%   33.72m ± 1%  -46.81% (p=0.002 n=6)
Query/quantile_over_time(0.3,_a_2000[1m]),_range_query_with_1000_steps-10   453.5m ± 1%   221.8m ± 0%  -51.09% (p=0.002 n=6)
geomean                                                                     3.675m        2.299m       -37.46%

                                                                          │   Prometheus   │                Mimir                │
                                                                          │      B/op      │     B/op      vs base               │
Query/quantile_over_time(0.3,_a_1[1m]),_instant_query-10                      24.64Ki ± 0%   20.96Ki ± 0%  -14.91% (p=0.002 n=6)
Query/quantile_over_time(0.3,_a_1[1m]),_range_query_with_100_steps-10         58.69Ki ± 0%   22.06Ki ± 0%  -62.42% (p=0.002 n=6)
Query/quantile_over_time(0.3,_a_1[1m]),_range_query_with_1000_steps-10       360.49Ki ± 0%   25.96Ki ± 0%  -92.80% (p=0.002 n=6)
Query/quantile_over_time(0.3,_a_100[1m]),_instant_query-10                    165.2Ki ± 0%   120.1Ki ± 0%  -27.26% (p=0.002 n=6)
Query/quantile_over_time(0.3,_a_100[1m]),_range_query_with_100_steps-10      3323.0Ki ± 0%   212.6Ki ± 0%  -93.60% (p=0.002 n=6)
Query/quantile_over_time(0.3,_a_100[1m]),_range_query_with_1000_steps-10    32041.6Ki ± 0%   595.0Ki ± 0%  -98.14% (p=0.002 n=6)
Query/quantile_over_time(0.3,_a_2000[1m]),_instant_query-10                   2.799Mi ± 0%   1.991Mi ± 0%  -28.88% (p=0.002 n=6)
Query/quantile_over_time(0.3,_a_2000[1m]),_range_query_with_100_steps-10     66.827Mi ± 0%   3.889Mi ± 1%  -94.18% (p=0.002 n=6)
Query/quantile_over_time(0.3,_a_2000[1m]),_range_query_with_1000_steps-10    636.69Mi ± 0%   11.12Mi ± 1%  -98.25% (p=0.002 n=6)
geomean                                                                       2.141Mi        294.9Ki       -86.55%

                                                                          │  Prometheus   │               Mimir                │
                                                                          │   allocs/op   │  allocs/op   vs base               │
Query/quantile_over_time(0.3,_a_1[1m]),_instant_query-10                       448.0 ± 0%    367.0 ± 0%  -18.08% (p=0.002 n=6)
Query/quantile_over_time(0.3,_a_1[1m]),_range_query_with_100_steps-10          663.0 ± 0%    377.0 ± 0%  -43.14% (p=0.002 n=6)
Query/quantile_over_time(0.3,_a_1[1m]),_range_query_with_1000_steps-10        2491.0 ± 0%    400.0 ± 0%  -83.94% (p=0.002 n=6)
Query/quantile_over_time(0.3,_a_100[1m]),_instant_query-10                    2.172k ± 0%   1.884k ± 0%  -13.26% (p=0.002 n=6)
Query/quantile_over_time(0.3,_a_100[1m]),_range_query_with_100_steps-10      23.010k ± 0%   2.695k ± 0%  -88.29% (p=0.002 n=6)
Query/quantile_over_time(0.3,_a_100[1m]),_range_query_with_1000_steps-10    205.518k ± 0%   5.016k ± 0%  -97.56% (p=0.002 n=6)
Query/quantile_over_time(0.3,_a_2000[1m]),_instant_query-10                   34.89k ± 0%   30.78k ± 0%  -11.78% (p=0.002 n=6)
Query/quantile_over_time(0.3,_a_2000[1m]),_range_query_with_100_steps-10     454.60k ± 0%   46.93k ± 0%  -89.68% (p=0.002 n=6)
Query/quantile_over_time(0.3,_a_2000[1m]),_range_query_with_1000_steps-10   4101.18k ± 0%   93.00k ± 0%  -97.73% (p=0.002 n=6)
geomean                                                                       19.92k        3.858k       -80.63%

                                                                          │  Prometheus  │               Mimir                │
                                                                          │      B       │      B        vs base              │
Query/quantile_over_time(0.3,_a_1[1m]),_instant_query-10                    66.53Mi ± 0%   66.02Mi ± 2%       ~ (p=0.087 n=6)
Query/quantile_over_time(0.3,_a_1[1m]),_range_query_with_100_steps-10       63.64Mi ± 1%   65.41Mi ± 1%  +2.77% (p=0.002 n=6)
Query/quantile_over_time(0.3,_a_1[1m]),_range_query_with_1000_steps-10      62.57Mi ± 1%   64.51Mi ± 2%  +3.10% (p=0.002 n=6)
Query/quantile_over_time(0.3,_a_100[1m]),_instant_query-10                  61.46Mi ± 1%   61.68Mi ± 2%       ~ (p=0.851 n=6)
Query/quantile_over_time(0.3,_a_100[1m]),_range_query_with_100_steps-10     63.55Mi ± 1%   61.12Mi ± 1%  -3.82% (p=0.002 n=6)
Query/quantile_over_time(0.3,_a_100[1m]),_range_query_with_1000_steps-10    67.44Mi ± 1%   63.07Mi ± 1%  -6.48% (p=0.002 n=6)
Query/quantile_over_time(0.3,_a_2000[1m]),_instant_query-10                 63.33Mi ± 2%   63.78Mi ± 1%       ~ (p=0.132 n=6)
Query/quantile_over_time(0.3,_a_2000[1m]),_range_query_with_100_steps-10    71.98Mi ± 1%   70.32Mi ± 1%  -2.30% (p=0.004 n=6)
Query/quantile_over_time(0.3,_a_2000[1m]),_range_query_with_1000_steps-10   129.1Mi ± 1%   122.6Mi ± 1%  -5.09% (p=0.002 n=6)
geomean                                                                     70.14Mi        69.21Mi       -1.33%
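
As a sanity check on how to read the "vs base" column and the geomean row, the sketch below recomputes them for the sec/op table from the printed medians. It is plain arithmetic rather than a reimplementation of benchstat; the helper names `percentChange` and `geomean` are made up for this example, and the inputs are the table's values converted to microseconds.

```go
package main

import (
	"fmt"
	"math"
)

// Median sec/op from the first table above, converted to microseconds.
var (
	prometheusSecOp = []float64{140.8, 182.6, 540.3, 514.9, 3810, 27360, 5707, 63390, 453500}
	mimirSecOp      = []float64{135.0, 156.3, 264.9, 411.7, 1989, 11800, 4433, 33720, 221800}
)

// percentChange returns the relative change from base to after, in percent.
// Negative values mean "after" is smaller (faster, or fewer bytes).
func percentChange(base, after float64) float64 {
	return (after/base - 1) * 100
}

// geomean returns the geometric mean of values, computed via logarithms so
// that very large and very small values do not overflow the product.
func geomean(values []float64) float64 {
	sumLogs := 0.0
	for _, v := range values {
		sumLogs += math.Log(v)
	}
	return math.Exp(sumLogs / float64(len(values)))
}

func main() {
	for i := range prometheusSecOp {
		fmt.Printf("row %d: %+.2f%%\n", i+1, percentChange(prometheusSecOp[i], mimirSecOp[i]))
	}
	fmt.Printf("geomean: %+.2f%%\n",
		percentChange(geomean(prometheusSecOp), geomean(mimirSecOp)))
}
```

The results should land close to the table's figures; small differences in the last digit are expected because the printed medians are rounded. The geometric mean weights each query equally in relative terms, so the slow 2000-series range queries do not dominate the summary.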

Impact on other functions over range vectors, using rate as a representative example

Given the small absolute differences and the movement both up and down, I'm putting these differences down to noise rather than any systematic change.

                                                                      │ rate-before.txt │          rate-after.txt           │
                                                                      │     sec/op      │   sec/op     vs base              │
Query/rate(a_1[1m]),_instant_query/engine=Mimir-10                          134.2µ ± 3%   138.9µ ± 4%       ~ (p=0.065 n=6)
Query/rate(a_1[1m]),_range_query_with_100_steps/engine=Mimir-10             151.5µ ± 1%   150.6µ ± 2%       ~ (p=0.310 n=6)
Query/rate(a_1[1m]),_range_query_with_1000_steps/engine=Mimir-10            213.2µ ± 2%   226.1µ ± 6%  +6.06% (p=0.002 n=6)
Query/rate(a_100[1m]),_instant_query/engine=Mimir-10                        403.8µ ± 1%   407.5µ ± 3%       ~ (p=0.240 n=6)
Query/rate(a_100[1m]),_range_query_with_100_steps/engine=Mimir-10           1.532m ± 1%   1.538m ± 2%       ~ (p=0.310 n=6)
Query/rate(a_100[1m]),_range_query_with_1000_steps/engine=Mimir-10          6.999m ± 0%   7.086m ± 2%  +1.25% (p=0.002 n=6)
Query/rate(a_2000[1m]),_instant_query/engine=Mimir-10                       4.339m ± 1%   4.307m ± 2%       ~ (p=0.937 n=6)
Query/rate(a_2000[1m]),_range_query_with_100_steps/engine=Mimir-10          23.97m ± 1%   24.00m ± 2%       ~ (p=0.310 n=6)
Query/rate(a_2000[1m]),_range_query_with_1000_steps/engine=Mimir-10         128.3m ± 1%   128.7m ± 1%       ~ (p=0.065 n=6)
Query/rate(a_1[1m]),_range_query_with_10000_steps/engine=Mimir-10           894.0µ ± 1%   901.3µ ± 1%       ~ (p=0.093 n=6)
Query/rate(a_100[1m]),_range_query_with_10000_steps/engine=Mimir-10         65.19m ± 0%   65.22m ± 1%       ~ (p=0.818 n=6)
Query/rate(a_2000[1m]),_range_query_with_10000_steps/engine=Mimir-10         1.300 ± 2%    1.298 ± 1%       ~ (p=0.394 n=6)
Query/rate(nh_1[1m]),_instant_query/engine=Mimir-10                         139.1µ ± 2%   143.8µ ± 1%  +3.42% (p=0.002 n=6)
Query/rate(nh_1[1m]),_range_query_with_100_steps/engine=Mimir-10            262.5µ ± 2%   263.5µ ± 2%       ~ (p=0.818 n=6)
Query/rate(nh_1[1m]),_range_query_with_1000_steps/engine=Mimir-10           688.4µ ± 1%   660.8µ ± 2%  -4.00% (p=0.002 n=6)
Query/rate(nh_100[1m]),_instant_query/engine=Mimir-10                       582.7µ ± 5%   583.6µ ± 1%       ~ (p=0.699 n=6)
Query/rate(nh_100[1m]),_range_query_with_100_steps/engine=Mimir-10          11.39m ± 0%   11.39m ± 0%       ~ (p=1.000 n=6)
Query/rate(nh_100[1m]),_range_query_with_1000_steps/engine=Mimir-10         44.34m ± 1%   44.14m ± 3%       ~ (p=0.093 n=6)
Query/rate(nh_2000[1m]),_instant_query/engine=Mimir-10                      6.494m ± 1%   6.495m ± 1%       ~ (p=0.818 n=6)
Query/rate(nh_2000[1m]),_range_query_with_100_steps/engine=Mimir-10         210.6m ± 1%   210.7m ± 1%       ~ (p=0.394 n=6)
Query/rate(nh_2000[1m]),_range_query_with_1000_steps/engine=Mimir-10        850.4m ± 1%   852.1m ± 1%       ~ (p=0.589 n=6)
Query/rate(nh_1[1m]),_range_query_with_10000_steps/engine=Mimir-10          4.790m ± 1%   4.843m ± 1%  +1.10% (p=0.026 n=6)
Query/rate(nh_100[1m]),_range_query_with_10000_steps/engine=Mimir-10        417.1m ± 3%   417.2m ± 1%       ~ (p=0.589 n=6)
Query/rate(nh_2000[1m]),_range_query_with_10000_steps/engine=Mimir-10        8.644 ± 1%    8.699 ± 2%       ~ (p=0.065 n=6)
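
A note on reading the p-values in these tables, assuming benchstat's default Mann-Whitney U test was used: with six samples on each side, the smallest two-sided p-value an exact test can produce is 2 divided by the number of ways to choose 6 of the 12 pooled samples. The sketch below works that out; `binomial` and `minTwoSidedP` are helper names made up for this example.

```go
package main

import "fmt"

// binomial returns C(n, k), built up incrementally so that every intermediate
// value is itself a whole number.
func binomial(n, k int) float64 {
	result := 1.0
	for i := 1; i <= k; i++ {
		result = result * float64(n-k+i) / float64(i)
	}
	return result
}

// minTwoSidedP returns the smallest two-sided p-value an exact rank-based test
// can report for sample sizes n and m with no ties: only the two orderings in
// which the samples are completely separated are that extreme.
func minTwoSidedP(n, m int) float64 {
	return 2 / binomial(n+m, n)
}

func main() {
	fmt.Printf("C(12, 6) = %.0f\n", binomial(12, 6))
	fmt.Printf("minimum p for n=6 vs n=6: %.4f\n", minTwoSidedP(6, 6))
}
```

Under that assumption, every row showing p=0.002 is at the floor for n=6: all six "after" runs fell on the same side of all six "before" runs. The p-value therefore says the ordering was consistent, not how large the shift was, so the size and direction of the delta carry the useful information here.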

@charleskorn charleskorn force-pushed the charleskorn/mqe-quantile-over-time branch from 5372b3d to 1c685a4 Compare February 18, 2025 03:40
@charleskorn charleskorn merged commit c49a041 into main Feb 18, 2025
28 checks passed
@charleskorn charleskorn deleted the charleskorn/mqe-quantile-over-time branch February 18, 2025 08:18