MQE Absent function #10523

Open
wants to merge 36 commits into base: main

Conversation

lamida
Contributor

@lamida lamida commented Jan 27, 2025

What this PR does

Implements the absent function by adding a new operator, Absent. This also modifies InstantVectorFunctionOperatorFactory to pass the parser.Expressions object, which is needed to derive the output labels from the argument of absent.

Which issue(s) this PR fixes or relates to

Fixes #

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

@lamida lamida marked this pull request as ready for review January 27, 2025 14:57
@lamida lamida requested a review from a team as a code owner January 27, 2025 14:57
@lamida lamida requested a review from charleskorn January 27, 2025 17:31
innerExpr parser.Expr
inner types.InstantVectorOperator
expressionPosition posrange.PositionRange
absentCount int
Contributor

If I'm understanding your intention correctly, would innerSeriesCount be a clearer name?

Contributor Author

This counts how often the series is absent. But let me double-check it against the feedback in #10523 (comment) to ensure correctness.

Contributor

I don't think this is correct - I think we're missing a few test cases that might flush out the correct behaviour:

  • instant query over a selector that selects multiple series, each with a point at the query time
  • instant query over an expression that could produce multiple series, but doesn't have any points (eg. absent(metric_with_many_series > Inf))
  • range query over a selector that selects a single series that does not have points at every time step in the range
  • same as above, but with points at every time step in the range
  • range query over a selector that selects multiple series that together don't have points at every time step in the range
  • same as above, but where each time step has a sample in at least one series
  • range query over an expression that could produce multiple series, but doesn't have any points (eg. absent(metric_with_many_series > Inf))

Contributor Author

Thanks for the detailed feedback to ensure the correctness. I will add the test cases to ensure the correct behaviour.

)

// AbsentOperator is an operator that implements the absent() function.
type AbsentOperator struct {
Contributor

[nit] We generally haven't used an Operator suffix for operators - I think Absent would be sufficient.

Contributor

[nit] This probably belongs in the functions package given it implements a function.

@@ -22,6 +23,7 @@ type InstantVectorFunctionOperatorFactory func(
annotations *annotations.Annotations,
expressionPosition posrange.PositionRange,
timeRange types.QueryTimeRange,
innerExpressions parser.Expressions,
Contributor

[nit] Perhaps argumentExpressions would be a clearer name? Or argExpressions to maintain symmetry with args?

@lamida lamida force-pushed the lamida/mqe-absent-function branch 2 times, most recently from d7818b2 to 51762d5 Compare February 10, 2025 03:21
Signed-off-by: Jon Kartago Lamida <[email protected]>
@lamida lamida force-pushed the lamida/mqe-absent-function branch from 66776aa to 607c848 Compare February 13, 2025 19:47
Signed-off-by: Jon Kartago Lamida <[email protected]>
@lamida lamida requested a review from charleskorn February 13, 2025 20:29
@lamida
Contributor Author

lamida commented Feb 13, 2025

@charleskorn I have addressed all of your comments from the previous review. Could you please take another look? 🙏

eval range from 0s to 40m step 4m absent(non_existent_metric)
{} 1x10

# test look back period
Contributor

There's no need to test the lookback period specifically for absent.

Contributor Author

Removed in eb544c1

clear

# absent() tests
load 4m
Contributor

I'd suggest modifying these tests to use a 6m step to make the expected behaviour clearer.

Contributor Author

Removed in 600248e

Contributor

600248e is only part of what I'd recommend: I'd also recommend changing the step on the range query test cases to 6m as well

Contributor

(same applies to the test added in 3765fcf)

Contributor Author

I thought you wanted us to test a range query whose steps are not aligned with the samples. Nevertheless, I have set the evaluation step to 6m too in ac1f1c9

Comment on lines +952 to +954
series{case="a"} _ 1 _ 1 _ 1 _ 1 _ 1 _
series{case="b"} _ _ 2 _ 2 _ 2 _ 2 _ 2
series{case="c"} _ _ _ 3 _ _ _ 3 _ _ _
Contributor

Could you please add a test for instant and range queries with absent(series{case=~"(a|b)"})? This will flush out the changes required in the operator to correctly handle multiple input series.

Contributor Author

Added in 3765fcf

Contributor

This won't handle the case where there are multiple inner series correctly.

I think we need to do something like this:

In SeriesMetadata:

  • call Inner.SeriesMetadata() to get the list of inner series
  • create a slice to keep track of whether we've seen a point at each time step
  • for each inner series:
    • call Inner.NextSeries() to get the data for that series
    • update the presence slice based on the series' data
  • if there is a point present at every time step, return no series
  • otherwise, if there are any points absent:
    • store the presence slice for use in NextSeries() later
    • return a single series (as it does currently)

In NextSeries:

  • if there is a point present at every time step, or if NextSeries() has been called before, return EOS
  • otherwise:
    • construct the slice of FPoints based on the presence slice created in SeriesMetadata()
    • return the FPoints

lamida and others added 6 commits February 14, 2025 09:17
Comment on lines 987 to 988
series{case="a"} _ _ 1 _ _ 1 _ _ 1
series{case="b"} _ _ _ 2 2 _ 2 2 _ 2 2
Contributor

This should be this, no?

Suggested change
series{case="a"} _ _ 1 _ _ 1 _ _ 1
series{case="b"} _ _ _ 2 2 _ 2 2 _ 2 2
{} 1 1 _ _ _ _ _ _ _ _ _

Contributor Author

This is just the series range evaluation series{case=~"(a|b)"}. The absent evaluation itself absent(series{case=~"(a|b)"}) is in the next line. It is easier to see the expected output of absent after seeing the multiple series range evaluation above.

Contributor Author

To make this less confusing, I have removed that range query evaluation in c9de184

Signed-off-by: Jon Kartago Lamida <[email protected]>
@lamida lamida requested a review from charleskorn February 14, 2025 10:42
Contributor

@jhesketh jhesketh left a comment

Sorry if this has already been discussed/considered (if so, please disregard this comment).

However, given the uniqueness of this operator, I wonder whether it makes sense to handle it as a special case instead of as an InstantVectorFunctionOperatorFactory.

E.g. in convertFunctionCallToInstantVectorOperator in query.go, we could check whether it's the absent function and, if so, create the operator directly and pass it the full inner expression, instead of looking it up in instantVectorFunctionOperatorFactories.

This way the absent operator can reason about the inner series. For example, if I'm understanding the function correctly, there is nothing to do if the inner expression is something other than a selector (e.g. there's no point evaluating a sum).
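
A rough sketch of the special-case dispatch proposed here, purely for illustration: `operator`, `factory`, `factories`, and `convertFunctionCall` are hypothetical, simplified stand-ins (operators are modelled as strings), and none of this reflects the actual signatures in query.go.

```go
package main

import "fmt"

// operator and factory are hypothetical stand-ins for the real operator types.
type operator string
type factory func(args []operator) operator

// factories plays the role of instantVectorFunctionOperatorFactories.
var factories = map[string]factory{
	"abs": func(args []operator) operator { return "abs(" + args[0] + ")" },
}

// convertFunctionCall handles absent before consulting the factory map, so only
// absent receives the unevaluated argument expression (argExpr).
func convertFunctionCall(name, argExpr string, args []operator) (operator, error) {
	if name == "absent" {
		return operator("absent[expr=" + argExpr + "](" + string(args[0]) + ")"), nil
	}
	f, ok := factories[name]
	if !ok {
		return "", fmt.Errorf("unsupported function %q", name)
	}
	return f(args), nil
}

func main() {
	op, _ := convertFunctionCall("absent", `up{job="x"}`, []operator{"selector"})
	fmt.Println(op)
	op, _ = convertFunctionCall("abs", "up", []operator{"selector"})
	fmt.Println(op)
}
```

With this shape the factory signature would not need an extra expressions parameter, at the cost of one function being dispatched differently from all the others.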

Sorry if this has already been discussed or derails things too much!

@charleskorn
Contributor

charleskorn commented Feb 17, 2025

This way the absent operator can reason about the inner series. For example, if I'm understanding the function correctly, there is nothing to do if the inner expression is something other than a selector (e.g. there's no point evaluating a sum).

absent can be run over any expression that produces an instant vector. The only difference in behaviour between the case where absent is run over an instant vector selector and some other expression is the labels it will return:

  • if the argument is an instant vector selector, absent will infer the output labels from the selector
  • if the argument isn't an instant vector selector, absent will always produce a series with no labels (ie. {})
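
To make the two cases concrete, here is a small sketch of the label inference. It is modelled on my understanding of how Prometheus derives these labels (only equality matchers contribute, the metric name is dropped, and a label matched more than once is omitted), which is an assumption rather than something stated in this thread. The `matcher` type and `absentLabels` function are hypothetical simplifications.

```go
package main

import "fmt"

// matcher is a hypothetical, simplified stand-in for a PromQL label matcher.
type matcher struct {
	name, value string
	equal       bool // true for `=`, false for `!=`, `=~` and `!~`
}

// absentLabels derives the output labels of absent() from its argument.
// isSelector is false when the argument is anything other than a selector.
func absentLabels(isSelector bool, matchers []matcher) map[string]string {
	out := map[string]string{}
	if !isSelector {
		return out // non-selector arguments always produce {}
	}
	seen := map[string]bool{}
	for _, m := range matchers {
		if m.name == "__name__" {
			continue // the metric name never appears in the output
		}
		if m.equal && !seen[m.name] {
			out[m.name] = m.value
			seen[m.name] = true
		} else {
			delete(out, m.name) // non-equality or repeated matcher: drop the label
		}
	}
	return out
}

func main() {
	// Matchers of nonexistent{job="myjob", instance=~".*"}.
	sel := []matcher{{"__name__", "nonexistent", true}, {"job", "myjob", true}, {"instance", ".*", false}}
	fmt.Println(absentLabels(true, sel))  // only job survives
	fmt.Println(absentLabels(false, nil)) // e.g. the argument is sum(...): no labels
}
```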

@jhesketh
Contributor

This way the absent operator can reason about the inner series. For example, if I'm understanding the function correctly, there is nothing to do if the inner expression is something other than a selector (e.g. there's no point evaluating a sum).

absent can be run over any expression that produces an instant vector. The only difference in behaviour between the case where absent is run over an instant vector selector and some other expression is the labels it will return:

* if the argument is an instant vector selector, `absent` will infer the output labels from the selector

* if the argument isn't an instant vector selector, `absent` will always produce a series with no labels (ie. `{}`)

Right, I see. However it may still be neater to have the operator called specifically in query.go to avoid passing both an evaluated and non-evaluated expression to all the functions (which feels odd).

}
defer types.PutSeriesMetadataSlice(innerMetadata)

if a.presence == nil {
Contributor

I'm not sure this check is needed. Won't this always be nil?

}

func (a *Absent) NextSeries(_ context.Context) (types.InstantVectorSeriesData, error) {
output := types.InstantVectorSeriesData{}
Contributor

Do we want to have an EOS error if this has been called already?

Comment on lines +58 to +60
for range a.timeRange.StepCount {
	a.presence = append(a.presence, false)
}
Contributor

This is not necessary: we should be able to reslice a.presence with something like a.presence = a.presence[:a.timeRange.StepCount].

The pool guarantees that all values in the slice up to the capacity requested are already false, even if the slice is reused.
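
A minimal illustration of the reslicing idea. Here `getBoolSlice` is a hypothetical stand-in for the pool's Get method, and the sketch relies on the guarantee described above (every element up to the requested capacity is already false).

```go
package main

import "fmt"

// getBoolSlice is a hypothetical stand-in for a pool's Get method: it returns a
// zero-length slice whose backing array holds at least `capacity` false values.
func getBoolSlice(capacity int) []bool {
	return make([]bool, 0, capacity)
}

func main() {
	stepCount := 5
	presence := getBoolSlice(stepCount)

	// Reslicing exposes the existing (already false) elements, so no append loop is needed.
	presence = presence[:stepCount]

	fmt.Println(len(presence), presence)
}
```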

Comment on lines +70 to +72
if err != nil && errors.Is(err, types.EOS) {
	return metadata, err
}
Contributor

Why only return if the error is an EOS? Shouldn't we return any time we get an error?

Comment on lines +75 to +87
for step := range a.timeRange.StepCount {
	t := a.timeRange.IndexTime(int64(step))
	for _, s := range series.Floats {
		if t == s.T {
			a.presence[step] = true
		}
	}
	for _, s := range series.Histograms {
		if t == s.T {
			a.presence[step] = true
		}
	}
}
Contributor

This seems very inefficient - for every time step, we'll iterate through the floats and histograms slices.

What if we iterated through Floats, and used a.timeRange.PointIndex() to find the right index into a.presence to set to true for each point?
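
Roughly, the single-pass version could look like the sketch below. The `timeRange` type and its `pointIndex` method are hypothetical simplifications (the real PointIndex signature may differ), and points are assumed to be aligned to steps within the query range.

```go
package main

import "fmt"

// timeRange is a hypothetical, simplified stand-in for types.QueryTimeRange.
type timeRange struct {
	startT, intervalMs int64
	stepCount          int
}

// pointIndex maps a timestamp to its step index, assuming t is aligned to a step.
func (r timeRange) pointIndex(t int64) int64 {
	return (t - r.startT) / r.intervalMs
}

// markPresence visits each point exactly once, so the cost grows with the
// number of points rather than with steps multiplied by points.
func markPresence(r timeRange, presence []bool, floatTs, histogramTs []int64) {
	for _, t := range floatTs {
		presence[r.pointIndex(t)] = true
	}
	for _, t := range histogramTs {
		presence[r.pointIndex(t)] = true
	}
}

func main() {
	r := timeRange{startT: 0, intervalMs: 360_000, stepCount: 4} // 6m step
	presence := make([]bool, r.stepCount)
	markPresence(r, presence, []int64{0, 720_000}, []int64{1_080_000})
	fmt.Println(presence)
}
```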

output := types.InstantVectorSeriesData{}

var err error
output.Floats, err = types.FPointSlicePool.Get(a.timeRange.StepCount, a.memoryConsumptionTracker)
Contributor

[nit] We could defer allocating this until we know we're returning at least one point in the if below

@charleskorn
Contributor

However it may still be neater to have the operator called specifically in query.go to avoid passing both an evaluated and non-evaluated expression to all the functions (which feels odd).

Are you referring to passing the expression to the function factory? If so, I don't mind this so much, but I don't hold this opinion strongly. Making absent a special case also feels a bit odd to me.

@jhesketh
Contributor

However it may still be neater to have the operator called specifically in query.go to avoid passing both an evaluated and non-evaluated expression to all the functions (which feels odd).

Are you referring to passing the expression to the function factory? If so, I don't mind this so much, but I don't hold this opinion strongly. Making absent a special case also feels a bit odd to me.

Yes, that's what I was referring to.
It feels odd to me to pass both the parsed arguments and the unparsed expressions, especially since the expressions are only used by absent.
I don't hold this strongly either, but of the two oddities, handling absent as a special case is my personal preference.

3 participants