-
Notifications
You must be signed in to change notification settings - Fork 560
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MQE Absent function #10523
base: main
Are you sure you want to change the base?
MQE Absent function #10523
Conversation
innerExpr parser.Expr | ||
inner types.InstantVectorOperator | ||
expressionPosition posrange.PositionRange | ||
absentCount int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm understanding your intention correctly, would innerSeriesCount
be a clearer name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is count how often the series is not there or being absent
. But let me double check again according to the feedback in #10523 (comment) to ensure the correctness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is correct - I think we're missing a few test cases that might flush out the correct behaviour:
- instant query over a selector that selects multiple series, each with a point at the query time
- instant query over an expression that could produce multiple series, but doesn't have any points (eg.
absent(metric_with_many_series > Inf)
) - range query over a selector that selects a single series that does not have points at every time step in the range
- same as above, but with points at every time step in the range
- range query over a selector that selects multiple series that together don't have points at every time step in the range
- same as above, but where each time step has a sample in at least one series
- range query over an expression that could produce multiple series, but doesn't have any points (eg.
absent(metric_with_many_series > Inf)
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed feedback to ensure the correctness. I will add the test cases to ensure the correct behaviour.
) | ||
|
||
// AbsentOperator is an operator that implements the absent() function. | ||
type AbsentOperator struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] We generally haven't used an Operator
suffix for operators - I think Absent
would be sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] This probably belongs in the functions
package given it implements a function.
pkg/streamingpromql/functions.go
Outdated
@@ -22,6 +23,7 @@ type InstantVectorFunctionOperatorFactory func( | |||
annotations *annotations.Annotations, | |||
expressionPosition posrange.PositionRange, | |||
timeRange types.QueryTimeRange, | |||
innerExpressions parser.Expressions, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] Perhaps argumentExpressions
would be a clearer name? Or argExpressions
to maintain symmetry with args
?
d7818b2
to
51762d5
Compare
Signed-off-by: Jon Kartago Lamida <[email protected]>
Signed-off-by: Jon Kartago Lamida <[email protected]>
Signed-off-by: Jon Kartago Lamida <[email protected]>
Signed-off-by: Jon Kartago Lamida <[email protected]>
Signed-off-by: Jon Kartago Lamida <[email protected]>
Signed-off-by: Jon Kartago Lamida <[email protected]>
Signed-off-by: Jon Kartago Lamida <[email protected]>
Signed-off-by: Jon Kartago Lamida <[email protected]>
Signed-off-by: Jon Kartago Lamida <[email protected]>
Signed-off-by: Jon Kartago Lamida <[email protected]>
Signed-off-by: Jon Kartago Lamida <[email protected]>
Signed-off-by: Jon Kartago Lamida <[email protected]>
Signed-off-by: Jon Kartago Lamida <[email protected]>
Signed-off-by: Jon Kartago Lamida <[email protected]>
Signed-off-by: Jon Kartago Lamida <[email protected]>
Signed-off-by: Jon Kartago Lamida <[email protected]>
Signed-off-by: Jon Kartago Lamida <[email protected]>
Signed-off-by: Jon Kartago Lamida <[email protected]>
Signed-off-by: Jon Kartago Lamida <[email protected]>
Signed-off-by: Jon Kartago Lamida <[email protected]>
Signed-off-by: Jon Kartago Lamida <[email protected]>
66776aa
to
607c848
Compare
Signed-off-by: Jon Kartago Lamida <[email protected]>
Signed-off-by: Jon Kartago Lamida <[email protected]>
Signed-off-by: Jon Kartago Lamida <[email protected]>
@charleskorn I have addressed all of your comments from the previous review. Please help to do another review. 🙏 |
eval range from 0s to 40m step 4m absent(non_existent_metric) | ||
{} 1x10 | ||
|
||
# test look back period |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no need to test the lookback period specifically for absent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed in eb544c1
clear | ||
|
||
# absent() tests | ||
load 4m |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest modifying these tests to use a 6m step to make the expected behaviour clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed in 600248e
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
600248e is only part of what I'd recommend: I'd also recommend changing the step on the range query test cases to 6m as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(same applies to the test added in 3765fcf)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought you want us to test unaligned range query with the sample. Nevertheless set the evaulation step to 6m too in ac1f1c9
series{case="a"} _ 1 _ 1 _ 1 _ 1 _ 1 _ | ||
series{case="b"} _ _ 2 _ 2 _ 2 _ 2 _ 2 | ||
series{case="c"} _ _ _ 3 _ _ _ 3 _ _ _ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add a test for instant and range queries with absent(series{case=~"(a|b)"})
? This will flush out the changes required in the operator to correctly handle multiple input series.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in 3765fcf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't handle the case where there are multiple inner series correctly.
I think we need to do something like this:
In SeriesMetadata
:
- call
Inner.SeriesMetadata()
to get the list of inner series - create a slice to keep track of whether we've seen a point at each time step
- for each inner series:
- call
Inner.NextSeries()
to get the data for that series - update the presence slice based on the series' data
- call
- if there is a point present at every time step, return no series
- otherwise, if there are any points absent:
- store the presence slice for use in
NextSeries()
later - return a single series (as it does currently)
- store the presence slice for use in
In NextSeries
:
- if there is a point present at every time step, or if
NextSeries()
has been called before, returnEOS
- otherwise:
- construct the slice of
FPoint
s based on the presence slice created inSeriesMetadata()
- return the
FPoint
s
- construct the slice of
Signed-off-by: Jon Kartago Lamida <[email protected]>
Signed-off-by: Jon Kartago Lamida <[email protected]>
Signed-off-by: Jon Kartago Lamida <[email protected]>
Co-authored-by: Charles Korn <[email protected]>
Signed-off-by: Jon Kartago Lamida <[email protected]>
Signed-off-by: Jon Kartago Lamida <[email protected]>
series{case="a"} _ _ 1 _ _ 1 _ _ 1 | ||
series{case="b"} _ _ _ 2 2 _ 2 2 _ 2 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be this, no?
series{case="a"} _ _ 1 _ _ 1 _ _ 1 | |
series{case="b"} _ _ _ 2 2 _ 2 2 _ 2 2 | |
{} 1 1 _ _ _ _ _ _ _ _ _ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just the series range evaluation series{case=~"(a|b)"}
. The absent evaluation itself absent(series{case=~"(a|b)"})
is in the next line. It is easier to see the expected output of absent after seeing the multiple series range evaluation above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make this less confusing, let's just remove that range query evaluation c9de184
Signed-off-by: Jon Kartago Lamida <[email protected]>
Signed-off-by: Jon Kartago Lamida <[email protected]>
Signed-off-by: Jon Kartago Lamida <[email protected]>
Signed-off-by: Jon Kartago Lamida <[email protected]>
Signed-off-by: Jon Kartago Lamida <[email protected]>
Signed-off-by: Jon Kartago Lamida <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry if this has already been discussed/considered (if so, please disregard this comment).
However, I wonder if given the uniqueness of this operator if it makes sense to handle it as a special case instead of as an InstantVectorFunctionOperatorFactory
?
Eg in convertFunctionCallToInstantVectorOperator
in query.go
we check if it's the absent
function and then create the operator and pass it the full inner expression instead of looking it up in instantVectorFunctionOperatorFactories
.
This way the absent
operator can reason about the inner series. For example, if I'm understand the function correctly, there is nothing to do if the inner expression is something other than a selector (eg there's no point evaluating a sum
).
Sorry if this has already been discussed or derails things too much!
|
Right, I see. However it may still be neater to have the operator called specifically in query.go to avoid passing both an evaluated and non-evaluated expression to all the functions (which feels odd). |
} | ||
defer types.PutSeriesMetadataSlice(innerMetadata) | ||
|
||
if a.presence == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this check is needed. Won't this always be nil?
} | ||
|
||
func (a *Absent) NextSeries(_ context.Context) (types.InstantVectorSeriesData, error) { | ||
output := types.InstantVectorSeriesData{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to have an EOS error if this has been called already?
for range a.timeRange.StepCount { | ||
a.presence = append(a.presence, false) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not necessary: we should be able to reslice a.presence
with something like a.presence = a.presence[:a.timeRange.StepCount]
.
The pool guarantees that all values in the slice up to the capacity requested are already false
, even if the slice is reused.
if err != nil && errors.Is(err, types.EOS) { | ||
return metadata, err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why only return if the error is an EOS
? Shouldn't we return any time we get an error?
for step := range a.timeRange.StepCount { | ||
t := a.timeRange.IndexTime(int64(step)) | ||
for _, s := range series.Floats { | ||
if t == s.T { | ||
a.presence[step] = true | ||
} | ||
} | ||
for _, s := range series.Histograms { | ||
if t == s.T { | ||
a.presence[step] = true | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems very inefficient - for every time step, we'll iterate through the floats and histograms slices.
What if iterated through Floats
, and used a.timeRange.PointIndex()
to find the right index into a.presence
to set to true
for each point?
output := types.InstantVectorSeriesData{} | ||
|
||
var err error | ||
output.Floats, err = types.FPointSlicePool.Get(a.timeRange.StepCount, a.memoryConsumptionTracker) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] We could defer allocating this until we know we're returning at least one point in the if
below
Are you referring to passing the expression to the function factory? If so, I don't mind this so much, but I don't hold this opinion strongly. Making |
Yes, that's what I was referring to. |
What this PR does
Implement absent function. Added a new Operator:
Absent
. We also need to modifyInstantVectorFunctionOperatorFactory
to passparser.Expressions
object needed to evaluate the labels of the absent argument.Which issue(s) this PR fixes or relates to
Fixes #
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]
.about-versioning.md
updated with experimental features.