-
Notifications
You must be signed in to change notification settings - Fork 46
Add query per stage latency metrics #839
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -328,7 +328,8 @@ public SearchResponse handle(IndexState indexState, SearchRequest searchRequest) | |
InnerHitFetchTask::getDiagnostic))); | ||
} | ||
searchContext.getResponseBuilder().setDiagnostics(diagnostics); | ||
|
||
// TODO: These are the diagnostics I want to publish to prometheus, I'll try to figure out | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I just left these comments here as a reference for myself. |
||
// where other metrics are being published and follow that pattern | ||
if (profileResultBuilder != null) { | ||
searchContext.getResponseBuilder().setProfileResult(profileResultBuilder); | ||
} | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -120,6 +120,7 @@ public static void updateSearchResponseMetrics( | |
.labelValues(index, "facet:" + entry.getKey()) | ||
.observe(entry.getValue()); | ||
} | ||
searchStageLatencyMs.labelValues(index, "rescore").observe(diagnostics.getRescoreTimeMs());//adding extra rescore metric to avoid calculating average of all rescorers | ||
for (Map.Entry<String, Double> entry : diagnostics.getRescorersTimeMsMap().entrySet()) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For now, searchStageLatencyMs only has the rescorer latency for each rescorer, but for an initial overview, a general latency for all rescorers might also be useful. |
||
searchStageLatencyMs | ||
.labelValues(index, "rescorer:" + entry.getKey()) | ||
|
@@ -139,6 +140,9 @@ public MetricSnapshots collect() { | |
try { | ||
metrics.add(searchTimeoutCount.collect()); | ||
metrics.add(searchTerminatedEarlyCount.collect()); | ||
metrics.add(searchStageLatencyMs.collect());// Just adding this here should mean it gets published to prometheus, is that what we want? | ||
// when is publishVerboseMetrics set to true, I couldn't find this metric in the grafana shard without any filters? | ||
// maybe it makes sense to add an extra parameter like publishSearchStageLatencyMs with default value true (?) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Does it make sense to have an option to turn this on and off separately? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There is a live index setting which enables publishing verbose metrics There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think if a metric is per-hit, it should come under the verbose metrics There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, that makes sense. I was considering adding something like But if we only turn this on in case we want to investigate something, or the performance impact is not too high, I suppose that wouldn't be necessary. |
||
|
||
boolean publishVerboseMetrics = false; | ||
Set<String> indexNames = globalState.getIndexNames(); | ||
|
@@ -151,7 +155,6 @@ public MetricSnapshots collect() { | |
if (publishVerboseMetrics) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. When would we set this to true? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Only when we need to the extra metrics to investigate something |
||
metrics.add(searchResponseSizeBytes.collect()); | ||
metrics.add(searchResponseTotalHits.collect()); | ||
metrics.add(searchStageLatencyMs.collect()); | ||
} | ||
} catch (Exception e) { | ||
logger.warn("Error getting search response metrics: ", e); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the diagnostics object here contains all of the required metrics, including the rescorer metrics. Can you please confirm once?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it looks like it here. What I couldn't find there was a total rescore time metric, but just adding up the time of all rescorers in a prometheus query might be an easier option.