@@ -328,7 +328,8 @@ public SearchResponse handle(IndexState indexState, SearchRequest searchRequest)
InnerHitFetchTask::getDiagnostic)));
}
searchContext.getResponseBuilder().setDiagnostics(diagnostics);
Contributor

I think the diagnostics object here contains all of the required metrics, including the rescorer metrics. Can you please confirm once?

Author @mathisnyp, Apr 1, 2025

Yes, it looks that way. The one thing I couldn't find there was a total rescore time metric, but summing the times of all rescorers in a Prometheus query might be the easier option.
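To make the "adding up" option concrete, here is a minimal Java sketch of deriving a total from per-rescorer timings. The class and method names are hypothetical and not part of this PR; the map only stands in for the kind of name-to-milliseconds data that `diagnostics.getRescorersTimeMsMap()` provides in the diff. Doing the same sum in a Prometheus query would be the server-side analogue.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the PR.
class RescoreTotalSketch {
  // Adds up the per-rescorer timings (rescorer name -> milliseconds).
  static double totalRescoreTimeMs(Map<String, Double> rescorersTimeMs) {
    double total = 0.0;
    for (double timeMs : rescorersTimeMs.values()) {
      total += timeMs;
    }
    return total;
  }

  public static void main(String[] args) {
    Map<String, Double> timings = new LinkedHashMap<>();
    // Made-up rescorer names and values.
    timings.put("first_rescorer", 1.5);
    timings.put("second_rescorer", 2.25);
    System.out.println(totalRescoreTimeMs(timings));
  }
}
```

One caveat with this approach: a sum of per-rescorer times only matches the overall stage time if nothing measurable happens between rescorers, so a separately measured total may still differ slightly.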


// TODO: These are the diagnostics I want to publish to prometheus, I'll try to figure out
Author

I just left these comments here as a reference for myself.

// where other metrics are being published and follow that pattern
if (profileResultBuilder != null) {
searchContext.getResponseBuilder().setProfileResult(profileResultBuilder);
}
@@ -120,6 +120,7 @@ public static void updateSearchResponseMetrics(
.labelValues(index, "facet:" + entry.getKey())
.observe(entry.getValue());
}
// Extra overall rescore metric, to avoid calculating the average of all rescorers.
searchStageLatencyMs.labelValues(index, "rescore").observe(diagnostics.getRescoreTimeMs());
for (Map.Entry<String, Double> entry : diagnostics.getRescorersTimeMsMap().entrySet()) {
Author @mathisnyp, Mar 26, 2025

For now, searchStageLatencyMs only has the rescorer latency for each rescorer, but for an initial overview, a general latency for all rescorers might also be useful.
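As a rough illustration of that label scheme, the sketch below records one overall `rescore` observation next to the per-rescorer `rescorer:<name>` observations. It uses a hypothetical in-memory stand-in rather than the Prometheus client, and it drops the `index` label for brevity; only the two stage label formats are taken from the diff.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a labelled latency metric; not the Prometheus client.
class StageLatencySketch {
  // Stage label -> observed latencies in milliseconds.
  private final Map<String, List<Double>> observations = new LinkedHashMap<>();

  void observe(String stage, double latencyMs) {
    observations.computeIfAbsent(stage, key -> new ArrayList<>()).add(latencyMs);
  }

  // Records one overall "rescore" observation plus one observation per rescorer.
  void recordRescore(double rescoreTimeMs, Map<String, Double> rescorersTimeMs) {
    observe("rescore", rescoreTimeMs);
    for (Map.Entry<String, Double> entry : rescorersTimeMs.entrySet()) {
      observe("rescorer:" + entry.getKey(), entry.getValue());
    }
  }

  // Returns the observations for a stage label, or an empty list if none exist.
  List<Double> get(String stage) {
    return observations.getOrDefault(stage, Collections.emptyList());
  }
}
```

With both kinds of label on the same metric, a dashboard can show the overall stage by default and drill into individual rescorers only when needed.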

searchStageLatencyMs
.labelValues(index, "rescorer:" + entry.getKey())
@@ -139,6 +140,9 @@ public MetricSnapshots collect() {
try {
metrics.add(searchTimeoutCount.collect());
metrics.add(searchTerminatedEarlyCount.collect());
// Just adding this here should mean it gets published to Prometheus; is that what we want?
// When is publishVerboseMetrics set to true? I couldn't find this metric in the Grafana shard without any filters.
// Maybe it makes sense to add an extra parameter like publishSearchStageLatencyMs with default value true (?)
metrics.add(searchStageLatencyMs.collect());
Author @mathisnyp, Mar 26, 2025

Does it make sense to have an option to turn this on and off separately?

Contributor

There is a live index setting that enables publishing verbose metrics.

Contributor

I think if a metric is per-hit, it should come under the verbose metrics

Author

Yes, that makes sense. I was considering adding something like publishOnlyPerQueryStageLatency, which could offer a way to only publish searchStageLatencyMs without searchResponseSizeBytes and searchResponseTotalHits. (All three would still be published if verbose is set to true.)

But if we only turn this on when we want to investigate something, or if the performance impact is not too high, I suppose that wouldn't be necessary.
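For reference, the gating I had in mind would look roughly like the sketch below. This is only an illustration under assumptions: `publishOnlyPerQueryStageLatency` does not exist, the class and method are hypothetical, and metric names are returned as strings in place of the real `collect()` calls.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed gating; not the actual collect() implementation.
class MetricGatingSketch {
  // Returns the names of the metrics that would be collected for the given flags.
  static List<String> metricsToCollect(
      boolean publishVerboseMetrics, boolean publishOnlyPerQueryStageLatency) {
    List<String> metrics = new ArrayList<>();
    // Always published.
    metrics.add("searchTimeoutCount");
    metrics.add("searchTerminatedEarlyCount");
    if (publishVerboseMetrics) {
      // Verbose mode publishes all three optional metrics.
      metrics.add("searchResponseSizeBytes");
      metrics.add("searchResponseTotalHits");
      metrics.add("searchStageLatencyMs");
    } else if (publishOnlyPerQueryStageLatency) {
      // Assumed extra flag: stage latency only, without the response size and hit metrics.
      metrics.add("searchStageLatencyMs");
    }
    return metrics;
  }
}
```

The extra branch only pays off if stage latency is wanted permanently while the other two metrics stay off; otherwise the existing verbose setting already covers it.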


boolean publishVerboseMetrics = false;
Set<String> indexNames = globalState.getIndexNames();
@@ -151,7 +155,6 @@ public MetricSnapshots collect() {
if (publishVerboseMetrics) {
Author

When would we set this to true?

Contributor

Only when we need the extra metrics to investigate something.

metrics.add(searchResponseSizeBytes.collect());
metrics.add(searchResponseTotalHits.collect());
metrics.add(searchStageLatencyMs.collect());
}
} catch (Exception e) {
logger.warn("Error getting search response metrics: ", e);