Feature: Add new metric slow_request_throughput
#619
base: main
Conversation
… throughput of slow queries.
Thanks for opening!
The only thing I'm not sure about is the cutoff in milliseconds. I don't think that's what the original issue discussed; can you double-check?
It would also be nice to see some tests; the logic is getting a bit more complicated and shouldn't stay untested.
@@ -105,6 +110,17 @@ func (i Instrument) Wrap(next http.Handler) http.Handler {
		labelValues = append(labelValues, tenantID)
		instrument.ObserveWithExemplar(r.Context(), i.PerTenantDuration.WithLabelValues(labelValues...), respMetrics.Duration.Seconds())
	}
	if i.SlowRequestCutoff > 0 && respMetrics.Duration > i.SlowRequestCutoff {
In the original (private :( ) issue, the discussion was to have the cutoff be the volume ("N samples") instead of latency. Should this be the other way around?
@@ -105,6 +110,17 @@ func (i Instrument) Wrap(next http.Handler) http.Handler {
	if i.SlowRequestCutoff > 0 && respMetrics.Duration > i.SlowRequestCutoff {
		parts := strings.Split(w.Header().Get("Server-Timing"), ", ")
Can you add some tests for this parsing? I see that there is no instrument_test.go, but I think how we handle some edge cases, especially when parsing these headers, is important to put in a test.
Since the output here is in Prometheus metrics, it may not be immediately obvious how to write tests against them (it definitely wasn't to me when I started). Here's an example which asserts on the metrics the code generates.
Lines 17 to 37 in 90d7ee0
func TestRateLimitedLoggerLogs(t *testing.T) {
	buf := bytes.NewBuffer(nil)
	c := newCounterLogger(buf)
	reg := prometheus.NewPedanticRegistry()
	r := NewRateLimitedLogger(c, 1, 1, reg)
	level.Error(r).Log("msg", "error will be logged")
	assert.Equal(t, 1, c.count)
	logContains := []string{"error", "error will be logged"}
	c.assertContains(t, logContains)
	require.NoError(t, testutil.GatherAndCompare(reg, strings.NewReader(`
# HELP logger_rate_limit_discarded_log_lines_total Total number of discarded log lines per level.
# TYPE logger_rate_limit_discarded_log_lines_total counter
logger_rate_limit_discarded_log_lines_total{level="info"} 0
logger_rate_limit_discarded_log_lines_total{level="debug"} 0
logger_rate_limit_discarded_log_lines_total{level="warn"} 0
logger_rate_limit_discarded_log_lines_total{level="error"} 0
`)))
}
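If the header parsing gets factored out into its own helper, a plain table-driven test could also cover the edge cases directly. A rough sketch (parseServerTimingDur is a hypothetical name and a stand-in for the parsing currently done inline, included here only so the test compiles):

// Sketch only: parseServerTimingDur is a hypothetical stand-in for the
// inline parsing in the PR; the dur= value format is an assumption.
package middleware

import (
	"strconv"
	"strings"
	"testing"

	"github.com/stretchr/testify/assert"
)

func parseServerTimingDur(header, unit string) (float64, bool) {
	for _, entry := range strings.Split(header, ", ") {
		if !strings.HasPrefix(entry, unit+";") {
			continue
		}
		for _, field := range strings.Split(entry, ";") {
			if v, ok := strings.CutPrefix(field, "dur="); ok {
				f, err := strconv.ParseFloat(v, 64)
				return f, err == nil
			}
		}
	}
	return 0, false
}

func TestParseServerTimingDur(t *testing.T) {
	for name, tc := range map[string]struct {
		header   string
		expected float64
		ok       bool
	}{
		"empty header":      {header: "", ok: false},
		"single entry":      {header: "total_samples;dur=150", expected: 150, ok: true},
		"multiple entries":  {header: "other;dur=2, total_samples;dur=150", expected: 150, ok: true},
		"malformed value":   {header: "total_samples;dur=abc", ok: false},
		"missing dur field": {header: "total_samples", ok: false},
		"unit not present":  {header: "other;dur=2", ok: false},
	} {
		t.Run(name, func(t *testing.T) {
			got, ok := parseServerTimingDur(tc.header, "total_samples")
			assert.Equal(t, tc.ok, ok)
			if tc.ok {
				assert.Equal(t, tc.expected, got)
			}
		})
	}
}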
What this PR does:

This PR adds two configuration parameters, server.throughput-config.slow-request-cutoff and server.throughput-config.unit, exposes a new metric slow_request_server_throughput, and calculates the throughput in units/s for that metric using information from the Server-Timing header.

If server.throughput-config.slow-request-cutoff is 0, no throughput is calculated.

The implementation is very similar to this branch by @krajorama, but adds some flexibility to measure throughput based on different signals. It defaults to total_samples, since processed samples are easier to explain to users. Discussed with @dimitarvdimitrov.
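Roughly, the idea looks like the following sketch (simplified, with illustrative names rather than the exact PR code; it assumes the configured unit shows up as a Server-Timing entry of the form unit;dur=<value>):

// Simplified sketch of the throughput calculation; names and the
// Server-Timing value format are assumptions, not the PR's exact code.
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

func main() {
	cutoff := 10 * time.Second   // server.throughput-config.slow-request-cutoff
	duration := 15 * time.Second // measured request duration
	unit := "total_samples"      // server.throughput-config.unit (default)
	header := "total_samples;dur=1500000, other;dur=2"

	// Only requests slower than a non-zero cutoff are observed.
	if cutoff == 0 || duration <= cutoff {
		return
	}

	// Find the Server-Timing entry for the configured unit and read its value.
	for _, entry := range strings.Split(header, ", ") {
		if !strings.HasPrefix(entry, unit+";") {
			continue
		}
		if v, ok := strings.CutPrefix(strings.SplitN(entry, ";", 2)[1], "dur="); ok {
			if units, err := strconv.ParseFloat(v, 64); err == nil {
				// The middleware observes this value on the
				// slow_request_server_throughput metric; here we just print it.
				fmt.Printf("throughput: %.0f %s/s\n", units/duration.Seconds(), unit)
			}
		}
	}
}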
Checklist

CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]