Feature: Add new metric slow_request_throughput
#619
base: main
Conversation
… throughput of slow queries.
Thanks for opening!
The only thing I'm not sure about is the cutoff in milliseconds. I don't think that's what the original issue discussed; can you double-check?
It would also be nice to see some tests; the logic is getting a bit more complicated and shouldn't stay untested.
@@ -105,6 +110,17 @@ func (i Instrument) Wrap(next http.Handler) http.Handler {
		labelValues = append(labelValues, tenantID)
		instrument.ObserveWithExemplar(r.Context(), i.PerTenantDuration.WithLabelValues(labelValues...), respMetrics.Duration.Seconds())
	}
	if i.SlowRequestCutoff > 0 && respMetrics.Duration > i.SlowRequestCutoff {
In the original (private :( ) issue, the discussion was to have the cutoff be the volume ("N samples") instead of latency. Should this be the other way around?
@@ -105,6 +110,17 @@ func (i Instrument) Wrap(next http.Handler) http.Handler {
	if i.SlowRequestCutoff > 0 && respMetrics.Duration > i.SlowRequestCutoff {
		parts := strings.Split(w.Header().Get("Server-Timing"), ", ")
Can you add some tests for this parsing? I see that there is no instrument_test.go, but I think how we handle some edge cases, especially when parsing these headers, is important to put in a test.
Since the output here is in Prometheus metrics, it may not be immediately obvious how to write tests against them (it definitely wasn't to me when I started). Here's an example which asserts on the metrics the code generates.
Lines 17 to 37 in 90d7ee0
func TestRateLimitedLoggerLogs(t *testing.T) {
	buf := bytes.NewBuffer(nil)
	c := newCounterLogger(buf)
	reg := prometheus.NewPedanticRegistry()
	r := NewRateLimitedLogger(c, 1, 1, reg)
	level.Error(r).Log("msg", "error will be logged")
	assert.Equal(t, 1, c.count)
	logContains := []string{"error", "error will be logged"}
	c.assertContains(t, logContains)
	require.NoError(t, testutil.GatherAndCompare(reg, strings.NewReader(`
# HELP logger_rate_limit_discarded_log_lines_total Total number of discarded log lines per level.
# TYPE logger_rate_limit_discarded_log_lines_total counter
logger_rate_limit_discarded_log_lines_total{level="info"} 0
logger_rate_limit_discarded_log_lines_total{level="debug"} 0
logger_rate_limit_discarded_log_lines_total{level="warn"} 0
logger_rate_limit_discarded_log_lines_total{level="error"} 0
`)))
}
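If the header parsing gets factored out into its own helper, a plain table-driven test could also cover the edge cases directly. A rough sketch (parseServerTimingDur is a hypothetical name and a stand-in for the parsing currently done inline, included here only so the test compiles):

// Sketch only: parseServerTimingDur is a hypothetical stand-in for the
// inline parsing in the PR; the dur= value format is an assumption.
package middleware

import (
	"strconv"
	"strings"
	"testing"

	"github.com/stretchr/testify/assert"
)

func parseServerTimingDur(header, unit string) (float64, bool) {
	for _, entry := range strings.Split(header, ", ") {
		if !strings.HasPrefix(entry, unit+";") {
			continue
		}
		for _, field := range strings.Split(entry, ";") {
			if v, ok := strings.CutPrefix(field, "dur="); ok {
				f, err := strconv.ParseFloat(v, 64)
				return f, err == nil
			}
		}
	}
	return 0, false
}

func TestParseServerTimingDur(t *testing.T) {
	for name, tc := range map[string]struct {
		header   string
		expected float64
		ok       bool
	}{
		"empty header":      {header: "", ok: false},
		"single entry":      {header: "total_samples;dur=150", expected: 150, ok: true},
		"multiple entries":  {header: "other;dur=2, total_samples;dur=150", expected: 150, ok: true},
		"malformed value":   {header: "total_samples;dur=abc", ok: false},
		"missing dur field": {header: "total_samples", ok: false},
		"unit not present":  {header: "other;dur=2", ok: false},
	} {
		t.Run(name, func(t *testing.T) {
			got, ok := parseServerTimingDur(tc.header, "total_samples")
			assert.Equal(t, tc.ok, ok)
			if tc.ok {
				assert.Equal(t, tc.expected, got)
			}
		})
	}
}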
What this PR does:

This PR adds two configuration parameters, server.throughput-config.slow-request-cutoff and server.throughput-config.unit, exposes a new metric slow_request_server_throughput, and calculates the throughput in units/s for that metric using information from the Server-Timing header.

If server.throughput-config.slow-request-cutoff is 0, no throughput is calculated.

The implementation is very similar to this branch by @krajorama, but adds some flexibility to measure throughput based on different signals. It defaults to total_samples, since processed samples are easier to explain to users. Discussed with @dimitarvdimitrov.
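Roughly, the idea looks like the following sketch (simplified, with illustrative names rather than the exact PR code; it assumes the configured unit shows up as a Server-Timing entry of the form unit;dur=<value>):

// Simplified sketch of the throughput calculation; names and the
// Server-Timing value format are assumptions, not the PR's exact code.
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

func main() {
	cutoff := 10 * time.Second   // server.throughput-config.slow-request-cutoff
	duration := 15 * time.Second // measured request duration
	unit := "total_samples"      // server.throughput-config.unit (default)
	header := "total_samples;dur=1500000, other;dur=2"

	// Only requests slower than a non-zero cutoff are observed.
	if cutoff == 0 || duration <= cutoff {
		return
	}

	// Find the Server-Timing entry for the configured unit and read its value.
	for _, entry := range strings.Split(header, ", ") {
		if !strings.HasPrefix(entry, unit+";") {
			continue
		}
		if v, ok := strings.CutPrefix(strings.SplitN(entry, ";", 2)[1], "dur="); ok {
			if units, err := strconv.ParseFloat(v, 64); err == nil {
				// The middleware observes this value on the
				// slow_request_server_throughput metric; here we just print it.
				fmt.Printf("throughput: %.0f %s/s\n", units/duration.Seconds(), unit)
			}
		}
	}
}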
Checklist

CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]