-
Notifications
You must be signed in to change notification settings - Fork 3.8k
Description
Describe the bug
Hi all,
we are exploring some cost calculation based on fetched chunk size using -frontend.query-stats-enabled
, but noticed that fetched_chunks_bytes
, query_wall_time_seconds
& fetched_series_count
are always zero (even though the metrics seems correct).
level=info ts=2023-11-20T10:36:33.397823633Z caller=handler.go:204 org_id=<redacted> traceID=<redacted> msg="query stats" component=query-frontend method=GET path=/loki/api/v1/query_range response_time=12.107848ms query_wall_time_seconds=0 fetched_series_count=0 fetched_chunks_bytes=0 param_query="...redacted"
level=info ts=2023-11-20T10:36:30.763321991Z caller=handler.go:204 org_id=<redacted> traceID=<redacted> msg="query stats" component=query-frontend method=GET path=/loki/api/v1/query_range response_time=11.702471ms query_wall_time_seconds=0 fetched_series_count=0 fetched_chunks_bytes=0 param_end=1700476590000000000 param_query="...redacted"
level=info ts=2023-11-20T10:36:09.806590352Z caller=handler.go:204 org_id=<redacted> traceID=<redacted> msg="query stats" component=query-frontend method=GET path=/loki/api/v1/query_range response_time=10.484677ms query_wall_time_seconds=0 fetched_series_count=0 fetched_chunks_bytes=0 param_query="...redacted"
...(omitted)
(fetched chunk size using metrics)
To Reproduce
Steps to reproduce the behavior:
- Installed Loki in SSD mode
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
loki-ssd <redacted> 16 2023-11-20 09:37:41.803206617 +0000 UTC deployed loki-5.35.0 2.9.2
- set
-frontend.query-stats-enabled
totrue
and run some queries that are not recent so chunks will be fetched in object storage.
Expected behavior
fetched_chunks_bytes
should be no-zero if it actually fetched some chunks.query_wall_time_seconds
&fetched_series_count
should be updated as well.
Environment:
- Infrastructure: EKS 1.25, backend storage is S3.
- Deployment tool: Helm w/ Loki Chart 5.35.0
Screenshots, Promtail config, or terminal output
For outputs, see above.
Related notes:
We are looking for a simple way to gather bytes that were fetched by each stream in order to have a concept of costs on read path. Ideally, we can just transform the log and use remote write to generate the trends.
Query logs like the following look somehow feasible but it uses human-readable format and would probably require some additional parsing.
level=info ts=2023-11-20T06:53:07.467232981Z caller=metrics.go:159 component=frontend org_id=<redacted> traceID=<redacted> latency=slow query="<redacted>" query_type=metric range_type=range length=6h0m0s start_delta=6h0m10.623222507s end_delta=10.623222703s step=1m0s duration=10.4652657s status=200 limit=1000 returned_lines=0 throughput=16MB total_bytes=166MB ...<omitted>
Therefore, I started to evaluate -frontend.query-stats-enabled
option and see how it works.
I checked the source code a bit for fetched_chunks_bytes
and it looks like the Stats
instance was never updated along the way.
Here the log prints numBytes
:
loki/pkg/lokifrontend/frontend/transport/handler.go
Lines 189 to 206 in 30d0030
numBytes := stats.LoadFetchedChunkBytes() | |
// Track stats. | |
f.querySeconds.WithLabelValues(userID).Add(wallTime.Seconds()) | |
f.querySeries.WithLabelValues(userID).Add(float64(numSeries)) | |
f.queryBytes.WithLabelValues(userID).Add(float64(numBytes)) | |
f.activeUsers.UpdateUserTimestamp(userID, time.Now()) | |
// Log stats. | |
logMessage := append([]interface{}{ | |
"msg", "query stats", | |
"component", "query-frontend", | |
"method", r.Method, | |
"path", r.URL.Path, | |
"response_time", queryResponseTime, | |
"query_wall_time_seconds", wallTime.Seconds(), | |
"fetched_series_count", numSeries, | |
"fetched_chunks_bytes", numBytes, |
In which LoadFetchedChunkBytes
looks fine since the Stats
instance is non-nil:
loki/pkg/querier/stats/stats.go
Lines 82 to 88 in 30d0030
func (s *Stats) LoadFetchedChunkBytes() uint64 { | |
if s == nil { | |
return 0 | |
} | |
return atomic.LoadUint64(&s.FetchedChunkBytes) | |
} |
Because it's initialized here:
loki/pkg/lokifrontend/frontend/transport/handler.go
Lines 117 to 121 in 30d0030
if f.cfg.QueryStatsEnabled { | |
var ctx context.Context | |
stats, ctx = querier_stats.ContextWithEmptyStats(r.Context()) | |
r = r.WithContext(ctx) | |
} |
And then get called at:
f.reportQueryStats(r, queryString, queryResponseTime, stats) |
But the stats object seems never get updated.
These are just some thoughts from my limited go experience, I could be wrong though.
Anyway, if you need more information, please let me know, thanks!