Skip to content

fetched_chunks_bytes, query_wall_time_seconds & fetched_series_count always report 0 #11264

@wtchangdm

Description

@wtchangdm

Describe the bug
Hi all,

we are exploring some cost calculation based on fetched chunk size using -frontend.query-stats-enabled, but noticed that fetched_chunks_bytes, query_wall_time_seconds & fetched_series_count are always zero (even though the metrics seems correct).

level=info ts=2023-11-20T10:36:33.397823633Z caller=handler.go:204 org_id=<redacted> traceID=<redacted> msg="query stats" component=query-frontend method=GET path=/loki/api/v1/query_range response_time=12.107848ms query_wall_time_seconds=0 fetched_series_count=0 fetched_chunks_bytes=0 param_query="...redacted"
level=info ts=2023-11-20T10:36:30.763321991Z caller=handler.go:204 org_id=<redacted> traceID=<redacted> msg="query stats" component=query-frontend method=GET path=/loki/api/v1/query_range response_time=11.702471ms query_wall_time_seconds=0 fetched_series_count=0 fetched_chunks_bytes=0 param_end=1700476590000000000 param_query="...redacted"
level=info ts=2023-11-20T10:36:09.806590352Z caller=handler.go:204 org_id=<redacted> traceID=<redacted> msg="query stats" component=query-frontend method=GET path=/loki/api/v1/query_range response_time=10.484677ms query_wall_time_seconds=0 fetched_series_count=0 fetched_chunks_bytes=0 param_query="...redacted"
...(omitted)

(fetched chunk size using metrics)

To Reproduce
Steps to reproduce the behavior:

  1. Installed Loki in SSD mode
NAME         	NAMESPACE	REVISION	UPDATED                                	STATUS  	CHART      	APP VERSION
loki-ssd	<redacted>  	16      	2023-11-20 09:37:41.803206617 +0000 UTC	deployed	loki-5.35.0	2.9.2
  1. set -frontend.query-stats-enabled to true and run some queries that are not recent so chunks will be fetched in object storage.

Expected behavior

  • fetched_chunks_bytes should be no-zero if it actually fetched some chunks.
  • query_wall_time_seconds & fetched_series_count should be updated as well.

Environment:

  • Infrastructure: EKS 1.25, backend storage is S3.
  • Deployment tool: Helm w/ Loki Chart 5.35.0

Screenshots, Promtail config, or terminal output
For outputs, see above.

Related notes:

We are looking for a simple way to gather bytes that were fetched by each stream in order to have a concept of costs on read path. Ideally, we can just transform the log and use remote write to generate the trends.

Query logs like the following look somehow feasible but it uses human-readable format and would probably require some additional parsing.

level=info ts=2023-11-20T06:53:07.467232981Z caller=metrics.go:159 component=frontend org_id=<redacted> traceID=<redacted> latency=slow query="<redacted>" query_type=metric range_type=range length=6h0m0s start_delta=6h0m10.623222507s end_delta=10.623222703s step=1m0s duration=10.4652657s status=200 limit=1000 returned_lines=0 throughput=16MB total_bytes=166MB ...<omitted>

Therefore, I started to evaluate -frontend.query-stats-enabled option and see how it works.

I checked the source code a bit for fetched_chunks_bytes and it looks like the Stats instance was never updated along the way.

Here the log prints numBytes:

numBytes := stats.LoadFetchedChunkBytes()
// Track stats.
f.querySeconds.WithLabelValues(userID).Add(wallTime.Seconds())
f.querySeries.WithLabelValues(userID).Add(float64(numSeries))
f.queryBytes.WithLabelValues(userID).Add(float64(numBytes))
f.activeUsers.UpdateUserTimestamp(userID, time.Now())
// Log stats.
logMessage := append([]interface{}{
"msg", "query stats",
"component", "query-frontend",
"method", r.Method,
"path", r.URL.Path,
"response_time", queryResponseTime,
"query_wall_time_seconds", wallTime.Seconds(),
"fetched_series_count", numSeries,
"fetched_chunks_bytes", numBytes,

In which LoadFetchedChunkBytes looks fine since the Stats instance is non-nil:

func (s *Stats) LoadFetchedChunkBytes() uint64 {
if s == nil {
return 0
}
return atomic.LoadUint64(&s.FetchedChunkBytes)
}

Because it's initialized here:

if f.cfg.QueryStatsEnabled {
var ctx context.Context
stats, ctx = querier_stats.ContextWithEmptyStats(r.Context())
r = r.WithContext(ctx)
}

And then get called at:

f.reportQueryStats(r, queryString, queryResponseTime, stats)

But the stats object seems never get updated.

These are just some thoughts from my limited go experience, I could be wrong though.

Anyway, if you need more information, please let me know, thanks!

Metadata

Metadata

Assignees

No one assigned

    Labels

    type/bugSomehing is not working as expected

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions