querier: adjust requested limit parameter for downstream requests to series endpoint #10652

narqo · 2025-02-14T15:47:43Z

What this PR does

This is a follow-up to #10620

Here I'm updating MetricsForLabelMatchers to adjust the limit parameter passed to the downstream ingesters, based on the replication sets. The idea, suggested by @pracucci, is to approximate the limit as:

L / (<shard-size> / <replication-factor>)

where L is the limit in the original request. Note that for ingest storage, the formula above is equivalent to L / <partition-shard-size>.
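A minimal Go sketch of the approximation, assuming a hypothetical helper `adjustLimit` (not the function added in this PR). It treats a limit of 0 as "no limit", rounds up so each ingester is asked for at least its share, and models the ingest storage case by passing a replication factor of 1, so the formula reduces to L / <partition-shard-size>.

```go
package main

import "fmt"

// adjustLimit is a hypothetical helper approximating the per-ingester limit
// for a series request whose overall limit is `limit`, following
// L / (shardSize / replicationFactor).
// A limit of 0 means "no limit" and is passed through unchanged.
func adjustLimit(limit, shardSize, replicationFactor int) int {
	if limit <= 0 || shardSize <= 0 || replicationFactor <= 0 {
		return limit
	}
	// Number of ingesters holding distinct series within one replica (zone).
	shardsPerReplica := shardSize / replicationFactor
	if shardsPerReplica <= 1 {
		// Nothing to split: a single shard must be able to return all L series.
		return limit
	}
	// Ceiling division, so the per-ingester limit is never rounded down to 0.
	return (limit + shardsPerReplica - 1) / shardsPerReplica
}

func main() {
	// Classic Mimir: 6 ingesters in the tenant shard across 3 zones, 2 per zone.
	fmt.Println(adjustLimit(100, 6, 3)) // 50
	// Ingest storage: 4 partitions, each owned by one ingester (RF modeled as 1).
	fmt.Println(adjustLimit(100, 4, 1)) // 25
	// No limit requested.
	fmt.Println(adjustLimit(0, 6, 3)) // 0
}
```

Rounding up is a choice made for this sketch; the result remains an approximation, because series are not perfectly balanced across shards.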

Checklist

Tests updated.
Documentation added.
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
about-versioning.md updated with experimental features.

Signed-off-by: Vladimir Varankin <[email protected]>


pracucci · 2025-02-17T08:01:34Z

pkg/distributor/distributor.go

+	if d.cfg.IngestStorageConfig.Enabled {
+		// When ingest storage is enabled, each partition, represented by one replication set is owned by only one ingester.
+		// So we use the number of replication sets to count the number of shards.
+		shardSize = len(replicationSets)


len(replicationSets) includes both ACTIVE and INACTIVE partitions. To be more accurate, I think we should only consider ACTIVE partitions (the shard size on the write path, where series are written, is made only of ACTIVE partitions). It may not be a big deal: if we decide to not address it, then I would at least leave a comment here about it.
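To illustrate the difference, a sketch with hypothetical types `partition` and `partitionState` standing in for the partition ring's data (they are not Mimir's actual types): `len(replicationSets)` would count every partition queried on the read path, while the write-path shard size counts only ACTIVE partitions.

```go
package main

import "fmt"

// partitionState is a hypothetical stand-in for the partition ring's state
// enum; in this sketch only ACTIVE partitions receive writes.
type partitionState int

const (
	partitionPending partitionState = iota
	partitionActive
	partitionInactive
)

// partition is a hypothetical, simplified view of one partition in the ring.
type partition struct {
	id    int32
	state partitionState
}

// activePartitionCount returns how many partitions are ACTIVE, i.e. the
// partitions that make up the shard on the write path.
func activePartitionCount(parts []partition) int {
	n := 0
	for _, p := range parts {
		if p.state == partitionActive {
			n++
		}
	}
	return n
}

func main() {
	parts := []partition{
		{id: 0, state: partitionActive},
		{id: 1, state: partitionActive},
		{id: 2, state: partitionInactive}, // still queried on the read path
	}
	fmt.Println(len(parts))                  // 3: every partition in the read-path replication sets
	fmt.Println(activePartitionCount(parts)) // 2: the write-path shard size
}
```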

pracucci · 2025-02-17T08:05:08Z

pkg/distributor/distributor.go

+	} else if len(replicationSets) == 1 {
+		// We expect to always have exactly 1 replication set when ingest storage is disabled.
+		// In classic Mimir the total number of shards (ingestion-tenant-shard-size) is the number of ingesters in the shard across all zones.
+		shardSize = len(replicationSets[0].Instances) / d.ingestersRing.ReplicationFactor()


Similar comment about read-only instances here. The number of instances is also higher than the real shard size when the "lookback" threshold triggers (e.g. for 12h after a scale up): we look up more instances than the tenant's actual shard. I think what we really want here is the shard size as computed on the write path, not the read path. It would be more accurate to compute the actual shard size as the minimum of the configured tenant shard size and the number of writeable instances/partitions in the ring.
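The suggested computation can be sketched with a hypothetical helper `writePathShardSize` that takes the configured tenant shard size and the number of writeable instances or partitions as plain integers; how those numbers are obtained from the ring is left out. Treating a configured size of 0 as "shuffle sharding disabled" is an assumption of this sketch.

```go
package main

import "fmt"

// writePathShardSize is a hypothetical helper approximating the tenant's
// shard size as the write path sees it: the configured shard size, capped by
// the number of instances (or partitions) that can currently accept writes.
// Assumption: a configured size of 0 means shuffle sharding is disabled, so
// every writeable instance belongs to the shard.
func writePathShardSize(configured, writeable int) int {
	if configured <= 0 || configured > writeable {
		return writeable
	}
	return configured
}

func main() {
	// Configured shard size 6 with 12 writeable instances: the lookback window
	// may make the read path query more than 6, but writes go to 6.
	fmt.Println(writePathShardSize(6, 12)) // 6
	// Fewer writeable instances than configured (e.g. some are read-only).
	fmt.Println(writePathShardSize(6, 4)) // 4
	// Shuffle sharding disabled.
	fmt.Println(writePathShardSize(0, 12)) // 12
}
```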

I've updated the PR. The idea makes sense to me, although I'm not fully sure whether I've picked the right ring methods to calculate what you suggested. PTAL

Signed-off-by: Vladimir Varankin <[email protected]>

narqo requested review from pracucci and dimitarvdimitrov February 14, 2025 15:47

narqo changed the title ~~distributor: adjust requested limit parameter for downstream requests for series endpoint~~ querier: adjust requested limit parameter for downstream requests for series endpoint Feb 14, 2025

narqo added 3 commits February 14, 2025 20:56

querier: log hints limit in debug span

dff3080

Signed-off-by: Vladimir Varankin <[email protected]>

distributor: adjust request limit for replication sets

fee9e94

Signed-off-by: Vladimir Varankin <[email protected]>

update changelog

4d7ffd2

Signed-off-by: Vladimir Varankin <[email protected]>

narqo force-pushed the vldmr/series-api-limit branch from b17685e to 4d7ffd2 February 14, 2025 19:56

narqo changed the title ~~querier: adjust requested limit parameter for downstream requests for series endpoint~~ querier: adjust requested limit parameter for downstream requests to series endpoint Feb 14, 2025

pracucci reviewed Feb 17, 2025


narqo added 3 commits February 17, 2025 15:33

only take into account active writers

92e193d

Signed-off-by: Vladimir Varankin <[email protected]>

more comments

e9b53f0

Signed-off-by: Vladimir Varankin <[email protected]>

fix comments

d23ffc1

Signed-off-by: Vladimir Varankin <[email protected]>

narqo marked this pull request as ready for review February 17, 2025 20:28

narqo requested a review from a team as a code owner February 17, 2025 20:28

narqo requested a review from pracucci February 17, 2025 20:28

add tests

4bfdff4

Signed-off-by: Vladimir Varankin <[email protected]>

narqo force-pushed the vldmr/series-api-limit branch from 8ae53eb to 4bfdff4 February 17, 2025 20:33
