
Calculate size-based flush threshold per topic #14765

Open · wants to merge 1 commit into base: master

Conversation

@itschrispeck (Collaborator) commented Jan 7, 2025

With multi-stream ingestion added, it makes sense to calculate segment flush thresholds per topic rather than at the table level. This patch has been running internally, and its behavior was validated by checking logs for the calculated thresholds.

The metrics should be backwards compatible (at least they are for our metrics system; I'm not familiar with all of them). For example, the gauge essentially changes from numRowsThreshold.tableName to numRowsThreshold.tableName.topicName.
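
To make the naming change concrete, here is a minimal sketch, assuming the gauge keys are plain dot-joined strings (the helper names below are invented for illustration; the actual metric plumbing in Pinot differs):

    // Illustrative only: shows the gauge-key shape before and after this patch.
    public class FlushThresholdGaugeKeys {
      // Before: one gauge per table.
      static String tableGauge(String tableName) {
        return "numRowsThreshold." + tableName;
      }

      // After: one gauge per (table, topic) pair; dashboards that match on the
      // table-name prefix keep working.
      static String topicGauge(String tableName, String topicName) {
        return "numRowsThreshold." + tableName + "." + topicName;
      }

      public static void main(String[] args) {
        System.out.println(tableGauge("myTable"));           // numRowsThreshold.myTable
        System.out.println(topicGauge("myTable", "topicA")); // numRowsThreshold.myTable.topicA
      }
    }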

@Jackie-Jiang (Contributor) left a comment


Can you elaborate more on why we want to separate the size info for each topic? If the data is ingested into the same table, it should have the same schema, and in most cases a similar data distribution. I guess in most cases users still want to track the per-table threshold.
If this is indeed needed, can we make it configurable and add a config flag to enable it?
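
(For illustration, such a flag might be read from the stream config map roughly as sketched below; the property name is invented here and does not exist in this PR or in Pinot.)

    import java.util.Map;

    public class FlushThresholdFlagSketch {
      // Hypothetical property name, for illustration only.
      static final String PER_TOPIC_FLAG = "realtime.segment.flush.threshold.per.topic";

      static boolean perTopicThresholdEnabled(Map<String, String> streamConfigMap) {
        // Default to the existing per-table behavior when the flag is absent.
        return Boolean.parseBoolean(streamConfigMap.getOrDefault(PER_TOPIC_FLAG, "false"));
      }
    }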

@codecov-commenter commented Jan 7, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 63.90%. Comparing base (59551e4) to head (c012d4f).
Report is 1539 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #14765      +/-   ##
============================================
+ Coverage     61.75%   63.90%   +2.15%     
- Complexity      207     1607    +1400     
============================================
  Files          2436     2703     +267     
  Lines        133233   150742   +17509     
  Branches      20636    23289    +2653     
============================================
+ Hits          82274    96339   +14065     
- Misses        44911    47203    +2292     
- Partials       6048     7200    +1152     
Flag                     Coverage Δ
custom-integration1      100.00% <ø> (+99.99%) ⬆️
integration              100.00% <ø> (+99.99%) ⬆️
integration1             100.00% <ø> (+99.99%) ⬆️
integration2               0.00% <ø> (ø)
java-11                   63.83% <100.00%> (+2.12%) ⬆️
java-21                   63.77% <100.00%> (+2.15%) ⬆️
skip-bytebuffers-false    63.90% <100.00%> (+2.15%) ⬆️
skip-bytebuffers-true     63.71% <100.00%> (+35.98%) ⬆️
temurin                   63.90% <100.00%> (+2.15%) ⬆️
unittests                 63.90% <100.00%> (+2.15%) ⬆️
unittests1                56.30% <ø> (+9.41%) ⬆️
unittests2                34.21% <100.00%> (+6.48%) ⬆️

Flags with carried forward coverage won't be shown.


@itschrispeck (Collaborator, Author)

> Can you elaborate more on why we want to separate the size info for each topic? If the data is ingested into the same table, it should have the same schema, and in most cases a similar data distribution. I guess in most cases users still want to track the per-table threshold.

For our multi-stream ingestion use case, data rarely has the same schema or a similar data distribution. For us, topics are logically separate datasets. The current implementation led to many index build failures, e.g. forward index size limits, too many MV values, etc. Beyond build failures, we also saw wild swings in segment sizes as many segments from one topic flushed, and then many segments from another topic flushed.

A concrete example is observability: we have service A and service B emitting to topics A and B, both ingesting into a single table. Generally they do not emit the same shape or volume of logs, which causes inaccurate segment size estimations. Even if they did emit the same shape/size, deployments that change the log fingerprints cannot always happen in sync, so if the threshold were computed at the table level and service A was upgraded before B, there would be a period where segment build failures are likely. Another case is that we turn on debug-level logging temporarily for A, and the same issues happen.

> If this is indeed needed, can we make it configurable and add a config flag to enable it?

What's the downside of leaving this per topic by default? If we make the above assumption, that data should have the same schema and a similar data distribution, then computing the threshold per topic should not be measurably different from computing it per table. If the assumption doesn't hold, then we get better behavior by default. I feel that an extra SegmentSizeBasedFlushThresholdUpdater instance per topic should not be too expensive to hold.
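
As a rough sketch of that bookkeeping (the nested class below is a stand-in for Pinot's SegmentSizeBasedFlushThresholdUpdater, and the map-based wiring is an assumption for illustration, not the code in this PR):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class PerTopicThresholdUpdaters {
      // Stand-in for Pinot's SegmentSizeBasedFlushThresholdUpdater; the real class
      // accumulates segment-size statistics and derives the next row threshold.
      static class SegmentSizeBasedFlushThresholdUpdater {
      }

      private final Map<String, SegmentSizeBasedFlushThresholdUpdater> _updaterByTopic =
          new ConcurrentHashMap<>();

      // One updater per topic: each keeps its own size statistics, so one topic's
      // data shape cannot skew another topic's computed threshold.
      SegmentSizeBasedFlushThresholdUpdater getOrCreate(String topicName) {
        return _updaterByTopic.computeIfAbsent(topicName,
            k -> new SegmentSizeBasedFlushThresholdUpdater());
      }
    }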
