Move metrics rollup percentile computation to PostgreSQL #1927
Optimize Hourly Rollups Using PostgreSQL-Native Percentile Computation
Performance & Stability Improvements
This PR addresses issue #1810 by improving the performance, memory efficiency, and concurrency safety of the hourly metrics rollup pipeline.
When running on PostgreSQL, hourly rollups now use database-native percentile functions (percentile_cont) to compute p50 / p95 / p99. This removes unnecessary Python-side computation, reduces CPU overhead, and consolidates rollups into a single grouped aggregation query. For non-Postgres databases, the existing Python-based percentile logic is preserved to maintain compatibility.
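As a rough sketch of what the single grouped aggregation could look like, assuming SQLAlchemy-style query construction (the metric_events table and its column names are placeholders, not the project's actual schema):

```python
from sqlalchemy import func, select

def build_hourly_percentile_query(metric_events):
    """One grouped aggregation computing p50/p95/p99 in the database."""
    return select(
        metric_events.c.metric_name,
        metric_events.c.bucket_start,
        func.percentile_cont(0.5).within_group(metric_events.c.value).label("p50"),
        func.percentile_cont(0.95).within_group(metric_events.c.value).label("p95"),
        func.percentile_cont(0.99).within_group(metric_events.c.value).label("p99"),
    ).group_by(
        metric_events.c.metric_name,
        metric_events.c.bucket_start,
    )
```

This keeps all percentile math in PostgreSQL and returns one row per metric/bucket, rather than pulling raw samples into Python.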
To prevent high memory usage and potential OOM crashes, aggregation queries now stream results in configurable batches using YIELD_BATCH_SIZE (default: 1000, env-configurable), instead of materializing the full result set in memory. This keeps memory usage predictable and stable under high traffic volumes.
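A minimal illustration of the batched streaming, assuming a SQLAlchemy session with yield_per support; the helper name and query variable here are hypothetical, only YIELD_BATCH_SIZE comes from the PR:

```python
import os

YIELD_BATCH_SIZE = int(os.environ.get("YIELD_BATCH_SIZE", "1000"))

def iter_rollup_rows(session, query):
    """Stream aggregation results in batches instead of materializing them."""
    result = session.execute(
        query.execution_options(yield_per=YIELD_BATCH_SIZE)
    )
    for partition in result.partitions():
        for row in partition:
            yield row
```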
Rollup writes are now fully dialect-aware via _upsert_rollup (see the sketch after this list):
- PostgreSQL and SQLite use native UPSERT semantics.
- Other dialects fall back to a safe Python insert/update flow with explicit race-condition handling.
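A hedged sketch of how such a dialect-aware upsert can be structured with SQLAlchemy's dialect-specific insert constructs; the rollups table, its conflict columns, and the session handling are assumptions for illustration, not the actual _upsert_rollup implementation:

```python
from sqlalchemy.dialects import postgresql, sqlite
from sqlalchemy.exc import IntegrityError

def upsert_rollup_sketch(session, rollups_table, row):
    """row is a dict of column values, keyed on (metric_name, bucket_start)."""
    key_cols = ("metric_name", "bucket_start")
    updates = {k: v for k, v in row.items() if k not in key_cols}
    dialect = session.get_bind().dialect.name

    if dialect in ("postgresql", "sqlite"):
        # Native UPSERT: INSERT ... ON CONFLICT DO UPDATE.
        insert_fn = postgresql.insert if dialect == "postgresql" else sqlite.insert
        stmt = insert_fn(rollups_table).values(**row).on_conflict_do_update(
            index_elements=list(key_cols),
            set_=updates,
        )
        session.execute(stmt)
    else:
        # Fallback: attempt the insert, and if a concurrent writer already
        # created the row, catch the constraint violation and update instead.
        try:
            with session.begin_nested():
                session.execute(rollups_table.insert().values(**row))
        except IntegrityError:
            session.execute(
                rollups_table.update()
                .where(rollups_table.c.metric_name == row["metric_name"])
                .where(rollups_table.c.bucket_start == row["bucket_start"])
                .values(**updates)
            )
```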
These changes prevent duplicate rows, race conditions, memory spikes, and intermittent rollup failures under concurrent load.