TS0713 (Collaborator) commented Jan 6, 2026

Optimize Hourly Rollups Using PostgreSQL-Native Percentile Computation

Performance & Stability Improvements

This PR addresses issue #1810 by improving the performance, memory efficiency, and concurrency safety of the hourly metrics rollup pipeline.

When running on PostgreSQL, hourly rollups now use the database-native percentile_cont function to compute p50, p95, and p99. This moves percentile computation out of Python, reduces application-side CPU overhead, and consolidates each rollup into a single grouped aggregation query. For non-Postgres databases, the existing Python-based percentile logic is preserved for compatibility.
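For the non-Postgres path to agree with PostgreSQL, the Python fallback has to follow the same definition as percentile_cont: sort the group, place fraction p at the 0-based position p * (n - 1), and interpolate linearly between the two neighbouring values. The sketch below assumes that definition. The function name and the query's table and column names (tool_metrics, tool_id, recorded_at, response_time) are hypothetical, not taken from the PR, and the :start/:end placeholders assume SQLAlchemy text() bind syntax.

```python
import math
from typing import Optional, Sequence

# Hypothetical shape of the PostgreSQL rollup: one grouped query computes all
# three percentiles per tool per hour. Names are illustrative, not the PR's.
PG_ROLLUP_SQL = """
SELECT tool_id,
       date_trunc('hour', recorded_at) AS hour_start,
       count(*) AS total_count,
       percentile_cont(0.50) WITHIN GROUP (ORDER BY response_time) AS p50,
       percentile_cont(0.95) WITHIN GROUP (ORDER BY response_time) AS p95,
       percentile_cont(0.99) WITHIN GROUP (ORDER BY response_time) AS p99
FROM tool_metrics
WHERE recorded_at >= :start AND recorded_at < :end
GROUP BY tool_id, hour_start
"""


def percentile_cont(values: Sequence[float], fraction: float) -> Optional[float]:
    """Python fallback using the same linear interpolation as SQL percentile_cont."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if not values:
        return None  # mirrors SQL NULL for an empty group
    ordered = sorted(values)
    pos = fraction * (len(ordered) - 1)  # 0-based fractional rank
    lower, upper = math.floor(pos), math.ceil(pos)
    if lower == upper:
        return float(ordered[lower])
    weight = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight
```

For example, percentile_cont([10.0, 20.0, 30.0, 40.0], 0.5) lands halfway between 20 and 30 and returns 25.0, which is also what PostgreSQL's percentile_cont(0.5) returns for the same four values.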

To prevent high memory usage and potential out-of-memory (OOM) crashes, aggregation queries now stream results in batches of YIELD_BATCH_SIZE rows (default: 1000, configurable via environment variable) instead of materializing the full result set in memory. This keeps memory usage predictable and stable under high traffic.
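The batching idea can be sketched as a generator that holds at most one batch of rows at a time. YIELD_BATCH_SIZE and its default of 1000 come from the PR; the helper names and the use of stdlib sqlite3 with cursor.fetchmany are assumptions chosen so the sketch runs standalone. With PostgreSQL drivers, a plain client-side cursor may still buffer the whole result, so bounded memory there also depends on a server-side cursor (in SQLAlchemy, what the stream_results / yield_per execution options request); whether the PR uses those options is not stated here.

```python
import os
import sqlite3
from typing import Iterator, Optional, Sequence, Tuple

DEFAULT_YIELD_BATCH_SIZE = 1000


def get_yield_batch_size() -> int:
    """Read YIELD_BATCH_SIZE from the environment; fall back to 1000 if unset or invalid."""
    raw = os.environ.get("YIELD_BATCH_SIZE", "")
    try:
        size = int(raw)
    except ValueError:
        return DEFAULT_YIELD_BATCH_SIZE
    return size if size > 0 else DEFAULT_YIELD_BATCH_SIZE


def stream_rows(
    conn: sqlite3.Connection,
    sql: str,
    params: Sequence = (),
    batch_size: Optional[int] = None,
) -> Iterator[Tuple]:
    """Yield rows one at a time while holding at most one batch in memory."""
    size = batch_size or get_yield_batch_size()
    cursor = conn.execute(sql, params)
    try:
        while True:
            batch = cursor.fetchmany(size)
            if not batch:
                break
            yield from batch
    finally:
        cursor.close()  # release the cursor even if the consumer stops early
```

A caller iterates with `for row in stream_rows(conn, "SELECT ...")` and aggregates incrementally, so peak memory tracks the batch size rather than the number of matching rows.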

Rollup writes are now fully dialect-aware via _upsert_rollup:

- PostgreSQL and SQLite use native UPSERT semantics.
- Other dialects fall back to a safe Python insert/update flow with explicit race-condition handling.
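PostgreSQL (9.5+) and SQLite (3.24+) both accept INSERT ... ON CONFLICT ... DO UPDATE, which turns the write into one atomic statement: if two workers roll up the same hour, the second one updates the existing row instead of inserting a duplicate. The sketch below uses stdlib sqlite3 so it runs anywhere; the hourly_rollups table, its columns, and the function name are hypothetical, and it is not the PR's _upsert_rollup.

```python
import sqlite3

# Hypothetical rollup table; ON CONFLICT needs a UNIQUE constraint on the key.
SCHEMA = """
CREATE TABLE IF NOT EXISTS hourly_rollups (
    tool_id     TEXT NOT NULL,
    hour_start  TEXT NOT NULL,
    total_count INTEGER NOT NULL,
    p50 REAL, p95 REAL, p99 REAL,
    UNIQUE (tool_id, hour_start)
)
"""

# The ON CONFLICT ... DO UPDATE clause reads the same on PostgreSQL and SQLite;
# only the parameter placeholder style (? here) differs between drivers.
UPSERT_SQL = """
INSERT INTO hourly_rollups (tool_id, hour_start, total_count, p50, p95, p99)
VALUES (?, ?, ?, ?, ?, ?)
ON CONFLICT (tool_id, hour_start) DO UPDATE SET
    total_count = excluded.total_count,
    p50 = excluded.p50,
    p95 = excluded.p95,
    p99 = excluded.p99
"""


def upsert_rollup_native(conn: sqlite3.Connection, row: tuple) -> None:
    """Insert the rollup row, or overwrite the existing row for the same key, atomically."""
    conn.execute(UPSERT_SQL, row)


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
upsert_rollup_native(conn, ("tool-a", "2026-01-13T11:00", 10, 50.0, 90.0, 99.0))
upsert_rollup_native(conn, ("tool-a", "2026-01-13T11:00", 12, 55.0, 92.0, 98.0))
```

After the two calls the table holds a single row for ("tool-a", "2026-01-13T11:00") carrying the second set of values.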

These changes prevent duplicate rows, race conditions, memory spikes, and intermittent rollup failures under concurrent load.

TS0713 force-pushed the issue-1810-metrics-percentiles-sql branch 2 times, most recently from d196d32 to 57e0991 on January 13, 2026 01:34
TS0713 marked this pull request as ready for review January 13, 2026 01:35
TS0713 requested a review from crivetimihai as a code owner January 13, 2026 01:35
crivetimihai self-assigned this Jan 13, 2026
crivetimihai force-pushed the issue-1810-metrics-percentiles-sql branch 3 times, most recently from 198536f to fc5009a on January 13, 2026 11:29
- Add PostgreSQL-native percentile_cont for p50/p95/p99 calculations
- Implement dialect-aware UPSERT with PostgreSQL/SQLite native support
- Add configurable YIELD_BATCH_SIZE for streaming query results
- Fallback to Python-based percentile for non-PostgreSQL databases
- Use savepoint for fallback upsert to avoid full transaction rollback
- Document new settings in README, Helm chart, and configuration docs

Closes #1810

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
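The savepoint bullet above concerns the non-native fallback. On PostgreSQL, an error aborts the whole surrounding transaction unless the failing statement ran inside a savepoint, so recovering from a duplicate-key INSERT without discarding other work requires ROLLBACK TO SAVEPOINT. Below is a minimal sketch of that insert-then-update pattern, using stdlib sqlite3 so it runs standalone; the table, column, savepoint, and function names are hypothetical, and this is not the code merged in the PR.

```python
import sqlite3

# Hypothetical statements for a rollup table with UNIQUE (tool_id, hour_start).
INSERT_SQL = (
    "INSERT INTO hourly_rollups (tool_id, hour_start, total_count, p50, p95, p99) "
    "VALUES (?, ?, ?, ?, ?, ?)"
)
UPDATE_SQL = (
    "UPDATE hourly_rollups SET total_count = ?, p50 = ?, p95 = ?, p99 = ? "
    "WHERE tool_id = ? AND hour_start = ?"
)


def upsert_rollup_fallback(conn: sqlite3.Connection, row: tuple) -> None:
    """Insert-then-update inside a savepoint, for dialects without native UPSERT.

    If the key already exists (for example, a concurrent writer inserted it
    first), the INSERT raises IntegrityError. Rolling back to the savepoint
    discards only this attempt, keeping the caller's transaction intact, and
    the existing row is updated instead.
    """
    tool_id, hour_start, total_count, p50, p95, p99 = row
    conn.execute("SAVEPOINT rollup_upsert")
    try:
        conn.execute(INSERT_SQL, row)
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK TO SAVEPOINT rollup_upsert")
        conn.execute(UPDATE_SQL, (total_count, p50, p95, p99, tool_id, hour_start))
    conn.execute("RELEASE SAVEPOINT rollup_upsert")
```

The caller owns the outer transaction (BEGIN ... COMMIT); the savepoint confines a conflict to the single rollup row being written.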
crivetimihai force-pushed the issue-1810-metrics-percentiles-sql branch from fc5009a to 573ed9f on January 13, 2026 11:44

crivetimihai (Member) left a comment:
Rebased, fixed

crivetimihai merged commit 0ca66e0 into main Jan 13, 2026
52 checks passed
crivetimihai deleted the issue-1810-metrics-percentiles-sql branch January 13, 2026 12:06
3 participants