Summary
Admin metrics rollup tables stay empty during the benchmark window because load tests run for less than 1 hour, so admin metrics queries fall back to scanning the raw metrics tables. This increases DB load and admin metrics latency.
Evidence
Rollup table row counts during the benchmark:
- tool_metrics_hourly: 0
- resource_metrics_hourly: 0
- prompt_metrics_hourly: 0
- server_metrics_hourly: 0
- a2a_agent_metrics_hourly: 0
After a manual rollup (hours_back=24):
- tool_metrics_hourly: 5
- resource_metrics_hourly: 4
- prompt_metrics_hourly: 3
- server_metrics_hourly: 0
- a2a_agent_metrics_hourly: 0
Rollup summary:
- total_records_aggregated=1,569,843
- rollups_updated=12 (created=0)
- duration_seconds=13.12
Raw data window observed:
- tool_metrics min_ts=2026-01-06 17:19:56
- tool_metrics max_ts=2026-01-06 18:06:12
Note: the current load test runs for less than 1 hour, so rollups do not cover the benchmark window and combined raw+rollup queries still scan the raw metrics.
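To make the timing argument concrete, here is a minimal Python sketch that checks which clock-hour buckets the observed raw data window overlaps and whether any bucket lies entirely inside it. The helper names `hour_buckets` and `complete_hours` are hypothetical and not part of the codebase, and the sketch assumes the scheduled rollup only aggregates completed clock hours, which this issue does not confirm.

```python
from datetime import datetime, timedelta


def hour_buckets(start: datetime, end: datetime) -> list[datetime]:
    """Return the start of every clock hour that overlaps [start, end]."""
    bucket = start.replace(minute=0, second=0, microsecond=0)
    buckets = []
    while bucket <= end:
        buckets.append(bucket)
        bucket += timedelta(hours=1)
    return buckets


def complete_hours(start: datetime, end: datetime) -> list[datetime]:
    """Return only the hour buckets that lie entirely inside [start, end]."""
    one_hour = timedelta(hours=1)
    return [
        b for b in hour_buckets(start, end)
        if b >= start and b + one_hour <= end
    ]


# Timestamps taken from the raw data window reported above.
min_ts = datetime(2026, 1, 6, 17, 19, 56)
max_ts = datetime(2026, 1, 6, 18, 6, 12)

window = max_ts - min_ts                 # length of the benchmark window
touched = hour_buckets(min_ts, max_ts)   # 17:00 and 18:00 buckets
full = complete_hours(min_ts, max_ts)    # no bucket is fully covered

print(window, len(touched), len(full))   # prints "0:46:16 2 0"
```

The window is about 46 minutes long and straddles the 18:00 boundary, so it touches two hourly buckets without filling either one. Under the stated assumption, an hourly job would have nothing complete to aggregate until 19:00.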
Impact
- Admin metrics pages trigger expensive raw scans.
- Increased DB CPU and higher /admin* latency under load.
Potential fixes
- Ensure a rollup job runs on hour boundaries (or more frequently) during tests.
- Run a manual rollup before benchmarking if the test window is <1 hour.
- Increase metrics cache TTLs so the admin UI does not recompute heavy metrics on every request.
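The first option could be approached by computing the next boundary a rollup job should fire on. The sketch below is one possible way to do that; `next_boundary` and its `interval_minutes` parameter are hypothetical names for illustration, not existing settings or functions in the project.

```python
from datetime import datetime, timedelta


def next_boundary(now: datetime, interval_minutes: int = 60) -> datetime:
    """Return the first interval boundary strictly after `now`.

    `interval_minutes` must be a positive divisor of 60 (60, 30, 15, 5, ...)
    so that boundaries line up with the top of each hour.
    """
    if interval_minutes <= 0 or 60 % interval_minutes != 0:
        raise ValueError("interval_minutes must be a positive divisor of 60")
    hour_start = now.replace(minute=0, second=0, microsecond=0)
    step = timedelta(minutes=interval_minutes)
    # Number of whole intervals already elapsed in the current hour.
    periods_done = (now - hour_start) // step
    return hour_start + (periods_done + 1) * step


# Example: a benchmark that starts at the first raw timestamp in this issue.
start = datetime(2026, 1, 6, 17, 19, 56)
hourly = next_boundary(start)         # 2026-01-06 18:00:00
quarter = next_boundary(start, 15)    # 2026-01-06 17:30:00
wait_seconds = (hourly - start).total_seconds()  # 2404.0
```

With an hourly interval the first run would land roughly 40 minutes into this benchmark, while a 15-minute interval would trigger about 10 minutes in. Whether the rollup itself can aggregate partial hours is a separate question that this sketch does not address.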
Validation
- Rollup tables contain data for the benchmark window once an hour boundary has passed.
- Admin metrics queries use rollups and show reduced total time in pg_stat_statements.
- /admin metrics endpoints show lower p95 latency under load.
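For the p95 check, a nearest-rank percentile is one simple way to compare latency samples collected before and after a change. The sketch below is illustrative only: the `percentile` helper is hypothetical and the sample values are synthetic placeholders, not measurements from this benchmark.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Synthetic request latencies in milliseconds (placeholder values).
latencies_ms = [12.0, 15.0, 11.0, 250.0, 14.0, 13.0, 16.0, 18.0, 17.0, 19.0]

p50 = percentile(latencies_ms, 50)   # 15.0
p95 = percentile(latencies_ms, 95)   # 250.0
```

Running the same calculation on samples from a run with empty rollup tables and a run with populated ones would give the two p95 values to compare.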