Merged
4 changes: 4 additions & 0 deletions .env.example
@@ -100,6 +100,10 @@ DB_PREPARE_THRESHOLD=5
# Recommended: true for PostgreSQL production deployments, auto-detected for SQLite
USE_POSTGRESDB_PERCENTILES=true

# The number of rows fetched from the database at a time when streaming results,
# to limit memory usage and avoid loading all rows into RAM at once.
YIELD_BATCH_SIZE=1000

# Cache Backend Configuration
# Options: database (default), memory (in-process), redis (distributed)
# - database: Uses SQLite/PostgreSQL for persistence (good for single-node)
2 changes: 2 additions & 0 deletions README.md
@@ -2042,6 +2042,8 @@ Automatic management of metrics data to prevent unbounded table growth and maint
| `METRICS_ROLLUP_LATE_DATA_HOURS` | Hours to re-process for late-arriving data | `1` | 1-48 |
| `METRICS_DELETE_RAW_AFTER_ROLLUP` | Delete raw metrics after rollup exists | `true` | bool |
| `METRICS_DELETE_RAW_AFTER_ROLLUP_HOURS` | Hours to retain raw when rollup exists | `1` | 1-8760 |
| `USE_POSTGRESDB_PERCENTILES` | Use PostgreSQL-native percentile_cont for p50/p95/p99 | `true` | bool |
| `YIELD_BATCH_SIZE` | Rows per batch when streaming rollup queries | `1000` | 100-100000 |

**Key Features:**
- 📊 **Hourly rollup**: Pre-aggregated summaries with p50/p95/p99 percentiles
2 changes: 2 additions & 0 deletions charts/mcp-stack/values.yaml
@@ -361,6 +361,8 @@ mcpContextForge:
METRICS_ROLLUP_LATE_DATA_HOURS: "1" # hours to re-process for late-arriving data (1-48)
METRICS_DELETE_RAW_AFTER_ROLLUP: "true" # delete raw metrics after hourly rollup exists
METRICS_DELETE_RAW_AFTER_ROLLUP_HOURS: "1" # hours to retain raw when rollup exists (1-8760)
USE_POSTGRESDB_PERCENTILES: "true" # use PostgreSQL-native percentile_cont for p50/p95/p99
YIELD_BATCH_SIZE: "1000" # rows per batch when streaming rollup queries (100-100000)

# ─ Transports ─
TRANSPORT_TYPE: all # comma-separated list: http, ws, sse, stdio, all
11 changes: 11 additions & 0 deletions docs/docs/manage/configuration.md
@@ -1123,6 +1123,17 @@ METRICS_DELETE_RAW_AFTER_ROLLUP=true # Delete raw after rollup (default: true)
METRICS_DELETE_RAW_AFTER_ROLLUP_HOURS=1 # Hours before deletion (default: 1)
```

**Performance Optimization (PostgreSQL):**

```bash
USE_POSTGRESDB_PERCENTILES=true # Use PostgreSQL-native percentile_cont (default: true)
YIELD_BATCH_SIZE=1000 # Rows per batch for streaming queries (default: 1000)
```

When `USE_POSTGRESDB_PERCENTILES=true` (the default), the gateway uses PostgreSQL's native `percentile_cont()` for p50/p95/p99 calculations, which is 5-10x faster than computing percentiles in Python. On SQLite, or when the setting is disabled, it falls back to Python-based linear interpolation.
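The Python fallback path uses linear interpolation, the same method `percentile_cont()` applies. A minimal stand-alone sketch (the `percentile` helper below is illustrative, not the gateway's actual implementation):

```python
def percentile(sorted_values, q):
    """Linear-interpolation percentile over an already-sorted list.

    This is the same interpolation rule PostgreSQL's percentile_cont()
    uses, so the Python fallback produces matching p50/p95/p99 values.
    """
    if not sorted_values:
        return None
    # Fractional rank of the q-quantile within the sorted data.
    k = (len(sorted_values) - 1) * q
    lo = int(k)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = k - lo
    # Interpolate between the two neighboring observations.
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac

latencies = sorted([120.0, 85.0, 200.0, 95.0, 150.0])
p50 = percentile(latencies, 0.50)  # 120.0
p95 = percentile(latencies, 0.95)  # 190.0
```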

`YIELD_BATCH_SIZE` controls memory usage by streaming query results in batches instead of loading all rows into RAM at once.
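The batching pattern can be sketched with a plain DB-API cursor and `fetchmany()`; the `stream_rows` helper and the demo table below are illustrative, not the gateway's code:

```python
import sqlite3

YIELD_BATCH_SIZE = 1000  # mirrors the env var's default

def stream_rows(conn, query, batch_size=YIELD_BATCH_SIZE):
    """Yield rows one at a time, fetching them from the database in
    fixed-size batches so at most batch_size rows are resident at once."""
    cur = conn.execute(query)
    while True:
        batch = cur.fetchmany(batch_size)
        if not batch:
            break
        yield from batch

# Demo: an in-memory table with 2500 rows streams through unchanged.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (latency REAL)")
conn.executemany("INSERT INTO metrics VALUES (?)",
                 [(float(i),) for i in range(2500)])
total = sum(1 for _ in stream_rows(conn, "SELECT latency FROM metrics"))
```

Smaller batch sizes mean more round-trips but a lower peak memory footprint, which is the trade-off the setting exposes.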

#### Configuration Examples

**Default (recommended for most deployments):**
9 changes: 9 additions & 0 deletions mcpgateway/config.py
@@ -998,6 +998,15 @@ def _parse_allowed_origins(cls, v: Any) -> Set[str]:
metrics_aggregation_backfill_hours: int = Field(default=6, ge=0, le=168, description="Hours of structured logs to backfill into performance metrics on startup")
metrics_aggregation_window_minutes: int = Field(default=5, description="Time window for metrics aggregation (minutes)")
metrics_aggregation_auto_start: bool = Field(default=False, description="Automatically run the log aggregation loop on application startup")
yield_batch_size: int = Field(
default=1000,
ge=100,
le=100000,
description="Number of rows fetched per batch when streaming hourly metric data from the database. "
"Used to limit memory usage during aggregation and percentile calculations. "
"Smaller values reduce memory footprint but increase DB round-trips; larger values improve throughput "
"at the cost of higher memory usage.",
)
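Like the gateway's other settings, the field is populated from the `YIELD_BATCH_SIZE` environment variable and range-checked by the `ge=100, le=100000` constraints. A dependency-free sketch of the equivalent parsing (the `load_yield_batch_size` function is hypothetical; the real validation is done by Pydantic):

```python
import os

def load_yield_batch_size(default=1000, lo=100, hi=100000):
    """Read YIELD_BATCH_SIZE from the environment and enforce the same
    bounds as the Field(ge=100, le=100000) declaration above."""
    value = int(os.environ.get("YIELD_BATCH_SIZE", str(default)))
    if not (lo <= value <= hi):
        raise ValueError(
            f"YIELD_BATCH_SIZE must be in [{lo}, {hi}], got {value}")
    return value
```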

# Execution Metrics Recording
# Controls whether tool/resource/prompt/server/A2A execution metrics are written to the database.