Overview
Split the 59 application tables across up to 4 specialized PostgreSQL instances. The secondary databases are all optional - by default everything runs in a single database, and workloads can be offloaded progressively as needed.
| Database | Purpose | Tables | Write Volume | Optional |
|---|---|---|---|---|
| DATABASE_URL | Core OLTP | 36 (or all 59) | Low-Medium | Required |
| METRICS_DATABASE_URL | Execution metrics | 14 | Very High | Yes |
| LOGS_DATABASE_URL | Audit & compliance | 4 | High | Yes |
| OBSERVABILITY_DATABASE_URL | Distributed tracing | 5 | Very High | Yes |
Deployment Scenarios
| Scenario | Configuration | Tables in CORE |
|---|---|---|
| Simple (default) | Only DATABASE_URL | All 59 |
| Offload Metrics | + METRICS_DATABASE_URL | 45 |
| Offload Logs | + LOGS_DATABASE_URL | 55 |
| Offload Observability | + OBSERVABILITY_DATABASE_URL | 54 |
| Offload Metrics + Logs | + both | 41 |
| Full Split | All 4 URLs | 36 |
Architecture Diagram
┌─────────────────┐
┌───►│ postgres:5432 │ (CORE)
│ │ database: mcp │
┌─────────────┐ ┌────────────────┐ │ └─────────────────┘
│ Gateway │───►│ PgBouncer │───┤ ┌─────────────────┐
│ Workers │ │ :6432 │ ├───►│ metrics-pg:5432 │ (METRICS) [optional]
└─────────────┘ │ │ │ │ db: mcp_metrics │
│ Routes by DB │ │ └─────────────────┘
│ name in URL │ │ ┌─────────────────┐
└────────────────┘ ├───►│ logs-pg:5432 │ (LOGS) [optional]
│ │ db: mcp_logs │
│ └─────────────────┘
│ ┌─────────────────┐
└───►│ obs-pg:5432 │ (OBSERVABILITY) [optional]
│ db: mcp_obs │
└─────────────────┘
PgBouncer Configuration
The edoburu/pgbouncer image supports multiple databases via the DATABASE_URLS environment variable (comma-separated).
Minimal Setup (Single Database)
# docker-compose.yml - Everything in one database
pgbouncer:
image: edoburu/pgbouncer:latest
environment:
- DATABASE_URL=postgres://postgres:${POSTGRES_PASSWORD}@postgres:5432/mcp
- LISTEN_PORT=6432
- POOL_MODE=transaction
- MAX_CLIENT_CONN=5000
- DEFAULT_POOL_SIZE=600
- MAX_DB_CONNECTIONS=700
Full Split (4 Databases)
# docker-compose.yml - All databases through single PgBouncer
pgbouncer:
image: edoburu/pgbouncer:latest
environment:
# Multiple databases via comma-separated URLs
- DATABASE_URLS=postgres://postgres:${POSTGRES_PASSWORD}@postgres:5432/mcp,postgres://postgres:${POSTGRES_PASSWORD}@metrics-pg:5432/mcp_metrics,postgres://postgres:${POSTGRES_PASSWORD}@logs-pg:5432/mcp_logs,postgres://postgres:${POSTGRES_PASSWORD}@obs-pg:5432/mcp_obs
- LISTEN_PORT=6432
- POOL_MODE=transaction
- MAX_CLIENT_CONN=5000
- DEFAULT_POOL_SIZE=600
- MAX_DB_CONNECTIONS=700
- MIN_POOL_SIZE=100
- RESERVE_POOL_SIZE=150
- SERVER_RESET_QUERY=DISCARD ALL
- SERVER_RESET_QUERY_ALWAYS=1
depends_on:
postgres:
condition: service_healthy
metrics-pg:
condition: service_healthy
logs-pg:
condition: service_healthy
obs-pg:
condition: service_healthy
This generates the following PgBouncer config:
[databases]
mcp = host=postgres port=5432 auth_user=postgres
mcp_metrics = host=metrics-pg port=5432 auth_user=postgres
mcp_logs = host=logs-pg port=5432 auth_user=postgres
mcp_obs = host=obs-pg port=5432 auth_user=postgres
[pgbouncer]
pool_mode = transaction
max_client_conn = 5000
default_pool_size = 600
max_db_connections = 700
...
Gateway Configuration
All URLs Through PgBouncer
# Gateway environment - all connections pooled
DATABASE_URL=postgresql+psycopg://postgres:${POSTGRES_PASSWORD}@pgbouncer:6432/mcp
METRICS_DATABASE_URL=postgresql+psycopg://postgres:${POSTGRES_PASSWORD}@pgbouncer:6432/mcp_metrics
LOGS_DATABASE_URL=postgresql+psycopg://postgres:${POSTGRES_PASSWORD}@pgbouncer:6432/mcp_logs
OBSERVABILITY_DATABASE_URL=postgresql+psycopg://postgres:${POSTGRES_PASSWORD}@pgbouncer:6432/mcp_obs
Fallback Behavior
When optional URLs are not set, the application uses DATABASE_URL for those tables:
# Minimal - everything in one DB (current behavior, fully backward compatible)
DATABASE_URL=postgresql+psycopg://postgres:pass@pgbouncer:6432/mcp
# Partial offload - only metrics separate
DATABASE_URL=postgresql+psycopg://postgres:pass@pgbouncer:6432/mcp
METRICS_DATABASE_URL=postgresql+psycopg://postgres:pass@pgbouncer:6432/mcp_metrics
# LOGS_DATABASE_URL not set → uses DATABASE_URL
# OBSERVABILITY_DATABASE_URL not set → uses DATABASE_URL
# Offload metrics and logs, keep observability in core
DATABASE_URL=postgresql+psycopg://postgres:pass@pgbouncer:6432/mcp
METRICS_DATABASE_URL=postgresql+psycopg://postgres:pass@pgbouncer:6432/mcp_metrics
LOGS_DATABASE_URL=postgresql+psycopg://postgres:pass@pgbouncer:6432/mcp_logs
# OBSERVABILITY_DATABASE_URL not set → uses DATABASE_URL
Database 1: CORE (DATABASE_URL) - Required
Purpose: Transactional data requiring ACID consistency and complex joins.
Tables (36)
ENTITIES (9)
├── tools
├── servers
├── resources
├── prompts
├── gateways
├── a2a_agents
├── grpc_services
├── llm_models
└── llm_providers
AUTH/IDENTITY (15)
├── email_users
├── email_teams
├── email_team_members
├── email_team_member_history
├── email_team_invitations
├── email_team_join_requests
├── email_api_tokens
├── email_auth_events # Auth events stay in CORE (not LOGS)
├── roles
├── user_roles
├── pending_user_approvals
├── token_revocations
├── sso_providers
├── sso_auth_sessions
└── registered_oauth_clients
OAUTH (2)
├── oauth_states
└── oauth_tokens
ASSOCIATIONS (4)
├── server_tool_association
├── server_resource_association
├── server_prompt_association
└── server_a2a_association
OTHER (6)
├── mcp_sessions
├── mcp_messages
├── resource_subscriptions
├── global_config
├── alembic_version
└── toolops_test_cases
PostgreSQL Tuning
postgres:
image: postgres:18
command:
- "-c"
- "max_connections=800"
- "-c"
- "shared_buffers=512MB"
- "-c"
- "synchronous_commit=on" # Data integrity required
- "-c"
- "wal_level=replica" # Support replication
- "-c"
- "checkpoint_completion_target=0.9"
- "-c"
- "random_page_cost=1.1" # SSD optimized
deploy:
resources:
limits:
cpus: '4'
memory: 8G
Database 2: METRICS (METRICS_DATABASE_URL) - Optional
Purpose: High-volume execution metrics with time-based aggregations.
Tables (14)
ENTITY METRICS - Raw (5) - denormalized with entity name
├── tool_metrics → tool_id + tool_name (soft reference, no FK)
├── server_metrics → server_id + server_name (soft reference, no FK)
├── resource_metrics → resource_id + resource_name (soft reference, no FK)
├── prompt_metrics → prompt_id + prompt_name (soft reference, no FK)
└── a2a_agent_metrics → a2a_agent_id + agent_name (soft reference, no FK)
ENTITY METRICS - Hourly Rollups (5) - denormalized, no FK
├── tool_metrics_hourly (tool_id + tool_name, no FK)
├── server_metrics_hourly (server_id + server_name, no FK)
├── resource_metrics_hourly (resource_id + resource_name, no FK)
├── prompt_metrics_hourly (prompt_id + prompt_name, no FK)
└── a2a_agent_metrics_hourly (a2a_agent_id + agent_name, no FK)
PERFORMANCE (3)
├── performance_metrics
├── performance_snapshots
└── performance_aggregates
LLM USAGE (1)
└── token_usage_logs
FK Strategy
Both raw and hourly metrics tables require FK removal for the database split:
Raw metrics tables use soft references (a VARCHAR ID without an FK constraint):
- Allows entity deletion without CASCADE
- Metrics history is preserved even if the entity is removed
- No cross-database FK needed
- NEW: Include a denormalized {entity}_name column captured at write time
Hourly rollup tables also require FK removal:
- Currently have a nullable FK to the entity tables (with ON DELETE SET NULL)
- For the split: remove the FK constraint, keep the soft reference (ID as VARCHAR)
- Already have a denormalized {entity}_name column - no schema change needed
- Self-sufficient for reporting without joins to the CORE database (see the migration sketch below)
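A rough sketch of what the corresponding Alembic migration could look like for one entity pair (illustrative only - the constraint names and the referenced tools table are assumptions; confirm the real names in your schema before dropping anything):

```python
# Alembic migration sketch - repeat per entity (tools, servers, resources, prompts, a2a agents).
# Constraint names below are assumptions; verify them against the live schema first.
from alembic import op
import sqlalchemy as sa

def upgrade():
    # Drop the FKs so metrics rows only soft-reference CORE entities
    op.drop_constraint("tool_metrics_tool_id_fkey", "tool_metrics", type_="foreignkey")
    op.drop_constraint("tool_metrics_hourly_tool_id_fkey", "tool_metrics_hourly", type_="foreignkey")
    # Add the denormalized name to the raw table (hourly already has one); backfill with a placeholder
    op.add_column(
        "tool_metrics",
        sa.Column("tool_name", sa.String(255), nullable=False, server_default="unknown"),
    )

def downgrade():
    op.drop_column("tool_metrics", "tool_name")
    op.create_foreign_key("tool_metrics_tool_id_fkey", "tool_metrics", "tools", ["tool_id"], ["id"])
    op.create_foreign_key(
        "tool_metrics_hourly_tool_id_fkey", "tool_metrics_hourly", "tools",
        ["tool_id"], ["id"], ondelete="SET NULL",
    )
```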
Entity Name Capture (Denormalization Strategy)
When is entity_name captured?
The entity name is captured at write time when recording a metric, not during rollup:
# In tool_service.py when recording a metric (using dependency injection)
async def record_tool_execution(
self,
tool_id: str,
response_time: float,
success: bool,
db: Session = Depends(get_db), # CORE DB for tool lookup
metrics_db: Session = Depends(get_metrics_db) # METRICS DB for write
):
# Lookup tool name from CORE DB (or cache)
tool = db.query(Tool).filter(Tool.id == tool_id).first()
tool_name = tool.name if tool else "unknown"
# Write to METRICS DB with denormalized name
metric = ToolMetric(
tool_id=tool_id,
tool_name=tool_name, # Captured at write time
timestamp=datetime.now(UTC),
response_time=response_time,
is_success=success,
)
metrics_db.add(metric)
metrics_db.commit()
Schema change required:
class ToolMetric(Base):
__tablename__ = "tool_metrics"
tool_id: Mapped[str] = mapped_column(String(36), nullable=False, index=True)
tool_name: Mapped[str] = mapped_column(String(255), nullable=False) # NEW
timestamp: Mapped[datetime] = ...
How are entity renames handled?
- Raw metrics: Capture the name at write time. Historical metrics show the name at the time of execution.
- Hourly rollups: Inherit the name from the raw metrics, so each rollup shows the name at the time the metric was recorded.
- Dashboard queries: Use the denormalized {entity}_name column (e.g., tool_name, server_name) directly - no join to CORE needed (see the query sketch below).
- Trade-off: After a rename, historical data shows the old name. This is intentional for audit accuracy.
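For example, a dashboard aggregation can be served entirely from the METRICS database because the name travels with every row. A minimal sketch using the ToolMetric model outlined above (the helper name is illustrative; the id column comes from the partitioning DDL below):

```python
from sqlalchemy import func
from sqlalchemy.orm import Session

def top_tools_by_calls(metrics_db: Session, limit: int = 10):
    """Rank tools by invocation count using only the METRICS database (no join to CORE)."""
    rows = (
        metrics_db.query(
            ToolMetric.tool_id,
            ToolMetric.tool_name,  # denormalized name captured at write time
            func.count(ToolMetric.id).label("calls"),
            func.avg(ToolMetric.response_time).label("avg_response_time"),
        )
        .group_by(ToolMetric.tool_id, ToolMetric.tool_name)
        .order_by(func.count(ToolMetric.id).desc())
        .limit(limit)
        .all()
    )
    return [dict(row._mapping) for row in rows]
```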
If you need current names in reports, add a lookup cache:
# Cache entity_id → current_name mapping, refresh every 5 min
ENTITY_NAME_CACHE: Dict[str, str] = {}
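A minimal sketch of such a cache, assuming the SessionLocal factory and Tool model used elsewhere in this issue (the helper name and TTL are illustrative):

```python
import time
from typing import Dict

_NAME_CACHE: Dict[str, str] = {}
_CACHE_LOADED_AT: float = float("-inf")
_CACHE_TTL_SECONDS = 300  # refresh every 5 minutes

def current_tool_name(tool_id: str) -> str:
    """Return the current tool name, refreshing the cache from the CORE DB when stale."""
    global _CACHE_LOADED_AT
    if time.monotonic() - _CACHE_LOADED_AT > _CACHE_TTL_SECONDS:
        with SessionLocal() as db:  # CORE database session (bound to the primary engine)
            _NAME_CACHE.clear()
            _NAME_CACHE.update({row.id: row.name for row in db.query(Tool.id, Tool.name).all()})
        _CACHE_LOADED_AT = time.monotonic()
    return _NAME_CACHE.get(tool_id, "unknown")
```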
PostgreSQL Tuning
metrics-pg:
image: postgres:18
environment:
- POSTGRES_DB=mcp_metrics
- POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
command:
- "-c"
- "max_connections=200"
- "-c"
- "shared_buffers=256MB"
- "-c"
- "synchronous_commit=off" # Async WAL for speed
- "-c"
- "wal_level=minimal" # No replication needed
- "-c"
- "max_wal_senders=0" # Disable replication
- "-c"
- "checkpoint_timeout=15min" # Less frequent checkpoints
- "-c"
- "max_wal_size=4GB" # Buffer more WAL
- "-c"
- "work_mem=32MB" # Larger for aggregations
deploy:
resources:
limits:
cpus: '2'
memory: 4G
Partitioning (Recommended)
-- Partition raw metrics by week for easy retention management
CREATE TABLE tool_metrics (
id SERIAL,
tool_id VARCHAR(36) NOT NULL,
tool_name VARCHAR(255) NOT NULL, -- Denormalized entity name
timestamp TIMESTAMPTZ NOT NULL,
response_time DOUBLE PRECISION,
is_success BOOLEAN
) PARTITION BY RANGE (timestamp);
-- Create index on tool_id for lookups
CREATE INDEX idx_tool_metrics_tool_id ON tool_metrics (tool_id);
CREATE TABLE tool_metrics_2026_w01 PARTITION OF tool_metrics
FOR VALUES FROM ('2026-01-01') TO ('2026-01-08');
-- Drop old partitions for retention
DROP TABLE tool_metrics_2025_w40; -- Drop data older than 90 days
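Partitions must exist before rows arrive, so a scheduled task should create the upcoming week's partition ahead of time. A hedged sketch against the metrics engine (the function and naming scheme are illustrative, not existing code):

```python
from datetime import date, timedelta
from sqlalchemy import text

def ensure_next_week_partition(metrics_engine) -> None:
    """Create next ISO week's tool_metrics partition if it does not exist yet."""
    today = date.today()
    next_monday = today + timedelta(days=7 - today.weekday())
    week_after = next_monday + timedelta(days=7)
    iso_year, iso_week, _ = next_monday.isocalendar()
    partition = f"tool_metrics_{iso_year}_w{iso_week:02d}"  # e.g. tool_metrics_2026_w01
    with metrics_engine.begin() as conn:
        # DDL cannot take bind parameters, so the (generated) dates are inlined
        conn.execute(text(
            f"CREATE TABLE IF NOT EXISTS {partition} PARTITION OF tool_metrics "
            f"FOR VALUES FROM ('{next_monday.isoformat()}') TO ('{week_after.isoformat()}')"
        ))
```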
Database 3: LOGS (LOGS_DATABASE_URL) - Optional
Purpose: Compliance audit trails and application logs.
Durability: Configurable via LOGS_DB_DURABLE environment variable.
- true (default): Full durability (synchronous_commit=on) for SOC2/HIPAA compliance
- false: Relaxed durability (synchronous_commit=off) for development/non-compliance environments
Tables (4)
AUDIT (2) - No FK
├── audit_trails # SOC2/HIPAA compliance
└── permission_audit_log # RBAC changes
STRUCTURED LOGS (2) - Internal FK only
├── structured_log_entries # Application logs
└── security_events # FK → structured_log_entries (internal)
Note: email_auth_events remains in the CORE database with the other auth tables.
FK Strategy
- security_events.log_entry_id → structured_log_entries.id (internal FK, stays valid)
- No external dependencies - the entire cluster is self-contained
PostgreSQL Tuning
Choose the appropriate configuration based on your durability requirements:
Production / Compliance (default):
logs-pg:
image: postgres:18
environment:
- POSTGRES_DB=mcp_logs
- POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
command:
- "-c"
- "max_connections=100"
- "-c"
- "shared_buffers=128MB"
- "-c"
- "synchronous_commit=on" # Full durability
- "-c"
- "wal_level=replica" # Supports backup/replication
- "-c"
- "wal_compression=on"
- "-c"
- "autovacuum_vacuum_scale_factor=0.1"
- "-c"
- "checkpoint_completion_target=0.9"
deploy:
resources:
limits:
cpus: '1'
memory: 2G
Development / Non-compliance (set LOGS_DB_DURABLE=false):
logs-pg:
command:
# ... same base settings, but with relaxed durability:
- "-c"
- "synchronous_commit=off" # Faster writes, risk of loss
- "-c"
- "wal_level=minimal" # No replication supportNote:
LOGS_DB_DURABLEis an application-level setting. When set tofalse, the application accepts that audit data may be lost on crash. The PostgreSQL container must be configured to match (use the development config above).
Retention Policy
-- Delete logs older than 90 days (configurable)
DELETE FROM structured_log_entries
WHERE timestamp < NOW() - INTERVAL '90 days';
-- Keep audit trails longer (365 days for compliance)
DELETE FROM audit_trails
WHERE timestamp < NOW() - INTERVAL '365 days';
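These statements could be run from a scheduled task in the application using the LOGS session factory defined in db.py below; a minimal sketch (the function name is illustrative, and the retention windows should ultimately come from configuration):

```python
from sqlalchemy import text

def purge_expired_logs(log_retention_days: int = 90, audit_retention_days: int = 365) -> None:
    """Apply the retention policy against the LOGS database (falls back to CORE if not split)."""
    with LogsSessionLocal() as logs_db:
        logs_db.execute(
            text("DELETE FROM structured_log_entries WHERE timestamp < NOW() - make_interval(days => :days)"),
            {"days": log_retention_days},
        )
        logs_db.execute(
            text("DELETE FROM audit_trails WHERE timestamp < NOW() - make_interval(days => :days)"),
            {"days": audit_retention_days},
        )
        logs_db.commit()
```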
Database 4: OBSERVABILITY (OBSERVABILITY_DATABASE_URL) - Optional
Purpose: Distributed tracing with OpenTelemetry-style spans.
Tables (5)
TRACES - Self-contained FK cluster
├── observability_traces # Root (no FK)
│ ├── observability_spans # FK → traces, FK → spans (self-ref)
│ │ └── observability_events # FK → spans
│ └── observability_metrics # FK → traces
└── observability_saved_queries # No FK (user queries)
FK Strategy
All FKs are internal to this database - the cluster is self-contained:
- spans.trace_id → traces.trace_id
- spans.parent_span_id → spans.span_id
- events.span_id → spans.span_id
- metrics.trace_id → traces.trace_id
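Because every FK stays inside this database, a complete trace can be reconstructed from a single OBSERVABILITY session. A rough sketch (the ObservabilitySpan model name is an assumption; its columns mirror the span table listed above):

```python
from collections import defaultdict
from sqlalchemy.orm import Session

def build_span_tree(obs_db: Session, trace_id: str) -> list:
    """Load all spans of one trace from the OBSERVABILITY DB and nest them by parent_span_id."""
    spans = obs_db.query(ObservabilitySpan).filter(ObservabilitySpan.trace_id == trace_id).all()
    children = defaultdict(list)
    for span in spans:
        children[span.parent_span_id].append(span)

    def expand(span) -> dict:
        return {"span_id": span.span_id, "children": [expand(c) for c in children[span.span_id]]}

    # Root spans are those with no parent inside the trace
    return [expand(s) for s in spans if s.parent_span_id is None]
```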
PostgreSQL Tuning
obs-pg:
image: postgres:18
environment:
- POSTGRES_DB=mcp_obs
- POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
command:
- "-c"
- "max_connections=100"
- "-c"
- "shared_buffers=256MB"
- "-c"
- "synchronous_commit=off"
- "-c"
- "wal_level=minimal"
- "-c"
- "max_wal_senders=0"
- "-c"
- "work_mem=64MB" # Large for trace reconstruction
- "-c"
- "effective_cache_size=1GB" # Cache hot traces
deploy:
resources:
limits:
cpus: '2'
memory: 4G
Alternative: TimescaleDB
For high-volume tracing, consider TimescaleDB:
obs-pg:
image: timescale/timescaledb:latest-pg16
# Automatic time-based partitioning
# Better compression for time-series
# Continuous aggregates for dashboards
Application Configuration
config.py
class Settings(BaseSettings):
# Primary database (required)
database_url: str = Field(
default="sqlite:///./mcp.db",
description="Primary database for core entities and auth"
)
# Secondary databases (optional - fall back to database_url if not set)
metrics_database_url: Optional[str] = Field(
default=None,
description="Database for execution metrics. Falls back to database_url."
)
logs_database_url: Optional[str] = Field(
default=None,
description="Database for audit/compliance logs. Falls back to database_url."
)
observability_database_url: Optional[str] = Field(
default=None,
description="Database for distributed tracing. Falls back to database_url."
)
# LOGS database durability setting
logs_db_durable: bool = Field(
default=True,
description="Enable full durability for LOGS database (synchronous_commit=on). "
"Set to False for development/non-compliance environments."
)
Table Routing (db.py)
# Which database each table belongs to
METRICS_TABLES = {
'tool_metrics', 'tool_metrics_hourly',
'server_metrics', 'server_metrics_hourly',
'resource_metrics', 'resource_metrics_hourly',
'prompt_metrics', 'prompt_metrics_hourly',
'a2a_agent_metrics', 'a2a_agent_metrics_hourly',
'performance_metrics', 'performance_snapshots', 'performance_aggregates',
'token_usage_logs',
}
LOGS_TABLES = {
'audit_trails', 'permission_audit_log',
'structured_log_entries', 'security_events',
} # Note: email_auth_events stays in CORE with other auth tables
OBSERVABILITY_TABLES = {
'observability_traces', 'observability_spans',
'observability_events', 'observability_metrics',
'observability_saved_queries',
}
# Create engines (optional DBs fall back to primary)
engine = create_engine(settings.database_url, ...)
metrics_engine = (
create_engine(settings.metrics_database_url, ...)
if settings.metrics_database_url else engine
)
logs_engine = (
create_engine(settings.logs_database_url, ...)
if settings.logs_database_url else engine
)
observability_engine = (
create_engine(settings.observability_database_url, ...)
if settings.observability_database_url else engine
)
# Session factories per database
SessionLocal = sessionmaker(bind=engine)
MetricsSessionLocal = sessionmaker(bind=metrics_engine)
LogsSessionLocal = sessionmaker(bind=logs_engine)
ObservabilitySessionLocal = sessionmaker(bind=observability_engine)
def get_db():
"""Get CORE database session."""
db = SessionLocal()
try:
yield db
finally:
db.close()
def get_metrics_db():
"""Get METRICS database session (falls back to CORE if not configured)."""
db = MetricsSessionLocal()
try:
yield db
finally:
db.close()
def get_logs_db():
"""Get LOGS database session (falls back to CORE if not configured)."""
db = LogsSessionLocal()
try:
yield db
finally:
db.close()
def get_observability_db():
"""Get OBSERVABILITY database session (falls back to CORE if not configured)."""
db = ObservabilitySessionLocal()
try:
yield db
finally:
db.close()
Avoiding Cross-Database Queries
Critical: Never join tables across databases. Each service must use the correct session:
# WRONG - Uses CORE session for metrics table
async def get_tool_metrics(db: Session = Depends(get_db)):
return db.query(ToolMetric).all() # Will fail if metrics in separate DB!
# CORRECT - Uses METRICS session
async def get_tool_metrics(metrics_db: Session = Depends(get_metrics_db)):
return metrics_db.query(ToolMetric).all()
# CORRECT - Cross-DB lookup via separate sessions
async def get_tool_with_metrics(
tool_id: str,
db: Session = Depends(get_db),
metrics_db: Session = Depends(get_metrics_db)
):
tool = db.query(Tool).filter(Tool.id == tool_id).first() # CORE DB
metrics = metrics_db.query(ToolMetric).filter(ToolMetric.tool_id == tool_id).all() # METRICS DB
return {"tool": tool, "metrics": metrics}Complete docker-compose.yml Example
version: "3.8"
services:
# ==========================================================================
# Gateway Application
# ==========================================================================
gateway:
image: mcpgateway:latest
environment:
# All database URLs go through PgBouncer
- DATABASE_URL=postgresql+psycopg://postgres:${POSTGRES_PASSWORD}@pgbouncer:6432/mcp
- METRICS_DATABASE_URL=postgresql+psycopg://postgres:${POSTGRES_PASSWORD}@pgbouncer:6432/mcp_metrics
- LOGS_DATABASE_URL=postgresql+psycopg://postgres:${POSTGRES_PASSWORD}@pgbouncer:6432/mcp_logs
- OBSERVABILITY_DATABASE_URL=postgresql+psycopg://postgres:${POSTGRES_PASSWORD}@pgbouncer:6432/mcp_obs
# ... other settings
depends_on:
pgbouncer:
condition: service_healthy
# ==========================================================================
# PgBouncer - Connection Pooler (routes to all databases)
# ==========================================================================
pgbouncer:
image: edoburu/pgbouncer:latest
environment:
# Multiple databases via comma-separated URLs
- DATABASE_URLS=postgres://postgres:${POSTGRES_PASSWORD}@postgres:5432/mcp,postgres://postgres:${POSTGRES_PASSWORD}@metrics-pg:5432/mcp_metrics,postgres://postgres:${POSTGRES_PASSWORD}@logs-pg:5432/mcp_logs,postgres://postgres:${POSTGRES_PASSWORD}@obs-pg:5432/mcp_obs
- LISTEN_PORT=6432
- POOL_MODE=transaction
- MAX_CLIENT_CONN=5000
- DEFAULT_POOL_SIZE=600
- MAX_DB_CONNECTIONS=700
- MIN_POOL_SIZE=100
- RESERVE_POOL_SIZE=150
- RESERVE_POOL_TIMEOUT=2
- SERVER_RESET_QUERY=DISCARD ALL
- SERVER_RESET_QUERY_ALWAYS=1
- SERVER_LIFETIME=3600
- SERVER_IDLE_TIMEOUT=600
- QUERY_WAIT_TIMEOUT=60
- CLIENT_IDLE_TIMEOUT=300
- IDLE_TRANSACTION_TIMEOUT=300
ports:
- "6432:6432"
depends_on:
postgres:
condition: service_healthy
metrics-pg:
condition: service_healthy
logs-pg:
condition: service_healthy
obs-pg:
condition: service_healthy
healthcheck:
test: ["CMD", "pg_isready", "-h", "localhost", "-p", "6432"]
interval: 10s
timeout: 5s
retries: 5
# ==========================================================================
# Primary Database - CORE (OLTP)
# ==========================================================================
postgres:
image: postgres:18
environment:
- POSTGRES_DB=mcp
- POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
command:
- "-c"
- "max_connections=800"
- "-c"
- "shared_buffers=512MB"
- "-c"
- "synchronous_commit=on"
- "-c"
- "wal_level=replica"
- "-c"
- "checkpoint_completion_target=0.9"
volumes:
- pgdata:/var/lib/postgresql/data
deploy:
resources:
limits:
cpus: '4'
memory: 8G
healthcheck:
test: ["CMD-SHELL", "pg_isready -U postgres -d mcp"]
interval: 10s
timeout: 5s
retries: 5
# ==========================================================================
# Metrics Database - High Write Volume (OLAP)
# ==========================================================================
metrics-pg:
image: postgres:18
environment:
- POSTGRES_DB=mcp_metrics
- POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
command:
- "-c"
- "max_connections=200"
- "-c"
- "shared_buffers=256MB"
- "-c"
- "synchronous_commit=off"
- "-c"
- "wal_level=minimal"
- "-c"
- "max_wal_senders=0"
- "-c"
- "checkpoint_timeout=15min"
- "-c"
- "max_wal_size=4GB"
volumes:
- metricsdata:/var/lib/postgresql/data
deploy:
resources:
limits:
cpus: '2'
memory: 4G
healthcheck:
test: ["CMD-SHELL", "pg_isready -U postgres -d mcp_metrics"]
interval: 10s
timeout: 5s
retries: 5
# ==========================================================================
# Logs Database - Audit & Compliance (FULL DURABILITY REQUIRED)
# ==========================================================================
logs-pg:
image: postgres:18
environment:
- POSTGRES_DB=mcp_logs
- POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
command:
- "-c"
- "max_connections=100"
- "-c"
- "shared_buffers=128MB"
- "-c"
- "synchronous_commit=on" # REQUIRED for audit compliance
- "-c"
- "wal_level=replica" # Allow backup/replication
- "-c"
- "wal_compression=on"
- "-c"
- "checkpoint_completion_target=0.9"
volumes:
- logsdata:/var/lib/postgresql/data
deploy:
resources:
limits:
cpus: '1'
memory: 2G
healthcheck:
test: ["CMD-SHELL", "pg_isready -U postgres -d mcp_logs"]
interval: 10s
timeout: 5s
retries: 5
# ==========================================================================
# Observability Database - Distributed Tracing
# ==========================================================================
obs-pg:
image: postgres:18
environment:
- POSTGRES_DB=mcp_obs
- POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
command:
- "-c"
- "max_connections=100"
- "-c"
- "shared_buffers=256MB"
- "-c"
- "synchronous_commit=off"
- "-c"
- "wal_level=minimal"
- "-c"
- "max_wal_senders=0"
- "-c"
- "work_mem=64MB"
volumes:
- obsdata:/var/lib/postgresql/data
deploy:
resources:
limits:
cpus: '2'
memory: 4G
healthcheck:
test: ["CMD-SHELL", "pg_isready -U postgres -d mcp_obs"]
interval: 10s
timeout: 5s
retries: 5
volumes:
pgdata:
metricsdata:
logsdata:
obsdata:
Expected Impact
Primary Database Relief
| Metric | Before | After (Full Split) |
|---|---|---|
| Tables | 59 | 36 |
| Write I/O | 100% | ~25% |
| CPU Usage | 99% | ~50-60% |
| Connection Pressure | High | Medium |
Workload Isolation
| Database | Optimized For | sync_commit | Durability |
|---|---|---|---|
| CORE | Low-latency reads, complex joins | ON | Full |
| METRICS | High-throughput writes, aggregations | OFF | Relaxed (re-computable) |
| LOGS | Append-only, compliance audit | Configurable | Configurable (default: full) |
| OBSERVABILITY | Time-series, trace reconstruction | OFF | Relaxed (ephemeral) |
LOGS durability: Set LOGS_DB_DURABLE=false for development. Production compliance environments should use the default (true).
Implementation Requirements
Schema Changes (Required Before Multi-DB Support)
These changes enable the database split and must be applied via Alembic migration:
1. Remove FK constraints from both raw and hourly metrics tables:
   - Raw metrics: tool_metrics, server_metrics, resource_metrics, prompt_metrics, a2a_agent_metrics
   - Hourly rollups: tool_metrics_hourly, server_metrics_hourly, resource_metrics_hourly, prompt_metrics_hourly, a2a_agent_metrics_hourly
2. Add a {entity}_name column to the raw metrics tables for denormalization (the hourly tables already have it)
3. Update the service layer to use the correct session factory per table group
New Deployment Setup
For new installations with multi-database support:
1. Configure environment variables:
   # Database URLs
   DATABASE_URL=postgresql+psycopg://...@pgbouncer:6432/mcp
   METRICS_DATABASE_URL=postgresql+psycopg://...@pgbouncer:6432/mcp_metrics        # Optional
   LOGS_DATABASE_URL=postgresql+psycopg://...@pgbouncer:6432/mcp_logs              # Optional
   OBSERVABILITY_DATABASE_URL=postgresql+psycopg://...@pgbouncer:6432/mcp_obs      # Optional
   # LOGS durability (default: true for compliance; set false for dev only)
   LOGS_DB_DURABLE=true
2. Deploy the database containers (see the docker-compose example above)
3. Run Alembic migrations on each configured database:
   # Core (always)
   alembic upgrade head
   # Secondary databases (if configured)
   ALEMBIC_TARGET_DB=metrics DATABASE_URL=$METRICS_DATABASE_URL alembic upgrade head
   ALEMBIC_TARGET_DB=logs DATABASE_URL=$LOGS_DATABASE_URL alembic upgrade head
   ALEMBIC_TARGET_DB=observability DATABASE_URL=$OBSERVABILITY_DATABASE_URL alembic upgrade head
4. Start the application - routing happens automatically based on the configured URLs
Existing installations: If enabling multi-database on an existing single-database deployment, historical data for metrics/logs/observability remains in the CORE database and won't be visible in the new secondary databases. See Appendix: Migrating Existing Single-DB Installations for data migration steps.
Alembic Per-Database Targeting
Use include_object filter to scope migrations per database:
# alembic/env.py
import os
METRICS_TABLES = {'tool_metrics', 'tool_metrics_hourly', ...} # Full list
LOGS_TABLES = {'audit_trails', 'permission_audit_log', ...}
OBSERVABILITY_TABLES = {'observability_traces', ...}
def include_object(object, name, type_, reflected, compare_to):
"""Filter tables based on ALEMBIC_TARGET_DB environment variable."""
if type_ != "table":
return True
target_db = os.environ.get("ALEMBIC_TARGET_DB", "core")
if target_db == "metrics":
return name in METRICS_TABLES
elif target_db == "logs":
return name in LOGS_TABLES
elif target_db == "observability":
return name in OBSERVABILITY_TABLES
else: # "core" or unset
return name not in (METRICS_TABLES | LOGS_TABLES | OBSERVABILITY_TABLES)
def run_migrations_online():
# ... existing code ...
context.configure(
connection=connection,
target_metadata=target_metadata,
include_object=include_object, # Add this
# ...
)
Usage:
# Migrate CORE database (default)
alembic upgrade head
# Migrate METRICS database
ALEMBIC_TARGET_DB=metrics DATABASE_URL=$METRICS_DATABASE_URL alembic upgrade head
# Migrate LOGS database
ALEMBIC_TARGET_DB=logs DATABASE_URL=$LOGS_DATABASE_URL alembic upgrade head
# Migrate OBSERVABILITY database
ALEMBIC_TARGET_DB=observability DATABASE_URL=$OBSERVABILITY_DATABASE_URL alembic upgrade head
Appendix: Migrating Existing Single-DB Installations
Note: This section is reference documentation for administrators who want to migrate an existing single-database deployment to multi-database. This is not part of the core feature implementation.
Migration Steps (Admin Reference)
1. Backup the existing database:
   pg_dump -h postgres -p 5432 -U postgres -d mcp > backup.sql
2. Deploy the secondary database containers
3. Run Alembic migrations on the secondary databases:
   ALEMBIC_TARGET_DB=metrics DATABASE_URL=$METRICS_DATABASE_URL alembic upgrade head
   # Repeat for logs, observability
4. Copy data from primary to secondary (connect directly, not via PgBouncer):
   # Example for METRICS tables
   pg_dump -h postgres -p 5432 -U postgres -d mcp \
     --data-only \
     -t tool_metrics -t tool_metrics_hourly \
     -t server_metrics -t server_metrics_hourly \
     -t resource_metrics -t resource_metrics_hourly \
     -t prompt_metrics -t prompt_metrics_hourly \
     -t a2a_agent_metrics -t a2a_agent_metrics_hourly \
     -t performance_metrics -t performance_snapshots -t performance_aggregates \
     -t token_usage_logs \
     | psql -h metrics-pg -p 5432 -U postgres -d mcp_metrics
5. Stop the application, configure the URLs, restart
6. Verify data integrity (compare row counts - see the sketch below)
7. (Optional) Drop the migrated tables from the primary after verification
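Step 6 can be scripted. A rough sketch that compares row counts between the primary and the new metrics database (the helper name is illustrative; extend the table list to cover all 14 METRICS tables):

```python
from sqlalchemy import create_engine, text

MIGRATED_TABLES = ["tool_metrics", "tool_metrics_hourly", "server_metrics", "server_metrics_hourly"]
# ... extend with the remaining METRICS tables

def compare_row_counts(primary_url: str, metrics_url: str) -> None:
    """Print row counts for each migrated table in both databases."""
    with create_engine(primary_url).connect() as src, create_engine(metrics_url).connect() as dst:
        for table in MIGRATED_TABLES:
            src_count = src.execute(text(f"SELECT COUNT(*) FROM {table}")).scalar()
            dst_count = dst.execute(text(f"SELECT COUNT(*) FROM {table}")).scalar()
            status = "OK" if src_count == dst_count else "MISMATCH"
            print(f"{table}: primary={src_count} metrics={dst_count} [{status}]")
```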
Reverting to Single-DB
If migrated tables still exist in primary: Simply unset the optional URLs:
# Only DATABASE_URL configured = single-database mode
DATABASE_URL=postgresql+psycopg://postgres:pass@pgbouncer:6432/mcp
# METRICS_DATABASE_URL= (unset)
# LOGS_DATABASE_URL= (unset)
# OBSERVABILITY_DATABASE_URL= (unset)
The application falls back to using DATABASE_URL for all tables.
If migrated tables were dropped from primary: You must restore data first:
# 1. Copy data from secondary back to primary
pg_dump -h metrics-pg -p 5432 -U postgres -d mcp_metrics --data-only \
| psql -h postgres -p 5432 -U postgres -d mcp
# 2. Then unset URLs and restart
Warning: If you dropped tables from the primary without a backup and the secondary databases are unavailable, data recovery is not possible. Always maintain backups before dropping tables.