Description
This analysis of the MCP Gateway plugin framework (mcpgateway/plugins) identifies blocking synchronous operations, inefficient data handling, and optimization opportunities that could impact request latency and throughput under load.
Critical Performance Issues
1. Blocking Database Operations in Plugin Execution Path
Location: mcpgateway/plugins/framework/manager.py:312-353
Severity: HIGH
Issue: The observability instrumentation in _execute_with_timeout() performs synchronous database operations within the async plugin execution path:
    db = SessionLocal()  # Synchronous DB session creation
    try:
        service = ObservabilityService()
        span_id = service.start_span(db=db, ...)  # Synchronous DB write
        result = await asyncio.wait_for(hook_ref.hook(payload, context), timeout=self.timeout)
        service.end_span(db=db, span_id=span_id, ...)  # Synchronous DB write
    finally:
        db.close()
Impact:
- Blocks the async event loop during every plugin execution
- Database I/O latency directly adds to plugin execution time
- Under high load, this creates a severe bottleneck as all plugin executions serialize on database operations
- Each plugin hook invocation incurs 2+ database round trips (start_span, end_span)
Recommendation:
- Convert ObservabilityService to use async database operations with AsyncSession
- Use connection pooling to avoid session creation overhead
- Consider batching observability writes or using an async queue
- Make observability instrumentation optional or configurable to bypass in performance-critical paths
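The async-queue recommendation could look roughly like the sketch below. It is a minimal illustration, not the gateway's code: SpanQueue and write_spans_sync are hypothetical names, and the database write is simulated with time.sleep(). The hot path only enqueues a record, while a background task flushes batches in a worker thread via asyncio.to_thread().

```python
import asyncio
import contextlib
import time


def write_spans_sync(batch: list[dict], sink: list[dict]) -> None:
    """Hypothetical stand-in for a synchronous span write (e.g. a DB INSERT)."""
    time.sleep(0.01)  # simulate blocking database I/O
    sink.extend(batch)


class SpanQueue:
    """Buffers span records and flushes them in batches off the event loop."""

    def __init__(self, sink: list[dict], batch_size: int = 50) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()
        self._sink = sink
        self._batch_size = batch_size

    def record(self, span: dict) -> None:
        # Non-blocking: the plugin execution path only enqueues.
        self._queue.put_nowait(span)

    async def run(self) -> None:
        while True:
            batch = [await self._queue.get()]
            while len(batch) < self._batch_size and not self._queue.empty():
                batch.append(self._queue.get_nowait())
            try:
                # The blocking write runs in a worker thread.
                await asyncio.to_thread(write_spans_sync, batch, self._sink)
            finally:
                for _ in batch:
                    self._queue.task_done()

    async def drain(self) -> None:
        # Wait until every recorded span has been flushed.
        await self._queue.join()


async def demo() -> list[dict]:
    sink: list[dict] = []
    spans = SpanQueue(sink)
    worker = asyncio.create_task(spans.run())
    for i in range(120):
        spans.record({"span": i})
    await spans.drain()
    worker.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await worker
    return sink
```

With a single worker the records reach the sink in the order they were recorded. A real implementation would also need a bounded queue and a policy for what happens to spans when the database is unavailable.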
2. Synchronous File I/O During Plugin Initialization
Location: mcpgateway/plugins/framework/loader/config.py:76-83
Severity: MEDIUM
Issue: Configuration loading uses blocking file I/O:
    with open(os.path.normpath(config), "r", encoding="utf-8") as file:
        template = file.read()  # Blocking read
        if use_jinja:
            rendered_template = jinja_env.from_string(template).render(env=os.environ)  # Blocking render
        config_data = yaml.safe_load(rendered_template)  # Blocking parse
Impact:
- Blocks event loop during plugin manager initialization
- Delays startup time
- Prevents concurrent initialization of other async components
Recommendation:
- Use aiofiles for async file reading: async with aiofiles.open(...)
- Consider caching parsed configuration
- Move Jinja2 rendering off the event loop (for example, into a worker thread) if templates are complex
- For YAML parsing, consider using yaml.CSafeLoader (available when PyYAML is built with libyaml) for better performance
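As an alternative to aiofiles that needs no extra dependency, the whole read-and-parse step can be pushed to a worker thread with asyncio.to_thread(). The sketch below swaps that technique in for aiofiles. The names load_config and _load_config_sync are illustrative, and json.loads stands in for yaml.safe_load so that the example stays within the standard library.

```python
import asyncio
import json
import os
import tempfile
from typing import Any, Callable


def _load_config_sync(path: str, parse: Callable[[str], Any]) -> Any:
    # Blocking read and parse, same shape as the synchronous loader.
    with open(os.path.normpath(path), "r", encoding="utf-8") as file:
        return parse(file.read())


async def load_config(path: str, parse: Callable[[str], Any]) -> Any:
    # Run the blocking work in a worker thread so the event loop stays free.
    return await asyncio.to_thread(_load_config_sync, path, parse)


async def demo() -> Any:
    with tempfile.NamedTemporaryFile(
        "w", suffix=".json", delete=False, encoding="utf-8"
    ) as tmp:
        tmp.write('{"plugins": []}')
    try:
        # json.loads is a stand-in for yaml.safe_load (assumption for this sketch).
        return await load_config(tmp.name, json.loads)
    finally:
        os.unlink(tmp.name)
```

Because template rendering and parsing would run inside the same thread call, this approach also covers the Jinja2 and YAML steps without making each of them async.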
3. Inefficient Context Copying Per Plugin
Location: mcpgateway/plugins/framework/manager.py:158-165
Severity: MEDIUM
Issue: For each plugin in the execution chain, a new GlobalContext is created with copied state and metadata:
    tmp_global_context = GlobalContext(
        request_id=global_context.request_id,
        user=global_context.user,
        tenant_id=global_context.tenant_id,
        server_id=global_context.server_id,
        state={} if not global_context.state else copyonwrite(global_context.state),
        metadata={} if not global_context.metadata else copyonwrite(global_context.metadata),
    )
Impact:
- Creates N context objects for N plugins in the execution chain
- copyonwrite() creates CopyOnWriteDict wrapper objects for each plugin
- Memory allocation overhead accumulates with many plugins
- Garbage collection pressure increases
Recommendation:
- Share the same GlobalContext instance across plugins unless modifications are detected
- Only create COW wrappers when a plugin actually modifies state
- Consider object pooling for frequently created context objects
- Implement lazy COW: start with shared reference, copy only on write
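One way to picture the lazy COW idea is a mapping that reads from the shared dict and copies it only on the first write. LazyCowDict below is a hypothetical class written for illustration; it does not describe how the existing CopyOnWriteDict works.

```python
from collections.abc import MutableMapping
from typing import Any, Iterator


class LazyCowDict(MutableMapping):
    """Reads go to the shared base dict; the first write makes a private copy."""

    def __init__(self, base: dict[str, Any]) -> None:
        self._data = base
        self._owned = False

    def _own(self) -> None:
        if not self._owned:
            self._data = dict(self._data)  # shallow copy on first write
            self._owned = True

    def __getitem__(self, key: str) -> Any:
        return self._data[key]

    def __setitem__(self, key: str, value: Any) -> None:
        self._own()
        self._data[key] = value

    def __delitem__(self, key: str) -> None:
        self._own()
        del self._data[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)

    @property
    def modified(self) -> bool:
        # True once this view has diverged from the shared base.
        return self._owned


shared = {"a": 1}
view = LazyCowDict(shared)
view["b"] = 2  # triggers the copy; `shared` is left untouched
```

The copy is shallow, so nested mutable values are still shared between plugins. The modified flag gives the manager a cheap way to detect whether a plugin changed anything.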
4. Sequential Plugin Execution (Missing Parallelization)
Location: mcpgateway/plugins/framework/manager.py:148-189
Severity: MEDIUM
Issue: All plugins execute sequentially even when they have the same priority:
    for hook_ref in hook_refs:
        # Execute plugins one by one
        result = await self.execute_plugin(...)
        if result.modified_payload is not None:
            current_payload = result.modified_payload
The parallel_execution_within_band setting exists (models.py:720) but is not implemented.
Impact:
- Total plugin execution time is the sum of all plugin latencies
- Independent plugins (same priority, no dependencies) could run concurrently
- Wastes async execution opportunities
Recommendation:
- Implement parallel_execution_within_band feature
- Group plugins by priority and execute same-priority plugins concurrently using asyncio.gather()
- Handle payload mutations carefully: parallel plugins shouldn't modify payload, only validate
- Aggregate results from parallel plugins appropriately
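A rough sketch of band-wise execution follows. Hook, HookResult, and run_hooks are simplified, hypothetical stand-ins for the framework's types. Rejecting a band in which more than one hook modifies the payload is just one possible aggregation policy.

```python
import asyncio
from dataclasses import dataclass
from itertools import groupby
from typing import Awaitable, Callable, Optional


@dataclass
class HookResult:  # hypothetical, simplified result type
    continue_processing: bool = True
    modified_payload: Optional[dict] = None


@dataclass
class Hook:  # hypothetical stand-in for a registered plugin hook
    name: str
    priority: int
    func: Callable[[dict], Awaitable[HookResult]]


async def run_hooks(hooks: list[Hook], payload: dict) -> tuple[dict, bool]:
    ordered = sorted(hooks, key=lambda h: h.priority)
    for _, band in groupby(ordered, key=lambda h: h.priority):
        band_hooks = list(band)
        # Hooks in the same band see the same payload and run concurrently.
        results = await asyncio.gather(*(h.func(payload) for h in band_hooks))
        if not all(r.continue_processing for r in results):
            return payload, False
        modified = [r.modified_payload for r in results if r.modified_payload is not None]
        if len(modified) > 1:
            raise RuntimeError("more than one hook in a band modified the payload")
        if modified:
            payload = modified[0]
    return payload, True


async def demo() -> tuple[dict, bool]:
    async def validate(p: dict) -> HookResult:
        await asyncio.sleep(0.01)
        return HookResult()

    async def redact(p: dict) -> HookResult:
        return HookResult(modified_payload={**p, "secret": "***"})

    hooks = [Hook("a", 10, validate), Hook("b", 10, validate), Hook("c", 20, redact)]
    return await run_hooks(hooks, {"secret": "x"})
```

Bands still run in priority order, so a later band sees any payload change made by an earlier one.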
5. Expensive Payload Size Validation
Location: mcpgateway/plugins/framework/manager.py:363-382
Severity: LOW-MEDIUM
Issue: Payload size validation converts entire payload to string:
    if hasattr(payload, "args") and payload.args:
        total_size = sum(len(str(v)) for v in payload.args.values())  # Expensive string conversion
Impact:
- String conversion is expensive for complex nested objects
- Creates temporary string objects that need garbage collection
- Runs on every plugin execution in the hot path
Recommendation:
- Use sys.getsizeof() with recursive size calculation
- Implement a custom size estimator for Pydantic models
- Cache size calculations if payload is immutable
- Only validate size when payload exceeds a quick heuristic threshold
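A custom estimator can stop as soon as the limit is passed instead of building a full string first. The sketch below assumes JSON-like payloads. The function name exceeds_size and the fixed costs for scalars and containers are illustrative choices, so the result is an approximation and will not match len(str(v)) exactly.

```python
from typing import Any


def exceeds_size(value: Any, limit: int) -> bool:
    """Return True as soon as the approximate size of `value` passes `limit`."""
    remaining = limit
    stack = [value]
    while stack:
        item = stack.pop()
        if isinstance(item, (str, bytes, bytearray)):
            remaining -= len(item)
        elif isinstance(item, dict):
            # Nominal cost per container; also guarantees termination on cyclic data.
            remaining -= 2
            stack.extend(item.keys())
            stack.extend(item.values())
        elif isinstance(item, (list, tuple, set, frozenset)):
            remaining -= 2
            stack.extend(item)
        else:
            # Scalars and unknown objects count as a small fixed cost (assumption).
            remaining -= 8
        if remaining < 0:
            return True
    return False
```

For a payload well under the limit this walks the structure once without allocating temporary strings. For an oversized payload it returns early.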
6. Database Session Per Plugin Execution
Location: mcpgateway/plugins/framework/manager.py:319
Severity: MEDIUM
Issue: Each plugin execution creates a new database session for observability:
    db = SessionLocal()  # New session per plugin execution
Impact:
- Session creation overhead on every plugin call
- Connection pool churn
- Potential connection exhaustion under high load
Recommendation:
- Reuse database sessions across plugin executions within the same request
- Pass session from request handler to plugin manager
- Use async session pools with proper lifecycle management
- Consider connection pooling parameters (pool size, overflow, timeout)
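The session-reuse recommendation amounts to opening one session per request and passing it down to every plugin execution. The sketch below uses a hypothetical FakeSession in place of a real database session so that it runs without a database; request_scope and run_plugins are likewise illustrative names.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


class FakeSession:
    """Hypothetical stand-in for a database session; counts how many are opened."""

    opened = 0

    def __init__(self) -> None:
        FakeSession.opened += 1
        self.closed = False

    def close(self) -> None:
        self.closed = True


@contextmanager
def request_scope(factory: Callable[[], FakeSession]) -> Iterator[FakeSession]:
    # One session per request, closed when the request finishes.
    session = factory()
    try:
        yield session
    finally:
        session.close()


def run_plugins(names: list[str], db: FakeSession) -> list[str]:
    # Each plugin execution receives the request's session instead of opening its own.
    return [f"{name}:{id(db)}" for name in names]


with request_scope(FakeSession) as db:
    used = run_plugins(["a", "b", "c"], db)
```

Three plugin executions here share a single session, where the per-execution pattern would have opened three.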
Moderate Performance Issues
7. Hook Discovery via Runtime Inspection
Location: mcpgateway/plugins/framework/base.py:396-405
Severity: LOW
Issue: During plugin registration, HookRef.__init__() scans all methods using inspect.getmembers():
    for name, method in inspect.getmembers(plugin_ref.plugin, predicate=inspect.ismethod):
        metadata = get_hook_metadata(method)
        if metadata and metadata.hook_type == hook:
            self._func = method
            break
Impact:
- Happens during plugin initialization, not in hot path
- Still adds initialization latency
- Repeated for every hook type
Recommendation:
- Cache discovered hooks at class level
- Use metaclass or decorator to register hooks at class definition time
- Build hook registry during import rather than initialization
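Registering hooks at class definition time could be done with a decorator plus __init_subclass__, as in the sketch below. The hook decorator, PluginBase, and the hook type name used in the example are all hypothetical. The point is only that the scan happens once per class rather than once per HookRef.

```python
from typing import Callable, Optional


def hook(hook_type: str) -> Callable:
    """Hypothetical decorator that tags a method with its hook type."""

    def mark(func: Callable) -> Callable:
        func._hook_type = hook_type
        return func

    return mark


class PluginBase:
    _hooks: dict[str, str] = {}

    def __init_subclass__(cls, **kwargs) -> None:
        super().__init_subclass__(**kwargs)
        # Built once per class; starts from the parent's entries.
        registry = dict(cls._hooks)
        for name, attr in vars(cls).items():
            hook_type = getattr(attr, "_hook_type", None)
            if hook_type is not None:
                registry[hook_type] = name
        cls._hooks = registry

    def get_hook(self, hook_type: str) -> Optional[Callable]:
        # Dictionary lookup instead of inspect.getmembers() per hook type.
        name = type(self)._hooks.get(hook_type)
        return getattr(self, name) if name else None


class RedactPlugin(PluginBase):
    @hook("tool_pre_invoke")  # illustrative hook type name
    def before_tool(self, payload: dict) -> dict:
        return {**payload, "checked": True}
```

Lookup on an instance then returns the bound method directly, for example RedactPlugin().get_hook("tool_pre_invoke").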
8. External Plugin HTTP Retry Logic
Location: mcpgateway/plugins/framework/external/mcp/client.py:187-227
Severity: LOW
Issue: Retry logic with exponential backoff is hand-rolled and waits between attempts:
    for attempt in range(max_retries):
        try:
            ...  # Connection attempt
        except Exception as e:
            delay = base_delay * (2**attempt)
            await asyncio.sleep(delay)  # Delays this plugin's initialization
Impact:
- Retry delays block plugin initialization
- Other plugins cannot initialize while one is retrying
- No circuit breaker pattern
Recommendation:
- Use proven retry libraries like tenacity with async support
- Implement circuit breaker pattern to fail fast on known-bad endpoints
- Initialize plugins concurrently so one slow plugin doesn't block others
- Add health checks before retrying
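Concurrent initialization can be sketched with asyncio.gather() so that each plugin retries on its own schedule. The example keeps a small hand-rolled backoff rather than tenacity to stay within the standard library. connect_with_retry, init_all, and the fake connectors are illustrative names, not the client's API.

```python
import asyncio
from typing import Awaitable, Callable


async def connect_with_retry(
    connect: Callable[[], Awaitable[str]],
    max_retries: int = 3,
    base_delay: float = 0.01,
) -> str:
    for attempt in range(max_retries):
        try:
            return await connect()
        except ConnectionError:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the last error
            await asyncio.sleep(base_delay * (2**attempt))
    raise RuntimeError("max_retries must be at least 1")


async def init_all(
    connectors: dict[str, Callable[[], Awaitable[str]]]
) -> dict[str, object]:
    # Each plugin retries independently; a slow one no longer delays the rest.
    results = await asyncio.gather(
        *(connect_with_retry(c) for c in connectors.values()),
        return_exceptions=True,
    )
    return dict(zip(connectors, results))


async def demo() -> dict[str, object]:
    attempts = {"flaky": 0}

    async def ok() -> str:
        return "connected"

    async def flaky() -> str:
        attempts["flaky"] += 1
        if attempts["flaky"] < 3:
            raise ConnectionError("not yet")
        return "connected after retries"

    async def down() -> str:
        raise ConnectionError("unreachable")

    return await init_all({"ok": ok, "flaky": flaky, "down": down})
```

With return_exceptions=True a plugin that never connects yields its exception as a result, so the caller can decide per plugin whether to disable it or abort startup.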
9. Registry Priority Cache Invalidation
Location: mcpgateway/plugins/framework/registry.py:99-100, 120-121
Severity: LOW
Issue: The cached priority list for a hook type is discarded whenever plugin registration changes:
    self._priority_cache.pop(hook_type, None)  # Invalidate the cached list for this hook type
Impact:
- Forces re-sorting of plugin lists on next access
- Typically happens at startup, not runtime
- Minimal impact in practice
Recommendation:
- Incremental cache updates instead of invalidation
- Keep priority cache always consistent
- No action needed unless dynamic plugin loading is frequent
10. Pydantic Validation in External Plugin Communication
Location: mcpgateway/plugins/framework/external/mcp/client.py:258, 262, 267
Severity: LOW-MEDIUM
Issue: Multiple Pydantic model validations during external plugin invocation:
    res = orjson.loads(content.text)  # JSON parse
    if CONTEXT in res:
        cxt = PluginContext.model_validate(res[CONTEXT])  # Validation 1
    if RESULT in res:
        return result_type.model_validate(res[RESULT])  # Validation 2
    if ERROR in res:
        error = PluginErrorModel.model_validate(res[ERROR])  # Validation 3
Impact:
- Validation overhead on every external plugin call
- Nested model validation is expensive
- Repeated for each plugin invocation
Recommendation:
- Use model_validate_json() directly on the JSON string to avoid intermediate dict
- Enable Pydantic V2's performance optimizations
- Consider marking frequently used models with model_config = ConfigDict(from_attributes=True, validate_assignment=False)
- Cache validated schemas where appropriate
Minor Performance Considerations
11. Async Methods That Don't Need To Be Async
Location: mcpgateway/plugins/framework/models.py:863-866
Issue: PluginContext.cleanup() is async but only clears dicts:
    async def cleanup(self) -> None:
        self.state.clear()
        self.metadata.clear()
Impact: Minimal, but adds unnecessary async overhead
Recommendation: Make it synchronous unless there's a future need for async cleanup
12. Import Module Caching
Location: mcpgateway/plugins/framework/utils.py:22-41
Issue: Uses the @cache decorator on import_module(), although Python already caches imported modules in sys.modules
Impact: Negligible, @cache is redundant but harmless
Recommendation: No action needed, existing approach is fine
Performance Testing Recommendations
To validate these issues and measure improvements:
- Load Testing:
  - Test with 1, 5, 10, 20 plugins in execution chain
  - Measure p50, p95, p99 latency under various loads
  - Monitor event loop lag/blocking time
- Profiling:
  - Use py-spy or austin for low-overhead async profiling
  - Profile plugin execution with observability enabled/disabled
  - Measure database operation latency distribution
- Benchmarking Scenarios:
  - Baseline: No plugins enabled
  - Single plugin (enforce mode)
  - Multiple independent plugins (same priority)
  - Chain of dependent plugins (different priorities)
  - External plugins over HTTP
- Metrics to Track:
  - Plugin execution time (per plugin and total)
  - Database operation latency
  - Context creation/copy overhead
  - Memory allocation rate
  - Event loop blocking time
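Event loop lag can be approximated without extra tooling by sleeping for a fixed interval and recording how late each wake-up is. The sketch below is one simple way to do that. measure_loop_lag is a hypothetical helper, and the time.sleep() call in the demo simulates a blocking operation such as a synchronous database write.

```python
import asyncio
import time


async def measure_loop_lag(interval: float, samples: int) -> list[float]:
    """Sleep for `interval` repeatedly and record how late each wake-up is."""
    lags = []
    for _ in range(samples):
        start = time.perf_counter()
        await asyncio.sleep(interval)
        lags.append(max(0.0, time.perf_counter() - start - interval))
    return lags


async def demo() -> float:
    monitor = asyncio.create_task(measure_loop_lag(interval=0.01, samples=5))
    await asyncio.sleep(0)  # let the monitor start its first sleep
    time.sleep(0.1)  # simulate a blocking call on the event loop
    return max(await monitor)
```

Running such a monitor alongside a load test, with observability enabled and then disabled, would give a direct reading of how much the synchronous paths stall the loop.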
Implementation Priority
Phase 1 - High Impact
- Make observability service async or optional
- Implement async file I/O in config loader
- Fix database session management (reuse sessions)
Phase 2 - Medium Impact
- Implement parallel plugin execution within priority bands
- Optimize context copying (lazy COW)
- Improve payload size validation
Phase 3 - Low Impact
- Cache hook discovery
- Improve external plugin retry logic
- Optimize Pydantic validation