perf: parallelize MCP server initialization in get_all_tool_definitions#138
perf: parallelize MCP server initialization in get_all_tool_definitions#138JasonOA888 wants to merge 2 commits intoMiroMindAI:mainfrom
Conversation
Partially addresses MiroMindAI#137 MCP tool servers were being initialized sequentially in a for loop, causing ~70-80s overhead per task (tool-python ~33s, search ~21s, jina ~17s). This change: - Refactors server connection logic into a helper function _get_server_tools() - Uses asyncio.gather() to connect to all servers in parallel - Expected savings: ~40-50s per task initialization The parallel approach maintains the same error handling behavior: - Failed connections still add an error entry - Exceptions from asyncio.gather are logged and handled gracefully
Vanint
left a comment
There was a problem hiding this comment.
Good performance improvement — parallelizing the MCP server init is a clear win. A couple of issues to address before merging:
Must Fix
- Exception handling loses server identity: When return_exceptions=True catches a failure, the current log message has no indication of which server failed:
f"Unexpected error during parallel server initialization: {result}"
Use zip to preserve the mapping:
for config, result in zip(self.server_configs, results):
if isinstance(result, Exception):
self._log("error", "ToolManager | Parallel Init Error",
f"Server '{config['name']}' failed: {result}")
else:
all_servers_for_prompt.append(result)
- Exception path drops the fallback error entry: In the original sequential code, a failed server still gets an entry with {"error": ...} appended to all_servers_for_prompt, so downstream code knows the server exists but failed. In the new code, if an exception escapes past the internal try/except in _get_server_tools, the outer handler just logs and skips — the server silently disappears from the result. Should add a fallback entry in the outer exception branch as well:
if isinstance(result, Exception):
self._log(...)
all_servers_for_prompt.append({
"name": config["name"],
"tools": [{"error": f"Unable to fetch tools: {result}"}]
})
Suggestion
- Confirm task_log is safe under concurrent writes: Multiple _get_server_tools coroutines now call self._log concurrently. If task_log.log_step isn't designed for concurrent access, logs could interleave or error. Worth a quick check.
Clean change overall — fix the exception handling gaps and this is good to go.
- Use zip(configs, results) to identify which server failed - Append fallback error entry instead of silently dropping failed servers - Matches original sequential behavior where failed servers stay in result list
|
Thanks for the thorough review @Vanint! Both issues addressed in the latest push (2f4c336):
Re: concurrent |
Partially addresses #137
Problem:
MCP tool servers were being initialized sequentially in a for loop:
With 1266 BC-EN tasks, this adds significant overhead to evaluation runs.
Solution:
_get_server_tools()helper functionasyncio.gather()to connect to all servers in parallelChanges:
libs/miroflow-tools/src/miroflow_tools/manager.py:_get_server_tools(config)asyncio.gather(..., return_exceptions=True)Error handling preserved:
Benchmark impact: