
perf: parallelize MCP server initialization in get_all_tool_definitions#138

Open
JasonOA888 wants to merge 2 commits into MiroMindAI:main from JasonOA888:perf/parallel-mcp-init

Conversation

@JasonOA888

Partially addresses #137

Problem:
MCP tool servers were being initialized sequentially in a for loop:

  • tool-python (E2B sandbox): ~33s
  • search_and_scrape_webpage (Serper): ~21s
  • jina_scrape_llm_summary: ~17s
  • Total: ~71s per task

With 1266 BC-EN tasks, this adds significant overhead to evaluation runs.

Solution:

  • Refactored server connection logic into _get_server_tools() helper function
  • Used asyncio.gather() to connect to all servers in parallel
  • Expected savings: ~40-50s per task (parallel time = max of individual times, not sum)

Changes:

  • libs/miroflow-tools/src/miroflow_tools/manager.py:
    • New internal async function _get_server_tools(config)
    • Parallel execution via asyncio.gather(..., return_exceptions=True)
    • Graceful handling of exceptions from parallel execution

Error handling preserved:

  • Failed connections still add an error entry to results
  • Exceptions are logged and handled without crashing the entire initialization
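Under the assumption that the refactor has the shape described above (the names _get_server_tools and get_all_tool_definitions come from the PR text; the config fields and simulated handshake are illustrative, not the real manager.py code), a minimal runnable sketch of the pattern:

```python
import asyncio
import time

# Minimal sketch of the parallel-init pattern. The MCP handshake is
# simulated with asyncio.sleep; the real _get_server_tools opens a
# session to the server and lists its tools.
async def _get_server_tools(config):
    try:
        await asyncio.sleep(config["startup_s"])  # stands in for connect + list-tools
        return {"name": config["name"], "tools": config["tools"]}
    except Exception as exc:
        # Failed connections still yield an entry so the server stays visible downstream.
        return {"name": config["name"],
                "tools": [{"error": f"Unable to fetch tools: {exc}"}]}

async def get_all_tool_definitions(server_configs):
    # gather() starts every connection at once, so wall time is roughly
    # max(startup times) rather than their sum.
    results = await asyncio.gather(
        *(_get_server_tools(cfg) for cfg in server_configs),
        return_exceptions=True,
    )
    return [r for r in results if not isinstance(r, Exception)]

configs = [
    {"name": "tool-python", "startup_s": 0.10, "tools": ["run_python"]},
    {"name": "serper", "startup_s": 0.05, "tools": ["search"]},
]
start = time.perf_counter()
servers = asyncio.run(get_all_tool_definitions(configs))
elapsed = time.perf_counter() - start  # close to the slowest server, not the total
```

With the simulated startups above, sequential init would take ~0.15s while the parallel version finishes in ~0.10s, which is the "max of individual times, not sum" claim in miniature.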

Benchmark impact:

  • BC-EN (1266 tasks): ~40-50s × 1266 = ~14-17 hours saved per run
  • BC-ZH (289 tasks): ~4 hours saved per run

Partially addresses MiroMindAI#137

MCP tool servers were being initialized sequentially in a for loop,
causing ~70-80s overhead per task (tool-python ~33s, search ~21s, jina ~17s).

This change:
- Refactors server connection logic into a helper function _get_server_tools()
- Uses asyncio.gather() to connect to all servers in parallel
- Expected savings: ~40-50s per task initialization

The parallel approach maintains the same error handling behavior:
- Failed connections still add an error entry
- Exceptions from asyncio.gather are logged and handled gracefully
Member

@Vanint Vanint left a comment


Good performance improvement — parallelizing the MCP server init is a clear win. A couple of issues to address before merging:

Must Fix

  1. Exception handling loses server identity: When return_exceptions=True catches a failure, the current log message has no indication of which server failed:
f"Unexpected error during parallel server initialization: {result}"

Use zip to preserve the mapping:

for config, result in zip(self.server_configs, results):
    if isinstance(result, Exception):
        self._log("error", "ToolManager | Parallel Init Error",
                  f"Server '{config['name']}' failed: {result}")
    else:
        all_servers_for_prompt.append(result)
  2. Exception path drops the fallback error entry: In the original sequential code, a failed server still gets an entry with {"error": ...} appended to all_servers_for_prompt, so downstream code knows the server exists but failed. In the new code, if an exception escapes past the internal try/except in _get_server_tools, the outer handler just logs and skips; the server silently disappears from the result. Add a fallback entry in the outer exception branch as well:
if isinstance(result, Exception):
    self._log(...)
    all_servers_for_prompt.append({
        "name": config["name"],
        "tools": [{"error": f"Unable to fetch tools: {result}"}]
    })

Suggestion

  1. Confirm task_log is safe under concurrent writes: Multiple _get_server_tools coroutines now call self._log concurrently. If task_log.log_step isn't designed for concurrent access, logs could interleave or error. Worth a quick check.

Clean change overall — fix the exception handling gaps and this is good to go.

- Use zip(configs, results) to identify which server failed
- Append fallback error entry instead of silently dropping failed servers
- Matches original sequential behavior where failed servers stay in result list
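Putting both review fixes together, a hedged sketch of the corrected result loop (the simulated failure and config fields are illustrative; only the zip-and-fallback shape follows the review comments):

```python
import asyncio

# Sketch of both fixes: zip() recovers which server an exception belongs
# to, and failed servers keep a fallback error entry instead of
# vanishing from the result list.
async def _get_server_tools(config):
    if config.get("fail"):
        raise ConnectionError("handshake timed out")  # simulated failure
    return {"name": config["name"], "tools": config["tools"]}

async def get_all_tool_definitions(server_configs):
    results = await asyncio.gather(
        *(_get_server_tools(cfg) for cfg in server_configs),
        return_exceptions=True,
    )
    all_servers_for_prompt = []
    for config, result in zip(server_configs, results):
        if isinstance(result, Exception):
            # Identity is preserved in the log, and the server stays in
            # the list, matching the original sequential behavior.
            print(f"Server '{config['name']}' failed: {result}")
            all_servers_for_prompt.append({
                "name": config["name"],
                "tools": [{"error": f"Unable to fetch tools: {result}"}],
            })
        else:
            all_servers_for_prompt.append(result)
    return all_servers_for_prompt

servers = asyncio.run(get_all_tool_definitions([
    {"name": "serper", "tools": ["search"]},
    {"name": "jina", "fail": True, "tools": []},
]))
```

Note that both servers appear in the output even though one failed, which is what downstream consumers of all_servers_for_prompt rely on.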
@JasonOA888
Author

Thanks for the thorough review @Vanint! Both issues addressed in the latest push (2f4c336):

  1. Server identity preserved — now using zip(self.server_configs, results) so error logs clearly show Server '{config[name]}' failed: {result}.

  2. Fallback error entry — failed servers now get a {"name": ..., "tools": [{"error": ...}]} entry appended, matching the original sequential behavior where failed servers stay in the result list.

Re: concurrent _log: log_step calls self.step_logs.append(), which is atomic under CPython's GIL, so concurrent writes from coroutines are safe. Happy to add an asyncio.Lock guard if you'd prefer belt-and-suspenders.
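For reference, the optional belt-and-suspenders guard could look like the sketch below (TaskLog, step_logs, and log_step are names assumed from the discussion; the real class and its sync/async signature likely differ):

```python
import asyncio

# Sketch of a lock-guarded task log. A single append is GIL-atomic, but
# an asyncio.Lock also keeps any future multi-statement log updates
# consistent across concurrently running coroutines.
class TaskLog:
    def __init__(self):
        self.step_logs = []
        self._lock = asyncio.Lock()

    async def log_step(self, level, title, message):
        async with self._lock:
            self.step_logs.append(
                {"level": level, "title": title, "message": message}
            )

async def main():
    log = TaskLog()
    # Simulate several server-init coroutines logging at the same time.
    await asyncio.gather(
        *(log.log_step("info", "Init", f"server {i}") for i in range(5))
    )
    return log

log = asyncio.run(main())
```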
