feat: parallel tool execution #166
Pull Request Overview
This pull request introduces parallel execution for tool calls in the RAG pipeline to improve performance when multiple tools are invoked simultaneously. The refactoring splits the tool execution logic into separate functions and leverages Python's ThreadPoolExecutor for concurrent processing.
Key changes:
- Refactored monolithic `_run_tools` function into `_run_tool` (single execution) and `_run_tools` (parallel orchestration)
- Implemented parallel tool execution using `ThreadPoolExecutor` with configurable worker count
- Added error handling with future cancellation to prevent resource leaks on failures
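The split described above can be sketched roughly as follows. This is a minimal illustration, not the code from this pull request: `run_tool`, `run_tools`, the `tools` registry, and the dictionary shape of a tool call are all hypothetical names chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def run_tool(tool_call: dict[str, Any], tools: dict[str, Callable[..., Any]]) -> dict[str, Any]:
    """Execute a single tool call and wrap its result in a tool message."""
    # Hypothetical call shape: {"id": ..., "name": ..., "arguments": {...}}
    result = tools[tool_call["name"]](**tool_call["arguments"])
    return {"role": "tool", "content": str(result), "tool_call_id": tool_call["id"]}


def run_tools(
    tool_calls: list[dict[str, Any]],
    tools: dict[str, Callable[..., Any]],
    max_workers: int = 4,
) -> list[dict[str, Any]]:
    """Run all tool calls concurrently and return messages in call order."""
    if not tool_calls:
        return []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(run_tool, call, tools) for call in tool_calls]
        try:
            # Collect results in submission order so messages line up with the calls.
            return [future.result() for future in futures]
        except Exception:
            # Cancel anything that has not started yet, then re-raise the failure.
            for future in futures:
                future.cancel()
            raise
```

Collecting results in submission order keeps each tool message aligned with its originating call. Note that `Future.cancel()` only stops work that has not started yet; calls that are already running finish before the executor shuts down.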
chunk_spans = retrieve_context(**kwargs)
message = {
    "role": "tool",
    "content": '{{"documents": [{elements}]}}'.format(
        elements=", ".join(
            chunk_span.to_json(index=i + 1) for i, chunk_span in enumerate(chunk_spans)
        )
    ),
    "tool_call_id": tool_call.id,
}
If chunk_spans is empty, the content field will be {"documents": []}, which might be fine, but might not be.
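The empty case can be checked in isolation with the same format string. This is a small sketch that substitutes a plain empty list for `chunk_spans`; `format_documents` is a hypothetical helper, and it assumes the elements are already JSON strings.

```python
import json


def format_documents(elements: list[str]) -> str:
    """Join pre-serialized JSON elements into the documents payload."""
    return '{{"documents": [{elements}]}}'.format(elements=", ".join(elements))


# With no elements, the join yields an empty string and the payload stays valid JSON.
empty_content = format_documents([])
parsed = json.loads(empty_content)
```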
We should confirm if retrieve_context can return 0 chunks.
I did not change this behavior with respect to the main branch.
Without metadata_filter, I think retrieve_context always returns a list of ChunkSpans, even if they are not very relevant (low similarity). With a metadata_filter applied (e.g. using self-query), the list could be empty.
I don't know whether an empty list should be an issue; I find it more correct than retrieving non-relevant chunks for a query that has no related documents in the database.
Open to discussing this :)
@Robbe-Superlinear
I agree that retrieve_context should return an empty list when no relevant chunks are found after applying metadata filtering.
However, how should we handle this case in the response? Currently, we return a message like this:
{
"role": "tool",
"content": "{documents: []}",
"tool_call_id": tool_call.id
}
We could consider a few alternatives for better clarity:
Option 1:
{
"role": "tool",
"content": "{documents: [], message: No results found}",
"tool_call_id": tool_call.id
}
Option 2:
{
"role": "tool",
"content": "{message: No results found}",
"tool_call_id": tool_call.id
}
Option 3: We could return None and let the _run_tools function handle empty tool_call responses.
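As a rough illustration of Option 1, the message could be built with `json.dumps` so the content is always valid JSON. `build_tool_message` is a hypothetical helper, and the sketch assumes the documents are available as dictionaries rather than as the pre-serialized strings that `to_json` appears to return.

```python
import json
from typing import Any


def build_tool_message(documents: list[dict[str, Any]], tool_call_id: str) -> dict[str, str]:
    """Build a tool message, adding an explicit note when nothing was retrieved."""
    payload: dict[str, Any] = {"documents": documents}
    if not documents:
        # Option 1: keep the empty list and state explicitly that nothing matched.
        payload["message"] = "No results found"
    return {
        "role": "tool",
        "content": json.dumps(payload),
        "tool_call_id": tool_call_id,
    }
```

Keeping the `documents` key in both cases means the model sees the same schema whether or not anything was retrieved, which is the main argument for Option 1 over Option 2.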