Description
Problem Statement
The current approach of fetching tools from an MCP server and passing all of them to the Agent at creation time leads to high latency and high input-token costs when the MCP server returns a large number of tools.
Example: the MCP server aws-dataprocessing-mcp-server provides ~32 tools. Their descriptions total 137,514 characters, roughly 40K tokens per interaction. In a complex data engineering task, the agent sends the tool descriptions multiple times, and I have seen upwards of 1 million input tokens sent to the LLM by the time the job finishes.
import time

# Assumes data_mcp_client (an MCP client for aws-dataprocessing-mcp-server),
# model, query_system_prompt, and the python_repl tool are defined earlier.
queries = [
    "How many total taxi rides happened in each month in 2025? Draw a ride count vs month bar chart.",
    "Identify the top 5 hours of the day when we saw maximum fares in April 2025. How does that compare with January 2025? Draw a fare vs hours bar chart for those 2 months.",
]
response = ""
with data_mcp_client:
    # Get the data processing tools from the MCP server
    data_tools = data_mcp_client.list_tools_sync()
    # Add the python repl tool to execute Python code that creates charts
    final_tools = data_tools + [python_repl]
    # Pass the system prompt, the Claude Sonnet 3.7 model, and all the tools to the agent
    data_lake_agent = Agent(system_prompt=query_system_prompt, model=model, tools=final_tools)
    # Invoke the agent with each query
    for query_in in queries:
        response = data_lake_agent(query_in)
        time.sleep(5)
By the time the above code finishes, it typically produces between 500K and 1M input tokens.
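For reference, a rough way to reproduce the per-call estimate above. This is a hypothetical helper, assuming the tool specs are JSON-serializable dicts and using a common ~3.5 characters-per-token heuristic:

import json

def estimate_tool_tokens(tool_specs):
    # Serialize every tool spec and count characters.
    total_chars = sum(len(json.dumps(spec)) for spec in tool_specs)
    # ~3.5 characters per token is a rough heuristic for English/JSON text.
    return total_chars / 3.5

# 137,514 characters / 3.5 ~= 39K tokens resent on every LLM call.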
Proposed Solution
First things first
Please analyze the current implementation and optimize the number of times the agent sends the full set of tool descriptions to the LLM. Once that avenue is exhausted, we could look at the following approaches.
Approach # 1: (Limit tools - not an ideal solution)
One possible solution is to limit the number of tools. Example: from traces I determined which tools were actually used, and reduced the set passed to the Agent to 5 to cut input tokens (and hence latency and costs). This reduced the input tokens to 300K.
But this forces me to anticipate the kinds of questions users will ask in order to choose which tools to pass to the agent. That disadvantage will not be acceptable in many agentic scenarios.
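A minimal sketch of the allow-list filtering, assuming each tool object exposes a tool_name attribute (adjust to the SDK's actual tool interface); the five tool names are hypothetical examples:

# Allow-list derived from trace analysis (names are hypothetical examples).
allowed = {"run_athena_query", "get_table_metadata", "list_databases",
           "get_partitions", "describe_glue_job"}

with data_mcp_client:
    all_tools = data_mcp_client.list_tools_sync()
    # Keep only the tools observed in traces.
    data_tools = [t for t in all_tools if t.tool_name in allowed]
    final_tools = data_tools + [python_repl]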
Approach # 2: (Semantic Search Approach)
The hypothesis here is to determine the tools most likely needed to solve the question by running a semantic search between the task and the tool descriptions.
For example: from the system prompt and input prompt, the agent could determine (via semantic search) that 6 of the 32 tools are most relevant, and try to solve the problem with just those.
Thus, you could either pass all the tools to the Agent or point it to AgentCore Memory, where embeddings for the tools are stored. Pointing the Agent at a Memory ID (or some other config that stores the tools to use) is the better approach, because it lets agents dynamically pick up or drop tools without code changes.
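A minimal sketch of the selection step, using sentence-transformers for embeddings. The model choice is an assumption, and tools are represented as (tool, description) pairs; how those pairs are pulled off real MCP tool objects is left to the caller:

import numpy as np
from sentence_transformers import SentenceTransformer

# Small general-purpose embedding model; in a real deployment this could be
# replaced by AgentCore Memory or another embedding store.
embedder = SentenceTransformer("all-MiniLM-L6-v2")

def select_tools(task, tool_pairs, top_k=6):
    """Return the top_k tools whose descriptions best match the task.

    tool_pairs is a list of (tool, description) tuples.
    """
    descriptions = [desc for _, desc in tool_pairs]
    tool_vecs = embedder.encode(descriptions, normalize_embeddings=True)
    task_vec = embedder.encode([task], normalize_embeddings=True)[0]
    # Cosine similarity reduces to a dot product on normalized vectors.
    scores = tool_vecs @ task_vec
    best = np.argsort(scores)[::-1][:top_k]
    return [tool_pairs[i][0] for i in best]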
Approach # 3: (hybrid approach - suggested approach)
Let the user specify 'mandatory tools' and 'optional tools'. This hints to the agent that the mandatory tools should always be available, while the semantic search approach is applied to the optional ones.
I am hoping such a hybrid approach will optimize latency and costs for users.
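Combining the two, a sketch of the hybrid selection. select_tools is the hypothetical helper from Approach #2, and optional_tool_pairs is the user-supplied (tool, description) list; re-creating the agent per query keeps each call's tool set small:

def build_toolset(task, mandatory_tools, optional_tool_pairs, top_k=4):
    # Mandatory tools always pass through; optional tools are narrowed
    # down by semantic search against the task.
    selected = select_tools(task, optional_tool_pairs, top_k=top_k)
    return mandatory_tools + selected

for query_in in queries:
    final_tools = build_toolset(query_in, [python_repl], optional_tool_pairs)
    data_lake_agent = Agent(system_prompt=query_system_prompt, model=model, tools=final_tools)
    response = data_lake_agent(query_in)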
Use Case
In many use cases, the domain and the problem being solved will help users determine which tools to pass to the Agent.
However, there are some use cases, such as data exploration, where you don't know in advance which of the 32 tools provided by aws-dataprocessing-mcp-server are needed. For such use cases, the hybrid approach of 1/ mandatory tools plus 2/ optional tools with semantic tool search at run time will help.
Alternative Solutions
Already covered by the three approaches suggested above.
Additional Context
No response