5 changes: 5 additions & 0 deletions 02-samples/19-data-analyst-pandas/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
__pycache__/
venv/
.venv/
.ipynb_checkpoints/
.DS_Store
337 changes: 337 additions & 0 deletions 02-samples/19-data-analyst-pandas/data-analyst.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,337 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Data Analyst: Optimizing Context Efficiency in Strands Agents\n",
"\n",
"## Overview\n",
"\n",
"Efficient context management is crucial in agentic AI, even as Large Language Models (LLMs) advance in their ability to process ever-larger context windows. While attention mechanisms such as sliding-window and block-wise attention have improved the handling of large contexts, these innovations are relatively new and not yet widely adopted by mainstream models. The computational complexity of attention—typically O(N<sup>2</sup>)—means that doubling the window size quadruples the required processing, which highlights the ongoing need for efficiency. Larger contexts can also lead to the \"needle in a haystack\" problem, where relevant details are easily lost, further underscoring the value of focused context practices. In this example, we demonstrate how to wrap existing Python modules with the @tool decorator to create tools that work independently of model context and minimize token consumption. We will build a \"data analyst\" that processes tabular spreadsheet data without consuming excessive context in the agent's model.\n",
"\n",
"### Sample Details\n",
"\n",
"<div style=\"float: left; margin-right: 20px;\">\n",
" \n",
"| Information | Details |\n",
"|------------------------|------------------------------------------------------------|\n",
"| **Agent Architecture** | Single-agent |\n",
"| **Native Tools** | None |\n",
"| **Custom Tools** | add, subtract, Pandas Dataframe manipulation |\n",
"| **MCP Servers** | None |\n",
"| **Use Case Vertical** | Any |\n",
"| **Complexity** | Intermediate |\n",
"| **Model Provider** | Amazon Bedrock |\n",
"| **SDK Used** | Strands Agents SDK |\n",
"</div>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Architecture\n",
"\n",
"![Architecture Diagram](./architecture.png)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## Key Features\n",
"\n",
"The Strands Agents framework is inherently suited for efficient context management, particularly through structured tool use and agent memory. For example, when processing an Excel spreadsheet, one approach is to have the agent ingest the entire file using dedicated tools for reading and analyzing Excel data. While feasible for smaller tables, this method does not scale well to enterprise scenarios due to context window limitations. Alternatively, chunking and summarizing multiple rows can reduce context demands, but requires extra coding and can still consume significant model context.\n",
"\n",
"## Leveraging Tools with Strands\n",
"\n",
"Strands makes it easy to convert existing Python functions into agent tools, simply by adding the @tool decorator before defining a function. Using open-source libraries, agents can build memory-resident models of data structures, allowing the agent to manipulate and learn from large data sets without overloading model context.\n",
"\n",
"## Implementing Agent Tools\n",
"\n",
"To maximize efficiency, consider wrapping the Python library in a class and exposing relevant functions as tools, each with concise docstrings summarizing their purpose. These descriptions, informed by the library's API and documentation, help the model decide when to apply specific tools, streamlining both context consumption and usability.\n",
"\n",
"This approach demonstrates how the Strands framework enables scalable, context-efficient workflows for agents operating on real-world, data-intensive tasks.\n",
"\n",
"## Setup and prerequisites\n",
"\n",
"### Prerequisites\n",
"* Python 3.10+\n",
"* AWS account\n",
"* AWS CLI configured with appropriate credentials\n",
"* [Model access](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access-modify.html) enabled for any model that supports tool use\n",
"\n",
"Let's now install the required packages for our agent."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# install prerequisite packages\n",
"!pip install -r requirements.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Usage\n",
"\n",
"Here is a trivial arithmetic library, where we define two functions \"add\" and \"subtract\":"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Writing arithmetic.py\n"
]
}
],
"source": [
"%%writefile arithmetic.py\n",
"\"\"\"\n",
"Arithmetic Module\n",
"\"\"\"\n",
"\n",
"def add(a, b):\n",
" return a + b\n",
"\n",
"def subtract(a, b):\n",
" return a - b\n",
"\n",
"if __name__ == \"__main__\":\n",
" print(f\"5 + 3 = {add(5, 3)}\")\n",
" print(f\"10 - 4 = {subtract(10, 4)}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can create tools from this library using Strands:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Import from arithmetic.py\n",
"from arithmetic import add, subtract\n",
"from strands import tool\n",
"\n",
"@tool\n",
"def add_wrapper(a, b):\n",
" \"\"\"Add a to b\"\"\"\n",
" return add(a, b)\n",
"\n",
"@tool\n",
"def subtract_wrapper(a, b):\n",
" \"\"\"Subtract b from a\"\"\"\n",
" return subtract(a, b)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Python module that wraps the library and serves tools to the agent can then be built as an object-oriented module, with a data structure in the module that retains state as long as the module's calling context remains active. The agent can iteratively process tasks and refine its results, according to its instructions, and selectively use tools to provide information to the model without requiring excessive consumption of context. In this example, we create a class that saves the results of all previous operations in a \"paper tape\" list:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Example of a stateful wrapper class\n",
"\n",
"class ArithmeticTools:\n",
"\n",
" def __init__(self):\n",
" self.paper_tape = []\n",
"\n",
" @tool\n",
" def add(self, a, b):\n",
" \"\"\"Add two numbers\"\"\"\n",
" result = a + b\n",
" self.paper_tape.append(result)\n",
" return result\n",
"\n",
" @tool\n",
" def subtract(self, a, b):\n",
" \"\"\"Subtract b from a\"\"\"\n",
" result = a - b\n",
" self.paper_tape.append(result)\n",
" return result\n",
"\n",
" @tool\n",
" def print_paper_tape(self):\n",
" \"\"\"Get the list of previously saved values\"\"\"\n",
" return self.paper_tape"
]
},
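{
"cell_type": "markdown",
"metadata": {},
"source": [
"To wire these method-based tools to an agent, a minimal sketch follows. This is illustrative only: the exact support for passing bound methods as tools is an assumption here, not taken from this sample.\n",
"\n",
"```python\n",
"# Hypothetical sketch: give an Agent the stateful class's methods as tools\n",
"from strands import Agent\n",
"\n",
"calc = ArithmeticTools()\n",
"\n",
"# The instance retains the paper tape across tool calls\n",
"agent = Agent(tools=[calc.add, calc.subtract, calc.print_paper_tape])\n",
"```"
]
},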
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When working with large datasets, such as a 100KB Excel spreadsheet, storing the entire file in the model's context can consume between 13,000 and 25,000 tokens, quickly exhausting the capacity of most current LLMs. In contrast, by loading the spreadsheet into a Python class instance as a Pandas DataFrame, Strands tools can interact with this data structure and supply only the necessary information to the model, reducing context usage and improving efficiency.\n",
"\n",
"While MCP servers already exist for Pandas and for Excel, they do not expose all of the functions we require, so we will create Strands tools directly from the existing Pandas module. Here is the beginning of the pandas_strands_tools.py module; the full file is available in the current directory."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"import pandas as pd\n",
"from typing import Any, Dict, List, Optional\n",
"from strands import tool\n",
"\n",
"# Global DataFrame that all tools will operate on\n",
"df_glob = pd.DataFrame()\n",
"\n",
"# Tools that load data and set df_glob\n",
"\n",
"@tool\n",
"def pd_read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer', names=None, \n",
" index_col=None, usecols=None, dtype=None) -> str:\n",
" \"\"\"\n",
" Read a comma-separated values (csv) file into the global DataFrame.\n",
" \n",
" Args:\n",
" filepath_or_buffer: str, path object or file-like object\n",
" sep: str, default ','\n",
" delimiter: str, optional\n",
" header: int, Sequence of int, 'infer' or None, default 'infer'\n",
" names: Sequence of Hashable, optional\n",
" index_col: Hashable, Sequence of Hashable or False, optional\n",
" usecols: Sequence of Hashable or Callable, optional\n",
" dtype: dtype or dict of {Hashable : dtype}, optional\n",
" \n",
" Returns:\n",
" Status message with DataFrame shape\n",
" \"\"\"\n",
" global df_glob\n",
" df_glob = pd.read_csv(filepath_or_buffer=filepath_or_buffer, sep=sep, delimiter=delimiter, \n",
" header=header, names=names, index_col=index_col, usecols=usecols, \n",
" dtype=dtype)\n",
" return f\"Loaded CSV into df_glob: {df_glob.shape[0]} rows × {df_glob.shape[1]} columns\"\n",
"\n"
]
}
],
"source": [
"!head -n 34 pandas_strands_tools.py"
]
},
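{
"cell_type": "markdown",
"metadata": {},
"source": [
"Other tools in the module follow the same pattern: operate on the global DataFrame and return a compact string rather than raw data. As an illustration (this particular helper is hypothetical and not part of pandas_strands_tools.py), a summary tool could look like this:\n",
"\n",
"```python\n",
"@tool\n",
"def pd_describe() -> str:\n",
"    \"\"\"Return summary statistics of the global DataFrame as text.\"\"\"\n",
"    global df_glob\n",
"    if df_glob.empty:\n",
"        return \"df_glob is empty; load data first\"\n",
"    # to_string() yields a small, model-readable payload\n",
"    return df_glob.describe(include='all').to_string()\n",
"```"
]
},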
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Keep in mind that retaining state in the class, as in self.paper_tape, or as a global in the Pandas module, limits the agent to a single instance of state. In a later tutorial, we will explore retaining state as part of the agent's context instead, for cases where the agent needs to access multiple instances at once, such as several spreadsheets. Strands' ability to access tool context is a distinctive capability among agent frameworks.\n",
"\n",
"Here is a simple agent that uses the Pandas tools we've created to answer questions about a spreadsheet imported into a Pandas DataFrame. Try it with the General-Ledger.xlsx file, a synthetic, public-domain dataset of accounting ledger entries. You can ask questions such as, \"Which department had the highest expenses?\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# pandas_agent.py\n",
"# interacts with Pandas module wrapped in Strands @tool decorators\n",
"#\n",
"\n",
"from strands import Agent\n",
"from pandas_strands_tools import pandas_tools\n",
"\n",
"# Create agent with pre-loaded state and tools\n",
"agent = Agent(\n",
" tools=pandas_tools\n",
")\n",
"\n",
"if __name__ == \"__main__\":\n",
" print(\"Pandasbot: Ask me about spreadsheets. Type 'exit' to quit.\\n\") \n",
"\n",
" response = agent(\"List all available pandas tools\")\n",
" print(f\"\\nPandasbot > {response}\")\n",
"\n",
" # Run the agent in a loop for interactive conversation\n",
" while True:\n",
" user_input = input(\"\\nPrompt > \")\n",
" if user_input.lower() == \"exit\":\n",
" print(\"Bye.\")\n",
" break\n",
" response = agent(user_input)\n",
" print(f\"\\nPandasbot > {response}\")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Although this spreadsheet contains 5,000 rows of data, the rows are processed by the agent's tools rather than by the model, which drastically reduces context consumption.\n",
"\n",
"## Strands as a Model-First Framework\n",
"\n",
"Strands is designed with a \"model-first\" philosophy, emphasizing the LLM’s strengths in reasoning and planning actions. This approach allows seamless integration with existing libraries and modules, enabling agents to manipulate large datasets in memory rather than within the model’s limited context window. By equipping the model with purpose-built tools, Strands leverages external resources for data management while letting the model focus on higher-level logic and decision-making. This method not only enhances scalability but also makes Strands especially well-suited for enterprise-grade data processing tasks.\n",
"\n",
"In this notebook, you learned how to use an existing Python module as a source of agent tools, optimizing the consumption of context tokens in the agent's model."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Disclaimer\n",
"\n",
"This sample is provided for educational and demonstration purposes only. It is not intended for production use without further development, testing, and hardening.\n",
"\n",
"For production deployments, consider:\n",
"- Implementing appropriate content filtering and safety measures\n",
"- Following security best practices for your deployment environment\n",
"- Conducting thorough testing and validation\n",
"- Reviewing and adjusting configurations for your specific requirements\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.2"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
26 changes: 26 additions & 0 deletions 02-samples/19-data-analyst-pandas/pandas_agent.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
# pandas_agent.py
# interacts with Pandas module wrapped in Strands @tool decorators
#

from strands import Agent
from pandas_strands_tools import pandas_tools

# Create agent with pre-loaded state and tools
agent = Agent(
tools=pandas_tools
)

if __name__ == "__main__":
print("Pandasbot: Ask me about spreadsheets. Type 'exit' to quit.\n")

response = agent("List all available pandas tools")
print(f"\nPandasbot > {response}")

# Run the agent in a loop for interactive conversation
while True:
user_input = input("\nPrompt > ")
if user_input.lower() == "exit":
print("Bye.")
break
response = agent(user_input)
print(f"\nPandasbot > {response}")