diff --git a/02-samples/19-data-analyst-pandas/.gitignore b/02-samples/19-data-analyst-pandas/.gitignore
new file mode 100644
index 00000000..a190d3cf
--- /dev/null
+++ b/02-samples/19-data-analyst-pandas/.gitignore
@@ -0,0 +1,5 @@
+__pycache__/
+venv/
+.venv/
+.ipynb_checkpoints/
+.DS_Store
diff --git a/02-samples/19-data-analyst-pandas/General-Ledger.xlsx b/02-samples/19-data-analyst-pandas/General-Ledger.xlsx
new file mode 100644
index 00000000..79df2d52
Binary files /dev/null and b/02-samples/19-data-analyst-pandas/General-Ledger.xlsx differ
diff --git a/02-samples/19-data-analyst-pandas/architecture.png b/02-samples/19-data-analyst-pandas/architecture.png
new file mode 100644
index 00000000..ddbbf86c
Binary files /dev/null and b/02-samples/19-data-analyst-pandas/architecture.png differ
diff --git a/02-samples/19-data-analyst-pandas/data-analyst.ipynb b/02-samples/19-data-analyst-pandas/data-analyst.ipynb
new file mode 100644
index 00000000..dab9c303
--- /dev/null
+++ b/02-samples/19-data-analyst-pandas/data-analyst.ipynb
@@ -0,0 +1,337 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Data Analyst: Optimizing Context Efficiency in Strands Agents\n",
+ "\n",
+ "## Overview\n",
+ "\n",
+ "Efficient context management is crucial in Agentic AI, even as Large Language Models (LLMs) advance in their ability to process ever-larger context windows. While attention mechanisms like sliding window and block-wise attention have improved handling of larger contexts, these innovations are relatively new and not fully adopted by most mainstream models. The computational complexity of attention—typically O(N2)—means that doubling the window size multiplies the required processing by four, which highlights the ongoing need for efficiency. Additionally, larger contexts can lead to the \"needle in a haystack\" problem, where relevant details are easily lost, further emphasizing the value of focused context practices. In this example, we will demonstrate how to wrap existing python modules with the @tool decorator to create tools that work independently of model context and minimize token consumption. We will build a \"data analyst\" to process tabular spreadsheet data without consuming excessive context in the agent's model.\n",
+ "\n",
+ "### Sample Details\n",
+ "\n",
+ "
\n",
+ " \n",
+ "| Information | Details |\n",
+ "|------------------------|------------------------------------------------------------|\n",
+ "| **Agent Architecture** | Single-agent |\n",
+ "| **Native Tools** | None |\n",
+ "| **Custom Tools** | add, subtract, Pandas Dataframe manipulation |\n",
+ "| **MCP Servers** | None |\n",
+ "| **Use Case Vertical** | Any |\n",
+ "| **Complexity** | Intermediate |\n",
+ "| **Model Provider** | Amazon Bedrock |\n",
+ "| **SDK Used** | Strands Agents SDK |\n",
+ "
"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Architecture\n",
+ "\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "\n",
+ "## Key Features\n",
+ "\n",
+ "The Strands Agents framework is inherently suited for efficient context management, particularly through structured tool use and agent memory. For example, when processing an Excel spreadsheet, one approach is to have the agent ingest the entire file using dedicated tools for reading and analyzing Excel data. While feasible for smaller tables, this method does not scale well to enterprise scenarios due to context window limitations. Alternatively, chunking and summarizing multiple rows can reduce context demands, but requires extra coding and can still consume significant model context.\n",
+ "\n",
+ "## Leveraging Tools with Strands\n",
+ "\n",
+ "Strands makes it easy to convert existing Python functions into agent tools, simply by adding the @tool decorator before defining a function. Using open-source libraries, agents can build memory-resident models of data structures, allowing the agent to manipulate and learn from large data sets without overloading model context.\n",
+ "\n",
+ "## Implementing Agent Tools\n",
+ "\n",
+ "To maximize efficiency, consider wrapping the Python library in a class and exposing relevant functions as tools, each with concise docstrings summarizing their purpose. These descriptions, informed by the library's API and documentation, help the model decide when to apply specific tools, streamlining both context consumption and usability.\n",
+ "\n",
+ "This approach demonstrates how the Strands framework enables scalable, context-efficient workflows for agents operating on real-world, data-intensive tasks.\n",
+ "\n",
+ "## Setup and prerequisites\n",
+ "\n",
+ "### Prerequisites\n",
+ "* Python 3.10+\n",
+ "* AWS account\n",
+ "* AWS CLI configured with appropriate credentials\n",
+ "* [Model access](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access-modify.html) enabled for any model that supports tool use\n",
+ "\n",
+ "Let's now install the requirement packages for our agent."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# installing pre-requisites\n",
+ "!pip install -r requirements.txt"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Usage\n",
+ "\n",
+ "Here is a trivial arithmetic library, where we define two functions \"add\" and \"subtract\":"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Writing arithmetic.py\n"
+ ]
+ }
+ ],
+ "source": [
+ "%%writefile arithmetic.py\n",
+ "\"\"\"\n",
+ "Arithmetic Module\n",
+ "\"\"\"\n",
+ "\n",
+ "def add(a, b):\n",
+ " return a + b\n",
+ "\n",
+ "def subtract(a, b):\n",
+ " return a - b\n",
+ "\n",
+ "if __name__ == \"__main__\":\n",
+ " print(f\"5 + 3 = {add(5, 3)}\")\n",
+ " print(f\"10 - 4 = {subtract(10, 4)}\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We can create tools from this library using Strands:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Import from arithmetic.py\n",
+ "from arithmetic import *\n",
+ "from strands import tool\n",
+ "\n",
+ "@tool\n",
+ "def add_wrapper(a, b):\n",
+ " \"\"\"Add a to b\"\"\"\n",
+ " return add(a, b)\n",
+ "\n",
+ "@tool\n",
+ "def subtract_wrapper(a, b):\n",
+ " \"\"\"Subtract b from a\"\"\"\n",
+ " return subtract(a, b)"
+ ]
+ },
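+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "These wrapped functions can now be handed to an agent. The cell below is a minimal sketch using the same Agent(tools=...) pattern that appears later in this notebook; the prompt is illustrative, and invoking it requires the Bedrock model access described in the prerequisites:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Sketch: attaching the wrapped arithmetic tools to an agent\n",
+ "from strands import Agent\n",
+ "\n",
+ "arithmetic_agent = Agent(tools=[add_wrapper, subtract_wrapper])\n",
+ "\n",
+ "# Illustrative prompt; uncomment to run against your configured model\n",
+ "# arithmetic_agent(\"What is 17 + 25, minus 8?\")"
+ ]
+ },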
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The Python module that wraps the library and serves tools to the agent can then be built as an object-oriented module, with a data structure in the module that retains state as long as the module's calling context remains active. The agent can iteratively process tasks and refine its results, according to its instructions, and selectively use tools to provide information to the model without requiring excessive consumption of context. In this example, we create a class that saves the results of all previous operations in a \"paper tape\" list:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Example of a stateful wrapper class\n",
+ "\n",
+ "class ArithmeticTools:\n",
+ "\n",
+ " def __init__(self):\n",
+ " self.paper_tape = []\n",
+ "\n",
+ " @tool\n",
+ " def add(self, a, b):\n",
+ " \"\"\"Add two numbers\"\"\"\n",
+ " result = a + b\n",
+ " self.paper_tape.append(result)\n",
+ " return result\n",
+ "\n",
+ " @tool\n",
+ " def subtract(self, a, b):\n",
+ " \"\"\"Subtract b from a\"\"\"\n",
+ " result = a - b\n",
+ " self.paper_tape.append(result)\n",
+ " return result\n",
+ "\n",
+ " @tool\n",
+ " def print_paper_tape(self):\n",
+ " \"\"\"Get the list of previously saved values\"\"\"\n",
+ " return self.paper_tape"
+ ]
+ },
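+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Because these tools are methods on an instance, the paper_tape state travels with that instance. The cell below is a minimal sketch of registering the bound methods as tools, again following the Agent(tools=...) pattern used later in this notebook:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Sketch: registering stateful, class-based tools with an agent\n",
+ "from strands import Agent\n",
+ "\n",
+ "calculator = ArithmeticTools()\n",
+ "\n",
+ "# Bound methods carry the instance state (paper_tape) with them\n",
+ "calc_agent = Agent(\n",
+ " tools=[calculator.add, calculator.subtract, calculator.print_paper_tape]\n",
+ ")\n",
+ "\n",
+ "# Illustrative prompt; uncomment to run against your configured model\n",
+ "# calc_agent(\"Add 2 and 3, subtract 1 from the result, then show the paper tape\")"
+ ]
+ },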
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "When working with large datasets, such as a 100KB Excel spreadsheet, storing the entire file in the model's context can consume between 13,000 and 25,000 tokens—quickly consuming capacity of most current LLMs. In contrast, by loading the spreadsheet into a Python class instance as a Pandas data frame, Strands tools can interact with this data structure and supply only the necessary information to the model, reducing context usage and improving efficiency.\n",
+ "\n",
+ "While there are already MCP servers for Pandas and for Excel, these servers do not enable all of the functions we require. So we will now create Strands tools from the existing Python Pandas module. Here is the beginning of the pandas_strands_tools.py module. The full file is available in the current directory."
+ ]
+ },
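+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As a rough illustration of that cost, the sketch below estimates the token footprint of inlining the entire sheet, assuming approximately four characters per token; the exact count depends on the model's tokenizer:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Rough, illustrative estimate of the token cost of inlining the sheet.\n",
+ "# Assumes ~4 characters per token; actual counts depend on the tokenizer.\n",
+ "import pandas as pd\n",
+ "\n",
+ "sheet_text = pd.read_excel(\"General-Ledger.xlsx\").to_string()\n",
+ "print(f\"~{len(sheet_text) // 4:,} tokens if the whole sheet were placed in context\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "While MCP servers already exist for Pandas and for Excel, they do not expose all of the functions we require, so we will create Strands tools directly from the existing Pandas module. Here is the beginning of the pandas_strands_tools.py module; the full file is available in the current directory."
+ ]
+ },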
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "import pandas as pd\n",
+ "from typing import Any, Dict, List, Optional\n",
+ "from strands import tool\n",
+ "\n",
+ "# Global DataFrame that all tools will operate on\n",
+ "df_glob = pd.DataFrame()\n",
+ "\n",
+ "# Tools that load data and set df_glob\n",
+ "\n",
+ "@tool\n",
+ "def pd_read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer', names=None, \n",
+ " index_col=None, usecols=None, dtype=None) -> str:\n",
+ " \"\"\"\n",
+ " Read a comma-separated values (csv) file into the global DataFrame.\n",
+ " \n",
+ " Args:\n",
+ " filepath_or_buffer: str, path object or file-like object\n",
+ " sep: str, default ','\n",
+ " delimiter: str, optional\n",
+ " header: int, Sequence of int, 'infer' or None, default 'infer'\n",
+ " names: Sequence of Hashable, optional\n",
+ " index_col: Hashable, Sequence of Hashable or False, optional\n",
+ " usecols: Sequence of Hashable or Callable, optional\n",
+ " dtype: dtype or dict of {Hashable : dtype}, optional\n",
+ " \n",
+ " Returns:\n",
+ " Status message with DataFrame shape\n",
+ " \"\"\"\n",
+ " global df_glob\n",
+ " df_glob = pd.read_csv(filepath_or_buffer=filepath_or_buffer, sep=sep, delimiter=delimiter, \n",
+ " header=header, names=names, index_col=index_col, usecols=usecols, \n",
+ " dtype=dtype)\n",
+ " return f\"Loaded CSV into df_glob: {df_glob.shape[0]} rows × {df_glob.shape[1]} columns\"\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "!head -n 34 pandas_strands_tools.py"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Keep in mind that retaining state in the class as in self.paper_tape, or as a global in the Pandas module, limits the agent to a single instance of state. In a different tutorial, we will explore retaining state as part of the agent's context instead, for instances where the agent needs to access multiple instances at once, such as several spreadsheets. Strands' ability to access tool context is unique among frameworks.\n",
+ "\n",
+ "Here is a simple agent that uses the Pandas tools we've created to answer questions about a spreadsheet imported to a Pandas dataframe. Try it with the General-Ledger.xslx file, which is a synthetic public domain data set of accounting ledger entries. You can ask questions like, \"Which department had the highest expenses?\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# pandas_agent.py\n",
+ "# interacts with Pandas module wrapped in Strands @tool decorators\n",
+ "#\n",
+ "\n",
+ "from strands import Agent\n",
+ "from pandas_strands_tools import pandas_tools\n",
+ "\n",
+ "# Create agent with pre-loaded state and tools\n",
+ "agent = Agent(\n",
+ " tools=pandas_tools\n",
+ ")\n",
+ "\n",
+ "if __name__ == \"__main__\":\n",
+ " print(\"Pandasbot: Ask me about spreadsheets. Type 'exit' to quit.\\n\") \n",
+ "\n",
+ " response = agent(\"List all available pandas tools\")\n",
+ " print(f\"\\nExcelBot > {response}\")\n",
+ "\n",
+ " # Run the agent in a loop for interactive conversation\n",
+ " while True:\n",
+ " user_input = input(\"\\nPrompt > \")\n",
+ " if user_input.lower() == \"exit\":\n",
+ " print(\"Bye.\")\n",
+ " break\n",
+ " response = agent(user_input)\n",
+ " print(f\"\\nExcelBot > {response}\")\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "While this spreadsheet contains 5000 rows of data, the data is processed by the agent, not the model, which drastically reduces context consumption.\n",
+ "\n",
+ "## Strands as a Model-First Framework\n",
+ "\n",
+ "Strands is designed with a \"model-first\" philosophy, emphasizing the LLM’s strengths in reasoning and planning actions. This approach allows seamless integration with existing libraries and modules, enabling agents to manipulate large datasets in memory rather than within the model’s limited context window. By equipping the model with purpose-built tools, Strands leverages external resources for data management while letting the model focus on the higher-level logic and decision-making. This method not only enhances scalability, but also makes Strands especially well-suited for enterprise-grade data processing tasks.\n",
+ "In this notebook, you learned how to use an existing Python module as a tool, to optimize consumption of context tokens in the agent's model."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Disclaimer\n",
+ "\n",
+ "This sample is provided for educational and demonstration purposes only. It is not intended for production use without further development, testing, and hardening.\n",
+ "\n",
+ "For production deployments, consider:\n",
+ "- Implementing appropriate content filtering and safety measures\n",
+ "- Following security best practices for your deployment environment\n",
+ "- Conducting thorough testing and validation\n",
+ "- Reviewing and adjusting configurations for your specific requirements\n"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.12.2"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/02-samples/19-data-analyst-pandas/pandas_agent.py b/02-samples/19-data-analyst-pandas/pandas_agent.py
new file mode 100644
index 00000000..072caa85
--- /dev/null
+++ b/02-samples/19-data-analyst-pandas/pandas_agent.py
@@ -0,0 +1,26 @@
+# pandas_agent.py
+# interacts with Pandas module wrapped in Strands @tool decorators
+#
+
+from strands import Agent
+from pandas_strands_tools import pandas_tools
+
+# Create agent with pre-loaded state and tools
+agent = Agent(
+ tools=pandas_tools
+)
+
+if __name__ == "__main__":
+ print("Pandasbot: Ask me about spreadsheets. Type 'exit' to quit.\n")
+
+ response = agent("List all available pandas tools")
+ print(f"\nExcelBot > {response}")
+
+ # Run the agent in a loop for interactive conversation
+ while True:
+ user_input = input("\nPrompt > ")
+ if user_input.lower() == "exit":
+ print("Bye.")
+ break
+ response = agent(user_input)
+ print(f"\nExcelBot > {response}")
diff --git a/02-samples/19-data-analyst-pandas/pandas_strands_tools.py b/02-samples/19-data-analyst-pandas/pandas_strands_tools.py
new file mode 100644
index 00000000..85a5e125
--- /dev/null
+++ b/02-samples/19-data-analyst-pandas/pandas_strands_tools.py
@@ -0,0 +1,383 @@
+import pandas as pd
+from typing import Any, Dict, List, Optional
+from strands import tool
+
+# Global DataFrame that all tools will operate on
+df_glob = pd.DataFrame()
+
+# Tools that load data and set df_glob
+
+@tool
+def pd_read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer', names=None,
+ index_col=None, usecols=None, dtype=None) -> str:
+ """
+ Read a comma-separated values (csv) file into the global DataFrame.
+
+ Args:
+ filepath_or_buffer: str, path object or file-like object
+ sep: str, default ','
+ delimiter: str, optional
+ header: int, Sequence of int, 'infer' or None, default 'infer'
+ names: Sequence of Hashable, optional
+ index_col: Hashable, Sequence of Hashable or False, optional
+ usecols: Sequence of Hashable or Callable, optional
+ dtype: dtype or dict of {Hashable : dtype}, optional
+
+ Returns:
+ Status message with DataFrame shape
+ """
+ global df_glob
+ df_glob = pd.read_csv(filepath_or_buffer=filepath_or_buffer, sep=sep, delimiter=delimiter,
+ header=header, names=names, index_col=index_col, usecols=usecols,
+ dtype=dtype)
+ return f"Loaded CSV into df_glob: {df_glob.shape[0]} rows × {df_glob.shape[1]} columns"
+
+@tool
+def pd_read_excel(io, sheet_name=0, header=0, names=None, index_col=None, usecols=None,
+ dtype=None) -> str:
+ """
+ Read an Excel file into the global DataFrame.
+
+ Args:
+ io: str, bytes, ExcelFile, xlrd.Book, path object, or file-like object
+ sheet_name: str, int, list, or None, default 0
+ header: int, list of int, default 0
+ names: array-like, default None
+ index_col: int, str, list of int, default None
+ usecols: str, list-like, or callable, default None
+ dtype: Type name or dict of column -> type, default None
+
+ Returns:
+ Status message with DataFrame shape
+ """
+ global df_glob
+ df_glob = pd.read_excel(io=io, sheet_name=sheet_name, header=header, names=names,
+ index_col=index_col, usecols=usecols, dtype=dtype)
+ return f"Loaded Excel into df_glob: {df_glob.shape[0]} rows × {df_glob.shape[1]} columns"
+
+@tool
+def pd_get_shape() -> str:
+ """Get the shape of the global DataFrame."""
+ global df_glob
+ return f"df_glob shape: {df_glob.shape[0]} rows × {df_glob.shape[1]} columns"
+
+@tool
+def pd_get_columns() -> str:
+ """Get column names from the global DataFrame."""
+ global df_glob
+ return f"df_glob columns: {list(df_glob.columns)}"
+
+@tool
+def pd_head(n: int = 5) -> str:
+ """
+ Return the first n rows of the global DataFrame.
+
+ Args:
+ n: int, default 5 - number of rows to return
+
+ Returns:
+ String representation of first n rows
+ """
+ global df_glob
+ return df_glob.head(n).to_string()
+
+@tool
+def pd_tail(n: int = 5) -> str:
+ """
+ Return the last n rows of the global DataFrame.
+
+ Args:
+ n: int, default 5 - number of rows to return
+
+ Returns:
+ String representation of last n rows
+ """
+ global df_glob
+ return df_glob.tail(n).to_string()
+
+@tool
+def pd_describe() -> str:
+ """Generate descriptive statistics of the global DataFrame."""
+ global df_glob
+ return df_glob.describe().to_string()
+
+@tool
+def pd_info() -> str:
+ """Get information about the global DataFrame."""
+ global df_glob
+ import io
+ buffer = io.StringIO()
+ df_glob.info(buf=buffer)
+ return buffer.getvalue()
+
+@tool
+def pd_filter_rows(column: str, operator: str, value: Any) -> str:
+ """
+ Filter the global DataFrame based on a condition (modifies df_glob in place).
+
+ Args:
+ column: str - column name to filter on
+ operator: str - comparison operator ('>', '<', '>=', '<=', '==', '!=')
+ value: value to compare against
+
+ Returns:
+ Status message with new shape
+ """
+ global df_glob
+
+ if operator == '>':
+ df_glob = df_glob[df_glob[column] > value]
+ elif operator == '<':
+ df_glob = df_glob[df_glob[column] < value]
+ elif operator == '>=':
+ df_glob = df_glob[df_glob[column] >= value]
+ elif operator == '<=':
+ df_glob = df_glob[df_glob[column] <= value]
+ elif operator == '==':
+ df_glob = df_glob[df_glob[column] == value]
+ elif operator == '!=':
+ df_glob = df_glob[df_glob[column] != value]
+ else:
+ return f"Error: Unsupported operator: {operator}"
+
+ return f"Filtered df_glob: {df_glob.shape[0]} rows remaining"
+
+@tool
+def pd_sort_values(by: str, ascending: bool = True) -> str:
+ """
+ Sort the global DataFrame by column (modifies df_glob in place).
+
+ Args:
+ by: str - column name to sort by
+ ascending: bool, default True
+
+ Returns:
+ Status message
+ """
+ global df_glob
+ df_glob = df_glob.sort_values(by=by, ascending=ascending)
+ return f"Sorted df_glob by '{by}' ({'ascending' if ascending else 'descending'})"
+
+@tool
+def pd_drop_duplicates(subset: Optional[str] = None) -> str:
+ """
+ Remove duplicate rows from the global DataFrame (modifies df_glob).
+
+ Args:
+ subset: column label or sequence of labels, optional
+
+ Returns:
+ Status message with new shape
+ """
+ global df_glob
+ original_rows = df_glob.shape[0]
+ df_glob = df_glob.drop_duplicates(subset=subset)
+ removed = original_rows - df_glob.shape[0]
+ return f"Removed {removed} duplicate rows. df_glob now has {df_glob.shape[0]} rows"
+
+@tool
+def pd_dropna(axis: int = 0, how: str = 'any') -> str:
+ """
+ Remove missing values from the global DataFrame (modifies df_glob).
+
+ Args:
+ axis: {0/'index', 1/'columns'}, default 0
+ how: {'any', 'all'}, default 'any'
+
+ Returns:
+ Status message with new shape
+ """
+ global df_glob
+ original_rows = df_glob.shape[0]
+ df_glob = df_glob.dropna(axis=axis, how=how)
+ removed = original_rows - df_glob.shape[0]
+ return f"Removed {removed} rows with NA values. df_glob now has {df_glob.shape[0]} rows"
+
+@tool
+def pd_fillna(value: Any) -> str:
+ """
+ Fill NA/NaN values in the global DataFrame (modifies df_glob).
+
+ Args:
+ value: scalar, dict, Series, or DataFrame - value to fill
+
+ Returns:
+ Status message
+ """
+ global df_glob
+ df_glob = df_glob.fillna(value=value)
+ return f"Filled NA values in df_glob with {value}"
+
+@tool
+def pd_rename_columns(columns: Dict[str, str]) -> str:
+ """
+ Rename columns in the global DataFrame (modifies df_glob).
+
+ Args:
+ columns: dict - mapping of old names to new names
+
+ Returns:
+ Status message
+ """
+ global df_glob
+ df_glob = df_glob.rename(columns=columns)
+ return f"Renamed columns in df_glob. New columns: {list(df_glob.columns)}"
+
+@tool
+def pd_select_columns(columns: List[str]) -> str:
+ """
+ Select specific columns from the global DataFrame (modifies df_glob).
+
+ Args:
+ columns: list of str - column names to keep
+
+ Returns:
+ Status message
+ """
+ global df_glob
+ df_glob = df_glob[columns]
+ return f"Selected columns in df_glob: {list(df_glob.columns)}"
+
+@tool
+def pd_groupby_sum(by: str, value_column: str) -> str:
+ """
+ Group the global DataFrame by column and sum (modifies df_glob).
+
+ Args:
+ by: str - column to group by
+ value_column: str - column to sum
+
+ Returns:
+ String representation of grouped data
+ """
+ global df_glob
+ result = df_glob.groupby(by)[value_column].sum()
+ return result.to_string()
+
+@tool
+def pd_groupby_mean(by: str, value_column: str) -> str:
+ """
+ Group the global DataFrame by column and calculate mean.
+
+ Args:
+ by: str - column to group by
+ value_column: str - column to average
+
+ Returns:
+ String representation of grouped data
+ """
+ global df_glob
+ result = df_glob.groupby(by)[value_column].mean()
+ return result.to_string()
+
+@tool
+def pd_value_counts(column: str) -> str:
+ """
+ Get value counts for a column in the global DataFrame.
+
+ Args:
+ column: str - column name
+
+ Returns:
+ String representation of value counts
+ """
+ global df_glob
+ return df_glob[column].value_counts().to_string()
+
+@tool
+def pd_column_sum(column: str) -> str:
+ """
+ Sum a numeric column in the global DataFrame.
+
+ Args:
+ column: str - column name
+
+ Returns:
+ Sum as string
+ """
+ global df_glob
+ return f"{column} sum: {df_glob[column].sum()}"
+
+@tool
+def pd_column_mean(column: str) -> str:
+ """
+ Calculate mean of a numeric column in the global DataFrame.
+
+ Args:
+ column: str - column name
+
+ Returns:
+ Mean as string
+ """
+ global df_glob
+ return f"{column} mean: {df_glob[column].mean():.2f}"
+
+@tool
+def pd_column_max(column: str) -> str:
+ """
+ Get maximum value in a column of the global DataFrame.
+
+ Args:
+ column: str - column name
+
+ Returns:
+ Max value as string
+ """
+ global df_glob
+ return f"{column} max: {df_glob[column].max()}"
+
+@tool
+def pd_column_min(column: str) -> str:
+ """
+ Get minimum value in a column of the global DataFrame.
+
+ Args:
+ column: str - column name
+
+ Returns:
+ Min value as string
+ """
+ global df_glob
+ return f"{column} min: {df_glob[column].min()}"
+
+@tool
+def pd_reset_index() -> str:
+ """Reset the index of the global DataFrame (modifies df_glob)."""
+ global df_glob
+ df_glob = df_glob.reset_index(drop=True)
+ return "Reset df_glob index"
+
+@tool
+def pd_create_empty() -> str:
+ """Create an empty global DataFrame."""
+ global df_glob
+ df_glob = pd.DataFrame()
+ return "Created empty df_glob"
+
+# List of all tools
+pandas_tools = [
+ pd_read_csv,
+ pd_read_excel,
+ pd_get_shape,
+ pd_get_columns,
+ pd_head,
+ pd_tail,
+ pd_describe,
+ pd_info,
+ pd_filter_rows,
+ pd_sort_values,
+ pd_drop_duplicates,
+ pd_dropna,
+ pd_fillna,
+ pd_rename_columns,
+ pd_select_columns,
+ pd_groupby_sum,
+ pd_groupby_mean,
+ pd_value_counts,
+ pd_column_sum,
+ pd_column_mean,
+ pd_column_max,
+ pd_column_min,
+ pd_reset_index,
+ pd_create_empty,
+]
\ No newline at end of file
diff --git a/02-samples/19-data-analyst-pandas/requirements.txt b/02-samples/19-data-analyst-pandas/requirements.txt
new file mode 100644
index 00000000..1c9a7e4a
--- /dev/null
+++ b/02-samples/19-data-analyst-pandas/requirements.txt
@@ -0,0 +1,3 @@
+pandas>=1.3.0
+openpyxl>=3.0.0
+strands-agents
\ No newline at end of file
diff --git a/02-samples/README.md b/02-samples/README.md
index 8a8c0c40..2d5b5550 100644
--- a/02-samples/README.md
+++ b/02-samples/README.md
@@ -22,4 +22,5 @@
| 16 | [Lambda Error Analysis Agent](./16-lambda-error-analysis-agent/) | Intelligent Lambda error diagnostics with AI-powered root cause analysis. Features event-driven architecture, multi-tool Strands agent (source code, logs, Knowledge Base), confidence scoring, and complete CDK infrastructure with Bedrock Knowledge Base integration. Automates error analysis and provides actionable recommendations for Lambda failures. |
| 17 | [GenAI-powered wealth and financial advisory tools](./17-genai-powered-financial-advisor-tools/) | A comprehensive GenAI-powered wealth and financial advisory tools built with Strands Agents, Amazon Bedrock, and MCP designed to automate client meeting analysis, advisor follow-up item analysis, client portfolio analysis, market research, and report generation for financial advisors. |
| 18 | [Autonomous AI Advertising Agent with Crypto Payments](./18-ai-ads-generation-agent-with-crypto-payments/) | AI agent that creates complete advertising campaigns and autonomously pays for premium services using cryptocurrency via the X402 payment protocol. Demonstrates economic agency with weather-responsive ad generation, AI image generation, USDC payments on Base Sepolia testnet, and complete HTML campaign output. Integrates Strands Agents with Coinbase AgentKit. |
+| 19 | [Pandas Data Analyst](./19-data-analyst-pandas/) | Data analyst agent that wraps the Pandas library with Strands @tool decorators, demonstrating how to create tools from existing Python modules to minimize context consumption in the model. |