3 changes: 3 additions & 0 deletions samples/mcs-finance-statement-agent/.gitignore
@@ -0,0 +1,3 @@
node_modules
lib

135 changes: 135 additions & 0 deletions samples/mcs-finance-statement-agent/README.md
@@ -0,0 +1,135 @@
# Finance Statement Agent

## Summary

Conversational AI for credit financial statement extraction. An analyst uploads a PDF financial report in Microsoft Teams; a Copilot Studio agent orchestrates an Azure-hosted pipeline that extracts the **Income Statement, Balance Sheet, Cash Flow,** and computed **Ratios** into a downloadable Excel workbook ready for credit spreading.

Multi-language: English, Chinese, Japanese, French (auto-detected, label-reconciled to a canonical schema).

![Architecture](./assets/architecture.png)

**Data flow:** Analyst → Teams → Copilot Studio → Custom Connector → Azure Functions (HTTP 202) → Poll until complete → Excel generated → SAS download link in chat.

## Frameworks

![drop](https://img.shields.io/badge/Microsoft%20Copilot%20Studio-latest-green.svg)
![drop](https://img.shields.io/badge/Azure%20Functions-Python%203.11-green.svg)
![drop](https://img.shields.io/badge/Azure%20OpenAI-GPT--4.1-green.svg)
![drop](https://img.shields.io/badge/Power%20Apps%20Code%20App-React%20%2B%20TS-green.svg)

## Prerequisites

* Microsoft 365 tenant with **Copilot Studio** licensed
* **Power Platform** environment with Dataverse and custom connector permissions
* **Azure subscription** with rights to create Resource Group, Function App, Content Understanding (or Document Intelligence), Azure OpenAI (with `gpt-4.1` deployment), and Storage Account
* [Azure CLI](https://learn.microsoft.com/cli/azure/install-azure-cli) 2.50+
* [Azure Functions Core Tools](https://learn.microsoft.com/azure/azure-functions/functions-run-local) 4.x
* [Power Platform CLI (`pac`)](https://learn.microsoft.com/power-platform/developer/cli/introduction)
* Python 3.11+, Node.js 18+

## Contributors

mcs-finance-statement-agent | Shaji Sivaraman ([@sgshaji](https://github.com/sgshaji)), Microsoft

## Version history

Version | Date | Author | Comments
--------|------|--------|---------
1.0 | April 30, 2026 | Shaji Sivaraman | Initial release

## Disclaimer

**THIS CODE IS PROVIDED *AS IS* WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING ANY IMPLIED WARRANTIES OF FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR NON-INFRINGEMENT.**

---

## Minimal Path to Awesome

The sample is split across three deployables: an **Azure Functions** backend, a **Copilot Studio** agent, and an optional **Power Apps Code App** for human-in-the-loop review.

### 1. Clone and authenticate

```bash
git clone https://github.com/pnp/copilot-pro-dev-samples.git
cd copilot-pro-dev-samples/samples/mcs-finance-statement-agent/src
az login
az account set --subscription "<subscription-id>"
```

### 2. Provision Azure resources

Create the resources listed in **Prerequisites**. Grant the Function App's **system-assigned managed identity** these roles at the resource group scope:
* `Cognitive Services User` — Content Understanding / Document Intelligence
* `Cognitive Services OpenAI User` — Azure OpenAI
* `Storage Blob Data Contributor` — Storage Account
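
The role assignments above can be scripted with the Azure CLI. A minimal sketch, assuming the placeholder names `<your-function-app>` and `<your-rg>` from the steps below; adapt scopes if your resources live in different groups:

```shell
# Enable the system-assigned identity and capture its principal id
PRINCIPAL_ID=$(az functionapp identity assign \
  --name <your-function-app> --resource-group <your-rg> \
  --query principalId --output tsv)

# Resource-group scope for all three assignments
RG_SCOPE=$(az group show --name <your-rg> --query id --output tsv)

for ROLE in "Cognitive Services User" \
            "Cognitive Services OpenAI User" \
            "Storage Blob Data Contributor"; do
  az role assignment create \
    --assignee-object-id "$PRINCIPAL_ID" \
    --assignee-principal-type ServicePrincipal \
    --role "$ROLE" \
    --scope "$RG_SCOPE"
done
```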

### 3. Configure and run the backend

```bash
cd azure-functions
cp .env.example .env
# edit .env with your endpoints (no API keys — Managed Identity handles auth)
pip install -r requirements.txt
func start # http://localhost:7071/api/health
```

### 4. Deploy the Function

```bash
func azure functionapp publish <your-function-app> --python
```

### 5. Set up the custom connector

1. Power Platform → **Custom connectors → New → Import OpenAPI file** → `docs/custom-connector-swagger.yml`
2. Update the Host to your Function App
3. Authentication: **API key** (header `x-functions-key`) — retrieve via:
```bash
az functionapp keys list --name <your-function-app> --resource-group <your-rg>
```
4. Create a connection using the new connector

### 6. Push the Copilot Studio agent

```bash
cd ../copilot-studio-agent
pac copilot push --environment <env-id>
```

### 7. (Optional) Deploy the HITL review Code App

```bash
cd ../code-app
npm install && npm run build
pac code push
```

## Features

This sample demonstrates an end-to-end agentic pattern for processing long-running document workloads from Copilot Studio:

* **Async extraction with polling** — Power Platform custom connectors have a default 30-second synchronous request timeout. The pipeline returns `HTTP 202 {jobId}` in ~100 ms; the Copilot Studio topic polls `/extract/status/{jobId}` every 30 s until `completed` (`ConditionGroup` + `GotoAction` loop bounded by max-attempts)
* **Pluggable extraction backend** — Content Understanding (default), Document Intelligence, Textract, or local pdfplumber, selectable via `EXTRACTION_BACKEND`
* **5-stage pipeline** — analyze → select → extract → enrich → validate. Backends emit a common markdown + HTML-table format so Stages 2–5 are reusable
* **Multi-language label reconciliation** — Azure OpenAI maps source-language labels to a canonical English schema for English, Chinese, Japanese, and French statements
* **Managed Identity end-to-end** — no API keys for Azure service-to-service auth. The only secret is the Function key consumed by the Power Platform custom connector
* **Job state in Blob with 30-min TTL** — bounded storage; SAS URLs returned in chat for the generated Excel
* **Human-in-the-loop review** — optional Power Apps Code App provides an analyst grid backed by Dataverse for correcting extracted values before downstream credit spreading
* **Multi-row column-header parsing** — handles statements where Q4 and FY columns share a parent header (e.g., Meta Income Statement)
* **>5 MB upload handling** — uses Copilot Studio `Question` node bound to `FilePrebuiltEntity` (direct `Activity.Attachments` access fails for files > 5 MB)
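
The bounded polling loop from the first bullet can be sketched client-side as follows. This is illustrative only — the function name, stubbed status source, and parameters are not from the sample, and in the real solution the loop lives in a Copilot Studio topic, not Python:

```python
import time

def poll_job(get_status, job_id, interval_s=30, max_attempts=10, sleep=time.sleep):
    """Poll a job until it reports 'completed' or attempts run out.

    Mirrors the bounded ConditionGroup + GotoAction loop in the topic:
    check status, wait, repeat, and give up after max_attempts.
    """
    for attempt in range(1, max_attempts + 1):
        status = get_status(job_id)        # GET /extract/status/{jobId}
        if status == "completed":
            return True
        if status == "failed":
            raise RuntimeError(f"Job {job_id} failed on attempt {attempt}")
        sleep(interval_s)                  # topic waits 30 s between polls
    return False                           # bounded: never polls forever

# Stubbed backend that completes on the third poll
responses = iter(["running", "running", "completed"])
done = poll_job(lambda _id: next(responses), "job-123", sleep=lambda _s: None)
print(done)  # True
```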

### Repository layout (under `src/`)

```
src/
├── azure-functions/ # Python backend (HTTP 202 async pipeline)
│ ├── function_app.py # HTTP router
│ └── extractor/ # 5-stage pipeline + clients (CU, DI, Textract, pdfplumber)
├── copilot-studio-agent/ # Copilot Studio YAML (agent, topics, actions, workflows)
├── code-app/ # Power Apps Code App — React HITL review grid (Dataverse)
└── docs/
├── architecture.png # Architecture diagram
└── custom-connector-swagger.yml # Swagger 2.0 spec for the custom connector
```

<img src="https://m365-visitor-stats.azurewebsites.net/copilot-pro-dev-samples/samples/mcs-finance-statement-agent" />
@@ -0,0 +1,38 @@
name: Deploy Azure Function

on:
  push:
    branches: [main]
    paths:
      - 'azure-functions/**'
  workflow_dispatch:

env:
  AZURE_FUNCTIONAPP_NAME: fin-stmt-extractor-v2
  AZURE_FUNCTIONAPP_PACKAGE_PATH: azure-functions
  PYTHON_VERSION: '3.11'

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ env.PYTHON_VERSION }}

      - name: Install dependencies
        run: |
          cd ${{ env.AZURE_FUNCTIONAPP_PACKAGE_PATH }}
          pip install -r requirements.txt --target=".python_packages/lib/site-packages"

      - name: Deploy to Azure Functions
        uses: Azure/functions-action@v1
        with:
          app-name: ${{ env.AZURE_FUNCTIONAPP_NAME }}
          package: ${{ env.AZURE_FUNCTIONAPP_PACKAGE_PATH }}
          publish-profile: ${{ secrets.AZURE_FUNCTIONAPP_PUBLISH_PROFILE }}
          scm-do-build-during-deployment: false
          enable-oryx-build: false
50 changes: 50 additions & 0 deletions samples/mcs-finance-statement-agent/src/.gitignore
@@ -0,0 +1,50 @@
# Node
node_modules/
dist/
.vite/

# Python
__pycache__/
*.pyc
.venv/
venv/
.pytest_cache/

# Power Platform
.power/
power.config.json
.mcs/

# Azure Functions
local.settings.json
.python_packages/
bin/
obj/

# IDE
.vscode/
.idea/
*.swp

# OS
.DS_Store
Thumbs.db

# Environment — never commit secrets
.env
.env.local

# Build artifacts / pip install noise
=*

# Sample / customer-specific artifacts — never commit
*.xlsx
*.pdf
*.pptx
*.docx
docs/samples/
_*
_tmp_logos/

# MCP widget test scaffolding (cloned from microsoft/mcp-interactiveUI-samples)
mcp-widget-test/
@@ -0,0 +1,24 @@
# Copy this file to .env and fill in your values
# Do NOT commit .env to source control
#
# Authentication: All services use Managed Identity (DefaultAzureCredential).
# API keys are disabled by corp policy — no AZURE_*_KEY vars needed.
# Ensure the Function App's managed identity has the required roles:
# - Cognitive Services User on the Document Intelligence resource
# - Cognitive Services OpenAI User on the Azure OpenAI resource

# ----- Azure Document Intelligence (table/PDF extraction) -----
AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT=https://your-docai-resource.cognitiveservices.azure.com/

# ----- Azure OpenAI (for LLM classification + enrichment) -----
AZURE_OPENAI_ENDPOINT=https://your-aoai-resource.openai.azure.com
AZURE_OPENAI_DEPLOYMENT=gpt-4.1
AZURE_OPENAI_API_VERSION=2024-12-01-preview

# ----- Extraction Backend -----
# EXTRACTION_BACKEND=document_intelligence # "document_intelligence" (default), "textract", or "pdfplumber"

# ----- AWS Textract (only if EXTRACTION_BACKEND=textract) -----
# AWS_REGION=us-east-1
# AWS_S3_BUCKET=
# AWS_S3_PREFIX=textract-input
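
How the backend might consume these variables can be sketched with the standard library alone. The helper name and fallback defaults below are illustrative, not the sample's actual code; at runtime the Azure SDK's `DefaultAzureCredential` supplies auth, so no key variables are read:

```python
import os

def load_extraction_config(env=os.environ):
    """Collect the sample's endpoint settings from the environment.

    Auth comes from Managed Identity (DefaultAzureCredential), so only
    endpoints and backend selection are needed — never API keys.
    """
    return {
        "docint_endpoint": env["AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT"],
        "aoai_endpoint": env["AZURE_OPENAI_ENDPOINT"],
        "aoai_deployment": env.get("AZURE_OPENAI_DEPLOYMENT", "gpt-4.1"),
        "aoai_api_version": env.get("AZURE_OPENAI_API_VERSION", "2024-12-01-preview"),
        "backend": env.get("EXTRACTION_BACKEND", "document_intelligence"),
    }

cfg = load_extraction_config({
    "AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT": "https://your-docai-resource.cognitiveservices.azure.com/",
    "AZURE_OPENAI_ENDPOINT": "https://your-aoai-resource.openai.azure.com",
})
print(cfg["backend"])  # document_intelligence
```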
@@ -0,0 +1,4 @@
.venv
.env
tests/
__pycache__/
@@ -0,0 +1,75 @@
{
  "description": "Extracts structured rows from a financial statement with analytics-ready output.",
  "baseAnalyzerId": "prebuilt-documentAnalyzer",
  "config": {
    "returnDetails": true,
    "enableOcr": true,
    "enableLayout": true,
    "estimateFieldSourceAndConfidence": true
  },
  "fieldSchema": {
    "fields": {
      "columns": {
        "type": "array",
        "method": "generate",
        "description": "Extract the period column headers from the FIRST financial statement table found in the document. Ignore any reconciliation tables, supplemental tables, or non-GAAP tables that may appear on the same page. For each column return: label (English normalized, e.g. 'Q4 2025'), label_raw (original language), period_type (quarter/annual/year_to_date/half_year/nine_months/instant/other), fiscal_year (integer), is_comparative (true for prior period).",
        "items": {
          "type": "object",
          "properties": {
            "label": {"type": "string", "description": "English normalized period label"},
            "label_raw": {"type": "string", "description": "Period label in original language"},
            "period_type": {"type": "string", "description": "quarter/annual/year_to_date/half_year/nine_months/instant/other"},
            "fiscal_year": {"type": "integer"},
            "is_comparative": {"type": "boolean"}
          }
        }
      },
      "rows": {
        "type": "array",
        "method": "generate",
        "description": "Extract EVERY INDIVIDUAL row from the FIRST financial statement table found in the document. Do NOT summarize or combine rows. If the table shows 'Cost of revenue', 'Research and development', 'Marketing and sales', 'General and administrative' as separate line items, extract EACH ONE as its own row. Do NOT collapse them into a single 'Costs and expenses' row. Ignore any GAAP-to-non-GAAP reconciliation tables, supplemental tables, segment tables, or free cash flow calculations that may appear on the same or adjacent pages. CRITICAL RULES: (1) For 'values', return a JSON array string with one entry per column. Each entry is the numeric value (negative for parenthesised amounts like (26,248) becomes -26248), or null for blank cells. Example: '[59893, 48385, 200966, 164501]' or '[null, 94, null, 383]'. (2) For non-English labels, canonical_key MUST be the English IFRS/GAAP equivalent in snake_case (e.g. 货币资金->cash_and_cash_equivalents). (3) section must be one of: current_assets, non_current_assets, assets, current_liabilities, non_current_liabilities, liabilities, equity, revenue, operating_expenses, non_operating, tax, eps, shares, operating_activities, investing_activities, financing_activities, cash_reconciliation, supplemental_disclosures, other.",
        "items": {
          "type": "object",
          "properties": {
            "label_raw": {
              "type": "string",
              "description": "Row label exactly as in the document (original language)"
            },
            "label_normalized": {
              "type": "string",
              "description": "English IFRS/GAAP equivalent label"
            },
            "label_language": {
              "type": "string",
              "description": "ISO 639-1: en, zh, ja, etc."
            },
            "canonical_key": {
              "type": "string",
              "description": "English snake_case key (e.g. revenue, total_assets, net_income)"
            },
            "row_type": {
              "type": "string",
              "description": "section_header, line_item, subtotal, or total"
            },
            "indent_level": {
              "type": "integer",
              "description": "0=top level, 1=within section, 2=sub-item"
            },
            "section": {
              "type": "string",
              "description": "Section identifier (e.g. current_assets, operating_activities)"
            },
            "values": {
              "type": "string",
              "description": "JSON array of numeric values, one per column. Use null for blank cells. Example: '[59893, 48385, 200966, 164501]' or '[null, null, -26248, -30125]'. Parenthesised amounts become negative."
            },
            "values_raw": {
              "type": "string",
              "description": "JSON array of display strings, one per column. Example: '[\"59,893\", \"48,385\", \"200,966\", \"164,501\"]' or '[null, null, \"(26,248)\", \"(30,125)\"]'"
            }
          }
        }
      }
    }
  }
}
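
Because the schema stores `values` and `values_raw` as JSON array *strings*, consumers must decode them before use. A minimal sketch of that decoding, including the parenthesised-negative convention the schema describes; the helper names are illustrative, not from the sample:

```python
import json

def parse_values(values_str):
    """Decode the analyzer's 'values' field: a JSON array string with
    one numeric entry (or null for a blank cell) per period column."""
    return [None if v is None else float(v) for v in json.loads(values_str)]

def display_to_number(raw):
    """Convert a values_raw display string like '(26,248)' to -26248.0,
    following accounting convention for parenthesised amounts."""
    if raw is None:
        return None
    s = raw.replace(",", "").strip()
    if s.startswith("(") and s.endswith(")"):
        return -float(s[1:-1])
    return float(s)

print(parse_values("[null, 94, null, 383]"))  # [None, 94.0, None, 383.0]
print(display_to_number("(26,248)"))          # -26248.0
```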