3 changes: 3 additions & 0 deletions samples/mcs-finance-statement-agent/.gitignore
@@ -0,0 +1,3 @@
node_modules
lib

135 changes: 135 additions & 0 deletions samples/mcs-finance-statement-agent/README.md
@@ -0,0 +1,135 @@
# Finance Statement Agent

## Summary

Conversational AI for credit financial statement extraction. An analyst uploads a PDF financial report in Microsoft Teams; a Copilot Studio agent orchestrates an Azure-hosted pipeline that extracts the **Income Statement, Balance Sheet, Cash Flow,** and computed **Ratios** into a downloadable Excel workbook ready for credit spreading.

Multi-language: English, Chinese, Japanese, French (auto-detected, label-reconciled to a canonical schema).

![Architecture](./assets/architecture.png)

**Data flow:** Analyst → Teams → Copilot Studio → Custom Connector → Azure Functions (HTTP 202) → Poll until complete → Excel generated → SAS download link in chat.

## Frameworks

![drop](https://img.shields.io/badge/Microsoft%20Copilot%20Studio-latest-green.svg)
![drop](https://img.shields.io/badge/Azure%20Functions-Python%203.11-green.svg)
![drop](https://img.shields.io/badge/Azure%20OpenAI-GPT--4.1-green.svg)
![drop](https://img.shields.io/badge/Power%20Apps%20Code%20App-React%20%2B%20TS-green.svg)

## Prerequisites

* Microsoft 365 tenant with **Copilot Studio** licensed
* **Power Platform** environment with Dataverse and custom connector permissions
* **Azure subscription** with rights to create Resource Group, Function App, Content Understanding (or Document Intelligence), Azure OpenAI (with `gpt-4.1` deployment), and Storage Account
* [Azure CLI](https://learn.microsoft.com/cli/azure/install-azure-cli) 2.50+
* [Azure Functions Core Tools](https://learn.microsoft.com/azure/azure-functions/functions-run-local) 4.x
* [Power Platform CLI (`pac`)](https://learn.microsoft.com/power-platform/developer/cli/introduction)
* Python 3.11+, Node.js 18+

## Contributors

mcs-finance-statement-agent | Shaji Sivaraman ([@sgshaji](https://github.com/sgshaji)), Microsoft

## Version history

Version | Date | Author | Comments
--------|------|--------|---------
1.0 | April 30, 2026 | Shaji Sivaraman | Initial release

## Disclaimer

**THIS CODE IS PROVIDED *AS IS* WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING ANY IMPLIED WARRANTIES OF FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR NON-INFRINGEMENT.**

---

## Minimal Path to Awesome

The sample is split across three deployables: an **Azure Functions** backend, a **Copilot Studio** agent, and an optional **Power Apps Code App** for human-in-the-loop review.

### 1. Clone and authenticate

```bash
git clone https://github.com/pnp/copilot-pro-dev-samples.git
cd copilot-pro-dev-samples/samples/mcs-finance-statement-agent/src
az login
az account set --subscription "<subscription-id>"
```

### 2. Provision Azure resources

Create the resources listed in **Prerequisites**. Grant the Function App's **system-assigned managed identity** these roles at the resource group scope:
* `Cognitive Services User` — Content Understanding / Document Intelligence
* `Cognitive Services OpenAI User` — Azure OpenAI
* `Storage Blob Data Contributor` — Storage Account
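
The role assignments above can be scripted with the Azure CLI. A minimal sketch, assuming the placeholder names `<your-function-app>` and `<your-rg>` from the steps below; adapt scopes if your resources live in different groups:

```shell
# Enable the system-assigned identity and capture its principal id
PRINCIPAL_ID=$(az functionapp identity assign \
  --name <your-function-app> --resource-group <your-rg> \
  --query principalId --output tsv)

# Resource-group scope for all three assignments
RG_SCOPE=$(az group show --name <your-rg> --query id --output tsv)

for ROLE in "Cognitive Services User" \
            "Cognitive Services OpenAI User" \
            "Storage Blob Data Contributor"; do
  az role assignment create \
    --assignee-object-id "$PRINCIPAL_ID" \
    --assignee-principal-type ServicePrincipal \
    --role "$ROLE" \
    --scope "$RG_SCOPE"
done
```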

### 3. Configure and run the backend

```bash
cd azure-functions
cp .env.example .env
# edit .env with your endpoints (no API keys — Managed Identity handles auth)
pip install -r requirements.txt
func start # http://localhost:7071/api/health
```

### 4. Deploy the Function

```bash
func azure functionapp publish <your-function-app> --python
```

### 5. Set up the custom connector

1. Power Platform → **Custom connectors → New → Import OpenAPI file** → `docs/custom-connector-swagger.yml`
2. Update the Host to your Function App
3. Authentication: **API key** (header `x-functions-key`) — retrieve via:
```bash
az functionapp keys list --name <your-function-app> --resource-group <your-rg>
```
4. Create a connection using the new connector

### 6. Push the Copilot Studio agent

```bash
cd ../copilot-studio-agent
pac copilot push --environment <env-id>
```

### 7. (Optional) Deploy the HITL review Code App

```bash
cd ../code-app
npm install && npm run build
pac code push
```

## Features

This sample demonstrates an end-to-end agentic pattern for processing long-running document workloads from Copilot Studio:

* **Async extraction with polling** — Power Platform custom connectors have a default 30-second synchronous request timeout. The pipeline returns `HTTP 202 {jobId}` in ~100 ms; the Copilot Studio topic polls `/extract/status/{jobId}` every 30 s until `completed` (`ConditionGroup` + `GotoAction` loop bounded by max-attempts)
* **Pluggable extraction backend** — Content Understanding (default), Document Intelligence, Textract, or local pdfplumber, selectable via `EXTRACTION_BACKEND`
* **5-stage pipeline** — analyze → select → extract → enrich → validate. Backends emit a common markdown + HTML-table format so Stages 2–5 are reusable
* **Multi-language label reconciliation** — Azure OpenAI maps source-language labels to a canonical English schema for English, Chinese, Japanese, and French statements
* **Managed Identity end-to-end** — no API keys for Azure service-to-service auth. The only secret is the Function key consumed by the Power Platform custom connector
* **Job state in Blob with 30-min TTL** — bounded storage; SAS URLs returned in chat for the generated Excel
* **Human-in-the-loop review** — optional Power Apps Code App provides an analyst grid backed by Dataverse for correcting extracted values before downstream credit spreading
* **Multi-row column-header parsing** — handles statements where Q4 and FY columns share a parent header (e.g., Meta Income Statement)
* **>5 MB upload handling** — uses Copilot Studio `Question` node bound to `FilePrebuiltEntity` (direct `Activity.Attachments` access fails for files > 5 MB)
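
The bounded polling loop from the first bullet can be sketched client-side as follows. This is illustrative only — the function name, stubbed status source, and parameters are not from the sample, and in the real solution the loop lives in a Copilot Studio topic, not Python:

```python
import time

def poll_job(get_status, job_id, interval_s=30, max_attempts=10, sleep=time.sleep):
    """Poll a job until it reports 'completed' or attempts run out.

    Mirrors the bounded ConditionGroup + GotoAction loop in the topic:
    check status, wait, repeat, and give up after max_attempts.
    """
    for attempt in range(1, max_attempts + 1):
        status = get_status(job_id)        # GET /extract/status/{jobId}
        if status == "completed":
            return True
        if status == "failed":
            raise RuntimeError(f"Job {job_id} failed on attempt {attempt}")
        sleep(interval_s)                  # topic waits 30 s between polls
    return False                           # bounded: never polls forever

# Stubbed backend that completes on the third poll
responses = iter(["running", "running", "completed"])
done = poll_job(lambda _id: next(responses), "job-123", sleep=lambda _s: None)
print(done)  # True
```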

### Repository layout (under `src/`)

```
src/
├── azure-functions/ # Python backend (HTTP 202 async pipeline)
│ ├── function_app.py # HTTP router
│ └── extractor/ # 5-stage pipeline + clients (CU, DI, Textract, pdfplumber)
├── copilot-studio-agent/ # Copilot Studio YAML (agent, topics, actions, workflows)
├── code-app/ # Power Apps Code App — React HITL review grid (Dataverse)
└── docs/
├── architecture.png # Architecture diagram
└── custom-connector-swagger.yml # Swagger 2.0 spec for the custom connector
```

<img src="https://m365-visitor-stats.azurewebsites.net/copilot-pro-dev-samples/samples/mcs-finance-statement-agent" />
@@ -0,0 +1,38 @@
name: Deploy Azure Function

on:
  push:
    branches: [main]
    paths:
      - 'azure-functions/**'
  workflow_dispatch:

env:
  AZURE_FUNCTIONAPP_NAME: fin-stmt-extractor-v2
  AZURE_FUNCTIONAPP_PACKAGE_PATH: azure-functions
  PYTHON_VERSION: '3.11'

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ env.PYTHON_VERSION }}

      - name: Install dependencies
        run: |
          cd ${{ env.AZURE_FUNCTIONAPP_PACKAGE_PATH }}
          pip install -r requirements.txt --target=".python_packages/lib/site-packages"

      - name: Deploy to Azure Functions
        uses: Azure/functions-action@v1
        with:
          app-name: ${{ env.AZURE_FUNCTIONAPP_NAME }}
          package: ${{ env.AZURE_FUNCTIONAPP_PACKAGE_PATH }}
          publish-profile: ${{ secrets.AZURE_FUNCTIONAPP_PUBLISH_PROFILE }}
          scm-do-build-during-deployment: false
          enable-oryx-build: false
50 changes: 50 additions & 0 deletions samples/mcs-finance-statement-agent/src/.gitignore
@@ -0,0 +1,50 @@
# Node
node_modules/
dist/
.vite/

# Python
__pycache__/
*.pyc
.venv/
venv/
.pytest_cache/

# Power Platform
.power/
power.config.json
.mcs/

# Azure Functions
local.settings.json
.python_packages/
bin/
obj/

# IDE
.vscode/
.idea/
*.swp

# OS
.DS_Store
Thumbs.db

# Environment — never commit secrets
.env
.env.local

# Build artifacts / pip install noise
=*

# Sample / customer-specific artifacts — never commit
*.xlsx
*.pdf
*.pptx
*.docx
docs/samples/
_*
_tmp_logos/

# MCP widget test scaffolding (cloned from microsoft/mcp-interactiveUI-samples)
mcp-widget-test/
@@ -0,0 +1,24 @@
# Copy this file to .env and fill in your values
# Do NOT commit .env to source control
#
# Authentication: All services use Managed Identity (DefaultAzureCredential).
# API keys are disabled by corp policy — no AZURE_*_KEY vars needed.
# Ensure the Function App's managed identity has the required roles:
# - Cognitive Services User on the Document Intelligence resource
# - Cognitive Services OpenAI User on the Azure OpenAI resource

# ----- Azure Document Intelligence (table/PDF extraction) -----
AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT=https://your-docai-resource.cognitiveservices.azure.com/

# ----- Azure OpenAI (for LLM classification + enrichment) -----
AZURE_OPENAI_ENDPOINT=https://your-aoai-resource.openai.azure.com
AZURE_OPENAI_DEPLOYMENT=gpt-4.1
AZURE_OPENAI_API_VERSION=2024-12-01-preview

# ----- Extraction Backend -----
# EXTRACTION_BACKEND=document_intelligence # "document_intelligence" (default), "textract", or "pdfplumber"

# ----- AWS Textract (only if EXTRACTION_BACKEND=textract) -----
# AWS_REGION=us-east-1
# AWS_S3_BUCKET=
# AWS_S3_PREFIX=textract-input
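
How the backend might consume these variables can be sketched with the standard library alone. The helper name and fallback defaults below are illustrative, not the sample's actual code; at runtime the Azure SDK's `DefaultAzureCredential` supplies auth, so no key variables are read:

```python
import os

def load_extraction_config(env=os.environ):
    """Collect the sample's endpoint settings from the environment.

    Auth comes from Managed Identity (DefaultAzureCredential), so only
    endpoints and backend selection are needed — never API keys.
    """
    return {
        "docint_endpoint": env["AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT"],
        "aoai_endpoint": env["AZURE_OPENAI_ENDPOINT"],
        "aoai_deployment": env.get("AZURE_OPENAI_DEPLOYMENT", "gpt-4.1"),
        "aoai_api_version": env.get("AZURE_OPENAI_API_VERSION", "2024-12-01-preview"),
        "backend": env.get("EXTRACTION_BACKEND", "document_intelligence"),
    }

cfg = load_extraction_config({
    "AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT": "https://your-docai-resource.cognitiveservices.azure.com/",
    "AZURE_OPENAI_ENDPOINT": "https://your-aoai-resource.openai.azure.com",
})
print(cfg["backend"])  # document_intelligence
```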
@@ -0,0 +1,4 @@
.venv
.env
tests/
__pycache__/
@@ -0,0 +1,75 @@
{
  "description": "Extracts structured rows from a financial statement with analytics-ready output.",
  "baseAnalyzerId": "prebuilt-documentAnalyzer",
  "config": {
    "returnDetails": true,
    "enableOcr": true,
    "enableLayout": true,
    "estimateFieldSourceAndConfidence": true
  },
  "fieldSchema": {
    "fields": {
      "columns": {
        "type": "array",
        "method": "generate",
        "description": "Extract the period column headers from the FIRST financial statement table found in the document. Ignore any reconciliation tables, supplemental tables, or non-GAAP tables that may appear on the same page. For each column return: label (English normalized, e.g. 'Q4 2025'), label_raw (original language), period_type (quarter/annual/year_to_date/half_year/nine_months/instant/other), fiscal_year (integer), is_comparative (true for prior period).",
        "items": {
          "type": "object",
          "properties": {
            "label": {"type": "string", "description": "English normalized period label"},
            "label_raw": {"type": "string", "description": "Period label in original language"},
            "period_type": {"type": "string", "description": "quarter/annual/year_to_date/half_year/nine_months/instant/other"},
            "fiscal_year": {"type": "integer"},
            "is_comparative": {"type": "boolean"}
          }
        }
      },
      "rows": {
        "type": "array",
        "method": "generate",
        "description": "Extract EVERY INDIVIDUAL row from the FIRST financial statement table found in the document. Do NOT summarize or combine rows. If the table shows 'Cost of revenue', 'Research and development', 'Marketing and sales', 'General and administrative' as separate line items, extract EACH ONE as its own row. Do NOT collapse them into a single 'Costs and expenses' row. Ignore any GAAP-to-non-GAAP reconciliation tables, supplemental tables, segment tables, or free cash flow calculations that may appear on the same or adjacent pages. CRITICAL RULES: (1) For 'values', return a JSON array string with one entry per column. Each entry is the numeric value (negative for parenthesised amounts like (26,248) becomes -26248), or null for blank cells. Example: '[59893, 48385, 200966, 164501]' or '[null, 94, null, 383]'. (2) For non-English labels, canonical_key MUST be the English IFRS/GAAP equivalent in snake_case (e.g. 货币资金->cash_and_cash_equivalents). (3) section must be one of: current_assets, non_current_assets, assets, current_liabilities, non_current_liabilities, liabilities, equity, revenue, operating_expenses, non_operating, tax, eps, shares, operating_activities, investing_activities, financing_activities, cash_reconciliation, supplemental_disclosures, other.",
        "items": {
          "type": "object",
          "properties": {
            "label_raw": {
              "type": "string",
              "description": "Row label exactly as in the document (original language)"
            },
            "label_normalized": {
              "type": "string",
              "description": "English IFRS/GAAP equivalent label"
            },
            "label_language": {
              "type": "string",
              "description": "ISO 639-1: en, zh, ja, etc."
            },
            "canonical_key": {
              "type": "string",
              "description": "English snake_case key (e.g. revenue, total_assets, net_income)"
            },
            "row_type": {
              "type": "string",
              "description": "section_header, line_item, subtotal, or total"
            },
            "indent_level": {
              "type": "integer",
              "description": "0=top level, 1=within section, 2=sub-item"
            },
            "section": {
              "type": "string",
              "description": "Section identifier (e.g. current_assets, operating_activities)"
            },
            "values": {
              "type": "string",
              "description": "JSON array of numeric values, one per column. Use null for blank cells. Example: '[59893, 48385, 200966, 164501]' or '[null, null, -26248, -30125]'. Parenthesised amounts become negative."
            },
            "values_raw": {
              "type": "string",
              "description": "JSON array of display strings, one per column. Example: '[\"59,893\", \"48,385\", \"200,966\", \"164,501\"]' or '[null, null, \"(26,248)\", \"(30,125)\"]'"
            }
          }
        }
      }
    }
  }
}
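
Because the schema stores `values` and `values_raw` as JSON array *strings*, consumers must decode them before use. A minimal sketch of that decoding, including the parenthesised-negative convention the schema describes; the helper names are illustrative, not from the sample:

```python
import json

def parse_values(values_str):
    """Decode the analyzer's 'values' field: a JSON array string with
    one numeric entry (or null for a blank cell) per period column."""
    return [None if v is None else float(v) for v in json.loads(values_str)]

def display_to_number(raw):
    """Convert a values_raw display string like '(26,248)' to -26248.0,
    following accounting convention for parenthesised amounts."""
    if raw is None:
        return None
    s = raw.replace(",", "").strip()
    if s.startswith("(") and s.endswith(")"):
        return -float(s[1:-1])
    return float(s)

print(parse_values("[null, 94, null, 383]"))  # [None, 94.0, None, 383.0]
print(display_to_number("(26,248)"))          # -26248.0
```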