diff --git a/samples/mcs-finance-statement-agent/.gitignore b/samples/mcs-finance-statement-agent/.gitignore new file mode 100644 index 000000000..83dbaee65 --- /dev/null +++ b/samples/mcs-finance-statement-agent/.gitignore @@ -0,0 +1,3 @@ +node_modules +lib + diff --git a/samples/mcs-finance-statement-agent/README.md b/samples/mcs-finance-statement-agent/README.md new file mode 100644 index 000000000..ceac72d53 --- /dev/null +++ b/samples/mcs-finance-statement-agent/README.md @@ -0,0 +1,135 @@ +# Finance Statement Agent + +## Summary + +Conversational AI for credit financial statement extraction. An analyst uploads a PDF financial report in Microsoft Teams; a Copilot Studio agent orchestrates an Azure-hosted pipeline that extracts the **Income Statement, Balance Sheet, Cash Flow,** and computed **Ratios** into a downloadable Excel workbook ready for credit spreading. + +Multi-language: English, Chinese, Japanese, French (auto-detected, label-reconciled to a canonical schema). + +![Architecture](./assets/architecture.png) + +**Data flow:** Analyst → Teams → Copilot Studio → Custom Connector → Azure Functions (HTTP 202) → Poll until complete → Excel generated → SAS download link in chat. 
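The polling leg of this flow (submit, receive `202 {jobId}`, then poll until the job completes) can be sketched as a small client-side helper. This is a minimal sketch, not code from the sample: the `/extract/status/{jobId}` route mirrors the one described under Features, but the `status`/`downloadUrl` field names and the injectable `fetch_status` callable are illustrative assumptions made so the loop can run without a live Function App.

```python
import time
from typing import Callable


def poll_until_complete(
    fetch_status: Callable[[], dict],
    poll_interval: float = 30.0,
    max_attempts: int = 20,
) -> dict:
    """Poll a job-status endpoint until it reports 'completed'.

    fetch_status stands in for a GET on /extract/status/{jobId};
    injecting it as a callable keeps the loop testable offline.
    The 'status' and 'downloadUrl' keys are assumed field names.
    """
    for attempt in range(1, max_attempts + 1):
        status = fetch_status()
        if status.get("status") == "completed":
            return status  # carries the SAS download link
        if attempt < max_attempts:
            time.sleep(poll_interval)
    raise TimeoutError(f"job not complete after {max_attempts} polls")
```

In a real client, `fetch_status` would wrap `requests.get` against the Function App with the `x-functions-key` header, and the Copilot Studio topic performs the same loop with a 30 s interval and a max-attempts bound.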
+
+## Frameworks
+
+![drop](https://img.shields.io/badge/Microsoft%20Copilot%20Studio-latest-green.svg)
+![drop](https://img.shields.io/badge/Azure%20Functions-Python%203.11-green.svg)
+![drop](https://img.shields.io/badge/Azure%20OpenAI-GPT‑4.1-green.svg)
+![drop](https://img.shields.io/badge/Power%20Apps%20Code%20App-React%20%2B%20TS-green.svg)
+
+## Prerequisites
+
+* Microsoft 365 tenant with **Copilot Studio** licensed
+* **Power Platform** environment with Dataverse and custom connector permissions
+* **Azure subscription** with rights to create Resource Group, Function App, Content Understanding (or Document Intelligence), Azure OpenAI (with `gpt-4.1` deployment), and Storage Account
+* [Azure CLI](https://learn.microsoft.com/cli/azure/install-azure-cli) 2.50+
+* [Azure Functions Core Tools](https://learn.microsoft.com/azure/azure-functions/functions-run-local) 4.x
+* [Power Platform CLI (`pac`)](https://learn.microsoft.com/power-platform/developer/cli/introduction)
+* Python 3.11+, Node.js 18+
+
+## Contributors
+
+mcs-finance-statement-agent | Shaji Sivaraman ([@sgshaji](https://github.com/sgshaji)), Microsoft
+
+## Version history
+
+Version | Date | Author | Comments
+--------|------|--------|---------
+1.0 | April 30, 2026 | Shaji Sivaraman | Initial release
+
+## Disclaimer
+
+**THIS CODE IS PROVIDED *AS IS* WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING ANY IMPLIED WARRANTIES OF FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR NON-INFRINGEMENT.**
+
+---
+
+## Minimal Path to Awesome
+
+The sample is split across three deployables: an **Azure Functions** backend, a **Copilot Studio** agent, and an optional **Power Apps Code App** for human-in-the-loop review.
+
+### 1. Clone and authenticate
+
+```bash
+git clone https://github.com/pnp/copilot-pro-dev-samples.git
+cd copilot-pro-dev-samples/samples/mcs-finance-statement-agent/src
+az login
+az account set --subscription "<SUBSCRIPTION_ID>"
+```
+
+### 2. 
Provision Azure resources
+
+Create the resources listed in **Prerequisites**. Grant the Function App's **system-assigned managed identity** these roles at the resource group scope:
+* `Cognitive Services User` — Content Understanding / Document Intelligence
+* `Cognitive Services OpenAI User` — Azure OpenAI
+* `Storage Blob Data Contributor` — Storage Account
+
+### 3. Configure and run the backend
+
+```bash
+cd azure-functions
+cp .env.example .env
+# edit .env with your endpoints (no API keys — Managed Identity handles auth)
+pip install -r requirements.txt
+func start   # http://localhost:7071/api/health
+```
+
+### 4. Deploy the Function
+
+```bash
+func azure functionapp publish <FUNCTION_APP_NAME> --python
+```
+
+### 5. Set up the custom connector
+
+1. Power Platform → **Custom connectors → New → Import OpenAPI file** → `docs/custom-connector-swagger.yml`
+2. Update the Host to your Function App
+3. Authentication: **API key** (header `x-functions-key`) — retrieve via:
+   ```bash
+   az functionapp keys list --name <FUNCTION_APP_NAME> --resource-group <RESOURCE_GROUP>
+   ```
+4. Create a connection using the new connector
+
+### 6. Push the Copilot Studio agent
+
+```bash
+cd ../copilot-studio-agent
+pac copilot push --environment <ENVIRONMENT_ID>
+```
+
+### 7. (Optional) Deploy the HITL review Code App
+
+```bash
+cd ../code-app
+npm install && npm run build
+pac code push
+```
+
+## Features
+
+This sample demonstrates an end-to-end agentic pattern for processing long-running document workloads from Copilot Studio:
+
+* **Async extraction with polling** — Power Platform custom connectors have a default 30-second synchronous request timeout. 
The pipeline returns `HTTP 202 {jobId}` in ~100 ms; the Copilot Studio topic polls `/extract/status/{jobId}` every 30 s until `completed` (`ConditionGroup` + `GotoAction` loop bounded by max-attempts) +* **Pluggable extraction backend** — Content Understanding (default), Document Intelligence, Textract, or local pdfplumber, selectable via `EXTRACTION_BACKEND` +* **5-stage pipeline** — analyze → select → extract → enrich → validate. Backends emit a common markdown + HTML-table format so Stages 2–5 are reusable +* **Multi-language label reconciliation** — Azure OpenAI maps source-language labels to a canonical English schema for English, Chinese, Japanese, and French statements +* **Managed Identity end-to-end** — no API keys for Azure service-to-service auth. The only secret is the Function key consumed by the Power Platform custom connector +* **Job state in Blob with 30-min TTL** — bounded storage; SAS URLs returned in chat for the generated Excel +* **Human-in-the-loop review** — optional Power Apps Code App provides an analyst grid backed by Dataverse for correcting extracted values before downstream credit spreading +* **Multi-row column-header parsing** — handles statements where Q4 and FY columns share a parent header (e.g., Meta Income Statement) +* **>5 MB upload handling** — uses Copilot Studio `Question` node bound to `FilePrebuiltEntity` (direct `Activity.Attachments` access fails for files > 5 MB) + +### Repository layout (under `src/`) + +``` +src/ +├── azure-functions/ # Python backend (HTTP 202 async pipeline) +│ ├── function_app.py # HTTP router +│ └── extractor/ # 5-stage pipeline + clients (CU, DI, Textract, pdfplumber) +├── copilot-studio-agent/ # Copilot Studio YAML (agent, topics, actions, workflows) +├── code-app/ # Power Apps Code App — React HITL review grid (Dataverse) +└── docs/ + ├── architecture.png # Architecture diagram + └── custom-connector-swagger.yml # Swagger 2.0 spec for the custom connector +``` + + diff --git 
a/samples/mcs-finance-statement-agent/assets/architecture.png b/samples/mcs-finance-statement-agent/assets/architecture.png new file mode 100644 index 000000000..955265286 Binary files /dev/null and b/samples/mcs-finance-statement-agent/assets/architecture.png differ diff --git a/samples/mcs-finance-statement-agent/src/.github/workflows/deploy-function.yml b/samples/mcs-finance-statement-agent/src/.github/workflows/deploy-function.yml new file mode 100644 index 000000000..ea124169f --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/.github/workflows/deploy-function.yml @@ -0,0 +1,38 @@ +name: Deploy Azure Function + +on: + push: + branches: [main] + paths: + - 'azure-functions/**' + workflow_dispatch: + +env: + AZURE_FUNCTIONAPP_NAME: fin-stmt-extractor-v2 + AZURE_FUNCTIONAPP_PACKAGE_PATH: azure-functions + PYTHON_VERSION: '3.11' + +jobs: + deploy: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + + - name: Setup Python + uses: actions/setup-python@v5 + with: + python-version: ${{ env.PYTHON_VERSION }} + + - name: Install dependencies + run: | + cd ${{ env.AZURE_FUNCTIONAPP_PACKAGE_PATH }} + pip install -r requirements.txt --target=".python_packages/lib/site-packages" + + - name: Deploy to Azure Functions + uses: Azure/functions-action@v1 + with: + app-name: ${{ env.AZURE_FUNCTIONAPP_NAME }} + package: ${{ env.AZURE_FUNCTIONAPP_PACKAGE_PATH }} + publish-profile: ${{ secrets.AZURE_FUNCTIONAPP_PUBLISH_PROFILE }} + scm-do-build-during-deployment: false + enable-oryx-build: false diff --git a/samples/mcs-finance-statement-agent/src/.gitignore b/samples/mcs-finance-statement-agent/src/.gitignore new file mode 100644 index 000000000..df617af87 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/.gitignore @@ -0,0 +1,50 @@ +# Node +node_modules/ +dist/ +.vite/ + +# Python +__pycache__/ +*.pyc +.venv/ +venv/ +.pytest_cache/ + +# Power Platform +.power/ +power.config.json +.mcs/ + +# Azure Functions +local.settings.json +.python_packages/ 
+bin/ +obj/ + +# IDE +.vscode/ +.idea/ +*.swp + +# OS +.DS_Store +Thumbs.db + +# Environment — never commit secrets +.env +.env.local + +# Build artifacts / pip install noise +=* + +# Sample / customer-specific artifacts — never commit +*.xlsx +*.pdf +*.pptx +*.docx +docs/samples/ +_* +_tmp_logos/ + +# MCP widget test scaffolding (cloned from microsoft/mcp-interactiveUI-samples) +mcp-widget-test/ diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/.env.example b/samples/mcs-finance-statement-agent/src/azure-functions/.env.example new file mode 100644 index 000000000..0c7b3f967 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/.env.example @@ -0,0 +1,24 @@ +# Copy this file to .env and fill in your values +# Do NOT commit .env to source control +# +# Authentication: All services use Managed Identity (DefaultAzureCredential). +# API keys are disabled by corp policy — no AZURE_*_KEY vars needed. +# Ensure the Function App's managed identity has the required roles: +# - Cognitive Services User on the Document Intelligence resource +# - Cognitive Services OpenAI User on the Azure OpenAI resource + +# ----- Azure Document Intelligence (table/PDF extraction) ----- +AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT=https://your-docai-resource.cognitiveservices.azure.com/ + +# ----- Azure OpenAI (for LLM classification + enrichment) ----- +AZURE_OPENAI_ENDPOINT=https://your-aoai-resource.openai.azure.com +AZURE_OPENAI_DEPLOYMENT=gpt-4.1 +AZURE_OPENAI_API_VERSION=2024-12-01-preview + +# ----- Extraction Backend ----- +# EXTRACTION_BACKEND=document_intelligence # "document_intelligence" (default), "textract", or "pdfplumber" + +# ----- AWS Textract (only if EXTRACTION_BACKEND=textract) ----- +# AWS_REGION=us-east-1 +# AWS_S3_BUCKET= +# AWS_S3_PREFIX=textract-input diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/.funcignore b/samples/mcs-finance-statement-agent/src/azure-functions/.funcignore new file mode 100644 index 
000000000..03995fe51 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/.funcignore @@ -0,0 +1,4 @@ +.venv +.env +tests/ +__pycache__/ \ No newline at end of file diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/analyzer_templates/financial_statement_extractor.json b/samples/mcs-finance-statement-agent/src/azure-functions/analyzer_templates/financial_statement_extractor.json new file mode 100644 index 000000000..ab56cf4d7 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/analyzer_templates/financial_statement_extractor.json @@ -0,0 +1,75 @@ +{ + "description": "Extracts structured rows from a financial statement with analytics-ready output.", + "baseAnalyzerId": "prebuilt-documentAnalyzer", + "config": { + "returnDetails": true, + "enableOcr": true, + "enableLayout": true, + "estimateFieldSourceAndConfidence": true + }, + "fieldSchema": { + "fields": { + "columns": { + "type": "array", + "method": "generate", + "description": "Extract the period column headers from the FIRST financial statement table found in the document. Ignore any reconciliation tables, supplemental tables, or non-GAAP tables that may appear on the same page. For each column return: label (English normalized, e.g. 
'Q4 2025'), label_raw (original language), period_type (quarter/annual/year_to_date/half_year/nine_months/instant/other), fiscal_year (integer), is_comparative (true for prior period).", + "items": { + "type": "object", + "properties": { + "label": {"type": "string", "description": "English normalized period label"}, + "label_raw": {"type": "string", "description": "Period label in original language"}, + "period_type": {"type": "string", "description": "quarter/annual/year_to_date/half_year/nine_months/instant/other"}, + "fiscal_year": {"type": "integer"}, + "is_comparative": {"type": "boolean"} + } + } + }, + "rows": { + "type": "array", + "method": "generate", + "description": "Extract EVERY INDIVIDUAL row from the FIRST financial statement table found in the document. Do NOT summarize or combine rows. If the table shows 'Cost of revenue', 'Research and development', 'Marketing and sales', 'General and administrative' as separate line items, extract EACH ONE as its own row. Do NOT collapse them into a single 'Costs and expenses' row. Ignore any GAAP-to-non-GAAP reconciliation tables, supplemental tables, segment tables, or free cash flow calculations that may appear on the same or adjacent pages. CRITICAL RULES: (1) For 'values', return a JSON array string with one entry per column. Each entry is the numeric value (negative for parenthesised amounts like (26,248) becomes -26248), or null for blank cells. Example: '[59893, 48385, 200966, 164501]' or '[null, 94, null, 383]'. (2) For non-English labels, canonical_key MUST be the English IFRS/GAAP equivalent in snake_case (e.g. 货币资金->cash_and_cash_equivalents). 
(3) section must be one of: current_assets, non_current_assets, assets, current_liabilities, non_current_liabilities, liabilities, equity, revenue, operating_expenses, non_operating, tax, eps, shares, operating_activities, investing_activities, financing_activities, cash_reconciliation, supplemental_disclosures, other.", + "items": { + "type": "object", + "properties": { + "label_raw": { + "type": "string", + "description": "Row label exactly as in the document (original language)" + }, + "label_normalized": { + "type": "string", + "description": "English IFRS/GAAP equivalent label" + }, + "label_language": { + "type": "string", + "description": "ISO 639-1: en, zh, ja, etc." + }, + "canonical_key": { + "type": "string", + "description": "English snake_case key (e.g. revenue, total_assets, net_income)" + }, + "row_type": { + "type": "string", + "description": "section_header, line_item, subtotal, or total" + }, + "indent_level": { + "type": "integer", + "description": "0=top level, 1=within section, 2=sub-item" + }, + "section": { + "type": "string", + "description": "Section identifier (e.g. current_assets, operating_activities)" + }, + "values": { + "type": "string", + "description": "JSON array of numeric values, one per column. Use null for blank cells. Example: '[59893, 48385, 200966, 164501]' or '[null, null, -26248, -30125]'. Parenthesised amounts become negative." + }, + "values_raw": { + "type": "string", + "description": "JSON array of display strings, one per column. 
Example: '[\"59,893\", \"48,385\", \"200,966\", \"164,501\"]' or '[null, null, \"(26,248)\", \"(30,125)\"]'" + } + } + } + } + } + } +} diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/analyzer_templates/financial_statement_locator.json b/samples/mcs-finance-statement-agent/src/azure-functions/analyzer_templates/financial_statement_locator.json new file mode 100644 index 000000000..ec25de88e --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/analyzer_templates/financial_statement_locator.json @@ -0,0 +1,69 @@ +{ + "description": "Identifies financial statements (Balance Sheet, Income Statement, Cash Flow) in a document. Works for any language, any accounting standard.", + "baseAnalyzerId": "prebuilt-documentAnalyzer", + "config": { + "returnDetails": true + }, + "fieldSchema": { + "fields": { + "statements": { + "type": "array", + "method": "generate", + "description": "Identify ALL financial statements in this document. Look for: balance_sheet (Statement of Financial Position / Balance Sheet / 资产负债表 / 貸借対照表), income_statement (Income Statement / P&L / Statement of Operations / 利润表 / 損益計算書), cash_flow (Cash Flow Statement / 现金流量表 / キャッシュ・フロー計算書). IMPORTANT: prefer the CONSOLIDATED version over parent/standalone. Each statement must be followed by actual numeric financial data (not just a heading in a table of contents). For each statement found, return all of the following fields.", + "items": { + "type": "object", + "properties": { + "statement_type": { + "type": "string", + "description": "One of: balance_sheet, income_statement, cash_flow" + }, + "title_raw": { + "type": "string", + "description": "Statement heading exactly as it appears in the document (original language)" + }, + "title_english": { + "type": "string", + "description": "English translation of the statement title (e.g. 
'Consolidated Balance Sheet')" + }, + "page_start": { + "type": "integer", + "description": "First page number where this statement's data begins" + }, + "page_end": { + "type": "integer", + "description": "Last page number where this statement's data ends" + }, + "company_name": { + "type": "string", + "description": "Normalized English company name" + }, + "company_name_raw": { + "type": "string", + "description": "Company name as it appears in the document (original language)" + }, + "report_language": { + "type": "string", + "description": "ISO 639-1 language code (e.g. 'en', 'zh', 'ja')" + }, + "currency": { + "type": "string", + "description": "ISO 4217 currency code (e.g. 'USD', 'CNY', 'JPY'). Normalize RMB to CNY." + }, + "unit": { + "type": "string", + "description": "Unit of measurement: ones, thousands, millions, billions. Chinese 元=ones, 万元=ten_thousands, 千元=thousands." + }, + "is_consolidated": { + "type": "boolean", + "description": "true if this is a consolidated statement (合并/Consolidated), false if parent/standalone" + }, + "accounting_standard": { + "type": "string", + "description": "One of: US_GAAP, IFRS, Chinese_ASBE, Japanese_GAAP, other" + } + } + } + } + } + } +} diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/extractor/azure_cu_client.py b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/azure_cu_client.py new file mode 100644 index 000000000..0a7837aa8 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/azure_cu_client.py @@ -0,0 +1,106 @@ +""" +extractor/azure_cu_client.py +---------------------------- +Thin wrapper around the Azure Content Understanding (prebuilt-read) REST API. 
+ +Responsibilities: + - POST a PDF file to the Azure CU endpoint + - Handle both async (202 + Operation-Location) and sync (200 + body) modes + - Poll the operation URL until analysis status is 'succeeded' + +Configuration (via .env): + AZURE_CU_ENDPOINT — Full analyze URL including api-version query param + +Authentication: Managed Identity (DefaultAzureCredential). + Requires Cognitive Services User role on the resource. +""" + +import os +import time + +import requests +from azure.identity import DefaultAzureCredential +from dotenv import load_dotenv + +load_dotenv(override=False) + +ENDPOINT: str = os.environ["AZURE_CU_ENDPOINT"] + +# Always use managed identity — no API keys +_credential = DefaultAzureCredential() + + +def _get_auth_headers() -> dict: + """Return auth headers using managed identity token.""" + token = _credential.get_token("https://cognitiveservices.azure.com/.default") + return {"Authorization": f"Bearer {token.token}"} + +_POLL_INTERVAL_SECONDS = 3 +_MAX_POLL_ATTEMPTS = 60 + + +def submit_document(file_path: str) -> tuple[str | None, dict | None]: + """ + POST the PDF to Azure Content Understanding. + + Returns a (operation_location, result) tuple: + - Async mode: (url_string, None) — caller must poll the URL. + - Sync mode: (None, result_dict) — result is already available. + """ + with open(file_path, "rb") as f: + data = f.read() + + headers = {**_get_auth_headers(), "Content-Type": "application/pdf"} + + response = requests.post(ENDPOINT, headers=headers, data=data, timeout=60) + + response.raise_for_status() + + operation_location = response.headers.get("operation-location") + if operation_location: + # Async mode — caller must poll + return operation_location, None + + if not response.text.strip(): + raise ValueError( + f"Azure returned HTTP {response.status_code} with an empty body " + "and no operation-location header. Check your AZURE_CU_ENDPOINT " + "and that the managed identity has Cognitive Services User role." 
+ ) + + # Synchronous mode — result is in the response body directly + return None, response.json() + + +def poll_result(operation_location: str) -> dict: + """ + Poll the operation URL until status is 'succeeded'. + Returns the full result dict. + Raises RuntimeError on failure, TimeoutError if max attempts exceeded. + """ + headers = _get_auth_headers() + + for attempt in range(1, _MAX_POLL_ATTEMPTS + 1): + response = requests.get(operation_location, headers=headers, timeout=30) + response.raise_for_status() + result = response.json() + + status = result.get("status", "unknown") + print(f" [{attempt}/{_MAX_POLL_ATTEMPTS}] status: {status}") + + if status.lower() == "succeeded": + return result + + if status.lower() == "failed": + error = result.get("error", {}) + raise RuntimeError( + f"Azure Content Understanding analysis failed. " + f"Code: {error.get('code')} | Message: {error.get('message')}" + ) + + time.sleep(_POLL_INTERVAL_SECONDS) + + raise TimeoutError( + f"Analysis did not complete after {_MAX_POLL_ATTEMPTS} polling attempts " + f"({_MAX_POLL_ATTEMPTS * _POLL_INTERVAL_SECONDS}s)." 
+    )
diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/extractor/card_builder.py b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/card_builder.py
new file mode 100644
index 000000000..c8117e73c
--- /dev/null
+++ b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/card_builder.py
@@ -0,0 +1,673 @@
+"""Adaptive Card JSON builder for financial statement HITL review."""
+
+import json
+import os
+import re
+from typing import Any
+
+STATEMENT_DISPLAY_NAMES = {
+    "balance_sheet": "Balance Sheet",
+    "income_statement": "Income Statement",
+    "cash_flow": "Cash Flow",
+}
+
+SECTION_CHOICES = [
+    {"title": "Current Assets", "value": "current_assets"},
+    {"title": "Non-Current Assets", "value": "non_current_assets"},
+    {"title": "Current Liabilities", "value": "current_liabilities"},
+    {"title": "Non-Current Liabilities", "value": "non_current_liabilities"},
+    {"title": "Equity", "value": "equity"},
+    {"title": "Revenue", "value": "revenue"},
+    {"title": "Operating Expenses", "value": "operating_expenses"},
+    {"title": "Tax", "value": "tax"},
+    {"title": "Other", "value": "other"},
+]
+
+ROW_TYPE_CHOICES = [
+    {"title": "Line Item", "value": "line_item"},
+    {"title": "Section Header", "value": "section_header"},
+    {"title": "Subtotal", "value": "subtotal"},
+    {"title": "Total", "value": "total"},
+]
+
+EDIT_ALL_PAGE_SIZE = 20
+
+
+# ─── Navigator Card ─────────────────────────────────────────────────────────
+
+
+REVIEW_GRID_URL = os.environ.get(
+    "REVIEW_GRID_URL",
+    "https://apps.powerapps.com/play/e/Default-269a92bb-9fea-4197-94ce-f1993452a98a"
+    "/a/33824507-997c-4c99-9c70-b498395ef0fe"
+    "?tenantId=269a92bb-9fea-4197-94ce-f1993452a98a",
+)
+
+
+def build_navigator_card(
+    company_name: str,
+    currency: str,
+    unit: str,
+    confidence: dict,
+    summary: list,
+    job_id: str = "",
+) -> dict:
+    """Build the entry-point Navigator card showing all 3 statements."""
+
+    # Build summary table rows
+    table_rows = [{
+        "type": "TableRow",
+ "style": "accent", + "cells": [ + {"type": "TableCell", "items": [ + {"type": "TextBlock", "text": "Statement", "weight": "Bolder", "size": "Small"}, + ]}, + {"type": "TableCell", "items": [ + {"type": "TextBlock", "text": "Confidence", "weight": "Bolder", "size": "Small"}, + ]}, + {"type": "TableCell", "items": [ + {"type": "TextBlock", "text": "Rows", "weight": "Bolder", "size": "Small", + "horizontalAlignment": "Right"}, + ]}, + ], + }] + + summary_map = {s.get("statement_type", s.get("statement", "")): s.get("row_count", s.get("rows", 0)) for s in summary} + + for stmt_key in ("balance_sheet", "income_statement", "cash_flow"): + display_name = STATEMENT_DISPLAY_NAMES[stmt_key] + conf_entry = confidence.get(stmt_key) + row_count = summary_map.get(stmt_key, 0) + + if conf_entry is None: + level_text = "\u2014 Not found" + color = "Default" + else: + level = conf_entry.get("level", "medium") + flagged = conf_entry.get("flagged_rows", []) + if level == "high": + level_text = "\u2713 High" + color = "Good" + elif level == "medium": + level_text = f"\u26a0 Medium ({len(flagged)} issues)" + color = "Warning" + else: + level_text = f"\u26a0 Low ({len(flagged)} issues)" + color = "Attention" + + table_rows.append({ + "type": "TableRow", + "cells": [ + {"type": "TableCell", "items": [ + {"type": "TextBlock", "text": display_name, "weight": "Bolder"}, + ]}, + {"type": "TableCell", "items": [ + {"type": "TextBlock", "text": level_text, "color": color}, + ]}, + {"type": "TableCell", "items": [ + {"type": "TextBlock", "text": str(row_count), + "horizontalAlignment": "Right"}, + ]}, + ], + }) + + body = [ + {"type": "TextBlock", "text": "Extraction Complete", "size": "Large", "weight": "Bolder"}, + {"type": "TextBlock", "text": f"{company_name}", "size": "Medium", "isSubtle": True}, + {"type": "TextBlock", "text": f"Currency: {currency} \u2502 Unit: {unit}", + "isSubtle": True, "spacing": "Small"}, + { + "type": "Table", + "gridStyle": "accent", + "firstRowAsHeader": True, 
+ "showGridLines": True, + "columns": [{"width": 2}, {"width": 2}, {"width": 1}], + "rows": table_rows, + }, + ] + + actions = [ + {"type": "Action.Submit", "title": "Generate Excel", "data": {"action": "skip_review"}}, + ] + + return { + "$schema": "http://adaptivecards.io/schemas/adaptive-card.json", + "type": "AdaptiveCard", + "version": "1.5", + "body": body, + "actions": actions, + } + + +# ─── Statement Review Card ────────────────────────────────────────────────── + + +def _flag_reason(row: dict, validation: dict) -> str | None: + """Determine the flag reason for a row.""" + reasons = [] + if not row.get("label_raw"): + reasons.append("Missing label") + # Check validation warnings for subtotal mismatch + for w in validation.get("warnings", []): + if isinstance(w, dict) and w.get("row_index") == row["row_index"] and "subtotal" in str(w).lower(): + reasons.append("Subtotal mismatch") + break + if row.get("section") == "other": + reasons.append("Unclassified section") + for val in row.get("values", []): + if val.get("normalized") is None and val.get("raw"): + reasons.append("Value parse error") + break + label = (row.get("label_raw") or "").lower() + if "total" in label and row.get("row_type") == "line_item": + reasons.append("Possible type mismatch") + return "; ".join(reasons) if reasons else None + + +def _get_row_value(row: dict, field: str, corrections: dict) -> str: + """Get the value for a row field, applying corrections if present.""" + row_key = f"row_{row['row_index']}" + corr = corrections.get(row_key, {}) + if field == "label": + return corr.get("label", row.get("label_normalized") or row.get("label_raw", "")) + elif field == "section": + return corr.get("section", row.get("section", "other")) + elif field == "row_type": + return corr.get("row_type", row.get("row_type", "line_item")) + elif field.startswith("val_"): + col_idx = int(field.split("_")[1]) + if field in corr: + return corr[field] + values = row.get("values", []) + if col_idx < 
len(values): + return values[col_idx].get("raw", "") or "" + return "" + return "" + + +# ─── Table-based row builders ──────────────────────────────────────────────── + + +def _build_table_header_row(columns: list) -> dict: + """Build the header row for the financial data table.""" + cells = [ + {"type": "TableCell", "items": [ + {"type": "TextBlock", "text": "Line Item", "weight": "Bolder", "size": "Small"}, + ]}, + ] + for col in columns: + cells.append({"type": "TableCell", "items": [ + {"type": "TextBlock", "text": col["label"], "weight": "Bolder", + "size": "Small", "horizontalAlignment": "Right"}, + ]}) + return {"type": "TableRow", "style": "accent", "cells": cells} + + +def _build_readonly_table_row(row: dict, columns: list) -> dict: + """Build a read-only table row with proper visual hierarchy.""" + row_type = row.get("row_type", "line_item") + is_emphasis = row_type in ("section_header", "subtotal", "total") + weight = "Bolder" if is_emphasis else "Default" + + label = row.get("label_normalized") or row.get("label_raw", "") or "" + + cells = [ + {"type": "TableCell", "items": [ + {"type": "TextBlock", "text": label, "weight": weight, "wrap": True}, + ]}, + ] + for col in columns: + ci = col["column_index"] + values = row.get("values", []) + raw = values[ci].get("raw", "") if ci < len(values) else "" + cells.append({"type": "TableCell", "items": [ + {"type": "TextBlock", "text": raw or "\u2014", "weight": weight, + "horizontalAlignment": "Right"}, + ]}) + + table_row: dict[str, Any] = {"type": "TableRow", "cells": cells} + if row_type in ("subtotal", "total"): + table_row["style"] = "accent" + return table_row + + +def _build_editable_table_row( + row: dict, columns: list, corrections: dict, flag_text: str | None, +) -> dict: + """Build an editable table row with inline input fields. + + The label cell contains: flag badge, label input, and section/type dropdowns. + Value cells contain text inputs aligned with the table columns. 
+ """ + idx = row["row_index"] + + # ── Label cell: flag + input + section/type ── + label_items: list[dict[str, Any]] = [] + if flag_text: + label_items.append({ + "type": "TextBlock", "text": f"\u26a0 {flag_text}", + "size": "Small", "color": "Warning", "wrap": True, + }) + label_items.append({ + "type": "Input.Text", + "id": f"row_{idx}_label", + "value": _get_row_value(row, "label", corrections), + "placeholder": "Label", + }) + label_items.append({ + "type": "ColumnSet", + "columns": [ + { + "type": "Column", "width": "stretch", + "items": [{ + "type": "Input.ChoiceSet", + "id": f"row_{idx}_section", + "value": _get_row_value(row, "section", corrections), + "style": "compact", + "choices": SECTION_CHOICES, + }], + }, + { + "type": "Column", "width": "stretch", + "items": [{ + "type": "Input.ChoiceSet", + "id": f"row_{idx}_row_type", + "value": _get_row_value(row, "row_type", corrections), + "style": "compact", + "choices": ROW_TYPE_CHOICES, + }], + }, + ], + }) + + cells: list[dict[str, Any]] = [{"type": "TableCell", "items": label_items}] + + # ── Value cells ── + for col in columns: + ci = col["column_index"] + cells.append({"type": "TableCell", "items": [{ + "type": "Input.Text", + "id": f"row_{idx}_val_{ci}", + "value": _get_row_value(row, f"val_{ci}", corrections), + }]}) + + table_row: dict[str, Any] = {"type": "TableRow", "cells": cells} + if flag_text: + table_row["style"] = "warning" + return table_row + + +def build_statement_review_card( + statement_type: str, + statement_json: dict, + confidence_entry: dict, + corrections: dict, + step_num: int, + total_steps: int, + editable: bool = True, + edit_all: bool = False, + edit_all_page: int = 0, +) -> dict: + """Build a review card for a single statement. + + All rendering modes use a single Table element for consistent column + alignment, grid lines, and proper header row. 
+ + Rendering modes: + - Standard (editable=True, edit_all=False): all rows in original order, + flagged rows as editable table rows, clean rows as read-only table rows. + - Edit All (edit_all=True): paginated, ALL rows as editable table rows. + - Read-only (editable=False): all rows as read-only table rows. + """ + + display_name = STATEMENT_DISPLAY_NAMES.get(statement_type, statement_type) + rows = statement_json.get("rows", []) + columns = statement_json.get("columns", []) + validation = statement_json.get("validation", {}) + metadata = statement_json.get("statement_metadata", {}) + + level = confidence_entry.get("level", "medium") + flagged_indices = set(confidence_entry.get("flagged_rows", [])) + + # Page info + page_range = metadata.get("page_range", {}) + page_start = page_range.get("start", "?") + page_end = page_range.get("end", "?") + page_text = f"Pages {page_start}-{page_end}" if page_start != page_end else f"Page {page_start}" + + level_display = {"high": "High", "medium": "Medium", "low": "Low"}.get(level, level.title()) + confidence_icon = "\u2713" if level == "high" else "\u26a0" + + body: list[dict[str, Any]] = [ + {"type": "TextBlock", "text": f"{display_name} Review", "size": "Large", "weight": "Bolder"}, + { + "type": "ColumnSet", + "columns": [ + { + "type": "Column", + "items": [{"type": "TextBlock", "text": f"{confidence_icon} {level_display} Confidence"}], + }, + { + "type": "Column", + "items": [ + { + "type": "TextBlock", + "text": f"{page_text} \u2502 {len(rows)} rows \u2502 {len(flagged_indices)} flagged", + } + ], + }, + ], + }, + ] + + # Handle empty statement + if len(rows) == 0: + body.append({"type": "TextBlock", "text": "No data extracted", "isSubtle": True}) + actions = _build_navigation_actions(step_num, total_steps, editable) + return _wrap_card(body, actions) + + total_pages = max(1, -(-len(rows) // EDIT_ALL_PAGE_SIZE)) # ceil division + + # ── Build the data table ── + col_defs = [{"width": 3}] + [{"width": 1} for _ in 
columns] + table_rows = [_build_table_header_row(columns)] + + if editable and edit_all: + # ── Edit All mode: paginated, every row editable ── + edit_all_page = min(edit_all_page, total_pages - 1) + page_start_idx = edit_all_page * EDIT_ALL_PAGE_SIZE + page_end_idx = min(page_start_idx + EDIT_ALL_PAGE_SIZE, len(rows)) + page_rows = rows[page_start_idx:page_end_idx] + + body.append({ + "type": "TextBlock", + "text": f"Editing rows {page_start_idx + 1}\u2013{page_end_idx} of {len(rows)}", + "isSubtle": True, + }) + for row in page_rows: + flag_text = None + if row["row_index"] in flagged_indices: + flag_text = _flag_reason(row, validation) or "Review needed" + table_rows.append( + _build_editable_table_row(row, columns, corrections, flag_text) + ) + + elif editable: + # ── Standard mode: flagged rows editable, clean rows read-only ── + for row in rows: + if row["row_index"] in flagged_indices: + reason = _flag_reason(row, validation) or "Review needed" + table_rows.append( + _build_editable_table_row(row, columns, corrections, reason) + ) + else: + table_rows.append(_build_readonly_table_row(row, columns)) + + else: + # ── Read-only mode ── + for row in rows: + table_rows.append(_build_readonly_table_row(row, columns)) + + body.append({ + "type": "Table", + "gridStyle": "accent", + "firstRowAsHeader": True, + "showGridLines": True, + "columns": col_defs, + "rows": table_rows, + }) + + actions = _build_navigation_actions( + step_num, total_steps, editable, + edit_all=edit_all, edit_all_page=edit_all_page, total_pages=total_pages, + ) + return _wrap_card(body, actions) + + +def _build_navigation_actions( + step_num: int, + total_steps: int, + editable: bool, + *, + edit_all: bool = False, + edit_all_page: int = 0, + total_pages: int = 1, +) -> list: + actions = [] + + # Edit All pagination + if edit_all: + if edit_all_page > 0: + actions.append({ + "type": "Action.Submit", "title": "\u2190 Page", + "data": {"action": "edit_all_page_prev"}, + }) + if edit_all_page < 
total_pages - 1: + actions.append({ + "type": "Action.Submit", "title": "Page \u2192", + "data": {"action": "edit_all_page_next"}, + }) + + # Statement navigation + if step_num > 1: + actions.append({"type": "Action.Submit", "title": "\u2190 Previous", "data": {"action": "previous"}}) + if step_num < total_steps: + actions.append({"type": "Action.Submit", "title": "Next \u2192", "data": {"action": "next"}}) + else: + actions.append({ + "type": "Action.Submit", + "title": "Submit & Generate Excel", + "data": {"action": "submit"}, + }) + + # Mode switches + if editable and not edit_all: + actions.append({"type": "Action.Submit", "title": "Edit All", "data": {"action": "edit_all"}}) + if not editable: + actions.append({"type": "Action.Submit", "title": "Edit Anyway", "data": {"action": "edit_anyway"}}) + return actions + + +def _wrap_card(body: list, actions: list) -> dict: + return { + "$schema": "http://adaptivecards.io/schemas/adaptive-card.json", + "type": "AdaptiveCard", + "version": "1.5", + "body": body, + "actions": actions, + } + + +# ─── Parse Card Submission ─────────────────────────────────────────────────── + +_ROW_FIELD_RE = re.compile(r"^row_(\d+)_(.+)$") + + +def parse_card_submission(payload: dict, statement_json: dict) -> tuple[str, dict]: + """Parse flat Adaptive Card submission into (action, corrections) tuple.""" + + action = payload.get("action", "") + rows = statement_json.get("rows", []) + + # Index rows by row_index for quick lookup + row_map = {r["row_index"]: r for r in rows} + + corrections: dict[str, dict] = {} + + for key, submitted_value in payload.items(): + if key == "action": + continue + m = _ROW_FIELD_RE.match(key) + if not m: + continue + row_idx = int(m.group(1)) + field = m.group(2) + row = row_map.get(row_idx) + if row is None: + continue + + original = _get_original_value(row, field) + if str(submitted_value) != str(original): + row_key = f"row_{row_idx}" + if row_key not in corrections: + corrections[row_key] = {} + 
corrections[row_key][field] = submitted_value + + return action, corrections + + +def _get_original_value(row: dict, field: str) -> str: + """Get the original value from a row for comparison.""" + if field == "label": + return row.get("label_raw", "") + elif field == "section": + return row.get("section", "other") + elif field == "row_type": + return row.get("row_type", "line_item") + elif field.startswith("val_"): + col_idx = int(field.split("_")[1]) + values = row.get("values", []) + if col_idx < len(values): + return values[col_idx].get("raw", "") or "" + return "" + return "" + + +# ─── Session State Management ──────────────────────────────────────────────── + +STATEMENT_ORDER = ["balance_sheet", "income_statement", "cash_flow"] + + +def init_session_state(confidence: dict, job_id: str = "", + available_statements: list[str] | None = None) -> str: + """Create the initial session state from confidence data. + + Determines which statements actually exist and builds an ordered list. + A statement is included only if it appears in *available_statements* + (when provided) or its confidence level is not 'not_found'. + Returns a JSON string that the topic stores opaquely. + """ + if available_statements is not None: + statements = [ + stype for stype in STATEMENT_ORDER + if stype in available_statements + ] + else: + statements = [ + stype for stype in STATEMENT_ORDER + if confidence.get(stype, {}).get("level") != "not_found" + and confidence.get(stype) is not None + ] + state = { + "jobId": job_id, + "phase": "navigator", + "step": 0, + "statements": statements, + "corrections": {}, + "editable": False, + "editAll": False, + "editAllPage": 0, + } + return json.dumps(state, ensure_ascii=False) + + +def advance_session_state( + session_state_str: str, + raw_payload: dict, + statement_json: dict | None = None, +) -> tuple[str, str]: + """Advance the session state machine based on a card submit payload. + + Args: + session_state_str: Current session state JSON string. 
+ raw_payload: The flat key-value payload from the Adaptive Card submit. + statement_json: The v1.2 statement dict for the current step + (needed to diff corrections). None for navigator actions. + + Returns: + (topic_action, updated_session_state_str) where topic_action is one of: + - "continue": show another card (loop back) + - "done": apply corrections and export + - "skip": export without corrections + """ + state = json.loads(session_state_str) + if not isinstance(raw_payload, dict): + raw_payload = {} + raw_action = raw_payload.get("action", "") + phase = state.get("phase", "navigator") + step = state.get("step", 0) + statements = state.get("statements", []) + total_steps = len(statements) + + # ── Navigator phase ── + if phase == "navigator": + if raw_action == "skip_review": + return "skip", json.dumps(state, ensure_ascii=False) + # start_review → advance to step 1 + state["phase"] = "review" + state["step"] = 1 + state["editable"] = False + state["editAll"] = False + state["editAllPage"] = 0 + return "continue", json.dumps(state, ensure_ascii=False) + + # ── Review phase ── + # First, extract corrections from the card payload for the current statement + current_stype = statements[step - 1] if 0 < step <= total_steps else None + + if statement_json is not None and current_stype: + _action, new_corrections = parse_card_submission(raw_payload, statement_json) + # Merge new corrections into accumulated corrections for this statement + if new_corrections: + if current_stype not in state["corrections"]: + state["corrections"][current_stype] = {} + for row_key, fields in new_corrections.items(): + if row_key not in state["corrections"][current_stype]: + state["corrections"][current_stype][row_key] = {} + state["corrections"][current_stype][row_key].update(fields) + + # Now handle the action + if raw_action == "next": + if step >= total_steps: + # Past the last statement → done + state["phase"] = "export" + return "done", json.dumps(state, ensure_ascii=False) + 
state["step"] = step + 1 + state["editable"] = False + state["editAll"] = False + state["editAllPage"] = 0 + return "continue", json.dumps(state, ensure_ascii=False) + + if raw_action == "previous": + state["step"] = max(1, step - 1) + state["editable"] = False + state["editAll"] = False + state["editAllPage"] = 0 + return "continue", json.dumps(state, ensure_ascii=False) + + if raw_action == "submit": + state["phase"] = "export" + return "done", json.dumps(state, ensure_ascii=False) + + if raw_action in ("edit", "edit_anyway"): + state["editable"] = True + state["editAll"] = False + state["editAllPage"] = 0 + return "continue", json.dumps(state, ensure_ascii=False) + + if raw_action == "edit_all": + state["editable"] = True + state["editAll"] = True + state["editAllPage"] = 0 + return "continue", json.dumps(state, ensure_ascii=False) + + if raw_action == "edit_all_page_next": + state["editAllPage"] = state.get("editAllPage", 0) + 1 + return "continue", json.dumps(state, ensure_ascii=False) + + if raw_action == "edit_all_page_prev": + state["editAllPage"] = max(0, state.get("editAllPage", 0) - 1) + return "continue", json.dumps(state, ensure_ascii=False) + + # Unknown action — treat as continue (safe fallback) + return "continue", json.dumps(state, ensure_ascii=False) diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/extractor/confidence_scorer.py b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/confidence_scorer.py new file mode 100644 index 000000000..4ac2dcf93 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/confidence_scorer.py @@ -0,0 +1,316 @@ +""" +extractor/confidence_scorer.py +-------------------------------- +Confidence scoring module for extracted financial statements. + +Computes a composite confidence score (0.0-1.0) from 6 signals: + 1. subtotal_validation — ratio of subtotals that pass cross-validation + 2. section_coverage — ratio of expected section groups present + 3. 
row_count — 1.0 if row count is in expected range + 4. column_dates — ratio of columns whose label contains a 20xx year + 5. empty_label_ratio — 1.0 if blank labels <= 5% of rows + 6. leaked_headers — 1.0 if no leaked header rows detected + +Public API: + score_statement(stmt, statement_type) -> dict + flag_rows(stmt) -> set[int] + +Individual signal functions are importable for testing (prefixed with _score_). +""" + +import re +from typing import Dict, List, Set + +# --------------------------------------------------------------------------- +# Module-level constants +# --------------------------------------------------------------------------- + +# Signal weights (must sum to 1.0) +WEIGHTS: Dict[str, float] = { + "subtotal_validation": 0.20, + "section_coverage": 0.20, + "row_count": 0.15, + "column_dates": 0.15, + "empty_label_ratio": 0.15, + "leaked_headers": 0.15, +} + +# Confidence level thresholds +THRESHOLDS: Dict[str, float] = { + "high": 0.85, + "medium": 0.60, +} + +# Expected row count ranges per statement type +ROW_RANGES: Dict[str, tuple] = { + "balance_sheet": (15, 120), + "income_statement": (10, 80), + "cash_flow": (15, 80), +} + +# Expected section groups per statement type. +# Each entry is a list of groups; a group is a set of alternative section names +# (the group is satisfied if ANY section in the set appears in the rows). 
+SECTION_GROUPS: Dict[str, List[Set[str]]] = { + "balance_sheet": [ + {"assets", "current_assets", "non_current_assets"}, + {"liabilities", "current_liabilities", "non_current_liabilities"}, + {"equity"}, + ], + "income_statement": [ + {"revenue"}, + {"operating_expenses", "expenses"}, + ], + "cash_flow": [ + {"operating_activities"}, + {"investing_activities"}, + {"financing_activities"}, + ], +} + +# Regex to detect a 4-digit year in the 2000s range within a column label +_YEAR_RE = re.compile(r"\b20\d{2}\b") + +# Regex to detect year-like or unit-string values (leaked headers) +# Matches: 4-digit year (20xx), "USD millions", "in thousands", "(USD)", etc. +_LEAKED_VALUE_RE = re.compile( + r"^(\d{4}|[A-Za-z$€£¥].*(?:thousand|million|billion|USD|EUR|GBP|JPY|AUD|CAD).*)$", + re.IGNORECASE, +) + + +# --------------------------------------------------------------------------- +# Signal functions +# --------------------------------------------------------------------------- + +def _score_subtotal_validation(stmt: dict) -> float: + """ + Ratio of subtotals that pass cross-validation. + + Warnings come from stmt["validation"]["warnings"] list. + Each warning has a "row_index" key. Count subtotals (rows where + row_type == "subtotal"), count how many have a matching row_index + in warnings. Score = (total - failed) / total. + If no subtotals, return 1.0. + """ + rows = stmt.get("rows", []) + warnings = stmt.get("validation", {}).get("warnings", []) + + subtotal_indices = { + row["row_index"] + for row in rows + if row.get("row_type") == "subtotal" + } + + if not subtotal_indices: + return 1.0 + + warned_indices = {w["row_index"] for w in warnings} + failed = subtotal_indices & warned_indices + return (len(subtotal_indices) - len(failed)) / len(subtotal_indices) + + +def _score_section_coverage(stmt: dict, statement_type: str) -> float: + """ + Check if expected section groups are present. + + Score = groups_matched / total_groups. 
+ A group matches if ANY section in the group set is present in the rows. + """ + groups = SECTION_GROUPS.get(statement_type, []) + if not groups: + return 1.0 + + rows = stmt.get("rows", []) + present_sections = {row.get("section", "") for row in rows} + + matched = sum( + 1 for group in groups if group & present_sections + ) + return matched / len(groups) + + +def _score_row_count(stmt: dict, statement_type: str) -> float: + """ + 1.0 if row count is within the expected range, 0.0 otherwise. + """ + rows = stmt.get("rows", []) + count = len(rows) + lo, hi = ROW_RANGES.get(statement_type, (0, float("inf"))) + return 1.0 if lo <= count <= hi else 0.0 + + +def _score_column_dates(stmt: dict) -> float: + """ + Ratio of columns whose label contains a 4-digit year (20xx). + Returns 0.0 if there are no columns. + """ + columns = stmt.get("columns", []) + if not columns: + return 0.0 + + with_year = sum( + 1 for col in columns if _YEAR_RE.search(col.get("label", "")) + ) + return with_year / len(columns) + + +def _score_empty_label_ratio(stmt: dict) -> float: + """ + 1.0 if blank labels (empty or whitespace-only) are <= 5% of rows, + 0.0 otherwise. + Returns 1.0 if there are no rows. + """ + rows = stmt.get("rows", []) + if not rows: + return 1.0 + + blank_count = sum( + 1 for row in rows if not row.get("label_raw", "").strip() + ) + ratio = blank_count / len(rows) + return 1.0 if ratio <= 0.05 else 0.0 + + +def _score_leaked_headers(stmt: dict) -> float: + """ + 1.0 if no rows have a blank label where all values look like + years or unit-strings (leaked header rows), 0.0 if any are found. 
+    """
+    rows = stmt.get("rows", [])
+
+    for row in rows:
+        label = row.get("label_raw", "")
+        if label.strip():
+            # Non-blank label — cannot be a leaked header
+            continue
+
+        values = row.get("values", [])
+        if not values:
+            continue
+
+        # Check if ALL values look like years or unit strings
+        all_look_like_headers = all(
+            _LEAKED_VALUE_RE.match((v.get("raw") or "").strip())
+            for v in values
+            if (v.get("raw") or "").strip()
+        )
+
+        # Only flag if there's at least one non-empty value that looks like a header
+        non_empty_values = [v for v in values if (v.get("raw") or "").strip()]
+        if non_empty_values and all_look_like_headers:
+            return 0.0
+
+    return 1.0
+
+
+# ---------------------------------------------------------------------------
+# Row flagging
+# ---------------------------------------------------------------------------
+
+def flag_rows(stmt: dict) -> Set[int]:
+    """
+    Return a set of row_index values for rows needing review.
+
+    A row is flagged if any of the following apply:
+    - label_raw is blank (empty or whitespace-only)
+    - row_index appears in validation warnings
+    - section is "other" on a line_item that carries no parsed value
+    - Any value has normalized=None but raw is non-empty (parse failure)
+    - Label contains "total" (case insensitive) but row_type is "line_item"
+    """
+    rows = stmt.get("rows", [])
+    warnings = stmt.get("validation", {}).get("warnings", [])
+    warned_indices = {w["row_index"] for w in warnings}
+
+    flagged: Set[int] = set()
+
+    for row in rows:
+        idx = row["row_index"]
+
+        # Blank label
+        if not row.get("label_raw", "").strip():
+            flagged.add(idx)
+            continue
+
+        # In validation warnings
+        if idx in warned_indices:
+            flagged.add(idx)
+            continue
+
+        # Flag "other"-section line items only when they carry no parsed value.
+        # Don't flag entire OCI sections just because they're classified as "other".
+        if row.get("section") == "other" and row.get("row_type") == "line_item":
+            vals = row.get("values", [])
+            has_value = any(v.get("normalized") is not None for v in vals)
+ if not has_value: + flagged.add(idx) + continue + + # Parse failure: normalized is None but raw is non-empty + for val in row.get("values", []): + if val.get("normalized") is None and (val.get("raw") or "").strip() and not val.get("is_null", False): + flagged.add(idx) + break + + if idx in flagged: + continue + + # Label contains "total" but row_type is "line_item" + label = row.get("label_raw", "") + if "total" in label.lower() and row.get("row_type") == "line_item": + flagged.add(idx) + + return flagged + + +# --------------------------------------------------------------------------- +# Composite scorer +# --------------------------------------------------------------------------- + +def _classify_level(score: float) -> str: + """Map a 0.0-1.0 score to a confidence level string.""" + if score >= THRESHOLDS["high"]: + return "high" + if score >= THRESHOLDS["medium"]: + return "medium" + return "low" + + +def score_statement(stmt: dict, statement_type: str) -> dict: + """ + Compute a composite confidence score for an extracted statement. + + Parameters + ---------- + stmt : dict + The statement dict in v1.2 shape. + statement_type : str + One of "balance_sheet", "income_statement", "cash_flow". 
+ + Returns + ------- + dict with keys: + score : float — composite score in [0.0, 1.0] + level : str — "high", "medium", or "low" + signals : dict — individual signal scores keyed by signal name + flagged_rows: list — sorted list of flagged row_index values + """ + signals = { + "subtotal_validation": _score_subtotal_validation(stmt), + "section_coverage": _score_section_coverage(stmt, statement_type), + "row_count": _score_row_count(stmt, statement_type), + "column_dates": _score_column_dates(stmt), + "empty_label_ratio": _score_empty_label_ratio(stmt), + "leaked_headers": _score_leaked_headers(stmt), + } + + total_weight = sum(WEIGHTS.values()) + composite = sum(signals[k] * WEIGHTS[k] for k in WEIGHTS) / total_weight + + return { + "score": composite, + "level": _classify_level(composite), + "signals": signals, + "flagged_rows": sorted(flag_rows(stmt)), + } diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/extractor/corrections.py b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/corrections.py new file mode 100644 index 000000000..72a74d1b0 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/corrections.py @@ -0,0 +1,72 @@ +""" +extractor/corrections.py +------------------------- +Applies analyst corrections from HITL Adaptive Cards to v1.2 statement dicts. +""" + + +def apply_corrections(statement: dict, corrections: dict) -> dict: + """ + Apply analyst corrections to a v1.2 statement dict. + + Args: + statement: v1.2 statement dict with rows, columns, validation + corrections: dict of {"row_N": {"label": "...", "val_0": "...", "section": "..."}} + + Returns the statement dict with corrections applied in-place. 
+ """ + from .statement_detector import _parse_financial_value + from .enrichment import _SECTION_TO_GROUP + + rows = statement.get("rows", []) + row_by_index = {r["row_index"]: r for r in rows} + + for key, fields in corrections.items(): + if not key.startswith("row_"): + continue + try: + row_idx = int(key.split("_", 1)[1]) + except (ValueError, IndexError): + continue + + row = row_by_index.get(row_idx) + if row is None: + continue + + # Apply label correction + if "label" in fields: + row["label_raw"] = fields["label"] + + # Apply value corrections + for fkey, fval in fields.items(): + if fkey.startswith("val_"): + try: + col_idx = int(fkey.split("_", 1)[1]) + except (ValueError, IndexError): + continue + if col_idx < len(row.get("values", [])): + val_cell = row["values"][col_idx] + raw = fval.strip() if fval else "" + if not raw: + val_cell["raw"] = None + val_cell["normalized"] = None + val_cell["is_null"] = True + val_cell["is_zero"] = None + else: + parsed = _parse_financial_value(raw) + val_cell["raw"] = raw + val_cell["normalized"] = parsed + val_cell["is_null"] = False + val_cell["is_zero"] = parsed == 0.0 if parsed is not None else None + + # Apply section correction + if "section" in fields: + row["section"] = fields["section"] + row["canonical_group"] = _SECTION_TO_GROUP.get(fields["section"], "other") + + # Apply row_type correction + if "row_type" in fields: + row["row_type"] = fields["row_type"] + row["is_derived_total"] = fields["row_type"] in ("subtotal", "total") + + return statement diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/extractor/cu_client.py b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/cu_client.py new file mode 100644 index 000000000..a6181ce6d --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/cu_client.py @@ -0,0 +1,201 @@ +""" +extractor/cu_client.py +---------------------- +Azure Content Understanding client for custom analyzers. 
+
+Adapted from the Azure Samples reference implementation:
+    https://github.com/Azure-Samples/azure-ai-content-understanding-python
+
+Supports:
+    - Creating/updating custom analyzers (PUT)
+    - Submitting documents for analysis (POST binary)
+    - Polling for results (GET with retry)
+    - Deleting analyzers (DELETE)
+    - Listing existing analyzers (GET)
+
+Configuration (via .env):
+    AZURE_CU_ENDPOINT — Base endpoint (without path), e.g.
+        https://azure-content--resource.cognitiveservices.azure.com
+
+Authentication: Managed Identity (DefaultAzureCredential).
+    Requires Cognitive Services User role on the resource.
+"""
+
+import os
+import time
+from pathlib import Path
+from typing import Optional
+
+import requests
+from dotenv import load_dotenv
+
+load_dotenv(override=False)
+
+_API_VERSION = "2025-05-01-preview"
+_POLL_INTERVAL = 3
+_POLL_TIMEOUT = 300  # 5 minutes max
+
+
+def _get_auth_headers() -> dict:
+    """Return auth headers using managed identity (keyless authentication)."""
+    from azure.identity import DefaultAzureCredential
+    token = DefaultAzureCredential().get_token("https://cognitiveservices.azure.com/.default")
+    return {"Authorization": f"Bearer {token.token}"}
+
+
+def _get_base_url() -> str:
+    """Return the CU endpoint base URL (no trailing slash)."""
+    endpoint = os.environ.get("AZURE_CU_ENDPOINT", "")
+    # Accept either the base endpoint or a full analyze URL; keep only the base.
+    if "/contentunderstanding/" in endpoint:
+        endpoint = endpoint.split("/contentunderstanding/")[0]
+    return endpoint.rstrip("/")
+
+
+def _get_headers(content_type: str = "application/json") -> dict:
+    return {**_get_auth_headers(), "Content-Type": content_type}
+
+
+# ---------------------------------------------------------------------------
+# Analyzer management
+# ---------------------------------------------------------------------------
+
+def create_analyzer(analyzer_id: str, template: dict) -> dict:
+    """
+    Create or update a custom analyzer.
+ + Args: + analyzer_id: Unique identifier (e.g. "financial-statement-locator") + template: Analyzer definition JSON (baseAnalyzerId, fieldSchema, etc.) + + Returns the full response dict. Raises on failure. + """ + url = ( + f"{_get_base_url()}/contentunderstanding/analyzers/{analyzer_id}" + f"?api-version={_API_VERSION}" + ) + resp = requests.put(url, headers=_get_headers(), json=template, timeout=60) + resp.raise_for_status() + + # If 201 Created with operation-location, poll until ready. + operation_location = resp.headers.get("operation-location") + if operation_location: + return _poll(operation_location) + + return resp.json() if resp.text.strip() else {"status": "created"} + + +def get_analyzer(analyzer_id: str) -> Optional[dict]: + """Get analyzer details. Returns None if not found.""" + url = ( + f"{_get_base_url()}/contentunderstanding/analyzers/{analyzer_id}" + f"?api-version={_API_VERSION}" + ) + resp = requests.get(url, headers=_get_headers(), timeout=30) + if resp.status_code == 404: + return None + resp.raise_for_status() + return resp.json() + + +def list_analyzers() -> list[dict]: + """List all analyzers on this resource.""" + url = f"{_get_base_url()}/contentunderstanding/analyzers?api-version={_API_VERSION}" + resp = requests.get(url, headers=_get_headers(), timeout=30) + resp.raise_for_status() + return resp.json().get("value", []) + + +def delete_analyzer(analyzer_id: str) -> None: + """Delete an analyzer.""" + url = ( + f"{_get_base_url()}/contentunderstanding/analyzers/{analyzer_id}" + f"?api-version={_API_VERSION}" + ) + resp = requests.delete(url, headers=_get_headers(), timeout=30) + if resp.status_code != 404: + resp.raise_for_status() + + +# --------------------------------------------------------------------------- +# Document analysis +# --------------------------------------------------------------------------- + +def analyze_document( + analyzer_id: str, + file_path: str, + page_range: str | None = None, +) -> dict: + """ + Submit a 
document for analysis and return the full result. + + Args: + analyzer_id: The custom analyzer to use. + file_path: Path to the PDF file. + page_range: Optional page range filter (e.g. "4-5", "6", "7-9"). + Uses 1-based page numbers. + + Returns the complete analysis result dict with extracted fields. + """ + range_param = f"&range={page_range}" if page_range else "" + url = ( + f"{_get_base_url()}/contentunderstanding/analyzers/{analyzer_id}:analyze" + f"?api-version={_API_VERSION}{range_param}" + ) + + with open(file_path, "rb") as f: + data = f.read() + + # Determine content type from extension. + ext = Path(file_path).suffix.lower() + content_types = { + ".pdf": "application/pdf", + ".png": "image/png", + ".jpg": "image/jpeg", + ".jpeg": "image/jpeg", + ".tiff": "image/tiff", + } + ct = content_types.get(ext, "application/octet-stream") + + headers = _get_headers(content_type=ct) + resp = requests.post(url, headers=headers, data=data, timeout=120) + resp.raise_for_status() + + operation_location = resp.headers.get("operation-location") + if operation_location: + return _poll(operation_location) + + return resp.json() + + +# --------------------------------------------------------------------------- +# Polling +# --------------------------------------------------------------------------- + +def _poll(operation_location: str) -> dict: + """Poll an async operation until it completes.""" + headers = _get_auth_headers() + start = time.time() + attempt = 0 + + while (time.time() - start) < _POLL_TIMEOUT: + attempt += 1 + resp = requests.get(operation_location, headers=headers, timeout=30) + resp.raise_for_status() + result = resp.json() + + status = result.get("status", "unknown").lower() + print(f" [{attempt}] status: {status}") + + if status == "succeeded": + return result + if status == "failed": + error = result.get("error", {}) + raise RuntimeError( + f"Operation failed: {error.get('code')} - {error.get('message')}" + ) + + time.sleep(_POLL_INTERVAL) + + raise 
TimeoutError(f"Operation did not complete within {_POLL_TIMEOUT}s") diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/extractor/dataverse_client.py b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/dataverse_client.py new file mode 100644 index 000000000..dfa0661f7 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/dataverse_client.py @@ -0,0 +1,137 @@ +""" +Dataverse Web API client. MSAL service principal auth + single row + batch writes. +""" +import json +import logging +import os +import uuid + +import httpx +import msal + +logger = logging.getLogger(__name__) + +BATCH_SIZE = 100 + + +def _get_token() -> str: + authority = f"https://login.microsoftonline.com/{os.environ['DATAVERSE_TENANT_ID']}" + app = msal.ConfidentialClientApplication( + client_id=os.environ["DATAVERSE_CLIENT_ID"], + client_credential=os.environ["DATAVERSE_CLIENT_SECRET"], + authority=authority, + ) + result = app.acquire_token_for_client(scopes=[f"{os.environ['DATAVERSE_URL']}/.default"]) + if "access_token" not in result: + raise RuntimeError(f"MSAL token error: {result.get('error_description', result)}") + return result["access_token"] + + +def _headers(token: str) -> dict: + return { + "Authorization": f"Bearer {token}", + "Content-Type": "application/json", + "OData-MaxVersion": "4.0", + "OData-Version": "4.0", + "Accept": "application/json", + "Prefer": "return=representation", + } + + +def build_create_request(table: str, data: dict, content_id: str, lookups: dict | None = None) -> dict: + body = {**data} + if lookups: + body.update(lookups) + return {"method": "POST", "url": table, "body": body, "content_id": content_id} + + +def build_batch_payload(requests: list[dict]) -> tuple[str, str]: + batch_id = f"batch_{uuid.uuid4().hex[:12]}" + changeset_id = f"changeset_{uuid.uuid4().hex[:12]}" + parts = [f"--{batch_id}", f"Content-Type: multipart/mixed; boundary={changeset_id}", ""] + for req in requests: + 
parts.extend([ + f"--{changeset_id}", + "Content-Type: application/http", + "Content-Transfer-Encoding: binary", + f"Content-ID: {req['content_id']}", + "", + f"{req['method']} {req['url']} HTTP/1.1", + "Content-Type: application/json", + "", + json.dumps(req["body"]), + ]) + parts.append(f"--{changeset_id}--") + parts.append(f"--{batch_id}--") + return batch_id, "\r\n".join(parts) + + +def write_to_dataverse(job_id: str, job_row: dict, statement_rows: list[tuple[str, dict]], line_items: list[tuple[str, dict]]): + """ + Write extraction results to Dataverse. Synchronous (called from background thread). + + Args: + job_id: extraction job ID + job_row: ExtractionJob dict + statement_rows: list of (statement_type, row_dict) tuples + line_items: list of (statement_type, row_dict) tuples + """ + base_url = os.environ["DATAVERSE_URL"].rstrip("/") + token = _get_token() + hdrs = _headers(token) + + with httpx.Client(timeout=60.0) as client: + # 1. Create ExtractionJob (need ID back) + resp = client.post(f"{base_url}/api/data/v9.2/cree1_extractionjobs", json=job_row, headers=hdrs) + if resp.status_code not in (200, 201): + raise RuntimeError(f"ExtractionJob create failed: {resp.status_code} {resp.text[:1000]}") + job_record_id = resp.json().get("cree1_extractionjobid", "") + logger.info("Created ExtractionJob %s -> %s", job_id, job_record_id) + + # 2. 
Create ExtractedStatement rows (need IDs for line item lookups) + stmt_ids = {} + for stmt_type, stmt_data in statement_rows: + stmt_data["cree1_ExtractionJobID@odata.bind"] = f"/cree1_extractionjobs({job_record_id})" + resp = client.post(f"{base_url}/api/data/v9.2/cree1_extractedstatement1s", json=stmt_data, headers=hdrs) + if resp.status_code not in (200, 201): + raise RuntimeError(f"ExtractedStatement ({stmt_type}) create failed: {resp.status_code} {resp.text[:1000]}") + stmt_ids[stmt_type] = resp.json().get("cree1_extractedstatement1id", "") + logger.info("Created ExtractedStatement %s -> %s", stmt_type, stmt_ids[stmt_type]) + + # 3. Batch-write ExtractedLineItems + batch_hdrs = { + "Authorization": f"Bearer {token}", + "OData-MaxVersion": "4.0", + "OData-Version": "4.0", + "Accept": "application/json", + } + content_id = 1 + batch_reqs = [] + total_written = 0 + + for stmt_type, item_data in line_items: + lookups = {"cree1_ExtractionJob@odata.bind": f"/cree1_extractionjobs({job_record_id})"} + if stmt_type in stmt_ids: + lookups["cree1_ExtractedStatement@odata.bind"] = f"/cree1_extractedstatement1s({stmt_ids[stmt_type]})" + batch_reqs.append(build_create_request("cree1_extractedlineitems", item_data, str(content_id), lookups)) + content_id += 1 + + if len(batch_reqs) >= BATCH_SIZE: + boundary, body = build_batch_payload(batch_reqs) + batch_hdrs["Content-Type"] = f"multipart/mixed; boundary={boundary}" + resp = client.post(f"{base_url}/api/data/v9.2/$batch", content=body, headers=batch_hdrs) + if resp.status_code not in (200, 204): + raise RuntimeError(f"Batch write failed: {resp.status_code} {resp.text[:1000]}") + total_written += len(batch_reqs) + logger.info("Written %d/%d line items", total_written, len(line_items)) + batch_reqs = [] + + if batch_reqs: + boundary, body = build_batch_payload(batch_reqs) + batch_hdrs["Content-Type"] = f"multipart/mixed; boundary={boundary}" + resp = client.post(f"{base_url}/api/data/v9.2/$batch", content=body, 
headers=batch_hdrs) + if resp.status_code not in (200, 204): + raise RuntimeError(f"Batch write failed: {resp.status_code} {resp.text[:1000]}") + total_written += len(batch_reqs) + + logger.info("Dataverse write complete: %d statements, %d line items", len(statement_rows), total_written) diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/extractor/dataverse_parser.py b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/dataverse_parser.py new file mode 100644 index 000000000..6ab327ba3 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/dataverse_parser.py @@ -0,0 +1,281 @@ +""" +Parses extraction pipeline results (v1.2 schema) into Dataverse-ready dicts. +Each function returns dicts with cree1_ prefixed keys matching Dataverse column logical names. +All choice fields use integer values (PicklistType). +""" +import json +from typing import Any + +from extractor.confidence_scorer import score_statement, flag_rows + +# Choice values +STATUS_PENDING_REVIEW = 833060002 +REVIEW_STATUS_PENDING = 833060000 +REVIEW_STATUS_FLAGGED = 833060003 + +STATEMENT_TYPE_MAP = { + "balance_sheet": 833060001, + "income_statement": 833060000, + "cash_flow": 833060002, +} + +ROW_TYPE_MAP = { + "section_header": 833060000, + "line_item": 833060001, + "subtotal": 833060002, + "total": 833060003, +} + +# Picklist mappings for ExtractionJob choice fields +LANGUAGE_MAP = {"en": 833060000, "zh": 833060001, "jp": 833060002} +CURRENCY_MAP = { + "USD": 833060000, "CNY": 833060001, "JPY": 833060002, + "EUR": 833060003, "GBP": 833060004, "AUD": 833060005, + "CAD": 833060006, "HKD": 833060007, "SGD": 833060008, +} +CURRENCY_UNIT_MAP = {"ones": 833060000, "thousands": 833060001, "millions": 833060002, "billions": 833060003} +REPORT_TYPE_MAP = {"annual_report": 833060000, "quarterly_report": 833060001, "10-K": 833060000, "10-Q": 833060001} +ACCOUNTING_STANDARD_MAP = {"IFRS": 833060000, "Chinese_ASBE": 833060001, "US_GAAP": 833060002, 
"US-GAAP": 833060002, "GAAP": 833060002} + + +# Dataverse decimal field limit: -100,000,000,000 to 100,000,000,000 +DATAVERSE_DECIMAL_LIMIT = 99_999_999_999 + +# Unit scaling: from source unit → target unit, with divisor +UNIT_SCALING = { + "ones": {"target": "millions", "divisor": 1_000_000, "target_code": 833060002}, + "thousands": {"target": "millions", "divisor": 1_000, "target_code": 833060002}, + "millions": {"target": "billions", "divisor": 1_000, "target_code": 833060003}, +} + + +def normalize_values_for_dataverse( + job_row: dict[str, Any], + line_items: list[tuple[str, dict[str, Any]]], + source_unit: str, +) -> None: + """Scale valuenormalized across ALL line items if any value exceeds the Dataverse decimal limit. + + Modifies job_row and line_items in place: + - Divides all cree1_valuenormalized by the appropriate scaling factor + - Updates cree1_currencyunit on the job row to reflect the new unit + + This ensures the entire report uses a consistent unit that fits within + Dataverse's decimal column range (-100B to +100B). + """ + if not line_items: + return + + # Check if any value exceeds the limit + max_abs_value = 0.0 + for _, item in line_items: + val = item.get("cree1_valuenormalized") + if val is not None: + try: + max_abs_value = max(max_abs_value, abs(float(val))) + except (TypeError, ValueError): + pass + + if max_abs_value <= DATAVERSE_DECIMAL_LIMIT: + return # All values fit, no scaling needed + + # Determine scaling factor based on source unit + scaling = UNIT_SCALING.get(source_unit) + if not scaling: + # Already in billions or unknown unit — can't scale further + import logging + logging.warning( + f"Value {max_abs_value} exceeds Dataverse limit but unit '{source_unit}' " + f"cannot be scaled further. Values may be truncated." 
+ ) + return + + divisor = scaling["divisor"] + target_unit_code = scaling["target_code"] + target_unit_name = scaling["target"] + + import logging + logging.info( + f"Scaling values: {source_unit} -> {target_unit_name} (÷{divisor:,}) " + f"because max value {max_abs_value:,.2f} exceeds Dataverse limit {DATAVERSE_DECIMAL_LIMIT:,}" + ) + + # Apply scaling to ALL line items + for _, item in line_items: + val = item.get("cree1_valuenormalized") + if val is not None: + try: + item["cree1_valuenormalized"] = round(float(val) / divisor, 10) + except (TypeError, ValueError): + pass + + # Update the unit on the job row + job_row["cree1_currencyunit"] = target_unit_code + + +def parse_job_row(job_id: str, result: dict, file_name: str = "") -> dict[str, Any]: + """Parse pipeline result into an ExtractionJob row.""" + doc_meta = {} + stmt_meta = {} + total_rows = 0 + periods = set() + + for stype in ["balance_sheet", "income_statement", "cash_flow"]: + stmt = result.get(stype) + if not stmt or not isinstance(stmt, dict): + continue + if not doc_meta: + doc_meta = stmt.get("document_metadata", {}) + if not stmt_meta: + stmt_meta = stmt.get("statement_metadata", {}) + total_rows += len(stmt.get("rows", [])) + for col in stmt.get("columns", []): + periods.add(col.get("label", "")) + + statements_found = sum(1 for s in ["balance_sheet", "income_statement", "cash_flow"] + if result.get(s) and isinstance(result[s], dict)) + + # Use statement-level confidence scores from confidence_scorer + avg_confidence = None + conf = result.get("confidence", {}) + if isinstance(conf, dict): + scores = [v["score"] for v in conf.values() if isinstance(v, dict) and "score" in v] + if scores: + avg_confidence = sum(scores) / len(scores) + + row = { + "cree1_jobid": job_id, + "cree1_companyname": doc_meta.get("company_name", ""), + "cree1_filename": file_name, + "cree1_statementsfound": statements_found, + "cree1_totallineitems": total_rows, + "cree1_avgconfidence": avg_confidence, + "cree1_periods": 
",".join(sorted(periods)), + "cree1_summaryjsonfull": json.dumps(result.get("summary", []), ensure_ascii=False), + "cree1_status": STATUS_PENDING_REVIEW, + } + + # Only include choice fields if we have a valid mapping (avoid 400 errors) + lang = doc_meta.get("report_language", "") + if lang in LANGUAGE_MAP: + row["cree1_reportlanguage"] = LANGUAGE_MAP[lang] + + currency = stmt_meta.get("currency", "") + if currency in CURRENCY_MAP: + row["cree1_currency"] = CURRENCY_MAP[currency] + + unit = stmt_meta.get("unit", "") + if unit in CURRENCY_UNIT_MAP: + row["cree1_currencyunit"] = CURRENCY_UNIT_MAP[unit] + + report_type = doc_meta.get("report_type", "") + if report_type in REPORT_TYPE_MAP: + row["cree1_reporttype"] = REPORT_TYPE_MAP[report_type] + + acct_std = stmt_meta.get("accounting_standard", "") + if acct_std in ACCOUNTING_STANDARD_MAP: + row["cree1_accountingstandard"] = ACCOUNTING_STANDARD_MAP[acct_std] + + return row + + +def parse_statement_row(statement_type: str, stmt: dict) -> dict[str, Any]: + """Parse a single statement dict into an ExtractedStatement row.""" + meta = stmt.get("statement_metadata", {}) + page_range = meta.get("page_range", {}) + + return { + "cree1_jobid": "", # Set by _write_to_dataverse before sending + "cree1_statementtitle": meta.get("statement_title", ""), + "cree1_statementname": statement_type, + "cree1_statementtype": STATEMENT_TYPE_MAP.get(statement_type, 833060000), + "cree1_pagerangestart": page_range.get("start"), + "cree1_pagerangeend": page_range.get("end"), + "cree1_isconsolidated": meta.get("is_consolidated"), + "cree1_isaudited": meta.get("is_audited"), + "cree1_rawstatementjsonfull": json.dumps(stmt, ensure_ascii=False), + "cree1_reviewcomplete": False, + } + + +def parse_line_item_rows(statement_type: str, stmt: dict, confidence_result: dict | None = None) -> list[dict[str, Any]]: + """Parse rows x columns into ExtractedLineItem rows (one per value per period). 
+ + Parameters + ---------- + confidence_result : dict | None + Output of confidence_scorer.score_statement(). Used to: + - Apply statement-level confidence as fallback when per-cell confidence is null + - Flag rows identified by flag_rows() (validation warnings, parse failures, etc.) + """ + columns = stmt.get("columns", []) + col_lookup = {c["column_index"]: c for c in columns} + + # Identify label column indices to skip (e.g., 项目/Item — not a value column) + _LABEL_COLUMN_NAMES = {"项目", "item", "rubriques", ""} + label_col_indices = set() + for c in columns: + label_lower = c.get("label", "").strip().lower() + if label_lower in _LABEL_COLUMN_NAMES and c.get("fiscal_year") is None: + label_col_indices.add(c["column_index"]) + + # Build value columns (non-label) for correct period mapping + value_columns = [c for c in columns if c["column_index"] not in label_col_indices] + + # Statement-level confidence fallback + stmt_confidence = confidence_result["score"] if confidence_result else None + flagged_row_indices = set(confidence_result.get("flagged_rows", [])) if confidence_result else set() + + items = [] + for row in stmt.get("rows", []): + row_idx = row.get("row_index") + is_flagged = row_idx in flagged_row_indices + + values = row.get("values", []) + + # The values array may have entries for ALL columns including the label column. + # When a label column exists, it shifts values by 1 position. We detect this + # by comparing values count vs columns count, and map by position to + # value (period) columns. + if label_col_indices and len(values) > len(value_columns): + # More values than period columns — label column included in values. + # Take first N values where N = number of period columns. + data_values = values[:len(value_columns)] + else: + # No label column, or values already aligned with period columns. 
+ data_values = values + + for vi, val in enumerate(data_values): + if vi >= len(value_columns): + break + col = value_columns[vi] + + # Use per-cell confidence if available, fall back to statement-level + cell_confidence = val.get("confidence") + if cell_confidence is None: + cell_confidence = stmt_confidence + + items.append({ + "cree1_jobid": "", # Set by _write_to_dataverse before sending + "cree1_lineitemname": row.get("label_normalized") or row.get("label_raw", ""), + "cree1_rowindex": row_idx, + "cree1_rowtype": ROW_TYPE_MAP.get(row.get("row_type", ""), 833060001), + "cree1_indentlevel": row.get("indent_level", 0), + "cree1_sectionname": row.get("section", ""), + "cree1_canonicalkey": row.get("canonical_key", ""), + "cree1_canonicalgroup": row.get("canonical_group", ""), + "cree1_labelraw": row.get("label_raw", ""), + "cree1_period": col.get("label", ""), + "cree1_periodtype": col.get("period_type", ""), + "cree1_periodenddate": col.get("end_date"), + "cree1_columnindex": col.get("column_index", vi), + "cree1_valueraw": val.get("raw"), + "cree1_valuenormalized": val.get("normalized"), + "cree1_valuekind": val.get("value_kind", ""), + "cree1_aiconfidence": cell_confidence, + "cree1_sourcepage": row.get("source_page"), + "cree1_signhint": row.get("sign_hint"), + "cree1_reviewstatus": REVIEW_STATUS_FLAGGED if is_flagged else REVIEW_STATUS_PENDING, + }) + + return items diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/extractor/dataverse_reader.py b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/dataverse_reader.py new file mode 100644 index 000000000..350153cf1 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/dataverse_reader.py @@ -0,0 +1,67 @@ +""" +Dataverse Web API reader. Fetches approved extraction data for Excel generation. +Reuses MSAL auth from dataverse_client. 
+""" +import logging +import os + +import httpx + +from extractor.dataverse_client import _get_token, _headers + +logger = logging.getLogger(__name__) + + +def _base_url() -> str: + return os.environ["DATAVERSE_URL"].rstrip("/") + + +def read_job(job_id: str) -> dict | None: + """Fetch ExtractionJob by cree1_jobid (the extraction UUID, not the Dataverse record ID).""" + token = _get_token() + url = f"{_base_url()}/api/data/v9.2/cree1_extractionjobs?$filter=cree1_jobid eq '{job_id}'" + with httpx.Client(timeout=30.0) as client: + resp = client.get(url, headers=_headers(token)) + resp.raise_for_status() + rows = resp.json().get("value", []) + return rows[0] if rows else None + + +def read_statements(job_id: str) -> list[dict]: + """Fetch ExtractedStatement rows for a job, ordered by statement type.""" + token = _get_token() + url = ( + f"{_base_url()}/api/data/v9.2/cree1_extractedstatement1s" + f"?$filter=cree1_jobid eq '{job_id}'" + f"&$orderby=cree1_statementtype asc" + ) + with httpx.Client(timeout=30.0) as client: + resp = client.get(url, headers=_headers(token)) + resp.raise_for_status() + return resp.json().get("value", []) + + +def read_line_items(job_id: str) -> list[dict]: + """Fetch all ExtractedLineItem rows for a job, ordered by row then column. + + Handles OData pagination (5000 row limit per page). 
+ """ + token = _get_token() + hdrs = _headers(token) + hdrs["Prefer"] = "odata.maxpagesize=5000" + url = ( + f"{_base_url()}/api/data/v9.2/cree1_extractedlineitems" + f"?$filter=cree1_jobid eq '{job_id}'" + f"&$orderby=cree1_rowindex asc,cree1_columnindex asc" + ) + all_items = [] + with httpx.Client(timeout=60.0) as client: + while url: + resp = client.get(url, headers=hdrs) + resp.raise_for_status() + data = resp.json() + all_items.extend(data.get("value", [])) + url = data.get("@odata.nextLink") + + logger.info("Read %d line items for job %s", len(all_items), job_id) + return all_items diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/extractor/di_adapter.py b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/di_adapter.py new file mode 100644 index 000000000..e77f5bdb5 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/di_adapter.py @@ -0,0 +1,155 @@ +""" +extractor/di_adapter.py +------------------------ +Adapter for Azure Document Intelligence results → pipeline AnalyzeResult. + +Unlike Textract/pdfplumber adapters, DI with output_content_format="markdown" +already returns markdown with embedded HTML tables. This adapter only needs to: + 1. Build a page_map from the DI page spans + 2. Classify statements via LLM (reuses the same prompt as textract_adapter) + +Public API: + build_page_map(di_result, markdown) -> list[tuple[int, int, int]] + classify_statements_with_llm(markdown) -> list[dict] +""" + +from __future__ import annotations + +import json +import logging +import re + +logger = logging.getLogger(__name__) + + +# --------------------------------------------------------------------------- +# 1. Build page map +# --------------------------------------------------------------------------- + +def build_page_map( + di_result: dict, markdown: str +) -> list[tuple[int, int, int]]: + """ + Build a page_map from DI page spans. 
+
+    DI pages include span offsets into the markdown content string,
+    so we can directly map (offset, offset+length, page_number).
+
+    Falls back to parsing <!-- PageNumber="n" --> or <!-- PageBreak --> markers
+    if page spans are not available.
+    """
+    pages = di_result.get("pages", [])
+    entries: list[tuple[int, int, int]] = []
+
+    for page in pages:
+        page_num = page.get("pageNumber", 1)
+        spans = page.get("spans", [])
+        if spans:
+            # Use the first span's offset and the last span's end
+            start = spans[0]["offset"]
+            last = spans[-1]
+            end = last["offset"] + last["length"]
+            entries.append((start, end, page_num))
+
+    if entries:
+        entries.sort(key=lambda t: t[0])
+        return entries
+
+    # Fallback: look for page markers in markdown (DI sometimes inserts these)
+    marker_re = re.compile(r'<!-- PageNumber="(\d+)" -->|<!-- PageBreak -->')
+    matches = list(marker_re.finditer(markdown))
+
+    if not matches:
+        # Single page document — entire content is page 1
+        return [(0, len(markdown), 1)]
+
+    page_num = 1
+    for i, m in enumerate(matches):
+        if m.group(1):
+            page_num = int(m.group(1))
+        start = m.start()
+        if i + 1 < len(matches):
+            end = matches[i + 1].start()
+        else:
+            end = len(markdown)
+        entries.append((start, end, page_num))
+        page_num += 1
+
+    entries.sort(key=lambda t: t[0])
+    return entries
+
+
+# ---------------------------------------------------------------------------
+# 2. Classify statements with LLM (same as textract_adapter)
+# ---------------------------------------------------------------------------
+
+def classify_statements_with_llm(markdown: str) -> list[dict]:
+    """
+    Use Azure OpenAI to classify financial statements in the document.
+
+    Sends the first 8000 chars of markdown to the LLM and asks it to identify
+    statement types (balance_sheet, income_statement, cash_flow).
+
+    Returns an empty list if the LLM is not available (graceful fallback).
+ """ + try: + from extractor.llm_reconciler import _get_client, _DEPLOYMENT + except Exception: + logger.warning( + "[di_adapter] Could not import LLM client, skipping classification" + ) + return [] + + snippet = markdown[:8000] + + prompt = f"""You are a financial document analyst. Given the following extracted text from a financial report, identify each financial statement present. + +For each statement found, return: +- statement_type: one of "balance_sheet", "income_statement", "cash_flow" +- title_raw: the exact title as it appears in the document +- currency: ISO 4217 currency code (e.g. "USD", "EUR", "GBP") +- unit: the unit of values (e.g. "millions", "thousands", "units") +- accounting_standard: e.g. "IFRS", "US GAAP", "GAAP", or null +- is_consolidated: true if consolidated/group statement, false if standalone +- report_language: ISO 639-1 language code (e.g. "en", "fr", "zh") +- company_name: name of the reporting entity + +If you cannot determine a field, use null. + +Document text: +{snippet} + +Respond with a JSON object: {{"statements": [...]}}""" + + try: + client = _get_client() + response = client.chat.completions.create( + model=_DEPLOYMENT(), + response_format={"type": "json_object"}, + messages=[ + { + "role": "system", + "content": "You are a financial document analyst. 
Return only valid JSON.", + }, + {"role": "user", "content": prompt}, + ], + temperature=0.0, + max_tokens=2000, + ) + + content = response.choices[0].message.content + if not content: + logger.warning("[di_adapter] LLM returned empty content") + return [] + raw = content.strip() + parsed = json.loads(raw) + + if isinstance(parsed, dict): + return parsed.get("statements", []) + elif isinstance(parsed, list): + return parsed + return [] + + except Exception as e: + logger.warning(f"[di_adapter] LLM classification failed: {e}") + return [] diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/extractor/di_client.py b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/di_client.py new file mode 100644 index 000000000..20c48a921 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/di_client.py @@ -0,0 +1,121 @@ +""" +extractor/di_client.py +----------------------- +Azure Document Intelligence client using the prebuilt-layout model. + +Returns markdown with embedded HTML tables — the same format that +Content Understanding produces — so all downstream pipeline stages +(locator, transformer, validator, enricher) work unchanged. + +Configuration (via .env): + AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT — Resource endpoint, e.g. + https://your-docai-resource.cognitiveservices.azure.com/ + +Authentication: Managed Identity (DefaultAzureCredential). + Requires Cognitive Services User role on the resource. + API keys are disabled on this subscription. 
+""" + +import logging +import os + +from dotenv import load_dotenv + +load_dotenv(override=False) + +logger = logging.getLogger(__name__) + + +def _get_client(): + """Create a DocumentIntelligenceClient with Managed Identity auth.""" + from azure.ai.documentintelligence import DocumentIntelligenceClient + from azure.identity import DefaultAzureCredential + + endpoint = os.environ["AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT"].rstrip("/") + credential = DefaultAzureCredential() + return DocumentIntelligenceClient(endpoint=endpoint, credential=credential) + + +def analyze_document(file_path: str) -> dict: + """ + Submit a PDF to Document Intelligence prebuilt-layout and return the result. + + Uses output_content_format="markdown" so the response includes markdown + text with embedded HTML blocks — matching the CU output format. + + Args: + file_path: Path to the PDF file on disk. + + Returns: + The AnalyzeResult as a dict with keys: + - content: str (markdown with HTML tables) + - pages: list[dict] (page metadata) + - tables: list[dict] (structured table data, for reference) + """ + from azure.ai.documentintelligence.models import AnalyzeDocumentRequest + + client = _get_client() + + with open(file_path, "rb") as f: + pdf_bytes = f.read() + + logger.info( + "DI: Submitting %d bytes to prebuilt-layout (markdown mode)", len(pdf_bytes) + ) + + poller = client.begin_analyze_document( + model_id="prebuilt-layout", + body=AnalyzeDocumentRequest(bytes_source=pdf_bytes), + output_content_format="markdown", + ) + + result = poller.result() + + # Convert SDK object to dict for consistent handling + result_dict = { + "content": result.content, # Markdown with HTML tables + "pages": [], + "tables": [], + } + + if result.pages: + for page in result.pages: + result_dict["pages"].append({ + "pageNumber": page.page_number, + "spans": [ + {"offset": s.offset, "length": s.length} + for s in (page.spans or []) + ], + }) + + if result.tables: + for table in result.tables: + 
result_dict["tables"].append({ + "rowCount": table.row_count, + "columnCount": table.column_count, + "cells": [ + { + "rowIndex": c.row_index, + "columnIndex": c.column_index, + "content": c.content, + "kind": c.kind or "content", + "columnSpan": c.column_span or 1, + "rowSpan": c.row_span or 1, + } + for c in (table.cells or []) + ], + "boundingRegions": [ + {"pageNumber": r.page_number} + for r in (table.bounding_regions or []) + ], + }) + + page_count = len(result_dict["pages"]) + table_count = len(result_dict["tables"]) + content_len = len(result_dict["content"]) + logger.info( + "DI: Got %d pages, %d tables, %d chars markdown", + page_count, table_count, content_len, + ) + + return result_dict diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/extractor/enrichment.py b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/enrichment.py new file mode 100644 index 000000000..145d31a9a --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/enrichment.py @@ -0,0 +1,434 @@ +""" +extractor/enrichment.py +----------------------- +Joins the CU extractor's enrichment data (canonical keys, English labels, +sections) with the Python parser's precise row extraction. + +Three enrichment sources, tried in order: + 1. CU extractor lookup (best quality — canonical keys + sections from Azure) + 2. LLM batch translation (for non-English labels not covered by CU extractor) + 3. Fallback heuristics (snake_case from label text, character-based language) +""" + +import json +import os +import re +import unicodedata +from typing import Optional + + +# --------------------------------------------------------------------------- +# Language detection (fix #4) +# --------------------------------------------------------------------------- + +def detect_label_language(text: str) -> str: + """ + Detect ISO 639-1 language code from the label's characters. + + Uses Unicode script detection — no external libraries needed. 
+ """ + for char in text: + if char.isalpha(): + name = unicodedata.name(char, "").upper() + if "CJK" in name: + return "zh" + if "HIRAGANA" in name or "KATAKANA" in name: + return "ja" + if "HANGUL" in name: + return "ko" + if "ARABIC" in name: + return "ar" + if "DEVANAGARI" in name: + return "hi" + if "THAI" in name: + return "th" + # Latin script — default to English + if "LATIN" in name: + return "en" + return "en" + + +# --------------------------------------------------------------------------- +# Normalization helpers +# --------------------------------------------------------------------------- + +def _normalize_for_matching(text: str) -> str: + """Normalize a label for fuzzy matching.""" + s = text.lower().strip() + s = re.sub(r"[,.:;'\"\-/()():、]", " ", s) + s = re.sub(r"\s+", " ", s).strip() + return s + + +def _fallback_canonical_key(label: str) -> str: + """Generate a snake_case key from a label when no enrichment is available.""" + s = label.lower().strip() + s = s.replace("&", "and") + s = re.sub(r"[,.:;'\"\-/()():、]", " ", s) + s = re.sub(r"\s+", "_", s.strip()).strip("_") + s = re.sub(r"[^a-z0-9_]", "", s) + return s or "unknown" + + +# --------------------------------------------------------------------------- +# Section inference (fix #2) +# --------------------------------------------------------------------------- + +_SECTION_KEYWORDS: dict[str, list[str]] = { + # English — IFRS + US GAAP + UK GAAP / Companies Act + "current_assets": [ + "current assets", "cash and cash", "receivable", "prepaid", + "marketable securities", "inventori", + # UK GAAP + "debtors", "trade debtors", "amounts owed by group", + "prepayments and accrued income", "stock", + "cash at bank", "restricted cash", + ], + "non_current_assets": [ + "property", "equipment", "goodwill", "intangible", + "right-of-use", "long-term invest", "non-current asset", + # UK GAAP + "tangible fixed asset", "fixed asset", "investment", + "amounts falling due after", + ], + 
"current_liabilities": [ + "current liabilit", "accounts payable", "accrued", "short-term", + # UK GAAP + "creditors", "trade creditors", "amounts owed to group", + "falling due within one year", "other creditors", + "accruals and deferred income", "taxation and social security", + "corporation tax", + ], + "non_current_liabilities": [ + "long-term debt", "long-term income", "non-current liabilit", + "operating lease liabilit", + # UK GAAP + "falling due after more than one year", "provisions for liabilities", + "deferred tax", "pension", + ], + "equity": [ + "equity", "stockholder", "shareholder", "retained earning", + "common stock", "paid-in capital", + # UK GAAP + "called up share capital", "share premium", "profit and loss account", + "capital and reserves", "capital redemption reserve", + "share-based payment reserve", + ], + "revenue": ["revenue", "sales", "turnover", "income from operation"], + "operating_expenses": [ + "cost of revenue", "research and develop", "marketing", + "general and admin", "selling", "operating expense", + # UK GAAP + "cost of sales", "administrative expense", "distribution cost", + "wages and salaries", "staff cost", "social security", + "depreciation", "amortisation", "amortization", + "directors' emoluments", "directors emoluments", + "audit fee", "impairment", + ], + "tax": [ + "income tax", "provision for", "tax rate", + # UK GAAP + "corporation tax", "deferred tax", "current tax", + "tax charge", "tax on profit", "tax on ordinary", + ], + "eps": ["per share", "earnings per", "eps"], + "shares": ["shares used", "weighted-average", "shares outstanding"], + "operating_activities": [ + "operating activit", "net income", "depreciation", + "amortization", "share-based comp", "deferred income", + "working capital", "changes in assets", + ], + "investing_activities": [ + "investing activit", "purchases of property", + "purchases of investment", "purchases of marketable", + "acquisitions", "proceeds from sale", + ], + 
"financing_activities": [ + "financing activit", "repurchase", "dividend", "borrowing", + "issuance of", "finance lease", + ], + "cash_reconciliation": [ + "cash and cash equivalents at", "net increase", "net decrease", + "free cash flow", + ], + "supplemental_disclosures": ["supplemental", "cash paid for"], + # Chinese + # NOTE: keys that appear again extend the lists above via _merge below +} + +# Chinese keywords (merged separately to avoid overwriting English lists) +_SECTION_KEYWORDS_ZH: dict[str, list[str]] = { + "current_assets": ["流动资产", "货币资金", "交易性金融", "应收", "预付", "存货"], + "non_current_assets": ["非流动资产", "固定资产", "无形资产", "商誉", "长期股权", "在建工程", "使用权资产", "投资性房地产"], + "current_liabilities": ["流动负债", "应付", "短期借款", "预收"], + "non_current_liabilities": ["非流动负债", "长期借款", "长期应付"], + "equity": ["所有者权益", "股本", "资本公积", "盈余公积", "未分配利润", "其他综合收益"], + "revenue": ["营业收入", "营业总收入"], + "operating_expenses": ["营业成本", "营业总成本", "销售费用", "管理费用", "研发费用", "财务费用", "税金及附加"], + "tax": ["所得税", "利润总额"], + "operating_activities": ["经营活动", "销售商品", "购买商品", "支付给职工", "支付的各项税费"], + "investing_activities": ["投资活动", "购建固定资产", "取得投资收益", "处置固定资产"], + "financing_activities": ["筹资活动", "取得借款", "偿还债务", "分配股利"], + "cash_reconciliation": ["现金及现金等价物", "期末现金", "期初现金"], +} + +# Merge Chinese keywords into the main dict +for _sec, _kws in _SECTION_KEYWORDS_ZH.items(): + _SECTION_KEYWORDS.setdefault(_sec, []).extend(_kws) + + +def infer_section(label: str, statement_type: str) -> str: + """ + Infer section from label text using keyword matching. + + Works for both English and Chinese labels. Uses longest-match scoring + to prefer more specific sections (e.g. "current_liabilities" over + "current_assets" for "creditors: amounts falling due within one year"). 
+ """ + label_lower = label.lower() + if not label_lower.strip(): + return "other" + + # Score each section by sum of matched keyword lengths (longer = more specific) + best_section = "other" + best_score = 0 + for section, keywords in _SECTION_KEYWORDS.items(): + score = sum(len(kw) for kw in keywords if kw in label_lower) + if score > best_score: + best_score = score + best_section = section + + # If no match, use statement_type as a broad default — but only for + # labels that are clearly financial. Empty or unknown labels get "other". + if best_section == "other" and best_score == 0: + defaults = { + "balance_sheet": "assets", + "income_statement": "other", + "cash_flow": "operating_activities", + } + best_section = defaults.get(statement_type, "other") + + return best_section + + +# --------------------------------------------------------------------------- +# CU extractor lookup +# --------------------------------------------------------------------------- + +def build_enrichment_lookup(cu_extractor_result: dict) -> dict[str, dict]: + """Build a label -> enrichment lookup from the CU extractor response.""" + lookup: dict[str, dict] = {} + + contents = cu_extractor_result.get("result", {}).get("contents", []) + if not contents: + return lookup + + fields = contents[0].get("fields", {}) + rows = fields.get("rows", {}).get("valueArray", []) + + for r in rows: + props = r.get("valueObject", {}) + label_raw = props.get("label_raw", {}).get("valueString", "") + if not label_raw: + continue + + key = _normalize_for_matching(label_raw) + if key in lookup: + continue + + lookup[key] = { + "canonical_key": props.get("canonical_key", {}).get("valueString", ""), + "label_normalized": props.get("label_normalized", {}).get("valueString", ""), + "label_language": props.get("label_language", {}).get("valueString", ""), + "section": props.get("section", {}).get("valueString", "other"), + "row_type_hint": props.get("row_type", {}).get("valueString", ""), + } + + return lookup 
+ + +# --------------------------------------------------------------------------- +# LLM batch translation (fix #1) +# --------------------------------------------------------------------------- + +def translate_labels_batch( + labels: list[str], + statement_type: str, +) -> dict[str, dict]: + """ + Send unenriched labels to the LLM for English translation, + canonical key generation, and section classification. + + One LLM call per batch. Returns a dict keyed by normalized label. + """ + if not labels: + return {} + + try: + from .llm_reconciler import _get_client, _DEPLOYMENT + except Exception: + return {} # LLM not available + + # Deduplicate + unique_labels = list(dict.fromkeys(labels)) + + prompt = ( + f"You are translating financial statement row labels to English.\n" + f"Statement type: {statement_type.replace('_', ' ')}\n\n" + f"For each label below, return:\n" + f"- label_normalized: the standard English IFRS/GAAP equivalent\n" + f"- canonical_key: English snake_case identifier (e.g. cash_and_cash_equivalents)\n" + f"- section: one of: current_assets, non_current_assets, assets, " + f"current_liabilities, non_current_liabilities, liabilities, equity, " + f"revenue, operating_expenses, non_operating, tax, eps, shares, " + f"operating_activities, investing_activities, financing_activities, " + f"cash_reconciliation, supplemental_disclosures, other\n\n" + f"Labels to translate:\n" + f"{json.dumps(unique_labels, ensure_ascii=False, indent=1)}\n\n" + f"Return ONLY a JSON object:\n" + f' {{"translations": [{{"label": "original", "label_normalized": "English", ' + f'"canonical_key": "snake_case", "section": "section_name"}}]}}' + ) + + try: + client = _get_client() + response = client.chat.completions.create( + model=_DEPLOYMENT(), + response_format={"type": "json_object"}, + messages=[ + {"role": "system", "content": "You are a financial data expert. 
Return only valid JSON."}, + {"role": "user", "content": prompt}, + ], + temperature=0, + ) + result = json.loads(response.choices[0].message.content) + translations = result.get("translations", []) + except Exception as e: + print(f" [LLM] translation failed: {e}") + return {} + + lookup: dict[str, dict] = {} + for t in translations: + original = t.get("label", "") + key = _normalize_for_matching(original) + if not key: + continue + lookup[key] = { + "canonical_key": t.get("canonical_key", ""), + "label_normalized": t.get("label_normalized", ""), + "label_language": detect_label_language(original), + "section": t.get("section", "other"), + } + + return lookup + + +# --------------------------------------------------------------------------- +# Section-to-group mapping +# --------------------------------------------------------------------------- + +_SECTION_TO_GROUP = { + "current_assets": "assets", + "non_current_assets": "assets", + "assets": "assets", + "current_liabilities": "liabilities", + "non_current_liabilities": "liabilities", + "liabilities": "liabilities", + "equity": "equity", + "revenue": "revenue", + "operating_expenses": "expenses", + "non_operating": "profitability", + "tax": "tax", + "eps": "eps", + "shares": "shares", + "operating_activities": "operating_cash_flow", + "investing_activities": "investing_cash_flow", + "financing_activities": "financing_cash_flow", + "cash_reconciliation": "cash_reconciliation", + "supplemental_disclosures": "supplemental", +} + + +# --------------------------------------------------------------------------- +# Main enrichment function +# --------------------------------------------------------------------------- + +def enrich_row( + label_raw: str, + row_type: str, + lookup: dict[str, dict], + statement_type: str = "", +) -> dict: + """ + Enrich a single row with the best available data. + + Tries: CU extractor lookup -> LLM translation lookup -> fallback. 
+ """ + key = _normalize_for_matching(label_raw) + match = lookup.get(key) + + if match and match.get("canonical_key"): + canonical_key = match["canonical_key"] + label_normalized = match.get("label_normalized") or None + label_language = match.get("label_language") or detect_label_language(label_raw) + section = match.get("section") or infer_section(label_raw, statement_type) + else: + # Fallback + canonical_key = _fallback_canonical_key(label_raw) + label_normalized = None + label_language = detect_label_language(label_raw) + section = infer_section(label_raw, statement_type) + + canonical_group = _SECTION_TO_GROUP.get(section, "other") + + return { + "canonical_key": canonical_key, + "label_normalized": label_normalized, + "label_language": label_language, + "section": section, + "canonical_group": canonical_group, + } + + +def enrich_all_rows( + labels: list[str], + row_types: list[str], + cu_lookup: dict[str, dict], + statement_type: str, +) -> list[dict]: + """ + Enrich all rows, using LLM translation for any labels not in the CU lookup. + + This is the main entry point. It: + 1. Checks each label against the CU lookup + 2. Collects unmatched non-English labels + 3. Sends them to the LLM for batch translation + 4. Merges LLM results into the lookup + 5. 
Returns enrichment for every row + """ + # First pass: identify unmatched labels + unmatched: list[str] = [] + for label in labels: + key = _normalize_for_matching(label) + match = cu_lookup.get(key) + if not match or not match.get("canonical_key"): + lang = detect_label_language(label) + if lang != "en" and label.strip(): + unmatched.append(label) + + # LLM batch translation for non-English unmatched labels + if unmatched: + print(f" [LLM] Translating {len(unmatched)} non-English labels...") + llm_translations = translate_labels_batch(unmatched, statement_type) + # Merge into the lookup + cu_lookup.update(llm_translations) + matched = sum(1 for l in unmatched if _normalize_for_matching(l) in llm_translations) + print(f" [LLM] Translated {matched}/{len(unmatched)} labels") + + # Second pass: enrich all rows + results = [] + for label, rtype in zip(labels, row_types): + enrichment = enrich_row(label, rtype, cu_lookup, statement_type) + results.append(enrichment) + + return results diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/extractor/excel_builder.py b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/excel_builder.py new file mode 100644 index 000000000..79fe27ee5 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/excel_builder.py @@ -0,0 +1,398 @@ +""" +Builds a formatted Excel workbook from approved Dataverse extraction data. +Uses openpyxl. Returns workbook bytes. 
+""" +import io +import logging +import re +from collections import defaultdict + +from openpyxl import Workbook +from openpyxl.styles import Font, Alignment, Border, Side, PatternFill, numbers +from openpyxl.utils import get_column_letter + +logger = logging.getLogger(__name__) + +# --- Style constants --- +HEADER_FONT = Font(name="Calibri", size=11, bold=True, color="FFFFFF") +HEADER_FILL = PatternFill(start_color="1a472a", end_color="1a472a", fill_type="solid") +SECTION_FONT = Font(name="Calibri", size=10, bold=True, color="1a472a") +SECTION_FILL = PatternFill(start_color="f1f0f7", end_color="f1f0f7", fill_type="solid") +TOTAL_FONT = Font(name="Calibri", size=10, bold=True) +TOTAL_BORDER = Border(top=Side(style="thin"), bottom=Side(style="double")) +NORMAL_FONT = Font(name="Calibri", size=10) +CORRECTED_FONT = Font(name="Calibri", size=10, color="6B4EE6") +ALT_ROW_FILL = PatternFill(start_color="fafaff", end_color="fafaff", fill_type="solid") +NUMBER_FORMAT = '#,##0.00' + +STATEMENT_TYPE_MAP = { + 833060000: "Income Statement", + 833060001: "Balance Sheet", + 833060002: "Cash Flow", +} + +ROW_TYPE_MAP = { + 833060000: "SectionHeader", + 833060001: "LineItem", + 833060002: "Subtotal", + 833060003: "Total", +} + + +def _get_display_value(item: dict) -> str | None: + """Get the value to display: analyst correction takes precedence.""" + corrected = item.get("cree1_analystcorrectedvalue") + if corrected: + return corrected + return item.get("cree1_valueraw") + + +def _try_numeric(val: str | None) -> float | str | None: + """Try to parse a display value as a number for Excel formatting.""" + if val is None: + return None + cleaned = val.replace(",", "").replace("$", "").replace("(", "-").replace(")", "").strip() + if not cleaned or cleaned == "\u2014": + return None + try: + return float(cleaned) + except ValueError: + return val + + +def build_excel(job: dict, statements: list[dict], line_items: list[dict]) -> bytes: + """Build an Excel workbook from Dataverse data 
and return as bytes.""" + wb = Workbook() + + company = job.get("cree1_companyname", "Financial Statements") + + # Group line items by statement record ID + items_by_stmt: dict[str, list[dict]] = defaultdict(list) + for item in line_items: + stmt_id = item.get("_cree1_extractedstatement_value", "") + if stmt_id: + items_by_stmt[stmt_id].append(item) + + sheet_created = False + + for stmt in statements: + stmt_id = stmt.get("cree1_extractedstatement1id", "") + stmt_type_code = stmt.get("cree1_statementtype", 833060000) + stmt_name = STATEMENT_TYPE_MAP.get(stmt_type_code, stmt.get("cree1_statementname", "Sheet")) + stmt_items = items_by_stmt.get(stmt_id, []) + + if not stmt_items: + continue + + if not sheet_created: + ws = wb.active + ws.title = stmt_name + sheet_created = True + else: + ws = wb.create_sheet(title=stmt_name) + + _build_statement_sheet(ws, company, stmt_name, stmt_items, job) + + # Add summary sheet at the beginning + if sheet_created: + _build_summary_sheet(wb, job, statements, line_items) + + if not sheet_created: + ws = wb.active + ws.title = "No Data" + ws["A1"] = "No extraction data available." 
+ + buf = io.BytesIO() + wb.save(buf) + return buf.getvalue() + + +def _build_statement_sheet(ws, company: str, stmt_name: str, items: list[dict], job: dict | None = None): + """Build a single statement worksheet.""" + # Determine unique periods (columns) and period types + period_map: dict[int, str] = {} + period_type_map: dict[int, str] = {} + for item in items: + col_idx = item.get("cree1_columnindex", 0) + if col_idx not in period_map: + period_map[col_idx] = item.get("cree1_period", "") + period_type_map[col_idx] = item.get("cree1_periodtype", "") + sorted_periods = sorted(period_map.items(), key=lambda x: x[0]) + + # No delta columns in Excel (YoY/QoQ shown in grid only) + delta_pairs: list[tuple[int, int, str]] = [] + + num_period_cols = len(sorted_periods) + num_delta_cols = 0 + + # Title row + total_cols = 1 + num_period_cols + num_delta_cols + 1 # label + periods + deltas + audit + ws.merge_cells(start_row=1, start_column=1, end_row=1, end_column=total_cols) + title_cell = ws.cell(row=1, column=1, value=f"Consolidated {stmt_name}") + title_cell.font = Font(name="Calibri", size=14, bold=True, color="1a472a") + title_cell.alignment = Alignment(horizontal="left") + ws.row_dimensions[1].height = 28 + + # Currency/unit subtitle row + currency = "" + if job: + cur = job.get("cree1_currencyname") or "" + unit = job.get("cree1_currencyunitname") or "" + if cur or unit: + currency = f"{cur} in {unit.title()}" if cur and unit else cur or unit + if currency: + subtitle_cell = ws.cell(row=2, column=1, value=currency) + subtitle_cell.font = Font(name="Calibri", size=10, italic=True, color="64748b") + + # Header row + header_row = 4 + ws.cell(row=header_row, column=1, value="Line Item").font = HEADER_FONT + ws.cell(row=header_row, column=1).fill = HEADER_FILL + ws.cell(row=header_row, column=1).alignment = Alignment(horizontal="left") + + # Build column layout: periods interleaved with delta columns + col_layout: list[dict] = [] # list of {"type": "period"|"delta", 
...}
+    delta_idx = 0
+    for p_idx, (col_idx, period_label) in enumerate(sorted_periods):
+        col_layout.append({"type": "period", "col_idx": col_idx, "label": period_label})
+        # Check if this period starts a delta pair
+        if delta_idx < len(delta_pairs) and delta_pairs[delta_idx][0] == col_idx:
+            col_layout.append({"type": "delta", "current": col_idx, "prior": delta_pairs[delta_idx][1], "label": delta_pairs[delta_idx][2]})
+            delta_idx += 1
+
+    for i, col_def in enumerate(col_layout):
+        cell = ws.cell(row=header_row, column=2 + i, value=col_def["label"])
+        cell.font = HEADER_FONT
+        cell.fill = HEADER_FILL
+        cell.alignment = Alignment(horizontal="right")
+
+    # Audit column header
+    audit_col = 2 + len(col_layout)
+    audit_cell = ws.cell(row=header_row, column=audit_col, value="Corrections")
+    audit_cell.font = HEADER_FONT
+    audit_cell.fill = HEADER_FILL
+    audit_cell.alignment = Alignment(horizontal="left")
+
+    # Group items by row index
+    rows_by_idx: dict[int, dict[int, dict]] = defaultdict(dict)
+    row_meta: dict[int, dict] = {}
+    for item in items:
+        ridx = item.get("cree1_rowindex", 0)
+        cidx = item.get("cree1_columnindex", 0)
+        rows_by_idx[ridx][cidx] = item
+        if ridx not in row_meta:
+            row_meta[ridx] = item
+
+    sorted_row_indices = sorted(rows_by_idx.keys())
+
+    current_row = header_row + 1
+    for data_row_num, ridx in enumerate(sorted_row_indices):
+        meta = row_meta[ridx]
+        row_type = ROW_TYPE_MAP.get(meta.get("cree1_rowtype", 833060001), "LineItem")
+        label = meta.get("cree1_lineitemname", "") or meta.get("cree1_labelraw", "")
+        # Strip "(Loss shown as ...)" qualifiers; match both ASCII and fullwidth parentheses
+        label = re.sub(r'\s*[(（](?:Loss|Losses?)\s+(?:shown|indicated)\s+as\s+["\'-]+[)）]', "", label, flags=re.IGNORECASE)
+        indent = meta.get("cree1_indentlevel", 0)
+
+        # Label cell
+        label_cell = ws.cell(row=current_row, column=1, value=label)
+        label_cell.alignment = Alignment(indent=indent)
+
+        total_data_cols = len(col_layout)
+        if row_type == "SectionHeader":
+            label_cell.font = SECTION_FONT
+            for c in range(1, 2 + total_data_cols + 1):
ws.cell(row=current_row, column=c).fill = SECTION_FILL + elif row_type in ("Total", "Subtotal"): + label_cell.font = TOTAL_FONT + for c in range(1, 2 + total_data_cols): + ws.cell(row=current_row, column=c).border = TOTAL_BORDER + else: + label_cell.font = NORMAL_FONT + if data_row_num % 2 == 1: + for c in range(1, 2 + total_data_cols + 1): + ws.cell(row=current_row, column=c).fill = ALT_ROW_FILL + + # Value cells + delta cells + audit trail + corrections = [] + for i, col_def in enumerate(col_layout): + value_cell = ws.cell(row=current_row, column=2 + i) + + if row_type == "SectionHeader": + continue + + if col_def["type"] == "period": + col_idx = col_def["col_idx"] + item = rows_by_idx[ridx].get(col_idx) + if item: + display = _get_display_value(item) + numeric = _try_numeric(display) + + if isinstance(numeric, float): + value_cell.value = numeric + value_cell.number_format = NUMBER_FORMAT + elif numeric is not None: + value_cell.value = numeric + else: + value_cell.value = "\u2014" + + value_cell.alignment = Alignment(horizontal="right") + + corrected = item.get("cree1_analystcorrectedvalue") + original = item.get("cree1_valueraw") + is_corrected = corrected and corrected != original + if is_corrected: + value_cell.font = CORRECTED_FONT + period_label = period_map.get(col_idx, "") + corrections.append(f"{period_label}: {original} -> {corrected}") + elif row_type in ("Total", "Subtotal"): + value_cell.font = TOTAL_FONT + else: + value_cell.font = NORMAL_FONT + + elif col_def["type"] == "delta": + curr_item = rows_by_idx[ridx].get(col_def["current"]) + prior_item = rows_by_idx[ridx].get(col_def["prior"]) + curr_val = curr_item.get("cree1_valuenormalized") if curr_item else None + prior_val = prior_item.get("cree1_valuenormalized") if prior_item else None + + if curr_val is not None and prior_val is not None: + try: + delta = float(curr_val) - float(prior_val) + value_cell.value = delta + value_cell.number_format = NUMBER_FORMAT + value_cell.alignment = 
Alignment(horizontal="right") + if delta < 0: + value_cell.font = Font(name="Calibri", size=10, color="dc2626") + else: + value_cell.font = Font(name="Calibri", size=10, color="16a34a") + except (TypeError, ValueError): + pass + + # Write audit column + if corrections: + audit_cell = ws.cell(row=current_row, column=audit_col, value="; ".join(corrections)) + audit_cell.font = Font(name="Calibri", size=9, color="6B4EE6", italic=True) + + current_row += 1 + + # Column widths + ws.column_dimensions["A"].width = 40 + for i in range(len(col_layout)): + ws.column_dimensions[get_column_letter(2 + i)].width = 18 + ws.column_dimensions[get_column_letter(audit_col)].width = 35 + + +def _build_summary_sheet(wb: Workbook, job: dict, statements: list[dict], line_items: list[dict]): + """Add a Summary sheet at the beginning of the workbook.""" + ws = wb.create_sheet("Summary", 0) + + company = job.get("cree1_companyname", "") + ws.cell(row=1, column=1, value=f"{company} — Extraction Summary") + ws.cell(row=1, column=1).font = Font(name="Calibri", size=14, bold=True, color="1a472a") + + # --- Extraction metadata --- + meta_rows = [ + ("Job ID", job.get("cree1_jobid", "")), + ("File", job.get("cree1_filename", "")), + ("Statements Found", job.get("cree1_statementsfound", "")), + ("Total Line Items", job.get("cree1_totallineitems", "")), + ("Avg Confidence", f"{(job.get('cree1_avgconfidence') or 0) * 100:.1f}%"), + ("Status", "Approved"), + ] + + for i, (label, value) in enumerate(meta_rows): + ws.cell(row=3 + i, column=1, value=label).font = Font(name="Calibri", size=10, bold=True) + ws.cell(row=3 + i, column=2, value=str(value)).font = Font(name="Calibri", size=10) + + # --- Extraction status table (which statements, pages, status) --- + table_start = 3 + len(meta_rows) + 1 + ws.cell(row=table_start, column=1, value="Extraction Status").font = Font(name="Calibri", size=12, bold=True, color="1a472a") + + header_row = table_start + 1 + for col_idx, header in enumerate(["Financial 
Statement", "Source Pages", "Extraction Status", "Coverage", "Notes"], 1): + cell = ws.cell(row=header_row, column=col_idx, value=header) + cell.font = HEADER_FONT + cell.fill = HEADER_FILL + + # Build statement status rows — keyed on integer type code so lookups below work correctly + stmt_by_type: dict[int, dict] = {} + for s in statements: + code = s.get("cree1_statementtype") + if code is not None: + stmt_by_type[code] = s + + # Count line items per statement + items_per_stmt: dict[str, int] = {} + for item in line_items: + sid = item.get("_cree1_extractedstatement_value", "") + items_per_stmt[sid] = items_per_stmt.get(sid, 0) + 1 + + stmt_display = [ + (833060001, "Balance Sheet"), + (833060000, "Income Statement"), + (833060002, "Cash Flow"), + ] + + current_row = header_row + 1 + for type_code, display_name in stmt_display: + stmt = stmt_by_type.get(type_code) + ws.cell(row=current_row, column=1, value=display_name).font = NORMAL_FONT + + if stmt: + # Pages + ps = stmt.get("cree1_pagerangestart") + pe = stmt.get("cree1_pagerangeend") + if ps and pe: + page_text = f"Pages {ps}-{pe}" if ps != pe else f"Page {ps}" + elif ps: + page_text = f"Page {ps}" + else: + page_text = "—" + ws.cell(row=current_row, column=2, value=page_text).font = NORMAL_FONT + + # Status + ws.cell(row=current_row, column=3, value="Extracted").font = Font(name="Calibri", size=10, color="16a34a") + + # Coverage + stmt_id = stmt.get("cree1_extractedstatement1id", "") + item_count = items_per_stmt.get(stmt_id, 0) + ws.cell(row=current_row, column=4, value=f"{item_count} line items").font = NORMAL_FONT + + # Notes + title = stmt.get("cree1_statementtitle", "") + if title: + ws.cell(row=current_row, column=5, value=title).font = Font(name="Calibri", size=10, italic=True, color="64748b") + else: + ws.cell(row=current_row, column=2, value="—").font = NORMAL_FONT + ws.cell(row=current_row, column=3, value="Not found").font = Font(name="Calibri", size=10, color="dc2626") + ws.cell(row=current_row, 
column=4, value="—").font = NORMAL_FONT + + current_row += 1 + + # --- Analyst corrections --- + current_row += 1 + corrected = [item for item in line_items if item.get("cree1_analystcorrectedvalue")] + ws.cell(row=current_row, column=1, value="Analyst Corrections").font = Font(name="Calibri", size=12, bold=True, color="6B4EE6") + current_row += 1 + + if corrected: + for col_idx, header in enumerate(["Line Item", "Period", "Original", "Corrected"], 1): + cell = ws.cell(row=current_row, column=col_idx, value=header) + cell.font = HEADER_FONT + cell.fill = HEADER_FILL + current_row += 1 + + for item in corrected: + ws.cell(row=current_row, column=1, value=item.get("cree1_lineitemname", "")).font = NORMAL_FONT + ws.cell(row=current_row, column=2, value=item.get("cree1_period", "")).font = NORMAL_FONT + ws.cell(row=current_row, column=3, value=item.get("cree1_valueraw", "")).font = NORMAL_FONT + ws.cell(row=current_row, column=4, value=item.get("cree1_analystcorrectedvalue", "")).font = CORRECTED_FONT + current_row += 1 + else: + ws.cell(row=current_row, column=1, value="No corrections made.").font = Font(name="Calibri", size=10, italic=True, color="64748b") + + ws.column_dimensions["A"].width = 30 + ws.column_dimensions["B"].width = 18 + ws.column_dimensions["C"].width = 18 + ws.column_dimensions["D"].width = 18 + ws.column_dimensions["E"].width = 50 diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/extractor/excel_endpoint.py b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/excel_endpoint.py new file mode 100644 index 000000000..27d7cf7b8 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/excel_endpoint.py @@ -0,0 +1,219 @@ +""" +extractor/excel_endpoint.py +---------------------------- +Business logic for the /generate-excel endpoint. 
+ +Responsibilities: + - Load extraction result from blob storage + - Normalize key conventions (camelCase <-> snake_case) + - Build professional Excel workbook via excel_formatter + - Upload Excel to blob storage with SAS download URL + - Build Adaptive Card with download link + +Separated from function_app.py for single-responsibility. + +Public API: + handle_generate_excel(job_id, fx_params) -> (status_code, response_dict) +""" + +import json +import logging +import os +import re +import tempfile +from datetime import datetime, timedelta, timezone +from urllib.parse import quote as url_quote + +from extractor.job_store import load_job, get_container, STATEMENT_TYPES + +logger = logging.getLogger(__name__) + + +def handle_generate_excel( + job_id: str, + fx_target_currency: str | None = None, + fx_spot_rate: float | None = None, + fx_avg_rate: float | None = None, + fx_rate_date: str = "", +) -> tuple[int, dict]: + """Generate Excel from a completed extraction job. + + Args: + job_id: The extraction job ID (blob key). + fx_target_currency: Optional target currency for FX conversion. + fx_spot_rate: Spot rate for balance sheet items. + fx_avg_rate: Average rate for IS/CF items. + fx_rate_date: Date of the FX rates. + + Returns: + (status_code, response_dict) tuple ready for HTTP response. 
+    """
+    from extractor.excel_formatter import build_professional_excel
+
+    # --- Load job from blob ---
+    blob_data = load_job(job_id)
+    if not blob_data or blob_data.get("status") != "completed":
+        return 404, {"error": f"Job {job_id} not found or not completed"}
+
+    result = blob_data.get("result", {})
+    company = result.get("companyName", "Financial_Statements")
+
+    # --- Parse JSON string fields ---
+    for key in ["summary", "balanceSheet", "incomeStatement", "cashFlow",
+                "confidence", "balance_sheet", "income_statement", "cash_flow"]:
+        val = result.get(key)
+        if isinstance(val, str):
+            try:
+                result[key] = json.loads(val)
+            except (json.JSONDecodeError, TypeError):
+                pass
+
+    # --- Normalize camelCase -> snake_case ---
+    if "balanceSheet" in result and "balance_sheet" not in result:
+        result["balance_sheet"] = result["balanceSheet"]
+    if "incomeStatement" in result and "income_statement" not in result:
+        result["income_statement"] = result["incomeStatement"]
+    if "cashFlow" in result and "cash_flow" not in result:
+        result["cash_flow"] = result["cashFlow"]
+
+    title = f"{company} — Financial Statement Extraction"
+
+    # --- Build Excel ---
+    with tempfile.NamedTemporaryFile(suffix=".xlsx", delete=False) as tmp:
+        tmp_path = tmp.name
+
+    build_professional_excel(
+        result, tmp_path, title=title,
+        fx_target_currency=fx_target_currency,
+        fx_spot_rate=fx_spot_rate,
+        fx_avg_rate=fx_avg_rate,
+        fx_rate_date=fx_rate_date,
+    )
+
+    with open(tmp_path, "rb") as f:
+        excel_bytes = f.read()
+    os.unlink(tmp_path)
+
+    # --- Sanitize filename (ASCII for Content-Disposition, UTF-8 fallback) ---
+    safe_company = re.sub(r'[^\x20-\x7E]', '', company).strip() or "Financial_Statements"
+    filename = f"{safe_company.replace(' ', '_')}_Review.xlsx"
+
+    logger.info(f"generate-excel: built workbook ({len(excel_bytes)} bytes)")
+
+    # --- Upload to blob storage ---
+    from azure.storage.blob import ContentSettings
+    container = get_container()
+    blob_name = f"excel/{job_id}/{filename}"
+    blob_client = container.get_blob_client(blob_name)
+
+    ascii_disp = f'attachment; filename="{filename}"'
+    utf8_filename = url_quote(f"{company.replace(' ', '_')}_Review.xlsx")
+    content_disp = f"{ascii_disp}; filename*=UTF-8''{utf8_filename}"
+
+    blob_client.upload_blob(
+        excel_bytes,
+        overwrite=True,
+        content_settings=ContentSettings(
+            content_type="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
+            content_disposition=content_disp,
+        ),
+    )
+
+    # --- Generate SAS download URL (24-hour expiry) ---
+    download_url = _generate_sas_url(container, blob_name, blob_client)
+
+    # --- Build Adaptive Card for CPS display ---
+    card = _build_excel_card(company, filename, download_url)
+
+    return 200, {
+        "fileUrl": download_url,
+        "fileName": filename,
+        "cardJson": json.dumps(card, ensure_ascii=False),
+    }
+
+
+def _generate_sas_url(container, blob_name: str, blob_client) -> str:
+    """Generate a SAS download URL for the uploaded Excel blob.
+
+    Tries connection-string account key first, falls back to
+    user delegation key for Managed Identity auth.
+ """ + from azure.storage.blob import generate_blob_sas, BlobSasPermissions + + expiry = datetime.now(timezone.utc) + timedelta(hours=24) + start = datetime.now(timezone.utc) - timedelta(minutes=5) + + # Try connection-string account key + conn_str = os.environ.get("AzureWebJobsStorage", "") + account_key = None + for part in conn_str.split(";"): + if part.startswith("AccountKey="): + account_key = part[len("AccountKey="):] + break + + if account_key: + sas = generate_blob_sas( + account_name=container.account_name, + container_name=container.container_name, + blob_name=blob_name, + account_key=account_key, + permission=BlobSasPermissions(read=True), + expiry=expiry, + ) + else: + # Managed Identity — user delegation key + from azure.identity import ManagedIdentityCredential + from azure.storage.blob import BlobServiceClient + + blob_uri = os.environ["AzureWebJobsStorage__blobServiceUri"] + client_id = os.environ.get("AzureWebJobsStorage__clientId") + credential = ManagedIdentityCredential(client_id=client_id) + service = BlobServiceClient(blob_uri, credential=credential) + udk = service.get_user_delegation_key(start, expiry) + sas = generate_blob_sas( + account_name=container.account_name, + container_name=container.container_name, + blob_name=blob_name, + user_delegation_key=udk, + permission=BlobSasPermissions(read=True), + expiry=expiry, + start=start, + ) + + return f"{blob_client.url}?{sas}" + + +def _build_excel_card(company: str, filename: str, download_url: str) -> dict: + """Build an Adaptive Card with Excel download link for CPS display.""" + return { + "type": "AdaptiveCard", + "$schema": "http://adaptivecards.io/schemas/adaptive-card.json", + "version": "1.5", + "body": [ + { + "type": "ColumnSet", + "columns": [ + { + "type": "Column", "width": "auto", + "items": [{"type": "Image", "url": "https://cdn-icons-png.flaticon.com/512/732/732220.png", "size": "Small"}], + }, + { + "type": "Column", "width": "stretch", + "items": [{"type": "TextBlock", 
"text": "Excel Report Ready", "weight": "Bolder", "size": "Large", "wrap": True}], + "verticalContentAlignment": "Center", + }, + ], + }, + { + "type": "FactSet", + "facts": [ + {"title": "Company", "value": company}, + {"title": "File", "value": filename}, + ], + }, + {"type": "TextBlock", "text": "Link expires in 24 hours.", "size": "Small", "isSubtle": True, "wrap": True, "spacing": "Medium"}, + ], + "actions": [ + {"type": "Action.OpenUrl", "title": "Download Excel", "url": download_url, "style": "positive"}, + ], + } diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/extractor/excel_formatter.py b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/excel_formatter.py new file mode 100644 index 000000000..478b4c6df --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/excel_formatter.py @@ -0,0 +1,724 @@ +""" +Professional Excel formatter for financial statement extraction output. + +Generates beautifully formatted Excel workbooks matching institutional +financial reporting standards: + - Dark navy header bars with white text + - Teal subtitle rows with period/currency info + - Bold italic section headers with bilingual labels + - Red negative values + - Color-coded sheet tabs + - Freeze panes on header rows + - Conditional number formatting +""" +import re +import openpyxl.workbook.properties +from openpyxl import Workbook +from openpyxl.styles import ( + Font, PatternFill, Alignment, Border, Side, numbers, +) +from openpyxl.utils import get_column_letter + +from .stages.enrich import translate_column_header + + +# --------------------------------------------------------------------------- +# Color palette +# --------------------------------------------------------------------------- + +NAVY = "1B2A4A" +TEAL = "2E6B77" +DARK_HEADER = "333333" +WHITE = "FFFFFF" +LIGHT_GRAY = "F5F5F5" +RED = "CC0000" +LIGHT_BLUE = "DCE6F1" + +# Sheet tab colors +TAB_COLORS = { + "balance_sheet": "339966", # Green + 
"income_statement": "CC3333", # Red + "cash_flow": "E67300", # Orange +} + +# --------------------------------------------------------------------------- +# Reusable styles +# --------------------------------------------------------------------------- + +FONT_TITLE = Font(name="Calibri", size=14, bold=True, color=WHITE) +FONT_SUBTITLE = Font(name="Calibri", size=11, italic=True, color=WHITE) +FONT_COL_HEADER = Font(name="Calibri", size=11, bold=True, color=WHITE) +FONT_SECTION = Font(name="Calibri", size=11, bold=True, italic=True) +FONT_NORMAL = Font(name="Calibri", size=11) +FONT_BOLD = Font(name="Calibri", size=11, bold=True) +FONT_RED = Font(name="Calibri", size=11, color=RED) +FONT_RED_BOLD = Font(name="Calibri", size=11, bold=True, color=RED) +FONT_SUMMARY_HEADER = Font(name="Calibri", size=11, bold=True, color=WHITE) +FONT_NOTES = Font(name="Calibri", size=10, italic=True) + +FILL_NAVY = PatternFill(start_color=NAVY, end_color=NAVY, fill_type="solid") +FILL_TEAL = PatternFill(start_color=TEAL, end_color=TEAL, fill_type="solid") +FILL_DARK = PatternFill(start_color=DARK_HEADER, end_color=DARK_HEADER, fill_type="solid") +FILL_LIGHT_BLUE = PatternFill(start_color=LIGHT_BLUE, end_color=LIGHT_BLUE, fill_type="solid") +FILL_LIGHT_GRAY = PatternFill(start_color=LIGHT_GRAY, end_color=LIGHT_GRAY, fill_type="solid") + +ALIGN_CENTER = Alignment(horizontal="center", vertical="center") +ALIGN_LEFT = Alignment(horizontal="left", vertical="center", wrap_text=True) +ALIGN_RIGHT = Alignment(horizontal="right", vertical="center") + +BORDER_BOTTOM = Border(bottom=Side(style="thin")) +BORDER_TOP_BOTTOM = Border(top=Side(style="thin"), bottom=Side(style="double")) + +NUMBER_FORMAT = '#,##0.00' + + +# --------------------------------------------------------------------------- +# Summary sheet +# --------------------------------------------------------------------------- + +def _build_summary_sheet(ws, result: dict, title: str, fx_config: dict | None = None): + """Build a 
professional summary sheet.""" + summary = result.get("summary", []) + + # Title bar (merged across 4 columns) + ws.merge_cells("A1:D1") + cell = ws["A1"] + cell.value = title + cell.font = FONT_TITLE + cell.fill = FILL_NAVY + cell.alignment = ALIGN_CENTER + for col in range(1, 5): + ws.cell(row=1, column=col).fill = FILL_NAVY + + # Subtitle + ws.merge_cells("A2:D2") + # Build subtitle from first statement's metadata + subtitle_parts = [] + for stype in ["balance_sheet", "income_statement", "cash_flow"]: + doc = result.get(stype) + if doc and isinstance(doc, dict): + sm = doc.get("statement_metadata", {}) + if sm.get("currency"): + subtitle_parts.append(f"Currency: {sm['currency']}") + break + cell = ws["A2"] + cell.value = " | ".join(subtitle_parts) if subtitle_parts else "" + cell.font = FONT_SUBTITLE + cell.fill = FILL_TEAL + cell.alignment = ALIGN_CENTER + for col in range(1, 5): + ws.cell(row=2, column=col).fill = FILL_TEAL + + # Column headers + headers = ["Statement", "Description", "Pages in Report", "Sheet Reference"] + for ci, h in enumerate(headers, start=1): + cell = ws.cell(row=3, column=ci, value=h) + cell.font = FONT_SUMMARY_HEADER + cell.fill = FILL_DARK + cell.alignment = ALIGN_CENTER + + # Statement rows + STYPE_NAMES = { + "balance_sheet": ("Consolidated Balance Sheet", "Assets, liabilities and equity"), + "income_statement": ("Consolidated Income Statement", "Revenue and profit"), + "cash_flow": ("Consolidated Cash Flow", "Cash flows"), + } + + row = 4 + for s in summary: + stype = s.get("statement_type", "") + names = STYPE_NAMES.get(stype, (stype, "")) + pr = s.get("page_range", {}) + pages = f"{pr.get('start', '?')}-{pr.get('end', '?')}" if pr.get("start") else "-" + quality = s.get("quality_score") + + ws.cell(row=row, column=1, value=names[0]).font = FONT_NORMAL + ws.cell(row=row, column=2, value=names[1]).font = FONT_NORMAL + c_pages = ws.cell(row=row, column=3, value=pages) + c_pages.font = FONT_NORMAL + c_pages.alignment = ALIGN_CENTER + 
+        sheet_ref = names[0].split()[-2] + " " + names[0].split()[-1] if len(names[0].split()) > 2 else names[0]
+        ws.cell(row=row, column=4, value=sheet_ref).font = FONT_NORMAL
+        ws.cell(row=row, column=4).alignment = ALIGN_CENTER
+        row += 1
+
+    # Notes section
+    row += 1
+    ws.cell(row=row, column=1, value="Notes:").font = Font(name="Calibri", size=10, bold=True)
+    ws.cell(row=row, column=1).fill = FILL_LIGHT_BLUE
+    for col in range(2, 5):
+        ws.cell(row=row, column=col).fill = FILL_LIGHT_BLUE
+    row += 1
+
+    # Get currency info from first available statement
+    currency = ""
+    currency_symbol = ""
+    unit = ""
+    for stype in ["balance_sheet", "income_statement", "cash_flow"]:
+        doc = result.get(stype)
+        if doc and isinstance(doc, dict):
+            sm = doc.get("statement_metadata", {})
+            currency = sm.get("currency", "")
+            currency_symbol = sm.get("currency_symbol", "")
+            unit = sm.get("unit", "")
+            break
+
+    # Currency and unit
+    currency_display = f"{currency} ({currency_symbol})" if currency_symbol else currency
+    if currency:
+        ws.cell(row=row, column=1, value=f"Reporting Currency: {currency_display}").font = FONT_NOTES
+        row += 1
+    if unit:
+        ws.cell(row=row, column=1, value=f"Unit: {unit.capitalize()}").font = FONT_NOTES
+        row += 1
+
+    # FX methodology statement (the no-conversion note is written only in the
+    # else branch below, so the two notes are mutually exclusive)
+    if fx_config:
+        sc = fx_config["source_currency"]
+        tc = fx_config["target_currency"]
+        spot = fx_config["spot_rate"]
+        avg = fx_config["avg_rate"]
+        rd = fx_config.get("rate_date", "")
+        src = fx_config.get("rate_source", "")
+        ws.cell(row=row, column=1,
+                value=f"FX Conversion: Balance Sheet items converted at closing spot rate "
+                      f"{sc}/{tc} {spot:.4f} as at {rd}. 
Income Statement and Cash Flow items " + f"converted at period average rate {sc}/{tc} {avg:.4f}. " + f"Source: {src}. Rates are editable in each sheet — modify the rate cell to recalculate." + ).font = FONT_NOTES + row += 1 + else: + ws.cell(row=row, column=1, + value=f"All figures are presented in {currency_display or 'the original reporting currency'} " + f"as reported in the source document. No foreign exchange conversion has been applied." + ).font = FONT_NOTES + row += 1 + + # Analyst corrections note + corrections_count = 0 + for stype in ["balance_sheet", "income_statement", "cash_flow"]: + doc = result.get(stype) + if doc and isinstance(doc, dict): + for r in doc.get("rows", []): + for v in r.get("values", []): + if v.get("corrected"): + corrections_count += 1 + if corrections_count > 0: + ws.cell(row=row, column=1, + value=f"Analyst Corrections: {corrections_count} value(s) were manually adjusted during HITL review." + ).font = Font(name="Calibri", size=10, italic=True, bold=True, color="CC3333") + row += 1 + + ws.cell(row=row, column=1, value="All figures as reported; no rounding applied.").font = FONT_NOTES + row += 1 + ws.cell(row=row, column=1, value="Prepared by Microsoft Copilot from source PDF.").font = FONT_NOTES + + # Column widths + ws.column_dimensions["A"].width = 35 + ws.column_dimensions["B"].width = 45 + ws.column_dimensions["C"].width = 18 + ws.column_dimensions["D"].width = 20 + + +# --------------------------------------------------------------------------- +# Statement sheet +# --------------------------------------------------------------------------- + +def _build_statement_sheet(ws, doc: dict, stype: str, fx_config: dict | None = None): + """Build a professionally formatted statement sheet.""" + sm = doc.get("statement_metadata", {}) + cols = doc.get("columns", []) + rows = doc.get("rows", []) + + # Extract currency info early (needed for FX column headers) + currency = sm.get("currency", "") + currency_symbol = 
sm.get("currency_symbol", "") + unit = sm.get("unit", "") + + # Determine value columns (skip label column) + value_col_headers = [] + for col in cols: + label = col.get("label", "") + translated = translate_column_header(label) + if translated.lower() in ("item", "") or label == "項目": + continue + value_col_headers.append(translated) + + # If FX conversion, double the value columns (original + converted) + if fx_config: + fx_col_headers = [] + for h in value_col_headers: + fx_col_headers.append(f"{h} ({currency})") + fx_col_headers.append(f"{h} ({fx_config['target_currency']})") + display_col_headers = fx_col_headers + else: + display_col_headers = value_col_headers + + total_cols = 2 + len(display_col_headers) # Chinese + English + value cols + last_col_letter = get_column_letter(total_cols) + + # Row 1: Title bar + ws.merge_cells(f"A1:{last_col_letter}1") + title_raw = sm.get("statement_title_raw", "") + title_en = sm.get("statement_title", "") + title_text = f"{title_en} ({title_raw})" if title_raw and title_raw != title_en else title_en + cell = ws["A1"] + cell.value = title_text + cell.font = FONT_TITLE + cell.fill = FILL_NAVY + cell.alignment = ALIGN_CENTER + for ci in range(1, total_cols + 1): + ws.cell(row=1, column=ci).fill = FILL_NAVY + + # Row 2: Subtitle + ws.merge_cells(f"A2:{last_col_letter}2") + currency_display = f"{currency} ({currency_symbol})" if currency_symbol else currency + + if fx_config: + tc = fx_config["target_currency"] + rate = fx_config["applied_rate"] + rate_type = fx_config["rate_type"] + subtitle = f"Original: {currency_display} | Converted to: {tc} | Rate: {rate:.4f} ({rate_type}) | Unit: {unit.capitalize()}" + else: + subtitle = f"Reporting Currency: {currency_display} | Unit: {unit.capitalize()}" if currency else "" + cell = ws["A2"] + cell.value = subtitle + cell.font = FONT_SUBTITLE + cell.fill = FILL_TEAL + cell.alignment = ALIGN_CENTER + for ci in range(1, total_cols + 1): + ws.cell(row=2, column=ci).fill = FILL_TEAL + + # 
Row 3: Column headers + col_headers = ["Item (Chinese)", "Item (English)"] + display_col_headers + for ci, h in enumerate(col_headers, start=1): + cell = ws.cell(row=3, column=ci, value=h) + cell.font = FONT_COL_HEADER + cell.fill = FILL_DARK + cell.alignment = ALIGN_CENTER + + # FX rate cell (editable by analyst) — placed to the right of data columns + rate_cell_col = total_cols + 2 # 2 columns gap after data + rate_cell_row = 2 + if fx_config: + rate_label_cell = ws.cell(row=1, column=rate_cell_col, value="FX Rate →") + rate_label_cell.font = Font(name="Calibri", size=10, bold=True, color="CC3333") + rate_cell = ws.cell(row=rate_cell_row, column=rate_cell_col, value=fx_config["applied_rate"]) + rate_cell.font = Font(name="Calibri", size=12, bold=True, color="CC3333") + rate_cell.number_format = "0.000000" + rate_cell_ref = f"${get_column_letter(rate_cell_col)}${rate_cell_row}" + + # Freeze panes below header + ws.freeze_panes = "A4" + + # Data rows + for ri, row in enumerate(rows, start=4): + label_raw = row.get("label_raw", "") + label_norm = row.get("label_normalized") or label_raw + row_type = row.get("row_type", "line_item") + indent = row.get("indent_level", 0) + + # Clean label noise + label_norm = re.sub( + r'\s*[(\uff08](?:Loss|Losses?)\s+(?:shown|indicated)\s+as.*?[)\uff09]', + '', label_norm, flags=re.IGNORECASE, + ) + + # Section header: bold italic, bilingual, merged look + if row_type == "section_header": + display = f"{label_norm.upper()} ({label_raw})" if label_raw and label_raw != label_norm else label_norm.upper() + cell_a = ws.cell(row=ri, column=1, value=display) + cell_a.font = FONT_SECTION + ws.cell(row=ri, column=2).value = "" + continue + + # Chinese label + cell_a = ws.cell(row=ri, column=1, value=label_raw) + cell_a.font = FONT_NORMAL + if indent > 0: + cell_a.alignment = Alignment(indent=indent * 2) + + # English label + is_total = row_type in ("subtotal", "total", "grand_total") + cell_b = ws.cell(row=ri, column=2, 
value=label_norm.upper() if is_total else label_norm) + cell_b.font = FONT_BOLD if is_total else FONT_NORMAL + if indent > 0 and not is_total: + cell_b.alignment = Alignment(indent=indent * 2) + + # Add border for totals + if is_total: + for ci in range(1, total_cols + 1): + if row_type == "grand_total": + ws.cell(row=ri, column=ci).border = BORDER_TOP_BOTTOM + else: + ws.cell(row=ri, column=ci).border = BORDER_BOTTOM + + # Values + values = row.get("values", []) + if fx_config: + # Dual columns: original (C, E, ...) + converted formula (D, F, ...) + col_offset = 3 + for vi in range(min(len(values), len(value_col_headers))): + val = values[vi] + normalized = val.get("normalized") + raw_val = val.get("raw") + + # Original value column + orig_col = col_offset + cell = ws.cell(row=ri, column=orig_col) + cell.alignment = ALIGN_RIGHT + if normalized is not None: + cell.value = normalized + cell.number_format = NUMBER_FORMAT + if normalized < 0: + cell.font = FONT_RED_BOLD if is_total else FONT_RED + else: + cell.font = FONT_BOLD if is_total else FONT_NORMAL + elif raw_val: + cell.value = raw_val + cell.font = FONT_BOLD if is_total else FONT_NORMAL + + # Converted value column (Excel formula referencing rate cell) + fx_col = col_offset + 1 + fx_cell = ws.cell(row=ri, column=fx_col) + fx_cell.alignment = ALIGN_RIGHT + fx_cell.number_format = NUMBER_FORMAT + if normalized is not None: + orig_ref = f"{get_column_letter(orig_col)}{ri}" + fx_cell.value = f"={orig_ref}*{rate_cell_ref}" + fx_cell.font = FONT_BOLD if is_total else FONT_NORMAL + # Apply light blue background to converted columns + fx_cell.fill = FILL_LIGHT_BLUE if not is_total else PatternFill() + + col_offset += 2 # Skip 2 (original + converted) + else: + # No FX: single value columns + for vi in range(min(len(values), len(value_col_headers))): + val = values[vi] + normalized = val.get("normalized") + raw_val = val.get("raw") + cell = ws.cell(row=ri, column=3 + vi) + cell.alignment = ALIGN_RIGHT + + if 
normalized is not None: + cell.value = normalized + cell.number_format = NUMBER_FORMAT + if normalized < 0: + cell.font = FONT_RED_BOLD if is_total else FONT_RED + else: + cell.font = FONT_BOLD if is_total else FONT_NORMAL + elif raw_val: + cell.value = raw_val + cell.font = FONT_BOLD if is_total else FONT_NORMAL + + # Column widths + ws.column_dimensions["A"].width = 30 + ws.column_dimensions["B"].width = 50 + for ci in range(len(display_col_headers)): + ws.column_dimensions[get_column_letter(3 + ci)].width = 22 + + # Tab color + if stype in TAB_COLORS: + ws.sheet_properties.tabColor = TAB_COLORS[stype] + + +# --------------------------------------------------------------------------- +# Margins & Ratios sheet +# --------------------------------------------------------------------------- + +# Canonical key aliases used to locate values across BS / IS / CF rows +_KEY_ALIASES = { + "revenue": ["total_operating_revenue", "revenue", "net_revenue"], + "cogs": ["cost_of_goods_sold", "operating_costs", "cost_of_revenue"], + "operating_profit": ["operating_profit", "operating_income"], + "net_profit": ["net_income", "net_profit", "profit_for_the_period"], + "total_assets": ["total_assets"], + "total_equity": ["total_equity", "total_owners_equity"], + "total_liabilities": ["total_liabilities"], + "current_assets": ["total_current_assets"], + "current_liabilities": ["total_current_liabilities"], + "inventories": ["inventories", "inventory"], +} + +# Ratio definitions organised by section +_RATIO_SECTIONS = [ + ("PROFITABILITY", [ + ("Gross Margin (%)", "pct", lambda v: _safe_div(v["revenue"] - v["cogs"], v["revenue"]) * 100, + ["revenue", "cogs"]), + ("EBIT Margin (%)", "pct", lambda v: _safe_div(v["operating_profit"], v["revenue"]) * 100, + ["operating_profit", "revenue"]), + ("Net Profit Margin (%)", "pct", lambda v: _safe_div(v["net_profit"], v["revenue"]) * 100, + ["net_profit", "revenue"]), + ("Return on Equity (ROE) (%)", "pct", lambda v: _safe_div(v["net_profit"], 
v["total_equity"]) * 100, + ["net_profit", "total_equity"]), + ("Return on Assets (ROA) (%)", "pct", lambda v: _safe_div(v["net_profit"], v["total_assets"]) * 100, + ["net_profit", "total_assets"]), + ]), + ("LIQUIDITY", [ + ("Current Ratio", "ratio", lambda v: _safe_div(v["current_assets"], v["current_liabilities"]), + ["current_assets", "current_liabilities"]), + ("Quick Ratio", "ratio", + lambda v: _safe_div(v["current_assets"] - v.get("inventories", 0), v["current_liabilities"]), + ["current_assets", "current_liabilities"]), + ]), + ("LEVERAGE", [ + ("Debt to Equity", "ratio", lambda v: _safe_div(v["total_liabilities"], v["total_equity"]), + ["total_liabilities", "total_equity"]), + ("Total Debt Ratio", "ratio", lambda v: _safe_div(v["total_liabilities"], v["total_assets"]), + ["total_liabilities", "total_assets"]), + ]), +] + + +def _safe_div(numerator, denominator): + """Return numerator / denominator, or None when division is impossible.""" + if numerator is None or denominator is None or denominator == 0: + return None + return numerator / denominator + + +def _collect_values(result: dict): + """Scan BS, IS, CF rows and return {period_index: {key: value}} maps. + + Each period_index corresponds to a column in the source data (0, 1, …). + Returns (period_values_dict, period_labels_list). 
+ """ + # Gather all rows from all statement types + all_rows = [] + for stype in ["balance_sheet", "income_statement", "cash_flow"]: + doc = result.get(stype) + if doc and isinstance(doc, dict): + all_rows.extend(doc.get("rows", [])) + + # Determine period labels from first available statement's columns + period_labels: list[str] = [] + for stype in ["income_statement", "balance_sheet", "cash_flow"]: + doc = result.get(stype) + if doc and isinstance(doc, dict): + cols = doc.get("columns", []) + for col in cols: + label = col.get("label", "") + translated = translate_column_header(label) + if translated.lower() in ("item", "") or label == "項目": + continue + period_labels.append(translated) + if period_labels: + break + + # Build per-period value maps {period_idx: {metric_key: float}} + period_values: dict[int, dict[str, float]] = {} + for row in all_rows: + canonical = row.get("canonical_key", "") + if not canonical: + continue + for alias_key, aliases in _KEY_ALIASES.items(): + if canonical in aliases: + values = row.get("values", []) + for vi, val in enumerate(values): + normalized = val.get("normalized") + if normalized is not None: + period_values.setdefault(vi, {})[alias_key] = normalized + break # matched alias group + + return period_values, period_labels + + +def _format_ratio(value, fmt: str) -> str: + """Format a computed ratio value for display.""" + if value is None: + return "N/A" + if fmt == "pct": + return f"{value:.1f}%" + return f"{value:.2f}x" + + +def _build_margins_sheet(ws, result: dict): + """Build a Margins & Ratios computed sheet from extracted data.""" + period_values, period_labels = _collect_values(result) + + # If we have no period labels, create generic ones + if not period_labels: + if period_values: + period_labels = [f"Period {i+1}" for i in sorted(period_values.keys())] + else: + period_labels = ["Period 1"] + + num_periods = len(period_labels) + total_cols = 1 + num_periods # Metric + period columns + last_col_letter = 
get_column_letter(total_cols) + + # Row 1: Title bar + ws.merge_cells(f"A1:{last_col_letter}1") + cell = ws["A1"] + cell.value = "Margins & Ratios" + cell.font = FONT_TITLE + cell.fill = FILL_NAVY + cell.alignment = ALIGN_CENTER + for ci in range(1, total_cols + 1): + ws.cell(row=1, column=ci).fill = FILL_NAVY + + # Row 2: Subtitle + ws.merge_cells(f"A2:{last_col_letter}2") + cell = ws["A2"] + cell.value = "Computed from extracted financial statements" + cell.font = FONT_SUBTITLE + cell.fill = FILL_TEAL + cell.alignment = ALIGN_CENTER + for ci in range(1, total_cols + 1): + ws.cell(row=2, column=ci).fill = FILL_TEAL + + # Row 3: Column headers + col_headers = ["Metric"] + period_labels + for ci, h in enumerate(col_headers, start=1): + cell = ws.cell(row=3, column=ci, value=h) + cell.font = FONT_COL_HEADER + cell.fill = FILL_DARK + cell.alignment = ALIGN_CENTER + + ws.freeze_panes = "A4" + + # Data rows + current_row = 4 + for section_name, ratios in _RATIO_SECTIONS: + # Section header + cell = ws.cell(row=current_row, column=1, value=section_name) + cell.font = FONT_SECTION + cell.fill = FILL_LIGHT_GRAY + for ci in range(2, total_cols + 1): + ws.cell(row=current_row, column=ci).fill = FILL_LIGHT_GRAY + current_row += 1 + + for label, fmt, compute_fn, required_keys in ratios: + ws.cell(row=current_row, column=1, value=label).font = FONT_NORMAL + + for pi in range(num_periods): + pv = period_values.get(pi, {}) + # Check all required keys are present + has_all = all(k in pv for k in required_keys) + if has_all: + try: + value = compute_fn(pv) + except Exception: + value = None + else: + value = None + + display = _format_ratio(value, fmt) + cell = ws.cell(row=current_row, column=2 + pi, value=display) + cell.alignment = ALIGN_RIGHT + if display == "N/A": + cell.font = Font(name="Calibri", size=11, italic=True, color="999999") + elif fmt == "pct" and value is not None and value < 0: + cell.font = FONT_RED + else: + cell.font = FONT_NORMAL + + current_row += 1 + + # 
Blank row between sections + current_row += 1 + + # Column widths + ws.column_dimensions["A"].width = 35 + for ci in range(num_periods): + ws.column_dimensions[get_column_letter(2 + ci)].width = 22 + + # Purple tab color + ws.sheet_properties.tabColor = "6B4EE6" + + +# --------------------------------------------------------------------------- +# Public API +# --------------------------------------------------------------------------- + +def build_professional_excel( + result: dict, + output_path: str, + title: str = "Financial Statement Extraction", + fx_target_currency: str | None = None, + fx_spot_rate: float | None = None, + fx_avg_rate: float | None = None, + fx_rate_date: str | None = None, + fx_rate_source: str = "exchangerate.host", +): + """Generate a professionally formatted Excel workbook from extraction result. + + Args: + result: Pipeline output dict with summary + statement docs. + output_path: Path to save the .xlsx file. + title: Title for the summary sheet header. + fx_target_currency: Target currency for FX conversion (e.g., "AUD"). None = no conversion. + fx_spot_rate: Closing spot rate for BS items. + fx_avg_rate: Period average rate for IS/CF items. + fx_rate_date: Date of the FX rate. + fx_rate_source: Source of the rate (e.g., "exchangerate.host", "manual"). 
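+
+    Example (illustrative call; the FX argument values are hypothetical):
+        build_professional_excel(
+            result,
+            "statements.xlsx",
+            fx_target_currency="AUD",
+            fx_spot_rate=0.2105,
+            fx_avg_rate=0.2098,
+            fx_rate_date="2025-09-30",
+        )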
+ """ + fx_config = None + if fx_target_currency and fx_spot_rate: + # Get source currency from first statement + source_currency = "" + for stype in ["balance_sheet", "income_statement", "cash_flow"]: + doc = result.get(stype) + if doc and isinstance(doc, dict): + source_currency = doc.get("statement_metadata", {}).get("currency", "") + if source_currency: + break + fx_config = { + "source_currency": source_currency, + "target_currency": fx_target_currency, + "spot_rate": fx_spot_rate, + "avg_rate": fx_avg_rate or fx_spot_rate, + "rate_date": fx_rate_date or "", + "rate_source": fx_rate_source, + } + + wb = Workbook() + + # Summary sheet + ws_summary = wb.active + ws_summary.title = "Summary" + _build_summary_sheet(ws_summary, result, title, fx_config=fx_config) + + # Statement sheets + SHEET_NAMES = { + "balance_sheet": "Balance Sheet", + "income_statement": "Income Statement", + "cash_flow": "Cash Flow", + } + + # Rate per statement type: BS uses spot, IS/CF use average + RATE_BY_STYPE = { + "balance_sheet": "spot_rate", + "income_statement": "avg_rate", + "cash_flow": "avg_rate", + } + + for stype, sheet_name in SHEET_NAMES.items(): + doc = result.get(stype) + if not doc or not isinstance(doc, dict): + continue + ws = wb.create_sheet(title=sheet_name) + rate_key = RATE_BY_STYPE[stype] + stype_fx = fx_config.copy() if fx_config else None + if stype_fx: + stype_fx["applied_rate"] = stype_fx[rate_key] + stype_fx["rate_type"] = "closing spot" if rate_key == "spot_rate" else "period average" + _build_statement_sheet(ws, doc, stype, fx_config=stype_fx) + + # Margins & Ratios computed sheet + ws_margins = wb.create_sheet(title="Margins & Ratios") + _build_margins_sheet(ws_margins, result) + + # Force Excel to recalculate all formulas on open + wb.calculation = openpyxl.workbook.properties.CalcProperties(fullCalcOnLoad=True) + + wb.save(output_path) + return output_path diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/extractor/extract_endpoints.py 
b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/extract_endpoints.py new file mode 100644 index 000000000..251e3b38a --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/extract_endpoints.py @@ -0,0 +1,343 @@ +""" +extractor/extract_endpoints.py +------------------------------- +Business logic for extraction endpoints: + - /extract — accept PDF upload, start async extraction job + - /extract-by-url — accept PDF URL, download and start extraction + - /extract/status — poll for job results + - /fx-rate — fetch FX conversion rates + +The extraction runs in a background thread. Results are stored in blob +storage via job_store. The CPS agent polls /extract/status until complete. + +Public API: + handle_extract(body, files, params, headers) -> (status_code, dict, headers) + handle_extract_by_url(body, params) -> (status_code, dict, headers) + handle_extract_status(job_id) -> (status_code, dict) + handle_fx_rate(params) -> (status_code, dict) + run_job(job_id, tmp_path, ...) — background thread target +""" + +import base64 +import json +import logging +import os +import tempfile +import threading +import uuid + +from extractor.job_store import save_job, load_job, delete_job, build_response_payload, STATEMENT_TYPES +from extractor.pipeline import run as run_pipeline +from extractor.stages.contracts import PipelineOptions + +logger = logging.getLogger(__name__) + + +# --------------------------------------------------------------------------- +# Background job runner +# --------------------------------------------------------------------------- + +def run_job( + job_id: str, + tmp_path: str, + use_enrichment: bool, + requested_types: list[str], + file_name: str, +): + """Run the 5-stage extraction pipeline in a background thread. + + Saves the completed result (or failure) to blob storage. Cleans up + the temp PDF file when done. 
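+
+    Resulting job record shape (as written by the save_job calls in this
+    function):
+        {"status": "completed", "result": {...}}   on success
+        {"status": "failed", "error": "..."}       on failure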
+ """ + try: + backend = os.environ.get("EXTRACTION_BACKEND", "cu") + options = PipelineOptions( + use_enrichment=use_enrichment, + requested_types=requested_types, + source_file_name=file_name, + backend=backend, + ) + result = run_pipeline(tmp_path, options) + payload = build_response_payload(result) + + save_job(job_id, {"status": "completed", "result": payload}) + logger.info(f"Job {job_id} extraction completed, result saved to blob storage") + + except Exception as e: + logger.exception(f"Job {job_id} failed") + save_job(job_id, {"status": "failed", "error": str(e)}) + finally: + try: + os.unlink(tmp_path) + except Exception: + pass + + +# --------------------------------------------------------------------------- +# /extract — PDF upload (multipart, JSON base64, or raw bytes) +# --------------------------------------------------------------------------- + +def handle_extract( + body: bytes, + files: dict, + form: dict, + params: dict, + headers: dict, +) -> tuple[int, dict, dict]: + """Handle PDF extraction request. Supports multipart, JSON, or raw bytes. + + Returns (status_code, response_body, response_headers). 
+ """ + # Parse query params + use_enrichment = params.get("enrichment", "true").lower() != "false" + requested_types = params.get("statements", "") + if requested_types: + requested_types = [s.strip() for s in requested_types.split(",")] + else: + requested_types = list(STATEMENT_TYPES) + + content_type = headers.get("content-type", "").lower() + + # --- Get PDF bytes from request --- + if "multipart/form-data" in content_type: + uploaded_file = files.get("file") + if not uploaded_file: + return 400, {"error": "Missing 'file' in multipart form data"}, {} + pdf_bytes = uploaded_file.read() + file_name = form.get("fileName", uploaded_file.filename or "document.pdf") + logger.info(f"Received multipart file: {file_name} ({len(pdf_bytes)} bytes)") + + elif "application/json" in content_type or body[:1] == b"{": + payload = json.loads(body) + file_url = payload.get("fileUrl") or payload.get("contentUrl") + file_content = payload.get("fileContent") + file_name = payload.get("fileName", "document.pdf") + + if file_content: + if "," in file_content and file_content.index(",") < 100: + file_content = file_content.split(",", 1)[1] + pdf_bytes = base64.b64decode(file_content) + elif file_url: + import requests as http_requests + download_resp = http_requests.get(file_url, timeout=120) + download_resp.raise_for_status() + pdf_bytes = download_resp.content + else: + return 400, {"error": "Missing fileUrl or fileContent"}, {} + + elif body and len(body) > 100: + pdf_bytes = body + file_name = "uploaded.pdf" + + else: + return 400, {"error": "Send multipart file, JSON, or raw PDF bytes"}, {} + + # --- Start background job --- + with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp: + tmp.write(pdf_bytes) + tmp_path = tmp.name + + job_id = str(uuid.uuid4()) + save_job(job_id, {"status": "processing"}) + + thread = threading.Thread( + target=run_job, + args=(job_id, tmp_path, use_enrichment, requested_types, file_name), + daemon=True, + ) + thread.start() + 
logger.info(f"Job {job_id} started for {file_name}") + + return 202, {"jobId": job_id, "status": "processing"}, { + "Location": f"/api/extract/status/{job_id}", + "Retry-After": "5", + } + + +# --------------------------------------------------------------------------- +# /extract-by-url — URL-based extraction +# --------------------------------------------------------------------------- + +def handle_extract_by_url(body: dict, params: dict) -> tuple[int, dict, dict]: + """Handle URL-based extraction. Downloads the PDF then starts async job. + + Returns (status_code, response_body, response_headers). + """ + file_url = body.get("fileUrl") or body.get("contentUrl") + file_name = body.get("fileName", "document.pdf") + auth_token = body.get("authToken", "") + + if not file_url: + return 400, {"error": "Missing fileUrl"}, {} + + use_enrichment = params.get("enrichment", "true").lower() != "false" + requested_types = params.get("statements", "") + if requested_types: + requested_types = [s.strip() for s in requested_types.split(",")] + else: + requested_types = list(STATEMENT_TYPES) + + # Download PDF + import requests as http_requests + req_headers = {} + if auth_token: + req_headers["Authorization"] = f"Bearer {auth_token}" + download_resp = http_requests.get(file_url, headers=req_headers, timeout=120) + download_resp.raise_for_status() + pdf_bytes = download_resp.content + logger.info(f"Downloaded {file_name} from URL ({len(pdf_bytes)} bytes)") + + # Start background job + with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp: + tmp.write(pdf_bytes) + tmp_path = tmp.name + + job_id = str(uuid.uuid4()) + save_job(job_id, {"status": "processing"}) + + thread = threading.Thread( + target=run_job, + args=(job_id, tmp_path, use_enrichment, requested_types, file_name), + daemon=True, + ) + thread.start() + logger.info(f"Job {job_id} started for {file_name} (via URL)") + + return 202, {"jobId": job_id, "status": "processing"}, { + "Location": 
f"/api/extract/status/{job_id}", + "Retry-After": "5", + } + + +# --------------------------------------------------------------------------- +# /extract/status/{jobId} — poll for results +# --------------------------------------------------------------------------- + +def handle_extract_status(job_id: str) -> tuple[int, dict]: + """Poll for extraction job status. Returns result when complete. + + The status endpoint uses anonymous auth — the job ID itself acts as + the access token (unguessable UUID). + """ + job = load_job(job_id) + + if not job: + return 404, {"error": f"Job {job_id} not found"} + + if job["status"] == "processing": + # Return empty fields so the connector schema stays consistent + return 200, { + "status": "processing", + "companyName": "", "summary": "", + "balanceSheet": "", "incomeStatement": "", + "cashFlow": "", "confidence": "", + } + + if job["status"] == "completed": + result = job["result"] + result["status"] = "completed" + result["jobId"] = job_id + return 200, result + + # Failed + error = job.get("error", "Unknown error") + delete_job(job_id) + return 200, {"status": "failed", "error": error} + + +# --------------------------------------------------------------------------- +# /fx-rate — FX conversion rate lookup +# --------------------------------------------------------------------------- + +# In-memory cache: "CNY-AUD-2025-09-30" -> rate dict +_fx_rate_cache: dict[str, dict] = {} + + +def handle_fx_rate(params: dict) -> tuple[int, dict]: + """Fetch FX rates for currency conversion. + + Tries exchangerate.host first, falls back to open.er-api.com. + Caches results in memory for the function app lifetime. 
+ """ + from_currency = params.get("from", "").upper() + to_currency = params.get("to", "").upper() + rate_date = params.get("date", "") + period_start = params.get("period_start", "") + + if not from_currency or not to_currency: + return 400, {"error": "Missing 'from' and 'to' query parameters"} + + if from_currency == to_currency: + return 200, { + "from": from_currency, "to": to_currency, + "spot_rate": 1.0, "average_rate": 1.0, + "date": rate_date, "source": "identity", + } + + # Check cache + cache_key = f"{from_currency}-{to_currency}-{rate_date}" + if cache_key in _fx_rate_cache: + logger.info(f"FX rate cache hit: {cache_key}") + return 200, _fx_rate_cache[cache_key] + + import httpx + + # Fetch spot rate + spot_url = ( + f"https://api.exchangerate.host/convert?from={from_currency}&to={to_currency}&date={rate_date}" + if rate_date else + f"https://api.exchangerate.host/convert?from={from_currency}&to={to_currency}" + ) + logger.info(f"Fetching FX spot rate: {spot_url}") + + with httpx.Client(timeout=10) as client: + spot_resp = client.get(spot_url) + spot_data = spot_resp.json() + + spot_rate = spot_data.get("result") or spot_data.get("info", {}).get("rate") + + if spot_rate is None: + # Fallback: exchangerate-api.com (free tier) + fallback_url = f"https://open.er-api.com/v6/latest/{from_currency}" + with httpx.Client(timeout=10) as client: + fb_resp = client.get(fallback_url) + fb_data = fb_resp.json() + spot_rate = fb_data.get("rates", {}).get(to_currency) + + if spot_rate is None: + return 502, {"error": f"Could not fetch rate for {from_currency}/{to_currency}"} + + # Average rate: compute from timeseries if period_start provided + average_rate = spot_rate + if period_start and rate_date: + try: + ts_url = f"https://api.exchangerate.host/timeseries?start_date={period_start}&end_date={rate_date}&source={from_currency}¤cies={to_currency}" + with httpx.Client(timeout=15) as client: + ts_resp = client.get(ts_url) + ts_data = ts_resp.json() + + ts_rates = 
ts_data.get("quotes", {}) or ts_data.get("rates", {}) + if ts_rates: + values = [] + for day_data in ts_rates.values(): + if isinstance(day_data, dict): + val = day_data.get(f"{from_currency}{to_currency}") or day_data.get(to_currency) + if val: + values.append(float(val)) + elif isinstance(day_data, (int, float)): + values.append(float(day_data)) + if values: + average_rate = round(sum(values) / len(values), 6) + logger.info(f"FX average rate computed from {len(values)} daily rates") + except Exception as e: + logger.warning(f"Could not compute average rate, using spot: {e}") + + result = { + "from": from_currency, "to": to_currency, + "spot_rate": round(float(spot_rate), 6), + "average_rate": round(float(average_rate), 6), + "date": rate_date, "source": "exchangerate.host", + } + + _fx_rate_cache[cache_key] = result + return 200, result diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/extractor/html_table_parser.py b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/html_table_parser.py new file mode 100644 index 000000000..01d25f453 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/html_table_parser.py @@ -0,0 +1,395 @@ +""" +extractor/html_table_parser.py +------------------------------- +Parses HTML tables from the prebuilt-documentAnalyzer markdown output +into structured rows, columns, and cells. + +The prebuilt-documentAnalyzer returns financial tables as clean HTML: +
    <table>
      <tr><th></th><th colspan="2">Three Months Ended</th></tr>
      <tr><th></th><th>2025</th><th>2024</th></tr>
      <tr><td>Revenue</td><td>$ 59,893</td><td>...</td></tr>
    </table>

This is dramatically cleaner than the \n\n-separated plain text from
prebuilt-read — no currency splitting, no column alignment issues,
no continuation fragments.
"""

import re
from typing import Optional


def _strip_tags(html: str) -> str:
    """Remove HTML tags and decode common entities."""
    text = re.sub(r"<[^>]+>", "", html)
    # Decode &lt;/&gt; and non-breaking spaces first, and &amp; last, so
    # double-encoded entities are not over-decoded.
    text = text.replace("&lt;", "<").replace("&gt;", ">")
    text = text.replace("&nbsp;", " ").replace("&#160;", " ")
    text = text.replace("&amp;", "&")
    return text.strip()


def _parse_financial_value(raw: str) -> Optional[float]:
    """Parse a financial value string to float."""
    s = raw.strip()
    if not s:
        return None
    s = re.sub(r"^[\$\u00a5\u20ac\u00a3]\s*", "", s)
    if not s:
        return None
    if re.match(r"^[\u2014\-\uff0d]+$", s):  # dashes = zero
        return 0.0
    neg = False
    if s.startswith("(") and s.endswith(")"):
        neg = True
        s = s[1:-1]
    s = s.replace(",", "")
    try:
        val = float(s)
        return -val if neg else val
    except ValueError:
        return None


# ---------------------------------------------------------------------------
# Currency and unit extraction from column headers
# ---------------------------------------------------------------------------

# Patterns for extracting currency + unit from column header text.
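The value-parsing rules above (currency-symbol stripping, dash-as-zero, parenthesised negatives) can be sanity-checked in isolation. This is a self-contained sketch that re-states the `_parse_financial_value` logic rather than importing the module; `parse_value` is a hypothetical standalone copy, kept here only for illustration:

```python
import re


def parse_value(raw: str):
    """Standalone restatement of the _parse_financial_value rules above."""
    s = raw.strip()
    if not s:
        return None
    s = re.sub(r"^[\$\u00a5\u20ac\u00a3]\s*", "", s)  # drop leading $, ¥, €, £
    if not s:
        return None
    if re.match(r"^[\u2014\-\uff0d]+$", s):  # a run of dashes means "nil"
        return 0.0
    neg = s.startswith("(") and s.endswith(")")  # accounting-style negative
    if neg:
        s = s[1:-1]
    s = s.replace(",", "")  # strip thousands separators
    try:
        val = float(s)
    except ValueError:
        return None  # not a number (e.g. "n/a", footnote text)
    return -val if neg else val


print(parse_value("$ 59,893"))  # 59893.0
print(parse_value("(1,234)"))   # -1234.0
print(parse_value("—"))         # 0.0
print(parse_value("n/a"))       # None
```

The accounting convention is the main trap: `(1,234)` means negative 1,234, and a bare dash means zero rather than missing.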
+# Examples: "£000", "$'000", "€M", "¥百万", "RMB'000", "US$ millions" +_CURRENCY_UNIT_PATTERNS = [ + # Symbol + multiplier: "£000", "$'000", "€000", "¥千元" + (re.compile(r"^[\$£€¥]\s*[''']?\s*000\s*$"), {"$": "USD", "£": "GBP", "€": "EUR", "¥": "JPY"}, "thousands"), + (re.compile(r"^[\$£€¥]\s*[''']?\s*M\s*$", re.IGNORECASE), {"$": "USD", "£": "GBP", "€": "EUR", "¥": "JPY"}, "millions"), + (re.compile(r"^[\$£€¥]\s*[''']?\s*000[''']?\s*000\s*$"), {"$": "USD", "£": "GBP", "€": "EUR", "¥": "JPY"}, "millions"), + # ISO code + multiplier: "RMB'000", "USD'000", "GBP millions" + (re.compile(r"^(USD|GBP|EUR|JPY|CNY|RMB|HKD|SGD|AUD|CHF)\s*[''']?\s*000\s*$", re.IGNORECASE), None, "thousands"), + (re.compile(r"^(USD|GBP|EUR|JPY|CNY|RMB|HKD|SGD|AUD|CHF)\s*millions?\s*$", re.IGNORECASE), None, "millions"), + (re.compile(r"^(USD|GBP|EUR|JPY|CNY|RMB|HKD|SGD|AUD|CHF)\s*thousands?\s*$", re.IGNORECASE), None, "thousands"), + (re.compile(r"^(USD|GBP|EUR|JPY|CNY|RMB|HKD|SGD|AUD|CHF)\s*billions?\s*$", re.IGNORECASE), None, "billions"), +] + +# Currency symbol to ISO code mapping +_SYMBOL_TO_ISO = {"$": "USD", "£": "GBP", "€": "EUR", "¥": "JPY"} + +# Standalone currency patterns (just the symbol or code, no unit) +_CURRENCY_ONLY_PATTERNS = [ + re.compile(r"^[\$£€¥]$"), + re.compile(r"^(USD|GBP|EUR|JPY|CNY|RMB|HKD|SGD|AUD|CHF)$", re.IGNORECASE), +] + +# Unit patterns from parenthetical notes: "(In millions)", "(In thousands)" +_PAREN_UNIT_RE = re.compile( + r"\(\s*(?:in\s+)?(millions?|thousands?|billions?|百万|千|万)\s*\)", + re.IGNORECASE, +) + +# ISO code normalization +_ISO_NORMALIZE = { + "RMB": "CNY", +} + + +def extract_currency_and_unit( + header_rows: list[list[str]], + all_text: str = "", +) -> dict: + """ + Extract currency code and unit multiplier from table column headers + and surrounding text. + + Returns {"currency": str|None, "unit": str|None, "currency_symbol": str|None}. + + Examines: + 1. Column header cells (e.g. "£000", "RMB'000", "$M") + 2. 
Parenthetical notes in header rows (e.g. "(In millions)") + 3. Currency symbols appearing in data cells + """ + currency = None + unit = None + currency_symbol = None + + # Flatten all header cell text for scanning + all_header_text = " ".join( + cell for row in header_rows for cell in row if cell.strip() + ) + + # Strategy 1: Match column header cells against known patterns + for row in header_rows: + for cell in row: + cell_stripped = cell.strip() + if not cell_stripped: + continue + for pattern, symbol_map, unit_value in _CURRENCY_UNIT_PATTERNS: + m = pattern.match(cell_stripped) + if m: + unit = unit_value + if symbol_map: + # Pattern uses symbol — extract from first char + sym = cell_stripped[0] + currency = symbol_map.get(sym) + currency_symbol = sym + else: + # Pattern captured ISO code + iso = m.group(1).upper() + currency = _ISO_NORMALIZE.get(iso, iso) + currency_symbol = _currency_code_to_symbol(currency) + break + if currency and unit: + break + if currency and unit: + break + + # Strategy 2: Parenthetical unit in headers "(In millions)" + if not unit: + m = _PAREN_UNIT_RE.search(all_header_text) + if not m: + m = _PAREN_UNIT_RE.search(all_text[:2000]) + if m: + raw_unit = m.group(1).lower() + if "million" in raw_unit or "百万" in raw_unit: + unit = "millions" + elif "thousand" in raw_unit or "千" in raw_unit: + unit = "thousands" + elif "billion" in raw_unit: + unit = "billions" + elif "万" in raw_unit: + unit = "ten_thousands" + + # Strategy 3: Currency symbol in header text + if not currency: + for sym, iso in _SYMBOL_TO_ISO.items(): + if sym in all_header_text: + currency = iso + currency_symbol = sym + break + + # Strategy 4: ISO code in header text + if not currency: + iso_m = re.search( + r"\b(USD|GBP|EUR|JPY|CNY|RMB|HKD|SGD|AUD|CHF)\b", + all_header_text, re.IGNORECASE, + ) + if iso_m: + iso = iso_m.group(1).upper() + currency = _ISO_NORMALIZE.get(iso, iso) + currency_symbol = _currency_code_to_symbol(currency) + + return { + "currency": currency, + 
"unit": unit, + "currency_symbol": currency_symbol, + } + + +def _currency_code_to_symbol(code: str) -> str | None: + """Map ISO 4217 code to display symbol.""" + return { + "USD": "$", "GBP": "£", "EUR": "€", + "JPY": "¥", "CNY": "¥", + }.get(code) + + +# --------------------------------------------------------------------------- +# Table parsing +# --------------------------------------------------------------------------- + +def _parse_single_table( + table_html: str, +) -> tuple[list[list[str]], list[list[str]]]: + """ + Parse a single block into header_rows and data_rows. + + Returns (header_rows, data_rows) where each is a list of cell-value lists. + """ + tr_pattern = re.compile(r"]*>(.*?)", re.DOTALL | re.IGNORECASE) + td_pattern = re.compile(r"<(td|th)[^>]*>(.*?)", re.DOTALL | re.IGNORECASE) + colspan_pattern = re.compile(r'colspan="?(\d+)"?', re.IGNORECASE) + tag_start_pattern = re.compile(r"<(td|th)[^>]*>", re.IGNORECASE) + + header_rows: list[list[str]] = [] + data_rows: list[list[str]] = [] + + for tr_content in tr_pattern.findall(table_html): + cells_in_row = td_pattern.findall(tr_content) + is_header = any(tag.lower() == "th" for tag, _ in cells_in_row) + + row_values = [] + # We need to iterate through tag matches to get colspan from the + # opening tag, not from the overall tr_content. 
+ tag_starts = list(tag_start_pattern.finditer(tr_content)) + cell_matches = list(td_pattern.finditer(tr_content)) + + for ci, cell_m in enumerate(cell_matches): + text = _strip_tags(cell_m.group(2)) + # Get colspan from the corresponding opening tag + opening_tag = tag_starts[ci].group(0) if ci < len(tag_starts) else "" + colspan_m = colspan_pattern.search(opening_tag) + span = int(colspan_m.group(1)) if colspan_m else 1 + row_values.append(text) + for _ in range(span - 1): + row_values.append("") + + if is_header: + header_rows.append(row_values) + else: + data_rows.append(row_values) + + return header_rows, data_rows + + +def parse_html_table( + markdown: str, + start_offset: int, + end_offset: int, +) -> tuple[list[str], list[str], list[dict]]: + """ + Parse HTML
<table>(s) in the given markdown range.

    When multiple tables are present (page-break continuations), the FIRST
    table's headers are used for column identification. Subsequent tables'
    header rows are discarded (they're repeated page headers), and their
    data rows are appended.

    Returns (rows, columns, cells) where:
      - rows: list of label strings (one per data row)
      - columns: list of column header strings
      - cells: list of cell dicts compatible with the existing pipeline

    Also populates cells[0]["_currency_unit"] with extracted currency/unit
    metadata from column headers.
    """
    section = markdown[start_offset:end_offset]

    # Find individual <table> ... </table> blocks
    table_matches = re.findall(
        r"<table[^>]*>(.*?)</table>", section, re.DOTALL | re.IGNORECASE
    )
    if not table_matches:
        return [], [], []

    # Parse each table separately — use first table's headers, merge data rows
    primary_header_rows: list[list[str]] = []
    all_data_rows: list[list[str]] = []

    for idx, table_html in enumerate(table_matches):
        header_rows, data_rows = _parse_single_table(table_html)
        if idx == 0:
            # First table: keep its headers as the canonical column definitions
            primary_header_rows = header_rows
        # Append all data rows from every table
        all_data_rows.extend(data_rows)

    # Build column headers from the primary (first) table's headers.
    # When multiple header rows exist, prefer the one with year labels
    # (e.g. "2025", "2024") over a unit-only row (e.g. "£000", "£000").
    # Financial tables often have two header rows:
    #   <tr><th></th><th>2025</th><th>2024</th></tr>  ← year row (best for column labels)
    #   <tr><th></th><th>£000</th><th>£000</th></tr>  ← unit row (used for currency/unit)
    _YEAR_RE = re.compile(r"\b20\d{2}\b")
    _UNIT_ONLY_RE = re.compile(
        r"^[\$£€¥]\s*[''']?\s*\d*\s*$"  # "£000", "$M", "€"
        r"|^(USD|GBP|EUR|JPY|CNY|RMB).*$",  # "RMB'000"
        re.IGNORECASE,
    )

    columns: list[str] = []
    if primary_header_rows:
        # Find the best header row: prefer one with year labels
        best_header = primary_header_rows[-1]  # default: last row
        for hrow in primary_header_rows:
            non_empty = [h for h in hrow if h.strip()]
            if any(_YEAR_RE.search(h) for h in non_empty):
                best_header = hrow
                break

        if best_header and not best_header[0].strip():
            columns = [h for h in best_header[1:] if h.strip()]
        else:
            columns = [h for h in best_header if h.strip()]

    # Extract currency and unit from the column headers
    currency_unit = extract_currency_and_unit(
        primary_header_rows, section[:2000]
    )

    # Build rows and cells from data rows
    rows: list[str] = []
    cells: list[dict] = []
    num_cols = len(columns) if columns else 0

    for data_row in all_data_rows:
        if not data_row:
            continue

        label = data_row[0] if data_row else ""
        label = label
or "" # guard against None + values = data_row[1:] if len(data_row) > 1 else [] + values = [v or "" for v in values] # guard against None values + + # Skip rows that look like repeated table headers leaked into data + # (e.g. a row where all values are year strings "2025", "2024" or + # unit strings "£000", "$M"). + if label.strip() == "" and values: + all_header_like = all( + re.match(r"^\d{4}$", v.strip()) + or re.match(r"^[\$£€¥]\s*[''']?\s*\d*$", v.strip()) + or v.strip() == "" + for v in values + ) + if all_header_like and any(v.strip() for v in values): + continue + + # Determine row_type + label_lower = label.lower().strip() + has_values = any(v.strip() for v in values) + + if not has_values and label.strip(): + row_type = "section_header" + elif label_lower.startswith("total "): + row_type = "total" if any( + kw in label_lower for kw in + ["total assets", "total liabilities and", "total revenue"] + ) else "subtotal" + elif label_lower.startswith("net cash") or label_lower.startswith("net increase") or label_lower.startswith("net decrease"): + row_type = "subtotal" + else: + row_type = "line_item" + + row_index = len(rows) + rows.append(label) + + # Label cell + cells.append({ + "row": row_index, "col": 0, "content": label, + "kind": "content", "row_type": row_type, "indent_level": 0, + }) + + # Value cells — always emit num_cols cells + for col_idx in range(num_cols): + raw = values[col_idx].strip() if col_idx < len(values) else "" + cells.append({ + "row": row_index, "col": col_idx + 1, + "content": raw if raw else "", + "kind": "content", "row_type": row_type, "indent_level": 0, + }) + + # Post-pass: set indent_level for items between section_header and subtotal + in_section = False + for c in cells: + if c["col"] != 0: + continue + if c["row_type"] == "section_header": + in_section = True + elif c["row_type"] in ("subtotal", "total"): + in_section = False + elif c["row_type"] == "line_item" and in_section: + c["indent_level"] = 1 + for vc in cells: + if 
vc["row"] == c["row"] and vc["col"] > 0: + vc["indent_level"] = 1 + + # Attach currency/unit metadata so callers can use it + if cells: + cells[0]["_currency_unit"] = currency_unit + + return rows, columns, cells diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/extractor/job_store.py b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/job_store.py new file mode 100644 index 000000000..cd579f050 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/job_store.py @@ -0,0 +1,177 @@ +""" +extractor/job_store.py +----------------------- +Blob-based job store for extraction results. + +Responsibilities: + - CRUD operations on extraction jobs in Azure Blob Storage + - Building the response payload from pipeline results + - Translating non-English company names to English via LLM + +The Function App's managed identity (or connection string) is used for +blob access. Jobs are stored as JSON blobs in the 'extraction-jobs' container. + +Public API: + save_job(job_id, data) — upsert a job blob + load_job(job_id) -> dict | None — read a job blob + delete_job(job_id) — delete a job blob + get_container() — get the blob container client + build_response_payload(result) — pipeline result -> API response dict +""" + +import json +import logging +import os + +logger = logging.getLogger(__name__) + +# --------------------------------------------------------------------------- +# Container client (lazy singleton) +# --------------------------------------------------------------------------- + +_JOBS_CONTAINER = "extraction-jobs" +_container_client = None + + +def get_container(): + """Get or create the blob container client for job storage. 
+ + Supports two auth modes: + - Connection string (regular Consumption plan): AzureWebJobsStorage env var + - Managed Identity (Flex Consumption plan): AzureWebJobsStorage__blobServiceUri + """ + global _container_client + if _container_client is None: + from azure.storage.blob import BlobServiceClient + + conn_str = os.environ.get("AzureWebJobsStorage") + if conn_str and conn_str.startswith("Default"): + service = BlobServiceClient.from_connection_string(conn_str) + else: + from azure.identity import ManagedIdentityCredential + blob_uri = os.environ["AzureWebJobsStorage__blobServiceUri"] + client_id = os.environ.get("AzureWebJobsStorage__clientId") + credential = ManagedIdentityCredential(client_id=client_id) + service = BlobServiceClient(blob_uri, credential=credential) + + _container_client = service.get_container_client(_JOBS_CONTAINER) + if not _container_client.exists(): + _container_client.create_container() + return _container_client + + +# --------------------------------------------------------------------------- +# Job CRUD +# --------------------------------------------------------------------------- + +def save_job(job_id: str, data: dict): + """Save a job result to blob storage (overwrites if exists).""" + get_container().upload_blob( + f"{job_id}.json", + json.dumps(data, ensure_ascii=False), + overwrite=True, + ) + + +def load_job(job_id: str) -> dict | None: + """Load a job result from blob storage. Returns None if not found.""" + try: + blob = get_container().download_blob(f"{job_id}.json") + return json.loads(blob.readall()) + except Exception: + return None + + +def delete_job(job_id: str): + """Delete a job blob. 
Silently ignores if not found."""
    try:
        get_container().delete_blob(f"{job_id}.json")
    except Exception:
        pass


# ---------------------------------------------------------------------------
# Company name translation
# ---------------------------------------------------------------------------

def translate_company_name(name: str) -> str:
    """Translate a non-ASCII company name to English via the configured
    Azure OpenAI deployment (see llm_reconciler).

    Returns the original name if it's already ASCII or if translation fails.
    """
    if not name or name.isascii():
        return name
    try:
        from extractor.llm_reconciler import _get_client, _DEPLOYMENT
        client = _get_client()
        resp = client.chat.completions.create(
            model=_DEPLOYMENT(),
            messages=[
                {"role": "system", "content": "Translate the company name to English. Return ONLY the English name, nothing else."},
                {"role": "user", "content": name},
            ],
            temperature=0.0,
            max_tokens=100,
        )
        translated = resp.choices[0].message.content.strip().strip('"')
        if translated:
            return translated
    except Exception as e:
        logger.warning(f"Company name translation failed: {e}")
    return name


# ---------------------------------------------------------------------------
# Response payload builder
# ---------------------------------------------------------------------------

# Statement types in canonical order
STATEMENT_TYPES = ["balance_sheet", "income_statement", "cash_flow"]

# Key mapping: snake_case <-> camelCase
SNAKE_TO_CAMEL = {
    "balance_sheet": "balanceSheet",
    "income_statement": "incomeStatement",
    "cash_flow": "cashFlow",
}
CAMEL_TO_SNAKE = {v: k for k, v in SNAKE_TO_CAMEL.items()}


def build_response_payload(result: dict) -> dict:
    """Build the API response payload from a pipeline result dict.

    Extracts company name (translates if non-English), stringifies each
    statement as JSON, and returns a flat dict with camelCase keys.
+ """ + company_name = None + for stype in STATEMENT_TYPES: + stmt = result.get(stype) + if stmt and isinstance(stmt, dict): + company_name = stmt.get("document_metadata", {}).get("company_name") + if company_name: + break + + company_name = translate_company_name(company_name or "Unknown_Company") + + return { + "companyName": company_name, + "summary": json.dumps(result.get("summary", []), ensure_ascii=False), + "balanceSheet": json.dumps(result.get("balance_sheet"), ensure_ascii=False), + "incomeStatement": json.dumps(result.get("income_statement"), ensure_ascii=False), + "cashFlow": json.dumps(result.get("cash_flow"), ensure_ascii=False), + "confidence": json.dumps(result.get("confidence", {}), ensure_ascii=False), + } + + +def parse_stmt_from_result(result: dict, stype_snake: str): + """Load a statement dict from a result blob (handles both key conventions). + + The blob stores statements as JSON strings with camelCase keys. This + function handles both snake_case and camelCase lookups, and parses + the JSON string if needed. + """ + camel_key = SNAKE_TO_CAMEL.get(stype_snake, stype_snake) + stmt_str = result.get(stype_snake) or result.get(camel_key) + if not stmt_str or stmt_str == "null": + return None + return json.loads(stmt_str) if isinstance(stmt_str, str) else stmt_str diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/extractor/llm_reconciler.py b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/llm_reconciler.py new file mode 100644 index 000000000..da790f888 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/llm_reconciler.py @@ -0,0 +1,928 @@ +""" +extractor/llm_reconciler.py +--------------------------- +Post-processing pass that uses Azure OpenAI to fix three classes of label +errors that occur when Azure Content Understanding processes page-spanning +financial tables: + + 1. 
suppress_noise_rows() — heuristic, no LLM + Removes syntactically impossible tokens: currency unit headers like + "(Yen)", broken sentence fragments like "of the", or section-numbering + stubs. Applied using an explicit allowlist of impossible-label patterns + so that real ghost rows (proper label, missing values) are never removed. + + 2. reconcile_suspect_ghost() — LLM (one call per statement) + Fixes rows where Azure CU dropped the true row label entirely: + SUSPECT row = correct values attached to a lowercase fragment label + GHOST row = correct uppercase label with no values + GPT-4o-mini receives both lists and returns a suspect->ghost mapping. + The fragment label is replaced with the true ghost label; the ghost row + is then removed. + + 3. complete_truncated_labels() — LLM (one call per statement) + Fixes labels that end mid-phrase at a preposition or conjunction + ("and", "for", "of", ...) because Azure CU wrapped the cell text and + the continuation ended up after the values on the next token. + All truncated labels are sent in a single batch for completion. + +All LLM calls use response_format=json_object so no free-text parsing is +needed — the response is always a directly usable JSON structure. + +Configuration (via .env): + AZURE_OPENAI_ENDPOINT — Azure OpenAI resource URL (no trailing slash) + Authentication: Managed Identity (DefaultAzureCredential) + Requires Cognitive Services OpenAI User role on the resource. + AZURE_OPENAI_DEPLOYMENT — Model deployment name (e.g. "gpt-4o-mini") + AZURE_OPENAI_API_VERSION — API version (default: "2024-02-01") + +Usage: + Called automatically by statement_detector.build_statement_json when + use_llm=True. 
Can also be called directly: + + from extractor.llm_reconciler import reconcile + rows, columns, cells = reconcile(statement_type, rows, columns, cells) +""" + +import json +import os +import re +from typing import Optional + +from azure.identity import DefaultAzureCredential, get_bearer_token_provider +from dotenv import load_dotenv +from openai import AzureOpenAI + +load_dotenv(override=False) + +# --------------------------------------------------------------------------- +# Azure OpenAI client (lazy-initialised so import never fails if creds absent) +# --------------------------------------------------------------------------- + +_client: Optional[AzureOpenAI] = None + + +def _get_client() -> AzureOpenAI: + global _client + if _client is None: + endpoint = os.environ.get("AZURE_OPENAI_ENDPOINT", "").rstrip("/") + api_ver = os.environ.get("AZURE_OPENAI_API_VERSION", "2024-02-01") + if not endpoint: + raise EnvironmentError( + "AZURE_OPENAI_ENDPOINT must be set in .env " + "to use the LLM reconciler." 
+ ) + # Use API key if available, otherwise managed identity + api_key = os.environ.get("AZURE_OPENAI_KEY") + token_provider = None + if not api_key: + token_provider = get_bearer_token_provider( + DefaultAzureCredential(), + "https://cognitiveservices.azure.com/.default", + ) + if api_key: + _client = AzureOpenAI( + azure_endpoint=endpoint, + api_key=api_key, + api_version=api_ver, + timeout=60.0, + ) + else: + _client = AzureOpenAI( + azure_endpoint=endpoint, + azure_ad_token_provider=token_provider, + api_version=api_ver, + timeout=60.0, + ) + return _client + + +_DEPLOYMENT = lambda: os.environ.get("AZURE_OPENAI_DEPLOYMENT", "gpt-4o-mini") + +# --------------------------------------------------------------------------- +# Helpers to navigate the cells list +# --------------------------------------------------------------------------- + +def _build_grid(cells: list[dict]) -> dict[int, dict[int, str]]: + """row_index -> {col_index -> content}""" + grid: dict[int, dict[int, str]] = {} + for c in cells: + grid.setdefault(c["row"], {})[c["col"]] = c["content"] + return grid + + +def _get_label(grid: dict, row: int) -> str: + return grid.get(row, {}).get(0, "") + + +def _get_values(grid: dict, row: int) -> list[str]: + d = grid.get(row, {}) + return [d[k] for k in sorted(k for k in d if k > 0)] + + +def _set_label(cells: list[dict], row: int, new_label: str) -> None: + for c in cells: + if c["row"] == row and c["col"] == 0: + c["content"] = new_label + return + + +def _remove_rows(cells: list[dict], row_indices: set[int]) -> list[dict]: + return [c for c in cells if c["row"] not in row_indices] + + +# --------------------------------------------------------------------------- +# Step 1 — suppress_noise_rows (no LLM) +# --------------------------------------------------------------------------- + +# Patterns that are NEVER valid financial line-item labels. +# Only match when the row also has no values (empty row). 
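To make the allowlist idea concrete, here is a trimmed sketch using three representative patterns only (a hypothetical `noise_re`, not the full `_NOISE_LABEL_RE` defined below), classifying a few labels:

```python
import re

# Trimmed illustration: a currency-unit header, a broken sentence tail,
# and a dash separator: the kinds of labels the allowlist rejects.
noise_re = re.compile(r"^\(Yen\)$|\bof the$|^-+$", re.IGNORECASE)

for label in ["(Yen)", "as a result of the", "---", "Total assets"]:
    print(f"{label!r}: noise={bool(noise_re.search(label))}")
# '(Yen)': noise=True
# 'as a result of the': noise=True
# '---': noise=True
# 'Total assets': noise=False
```

Note that a real ghost row like "Total assets" (proper label, missing values) is not flagged, which is exactly what the suspect/ghost reconciliation step depends on.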
+_NOISE_LABEL_RE = re.compile( + r"^\(Yen\)$" # currency unit + r"|^\(Millions" # currency unit + r"|\bof the$" # broken sentence tail + r"|^of\b" # broken sentence tail + r"|^Company:?$" # ref to reporting entity + r"|^attributable to" # incomplete phrase + r"|\bper share attributable\b" # EPS sub-header fragment + r"|^\d+\\?\)$" # section numbering stub e.g. "3\)" + r"|^-+$", # dash separators + re.IGNORECASE, +) + + +def suppress_noise_rows( + rows: list[str], + columns: list[str], + cells: list[dict], +) -> tuple[list[str], list[str], list[dict]]: + """ + Remove rows that are syntactically impossible as financial line items + and have no values. Only applies an explicit allowlist of patterns + so that real rows with missing values (ghost rows) are never removed. + """ + grid = _build_grid(cells) + drop: set[int] = set() + + for row_idx in sorted(grid): + label = _get_label(grid, row_idx) + vals = _get_values(grid, row_idx) + if not vals and _NOISE_LABEL_RE.search(label): + drop.add(row_idx) + + if not drop: + return rows, columns, cells + + new_cells = _remove_rows(cells, drop) + new_rows = [r for i, r in enumerate(rows) if i not in drop] + return new_rows, columns, new_cells + + +# --------------------------------------------------------------------------- +# Step 2 — reconcile_suspect_ghost (LLM) +# --------------------------------------------------------------------------- + +# A suspect label starts with lowercase — it is a continuation fragment, not +# a real label. It has values because Azure CU attached them to the fragment +# instead of to the (dropped) true label. +_SUSPECT_RE = re.compile(r"^[a-z]") + +# A trailing-preposition label was truncated at a page wrap. 
+_TRUNCATED_RE = re.compile( + r"\b(and|for|of|in|the|by|to|from|with|under|on|at)\s*$", + re.IGNORECASE, +) + + +def _heuristic_match_ghosts( + suspects: list[dict], + ghosts: list[dict], +) -> tuple[list[dict], list[dict], list[dict]]: + """ + Deterministic matching for unambiguous suspect-ghost pairs. + + Matches are resolved without an LLM call when: + - There is exactly one suspect and one ghost (trivially paired). + - A ghost label ends with the suspect's fragment text (e.g. ghost + "Net decrease (increase) in receivables under" contains suspect + fragment "securities borrowing transactions" is proximate). + - A suspect fragment is a suffix/substring of exactly one ghost label. + + Returns (heuristic_matches, remaining_suspects, remaining_ghosts). + Each match is {"suspect_row": int, "ghost_row": int}. + """ + # Trivial case: exactly one of each — pair them directly. + if len(suspects) == 1 and len(ghosts) == 1: + return ( + [{"suspect_row": suspects[0]["row"], "ghost_row": ghosts[0]["row"]}], + [], + [], + ) + + matches: list[dict] = [] + matched_suspect_rows: set[int] = set() + matched_ghost_rows: set[int] = set() + + for s in suspects: + frag = s["label"].lower().strip() + # Find ghosts whose label contains the fragment as a substring. 
+ candidates = [ + g for g in ghosts + if g["row"] not in matched_ghost_rows + and frag in g["label"].lower() + ] + if len(candidates) == 1: + matches.append({ + "suspect_row": s["row"], + "ghost_row": candidates[0]["row"], + }) + matched_suspect_rows.add(s["row"]) + matched_ghost_rows.add(candidates[0]["row"]) + + remaining_suspects = [s for s in suspects if s["row"] not in matched_suspect_rows] + remaining_ghosts = [g for g in ghosts if g["row"] not in matched_ghost_rows] + return matches, remaining_suspects, remaining_ghosts + + +def _apply_ghost_matches( + matches: list[dict], + valid_suspect_rows: set[int], + valid_ghost_rows: set[int], + grid: dict[int, dict[int, str]], + rows: list[str], + cells: list[dict], + source: str, +) -> set[int]: + """ + Apply validated suspect->ghost label swaps and return the set of ghost + row indices to drop. + + Validates every match returned by the LLM or heuristic: + - suspect_row must be in valid_suspect_rows + - ghost_row must be in valid_ghost_rows + - ghost_row must not already be claimed by a prior match (no duplicates) + Invalid mappings are logged and skipped — never silently applied. 
+ """ + drop_after_swap: set[int] = set() + used_ghosts: set[int] = set() + + for m in matches: + suspect_row = m.get("suspect_row") + ghost_row = m.get("ghost_row") + + # --- Guard: reject None / non-integer values --- + if suspect_row is None or ghost_row is None: + continue + + # --- Guard: reject row indices not in the sets we sent --- + if suspect_row not in valid_suspect_rows: + print(f" [{source}] REJECTED: suspect_row {suspect_row} " + f"not in valid set {sorted(valid_suspect_rows)}") + continue + if ghost_row not in valid_ghost_rows: + print(f" [{source}] REJECTED: ghost_row {ghost_row} " + f"not in valid set {sorted(valid_ghost_rows)}") + continue + + # --- Guard: reject duplicate ghost targets --- + if ghost_row in used_ghosts: + print(f" [{source}] REJECTED: ghost_row {ghost_row} " + f"already matched to another suspect") + continue + + ghost_label = _get_label(grid, ghost_row) + if not ghost_label: + continue + + # Apply the swap: update both cells and rows to stay in sync. + old_label = _get_label(grid, suspect_row) + _set_label(cells, suspect_row, ghost_label) + if suspect_row < len(rows): + rows[suspect_row] = ghost_label + + drop_after_swap.add(ghost_row) + used_ghosts.add(ghost_row) + print(f" [{source}] row {suspect_row}: '{old_label}' -> '{ghost_label}'") + + return drop_after_swap + + +def reconcile_suspect_ghost( + statement_type: str, + rows: list[str], + columns: list[str], + cells: list[dict], +) -> tuple[list[str], list[str], list[dict]]: + """ + Match suspect rows (fragment label + correct values) to ghost rows + (real label + no values), then swap the labels. + + Two-phase approach to minimise LLM cost: + 1. Heuristic pass — resolve unambiguous pairs deterministically. + 2. LLM pass — send only remaining ambiguous candidates to GPT-4o-mini. + + All matches (heuristic or LLM) are validated against the known set of + suspect/ghost row indices before application. 
Duplicate ghost targets + and out-of-range indices are rejected with a logged warning. + """ + grid = _build_grid(cells) + + suspects: list[dict] = [] # {"row": int, "label": str, "values": list} + ghosts: list[dict] = [] # {"row": int, "label": str} + + for row_idx in sorted(grid): + label = _get_label(grid, row_idx) + vals = _get_values(grid, row_idx) + if vals and _SUSPECT_RE.match(label): + suspects.append({"row": row_idx, "label": label, "values": vals}) + elif not vals and label and not _SUSPECT_RE.match(label): + ghosts.append({"row": row_idx, "label": label}) + + if not suspects or not ghosts: + return rows, columns, cells + + # Build validity sets from the candidates we identified — any row index + # not in these sets is out-of-scope and must be rejected. + valid_suspect_rows: set[int] = {s["row"] for s in suspects} + valid_ghost_rows: set[int] = {g["row"] for g in ghosts} + + print(f" [reconcile] {len(suspects)} suspects, {len(ghosts)} ghosts") + + # --- Phase 1: heuristic matching (free, deterministic) --- + heuristic_matches, remaining_suspects, remaining_ghosts = ( + _heuristic_match_ghosts(suspects, ghosts) + ) + + all_drops: set[int] = set() + + if heuristic_matches: + print(f" [heuristic] resolved {len(heuristic_matches)} match(es) without LLM") + drops = _apply_ghost_matches( + heuristic_matches, valid_suspect_rows, valid_ghost_rows, + grid, rows, cells, source="heuristic", + ) + all_drops |= drops + + # --- Phase 2: LLM matching for remaining ambiguous candidates --- + if remaining_suspects and remaining_ghosts: + print(f" [LLM] reconcile_suspect_ghost: " + f"{len(remaining_suspects)} suspects, {len(remaining_ghosts)} ghosts") + + prompt = ( + f"You are reconciling a {statement_type.replace('_', ' ')} " + f"where an OCR tool dropped some row labels at page boundaries.\n\n" + f"SUSPECT ROWS — values are correct but the label is a continuation " + f"fragment (NOT the true label):\n" + f"{json.dumps(remaining_suspects, indent=2)}\n\n" + f"GHOST ROWS 
— label is correct but values are missing (they were "
+            f"attached to a suspect row instead):\n"
+            f"{json.dumps(remaining_ghosts, indent=2)}\n\n"
+            f"Match each suspect row to the ghost row whose label is the true "
+            f"label for those values.\n"
+            f"Not every suspect needs a match.\nNot every ghost needs a match.\n\n"
+            f"Return ONLY a JSON object with key \"matches\" containing an array "
+            f"of objects:\n  {{\"suspect_row\": <int>, \"ghost_row\": <int>}}"
+        )
+
+        client = _get_client()
+        response = client.chat.completions.create(
+            model=_DEPLOYMENT(),
+            response_format={"type": "json_object"},
+            messages=[
+                {"role": "system", "content": "You are a financial data quality expert. Return only valid JSON."},
+                {"role": "user", "content": prompt},
+            ],
+            temperature=0,
+        )
+
+        raw = response.choices[0].message.content
+        try:
+            result = json.loads(raw)
+            llm_matches = result.get("matches", [])
+        except (json.JSONDecodeError, AttributeError):
+            print("  [LLM] reconcile_suspect_ghost: bad JSON response, skipping")
+            llm_matches = []
+
+        # Validated application — rejects out-of-set indices and duplicates.
+        drops = _apply_ghost_matches(
+            llm_matches, valid_suspect_rows, valid_ghost_rows,
+            grid, rows, cells, source="LLM",
+        )
+        all_drops |= drops
+
+    new_cells = _remove_rows(cells, all_drops)
+    new_rows = [r for i, r in enumerate(rows) if i not in all_drops]
+    return new_rows, columns, new_cells
+
+
+# ---------------------------------------------------------------------------
+# Step 3 — complete_truncated_labels (LLM)
+# ---------------------------------------------------------------------------
+
+# ---------------------------------------------------------------------------
+# Common IFRS label completions — used by the heuristic pass to avoid an LLM
+# call for well-known truncated phrases. Keys are the truncated suffix
+# (lowercased, stripped); values are the full completion suffix to append.
+# --------------------------------------------------------------------------- +_KNOWN_COMPLETIONS: dict[str, str] = { + "proceeds from sales and": "redemption of investment securities", + "proceeds from sales and redemption of investment": "securities", + "purchases of investment securities for": "banking business", + "net decrease (increase) in receivables under": "securities borrowing transactions", + "increase and decrease in derivative assets and": "liabilities", + "proceeds from sales of investments in associates and": "joint ventures", + "acquisitions of investments in associates and": "joint ventures", + "net increase (decrease) in": "short-term borrowings", + "per share attributable to owners of": "the Company", + "equity attributable to owners of": "the Company", + "profit (loss) attributable to owners of": "the Company", + "cash equivalents at end of the": "period", + "restricted cash equivalents at end of the": "period", + "cash, cash equivalents, and restricted cash equivalents at end of the": "period", + "cash and cash equivalents at end of the": "period", + "cash and cash equivalents at beginning of the": "period", +} + + +def _heuristic_complete_labels( + truncated: list[dict], +) -> tuple[list[dict], list[dict]]: + """ + Complete truncated labels using a lookup table of known IFRS phrases. + + Returns (completions, remaining) where completions is a list of + {"row": int, "completed_label": str} and remaining is the list of + truncated entries that could not be resolved heuristically. 
+ """ + completions: list[dict] = [] + remaining: list[dict] = [] + + for entry in truncated: + label_lower = entry["label"].lower().strip() + matched = False + for prefix, suffix in _KNOWN_COMPLETIONS.items(): + if label_lower == prefix or label_lower.endswith(prefix): + completions.append({ + "row": entry["row"], + "completed_label": entry["label"].rstrip() + " " + suffix, + }) + matched = True + break + if not matched: + remaining.append(entry) + + return completions, remaining + + +def _apply_label_completions( + completions: list[dict], + valid_rows: set[int], + grid: dict[int, dict[int, str]], + rows: list[str], + cells: list[dict], + source: str, +) -> None: + """ + Apply validated label completions to both cells and rows. + + Rejects any row index not in valid_rows — prevents the LLM from + silently overwriting labels on rows that were not sent for completion. + """ + for comp in completions: + row_idx = comp.get("row") + new_label = comp.get("completed_label", "").strip() + if row_idx is None or not new_label: + continue + + # --- Guard: reject row indices not in the set we sent --- + if row_idx not in valid_rows: + print(f" [{source}] REJECTED: row {row_idx} " + f"not in valid set {sorted(valid_rows)}") + continue + + old_label = _get_label(grid, row_idx) + _set_label(cells, row_idx, new_label) + # Keep rows list in sync with cells. + if row_idx < len(rows): + rows[row_idx] = new_label + print(f" [{source}] row {row_idx}: '{old_label}' -> '{new_label}'") + + +def complete_truncated_labels( + statement_type: str, + rows: list[str], + columns: list[str], + cells: list[dict], +) -> tuple[list[str], list[str], list[dict]]: + """ + Complete labels that end mid-phrase at a preposition or conjunction. + + Two-phase approach to minimise LLM cost: + 1. Heuristic pass — resolve known IFRS phrases from a lookup table. + 2. LLM pass — send only remaining unknown truncations to GPT-4o-mini. 
+
+    All completions are validated: only row indices that were identified as
+    truncated are accepted. Out-of-range indices from the LLM are rejected.
+    """
+    grid = _build_grid(cells)
+
+    truncated: list[dict] = []   # {"row": int, "label": str}
+    for row_idx in sorted(grid):
+        label = _get_label(grid, row_idx)
+        vals = _get_values(grid, row_idx)
+        if vals and _TRUNCATED_RE.search(label):
+            truncated.append({"row": row_idx, "label": label})
+
+    if not truncated:
+        return rows, columns, cells
+
+    # Build validity set — only these row indices may be modified.
+    valid_rows: set[int] = {t["row"] for t in truncated}
+
+    print(f"  [truncated] {len(truncated)} truncated labels detected")
+
+    # --- Phase 1: heuristic completion (free, deterministic) ---
+    heuristic_completions, remaining = _heuristic_complete_labels(truncated)
+
+    if heuristic_completions:
+        print(f"  [heuristic] completed {len(heuristic_completions)} label(s) from lookup table")
+        _apply_label_completions(
+            heuristic_completions, valid_rows, grid, rows, cells,
+            source="heuristic",
+        )
+
+    # --- Phase 2: LLM completion for remaining unknown truncations ---
+    if remaining:
+        print(f"  [LLM] complete_truncated_labels: {len(remaining)} remaining")
+
+        prompt = (
+            f"You are correcting truncated row labels in a "
+            f"{statement_type.replace('_', ' ')} financial statement.\n"
+            f"Each label below was cut off mid-phrase by an OCR tool at a page boundary.\n"
+            f"Complete each label to its full, standard IFRS financial statement wording.\n"
+            f"Keep the completion minimal — only add the words needed to complete the phrase.\n\n"
+            f"Labels to complete:\n{json.dumps(remaining, indent=2)}\n\n"
+            f"Return ONLY a JSON object with key \"completions\" containing an array of objects:\n"
+            f"  {{\"row\": <int>, \"completed_label\": \"<full label>\"}}\n"
+            f"Only include rows where you are confident in the completion."
+ ) + + client = _get_client() + response = client.chat.completions.create( + model=_DEPLOYMENT(), + response_format={"type": "json_object"}, + messages=[ + {"role": "system", "content": "You are a financial data quality expert. Return only valid JSON."}, + {"role": "user", "content": prompt}, + ], + temperature=0, + ) + + raw = response.choices[0].message.content + try: + result = json.loads(raw) + llm_completions = result.get("completions", []) + except (json.JSONDecodeError, AttributeError): + print(" [LLM] complete_truncated_labels: bad JSON response, skipping") + llm_completions = [] + + _apply_label_completions( + llm_completions, valid_rows, grid, rows, cells, source="LLM", + ) + + return rows, columns, cells + + +# --------------------------------------------------------------------------- +# Step 4 — align_underpopulated_columns (subtotal-constrained solver) +# --------------------------------------------------------------------------- +# When Azure CU drops nil-dash markers ("—"), a row ends up with fewer values +# than expected. The parser fills values left-to-right, so they land in the +# wrong columns. +# +# This step uses SECTION SUBTOTALS as ground truth to determine the correct +# column placement. For each section (operating, investing, financing): +# 1. Parse the subtotal row's values (these are always fully populated). +# 2. Sum all fully-populated line-item rows in the section. +# 3. The remaining deficit per column = the sum of under-populated rows' +# correct values for that column. +# 4. Try all possible column placements for each under-populated row and +# pick the combination where the column sums match the deficit. +# +# This is deterministic, free (no LLM call), and constrained by arithmetic. +# Falls back to LLM only if subtotal matching fails. + +from itertools import combinations as _combinations + + +def _parse_value_float(raw: str) -> float: + """Parse a financial value string to float. 
Returns 0.0 for empty/unparseable.""" + s = raw.strip() + if not s: + return 0.0 + # Strip currency symbols + s = re.sub(r"^[\$\u00a5\u20ac\u00a3]\s*", "", s) + if not s: + return 0.0 + neg = False + if s.startswith("(") and s.endswith(")"): + neg = True + s = s[1:-1] + s = s.replace(",", "") + try: + val = float(s) + return -val if neg else val + except ValueError: + return 0.0 + + +def _find_sections( + grid: dict[int, dict[int, str]], + expected_cols: int, +) -> list[dict]: + """ + Identify sections in a cash-flow-style statement. + + A section starts at a section_header and ends at a subtotal row (inclusive). + Returns a list of dicts: + {"subtotal_row": int, "subtotal_vals": [float, ...], + "line_item_rows": [int, ...]} + """ + # Walk rows in order to find subtotal rows and their preceding line items. + sorted_rows = sorted(grid.keys()) + sections: list[dict] = [] + current_items: list[int] = [] + + for row_idx in sorted_rows: + label = grid[row_idx].get(0, "") + vals = [grid[row_idx].get(c, "") for c in range(1, expected_cols + 1)] + non_empty = [v for v in vals if v.strip()] + + label_lower = label.lower().strip() + is_subtotal = ( + (label_lower.startswith("net cash") or + label_lower.startswith("total ")) and + len(non_empty) == expected_cols + ) + + if is_subtotal: + sections.append({ + "subtotal_row": row_idx, + "subtotal_vals": [_parse_value_float(v) for v in vals], + "line_item_rows": list(current_items), + }) + current_items = [] + elif non_empty: + current_items.append(row_idx) + + return sections + + +def align_underpopulated_columns( + statement_type: str, + rows: list[str], + columns: list[str], + cells: list[dict], +) -> tuple[list[str], list[str], list[dict]]: + """ + Fix column alignment for rows that have fewer values than expected. + + Uses section subtotals as arithmetic constraints to determine the correct + column placement. 
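    For illustration, the subtotal-constrained search can be sketched
    standalone (hypothetical row ids, two columns, and the same ±2.0
    tolerance the solver below uses):

```python
from itertools import combinations

def solve(rows, deficit, n_cols, tol=2.0):
    """Assign each under-populated row's values to columns so that
    the per-column sums close the deficit (backtracking search).

    rows: list of (row_id, [values]); deficit: per-column remainder.
    Returns {row_id: [column indices]} or None if no assignment fits.
    """
    if not rows:
        return {} if all(abs(d) < tol for d in deficit) else None
    (row_id, vals), rest = rows[0], rows[1:]
    for combo in combinations(range(n_cols), len(vals)):
        new_deficit = list(deficit)
        for v, c in zip(vals, combo):
            new_deficit[c] -= v
        sub = solve(rest, new_deficit, n_cols, tol)
        if sub is not None:
            sub[row_id] = list(combo)
            return sub
    return None

# Deficit says column 0 is missing 30.0 and column 1 is missing 120.0:
print(solve([("row7", [120.0]), ("row9", [30.0])], [30.0, 120.0], 2))
# -> {'row9': [0], 'row7': [1]}
```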
For each section, computes the deficit between the + subtotal and the sum of fully-populated rows, then searches for a column + assignment of under-populated rows that closes the deficit. + + Falls back to LLM if subtotal-based solving is not possible (e.g. no + subtotals available, or the combinatorial search finds no exact match). + """ + expected_cols = len(columns) + if expected_cols < 2: + return rows, columns, cells + + grid = _build_grid(cells) + + # Classify rows. + underpopulated_map: dict[int, list[str]] = {} # row_idx -> non-empty values + for row_idx in sorted(grid): + vals = [grid[row_idx].get(c, "") for c in range(1, expected_cols + 1)] + non_empty = [v for v in vals if v.strip()] + empty_count = expected_cols - len(non_empty) + if non_empty and empty_count > 0: + underpopulated_map[row_idx] = non_empty + + if not underpopulated_map: + return rows, columns, cells + + print(f" [align] {len(underpopulated_map)} under-populated row(s) " + f"(expected {expected_cols} cols)") + + # Find sections with subtotals. + sections = _find_sections(grid, expected_cols) + + # For each section, solve the column alignment. + solved: dict[int, list[int]] = {} # row_idx -> column_mapping + + for sec in sections: + subtotal = sec["subtotal_vals"] + sec_underpop = [r for r in sec["line_item_rows"] if r in underpopulated_map] + if not sec_underpop: + continue + + # Sum fully-populated rows in this section. + col_sums = [0.0] * expected_cols + for r in sec["line_item_rows"]: + if r not in underpopulated_map: + for c in range(expected_cols): + col_sums[c] += _parse_value_float( + grid[r].get(c + 1, "") + ) + + # Deficit per column = subtotal - sum of fully-populated rows. + deficit = [subtotal[c] - col_sums[c] for c in range(expected_cols)] + + # For each under-populated row, generate all valid column placements. + # Then search for the combination that closes the deficit. 
+ def _solve_rows( + remaining: list[int], + current_deficit: list[float], + ) -> dict[int, list[int]] | None: + """Recursive backtracking solver.""" + if not remaining: + # Check if deficit is approximately zero. + if all(abs(d) < 2.0 for d in current_deficit): + return {} + return None + + row_idx = remaining[0] + rest = remaining[1:] + vals = underpopulated_map[row_idx] + n_vals = len(vals) + parsed_vals = [_parse_value_float(v) for v in vals] + + # Generate all possible column placements (combinations of n_vals + # columns from expected_cols). + for col_combo in _combinations(range(expected_cols), n_vals): + # Check this placement. + new_deficit = list(current_deficit) + for vi, ci in enumerate(col_combo): + new_deficit[ci] -= parsed_vals[vi] + + result = _solve_rows(rest, new_deficit) + if result is not None: + # col_combo is 0-indexed, convert to 1-indexed. + result[row_idx] = [c + 1 for c in col_combo] + return result + + return None + + solution = _solve_rows(sec_underpop, deficit) + if solution: + solved.update(solution) + for r, mapping in solution.items(): + print(f" [align-solve] row {r} '{_get_label(grid, r)}': " + f"values {underpopulated_map[r]} -> columns {mapping}") + else: + print(f" [align] section ending at row {sec['subtotal_row']}: " + f"no exact solution found for {len(sec_underpop)} row(s)") + + # Apply solved alignments. + for row_idx, col_mapping in solved.items(): + current_vals = underpopulated_map[row_idx] + + label_cell = None + for c in cells: + if c["row"] == row_idx and c["col"] == 0: + label_cell = c + break + if not label_cell: + continue + + rt = label_cell.get("row_type", "line_item") + il = label_cell.get("indent_level", 0) + + # Remove existing value cells for this row. + cells = [c for c in cells if not (c["row"] == row_idx and c["col"] > 0)] + + # Re-emit value cells at the correct column positions. 
+ for col_idx in range(1, expected_cols + 1): + if col_idx in col_mapping: + val_pos = col_mapping.index(col_idx) + val = current_vals[val_pos] + else: + val = "" + cells.append({ + "row": row_idx, "col": col_idx, "content": val, + "kind": "content", "currency": None, + "row_type": rt, "indent_level": il, + }) + + # Remove column_alignment_warning from label cell if present. + if "column_alignment_warning" in label_cell: + del label_cell["column_alignment_warning"] + + # LLM fallback for unsolved rows. + unsolved = [r for r in underpopulated_map if r not in solved] + if unsolved: + print(f" [align-LLM] {len(unsolved)} row(s) unsolved, using LLM fallback") + + unsolved_data = [ + {"row": r, "label": _get_label(grid, r), + "values": underpopulated_map[r]} + for r in unsolved + ] + + # Gather all rows (fully populated) as context. + context = [] + for row_idx in sorted(grid): + vals = _get_values(grid, row_idx) + non_empty = [v for v in vals if v.strip()] + if len(non_empty) == expected_cols: + context.append({ + "row": row_idx, + "label": _get_label(grid, row_idx), + "values": non_empty, + }) + + prompt = ( + f"You are fixing column alignment in a {statement_type.replace('_', ' ')}.\n" + f"Columns: {json.dumps(columns)}\n\n" + f"Column 1 = Q current year, Column 2 = Q prior year, " + f"Column 3 = FY current year, Column 4 = FY prior year.\n" + f"FY values are ALWAYS >= Q values in absolute magnitude.\n" + f"Prior-year-only items go in columns 2 or 4.\n\n" + f"Reference rows:\n{json.dumps(context[:10], indent=2)}\n\n" + f"Fix these rows:\n{json.dumps(unsolved_data, indent=2)}\n\n" + f"Return JSON: {{\"alignments\": [{{\"row\": int, " + f"\"column_mapping\": [col_idx, ...]}}]}}" + ) + + try: + client = _get_client() + response = client.chat.completions.create( + model=_DEPLOYMENT(), + response_format={"type": "json_object"}, + messages=[ + {"role": "system", "content": "You are a financial data quality expert. 
Return only valid JSON."}, + {"role": "user", "content": prompt}, + ], + temperature=0, + ) + result = json.loads(response.choices[0].message.content) + for alignment in result.get("alignments", []): + row_idx = alignment.get("row") + col_mapping = alignment.get("column_mapping", []) + if row_idx not in underpopulated_map: + continue + vals = underpopulated_map[row_idx] + if len(col_mapping) != len(vals): + continue + if not all(isinstance(c, int) and 1 <= c <= expected_cols for c in col_mapping): + continue + # Apply the alignment. + label_cell = next((c for c in cells if c["row"] == row_idx and c["col"] == 0), None) + if not label_cell: + continue + rt = label_cell.get("row_type", "line_item") + il = label_cell.get("indent_level", 0) + cells = [c for c in cells if not (c["row"] == row_idx and c["col"] > 0)] + for col_idx in range(1, expected_cols + 1): + val = vals[col_mapping.index(col_idx)] if col_idx in col_mapping else "" + cells.append({ + "row": row_idx, "col": col_idx, "content": val, + "kind": "content", "currency": None, + "row_type": rt, "indent_level": il, + }) + if "column_alignment_warning" in label_cell: + del label_cell["column_alignment_warning"] + print(f" [align-LLM] row {row_idx} '{_get_label(grid, row_idx)}': " + f"values {vals} -> columns {col_mapping}") + except Exception as e: + print(f" [align-LLM] fallback failed: {e}") + + return rows, columns, cells + + +# --------------------------------------------------------------------------- +# Public entry point +# --------------------------------------------------------------------------- + +def reconcile( + statement_type: str, + rows: list[str], + columns: list[str], + cells: list[dict], +) -> tuple[list[str], list[str], list[dict]]: + """ + Run all four reconciliation steps in order: + 1. suppress_noise_rows (heuristic, always runs) + 2. reconcile_suspect_ghost (LLM, only when suspects+ghosts detected) + 3. complete_truncated_labels (LLM, only when truncated labels detected) + 4. 
align_underpopulated_columns (subtotal solver with LLM fallback,
+         only when under-populated rows exist)
+
+    Returns updated (rows, columns, cells).
+    Raises EnvironmentError if Azure OpenAI credentials are missing.
+    """
+    rows, columns, cells = suppress_noise_rows(rows, columns, cells)
+    rows, columns, cells = reconcile_suspect_ghost(statement_type, rows, columns, cells)
+    rows, columns, cells = complete_truncated_labels(statement_type, rows, columns, cells)
+    rows, columns, cells = align_underpopulated_columns(statement_type, rows, columns, cells)
+    return rows, columns, cells
diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/extractor/llm_table_classifier.py b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/llm_table_classifier.py
new file mode 100644
index 000000000..a9209c02e
--- /dev/null
+++ b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/llm_table_classifier.py
@@ -0,0 +1,273 @@
+"""
+LLM-based table classifier for financial statement extraction.
+
+Replaces hardcoded heading patterns and content label matching with a single
+LLM call that classifies all tables in the document markdown.
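For illustration, the offset-based table scanning this module performs over
the markdown can be sketched standalone (simplified — no size cap on the
table HTML):

```python
def table_spans(markdown: str) -> list[tuple[int, int]]:
    """Return (start, end) offsets of each <table>...</table> block."""
    spans, search_start = [], 0
    while True:
        start = markdown.find("<table", search_start)
        if start < 0:
            break
        end = markdown.find("</table>", start)
        if end < 0:
            break  # unterminated table: stop scanning
        end += len("</table>")
        spans.append((start, end))
        search_start = end  # continue after this table
    return spans

doc = "intro <table><tr><td>1</td></tr></table> middle <table></table>"
print(table_spans(doc))
```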
+
+Public API:
+    classify_tables(markdown) -> list[TableClassification]
+"""
+import json
+import logging
+import os
+import re
+from dataclasses import dataclass
+from typing import Optional
+
+from openai import AzureOpenAI
+from dotenv import load_dotenv
+
+load_dotenv(override=False)
+
+logger = logging.getLogger(__name__)
+
+# ---------------------------------------------------------------------------
+# Azure OpenAI client (reuses same pattern as llm_reconciler)
+# ---------------------------------------------------------------------------
+
+_client: Optional[AzureOpenAI] = None
+
+
+def _get_client() -> AzureOpenAI:
+    global _client
+    if _client is None:
+        from azure.identity import DefaultAzureCredential, get_bearer_token_provider
+        endpoint = os.environ.get("AZURE_OPENAI_ENDPOINT", "").rstrip("/")
+        api_ver = os.environ.get("AZURE_OPENAI_API_VERSION", "2024-02-01")
+        if not endpoint:
+            raise EnvironmentError("AZURE_OPENAI_ENDPOINT must be set.")
+        api_key = os.environ.get("AZURE_OPENAI_KEY")
+        if api_key:
+            _client = AzureOpenAI(
+                azure_endpoint=endpoint,
+                api_key=api_key,
+                api_version=api_ver,
+            )
+        else:
+            token_provider = get_bearer_token_provider(
+                DefaultAzureCredential(),
+                "https://cognitiveservices.azure.com/.default",
+            )
+            _client = AzureOpenAI(
+                azure_endpoint=endpoint,
+                azure_ad_token_provider=token_provider,
+                api_version=api_ver,
+            )
+    return _client
+
+
+_DEPLOYMENT = lambda: os.environ.get("AZURE_OPENAI_DEPLOYMENT", "gpt-4.1")
+
+
+# ---------------------------------------------------------------------------
+# Data structures
+# ---------------------------------------------------------------------------
+
+@dataclass
+class TableClassification:
+    """Classification result for a single table in the document."""
+    table_index: int
+    md_offset: int        # Absolute offset of <table> in markdown
+    md_end_offset: int    # Absolute offset of </table> end
+    statement_type: str   # "balance_sheet", "income_statement", "cash_flow", "other"
+    confidence: float     # 0.0 - 1.0
+    is_consolidated: bool
+    reasoning: str        # Brief explanation from LLM
+
+
+# ---------------------------------------------------------------------------
+# Table extraction helpers
+# ---------------------------------------------------------------------------
+
+def _extract_table_summaries(markdown: str, max_rows: int = 8) -> list[dict]:
+    """Extract a summary of each table in the markdown for LLM classification.
+
+    Returns list of dicts with: index, offset, end_offset, header_context, first_rows.
+    """
+    tables = []
+    search_start = 0
+
+    while True:
+        table_start = markdown.find("<table", search_start)
+        if table_start < 0:
+            break
+        table_end = markdown.find("</table>", table_start)
+        if table_end < 0:
+            break
+        table_end += len("</table>")
+
+        table_html = markdown[table_start:min(table_end, table_start + 8000)]
+
+        # Extract header context (text before the table, e.g., headings/captions)
+        context_start = max(0, table_start - 300)
+        header_context = markdown[context_start:table_start]
+        # Clean to just text
+        header_context = re.sub(r"<[^>]+>", " ", header_context)
+        header_context = re.sub(r"\s+", " ", header_context).strip()[-200:]
+
+        # Extract first N rows of cell content
+        rows_html = re.findall(r"<tr.*?>(.*?)</tr>", table_html, re.DOTALL | re.IGNORECASE)
+        row_texts = []
+        for row_html in rows_html[:max_rows]:
+            cells = re.findall(r"<t[dh][^>]*>(.*?)</t[dh]>", row_html, re.DOTALL | re.IGNORECASE)
+            cell_texts = [re.sub(r"<[^>]+>", "", c).strip() for c in cells]
+            row_texts.append(cell_texts)
+
+        # Count total rows (for size context)
+        total_rows = len(rows_html)
+
+        tables.append({
+            "index": len(tables),
+            "offset": table_start,
+            "end_offset": table_end,
+            "header_context": header_context,
+            "first_rows": row_texts,
+            "total_rows": total_rows,
+        })
+
+        search_start = table_end
+
+    return tables
+
+
+def _build_classification_prompt(table_summaries: list[dict]) -> str:
+    """Build the LLM prompt for table classification."""
+    table_descriptions = []
+
+    for t in table_summaries:
+        rows_display = []
+        for row in t["first_rows"]:
+            rows_display.append(" | ".join(row))
+
+        desc = (
+            f"--- Table {t['index']} ({t['total_rows']} rows) ---\n"
+            f"Context before table: {t['header_context'][:200]}\n"
+            f"First rows:\n" + "\n".join(rows_display)
+        )
+        table_descriptions.append(desc)
+
+    tables_text = "\n\n".join(table_descriptions)
+
+    return f"""You are a financial statement classifier. Given the following tables extracted from a financial report PDF, classify each table.
+
+For each table, determine:
+1. **statement_type**: One of "balance_sheet", "income_statement", "cash_flow", or "other"
+2. **confidence**: 0.0 to 1.0 how confident you are
+3.
**is_consolidated**: true if this is a consolidated (group) statement, false if parent/standalone +4. **reasoning**: One sentence explaining why + +Classification rules: +- **balance_sheet**: Contains assets, liabilities, equity. Has "total assets" or equivalent. +- **income_statement**: Contains revenue/sales, expenses, profit/loss. Shows profitability over a period. Includes: profit & loss, statement of comprehensive income (if it contains revenue). +- **cash_flow**: Contains operating/investing/financing cash flows. Shows cash movements. +- **other**: Notes, summaries, segment data, OCI-only statements without revenue, parent company statements, or any non-primary financial statement. + +Important: +- Prefer CONSOLIDATED statements over parent company statements. +- If a table is a parent company statement (母公司, parent company, individuel), classify as "other". +- A "Statement of Comprehensive Income" that contains revenue IS an income_statement. +- A "Statement of Comprehensive Income" with ONLY hedging/translation items is "other". +- Tables with very few rows (< 5) that look like summary/index tables are "other". + +Tables: +{tables_text} + +Respond with a JSON object: {{"tables": [...]}} where each element has: table_index (int), statement_type (string), confidence (float), is_consolidated (bool), reasoning (string).""" + + +# --------------------------------------------------------------------------- +# Public API +# --------------------------------------------------------------------------- + +def classify_tables(markdown: str) -> list[TableClassification]: + """Classify all tables in the document markdown using LLM. + + Returns a list of TableClassification objects, one per table found. 
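    The tolerant response handling (a {"tables": [...]} object, a
    {"classifications": [...]} object, or a bare JSON array) can be sketched
    standalone:

```python
import json

def extract_items(raw: str) -> list:
    """Normalise an LLM JSON reply to a list of classification dicts."""
    parsed = json.loads(raw)
    if isinstance(parsed, list):
        return parsed  # model returned a bare array
    if isinstance(parsed, dict):
        # accept either wrapper key, else nothing
        return parsed.get("tables", parsed.get("classifications", []))
    return []

print(extract_items('{"tables": [{"table_index": 0}]}'))  # -> [{'table_index': 0}]
print(extract_items('[1, 2]'))                            # -> [1, 2]
```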
+ """ + table_summaries = _extract_table_summaries(markdown) + + if not table_summaries: + logging.info("[LLM Classifier] No tables found in markdown") + return [] + + logging.info(f"[LLM Classifier] Classifying {len(table_summaries)} tables...") + + prompt = _build_classification_prompt(table_summaries) + + try: + client = _get_client() + response = client.chat.completions.create( + model=_DEPLOYMENT(), + messages=[{"role": "user", "content": prompt}], + temperature=0.0, + max_tokens=4000, + response_format={"type": "json_object"}, + ) + + raw = response.choices[0].message.content.strip() + parsed = json.loads(raw) + + # Handle both {"tables": [...]} and bare [...] formats + if isinstance(parsed, list): + classifications_raw = parsed + elif isinstance(parsed, dict): + classifications_raw = parsed.get("tables", parsed.get("classifications", [])) + else: + classifications_raw = [] + + except Exception as e: + logging.warning(f"[LLM Classifier] LLM call failed: {e}") + return [] + + # Build offset lookup + offset_map = {t["index"]: t for t in table_summaries} + + results = [] + for item in classifications_raw: + idx = item.get("table_index", -1) + t = offset_map.get(idx) + if not t: + continue + + results.append(TableClassification( + table_index=idx, + md_offset=t["offset"], + md_end_offset=t["end_offset"], + statement_type=item.get("statement_type", "other"), + confidence=float(item.get("confidence", 0.0)), + is_consolidated=bool(item.get("is_consolidated", False)), + reasoning=item.get("reasoning", ""), + )) + + # Log results + for r in results: + if r.statement_type != "other": + logging.info( + f"[LLM Classifier] Table {r.table_index}: {r.statement_type} " + f"(confidence={r.confidence:.2f}, consolidated={r.is_consolidated}) " + f"— {r.reasoning}" + ) + + return results + + +def get_best_table( + classifications: list[TableClassification], + statement_type: str, + min_confidence: float = 0.5, +) -> Optional[TableClassification]: + """Get the best classified 
table for a given statement type.
+
+    Prefers consolidated over non-consolidated, then highest confidence.
+    """
+    matches = [
+        c for c in classifications
+        if c.statement_type == statement_type and c.confidence >= min_confidence
+    ]
+
+    if not matches:
+        return None
+
+    # Sort: consolidated first, then by confidence descending
+    matches.sort(key=lambda c: (c.is_consolidated, c.confidence), reverse=True)
+    return matches[0]
diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/extractor/pdfplumber_adapter.py b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/pdfplumber_adapter.py
new file mode 100644
index 000000000..fad775a34
--- /dev/null
+++ b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/pdfplumber_adapter.py
@@ -0,0 +1,153 @@
+"""
+extractor/pdfplumber_adapter.py
+-------------------------------
+Converts pdfplumber extraction output into the AnalyzeResult-compatible format
+that the existing pipeline (Stages 2-5) expects.
+
+Key responsibilities:
+  - Convert pdfplumber tables to HTML strings
+  - Reconstruct markdown with page text + embedded HTML tables
+  - Build page_map from page markers
+  - Classify statements via LLM (same as Textract adapter)
+"""
+import json
+import logging
+import re
+from typing import Optional
+
+logger = logging.getLogger(__name__)
+
+
+def _table_to_html(table: list[list[str | None]]) -> str:
+    """Convert a pdfplumber table (list of rows, each a list of cell values) to HTML.
+
+    First row is treated as the header row (<th>); all other rows use <td>.
+    Newlines within cell values are replaced with spaces.
+    """
+    if not table or not table[0]:
+        return ""
+
+    rows_html = []
+    for row_idx, row in enumerate(table):
+        cells_html = []
+        for cell in row:
+            text = (cell or "").replace("\n", " ").strip()
+            # First row gets <th> tags
+            tag = "th" if row_idx == 0 else "td"
+            cells_html.append(f"<{tag}>{text}</{tag}>")
+        rows_html.append(f"<tr>{''.join(cells_html)}</tr>")
+
+    return f"<table>{''.join(rows_html)}</table>"
+
+
+def reconstruct_markdown(pdfplumber_result: dict) -> str:
+    """Build markdown with embedded HTML tables from pdfplumber output.
+
+    For each page:
+      1. Insert a page-number marker (<!-- PageNumber="n" -->)
+      2. Add page text lines (excluding text that appears in tables)
+      3. Add each table as HTML
+
+    The extract stage's heading search works on this text,
+    and html_table_parser.py parses the <table> blocks.
+    """
+    parts = []
+
+    for page_data in pdfplumber_result["pages"]:
+        page_num = page_data["page_number"]
+        text = page_data["text"]
+        tables = page_data["tables"]
+
+        # Page marker for page_map
+        parts.append(f'<!-- PageNumber="{page_num}" -->')
+
+        # Add page text (preserves headings for the heading search in extract.py)
+        if text:
+            parts.append(f"\n{text}\n")
+
+        # Add each table as HTML
+        for table in tables:
+            html = _table_to_html(table)
+            if html:
+                parts.append(f"\n\n{html}\n\n")
+
+    return "".join(parts)
+
+
+def build_page_map(
+    pdfplumber_result: dict,
+    markdown: str,
+) -> list[tuple[int, int, int]]:
+    """Build pipeline-compatible page_map from the page-number markers."""
+    page_map = []
+    markers = list(re.finditer(r'<!-- PageNumber="(\d+)" -->', markdown))
+
+    for i, match in enumerate(markers):
+        page_num = int(match.group(1))
+        start = match.start()
+        end = markers[i + 1].start() if i + 1 < len(markers) else len(markdown)
+        page_map.append((start, end, page_num))
+
+    return page_map
+
+
+def classify_statements_with_llm(markdown: str) -> list[dict]:
+    """Use Azure OpenAI to classify financial statements in the document.
+
+    Same approach as textract_adapter.classify_statements_with_llm.
+    Returns empty list if LLM is not available (graceful fallback).
+    """
+    # Lazy imports to avoid circular imports and env var issues at module level
+    try:
+        from extractor.llm_reconciler import _get_client, _DEPLOYMENT
+    except Exception:
+        logger.warning("[pdfplumber_adapter] Could not import LLM client, skipping classification")
+        return []
+
+    snippet = markdown[:8000]
+
+    prompt = f"""You are a financial document analyst. Given the following extracted text from a financial report, identify each financial statement present.
+
+For each statement found, return:
+- statement_type: one of "balance_sheet", "income_statement", "cash_flow"
+- title_raw: the exact title as it appears in the document
+- currency: ISO 4217 currency code (e.g. "USD", "EUR", "GBP", "CNY")
+- unit: the unit of values (e.g.
"millions", "thousands", "ones") +- accounting_standard: e.g. "IFRS", "US_GAAP", "Chinese_ASBE", "Japanese_GAAP", or null +- is_consolidated: true if consolidated/group statement, false if standalone +- report_language: ISO 639-1 language code (e.g. "en", "fr", "zh") +- company_name: name of the reporting entity (in English) + +If you cannot determine a field, use null. + +Document text: +{snippet} + +Respond with a JSON object: {{"statements": [...]}}""" + + try: + client = _get_client() + response = client.chat.completions.create( + model=_DEPLOYMENT(), + response_format={"type": "json_object"}, + messages=[ + {"role": "system", "content": "You are a financial document analyst. Return only valid JSON."}, + {"role": "user", "content": prompt}, + ], + temperature=0.0, + max_tokens=2000, + ) + + raw = response.choices[0].message.content.strip() + parsed = json.loads(raw) + + if isinstance(parsed, dict): + return parsed.get("statements", []) + elif isinstance(parsed, list): + return parsed + return [] + + except Exception as e: + logger.warning(f"[pdfplumber_adapter] LLM classification failed: {e}") + return [] diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/extractor/pdfplumber_client.py b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/pdfplumber_client.py new file mode 100644 index 000000000..d8dd0c476 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/pdfplumber_client.py @@ -0,0 +1,47 @@ +""" +extractor/pdfplumber_client.py +------------------------------ +Local PDF extraction using pdfplumber. No cloud service needed. + +Returns structured data: pages with tables and text lines. +""" +import logging +import pdfplumber + +logger = logging.getLogger(__name__) + + +def extract_document(file_path: str) -> dict: + """Extract tables and text from a PDF using pdfplumber. 
+
+    Returns:
+        {
+          "pages": [
+            {
+              "page_number": 1,   # 1-based
+              "text": "full page text...",
+              "tables": [
+                [["header1", "header2"], ["row1col1", "row1col2"], ...]
+              ]
+            },
+            ...
+          ]
+        }
+    """
+    pages = []
+
+    # Context manager guarantees the file handle is closed even if
+    # extraction raises partway through.
+    with pdfplumber.open(file_path) as pdf:
+        for i, page in enumerate(pdf.pages):
+            page_text = page.extract_text() or ""
+            page_tables = page.extract_tables() or []
+
+            pages.append({
+                "page_number": i + 1,
+                "text": page_text,
+                "tables": page_tables,
+            })
+
+    logger.info(f"pdfplumber: {len(pages)} pages, {sum(len(p['tables']) for p in pages)} tables")
+    return {"pages": pages}
diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/extractor/pipeline.py b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/pipeline.py
new file mode 100644
index 000000000..89be0c50d
--- /dev/null
+++ b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/pipeline.py
@@ -0,0 +1,196 @@
+"""
+extractor/pipeline.py — Extraction pipeline orchestrator.
+
+Runs 5 stages in sequence:
+  1. Analyze (CU Locator + markdown + enrichment)
+  2. Select (score-based candidate selection)
+  3. Extract (page-constrained table location + HTML parse)
+  4. Enrich (translate + verify + clean)
+  5. Validate (structural quality gates)
+
+Public API:
+  run(pdf_path, options) -> dict (same shape as old _run_pipeline output)
+"""
+import logging
+from typing import Optional
+
+from .stages.contracts import (
+    PipelineOptions,
+    PipelineResult,
+    QualityStatus,
+)
+
+logger = logging.getLogger(__name__)
+
+STATEMENT_TYPES = ["balance_sheet", "income_statement", "cash_flow"]
+
+
+def run(
+    pdf_path: str,
+    options: Optional[PipelineOptions] = None,
+) -> dict:
+    """Run the full extraction pipeline and return the result dict.
+
+    This is the single entry point replacing function_app._run_pipeline().
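A minimal, self-contained sketch of consuming the result shape described below (all values here are illustrative, not from a real run):

```python
# Illustrative result from run(); statement keys are None whenever the
# corresponding summary status is not "extracted".
result = {
    "summary": [
        {"statement_type": "income_statement", "status": "extracted",
         "page_range": {"start": 3, "end": 4}},
        {"statement_type": "cash_flow", "status": "not_found",
         "page_range": {"start": None, "end": None}},
    ],
    "income_statement": {"rows": [{"label_raw": "Revenue"}]},
    "cash_flow": None,
    "confidence": {"income_statement": {"score": 0.92, "level": "high"}},
}

extracted = [s["statement_type"] for s in result["summary"]
             if s["status"] == "extracted"]
```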
+ The return dict has the same shape as the old output: + { + "summary": [...], + "balance_sheet": { v1.2 schema } | None, + "income_statement": { v1.2 schema } | None, + "cash_flow": { v1.2 schema } | None, + "confidence": { ... }, + } + """ + from .stages.analyze import run_analyze + from .stages.select import run_select + from .stages.extract import run_extract + from .stages.enrich import run_enrich + from .stages.validate import run_validate + from .confidence_scorer import score_statement + + if options is None: + options = PipelineOptions() + + # -- Stage 1: Analyze -- + logger.info("Pipeline Stage 1/5: Analyze") + analyze_result = run_analyze(pdf_path, options) + + # -- Stage 2: Select -- + logger.info("Pipeline Stage 2/5: Select") + select_result = run_select(analyze_result, options.requested_types) + + # Diagnostic: log select results for each statement type + for stype in STATEMENT_TYPES: + if stype in select_result.selected: + c = select_result.selected[stype] + scores = select_result.scores.get(stype, []) + best_score = scores[0].score if scores else "?" 
+ logging.info( + f"[PIPELINE] Select {stype}: SELECTED title_raw='{c.title_raw}' " + f"pages={c.page_start}-{c.page_end} score={best_score}" + ) + else: + scores = select_result.scores.get(stype, []) + if scores: + logging.info( + f"[PIPELINE] Select {stype}: REJECTED " + f"{len(scores)} candidates, best_score={scores[0].score:.0f}, " + f"reason={scores[0].rejection_reason}, " + f"title='{scores[0].candidate.title_raw[:60]}'" + ) + else: + logging.info(f"[PIPELINE] Select {stype}: NO CANDIDATES from CU Locator") + + # -- Stage 3: Extract -- + logger.info("Pipeline Stage 3/5: Extract") + extract_result = run_extract( + select_result, + analyze_result.markdown, + analyze_result.page_map, + analyze_result.pages, + requested_types=options.requested_types, + ) + + # Diagnostic: log extract results + for stype in STATEMENT_TYPES: + if stype in extract_result.statements: + es = extract_result.statements[stype] + logging.info( + f"[PIPELINE] Extract {stype}: OK rows={len(es.rows)} " + f"pages={es.start_page}-{es.end_page}" + ) + elif stype in extract_result.failures: + logging.info( + f"[PIPELINE] Extract {stype}: FAILED reason={extract_result.failures[stype]}" + ) + elif stype in select_result.selected: + logging.info(f"[PIPELINE] Extract {stype}: MISSING (selected but not extracted)") + else: + logging.info(f"[PIPELINE] Extract {stype}: SKIPPED (not selected)") + + # -- Stage 4: Enrich -- + logger.info("Pipeline Stage 4/5: Enrich") + enrich_result = run_enrich( + extract_result, + {stype: c for stype, c in select_result.selected.items()}, + analyze_result.enrichment_lookup, + analyze_result.markdown, + options.source_file_name, + ) + + # -- Stage 5: Validate -- + logger.info("Pipeline Stage 5/5: Validate") + validate_result = run_validate(enrich_result) + + # -- Build output dict (backward compatible) -- + output: dict = {"summary": []} + + for stype in STATEMENT_TYPES: + if stype not in options.requested_types: + output["summary"].append({ + "statement_type": stype, + 
"status": "not_requested", + "page_range": {"start": None, "end": None}, + }) + output[stype] = None + continue + + # Check if statement was extracted + validated = validate_result.statements.get(stype) + extracted = extract_result.statements.get(stype) + + if validated: + v12_doc = validated.v12_doc + output[stype] = v12_doc + output["summary"].append({ + "statement_type": stype, + "status": "extracted", + "page_range": { + "start": extracted.start_page if extracted else None, + "end": extracted.end_page if extracted else None, + }, + "row_count": len(v12_doc.get("rows", [])), + "column_count": len(v12_doc.get("columns", [])), + "validation_status": v12_doc.get("validation", {}).get("status"), + "quality_score": validated.quality_score, + "quality_status": validated.status.value, + }) + logger.info( + f"{stype}: {len(v12_doc.get('rows', []))} rows, " + f"quality={validated.quality_score:.2f} ({validated.status.value})" + ) + elif stype in extract_result.failures: + reason = extract_result.failures[stype] + status = "found_but_not_extracted" if reason == "could_not_locate_in_markdown" else "found_but_empty" + candidate = select_result.selected.get(stype) + output["summary"].append({ + "statement_type": stype, + "status": status, + "page_range": { + "start": candidate.page_start if candidate else None, + "end": candidate.page_end if candidate else None, + }, + }) + output[stype] = None + else: + output["summary"].append({ + "statement_type": stype, + "status": "not_found", + "page_range": {"start": None, "end": None}, + }) + output[stype] = None + + # -- Confidence scoring (existing module, reused as-is) -- + confidence = {} + for stype in STATEMENT_TYPES: + stmt = output.get(stype) + if stmt and isinstance(stmt, dict) and stmt.get("rows"): + confidence[stype] = score_statement(stmt, stype) + else: + confidence[stype] = { + "score": 0.0, "level": "low", + "signals": {}, "flagged_rows": [], + } + output["confidence"] = confidence + + return output diff --git 
a/samples/mcs-finance-statement-agent/src/azure-functions/extractor/review_endpoints.py b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/review_endpoints.py new file mode 100644 index 000000000..1e5c98c14 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/review_endpoints.py @@ -0,0 +1,249 @@ +""" +extractor/review_endpoints.py +------------------------------ +Business logic for HITL review endpoints: + - /build-review-card — generate Adaptive Card JSON for statement review + - /parse-card-submission — advance session state from card payload + - /apply-corrections — merge analyst corrections into extraction result + +These endpoints implement a session-state-driven review flow where the +backend owns all review logic. The CPS topic just shuttles sessionState +and card JSON back and forth. + +Public API: + handle_build_review_card(job_id, session_state_str) -> (status_code, dict) + handle_parse_card_submission(session_state_str, payload_raw) -> (status_code, dict) + handle_apply_corrections(job_id, session_state_str) -> (status_code, dict) +""" + +import json +import logging + +from extractor.job_store import ( + load_job, delete_job, parse_stmt_from_result, + STATEMENT_TYPES, SNAKE_TO_CAMEL, +) + +logger = logging.getLogger(__name__) + +# Unit display names for the navigator card +_UNIT_DISPLAY = { + "元": "Yuan (ones)", "千元": "Thousands (CNY)", "万元": "Ten-thousands (CNY)", + "百万元": "Millions (CNY)", "亿元": "Hundred-millions (CNY)", + "ones": "Ones", "thousands": "Thousands", "millions": "Millions", + "billions": "Billions", +} + + +# --------------------------------------------------------------------------- +# /build-review-card +# --------------------------------------------------------------------------- + +def handle_build_review_card( + job_id: str, session_state_str: str +) -> tuple[int, dict]: + """Generate an Adaptive Card JSON payload for HITL review. 
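The first-call guard below treats a missing sessionState, a bare `{}`, or a double-encoded `"{}"` as "no session yet"; as a standalone sketch:

```python
def is_first_call(session_state_str: str) -> bool:
    # Missing, "{}", or a double-encoded '"{}"' all mean no session yet,
    # so the navigator card should be returned.
    return not session_state_str or session_state_str.strip() in ('{}', '"{}"')
```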
+ + On first call (empty sessionState): initialises session state from + confidence data and returns the navigator card with extraction summary. + + On subsequent calls: uses session state to determine which statement + card to build (paginated review flow). + """ + from extractor.card_builder import ( + build_navigator_card, + build_statement_review_card, + init_session_state, + ) + + job = load_job(job_id) + if not job or job.get("status") != "completed": + return 404, {"error": f"Job {job_id} not found"} + + result = job.get("result", {}) + + # Parse confidence data + confidence_str = result.get("confidence") or "{}" + confidence = json.loads(confidence_str) if isinstance(confidence_str, str) else confidence_str + + # --- First call: return navigator card --- + if not session_state_str or session_state_str.strip() in ('{}', '"{}"'): + available = [ + stype for stype in STATEMENT_TYPES + if parse_stmt_from_result(result, stype) is not None + ] + session_state_str = init_session_state( + confidence, job_id=job_id, available_statements=available, + ) + + # Extract metadata from first available statement + company_name = result.get("companyName") or result.get("company_name", "Unknown") + currency = "USD" + unit = "ones" + for stype in STATEMENT_TYPES: + stmt = parse_stmt_from_result(result, stype) + if stmt and isinstance(stmt, dict): + meta = stmt.get("statement_metadata", {}) + currency = meta.get("currency", currency) + raw_unit = meta.get("unit", unit) + unit = _UNIT_DISPLAY.get(raw_unit, raw_unit) + break + + summary_str = result.get("summary", "[]") + summary = json.loads(summary_str) if isinstance(summary_str, str) else summary_str + + card = build_navigator_card(company_name, currency, unit, confidence, summary, job_id=job_id) + return 200, { + "cardJson": json.dumps(card, ensure_ascii=False), + "sessionState": session_state_str, + } + + # --- Subsequent calls: build statement review card --- + try: + state = json.loads(session_state_str) + except 
(json.JSONDecodeError, TypeError): + return 400, {"error": "Malformed sessionState"} + + step = state.get("step", 1) + statements = state.get("statements", []) + total_steps = len(statements) + + if step < 1 or step > total_steps: + return 400, {"error": f"Invalid step {step} for {total_steps} statements"} + + stype = statements[step - 1] + stmt = parse_stmt_from_result(result, stype) + if not stmt: + return 404, {"error": f"Statement {stype} not found in job"} + + confidence_entry = confidence.get(stype, {"score": 0, "level": "low", "flagged_rows": []}) + corrections_for_stmt = state.get("corrections", {}).get(stype, {}) + + card = build_statement_review_card( + statement_type=stype, + statement_json=stmt, + confidence_entry=confidence_entry, + corrections=corrections_for_stmt, + step_num=step, + total_steps=total_steps, + editable=state.get("editable", False), + edit_all=state.get("editAll", False), + edit_all_page=state.get("editAllPage", 0), + ) + + return 200, { + "cardJson": json.dumps(card, ensure_ascii=False), + "sessionState": session_state_str, + } + + +# --------------------------------------------------------------------------- +# /parse-card-submission +# --------------------------------------------------------------------------- + +def handle_parse_card_submission( + session_state_str: str, payload_raw: str | dict +) -> tuple[int, dict]: + """Parse an Adaptive Card submission and advance the session state machine. + + Resolves the current statement from sessionState, diffs corrections, + and returns a topic-facing action (continue/done/skip) plus updated state. 
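The 1-based step bookkeeping used below can be sketched standalone (the sessionState content is illustrative):

```python
import json

# sessionState as the topic would send it back (illustrative content)
session_state_str = (
    '{"phase": "review", "step": 2, '
    '"statements": ["balance_sheet", "income_statement"]}'
)
state = json.loads(session_state_str)

# step indexes 1-based into the ordered statements list
stype = None
if state.get("phase") == "review":
    step = state.get("step", 0)
    statements = state.get("statements", [])
    if 0 < step <= len(statements):
        stype = statements[step - 1]
```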
+ """ + from extractor.card_builder import advance_session_state + + if not session_state_str: + return 400, {"error": "Missing sessionState"} + + # Parse payload (topic sends as JSON string) + if isinstance(payload_raw, str): + try: + payload = json.loads(payload_raw) + except json.JSONDecodeError: + payload = {} + else: + payload = payload_raw + if not isinstance(payload, dict): + payload = {} + + logger.info(f"parse-card-submission: sessionState={session_state_str[:200]}, payload={json.dumps(payload)[:200]}") + + # Resolve statement JSON from blob (needed for correction diffing) + try: + state = json.loads(session_state_str) + except (json.JSONDecodeError, TypeError): + return 400, {"error": "Malformed sessionState"} + + statement_json = None + if state.get("phase") == "review": + step = state.get("step", 0) + statements = state.get("statements", []) + if 0 < step <= len(statements): + stype = statements[step - 1] + job_id = state.get("jobId") or payload.get("jobId") + if job_id: + job = load_job(job_id) + if job and job.get("status") == "completed": + statement_json = parse_stmt_from_result(job.get("result", {}), stype) + if statement_json is None: + logger.warning( + f"parse-card-submission: could not load statement for step={step}" + ) + + topic_action, updated_state = advance_session_state(session_state_str, payload, statement_json) + + return 200, { + "action": topic_action, + "sessionState": updated_state, + } + + +# --------------------------------------------------------------------------- +# /apply-corrections +# --------------------------------------------------------------------------- + +def handle_apply_corrections( + job_id: str, session_state_str: str +) -> tuple[int, dict]: + """Apply accumulated analyst corrections to a cached extraction result. + + Corrections are stored in sessionState across all reviewed statements. + Returns the corrected statement JSONs ready for Excel generation. 
+ """ + # Parse session state + try: + state = json.loads(session_state_str) if session_state_str else {} + except (json.JSONDecodeError, TypeError): + return 400, {"error": "Malformed sessionState"} + corrections = state.get("corrections", {}) + + # Load cached result + job = load_job(job_id) + if not job or job.get("status") != "completed": + return 404, {"error": f"Job {job_id} not found or not completed"} + + result = job.get("result", {}) + + # Apply corrections per statement + from extractor.corrections import apply_corrections as _apply + for stype in STATEMENT_TYPES: + stmt = parse_stmt_from_result(result, stype) + stmt_corrections = corrections.get(stype, {}) + if stmt and stmt_corrections: + stmt = _apply(stmt, stmt_corrections) + camel_key = SNAKE_TO_CAMEL.get(stype, stype) + result[camel_key] = json.dumps(stmt, ensure_ascii=False) if stmt else "null" + + # Build final payload + payload = { + "status": "corrected", + "companyName": result.get("companyName") or result.get("company_name", "Unknown_Company"), + "summary": result.get("summary", "[]"), + "balanceSheet": result.get("balanceSheet") or result.get("balance_sheet", "null"), + "incomeStatement": result.get("incomeStatement") or result.get("income_statement", "null"), + "cashFlow": result.get("cashFlow") or result.get("cash_flow", "null"), + } + + # Clean up cached job + delete_job(job_id) + + return 200, payload diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/extractor/schema_mapper.py b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/schema_mapper.py new file mode 100644 index 000000000..8128ffcf5 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/schema_mapper.py @@ -0,0 +1,462 @@ +""" +extractor/schema_mapper.py +-------------------------- +Maps the hybrid pipeline output (CU locator metadata + Python parser rows + +CU enrichment) into the v1.2 unified financial statement schema. 
+ +Produces one JSON file per statement conforming to: + docs/schema/financial-statement-unified.schema.json +""" + +import json +import re +from datetime import datetime +from pathlib import Path +from typing import Optional + +from .statement_detector import _parse_financial_value + + +# --------------------------------------------------------------------------- +# Value normalization +# --------------------------------------------------------------------------- + +def _normalize_value(raw: str | None) -> dict: + """ + Convert a raw display value into the v1.2 value cell format. + + Returns: + {"column_index": ..., "raw": ..., "normalized": ..., "is_null": ..., + "is_zero": ..., "value_kind": ..., "confidence": None} + """ + if raw is None or not raw.strip(): + return { + "raw": None, + "normalized": None, + "is_null": True, + "is_zero": None, + "value_kind": None, + "confidence": None, + } + + parsed = _parse_financial_value(raw) + is_zero = parsed == 0.0 if parsed is not None else None + + return { + "raw": raw, + "normalized": parsed, + "is_null": False, + "is_zero": is_zero, + "value_kind": None, # set downstream based on section + "confidence": None, + } + + +# --------------------------------------------------------------------------- +# Column metadata builder +# --------------------------------------------------------------------------- + +def _build_column_metadata( + columns: list[str], + statement_type: str, +) -> list[dict]: + """Build v1.2 column metadata from the Python parser's column strings.""" + result = [] + for i, col_label in enumerate(columns): + col_lower = col_label.lower() + + # Detect period type — expanded for Chinese/Japanese + if any(kw in col_lower for kw in ["three months", "quarter", "q1", "q2", "q3", "q4"]): + period_type = "quarter" + elif any(kw in col_lower for kw in ["twelve months", "annual", "fiscal year", "fy "]): + period_type = "annual" + elif any(kw in col_lower for kw in ["nine months", "前三季度", "1-9月", "q1-q3"]): + 
period_type = "nine_months" + elif any(kw in col_lower for kw in ["six months", "half", "h1", "半年"]): + period_type = "half_year" + elif any(kw in col_lower for kw in ["year to date", "ytd"]): + period_type = "year_to_date" + elif statement_type == "balance_sheet": + period_type = "instant" + else: + period_type = "other" + + # Extract year + year_match = re.search(r"20\d{2}", col_label) + fiscal_year = int(year_match.group()) if year_match else None + + # Detect if comparative (second/later column of same period type) + is_comparative = i > 0 and fiscal_year is not None + + result.append({ + "column_index": i, + "label": col_label, + "label_raw": col_label, + "period_type": period_type, + "fiscal_year": fiscal_year, + "fiscal_quarter": None, # TODO: detect from label + "start_date": None, + "end_date": None, + "is_comparative": is_comparative, + }) + return result + + +# --------------------------------------------------------------------------- +# Row builder +# --------------------------------------------------------------------------- + +def _classify_value_kind(section: str, label_lower: str) -> str | None: + """Determine value_kind from section and label context.""" + if section == "eps" or "per share" in label_lower: + return "ratio" + if section == "shares" or "shares" in label_lower: + return "shares" + if "margin" in label_lower or "rate" in label_lower or "%" in label_lower: + return "percentage" + return "currency" + + +def _is_required_anchor(canonical_key: str) -> bool: + """Check if this row is a critical anchor that must always be present.""" + anchors = { + "revenue", "net_income", "total_assets", "total_liabilities", + "total_equity", "total_liabilities_and_equity", + "net_cash_from_operating_activities", + "net_cash_provided_by_operating_activities", + "net_cash_used_in_investing_activities", + "net_cash_used_in_financing_activities", + "total_costs_and_expenses", "income_from_operations", + "basic_eps", "diluted_eps", + } + return canonical_key 
in anchors + + +def build_v12_row( + row_index: int, + label: str, + row_type: str, + indent_level: int, + values_raw: list[str | None], + enrichment: dict, + num_columns: int, +) -> dict: + """Build a single v1.2 schema row.""" + canonical_key = enrichment["canonical_key"] + section = enrichment["section"] + label_lower = label.lower() + + # Build value cells — column_index is 0-based, matching the columns metadata + value_cells = [] + for i in range(num_columns): + raw = values_raw[i] if i < len(values_raw) else None + cell = _normalize_value(raw) + cell["column_index"] = i + cell["value_kind"] = _classify_value_kind(section, label_lower) + value_cells.append(cell) + + return { + "row_index": row_index, + "label_raw": label, + "label_normalized": enrichment.get("label_normalized"), + "label_language": enrichment.get("label_language", "en"), + "canonical_key": canonical_key, + "canonical_group": enrichment.get("canonical_group"), + "row_type": row_type, + "indent_level": indent_level, + "section": section, + "parent_canonical_key": None, # TODO: infer from hierarchy + "sign_hint": None, + "is_derived_total": row_type in ("subtotal", "total"), + "is_required_anchor": _is_required_anchor(canonical_key), + "source_page": None, + "source_bbox": None, + "values": value_cells, + } + + +# --------------------------------------------------------------------------- +# Validation +# --------------------------------------------------------------------------- + +def _run_validation(rows: list[dict], num_columns: int) -> dict: + """ + Run subtotal validation on the v1.2 rows. + + Returns the validation block with status, warnings, errors. 
+ """ + warnings = [] + errors = [] + + # Check required anchors + present_keys = {r["canonical_key"] for r in rows} + for r in rows: + if r["is_required_anchor"] and r["canonical_key"] not in present_keys: + errors.append({ + "code": "MISSING_ANCHOR", + "message": f"Required anchor row '{r['canonical_key']}' not found", + "severity": "error", + "row_index": None, + "canonical_key": r["canonical_key"], + "column_index": None, + "expected": None, + "actual": None, + "difference": None, + }) + + # Subtotal validation: sum indented children, compare to subtotal + for i, row in enumerate(rows): + if row["row_type"] != "subtotal": + continue + + # Find preceding children (indent > 0, until previous subtotal/total) + children = [] + for j in range(i - 1, -1, -1): + prev = rows[j] + if prev["row_type"] in ("subtotal", "total"): + break + if prev["row_type"] == "section_header": + continue + if prev["row_type"] == "line_item": + children.append(prev) + + if not children: + continue + + for col_idx in range(num_columns): + total_cell = row["values"][col_idx] if col_idx < len(row["values"]) else {} + total_val = total_cell.get("normalized") + if total_val is None: + continue + + child_sum = 0.0 + all_parseable = True + for child in children: + child_cell = child["values"][col_idx] if col_idx < len(child["values"]) else {} + child_val = child_cell.get("normalized") + if child_val is None and not child_cell.get("is_null", True): + all_parseable = False + break + child_sum += child_val or 0.0 + + if not all_parseable: + continue + + diff = abs(total_val - child_sum) + if diff > 2.0: + warnings.append({ + "code": "SUBTOTAL_MISMATCH", + "message": ( + f"Row '{row['canonical_key']}' col {col_idx}: " + f"subtotal={total_val}, sum of children={child_sum}, " + f"diff={diff}" + ), + "severity": "warning", + "row_index": row["row_index"], + "canonical_key": row["canonical_key"], + "column_index": col_idx, + "expected": child_sum, + "actual": total_val, + "difference": diff, + }) + + if 
errors: + status = "failed" + elif warnings: + status = "passed_with_warnings" + else: + status = "passed" + + return { + "status": status, + "validator_version": "2.0.0", + "rule_set_version": "1.0.0", + "validation_run_id": None, + "validated_at": datetime.utcnow().isoformat() + "Z", + "warnings": warnings, + "errors": errors, + } + + +# --------------------------------------------------------------------------- +# Top-level assembly +# --------------------------------------------------------------------------- + +def assemble_v12_statement( + statement_type: str, + locator_metadata: dict, + parser_rows: list[str], + parser_columns: list[str], + parser_cells: list[dict], + enrichment_lookup: dict, + source_file_name: str, +) -> dict: + """ + Assemble a complete v1.2 schema document from all pipeline stages. + + Args: + statement_type: "balance_sheet", "income_statement", or "cash_flow" + locator_metadata: Dict from CU locator (company, currency, etc.) + parser_rows: Row labels from Python parser + parser_columns: Column headers from Python parser + parser_cells: Cell dicts from Python parser + enrichment_lookup: Dict from build_enrichment_lookup() + source_file_name: Original PDF filename + """ + from .enrichment import enrich_all_rows + + # Extract currency/unit metadata embedded by html_table_parser in the + # first cell (from column headers like "£000"). Use as fallback when + # the CU locator didn't provide currency or unit. + table_currency_unit = {} + if parser_cells: + table_currency_unit = parser_cells[0].get("_currency_unit", {}) or {} + + # parser_columns already contains only data column headers (label column + # is not included), so this is the true number of data columns. 
+ num_columns = len(parser_columns) + + # Build grid from parser cells + grid: dict[int, dict[int, dict]] = {} + for c in parser_cells: + grid.setdefault(c["row"], {})[c["col"]] = c + + # Collect all labels and row_types for batch enrichment + sorted_indices = sorted(grid.keys()) + all_labels = [] + all_row_types = [] + all_indent_levels = [] + all_values_raw = [] + + for row_idx in sorted_indices: + label_cell = grid[row_idx].get(0, {}) + label = label_cell.get("content", "") + row_type = label_cell.get("row_type", "line_item") + indent_level = label_cell.get("indent_level", 0) + + values_raw = [] + for col_idx in range(1, num_columns + 1): + val_cell = grid[row_idx].get(col_idx, {}) + content = val_cell.get("content", "") + values_raw.append(content if content.strip() else None) + + all_labels.append(label) + all_row_types.append(row_type) + all_indent_levels.append(indent_level) + all_values_raw.append(values_raw) + + # Batch enrichment (CU lookup + LLM translation for non-English) + enrichments = enrich_all_rows( + all_labels, all_row_types, enrichment_lookup, statement_type + ) + + # Build v1.2 rows + v12_rows = [] + for i, row_idx in enumerate(sorted_indices): + label = all_labels[i] + row_type = all_row_types[i] + indent_level = all_indent_levels[i] + values_raw = all_values_raw[i] + enrichment = enrichments[i] + + v12_row = build_v12_row( + row_index=len(v12_rows), + label=label, + row_type=row_type, + indent_level=indent_level, + values_raw=values_raw, + enrichment=enrichment, + num_columns=num_columns, + ) + v12_rows.append(v12_row) + + # Build column metadata + v12_columns = _build_column_metadata(parser_columns, statement_type) + + # Run validation + validation = _run_validation(v12_rows, num_columns) + + # Resolve currency and unit: CU locator > table column headers > defaults. + # The CU locator provides currency/unit from AI analysis; the table parser + # extracts them from column headers (e.g. "£000" → GBP + thousands). 
+ resolved_currency = ( + locator_metadata.get("currency") + or table_currency_unit.get("currency") + or "USD" + ) + resolved_unit = ( + locator_metadata.get("unit") + or table_currency_unit.get("unit") + or "ones" + ) + resolved_symbol = ( + _currency_to_symbol(locator_metadata.get("currency", "")) + or table_currency_unit.get("currency_symbol") + or _currency_to_symbol(resolved_currency) + ) + + # Assemble the document + return { + "schema_version": "1.2.0", + "document_metadata": { + "source_file_name": source_file_name, + "source_file_hash": None, + "company_name": locator_metadata.get("company_name"), + "company_name_raw": locator_metadata.get("company_name_raw"), + "report_type": _infer_report_type(source_file_name), + "report_language": locator_metadata.get("report_language", "en"), + "source_country": None, + "source_exchange": None, + "ticker": None, + "identifier": None, + }, + "statement_metadata": { + "statement_type": statement_type, + "statement_title": locator_metadata.get("title_english"), + "statement_title_raw": locator_metadata.get("title_raw"), + "accounting_standard": locator_metadata.get("accounting_standard"), + "currency": resolved_currency, + "currency_symbol": resolved_symbol, + "unit": resolved_unit, + "unit_raw": None, + "is_consolidated": locator_metadata.get("is_consolidated", True), + "is_audited": None, + "page_range": { + "start": locator_metadata.get("page_start", 1), + "end": locator_metadata.get("page_end", 1), + }, + "bbox_coordinate_system": "normalized_0_1", + }, + "columns": v12_columns, + "rows": v12_rows, + "validation": validation, + } + + +def _infer_report_type(filename: str) -> str: + """Infer report type from filename.""" + fl = filename.lower() + if "10-q" in fl: + return "10-Q" + if "10-k" in fl: + return "10-K" + if "exhibit" in fl or "earnings" in fl: + return "earnings_release" + if "annual" in fl: + return "annual_report" + if "quarter" in fl or "qr" in fl or "q1" in fl or "q2" in fl or "q3" in fl or "q4" in fl: 
+ return "quarterly_report" + if "interim" in fl: + return "interim_report" + return "other" + + +def _currency_to_symbol(currency: str) -> str | None: + """Map ISO 4217 code to display symbol.""" + return { + "USD": "$", + "CNY": "\u00a5", + "JPY": "\u00a5", + "EUR": "\u20ac", + "GBP": "\u00a3", + }.get(currency) diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/extractor/stages/analyze.py b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/stages/analyze.py new file mode 100644 index 000000000..2970d8360 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/stages/analyze.py @@ -0,0 +1,294 @@ +""" +Stage 1: Analyze — CU Locator + markdown + page map. + OR + Document Intelligence + markdown + LLM classification. + OR + Textract + adapter + LLM classification. + OR + pdfplumber (local) + adapter + LLM classification. + +Backend is selected by PipelineOptions.backend: + "cu", "document_intelligence", "textract", or "pdfplumber". + +Wraps existing cu_client.analyze_document() calls and parses the +CU Locator response into typed CandidateStatement objects. +""" +import logging +from typing import Optional + +from .contracts import AnalyzeResult, CandidateStatement, PipelineOptions + +logger = logging.getLogger(__name__) + + +def parse_locator_statements(locator_result: dict) -> list[CandidateStatement]: + """Extract CandidateStatement objects from the CU locator response. + + This is a typed version of the old _parse_locator_statements() in function_app.py. 
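The CU locator nests every field as a `{"value<Type>": ...}` object inside a `valueObject`; a standalone sketch of the unwrapping performed below, over an illustrative payload:

```python
# Illustrative shape of one entry in fields["statements"]["valueArray"]
statement = {
    "valueObject": {
        "statement_type": {"valueString": "income_statement"},
        "page_start": {"valueInteger": 3},
        "is_consolidated": {"valueBoolean": True},
    }
}

props = statement["valueObject"]
stype = props.get("statement_type", {}).get("valueString")
page_start = props.get("page_start", {}).get("valueInteger")
consolidated = props.get("is_consolidated", {}).get("valueBoolean")
```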
+ """ + contents = locator_result.get("result", {}).get("contents", []) + if not contents: + return [] + + fields = contents[0].get("fields", {}) + stmts = fields.get("statements", {}).get("valueArray", []) + + results = [] + for s in stmts: + props = s.get("valueObject", {}) + + def _str(key): + return props.get(key, {}).get("valueString") + + def _int(key): + return props.get(key, {}).get("valueInteger") + + def _bool(key): + return props.get(key, {}).get("valueBoolean") + + results.append(CandidateStatement( + statement_type=_str("statement_type") or "", + title_raw=_str("title_raw"), + title_english=_str("title_english"), + page_start=_int("page_start"), + page_end=_int("page_end"), + company_name=_str("company_name"), + company_name_raw=_str("company_name_raw"), + report_language=_str("report_language"), + currency=_str("currency"), + unit=_str("unit"), + is_consolidated=_bool("is_consolidated"), + accounting_standard=_str("accounting_standard"), + )) + + return results + + +def _run_analyze_cu( + pdf_path: str, + options: PipelineOptions, +) -> AnalyzeResult: + """Original CU backend — unchanged. + + Args: + pdf_path: Path to the PDF file on disk. + options: Pipeline options (enrichment flag, etc.). + + Returns: + AnalyzeResult with candidates, markdown, pages, page_map, enrichment_lookup. 
+ """ + from extractor.cu_client import analyze_document + from extractor.statement_detector import _build_page_map + from extractor.enrichment import build_enrichment_lookup + + # Step 1: CU Locator + logger.info("Stage 1 (Analyze): Running CU Locator") + locator_result = analyze_document("financial-statement-locator", pdf_path) + + candidates = parse_locator_statements(locator_result) + + # Step 2: Extract markdown and pages + contents = locator_result.get("result", {}).get("contents", []) + markdown = contents[0].get("markdown", "") if contents else "" + pages = contents[0].get("pages", []) if contents else [] + page_map = _build_page_map(pages) + + # Step 3: CU Extractor for enrichment (optional) + enrichment_lookup = {} + if options.use_enrichment: + logger.info("Stage 1 (Analyze): Running CU Extractor for enrichment") + try: + extractor_result = analyze_document( + "financial-statement-extractor", pdf_path + ) + enrichment_lookup = build_enrichment_lookup(extractor_result) + except Exception as e: + logger.warning(f"Enrichment failed, continuing without: {e}") + + logger.info( + f"Stage 1 (Analyze): {len(candidates)} candidates, " + f"{len(markdown)} chars markdown, {len(pages)} pages" + ) + + return AnalyzeResult( + candidates=candidates, + markdown=markdown, + pages=pages, + page_map=page_map, + enrichment_lookup=enrichment_lookup, + ) + + +def _run_analyze_textract( + pdf_path: str, + options: PipelineOptions, +) -> AnalyzeResult: + """Textract backend — adapter converts blocks to AnalyzeResult.""" + from extractor.textract_client import analyze_document + from extractor.textract_adapter import ( + reconstruct_markdown, + build_page_map, + classify_statements_with_llm, + ) + + logger.info("Stage 1 (Analyze): Running AWS Textract") + textract_result = analyze_document(pdf_path) + blocks = textract_result.get("Blocks", []) + + # Reconstruct markdown with embedded HTML tables + markdown = reconstruct_markdown(blocks) + page_map = build_page_map(blocks, 
markdown) + + # Classify statements via LLM + logger.info("Stage 1 (Analyze): Classifying statements via LLM") + stmt_classifications = classify_statements_with_llm(markdown) + + candidates = [] + for s in stmt_classifications: + candidates.append(CandidateStatement( + statement_type=s.get("statement_type", ""), + title_raw=s.get("title_raw"), + title_english=s.get("title_raw"), # LLM already returns English + page_start=s.get("page_start"), + page_end=s.get("page_end"), + company_name=s.get("company_name"), + company_name_raw=s.get("company_name"), + report_language=s.get("report_language"), + currency=s.get("currency"), + unit=s.get("unit"), + is_consolidated=s.get("is_consolidated"), + accounting_standard=s.get("accounting_standard"), + )) + + logger.info( + f"Stage 1 (Analyze): {len(candidates)} candidates, " + f"{len(markdown)} chars markdown, page_map entries: {len(page_map)}" + ) + + return AnalyzeResult( + candidates=candidates, + markdown=markdown, + pages=[], # Textract doesn't use CU's page span format + page_map=page_map, + enrichment_lookup={}, # LLM fallback handles enrichment in Stage 4 + ) + + +def _run_analyze_pdfplumber( + pdf_path: str, + options: PipelineOptions, +) -> AnalyzeResult: + """pdfplumber backend — local extraction, no cloud service.""" + from extractor.pdfplumber_client import extract_document + from extractor.pdfplumber_adapter import ( + reconstruct_markdown, + build_page_map, + classify_statements_with_llm, + ) + + logger.info("Stage 1 (Analyze): Running pdfplumber (local)") + pdfplumber_result = extract_document(pdf_path) + + markdown = reconstruct_markdown(pdfplumber_result) + page_map = build_page_map(pdfplumber_result, markdown) + + logger.info("Stage 1 (Analyze): Classifying statements via LLM") + stmt_classifications = classify_statements_with_llm(markdown) + + candidates = [] + for s in stmt_classifications: + candidates.append(CandidateStatement( + statement_type=s.get("statement_type", ""), + title_raw=s.get("title_raw"), 
+ title_english=s.get("title_raw"), + page_start=s.get("page_start"), + page_end=s.get("page_end"), + company_name=s.get("company_name"), + company_name_raw=s.get("company_name"), + report_language=s.get("report_language"), + currency=s.get("currency"), + unit=s.get("unit"), + is_consolidated=s.get("is_consolidated"), + accounting_standard=s.get("accounting_standard"), + )) + + logger.info( + f"Stage 1 (Analyze): {len(candidates)} candidates, " + f"{len(markdown)} chars markdown, page_map entries: {len(page_map)}" + ) + + return AnalyzeResult( + candidates=candidates, + markdown=markdown, + pages=[], + page_map=page_map, + enrichment_lookup={}, + ) + + +def _run_analyze_document_intelligence( + pdf_path: str, + options: PipelineOptions, +) -> AnalyzeResult: + """Azure Document Intelligence backend — prebuilt-layout with markdown output. + + DI returns markdown with embedded HTML tables natively, so no complex + adapter is needed. Auth is via Managed Identity (DefaultAzureCredential). + """ + from extractor.di_client import analyze_document + from extractor.di_adapter import build_page_map, classify_statements_with_llm + + logger.info("Stage 1 (Analyze): Running Azure Document Intelligence") + di_result = analyze_document(pdf_path) + + # DI markdown output already contains HTML tables + markdown = di_result.get("content", "") + page_map = build_page_map(di_result, markdown) + + # Classify statements via LLM (same approach as Textract/pdfplumber) + logger.info("Stage 1 (Analyze): Classifying statements via LLM") + stmt_classifications = classify_statements_with_llm(markdown) + + candidates = [] + for s in stmt_classifications: + candidates.append(CandidateStatement( + statement_type=s.get("statement_type", ""), + title_raw=s.get("title_raw"), + title_english=s.get("title_raw"), + page_start=s.get("page_start"), + page_end=s.get("page_end"), + company_name=s.get("company_name"), + company_name_raw=s.get("company_name"), + report_language=s.get("report_language"), + 
currency=s.get("currency"), + unit=s.get("unit"), + is_consolidated=s.get("is_consolidated"), + accounting_standard=s.get("accounting_standard"), + )) + + logger.info( + f"Stage 1 (Analyze): {len(candidates)} candidates, " + f"{len(markdown)} chars markdown, page_map entries: {len(page_map)}" + ) + + return AnalyzeResult( + candidates=candidates, + markdown=markdown, + pages=[], # DI doesn't use CU's page span format + page_map=page_map, + enrichment_lookup={}, # LLM fallback handles enrichment in Stage 4 + ) + + +def run_analyze( + pdf_path: str, + options: PipelineOptions, +) -> AnalyzeResult: + """Run Stage 1 — delegates to backend based on options.backend.""" + if options.backend == "document_intelligence": + return _run_analyze_document_intelligence(pdf_path, options) + elif options.backend == "textract": + return _run_analyze_textract(pdf_path, options) + elif options.backend == "pdfplumber": + return _run_analyze_pdfplumber(pdf_path, options) + return _run_analyze_cu(pdf_path, options) diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/extractor/stages/contracts.py b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/stages/contracts.py new file mode 100644 index 000000000..9cadd0ec5 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/stages/contracts.py @@ -0,0 +1,199 @@ +""" +Typed data contracts for the extraction pipeline stages. + +Every stage receives and returns a dataclass. 
This enables:
+  - Type checking at stage boundaries
+  - Clear documentation of what each stage produces
+  - Easy serialization for debugging/logging
+"""
+from __future__ import annotations
+
+from dataclasses import dataclass, field
+from enum import Enum
+from typing import Optional
+
+
+# ---------------------------------------------------------------------------
+# Shared enums
+# ---------------------------------------------------------------------------
+
+class StatementType(str, Enum):
+    BALANCE_SHEET = "balance_sheet"
+    INCOME_STATEMENT = "income_statement"
+    CASH_FLOW = "cash_flow"
+
+
+class QualityStatus(str, Enum):
+    ACCEPTED = "accepted"    # >= 0.90
+    REVIEW = "review"        # >= 0.70
+    REJECTED = "rejected"    # < 0.70
+
+
+# ---------------------------------------------------------------------------
+# Pipeline options (input to the pipeline)
+# ---------------------------------------------------------------------------
+
+@dataclass
+class PipelineOptions:
+    """Options controlling pipeline behavior."""
+    use_enrichment: bool = True
+    requested_types: list[str] = field(
+        default_factory=lambda: ["balance_sheet", "income_statement", "cash_flow"]
+    )
+    source_file_name: str = "document.pdf"
+    backend: str = "cu"  # "cu" | "document_intelligence" | "textract" | "pdfplumber"
+
+
+# ---------------------------------------------------------------------------
+# Stage 1: Analyze
+# ---------------------------------------------------------------------------
+
+@dataclass
+class CandidateStatement:
+    """A single statement candidate from the CU Locator."""
+    statement_type: str
+    title_raw: Optional[str] = None
+    title_english: Optional[str] = None
+    page_start: Optional[int] = None
+    page_end: Optional[int] = None
+    company_name: Optional[str] = None
+    company_name_raw: Optional[str] = None
+    report_language: Optional[str] = None
+    currency: Optional[str] = None
+    unit: Optional[str] = None
+    is_consolidated: Optional[bool] = None
+    accounting_standard: Optional[str] = None
+
+    def 
to_dict(self) -> dict: + """Convert to dict for backward compatibility with existing code.""" + return { + "statement_type": self.statement_type, + "title_raw": self.title_raw, + "title_english": self.title_english, + "page_start": self.page_start, + "page_end": self.page_end, + "company_name": self.company_name, + "company_name_raw": self.company_name_raw, + "report_language": self.report_language, + "currency": self.currency, + "unit": self.unit, + "is_consolidated": self.is_consolidated, + "accounting_standard": self.accounting_standard, + } + + +@dataclass +class AnalyzeResult: + """Output of Stage 1 (Analyze).""" + candidates: list[CandidateStatement] + markdown: str + pages: list[dict] + page_map: list[tuple[int, int, int]] + enrichment_lookup: dict = field(default_factory=dict) + + +# --------------------------------------------------------------------------- +# Stage 2: Select +# --------------------------------------------------------------------------- + +@dataclass +class ScoredCandidate: + """A candidate with its selection score.""" + candidate: CandidateStatement + score: float + rejection_reason: Optional[str] = None + + +@dataclass +class SelectResult: + """Output of Stage 2 (Select).""" + selected: dict[str, CandidateStatement] # stype -> best candidate + rejected: dict[str, list[ScoredCandidate]] # stype -> rejected candidates + scores: dict[str, list[ScoredCandidate]] # stype -> all scored candidates + + +# --------------------------------------------------------------------------- +# Stage 3: Extract +# --------------------------------------------------------------------------- + +@dataclass +class ExtractedStatement: + """Parsed data for a single statement.""" + statement_type: str + rows: list[str] # row labels from parser + columns: list[str] # column headers from parser + cells: list[dict] # cell dicts from parser + md_offset: int = 0 + md_end_offset: int = 0 + start_page: int = 0 + end_page: int = 0 + tables_merged: int = 1 + + +@dataclass 
+class ExtractResult: + """Output of Stage 3 (Extract).""" + statements: dict[str, ExtractedStatement] # stype -> extracted data + failures: dict[str, str] # stype -> failure reason + + +# --------------------------------------------------------------------------- +# Stage 4: Enrich +# --------------------------------------------------------------------------- + +@dataclass +class EnrichedStatement: + """Statement with enrichment data applied.""" + statement_type: str + v12_doc: dict # Complete v1.2 schema document + company_name_verified: Optional[str] = None + columns_translated: bool = False + + +@dataclass +class EnrichResult: + """Output of Stage 4 (Enrich).""" + statements: dict[str, EnrichedStatement] # stype -> enriched doc + + +# --------------------------------------------------------------------------- +# Stage 5: Validate +# --------------------------------------------------------------------------- + +@dataclass +class ValidationCheck: + """Result of a single validation check.""" + name: str + passed: bool + score: float # 0.0 to 1.0 + weight: float # check weight + details: Optional[str] = None + + +@dataclass +class ValidatedStatement: + """Statement with validation results.""" + statement_type: str + v12_doc: dict + quality_score: float + status: QualityStatus + checks: list[ValidationCheck] + + +@dataclass +class ValidateResult: + """Output of Stage 5 (Validate).""" + statements: dict[str, ValidatedStatement] + + +# --------------------------------------------------------------------------- +# Pipeline result (final output) +# --------------------------------------------------------------------------- + +@dataclass +class PipelineResult: + """Final output of the full pipeline.""" + output: dict # The result dict (summary + statements + confidence) + validate_result: Optional[ValidateResult] = None + analyze_result: Optional[AnalyzeResult] = None + select_result: Optional[SelectResult] = None diff --git 
a/samples/mcs-finance-statement-agent/src/azure-functions/extractor/stages/enrich.py b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/stages/enrich.py
new file mode 100644
index 000000000..de0f29c37
--- /dev/null
+++ b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/stages/enrich.py
@@ -0,0 +1,243 @@
+"""
+Stage 4: Enrich — translate rows + columns, verify company name, clean labels.
+
+Expanded scope over the current enrichment.py:
+  1. Column header translation (e.g. "2025年前三季度 (1-9月)" -> "Jan-Sep 2025")
+  2. Company name verification (cross-check CU Locator vs 编制单位 line in markdown)
+  3. Label cleanup (strip parenthetical noise like "(Loss shown as '-')")
+  4. Row label translation (existing batch LLM translation, reused via schema_mapper)
+"""
+import logging
+import re
+from typing import Optional
+
+from .contracts import (
+    CandidateStatement,
+    EnrichedStatement,
+    EnrichResult,
+    ExtractedStatement,
+    ExtractResult,
+)
+
+logger = logging.getLogger(__name__)
+
+
+# ---------------------------------------------------------------------------
+# Column header translation
+# ---------------------------------------------------------------------------
+
+# Month name lookup
+_MONTH_NAMES = [
+    "", "Jan", "Feb", "Mar", "Apr", "May", "Jun",
+    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
+]
+
+
+def translate_column_header(raw: str) -> str:
+    """Translate a non-English column header to English.
+
+    Returns the translated header, or the original if no pattern matches. 
+    """
+    result = raw.strip()
+
+    # Label column header
+    if result == "项目":
+        return "Item"
+
+    # Chinese Q1-Q3 period: 2025年前三季度 (1-9月) → Jan-Sep 2025
+    result = re.sub(
+        r"(\d{4})\s*年\s*前三季度\s*[((]1-9月[))]?",
+        r"Jan-Sep \1", result,
+    )
+    # Chinese H1 period: 2025年半年度 (1-6月) → Jan-Jun 2025
+    result = re.sub(
+        r"(\d{4})\s*年\s*半年度\s*[((]1-6月[))]?",
+        r"Jan-Jun \1", result,
+    )
+
+    # Full date: 2025年9月30日 → Sep 30, 2025 (must be before year-only pattern)
+    def _date_repl(m: re.Match) -> str:
+        year, month, day = m.group(1), int(m.group(2)), m.group(3)
+        mn = _MONTH_NAMES[month] if 1 <= month <= 12 else str(month)
+        return f"{mn} {day}, {year}"
+    result = re.sub(r"(\d{4})\s*年\s*(\d{1,2})\s*月\s*(\d{1,2})\s*日", _date_repl, result)
+
+    # Japanese: 2025年3月期 → Mar 2025 (must also run before the year-only
+    # pattern, which would otherwise consume the 年 and leave "FY 2025 3月期")
+    def _jp_repl(m: re.Match) -> str:
+        year, month = m.group(1), int(m.group(2))
+        mn = _MONTH_NAMES[month] if 1 <= month <= 12 else str(month)
+        return f"{mn} {year}"
+    result = re.sub(r"(\d{4})年(\d{1,2})月期", _jp_repl, result)
+
+    # Year only: 2025年 / 2025年度 → FY 2025
+    result = re.sub(r"(\d{4})\s*年\s*(?:度|年度)?", r"FY \1 ", result)
+
+    # Cleanup pass: fix any remaining partial translations
+    # e.g., "FY 20259月30日" → from a mangled earlier run
+    def _cleanup_date(m: re.Match) -> str:
+        # group(1) is the optional stale "FY " prefix; it may be absent,
+        # so guard against None before stripping (it is dropped either way)
+        prefix = (m.group(1) or "").strip()
+        year = m.group(2)
+        month = int(m.group(3))
+        day = m.group(4)
+        mn = _MONTH_NAMES[month] if 1 <= month <= 12 else str(month)
+        return f"{mn} {day}, {year}"
+    result = re.sub(r"(FY\s+)?(\d{4})\s*(\d{1,2})\s*月\s*(\d{1,2})\s*日", _cleanup_date, result)
+
+    # Remove stale "FY" prefix if a full date was produced
+    result = re.sub(r"^FY\s+((?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s)", r"\1", result)
+
+    # Normalize verbose period format: "Q1-Q3 2025 (Jan-Sep)" → "Jan-Sep 2025"
+    result = re.sub(r"Q1-Q3\s+(\d{4})\s*\(Jan-Sep\)", r"Jan-Sep \1", result)
+    result = re.sub(r"H1\s+(\d{4})\s*\(Jan-Jun\)", r"Jan-Jun \1", result)
+
+    return result.strip()
+
+
+# 
--------------------------------------------------------------------------- +# Company name verification +# --------------------------------------------------------------------------- + +def verify_company_name( + locator_name: Optional[str], + markdown: str, +) -> Optional[str]: + """Cross-check the CU Locator's company name against the markdown. + + Looks for 编制单位 (preparation unit) line in Chinese reports. + If found and different from locator_name, returns the markdown version. + Otherwise returns locator_name. + """ + if not markdown: + return locator_name + + # Look for 编制单位: pattern + m = re.search(r"编制单位[::]\s*(.+?)[\s\n]", markdown[:5000]) + if m: + doc_name = m.group(1).strip() + if doc_name and locator_name and doc_name != locator_name: + logger.info( + f" Company name mismatch: locator='{locator_name}', " + f"document='{doc_name}'. Using document version." + ) + return doc_name + + return locator_name + + +# --------------------------------------------------------------------------- +# Label cleanup +# --------------------------------------------------------------------------- + +_NOISE_PATTERNS = [ + r"""\s*[((](?:Loss|Losses?)\s+(?:shown|indicated)\s+as\s+["'\u2018\u2019\u201c\u201d-]+[))]""", + r"\s*[((](?:亏损|损失).*?[))]", + r"\s*[((](?:Note|注)\s*\d+[))]", +] + + +def clean_label(raw: str) -> str: + """Strip parenthetical noise from a label. Preserves raw in label_raw.""" + result = raw + for pattern in _NOISE_PATTERNS: + result = re.sub(pattern, "", result, flags=re.IGNORECASE) + return result.strip() + + +# --------------------------------------------------------------------------- +# Stage entry point +# --------------------------------------------------------------------------- + +def run_enrich( + extract_result: ExtractResult, + selected: dict[str, CandidateStatement], + enrichment_lookup: dict, + markdown: str, + source_file_name: str, +) -> EnrichResult: + """Run Stage 4: enrich each extracted statement. 
+ + Calls schema_mapper.assemble_v12_statement() which internally handles + row enrichment (CU lookup + LLM translation). This stage adds: + - Column header translation + - Company name verification + - Label cleanup (applied to the v12 doc after assembly) + + Args: + extract_result: Output from Stage 3. + selected: Selected candidates from Stage 2. + enrichment_lookup: CU extractor enrichment from Stage 1. + markdown: Full document markdown (for company name verification). + source_file_name: Original PDF filename. + + Returns: + EnrichResult with enriched v1.2 documents. + """ + from extractor.schema_mapper import assemble_v12_statement + + statements: dict[str, EnrichedStatement] = {} + + for stype, extracted in extract_result.statements.items(): + candidate = selected.get(stype) + + if candidate: + locator_metadata = candidate.to_dict() + else: + # No selected candidate (e.g. Textract backend where LLM + # classification was sparse). Build minimal metadata so + # enrich can still process the extracted statement. 
+ logger.info(f" {stype}: no selected candidate, using extraction defaults") + locator_metadata = { + "statement_type": stype, + "company_name": None, + "report_language": None, + "currency": None, + "unit": None, + "accounting_standard": None, + } + + # Verify company name against document text + verified_name = verify_company_name( + candidate.company_name if candidate else None, markdown + ) + if verified_name: + locator_metadata["company_name"] = verified_name + + # Translate column headers + translated_columns = [ + translate_column_header(col) for col in extracted.columns + ] + columns_translated = translated_columns != extracted.columns + + # Assemble v1.2 document (includes row enrichment + validation) + v12_doc = assemble_v12_statement( + statement_type=stype, + locator_metadata=locator_metadata, + parser_rows=extracted.rows, + parser_columns=translated_columns, + parser_cells=extracted.cells, + enrichment_lookup=enrichment_lookup.copy(), + source_file_name=source_file_name, + ) + + # Post-process: clean labels in the assembled document + for row in v12_doc.get("rows", []): + raw_label = row.get("label_raw", "") + cleaned = clean_label(raw_label) + if cleaned != raw_label: + # Keep raw, update normalized display + if not row.get("label_normalized"): + row["label_normalized"] = cleaned + + # Store translated column info in the v12 doc columns + for i, col_meta in enumerate(v12_doc.get("columns", [])): + if i < len(extracted.columns): + col_meta["label_raw"] = extracted.columns[i] + + statements[stype] = EnrichedStatement( + statement_type=stype, + v12_doc=v12_doc, + company_name_verified=verified_name, + columns_translated=columns_translated, + ) + + return EnrichResult(statements=statements) diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/extractor/stages/extract.py b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/stages/extract.py new file mode 100644 index 000000000..f71bd375a --- /dev/null +++ 
b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/stages/extract.py @@ -0,0 +1,552 @@ +""" +Stage 3: Extract — full-markdown heading search + anti-contamination merging. + +Hybrid approach: + - FINDING tables: search the FULL markdown (like old pipeline) — locator + page ranges are unreliable (can point to summary tables, not statements) + - MERGING tables: use anti-contamination checks to stop at the right boundary + (other statement headings, parent company markers, note headings, unrelated labels) + +Search priority: + 1. Known heading patterns for statement type (full markdown) + 2. Locator's title_raw as fallback (full markdown) + +Anti-contamination during table merging: + - Stop if gap contains another statement heading or parent company markers + - Stop if next table's rows contain labels from a different statement type + - Stop at note headings (Notes to Financial Statements, etc.) + - Max 3 continuation tables +""" +import logging +import re +from typing import Optional + +from .contracts import ( + CandidateStatement, + ExtractedStatement, + ExtractResult, + SelectResult, +) + +logger = logging.getLogger(__name__) + + +# --------------------------------------------------------------------------- +# Heading patterns per statement type +# --------------------------------------------------------------------------- + +_HEADING_PATTERNS: dict[str, list[str]] = { + "balance_sheet": [ + "合并资产负债表", "連結貸借対照表", + "STATEMENT OF FINANCIAL POSITION", + "CONDENSED CONSOLIDATED BALANCE SHEET", + "CONSOLIDATED BALANCE SHEET", "BALANCE SHEET", + "BILAN", "资产负债表", + ], + "income_statement": [ + "合并利润表", "連結損益計算書", + "STATEMENT OF PROFIT OR LOSS", + "CONSOLIDATED STATEMENT OF PROFIT OR LOSS", + "CONDENSED CONSOLIDATED STATEMENTS OF INCOME", + "CONSOLIDATED STATEMENTS OF INCOME", + "CONSOLIDATED STATEMENT OF INCOME", "INCOME STATEMENT", + "COMPTE DE RESULTAT", "利润表", + # OCI/Comprehensive Income — only match if content validation confirms it has revenue + 
# (UK small companies use SCI as the income statement) + "STATEMENT OF COMPREHENSIVE INCOME", + ], + "cash_flow": [ + "合并现金流量表", "連結キャッシュ・フロー計算書", + "STATEMENT OF CASH FLOWS", + "CONDENSED CONSOLIDATED STATEMENTS OF CASH", + "CONSOLIDATED STATEMENTS OF CASH", + "CONSOLIDATED STATEMENT OF CASH", "CASH FLOW", + "TABLEAU DES FLUX", "现金流量表", + ], +} + +# Note headings / section dividers that signal we've left the statement +_NOTE_HEADING_RE = re.compile( + r"\b\d{1,2}\.\s+[A-Z]" + r"|\bNote\s+\d" + r"|\bNotes to\b" + r"|Statement of Changes" + r"|Comprehensive Income" + r"|Directors[''']?\s*Report" + r"|Accounting Policies" + r"|Significant Accounting" + r"|母公司" + r"|个别财务报表" + r"|Segment\s+(?:Information|Results|Revenue)" + r"|Supplemental\s+(?:Financial|Revenue)" + r"|Revenue\s+by\s+(?:Segment|Geography)" + r"|Reconciliation\s+of" + # French PCG: stop BS continuation at Income Statement heading + r"|COMPTE\s+DE\s+R[EÉ]SULTAT", + re.IGNORECASE, +) + +# Labels that don't belong in a continuation table for this statement type +_NON_CONTINUATION_LABELS: dict[str, list[str]] = { + "balance_sheet": [ + "administrative expenses", "wages and salaries", "revenue", + "cost of sales", "operating profit", "profit for", + "depreciation", "amortisation", "amortization", + "directors' emoluments", "directors emoluments", + "pension cost", "social security", "audit fee", + "tax charge", "corporation tax", "deferred tax", + "dividend", "interest payable", "interest receivable", + "underlying ebitda", "ebitda", "cash from operating", + ], + "income_statement": [ + "total assets", "total liabilities", "total equity", + "cash and cash equivalents", "trade debtors", "trade creditors", + "fixed assets", "current assets", "creditors", + ], + "cash_flow": [ + "total assets", "total liabilities", "total equity", + "trade debtors", "trade creditors", "fixed assets", + "share capital", "retained earnings", + "underlying ebitda", "ebitda reconciliation", + # Segment data that bleeds after CF 
ends + "segment", "family of apps", "reality labs", + "segment revenue", "segment income", "segment operating", + "revenue by geography", "headcount", + ], +} + +MAX_CONTINUATION_TABLES = 3 + + +# --------------------------------------------------------------------------- +# Content validation — does the table actually match the statement type? +# --------------------------------------------------------------------------- + +# Labels that indicate a table is an Income Statement (not a Cash Flow) +_IS_CONTENT_LABELS = [ + "revenue", "total revenue", "operating revenue", + "cost of sales", "cost of goods", "gross profit", + "operating expenses", "operating profit", "ebitda", + "profit before tax", "income tax", "net income", + "earnings per share", "diluted earnings", + # French PCG + "chiffre d'affaires", "produits d'exploitation", + "charges d'exploitation", "resultat d'exploitation", + # Non-GAAP indicators (should NOT be in CF) + "non-gaap", "adjusted ebitda", "adjusted operating", +] + +# Labels that indicate a table is a Cash Flow statement +_CF_CONTENT_LABELS = [ + "cash flows from operating", "cash provided by operating", + "cash used in investing", "cash flows from investing", + "cash flows from financing", "cash provided by financing", + "net increase in cash", "net decrease in cash", + "cash at beginning", "cash at end", + "cash and cash equivalents at", +] + + +def _validate_table_content(statement_type: str, markdown: str, table_start: int, table_end: int) -> bool: + """Check if a table's content matches the expected statement type. + + Returns True if content looks correct, False if it's the wrong statement. 
+    """
+    table_html = markdown[table_start:min(table_end, table_start + 3000)]
+    first_cells = re.findall(r"<td[^>]*>(.*?)</td>", table_html, re.DOTALL | re.IGNORECASE)
+    cell_text = " ".join(re.sub(r"<[^>]+>", "", c).strip().lower() for c in first_cells[:15])
+
+    if statement_type == "cash_flow":
+        # If first rows contain IS-like content but no CF-like content, it's wrong
+        has_is_content = any(label in cell_text for label in _IS_CONTENT_LABELS)
+        has_cf_content = any(label in cell_text for label in _CF_CONTENT_LABELS)
+        if has_is_content and not has_cf_content:
+            logger.info(f"  {statement_type}: table content looks like Income Statement, skipping")
+            return False
+
+    if statement_type == "income_statement":
+        # If first rows only contain OCI items (hedging, translation) with no revenue/profit, skip
+        has_revenue = any(label in cell_text for label in [
+            "revenue", "total revenue", "operating revenue", "sales",
+            "turnover", "gross profit", "operating profit", "profit before",
+            "chiffre d'affaires", "produits d'exploitation",
+        ])
+        has_oci_only = any(label in cell_text for label in ["hedging", "translation differences"])
+        has_no_profit = not any(label in cell_text for label in ["profit", "loss", "income", "revenue", "turnover", "sales"])
+        if has_oci_only and has_no_profit and not has_revenue:
+            logger.info(f"  {statement_type}: table content looks like pure OCI statement, skipping")
+            return False
+
+    return True
+
+
+# ---------------------------------------------------------------------------
+# Full-markdown heading search (like old pipeline — reliable)
+# ---------------------------------------------------------------------------
+
+def _find_heading_offset(
+    statement_type: str,
+    title_raw: str,
+    markdown: str,
+) -> Optional[int]:
+    """Find the heading offset by searching the FULL markdown. 
+
+    Strategy 1: Known heading patterns (most reliable — language-aware)
+    Strategy 2: Locator's title_raw as fallback
+
+    Both strategies validate table content to avoid picking the wrong table
+    (e.g., P&L summary on the CF page, OCI instead of IS).
+
+    Returns absolute offset into the markdown, or None.
+    """
+    # --- Strategy 1: Known heading patterns (full markdown) ---
+    for pattern in _HEADING_PATTERNS.get(statement_type, []):
+        for m in re.finditer(re.escape(pattern), markdown, re.IGNORECASE):
+            lookahead = markdown[m.start():m.start() + 500]
+            lookbehind = markdown[max(0, m.start() - 100):m.start()]
+
+            # Check: heading before a <table>, OR heading inside a <caption>
+            # within a <table>
+            has_table_ahead = "<table" in lookahead.lower()
+            has_table_behind = "<table" in lookbehind.lower() and "<caption" in lookbehind.lower()
+
+            if has_table_ahead or has_table_behind:
+                # Find the actual table start
+                if has_table_behind:
+                    table_start = markdown.rfind("<table", max(0, m.start() - 100), m.start())
+                else:
+                    table_start = markdown.find("<table", m.start())
+                table_end = markdown.find("</table>", table_start) if table_start >= 0 else -1
+
+                # Validate: does the table content match this statement type?
+                if table_start >= 0 and table_end >= 0:
+                    if not _validate_table_content(statement_type, markdown, table_start, table_end):
+                        continue  # Skip this match, try next occurrence
+
+                    # Skip tables that are too small to be a primary statement
+                    # (likely a summary or note table with the same heading)
+                    row_count = len(re.findall(r"<tr", markdown[table_start:table_end], re.IGNORECASE))
+                    min_rows = {"balance_sheet": 10, "income_statement": 8, "cash_flow": 8}.get(statement_type, 5)
+                    if row_count < min_rows:
+                        logger.info(
+                            f"  {statement_type}: skipping small table at offset {m.start()} "
+                            f"({row_count} rows < {min_rows} min)"
+                        )
+                        continue
+
+                logger.info(
+                    f"  {statement_type}: matched heading pattern "
+                    f"'{pattern}' at offset {m.start()}"
+                    f"{' (caption)' if has_table_behind else ''}"
+                )
+                return table_start if has_table_behind else m.start()
+
+    # --- Strategy 2: Locator's raw title (full markdown) ---
+    if title_raw and len(title_raw) > 5:
+        for prefix_len in [len(title_raw), 40, 20]:
+            prefix = title_raw[:prefix_len].strip()
+            if not prefix:
+                continue
+            idx = markdown.lower().find(prefix.lower())
+            if idx >= 0:
+                lookahead = markdown[idx:idx + 500]
+                if "<table" in lookahead.lower():
+                    logger.info(
+                        f"  {statement_type}: matched title_raw "
+                        f"'{prefix[:40]}' at offset {idx}"
+                    )
+                    return idx
+
+    return None
+
+
+# ---------------------------------------------------------------------------
+# Table merging with anti-contamination
+# ---------------------------------------------------------------------------
+
+def _merge_continuation_tables(
+    statement_type: str,
+    markdown: str,
+    table_start: int,
+    initial_end: int,
+) -> tuple[int, int]:
+    """Merge consecutive tables that belong to the same statement.
+
+    Uses content-based checks only (like old pipeline). No page range constraints
+    because locator page ranges can be wrong.
+
+    Returns (end_offset, tables_merged).
+    """
+    # Build set of OTHER statement type headings
+    other_headings: list[str] = []
+    for other_type, patterns in _HEADING_PATTERNS.items():
+        if other_type != statement_type:
+            other_headings.extend(patterns)
+
+    end_offset = initial_end
+    tables_merged = 1
+
+    while tables_merged < MAX_CONTINUATION_TABLES:
+        remaining = markdown[end_offset:]
+        next_table = remaining.find("<table")
+        if next_table < 0:
+            break
+
+        # Stop: gap before the next table contains another statement heading
+        # or a note/section heading (see module docstring)
+        gap = remaining[:next_table]
+        if any(h.upper() in gap.upper() for h in other_headings):
+            break
+        if _NOTE_HEADING_RE.search(gap):
+            break
+
+        next_close = remaining.find("</table>", next_table)
+        if next_close < 0:
+            break
+        next_table_html = remaining[next_table:next_close]
+
+        # Stop: next table caption names a different statement
+        caption_match = re.search(r"<caption[^>]*>(.*?)</caption>", next_table_html[:500], re.IGNORECASE)
+        if caption_match:
+            caption_text = caption_match.group(1).upper()
+            if any(h.upper() in caption_text for h in other_headings):
+                logger.info(f"  {statement_type}: stopping merge — next table caption contains other statement heading")
+                break
+
+        # Stop: next table content doesn't belong to this statement type
+        first_cells = re.findall(
+            r"<td[^>]*>(.*?)</td>", next_table_html[:2000],
+            re.DOTALL | re.IGNORECASE,
+        )
+        first_labels = " ".join(
+            re.sub(r"<[^>]+>", "", c).strip().lower()
+            for c in first_cells[:6]
+        )
+        bad_labels = _NON_CONTINUATION_LABELS.get(statement_type, [])
+        if any(bl in first_labels for bl in bad_labels):
+            break
+
+        end_offset = end_offset + next_close + len("</table>")
+        tables_merged += 1
+
+    return end_offset, tables_merged
") + if next_table < 0: + break + + gap_text = remaining[:next_table] + + # Stop: another statement type heading in the gap + if any(h.upper() in gap_text.upper() for h in other_headings): + break + + # Stop: note headings or parent company markers in the gap + # Exception: "Comprehensive Income" is a valid IS continuation (IFRS) + note_match = _NOTE_HEADING_RE.search(gap_text) + if note_match: + matched_text = note_match.group(0) + if statement_type == "income_statement" and "comprehensive income" in matched_text.lower(): + pass # Allow IS to continue past SCI heading + else: + break + + # Stop: large gap (> 500 chars non-whitespace = section break) + gap_stripped = re.sub(r"\s+", "", gap_text) + if len(gap_stripped) > 500: + break + + # Stop: next table has a
with another statement's heading + next_close = remaining.find("
", next_table) + if next_close < 0: + break + next_table_html = remaining[next_table:next_close] + caption_match = re.search(r"
(.*?)
") + tables_merged += 1 + + return end_offset, tables_merged + + +# --------------------------------------------------------------------------- +# Locate a single statement's table +# --------------------------------------------------------------------------- + +def locate_table( + statement_type: str, + candidate: CandidateStatement, + markdown: str, + page_map: list[tuple[int, int, int]], +) -> Optional[dict]: + """Find and merge table(s) for a statement in the markdown. + + Uses full-markdown heading search (reliable) + content-based merging + with anti-contamination (prevents bleeding). + + Returns dict with md_offset, md_end_offset, start_page, end_page, tables_merged + or None if not found. + """ + from extractor.statement_detector import _offset_to_page + + title_raw = (candidate.title_raw or "").strip() + + # Step 1: Find heading in FULL markdown + direct_offset = _find_heading_offset(statement_type, title_raw, markdown) + + if direct_offset is None: + logger.warning(f" {statement_type}: could not locate via heading search") + return None + + # Step 2: Find the table block + table_start = markdown.find("", direct_offset) + if table_start < 0: + return None + table_end = markdown.find("
", table_start) + if table_end < 0: + return None + initial_end = table_end + len("") + + # Step 3: Merge continuation tables (content-based anti-contamination) + end_offset, tables_merged = _merge_continuation_tables( + statement_type, markdown, table_start, initial_end, + ) + + start_page = _offset_to_page(direct_offset, page_map) if page_map else 1 + end_page = _offset_to_page(end_offset, page_map) if page_map else start_page + + logger.info( + f" {statement_type}: located table at pages {start_page}-{end_page} " + f"({tables_merged} table(s), {end_offset - direct_offset} chars)" + ) + + return { + "md_offset": direct_offset, + "md_end_offset": end_offset, + "start_page": start_page, + "end_page": end_page, + "tables_merged": tables_merged, + } + + +# --------------------------------------------------------------------------- +# Stage entry point +# --------------------------------------------------------------------------- + +def _locate_by_llm_classification( + statement_type: str, + classification, # TableClassification from llm_table_classifier + markdown: str, + page_map: list[tuple[int, int, int]], +) -> Optional[dict]: + """Locate and merge tables starting from an LLM-classified table offset.""" + from extractor.statement_detector import _offset_to_page + + table_start = classification.md_offset + table_end_initial = classification.md_end_offset + + # Merge continuation tables (same anti-contamination logic) + end_offset, tables_merged = _merge_continuation_tables( + statement_type, markdown, table_start, table_end_initial, + ) + + start_page = _offset_to_page(table_start, page_map) if page_map else 1 + end_page = _offset_to_page(end_offset, page_map) if page_map else start_page + + logger.info( + f" {statement_type}: LLM classified table at pages {start_page}-{end_page} " + f"({tables_merged} table(s), confidence={classification.confidence:.2f})" + ) + + return { + "md_offset": table_start, + "md_end_offset": end_offset, + "start_page": start_page, + 
"end_page": end_page, + "tables_merged": tables_merged, + } + + +def run_extract( + select_result: SelectResult, + markdown: str, + page_map: list[tuple[int, int, int]], + pages: list[dict], + requested_types: list[str] | None = None, +) -> ExtractResult: + """Run Stage 3: locate and parse tables for each selected statement. + + Strategy (layered, most reliable first): + 1. Heading pattern search (fast, works for known formats) + 2. LLM table classifier (handles any language/format, one call for all) + 3. CU Locator title_raw fallback (last resort) + + Args: + select_result: Output from Stage 2. + markdown: Full document markdown from CU. + page_map: Page offset map from Stage 1. + pages: Raw page objects from CU. + requested_types: Statement types to extract (for fallback search). + + Returns: + ExtractResult with parsed data per statement. + """ + from extractor.html_table_parser import parse_html_table + + if requested_types is None: + requested_types = list(select_result.selected.keys()) + + statements: dict[str, ExtractedStatement] = {} + failures: dict[str, str] = {} + + # --- Layer 1: Try heading pattern search for each type --- + types_needing_llm = [] + + for stype in requested_types: + candidate = select_result.selected.get(stype) + + if candidate: + location = locate_table(stype, candidate, markdown, page_map) + else: + dummy = CandidateStatement(statement_type=stype) + location = locate_table(stype, dummy, markdown, page_map) + + if location: + rows, columns, cells = parse_html_table( + markdown, location["md_offset"], location["md_end_offset"] + ) + if cells: + statements[stype] = ExtractedStatement( + statement_type=stype, + rows=rows, columns=columns, cells=cells, + md_offset=location["md_offset"], + md_end_offset=location["md_end_offset"], + start_page=location["start_page"], + end_page=location["end_page"], + tables_merged=location.get("tables_merged", 1), + ) + continue + + # Heading search failed — need LLM fallback + 
+        types_needing_llm.append(stype)
+
+    # --- Layer 2: LLM table classifier for any types not found ---
+    if types_needing_llm:
+        logger.info(
+            f"[Extract] Heading search missed: {types_needing_llm}. "
+            f"Running LLM table classifier..."
+        )
+        try:
+            from extractor.llm_table_classifier import classify_tables, get_best_table
+
+            classifications = classify_tables(markdown)
+
+            for stype in list(types_needing_llm):
+                best = get_best_table(classifications, stype)
+                if not best:
+                    continue
+
+                location = _locate_by_llm_classification(
+                    stype, best, markdown, page_map,
+                )
+                if not location:
+                    continue
+
+                rows, columns, cells = parse_html_table(
+                    markdown, location["md_offset"], location["md_end_offset"]
+                )
+                if cells:
+                    statements[stype] = ExtractedStatement(
+                        statement_type=stype,
+                        rows=rows, columns=columns, cells=cells,
+                        md_offset=location["md_offset"],
+                        md_end_offset=location["md_end_offset"],
+                        start_page=location["start_page"],
+                        end_page=location["end_page"],
+                        tables_merged=location.get("tables_merged", 1),
+                    )
+                    types_needing_llm.remove(stype)
+
+        except Exception as e:
+            logger.warning(f"[Extract] LLM classifier failed: {e}")
+
+    # --- Mark remaining as failures ---
+    for stype in types_needing_llm:
+        failures[stype] = "could_not_locate_in_markdown"
+
+    return ExtractResult(statements=statements, failures=failures)
diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/extractor/stages/select.py b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/stages/select.py
new file mode 100644
index 000000000..f075e686a
--- /dev/null
+++ b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/stages/select.py
@@ -0,0 +1,225 @@
+"""
+Stage 2: Select — scoring-based statement selection.
+
+Adapted from select_v2.py (experiments repo). Each candidate gets a score
+based on how likely it is to be the primary statutory statement. Highest
+score wins per statement type. Below -50 threshold: rejected.
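+
+Worked example (hypothetical title, for illustration; assumes the locator
+also set is_consolidated and a valid page range):
+
+    "Consolidated Statement of Financial Position as at 31 December 2023"
+    scores +50 (primary heading pattern) +30 (temporal "as at") +20
+    (is_consolidated) +15 ("Consolidated" keyword) +10 (type keyword)
+    = 125, well above the -50 rejection threshold.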
+ +Scoring signals: + +50 Primary heading pattern (consolidated + temporal markers) + +30 Temporal markers ("for the year ended", "as at") + +20 is_consolidated flag + +15 "Consolidated" keyword (any language) + +10 Statement type keyword in raw title + +5 Page >= 80 (financial statements section, not MD&A) + -20 Single-page early page without consolidated markers + -30 Incomplete page range (start but no end) + -100 Non-primary patterns (notes, bridges, segments, parent company) + -500 Empty title + -1000 Ghost match (no page range) +""" +import logging +import re +from typing import Optional + +from .contracts import ( + AnalyzeResult, + CandidateStatement, + ScoredCandidate, + SelectResult, +) + +logger = logging.getLogger(__name__) + +# --------------------------------------------------------------------------- +# Patterns +# --------------------------------------------------------------------------- + +_PRIMARY_HEADING_PATTERNS = [ + r"(?:consolidated\s+)?(?:statement|balance sheet|income statement).*(?:for the (?:year|period)|as at|as of)", + r"consolidated\s+statement\s+of\s+(?:financial position|profit|income|cash|comprehensive)", + r"consolidated\s+balance\s+sheet", + r"consolidated\s+(?:income|p&l)\s+statement", + r"consolidated\s+(?:statement\s+of\s+)?cash\s+flow", + r"合并资产负债表", + r"合并利润表", + r"合并现金流量表", + r"連結貸借対照表", + r"連結損益計算書", + r"連結キャッシュ・フロー計算書", + r"bilan\s+consoli", + r"compte\s+de\s+r[ée]sultat\s+consoli", + r"tableau\s+des\s+flux\s+de\s+tr[ée]sorerie", +] + +_NON_PRIMARY_PATTERNS = [ + r"\bnote\s+\d", + r"\bnotes?\s+to\b", + r"\bsegment\b", + r"\breconciliation\b", + r"\bbridge\b", + r"\bebitda\b", + r"\bnon.?gaap\b", + r"\badjusted\b", + r"\bsummary\b.*\b(?:aplng|subsidiary|joint venture)\b", + r"\bparent\s+(?:company|entity)\b", + r"母公司", + r"\bunderlying\b", + r"\b(?:c|d|e|f|g)\.\d+\b", + r"\bcontinued\b", + r"\bproperty,?\s+plant", + r"\bintangible\s+assets\b", + r"\bincome\s+tax\s+expense\b", + r"\binterest.bearing\s+liabilities\b", + 
r"\bderivatives?\b", + r"\bdisclosure\s+statement\b", + r"\bfinancial\s+summary\b", +] + +_TYPE_KEYWORDS: dict[str, list[str]] = { + "balance_sheet": [r"financial position", r"balance sheet", r"资产负债", r"貸借対照", r"bilan"], + "income_statement": [r"profit or loss", r"income statement", r"利润表", r"損益", r"compte de r"], + "cash_flow": [r"cash flow", r"现金流量", r"キャッシュ", r"flux de tr"], +} + +# Minimum score for a candidate to be selected (below this = rejected) +_MIN_SCORE_THRESHOLD = -50 + + +# --------------------------------------------------------------------------- +# Scoring +# --------------------------------------------------------------------------- + +def score_statement(stmt: CandidateStatement, statement_type: str) -> float: + """Score a candidate statement. Higher = more likely primary.""" + score = 0.0 + + title_raw = (stmt.title_raw or "").strip() + title_en = (stmt.title_english or "").strip() + combined_title = f"{title_raw} {title_en}".lower() + page_start = stmt.page_start or 0 + page_end = stmt.page_end or 0 + + # --- Disqualifiers --- + + # Ghost match (no page range) — penalize but don't kill if title is good + # CU Locator sometimes returns consolidated statements without page ranges. + # The extract stage searches full markdown, so missing pages is recoverable. 
+ if page_start == 0 and page_end == 0: + if stmt.is_consolidated and title_raw: + score -= 40 # Penalty but still selectable if title scores high + else: + return -1000 + + # Incomplete page range + if page_start > 0 and page_end == 0: + score -= 30 + + # Empty title + if not title_raw and not title_en: + return -500 + + # Non-primary patterns (notes, segments, bridges, parent company) + for pattern in _NON_PRIMARY_PATTERNS: + if re.search(pattern, combined_title, re.IGNORECASE): + score -= 100 + break + + # --- Positive signals --- + + # is_consolidated flag + if stmt.is_consolidated is True: + score += 20 + + # Primary heading pattern match + for pattern in _PRIMARY_HEADING_PATTERNS: + if re.search(pattern, combined_title, re.IGNORECASE): + score += 50 + break + + # Temporal markers + if re.search(r"for the (?:year|period|half.year|quarter)", combined_title, re.IGNORECASE): + score += 30 + if re.search(r"as (?:at|of)\s+\d", combined_title, re.IGNORECASE): + score += 30 + + # "Consolidated" keyword (any language) + if re.search(r"consoli|合并|連結|consolidé", combined_title, re.IGNORECASE): + score += 15 + + # Statement type keyword in title + for kw in _TYPE_KEYWORDS.get(statement_type, []): + if re.search(kw, combined_title, re.IGNORECASE): + score += 10 + break + + # Penalty for single-page early match without consolidated markers + if page_start == page_end and page_start < 60: + if not re.search(r"consoli|statutory|合并|連結", combined_title, re.IGNORECASE): + score -= 20 + + # Prefer statements in the financial statements section + if page_start >= 80: + score += 5 + + return score + + +def run_select( + analyze_result: AnalyzeResult, + requested_types: list[str], +) -> SelectResult: + """Run Stage 2: score all candidates and select best per type.""" + # Group candidates by type + by_type: dict[str, list[CandidateStatement]] = {} + for c in analyze_result.candidates: + if c.statement_type in requested_types: + by_type.setdefault(c.statement_type, []).append(c) + + 
selected: dict[str, CandidateStatement] = {} + rejected: dict[str, list[ScoredCandidate]] = {} + all_scores: dict[str, list[ScoredCandidate]] = {} + + for stype in requested_types: + candidates = by_type.get(stype, []) + if not candidates: + all_scores[stype] = [] + rejected[stype] = [] + continue + + scored = [] + for c in candidates: + s = score_statement(c, stype) + reason = None + if s <= _MIN_SCORE_THRESHOLD: + if s <= -1000: + reason = "ghost_match_no_pages" + elif s <= -500: + reason = "empty_title" + else: + reason = "below_score_threshold" + scored.append(ScoredCandidate(candidate=c, score=s, rejection_reason=reason)) + + scored.sort(key=lambda x: x.score, reverse=True) + all_scores[stype] = scored + + best = scored[0] + if best.score > _MIN_SCORE_THRESHOLD: + selected[stype] = best.candidate + rejected[stype] = [sc for sc in scored[1:]] + else: + rejected[stype] = scored + + # Logging + logger.info( + f" Stage 2 (Select) {stype}: {len(candidates)} candidates, " + f"best score={best.score:.0f}" + ) + if len(scored) > 1: + runner_up = scored[1] + logger.info( + f" 2nd: score={runner_up.score:.0f} " + f"page {runner_up.candidate.page_start}-{runner_up.candidate.page_end}" + ) + + return SelectResult(selected=selected, rejected=rejected, scores=all_scores) diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/extractor/stages/validate.py b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/stages/validate.py new file mode 100644 index 000000000..8fc5cdee2 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/stages/validate.py @@ -0,0 +1,264 @@ +""" +Stage 5: Validate — structural validation gates. + +7 checks with weighted scoring: + 1. Required anchors (25%) — critical rows present + 2. Subtotal arithmetic (20%) — children sum ≈ subtotal + 3. Balance equation (15%) — Assets ≈ Liabilities + Equity + 4. Min row count (15%) — BS>=15, IS>=8, CF>=12 + 5. 
Cross-statement (10%) — NI on IS ≈ NI on CF + 6. Value density (10%) — >=60% of line items have non-null values + 7. Period/currency (5%) — consistent across statements + +Quality thresholds: + >= 0.90 → accepted + >= 0.70 → review + < 0.70 → rejected +""" +import logging +from typing import Optional + +from .contracts import ( + EnrichedStatement, + EnrichResult, + QualityStatus, + ValidatedStatement, + ValidateResult, + ValidationCheck, +) + +logger = logging.getLogger(__name__) + +_MIN_ROW_COUNTS: dict[str, int] = { + "balance_sheet": 15, + "income_statement": 8, + "cash_flow": 12, +} + +_REQUIRED_ANCHORS: dict[str, list[list[str]]] = { + "balance_sheet": [ + ["total_assets", "total_general"], + ["total_liabilities", "total_liabilities_and_equity"], + ], + "income_statement": [ + ["revenue", "net_revenue", "total_revenue", "turnover", "operating_income", + "net_sales", "chiffre_d_affaires"], + ], + "cash_flow": [ + ["net_cash_from_operating_activities", "net_cash_provided_by_operating_activities"], + ["net_cash_used_in_investing_activities", "net_cash_from_investing_activities"], + ["net_cash_used_in_financing_activities", "net_cash_from_financing_activities"], + ], +} + +# Fallback: also check by label text when canonical keys don't match +# (for French/non-English reports where canonical key mapping may be incomplete) +_REQUIRED_ANCHOR_LABELS: dict[str, list[list[str]]] = { + "balance_sheet": [ + ["total assets", "total general", "total actif"], + ["total liabilities", "total passif", "total general (i"], + ], + "income_statement": [ + ["revenue", "net sales", "turnover", "chiffre d'affaires", "net turnover", + "operating revenue", "total operating revenue"], + ], + "cash_flow": [ + ["cash from operating", "cash flows from operating", "net cash from operating"], + ["cash from investing", "cash flows from investing", "net cash from investing"], + ["cash from financing", "cash flows from financing", "net cash from financing"], + ], +} + +_QUALITY_THRESHOLDS = { 
+ "accepted": 0.90, + "review": 0.70, +} + + +def _check_required_anchors(v12_doc: dict, statement_type: str) -> ValidationCheck: + """Check that required anchor rows are present (by canonical key OR label text).""" + anchor_groups = _REQUIRED_ANCHORS.get(statement_type, []) + label_groups = _REQUIRED_ANCHOR_LABELS.get(statement_type, []) + if not anchor_groups: + return ValidationCheck("required_anchors", True, 1.0, 0.25) + + rows = v12_doc.get("rows", []) + present_keys = {r.get("canonical_key", "") for r in rows} + present_labels = { + (r.get("label_normalized") or r.get("label_raw", "")).lower() + for r in rows + } + + groups_found = 0 + missing_groups = [] + for i, group in enumerate(anchor_groups): + # Check by canonical key first + if any(key in present_keys for key in group): + groups_found += 1 + continue + # Fallback: check by label text + label_group = label_groups[i] if i < len(label_groups) else [] + if any(any(lbl in pl for pl in present_labels) for lbl in label_group): + groups_found += 1 + continue + missing_groups.append(group) + + score = groups_found / len(anchor_groups) if anchor_groups else 1.0 + details = f"Missing: {missing_groups}" if missing_groups else None + return ValidationCheck(name="required_anchors", passed=score == 1.0, score=score, weight=0.25, details=details) + + +def _check_subtotal_arithmetic(v12_doc: dict) -> ValidationCheck: + rows = v12_doc.get("rows", []) + warnings = v12_doc.get("validation", {}).get("warnings", []) + subtotal_indices = {r["row_index"] for r in rows if r.get("row_type") == "subtotal"} + if not subtotal_indices: + return ValidationCheck("subtotal_arithmetic", True, 1.0, 0.20) + warned_indices = {w["row_index"] for w in warnings if w.get("code") == "SUBTOTAL_MISMATCH"} + failed = subtotal_indices & warned_indices + score = (len(subtotal_indices) - len(failed)) / len(subtotal_indices) + return ValidationCheck(name="subtotal_arithmetic", passed=len(failed) == 0, score=score, weight=0.20, 
details=f"{len(failed)}/{len(subtotal_indices)} subtotals mismatched" if failed else None) + + +def _check_balance_equation(v12_doc: dict, statement_type: str) -> ValidationCheck: + if statement_type != "balance_sheet": + return ValidationCheck("balance_equation", True, 1.0, 0.15) + rows = v12_doc.get("rows", []) + key_values: dict[str, Optional[float]] = {} + + # Look up by canonical key + for r in rows: + key = r.get("canonical_key", "") + if key in ("total_assets", "total_liabilities", "total_equity", + "total_liabilities_and_equity", "total_general"): + vals = r.get("values", []) + if vals and vals[0].get("normalized") is not None: + key_values[key] = vals[0]["normalized"] + + # Fallback: look up by label text (for French/non-English reports) + if not key_values: + _LABEL_MAP = { + "total_assets": ["total assets", "total actif", "total general (i"], + "total_liabilities_and_equity": ["total liabilities and equity", "total passif", + "total general (i à v", "total general (i to v"], + } + for r in rows: + label = (r.get("label_normalized") or r.get("label_raw", "")).lower() + for target_key, patterns in _LABEL_MAP.items(): + if any(p in label for p in patterns) and target_key not in key_values: + vals = r.get("values", []) + if vals and vals[0].get("normalized") is not None: + key_values[target_key] = vals[0]["normalized"] + + total_assets = key_values.get("total_assets") or key_values.get("total_general") + total_le = key_values.get("total_liabilities_and_equity") + total_l = key_values.get("total_liabilities") + total_e = key_values.get("total_equity") + if total_assets is not None and total_le is not None: + diff = abs(total_assets - total_le) + tolerance = abs(total_assets) * 0.01 if total_assets else 1.0 + passed = diff <= tolerance + score = 1.0 if passed else max(0.0, 1.0 - diff / max(abs(total_assets), 1)) + return ValidationCheck("balance_equation", passed, score, 0.15, f"Assets={total_assets}, L+E={total_le}, diff={diff}" if not passed else None) + if 
total_assets is not None and total_l is not None and total_e is not None: + computed_le = total_l + total_e + diff = abs(total_assets - computed_le) + tolerance = abs(total_assets) * 0.01 if total_assets else 1.0 + passed = diff <= tolerance + score = 1.0 if passed else max(0.0, 1.0 - diff / max(abs(total_assets), 1)) + return ValidationCheck("balance_equation", passed, score, 0.15, f"Assets={total_assets}, L+E={computed_le}, diff={diff}" if not passed else None) + return ValidationCheck("balance_equation", False, 0.5, 0.15, "Missing anchor values") + + +def _check_min_row_count(v12_doc: dict, statement_type: str) -> ValidationCheck: + rows = v12_doc.get("rows", []) + count = len(rows) + min_count = _MIN_ROW_COUNTS.get(statement_type, 8) + passed = count >= min_count + score = min(1.0, count / min_count) if min_count > 0 else 1.0 + return ValidationCheck("min_row_count", passed, score, 0.15, f"{count} rows (min {min_count})" if not passed else None) + + +def _check_cross_statement(v12_doc: dict, statement_type: str, all_docs: dict[str, dict]) -> ValidationCheck: + if statement_type not in ("income_statement", "cash_flow"): + return ValidationCheck("cross_statement", True, 1.0, 0.10) + is_doc = all_docs.get("income_statement", {}) + cf_doc = all_docs.get("cash_flow", {}) + if not is_doc or not cf_doc: + return ValidationCheck("cross_statement", True, 0.5, 0.10, "Missing paired statement") + is_ni = None + for r in is_doc.get("rows", []): + if r.get("canonical_key") in ("net_income", "profit_for_the_period", "net_profit"): + vals = r.get("values", []) + if vals and vals[0].get("normalized") is not None: + is_ni = vals[0]["normalized"] + break + cf_ni = None + for r in cf_doc.get("rows", []): + if r.get("canonical_key") in ("net_income", "profit_for_the_period", "net_profit"): + vals = r.get("values", []) + if vals and vals[0].get("normalized") is not None: + cf_ni = vals[0]["normalized"] + break + if is_ni is None or cf_ni is None: + return 
ValidationCheck("cross_statement", True, 0.5, 0.10, "Net income not found in both") + diff = abs(is_ni - cf_ni) + tolerance = max(abs(is_ni) * 0.01, 1.0) + passed = diff <= tolerance + score = 1.0 if passed else max(0.0, 1.0 - diff / max(abs(is_ni), 1)) + return ValidationCheck("cross_statement", passed, score, 0.10, f"IS NI={is_ni}, CF NI={cf_ni}, diff={diff}" if not passed else None) + + +def _check_value_density(v12_doc: dict) -> ValidationCheck: + rows = v12_doc.get("rows", []) + line_items = [r for r in rows if r.get("row_type") == "line_item"] + if not line_items: + return ValidationCheck("value_density", True, 1.0, 0.10) + with_values = 0 + for r in line_items: + vals = r.get("values", []) + if any(not v.get("is_null", True) for v in vals): + with_values += 1 + density = with_values / len(line_items) + passed = density >= 0.60 + score = min(1.0, density / 0.60) if density < 0.60 else 1.0 + return ValidationCheck("value_density", passed, score, 0.10, f"{density:.0%} density ({with_values}/{len(line_items)})" if not passed else None) + + +def _check_period_currency(v12_doc: dict, all_docs: dict[str, dict]) -> ValidationCheck: + currencies = set() + for stype, doc in all_docs.items(): + if doc: + cur = doc.get("statement_metadata", {}).get("currency") + if cur: + currencies.add(cur) + passed = len(currencies) <= 1 + score = 1.0 if passed else 0.5 + return ValidationCheck("period_currency", passed, score, 0.05, f"Multiple currencies: {currencies}" if not passed else None) + + +def run_validate(enrich_result: EnrichResult) -> ValidateResult: + all_docs: dict[str, dict] = {stype: es.v12_doc for stype, es in enrich_result.statements.items()} + statements: dict[str, ValidatedStatement] = {} + for stype, enriched in enrich_result.statements.items(): + v12_doc = enriched.v12_doc + checks = [ + _check_required_anchors(v12_doc, stype), + _check_subtotal_arithmetic(v12_doc), + _check_balance_equation(v12_doc, stype), + _check_min_row_count(v12_doc, stype), + 
_check_cross_statement(v12_doc, stype, all_docs), + _check_value_density(v12_doc), + _check_period_currency(v12_doc, all_docs), + ] + total_weight = sum(c.weight for c in checks) + quality_score = sum(c.score * c.weight for c in checks) / total_weight if total_weight > 0 else 0.0 + if quality_score >= _QUALITY_THRESHOLDS["accepted"]: + status = QualityStatus.ACCEPTED + elif quality_score >= _QUALITY_THRESHOLDS["review"]: + status = QualityStatus.REVIEW + else: + status = QualityStatus.REJECTED + logger.info(f" Stage 5 (Validate) {stype}: quality={quality_score:.2f} status={status.value} checks={[c.name for c in checks if not c.passed]}") + statements[stype] = ValidatedStatement(statement_type=stype, v12_doc=v12_doc, quality_score=quality_score, status=status, checks=checks) + return ValidateResult(statements=statements) diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/extractor/statement_detector.py b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/statement_detector.py new file mode 100644 index 000000000..750765449 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/statement_detector.py @@ -0,0 +1,1704 @@ +""" +extractor/statement_detector.py +------------------------------- +Locates and parses the three core financial statements from the markdown +string produced by Azure Content Understanding prebuilt-read. + +How Azure CU structures the output + The full document is returned as a single markdown string at: + result.contents[0].markdown + Page boundaries are tracked via: + result.contents[0].pages[*].spans[*].offset / .length + Azure CU does NOT return structured table objects for financial tables; + instead each table row is a sequence of \n\n-separated plain-text tokens. + +Pipeline (in order) + 1. _all_heading_offsets — find all pages where each statement appears + 2. 
_pick_cluster_offsets — select the main consolidated statements cluster + (ignores summary tables scattered earlier) + 3. locate_statements — determine char-offset boundaries per statement + 4. _parse_plain_text_table— tokenise and reconstruct rows/columns/cells + 5. build_statement_json — assemble final JSON; optionally run LLM pass +""" + +from __future__ import annotations + +import re +from typing import Optional + +# --------------------------------------------------------------------------- +# Statement heading detection — two-tier approach +# --------------------------------------------------------------------------- +# +# Tier 1 (fast, no LLM): Exact heading match from a curated list of known +# IFRS and US GAAP heading variants. This is tried first because it is +# deterministic and free. +# +# Tier 2 (fast, no LLM): Keyword-based semantic scoring. Each candidate +# text region is scored against characteristic keyword clusters for each +# statement type. This handles non-standard headings, private company +# filings, and creative formatting without requiring an LLM call. +# +# Both tiers validate matches by checking for numeric financial data within +# _DATA_LOOKAHEAD chars — this rules out TOC entries and narrative mentions. +# --------------------------------------------------------------------------- + +# Tier 1: Exact heading strings (tried in priority order; first match wins). +# Covers IFRS, US GAAP, "Condensed" variants, and CJK (Chinese/Japanese). 
+HEADINGS: dict[str, list[str]] = { + "balance_sheet": [ + # Chinese (Simplified) + "合并资产负债表", # Consolidated Balance Sheet + "资产负债表", # Balance Sheet + # English — IFRS + US GAAP + "Condensed Consolidated Balance Sheets", + "Condensed Consolidated Balance Sheet", + "Consolidated Statement of Financial Position", + "Consolidated Balance Sheets", + "Consolidated Balance Sheet", + "Statement of Financial Position", + "Balance Sheets", + "Balance Sheet", + # Japanese + "連結財政状態計算書", # Consolidated Statement of Financial Position + "連結貸借対照表", # Consolidated Balance Sheet + "貸借対照表", # Balance Sheet + ], + "cash_flow": [ + # Chinese (Simplified) + "合并现金流量表", # Consolidated Cash Flow Statement + "现金流量表", # Cash Flow Statement + # English — IFRS + US GAAP + "Condensed Consolidated Statements of Cash Flows", + "Condensed Consolidated Statement of Cash Flows", + "Consolidated Statements of Cash Flows", + "Consolidated Statement of Cash Flows", + "Consolidated Statement of Cash Flow", + "Statements of Cash Flows", + "Statement of Cash Flows", + # Japanese + "連結キャッシュ・フロー計算書", # Consolidated Cash Flow Statement + ], + "income_statement": [ + # Chinese (Simplified) + "合并利润表", # Consolidated Income Statement + "利润表", # Income Statement + # English — IFRS + US GAAP + "Condensed Consolidated Statements of Income", + "Condensed Consolidated Statement of Income", + "Consolidated Statements of Income", + "Consolidated Statements of Operations", + "Consolidated Statement of Income", + "Consolidated Statement of Operations", + "Consolidated Statement of Profit", + "Statements of Income", + "Statements of Operations", + "Income Statement", + # Japanese + "連結損益計算書", # Consolidated Income Statement + "連結損益及び包括利益計算書", # Consolidated P&L and Comprehensive Income + "損益計算書", # Income Statement + ], +} + +# Tier 2: Keyword clusters for semantic scoring. +# Each keyword carries a weight. A candidate heading/region is scored by +# summing the weights of all keywords it contains. 
Higher weight = more +# discriminative (i.e. "financial position" is uniquely balance-sheet, +# while "consolidated" is shared across all types). +# +# A minimum score threshold prevents false positives from narrative text +# that incidentally mentions a few financial terms. +_KEYWORD_SCORES: dict[str, list[tuple[str, float]]] = { + "balance_sheet": [ + ("balance sheet", 3.0), + ("financial position", 3.0), + ("assets", 2.0), + ("liabilities", 2.0), + ("equity", 1.5), + ("stockholders", 1.5), + ("shareholders", 1.5), + ("current assets", 1.0), + ("current liabilities", 1.0), + ("non-current", 1.0), + # Chinese + ("资产负债表", 3.0), + ("资产", 2.0), + ("负债", 2.0), + ("所有者权益", 1.5), + ("流动资产", 1.0), + ("流动负债", 1.0), + # Japanese + ("貸借対照表", 3.0), + ("財政状態", 3.0), + ], + "cash_flow": [ + ("cash flow", 3.0), + ("cash flows", 3.0), + ("operating activities", 2.5), + ("investing activities", 2.5), + ("financing activities", 2.5), + ("net cash", 2.0), + ("cash provided", 1.5), + ("cash used", 1.5), + ("cash equivalents", 1.0), + # Chinese + ("现金流量表", 3.0), + ("经营活动", 2.5), + ("投资活动", 2.5), + ("筹资活动", 2.5), + # Japanese + ("キャッシュ・フロー", 3.0), + ("営業活動", 2.5), + ("投資活動", 2.5), + ("財務活動", 2.5), + ], + "income_statement": [ + ("income statement", 3.0), + ("statement of income", 3.0), + ("statement of operations", 3.0), + ("statement of profit", 3.0), + ("net income", 2.0), + ("revenue", 2.0), + ("earnings per share", 2.0), + ("cost of revenue", 1.5), + ("operating expenses", 1.5), + ("income from operations", 1.5), + ("gross profit", 1.5), + ("operating income", 1.0), + # Chinese + ("利润表", 3.0), + ("营业收入", 2.0), + ("净利润", 2.0), + ("营业成本", 1.5), + ("利润总额", 1.5), + # Japanese + ("損益計算書", 3.0), + ("売上高", 2.0), + ("当期純利益", 2.0), + ], +} + +# Minimum keyword score to consider a candidate as a valid statement heading. +# Prevents false positives from narrative paragraphs mentioning financial terms. 
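+# Illustration: the Tier 2 scorer is a plain weighted-substring sum, so a
+# minimal standalone sketch looks like the following. The mini keyword table
+# is hypothetical and trimmed for brevity; the real table above is far larger.

```python
# Trimmed, hypothetical keyword table mirroring the shape of _KEYWORD_SCORES.
MINI_SCORES: dict[str, list[tuple[str, float]]] = {
    "balance_sheet": [("balance sheet", 3.0), ("assets", 2.0), ("liabilities", 2.0)],
    "cash_flow": [("cash flow", 3.0), ("operating activities", 2.5)],
}

def score(text: str, keywords: list[tuple[str, float]]) -> float:
    """Sum the weights of every keyword present in the candidate text."""
    text_lower = text.lower()
    return sum(w for kw, w in keywords if kw in text_lower)

# A non-standard heading with no exact match in a Tier 1 list:
candidate = "Condensed Statements of Assets and Liabilities"
scores = {stype: score(candidate, kws) for stype, kws in MINI_SCORES.items()}
# "assets" (2.0) + "liabilities" (2.0) = 4.0 clears a 3.0 threshold, so the
# candidate is classified as a balance sheet despite the unusual wording.
best = max(scores, key=scores.get)
```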
+_MIN_KEYWORD_SCORE = 3.0
+
+# A heading match is valid only if numeric data appears within this many
+# characters after it — rules out TOC references.
+_DATA_LOOKAHEAD = 1000
+_NUMBER_RE = re.compile(r"\d[\d,]+")
+
+# Density-based end-of-section detection
+# Scan forward in windows; when financial number count drops below threshold
+# for two consecutive windows, that marks the end of the statement section.
+_DENSITY_WINDOW = 3000  # chars per window
+_DENSITY_THRESHOLD = 5  # min numbers per window to still be "in statement"
+_DENSITY_MIN_CONSECUTIVE = 2  # consecutive low-density windows required
+
+# Note refs are small integers like "5", "6, 15", "10, 15" — max 2 digits each part
+_NOTE_REF_RE = re.compile(r"^\d{1,2}(?:[,\s]+\d{1,2})*$")
+# Financial values: optional leading currency symbol ($, ¥, €, £) and/or
+# minus, digits+commas+optional decimal, or (negatives in parentheses).
+# Decimal support handles EPS and per-share figures.
+_CURRENCY_SYM = r"[\$¥€£]?\s*"
+_VALUE_RE = re.compile(
+    rf"^{_CURRENCY_SYM}-?[\d,]+(?:\.\d+)?$"
+    rf"|^{_CURRENCY_SYM}\([\d,]+(?:\.\d+)?\)$"
+)
+# Page markers to strip from sections before parsing.
+# Also strips section-numbering stubs (e.g. "2)" or "2\)") that Azure CU
+# sometimes emits as isolated tokens at section boundaries.
+_PAGE_MARKER_RE = re.compile(r"^-\d+-$|^$|^\d+\\?\)$")
+
+# Hard-stop patterns that signal the end of the financial statements section.
+#
+# IFRS terminators:
+#   - "[Notes to ..." 
— narrative notes section +# - "Consolidated Statement of Comprehensive Income" +# - "Consolidated Statement of Changes" (in equity) +# - "Statement of Changes in Stockholders/Shareholders/Owners/Parent" +# +# US GAAP terminators (common in 10-K/10-Q filings and earnings releases): +# - "Segment Results" / "Segment Information" — segment disclosure tables +# - "Reconciliation of GAAP" — GAAP-to-non-GAAP reconciliation tables +# +# NOTE: "Supplemental cash flow data" is NOT a terminator — it is part of +# the cash flow statement itself (contains "Cash paid for income taxes" etc). +_NOTES_TERMINATOR_RE = re.compile( + r"\[Notes to\b" + r"|Consolidated Statement of Comprehensive Income" + r"|Consolidated Statement of Changes" + r"|Statement of Changes in (Stockholders|Shareholders|Owners|Parent)" + r"|Segment Results\b" + r"|Segment Information\b" + r"|Reconciliation of GAAP\b", + re.IGNORECASE, +) + + +def _find_section_end_by_density(markdown: str, start_offset: int) -> int: + """ + Generic end-of-section detector. + Scans forward in fixed windows from start_offset. + Returns the offset where financial number density drops to near-zero, + indicating the transition from tabular statements to narrative notes. + Works for any company, any language. + """ + # Hard-stop: if a "[Notes to ..." heading is found within 100K chars, + # use it as the section boundary — the notes section has high number + # density (all the footnote tables) so the density scan alone is + # unreliable once we reach it. 
+ note_m = _NOTES_TERMINATOR_RE.search(markdown, start_offset) + if note_m and (note_m.start() - start_offset) < 100_000: + return note_m.start() + + pos = start_offset + low_count = 0 + while pos < len(markdown): + window = markdown[pos: pos + _DENSITY_WINDOW] + nums = len(re.findall(r"\b\d[\d,]{2,}\b", window)) + if nums < _DENSITY_THRESHOLD: + low_count += 1 + if low_count >= _DENSITY_MIN_CONSECUTIVE: + # Return the start of the first low-density window + return pos - (_DENSITY_MIN_CONSECUTIVE - 1) * _DENSITY_WINDOW + else: + low_count = 0 + pos += _DENSITY_WINDOW + return len(markdown) + +def _get_contents(raw_result: dict) -> dict: + """Return contents[0] from the Azure CU result envelope.""" + result = raw_result.get("result", raw_result) + contents = result.get("contents", []) + if not contents: + raise ValueError("Azure CU result has no 'contents' array.") + return contents[0] + + +def _get_markdown(raw_result: dict) -> str: + return _get_contents(raw_result).get("markdown", "") + + +def _get_pages(raw_result: dict) -> list[dict]: + return _get_contents(raw_result).get("pages", []) + + +# --------------------------------------------------------------------------- +# Page number lookup (char offset -> page number) +# --------------------------------------------------------------------------- + +def _build_page_map(pages: list[dict]) -> list[tuple[int, int, int]]: + """ + Build a sorted list of (start_offset, end_offset, page_number) tuples + using the spans[] array on each page object. 
+ """ + mapping: list[tuple[int, int, int]] = [] + for page in pages: + page_num: int = page.get("pageNumber", 0) + for span in page.get("spans", []): + offset: int = span.get("offset", 0) + length: int = span.get("length", 0) + mapping.append((offset, offset + length, page_num)) + mapping.sort(key=lambda t: t[0]) + return mapping + + +def _offset_to_page(offset: int, page_map: list[tuple[int, int, int]]) -> int: + """Return the page number containing the given character offset.""" + for start, end, page_num in page_map: + if start <= offset < end: + return page_num + return page_map[-1][2] if page_map else 0 + + +# --------------------------------------------------------------------------- +# Statement locator +# --------------------------------------------------------------------------- + +def _has_data_nearby(markdown: str, offset: int) -> bool: + """True if numeric financial data appears within _DATA_LOOKAHEAD chars.""" + lookahead = markdown[offset: offset + _DATA_LOOKAHEAD] + return bool(_NUMBER_RE.search(lookahead)) + + +def _tier1_heading_offsets(markdown: str, headings: list[str]) -> list[int]: + """ + Tier 1: Exact heading match. + + Return ALL char offsets where a known heading string appears AND is followed + by numeric financial data within _DATA_LOOKAHEAD chars. + + Headings are tried in priority order (most specific first). As soon as one + heading yields any valid match, those offsets are returned exclusively. + This prevents short fallback terms (e.g. "Balance Sheet") from matching + unrelated fragments when a more specific heading would have matched. + """ + for heading in headings: + results = [] + for m in re.finditer(re.escape(heading), markdown, re.IGNORECASE): + if _has_data_nearby(markdown, m.start()): + results.append(m.start()) + if results: + return sorted(results) + return [] + + +def _score_candidate(text: str, keywords: list[tuple[str, float]]) -> float: + """Score a text region against a keyword cluster. 
Higher = more likely.""" + text_lower = text.lower() + return sum(weight for keyword, weight in keywords if keyword in text_lower) + + +def _tier2_keyword_offsets( + markdown: str, + statement_type: str, +) -> list[int]: + """ + Tier 2: Keyword-based semantic scoring. + + Scans the markdown for heading-like regions (lines that are short, often + uppercase or title-cased, and precede numeric data) and scores each one + against the characteristic keyword cluster for the given statement type. + + Returns offsets of regions that score >= _MIN_KEYWORD_SCORE and are + followed by numeric data. This handles non-standard headings that + Tier 1's exact-match list doesn't cover. + """ + keywords = _KEYWORD_SCORES.get(statement_type, []) + if not keywords: + return [] + + results: list[int] = [] + + # Scan for heading-like lines: short-ish text (< 200 chars) that could + # be a statement title. We use \n\n boundaries since Azure CU separates + # structural elements with double newlines. + for m in re.finditer(r"(?:^|\n\n)(.{10,200})(?=\n\n)", markdown): + candidate = m.group(1).strip() + # Skip candidates that look like data values or page markers. + if _VALUE_RE.match(candidate) or _PAGE_MARKER_RE.match(candidate): + continue + score = _score_candidate(candidate, keywords) + if score >= _MIN_KEYWORD_SCORE and _has_data_nearby(markdown, m.start()): + results.append(m.start()) + + return sorted(results) + + +def _all_heading_offsets(markdown: str, headings: list[str], statement_type: str) -> list[int]: + """ + Two-tier heading detection: try exact match first, fall back to keyword + scoring for non-standard headings. + + Args: + markdown: Full document markdown from Azure CU. + headings: Tier 1 exact heading strings for this statement type. + statement_type: One of "balance_sheet", "cash_flow", "income_statement". + + Returns: + List of character offsets where the statement heading was found. 
+    """
+    # Tier 1: exact heading match (fast, deterministic)
+    offsets = _tier1_heading_offsets(markdown, headings)
+    if offsets:
+        return offsets
+
+    # Tier 2: keyword-based semantic scoring (handles non-standard headings)
+    offsets = _tier2_keyword_offsets(markdown, statement_type)
+    if offsets:
+        return offsets
+
+    return []
+
+
+def _pick_cluster_offsets(
+    all_offsets: dict[str, list[int]]
+) -> dict[str, Optional[int]]:
+    """
+    Generic cluster selection: annual reports contain a section where all
+    three financial statements appear close together (within ~200,000 chars).
+    Summary tables and segment disclosures scatter individual statements
+    throughout the document, but the actual consolidated statements section
+    is a tight cluster.
+
+    Algorithm:
+      1. Collect every valid occurrence of every statement heading.
+      2. Use each occurrence, of every statement type, as a candidate anchor.
+      3. For each anchor, count how many distinct statement types have an
+         occurrence within MAX_CLUSTER_SPAN chars after it.
+      4. Pick the anchor with the most co-located statements. Ties are broken
+         by preferring the EARLIER anchor: the first complete cluster is the
+         main consolidated statements section.
+      5. For each statement type choose the occurrence nearest to and after
+         the winning anchor.
+    """
+    MAX_CLUSTER_SPAN = 200_000  # chars — generous enough for a typical annual report
+
+    # Flatten all offsets into (offset, type) pairs
+    all_pairs: list[tuple[int, str]] = []
+    for stype, offsets in all_offsets.items():
+        for o in offsets:
+            all_pairs.append((o, stype))
+    all_pairs.sort()
+
+    best_anchor: Optional[int] = None
+    best_score = -1
+
+    for anchor_offset, _ in all_pairs:
+        # Count distinct statement types with an occurrence in [anchor, anchor+SPAN]
+        types_found = set()
+        for offset, stype in all_pairs:
+            if anchor_offset <= offset <= anchor_offset + MAX_CLUSTER_SPAN:
+                types_found.add(stype)
+        score = len(types_found)
+        # Prefer higher score; ties keep the EARLIER anchor (all_pairs is
+        # sorted by offset, so the first anchor to reach a given score wins).
+        # The first complete cluster is the main consolidated statements
+        # section; later ones tend to be subsidiary or segment statements.
+        if score > best_score:
+            best_score = score
+            best_anchor = anchor_offset
+
+    if best_anchor is None:
+        # Fallback: just return first occurrence of each
+        return {stype: (offsets[0] if offsets else None)
+                for stype, offsets in all_offsets.items()}
+
+    # For each statement type pick the occurrence closest to and >= best_anchor
+    chosen: dict[str, Optional[int]] = {}
+    for stype, offsets in all_offsets.items():
+        candidates = [o for o in offsets if o >= best_anchor]
+        if candidates:
+            chosen[stype] = min(candidates)
+        elif offsets:
+            chosen[stype] = offsets[0]  # only occurrence is before anchor
+        else:
+            chosen[stype] = None
+
+    return chosen
+
+
+def _llm_detect_statements(markdown: str) -> dict[str, Optional[int]]:
+    """
+    Use an LLM to identify the three core financial statements in the document.
+
+    Sends a compact summary of the document (first ~4000 chars plus a sampled
+    section index) to the LLM and asks it to return the heading text and
+    approximate character offset for each statement.
+ + This approach is language-agnostic: works for English, Chinese, Japanese, + or any other language without hard-coded patterns. + + Returns a dict mapping statement_type -> char offset (or None if not found). + """ + import json as _json + from .llm_reconciler import _get_client, _DEPLOYMENT + + # Build a compact document outline for the LLM. + # Include the first 4000 chars (typically covers TOC and early pages), + # plus a sampled section index showing \n\n-delimited tokens with offsets. + preview = markdown[:4000] + + # Build a sparse index: for every \n\n token, record offset + first 80 chars. + # This lets the LLM see the full document structure without sending all text. + token_index: list[dict] = [] + pos = 0 + for tok in markdown.split("\n\n"): + tok_stripped = tok.strip() + if tok_stripped and len(tok_stripped) > 5: + token_index.append({ + "offset": pos, + "text": tok_stripped[:100], + }) + pos += len(tok) + 2 # +2 for the \n\n separator + + # Limit to ~200 entries to stay within token budget. + if len(token_index) > 200: + step = len(token_index) // 200 + token_index = token_index[::step] + + prompt = ( + "You are analyzing a financial report (PDF extracted to markdown). " + "Identify the THREE core financial statements:\n" + "1. balance_sheet (Statement of Financial Position / Balance Sheet / " + "资产负债表 / 貸借対照表)\n" + "2. income_statement (Income Statement / P&L / Statement of Operations / " + "利润表 / 損益計算書)\n" + "3. cash_flow (Cash Flow Statement / 现金流量表 / キャッシュ・フロー計算書)\n\n" + "IMPORTANT RULES:\n" + "- Pick the CONSOLIDATED version, not the parent/standalone version.\n" + "- Each statement must be followed by numeric financial data.\n" + "- If the same statement appears multiple times (e.g. 
summary table + "
+        "full table), pick the FULL version with detailed line items.\n"
+        "- Return the character OFFSET where each statement heading begins.\n\n"
+        f"DOCUMENT PREVIEW (first 4000 chars):\n{preview}\n\n"
+        f"DOCUMENT TOKEN INDEX (offset + first 100 chars of each section):\n"
+        f"{_json.dumps(token_index, ensure_ascii=False, indent=1)}\n\n"
+        "Return ONLY a JSON object:\n"
+        '{"statements": [\n'
+        '  {"type": "balance_sheet", "heading": "...", "offset": <int>},\n'
+        '  {"type": "income_statement", "heading": "...", "offset": <int>},\n'
+        '  {"type": "cash_flow", "heading": "...", "offset": <int>}\n'
+        "]}\n"
+        "If a statement is not found, omit it from the array."
+    )
+
+    try:
+        client = _get_client()
+        response = client.chat.completions.create(
+            model=_DEPLOYMENT(),
+            response_format={"type": "json_object"},
+            messages=[
+                {"role": "system", "content": "You are a financial document analysis expert. Return only valid JSON."},
+                {"role": "user", "content": prompt},
+            ],
+            temperature=0,
+        )
+        result = _json.loads(response.choices[0].message.content)
+        statements = result.get("statements", [])
+    except Exception as e:
+        print(f"  [LLM] statement detection failed: {e}")
+        return {"balance_sheet": None, "cash_flow": None, "income_statement": None}
+
+    # Convert LLM response to offset map.
+    offsets: dict[str, Optional[int]] = {
+        "balance_sheet": None,
+        "cash_flow": None,
+        "income_statement": None,
+    }
+
+    for s in statements:
+        stype = s.get("type", "")
+        llm_offset = s.get("offset")
+        heading = s.get("heading", "")
+
+        if stype not in offsets or llm_offset is None:
+            continue
+
+        # The LLM returns an approximate offset from the token index.
+        # Verify by searching for the heading text near that offset.
+        search_start = max(0, llm_offset - 500)
+        search_end = min(len(markdown), llm_offset + 500)
+        search_region = markdown[search_start:search_end]
+
+        # Try to find the exact heading in the region. 
+ idx = search_region.find(heading) + if idx >= 0: + offsets[stype] = search_start + idx + else: + # Fallback: use the LLM offset directly if heading not found. + # (Heading text may have been truncated in the token index.) + offsets[stype] = llm_offset + + print(f" [LLM] {stype}: '{heading}' at offset {offsets[stype]}") + + return offsets + + +def locate_statements( + raw_result: dict, + use_llm: bool = True, +) -> dict[str, Optional[dict]]: + """ + Scan the markdown and return a mapping of statement type -> page range. + + Detection strategy (in order): + 1. LLM-based detection (when use_llm=True and credentials available): + Sends the document structure to the LLM which identifies statement + locations in any language. Most accurate, costs ~$0.001/document. + 2. Tier 1 — exact heading match from curated list (free, fast). + 3. Tier 2 — keyword-based semantic scoring (free, handles non-standard). + + Section boundaries are always determined by the same logic: nearest + adjacent statement heading, hard-stop terminators, or density-based end. 
+    """
+    markdown = _get_markdown(raw_result)
+    pages = _get_pages(raw_result)
+    page_map = _build_page_map(pages)
+
+    offsets: Optional[dict[str, Optional[int]]] = None
+
+    # Strategy 1: LLM-based detection (most accurate, language-agnostic)
+    if use_llm:
+        try:
+            offsets = _llm_detect_statements(markdown)
+            # Check if LLM found at least one statement
+            if not any(v is not None for v in offsets.values()):
+                print("  [LLM] no statements found, falling back to pattern matching")
+                offsets = None
+        except Exception:
+            offsets = None  # fall through to pattern matching
+
+    # Strategy 2+3: Pattern matching fallback (Tier 1 exact + Tier 2 keywords)
+    if offsets is None:
+        all_offsets: dict[str, list[int]] = {}
+        for statement_type, headings in HEADINGS.items():
+            all_offsets[statement_type] = _all_heading_offsets(markdown, headings, statement_type)
+        offsets = _pick_cluster_offsets(all_offsets)
+
+    # Determine section boundaries
+    located: dict[str, Optional[dict]] = {}
+    for statement_type, offset in offsets.items():
+        if offset is None:
+            located[statement_type] = None
+            continue
+
+        start_page = _offset_to_page(offset, page_map)
+
+        # Section ends at the nearest other chosen statement heading
+        next_offset = None
+        for other_type, other_offset in offsets.items():
+            if other_type == statement_type or other_offset is None:
+                continue
+            if other_offset > offset:
+                if next_offset is None or other_offset < next_offset:
+                    next_offset = other_offset
+
+        # For all statements: also look for hard-stop section terminators
+        # (e.g. Comprehensive Income or Changes in Equity sections that can
+        # appear between the three main statements). Skip 200 chars past the
+        # current heading to avoid matching the heading text itself.
+        term_m = _NOTES_TERMINATOR_RE.search(markdown, offset + 200)
+        if term_m:
+            if next_offset is None or term_m.start() < next_offset:
+                next_offset = term_m.start()
+
+        # No adjacent statement or terminator found — use density-based end
+        if next_offset is None:
+            next_offset = _find_section_end_by_density(markdown, offset)
+
+        # Compute end_page from the last line of actual content BEFORE the
+        # next heading. Walk backwards from next_offset to skip past company
+        # name headers and page breaks that belong to the next section.
+        content_end = next_offset - 1
+        lookback = markdown[max(0, next_offset - 500): next_offset]
+        # Skip past "COMPANY NAME\nCONDENSED..." headers or "<!-- PageBreak -->"
+        # markers (assumed page-break comment in Azure CU markdown output).
+        for pattern in [r"[A-Z][A-Z\s,.'&\-]+(?:INC|LLC|LTD|CORP)\.?\n",
+                        r"<!-- PageBreak -->"]:
+            m = re.search(pattern, lookback)
+            if m:
+                candidate = max(0, next_offset - 500) + m.start() - 1
+                if candidate < content_end and candidate >= offset:
+                    content_end = candidate
+
+        end_page = _offset_to_page(content_end, page_map)
+        if end_page < start_page:
+            end_page = start_page
+
+        located[statement_type] = {
+            "start_page": start_page,
+            "end_page": end_page,
+            "md_offset": offset,
+            "md_end_offset": next_offset,
+        }
+
+    return located
+
+
+# ---------------------------------------------------------------------------
+# Plain-text financial table parser
+# ---------------------------------------------------------------------------
+# Azure CU prebuilt-read returns tables as plain text with \n\n separators.
+# Long labels are often split across consecutive \n\n-separated tokens when
+# text wraps across PDF table cell lines, e.g.:
+#
+#   [tok A] "Net decrease (increase) in receivables under"   (first part)
+#   [tok B] "(21,517)"     (row A values)
+#   [tok C] "28,614"
+#   [tok D] "securities borrowing transactions"   (continuation of A)
+#   [tok E] "(42,391)"  ← these belong to the NEXT row whose label Azure
+#   [tok F] "86,042"       CU dropped entirely
+#
+# Strategy
+# --------
+# 1. After consuming a row's values, peek at the next token. 
If it starts
+#    with a lowercase letter it is a continuation fragment of the current
+#    label — merge it.
+# 2. After merging a fragment, check for more values. Any values found here
+#    are ORPHANED (Azure CU dropped the true label for that row). Emit them
+#    as a separate row using the fragment text as the best-available label.
+# 3. "Section total" rows (label starts with "Net cash" or ends in
+#    "activities") are never merged — this prevents [113-120]-style garbage
+#    fragments from attaching to summary rows.
+# 4. Rows with no values that are just continuation/garbage fragments (all
+#    lowercase or repeated single-word noise) are suppressed from output.
+# ---------------------------------------------------------------------------
+
+# Dashes used for nil/zero entries: ASCII hyphen plus the full-width and
+# horizontal-bar dashes common in Japanese filings.
+_NIL_RE = re.compile(r"^[-–—―－]+$")
+
+# Header tokens: currency notes, "Note" keyword, year labels, period ranges.
+_HEADER_TOKEN_RE = re.compile(
+    r"\b(20\d{2}|Note|January|February|March|April|May|June|"
+    r"July|August|September|October|November|December|Fiscal Year|Millions)\b",
+    re.IGNORECASE,
+)
+
+# Section-total labels: never apply post-value continuation merging on these.
+# Branches are anchored at the start (^) or end ($) of the label, so a phrase
+# like "Net cash" or "activities" appearing mid-label does not spuriously
+# mark a regular line item as a section total.
+_SECTION_TOTAL_RE = re.compile(
+    r"^Net cash\b"
+    r"|^.*\bactivities$"
+    r"|^Cash and cash equivalents at\b"
+    r"|^Net increase \(decrease\) in cash\b",
+    re.IGNORECASE,
+)
+
+# Fallback column cap when header detection finds no period columns.
+# Standard annual reports compare two fiscal periods; this prevents greedy
+# consumption of nil dash tokens that appear as page separators.
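+# Illustration: a standalone sketch of how token-shape patterns like the ones
+# above route tokens. The patterns are simplified copies of _VALUE_RE and
+# _NOTE_REF_RE, re-declared so the sketch runs on its own, and the check order
+# here is only an assumption (the real parser combines them in _is_value_or_note).

```python
import re

# Simplified copies of the module's token-shape patterns.
VALUE_RE = re.compile(
    r"^[\$¥€£]?\s*-?[\d,]+(?:\.\d+)?$"      # 35,873 / $ 35,873 / 0.42
    r"|^[\$¥€£]?\s*\([\d,]+(?:\.\d+)?\)$"   # (21,517): negative in parentheses
)
NOTE_REF_RE = re.compile(r"^\d{1,2}(?:[,\s]+\d{1,2})*$")  # "5" / "6, 15"

def classify(tok: str) -> str:
    # Note refs are tested first: a bare "15" matches both patterns, and in
    # a statement that has a Note column the note-ref reading should win.
    if NOTE_REF_RE.match(tok):
        return "note_ref"
    if VALUE_RE.match(tok):
        return "value"
    return "label"

tokens = ["Total current assets", "6, 15", "$ 35,873", "(21,517)", "0.42"]
kinds = [classify(t) for t in tokens]
# → ["label", "note_ref", "value", "value", "value"]
```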
+_DEFAULT_VALUE_COLS = 2 + + +def _is_value_or_note(tok: str) -> bool: + return bool(_VALUE_RE.match(tok) or _NOTE_REF_RE.match(tok) or _NIL_RE.match(tok)) + + +def _is_continuation_fragment(tok: str) -> bool: + """True if tok is the wrapped tail of the previous label (starts lowercase).""" + return bool(tok) and tok[0].islower() + + +def _parse_plain_text_table(section: str) -> tuple[list[str], list[str], list[dict]]: + """ + Parse the plain-text financial table format into rows, columns, cells. + Handles: + - Header vs data token separation + - Continuation fragment merging (restores full labels) + - Orphaned value preservation (values that Azure CU detached from their labels) + - Suppression of valueless noise rows + """ + raw_tokens = [ + t.strip() for t in section.split("\n\n") + if t.strip() and not _PAGE_MARKER_RE.match(t.strip()) + ] + + # --- Pre-processing: merge currency symbols with following values --- + # Azure CU produces several currency-related token patterns: + # Pattern A: "$\n35,873" — leading $ with internal newline + # Pattern B: "$" then "35,873" — standalone $ as its own token + # Pattern C: "$\n20,838 $" — leading $ + value + trailing $ + # (the trailing $ belongs to the NEXT column) + # Pattern D: "20,838 $" — value with trailing $ fused in + # + # We normalise all cases to clean numeric tokens and track the currency + # symbol in _token_currencies for per-cell metadata. + _CURRENCY_ONLY_RE = re.compile(r"^[\$¥€£]$") + _TRAILING_CURRENCY_RE = re.compile(r"^(.+?)\s+[\$¥€£]$") + _CURRENCY_MAP = {"$": "USD", "¥": "JPY", "€": "EUR", "£": "GBP"} + cleaned_tokens: list[str] = [] + # Maps cleaned_token index → currency code (only for tokens that had a symbol). 
+ _token_currencies: dict[int, str] = {} + pending_currency: str | None = None # from a standalone "$" or trailing "$" + for tok in raw_tokens: + # Case 1: internal newline — "$\n20,838" or "$\n20,838 $" + if "\n" in tok and _CURRENCY_ONLY_RE.match(tok.split("\n")[0].strip()): + sym = tok.split("\n")[0].strip() + remainder = tok.split("\n", 1)[1].strip() + # Check for trailing currency: "20,838 $" → strip trailing " $" + trail_m = _TRAILING_CURRENCY_RE.match(remainder) + if trail_m: + remainder = trail_m.group(1).strip() + # The trailing $ belongs to the next column's value. + pending_currency = _CURRENCY_MAP.get(sym, sym) + idx = len(cleaned_tokens) + cleaned_tokens.append(remainder) + _token_currencies[idx] = _CURRENCY_MAP.get(sym, sym) + # Case 2: standalone "$" — remember it for the next numeric token. + elif _CURRENCY_ONLY_RE.match(tok): + pending_currency = _CURRENCY_MAP.get(tok, tok) + continue + # Case 3: trailing currency — "20,838 $" → "20,838" + pending $ for next + elif _TRAILING_CURRENCY_RE.match(tok): + trail_m = _TRAILING_CURRENCY_RE.match(tok) + clean_val = trail_m.group(1).strip() + trailing_sym = tok[trail_m.end(1):].strip() + idx = len(cleaned_tokens) + cleaned_tokens.append(clean_val) + # Attach pending currency from a preceding token if available. + if pending_currency: + _token_currencies[idx] = pending_currency + # The trailing symbol becomes pending for the next value. + pending_currency = _CURRENCY_MAP.get(trailing_sym, trailing_sym) + else: + idx = len(cleaned_tokens) + cleaned_tokens.append(tok) + # Attach pending currency from a preceding standalone symbol. 
+            if pending_currency and _VALUE_RE.match(tok):
+                _token_currencies[idx] = pending_currency
+                pending_currency = None
+            else:
+                pending_currency = None  # discard if next token isn't a value
+
+    columns: list[str] = []
+    rows: list[str] = []
+    cells: list[dict] = []
+
+    # ---- Phase 1: separate header tokens from data tokens ---- #
+    col_headers_raw: list[str] = []
+    data_tokens: list[str] = []
+    # Maps data_tokens index → cleaned_tokens index (for currency lookup).
+    _dt_to_ct: dict[int, int] = {}
+    header_done = False
+
+    for ct_idx, tok in enumerate(cleaned_tokens[1:], start=1):  # skip heading
+        if not header_done:
+            # Check header pattern FIRST — standalone year tokens like "2025"
+            # match both _HEADER_TOKEN_RE and _VALUE_RE. When we're still in
+            # header mode, the header interpretation takes priority.
+            if _HEADER_TOKEN_RE.search(tok):
+                col_headers_raw.append(tok)
+            else:
+                # First non-header token (value, note ref, or label) ends the
+                # header region; route it to the data stream.
+                header_done = True
+                _dt_to_ct[len(data_tokens)] = ct_idx
+                data_tokens.append(tok)
+        else:
+            _dt_to_ct[len(data_tokens)] = ct_idx
+            data_tokens.append(tok)
+
+    # --- Construct proper column headers ---
+    # Azure CU produces two kinds of header tokens:
+    #   Group headers: "Three Months Ended December 31," (spans 2 columns)
+    #   Year headers: "2025", "2024", "2025", "2024" (one per column)
+    #
+    # We separate them: standalone year tokens (just digits) are the actual
+    # column identifiers. Group headers describe which group of year columns
+    # they span. We pair them to produce proper column names like
+    # "Three Months Ended December 31, 2025".
+    _STANDALONE_YEAR_RE = re.compile(r"^\d{4}$")
+    group_headers: list[str] = []
+    year_tokens: list[str] = []
+    for h in col_headers_raw:
+        if _STANDALONE_YEAR_RE.match(h.strip()):
+            year_tokens.append(h.strip())
+        else:
+            group_headers.append(h.strip())
+
+    if year_tokens and group_headers:
+        # Pair years with their group headers. 
Years are assigned to groups + # in order: first N years → first group, next N → second group, etc. + years_per_group = len(year_tokens) // max(len(group_headers), 1) + columns = [] + for gi, group in enumerate(group_headers): + start = gi * years_per_group + for yi in range(start, min(start + years_per_group, len(year_tokens))): + # Clean group header: remove trailing comma/whitespace + clean_group = group.rstrip(",").strip() + columns.append(f"{clean_group} {year_tokens[yi]}") + # If there are leftover years (more years than groups can pair), append them + paired_count = len(group_headers) * years_per_group + for yi in range(paired_count, len(year_tokens)): + columns.append(year_tokens[yi]) + elif year_tokens: + columns = year_tokens + else: + columns = [h for h in col_headers_raw if "20" in h or "Note" in h] + if not columns: + columns = col_headers_raw or ["Label", "Period 1", "Period 2"] + + # Derive the expected number of data-value columns from the columns list. + # Each column entry represents one period column. Falls back to + # _DEFAULT_VALUE_COLS when header detection is inconclusive. + max_value_cols = len(columns) if len(columns) >= 1 else _DEFAULT_VALUE_COLS + + # Detect whether the statement has a "Note" column. Note refs (small + # 1-2 digit integers like "5", "94") should only be skipped when a Note + # column actually exists. Without it, these numbers are data values. + has_note_column = any("note" in h.lower() for h in col_headers_raw) + + # ---- Phase 2: walk data tokens ---- # + n = len(data_tokens) + + # Track the data_token indices consumed for the current row's values + # so we can look up per-value currency from _token_currencies. + _current_val_dt_indices: list[int] = [] + + def _classify_row(label: str, has_values: bool) -> tuple[str, int]: + """ + Determine row_type and indent_level from the label text. + + row_type is one of: + "section_header" — label with no values, acts as a group heading + "total" — grand total row (e.g. 
"Total assets") + "subtotal" — section subtotal (e.g. "Total current assets") + "line_item" — regular data row + + indent_level: + 0 — top-level items, headers, and totals + 1 — items within a section (between a header and its total) + + Heuristic: if the label starts with "Total" and has values, it is a + total or subtotal. If it has no values and looks like a heading + (e.g. ends with ":" or is an all-caps section divider), it is a + section_header. + """ + label_lower = label.lower().strip() + label_stripped = label.strip() + + if not has_values: + # No values → likely a section header or an empty grouping label. + return "section_header", 0 + + if label_lower.startswith("total "): + # Grand totals are the highest-level aggregations that encompass + # the entire statement side (e.g. "Total assets" or + # "Total liabilities and stockholders' equity"). + # Everything else starting with "Total" is a subtotal. + grand_total_keywords = [ + "total assets", + "total liabilities and", # "Total liabilities and stockholders' equity" + "total revenue", + ] + if any(label_lower.startswith(kw) for kw in grand_total_keywords): + return "total", 0 + # "Total liabilities" (without "and") is a subtotal within the + # balance sheet — it's the sum of current + non-current liabilities, + # not the grand total. + return "subtotal", 0 + + # Cash flow section subtotals: only "Net cash..." rows are subtotals. + # "Other investing activities" and "Other financing activities" are + # regular line items, NOT subtotals. + # "Net increase/decrease in cash" is a summary row that equals + # Operating + Investing + Financing + FX — classified separately. + if label_lower.startswith("net cash"): + return "subtotal", 0 + if label_lower.startswith("net increase") or label_lower.startswith("net decrease"): + return "total", 0 # summary-level total, not a section subtotal + + # Default: line_item. 
Indent level is determined in a post-pass + # after all rows are emitted (items between a header and its + # total/subtotal get indent_level=1). + return "line_item", 0 + + def emit(label: str, vals: list[str]) -> None: + """Append one row to rows/cells with currency and structure metadata. + + Always emits exactly max_value_cols value cells. If vals has fewer + entries, remaining columns are filled with empty strings. This ensures + every data row has a consistent column count in the output JSON. + """ + nonlocal row_index + rows.append(label) + + row_type, indent_level = _classify_row(label, bool(vals)) + + cells.append({ + "row": row_index, "col": 0, "content": label, + "kind": "content", "currency": None, + "row_type": row_type, "indent_level": indent_level, + }) + # Emit exactly max_value_cols value cells, padding with empty strings. + for vi in range(max_value_cols): + val = vals[vi] if vi < len(vals) else "" + # Look up currency for this value from the token pre-processing pass. + currency = None + if vi < len(_current_val_dt_indices): + dt_idx = _current_val_dt_indices[vi] + ct_idx = _dt_to_ct.get(dt_idx) + if ct_idx is not None: + currency = _token_currencies.get(ct_idx) + cells.append({ + "row": row_index, "col": vi + 1, "content": val, + "kind": "content", "currency": currency, + "row_type": row_type, "indent_level": indent_level, + }) + _current_val_dt_indices.clear() + row_index += 1 + + row_index = 0 + i = 0 + while i < n: + tok = data_tokens[i] + + if _is_value_or_note(tok): + i += 1 # orphaned value/note — skip + continue + + # Skip mid-section page column headers that leaked through page breaks + # (e.g. "(Millions of Yen)", "Note", "Fiscal Year ended March 31, …"). + # These match _HEADER_TOKEN_RE but are NOT followed by numeric values; + # a real data label that incidentally contains a year IS followed by values. 
+ if _HEADER_TOKEN_RE.search(tok) and not _SECTION_TOTAL_RE.search(tok): + has_vals = False + for pi in range(i + 1, min(i + 5, n)): + pt = data_tokens[pi] + if _VALUE_RE.match(pt): + has_vals = True + break + if not _NOTE_REF_RE.match(pt) and not _HEADER_TOKEN_RE.search(pt): + break # hit a real label — stop lookahead + if not has_vals: + i += 1 + continue + + label = tok + i += 1 + + # Skip optional note ref (only when the statement has a Note column). + if has_note_column and i < n and _NOTE_REF_RE.match(data_tokens[i]): + i += 1 + + # Consume numeric values for this label + vals: list[str] = [] + _current_val_dt_indices.clear() + while i < n and _VALUE_RE.match(data_tokens[i]): + _current_val_dt_indices.append(i) + vals.append(data_tokens[i]) + i += 1 + # Fill a missing second column with an adjacent nil marker if present. + # Japanese reports use "-" for nil entries; they follow immediately + # after the numeric value(s) and should not be consumed past the + # expected number of data columns to avoid polluting later rows. + while vals and i < n and _NIL_RE.match(data_tokens[i]) and len(vals) < max_value_cols: + vals.append(data_tokens[i]) + i += 1 + + # ---- Continuation fragment loop ---- + # Only applies when current label is NOT a section total. + was_merged = False # tracks whether any fragment was merged into label + if not _SECTION_TOTAL_RE.search(label): + while i < n and _is_continuation_fragment(data_tokens[i]): + fragment = data_tokens[i] + i += 1 + + # Skip optional note ref after fragment (only with Note column). 
+ if has_note_column and i < n and _NOTE_REF_RE.match(data_tokens[i]): + i += 1 + + # Numeric values immediately after the fragment, plus any nil fill + orphan_vals: list[str] = [] + while i < n and _VALUE_RE.match(data_tokens[i]): + orphan_vals.append(data_tokens[i]); + i += 1 + while orphan_vals and i < n and _NIL_RE.match(data_tokens[i]) and len(orphan_vals) < max_value_cols: + orphan_vals.append(data_tokens[i]) + i += 1 + + if orphan_vals: + # Fragment is followed by values → Azure CU dropped the + # true label for those values. Emit current row (with + # fragment merged into label), then emit the orphaned + # values under the fragment text as best-available label. + emit(label + " " + fragment, vals) + emit(fragment, orphan_vals) + label = None # signal already emitted + vals = [] + break + else: + # Clean continuation — just extend the label + label = label + " " + fragment + was_merged = True + + # Emit if not already done inside the continuation loop + if label is not None: + # Suppress valueless noise rows: only lowercase-start tokens which + # are orphaned continuation fragments with no data of their own. + # Uppercase merged labels are kept even with no values — they may + # represent real rows whose values Azure CU failed to extract. + is_noise = not vals and label[:1].islower() + if not is_noise: + emit(label, vals) + + # ---- Phase 3: strip trailing noise rows ---- # + # Remove rows from the end that are noise: dash-only rows, company name + # headers (e.g. "META PLATFORMS, INC."), or other non-data artifacts. 
+    _TRAILING_NOISE_RE = re.compile(r"^[—–-]")
+    _COMPANY_NAME_RE = re.compile(
+        r"^[A-Z][A-Z\s,.'&\-]+(?:INC|LLC|LTD|CORP|CO|PLC|GROUP|COMPANY|SA|AG|NV)\.?$",
+        re.IGNORECASE,
+    )
+    while rows:
+        last_label = rows[-1].strip()
+        last_row_idx = len(rows) - 1
+        has_real_values = any(
+            c["row"] == last_row_idx and c["col"] > 0 and c["content"].strip()
+            for c in cells
+        )
+        is_noise = (
+            (not has_real_values and _TRAILING_NOISE_RE.match(last_label))
+            or (not has_real_values and _COMPANY_NAME_RE.match(last_label))
+        )
+        if is_noise:
+            rows.pop()
+            cells = [c for c in cells if c["row"] != last_row_idx]
+        else:
+            break
+
+    # ---- Phase 3b: normalize row labels ---- #
+    # Replace internal newlines with spaces for cleaner output.
+    for c in cells:
+        if c["col"] == 0 and "\n" in c["content"]:
+            c["content"] = re.sub(r"\s*\n\s*", " ", c["content"])
+    for i, r in enumerate(rows):
+        if "\n" in r:
+            rows[i] = re.sub(r"\s*\n\s*", " ", r)
+
+    # ---- Phase 3c: fix EPS currency tags ---- #
+    # Earnings per share and share count values should NOT have currency.
+    # They are ratios/counts, not monetary amounts.
+    _EPS_SECTION = False
+    for c in cells:
+        if c["col"] == 0:
+            label_lower = c["content"].lower().strip()
+            if "per share" in label_lower or "shares used" in label_lower:
+                _EPS_SECTION = True
+            elif c["row_type"] in ("subtotal", "total", "section_header"):
+                if "per share" not in label_lower and "shares" not in label_lower:
+                    _EPS_SECTION = False
+        elif _EPS_SECTION and c["col"] > 0:
+            c["currency"] = None
+
+    # ---- Phase 3d: normalize column headers ---- #
+    columns = [re.sub(r"\s*\n\s*", " ", c) for c in columns]
+
+    # ---- Phase 4: indent-level post-pass ---- #
+    # Line items between a section_header and its subtotal/total get indent=1.
+ in_section = False + for c in cells: + if c["col"] != 0: + continue # only process label cells for indent logic + if c["row_type"] == "section_header": + in_section = True + elif c["row_type"] in ("subtotal", "total"): + in_section = False + elif c["row_type"] == "line_item" and in_section: + c["indent_level"] = 1 + # Also update indent on value cells for this row. + for vc in cells: + if vc["row"] == c["row"] and vc["col"] > 0: + vc["indent_level"] = 1 + + # ---- Phase 5: flag under-populated rows ---- # + # When Azure CU drops nil-dash markers ("—"), a row may have fewer values + # than expected. These values end up in the wrong columns because the + # parser fills left-to-right. Flag these rows so downstream consumers + # (or the LLM reconciler) can review column alignment. + if max_value_cols >= 2: + row_val_counts: dict[int, int] = {} + for c in cells: + if c["col"] > 0: + row_val_counts[c["row"]] = row_val_counts.get(c["row"], 0) + 1 + for c in cells: + if c["col"] == 0 and c["row_type"] == "line_item": + actual_cols = row_val_counts.get(c["row"], 0) + if 0 < actual_cols < max_value_cols: + c["column_alignment_warning"] = ( + f"Row has {actual_cols} value(s) but {max_value_cols} " + f"columns expected. Azure CU may have dropped nil markers, " + f"causing values to be in wrong columns." + ) + + return rows, columns, cells + + +# --------------------------------------------------------------------------- +# Total cross-validation +# --------------------------------------------------------------------------- +# Identifies section-total rows (e.g. "Total assets", "Net cash from +# operating activities") and compares their value against the sum of the +# preceding line-item values. Discrepancies flag extraction errors +# without any API call — this is a free accuracy check. + +# Labels that represent section totals. Checked via substring match +# (case-insensitive) to accommodate wording variations across IFRS reports. 
+_TOTAL_LABEL_KEYWORDS = [
+    "total assets",
+    "total liabilities",
+    "total equity",
+    "total liabilities and equity",
+    "total current assets",
+    "total current liabilities",
+    "total non-current assets",
+    "total non-current liabilities",
+    "net cash flows from operating",
+    "net cash flows from investing",
+    "net cash flows from financing",
+    "net cash used in operating",
+    "net cash used in investing",
+    "net cash used in financing",
+    "net cash provided by operating",
+    "net cash provided by investing",
+    "net cash provided by financing",
+]
+
+
+def _parse_financial_value(raw: str) -> Optional[float]:
+    """
+    Parse a financial value string into a float.
+
+    Handles:
+    - Comma-separated thousands: "1,234,567" → 1234567.0
+    - Parenthesised negatives: "(42,391)" → -42391.0
+    - Plain negatives: "-217,741" → -217741.0
+    - Japanese nil dashes: "—" → 0.0
+
+    Returns None if the string is not a recognisable financial number.
+    """
+    s = raw.strip()
+    if not s:
+        return None
+    # Strip leading currency symbols
+    s = re.sub(r"^[\$¥€£]\s*", "", s)
+    if not s:
+        return None
+    # Nil dash (ASCII, em dash, or full-width) = zero
+    if re.match(r"^[—－-]+$", s):
+        return 0.0
+    # Parenthesised negative
+    neg = False
+    if s.startswith("(") and s.endswith(")"):
+        neg = True
+        s = s[1:-1]
+    # Remove commas
+    s = s.replace(",", "")
+    try:
+        val = float(s)
+        return -val if neg else val
+    except ValueError:
+        return None
+
+
+def validate_totals(
+    rows: list[str],
+    cells: list[dict],
+) -> list[dict]:
+    """
+    Hierarchy-aware cross-validation of section totals.
+
+    Uses row_type and indent_level metadata to correctly sum only the
+    direct children of each total/subtotal:
+
+    1. For a SUBTOTAL (e.g. "Total current assets"):
+       Sum only the immediately preceding line_items that share the same
+       section (between the previous section_header/subtotal and this one).
+       Only includes rows at indent_level > 0 (children of the section).
+
+    2. For a TOTAL (e.g.
"Total assets"): + Sum the preceding subtotals and any ungrouped line_items at indent=0. + This avoids double-counting by treating subtotals as atomic blocks. + + Returns a list of warning dicts. An empty list means all totals matched. + """ + if not cells: + return [] + + # Build grid and collect metadata per row. + grid: dict[int, dict[int, str]] = {} + row_meta: dict[int, dict] = {} # row_idx -> {row_type, indent_level} + for c in cells: + grid.setdefault(c["row"], {})[c["col"]] = c.get("content", "") + if c["col"] == 0: + row_meta[c["row"]] = { + "row_type": c.get("row_type", "line_item"), + "indent_level": c.get("indent_level", 0), + } + + sorted_row_indices = sorted(grid.keys()) + if not sorted_row_indices: + return [] + + max_col = max((c["col"] for c in cells if c["col"] > 0), default=0) + if max_col == 0: + return [] + + warnings: list[dict] = [] + + for i, row_idx in enumerate(sorted_row_indices): + meta = row_meta.get(row_idx, {}) + rt = meta.get("row_type", "") + + if rt != "subtotal": + # Only validate subtotals — they have clear parent-child + # relationships. Grand totals (e.g. "Total assets", + # "Total liabilities and stockholders' equity") depend on + # hierarchical aggregation that is prone to false positives. + # If all subtotals validate, grand totals are implied-correct. + continue + + # Collect the rows that should sum to this subtotal. + summable: list[int] = [] + + if rt == "subtotal": + # Sum indented line_items (indent >= 1) going back until hitting + # another subtotal or total. This ensures: + # - Income statement: Revenue (indent=0) excluded from expense total + # - Balance sheet: non-current items excluded from current subtotal + # - Cash flow: Net income + adjustments + working capital all included + # in operating subtotal (all are indent=1) + # Section headers are skipped (they have no values). 
+ for j in range(i - 1, -1, -1): + prev_idx = sorted_row_indices[j] + prev_meta = row_meta.get(prev_idx, {}) + prev_rt = prev_meta.get("row_type", "") + prev_il = prev_meta.get("indent_level", 0) + if prev_rt in ("subtotal", "total"): + break # stop at the previous subtotal/total + if prev_rt == "line_item" and prev_il >= 1: + summable.append(prev_idx) + + elif rt == "total": + # Sum preceding subtotals + other totals + ungrouped line_items. + # This treats subtotals/totals as atomic blocks (already aggregated). + for j in range(i - 1, -1, -1): + prev_idx = sorted_row_indices[j] + prev_meta = row_meta.get(prev_idx, {}) + prev_rt = prev_meta.get("row_type", "") + prev_il = prev_meta.get("indent_level", 0) + if prev_rt == "total": + # Include this total as a summable block (e.g. "Total liabilities" + # contributes to "Total liabilities and stockholders' equity"). + summable.append(prev_idx) + break # stop — we've reached another grand total + if prev_rt == "subtotal": + summable.append(prev_idx) + elif prev_rt == "line_item" and prev_il == 0: + summable.append(prev_idx) + # Skip section_headers and indented line_items (already in subtotals) + + if not summable: + continue + + # Validate each column. 
+ for col in range(1, max_col + 1): + total_raw = grid[row_idx].get(col, "") + total_val = _parse_financial_value(total_raw) + if total_val is None: + continue + + running_sum = 0.0 + all_parseable = True + for li_idx in summable: + raw = grid[li_idx].get(col, "") + parsed = _parse_financial_value(raw) + if parsed is None and raw.strip(): + all_parseable = False + break + running_sum += parsed or 0.0 + + if not all_parseable: + continue + + diff = abs(total_val - running_sum) + if diff > 2.0: + warnings.append({ + "row": row_idx, + "label": grid[row_idx].get(0, ""), + "col": col, + "expected": running_sum, + "actual": total_val, + "diff": diff, + }) + + return warnings + + +# --------------------------------------------------------------------------- +# Statement builder +# --------------------------------------------------------------------------- + +def _detect_metadata(section_text: str) -> dict: + """ + Extract statement-level metadata from the section header. + + Detects: + - company_name: e.g. "META PLATFORMS, INC." + - statement_title: e.g. "Condensed Consolidated Statements of Cash Flows" + - currency: e.g. "USD" (from $ symbols or "(In millions)" note) + - unit: e.g. "millions" (from "(In millions)" or "(In thousands)") + """ + # Look at the first few tokens (not just the first) for metadata. 
+ # Chinese reports put metadata in separate \n\n-delimited tokens: + # tok[0]: "合并资产负债表\n2025年9月30日" + # tok[1]: "编制单位:厦门国贸集团股份有限公司" + # tok[2]: "单位:元 币种:人民币 审计类型:未经审计" + header_tokens = section_text.split("\n\n")[:5] if section_text else [] + lines = [] + for tok in header_tokens: + lines.extend(l.strip() for l in tok.split("\n") if l.strip()) + + company_name = None + statement_title = None + currency = None # detect from content, no default assumption + unit = None + + _COMPANY_RE = re.compile( + r"^[A-Z][A-Z\s,.'&\-]+(?:INC|LLC|LTD|CORP|CO|PLC|GROUP|COMPANY|SA|AG|NV)\.?$", + re.IGNORECASE, + ) + + full_heading = "\n".join(lines) + for line in lines: + if _COMPANY_RE.match(line) and not company_name: + company_name = line + # Chinese format: "编制单位:厦门国贸集团股份有限公司" + elif re.search(r"编制单位[::](.+)", line) and not company_name: + company_name = re.search(r"编制单位[::](.+)", line).group(1).strip() + elif re.search(r"(?i)(statement|balance|income|cash flow|资产负债|利润|现金流|損益|貸借)", line) and not statement_title: + statement_title = line + # Unit detection (English + Chinese) + if re.search(r"(?i)\(in millions|单位.*万元", line): + unit = "ten_thousands" if "万元" in line else "millions" + elif re.search(r"(?i)\(in thousands", line): + unit = "thousands" + elif re.search(r"(?i)\(in billions|单位.*亿元", line): + unit = "billions" + elif re.search(r"单位.*[::].*元", line) and "万" not in line and "亿" not in line: + unit = "ones" # Chinese reports in yuan (元) + + # Currency detection from the full heading block + if re.search(r"人民币|CNY|RMB", full_heading): + currency = "CNY" + elif re.search(r"¥|yen|円", full_heading, re.IGNORECASE): + currency = "JPY" + elif re.search(r"€|euro", full_heading, re.IGNORECASE): + currency = "EUR" + elif re.search(r"£|pound|sterling", full_heading, re.IGNORECASE): + currency = "GBP" + elif re.search(r"\$|USD", full_heading): + currency = "USD" + + return { + "company_name": company_name, + "statement_title": statement_title, + "currency": currency, + "unit": unit, 
+ } + + +def _to_canonical_key(label: str) -> str: + """ + Generate a stable canonical key from a row label. + + Normalizes the label to a snake_case identifier that is consistent + across companies and formatting variations: + "Cash and cash equivalents" -> "cash_and_cash_equivalents" + "Cash & cash eq." -> "cash_and_cash_eq" + "Net income (loss)" -> "net_income_loss" + "Total costs and expenses" -> "total_costs_and_expenses" + """ + s = label.lower().strip() + # Replace & with "and" + s = s.replace("&", "and") + # Remove parentheses but keep content + s = s.replace("(", "").replace(")", "") + # Remove common punctuation + s = re.sub(r"[,.:;'\"\-/]", " ", s) + # Collapse whitespace and convert to underscores + s = re.sub(r"\s+", "_", s.strip()) + # Remove trailing underscores + s = s.strip("_") + return s + + +def _normalize_value(raw: str | None) -> dict: + """ + Convert a raw display value into a structured value object. + + Returns: + {"raw": str|None, "normalized": float|None, "is_null": bool} + + Distinguishes: + - null/None -> is_null=True, normalized=None (not reported) + - "" -> is_null=True, normalized=None (formatting artifact) + - "0" -> is_null=False, normalized=0.0 (explicitly zero) + - "(3,097)" -> is_null=False, normalized=-3097.0 + """ + if raw is None or not raw.strip(): + return {"raw": None, "normalized": None, "is_null": True} + + parsed = _parse_financial_value(raw) + return { + "raw": raw, + "normalized": parsed, + "is_null": False, + } + + +def _build_column_metadata(columns: list[str]) -> list[dict]: + """ + Enrich column headers with structured period metadata. + + Parses period_type ("quarter" or "annual") and year from the column label. 
+ """ + result = [] + for col in columns: + col_lower = col.lower() + # Detect period type + if any(kw in col_lower for kw in ["three months", "quarter", "q1", "q2", "q3", "q4"]): + period_type = "quarter" + elif any(kw in col_lower for kw in ["twelve months", "full year", "fiscal year", "annual"]): + period_type = "annual" + else: + period_type = "unknown" + + # Extract year + year_match = re.search(r"20\d{2}", col) + year = int(year_match.group()) if year_match else None + + result.append({ + "label": col, + "period_type": period_type, + "year": year, + }) + return result + + +def _cells_to_row_first( + rows: list[str], + columns: list[str], + cells: list[dict], +) -> list[dict]: + """ + Convert the flat cells array into a row-first analytics-ready schema. + + Each row becomes a single object with: + - label: str — display label + - canonical_key: str — stable snake_case identifier for cross-company use + - row_type: str ("section_header", "line_item", "subtotal", "total") + - indent_level: int + - values: list[{raw, normalized, is_null}] — one per column + """ + expected_cols = len(columns) + + # Group cells by row. + grid: dict[int, dict[int, dict]] = {} + for c in cells: + grid.setdefault(c["row"], {})[c["col"]] = c + + output_rows: list[dict] = [] + for row_idx in sorted(grid): + label_cell = grid[row_idx].get(0, {}) + label = label_cell.get("content", "") + row_type = label_cell.get("row_type", "line_item") + indent_level = label_cell.get("indent_level", 0) + + # Build values array with normalized numeric layer. 
+ values: list[dict] = [] + for col_idx in range(1, expected_cols + 1): + val_cell = grid[row_idx].get(col_idx, {}) + content = val_cell.get("content", "") + raw = content if content.strip() else None + values.append(_normalize_value(raw)) + + output_rows.append({ + "label": label, + "canonical_key": _to_canonical_key(label), + "row_type": row_type, + "indent_level": indent_level, + "values": values, + }) + + return output_rows + + +def build_statement_json( + statement_type: str, + raw_result: dict, + location: Optional[dict], + use_llm: bool = False, +) -> dict: + """ + Build a single statement JSON in analytics-ready row-first schema. + + Output schema: + { + "statement_type": "cash_flow", + "status": "extracted", + "page_range": {"start": 7, "end": 7}, + "columns": [ + {"label": "...", "period_type": "quarter", "year": 2025} + ], + "rows": [ + {"label": "Revenue", "canonical_key": "revenue", + "row_type": "line_item", "indent_level": 0, + "values": [ + {"raw": "59,893", "normalized": 59893.0, "is_null": false} + ]} + ], + "metadata": {"currency": "USD", "unit": "millions", ...}, + "validation_warnings": [...] + } + + Args: + use_llm: When True, run the LLM reconciliation pass after parsing. + """ + if location is None: + return { + "statement_type": statement_type, + "status": "not_found", + "page_range": {"start": None, "end": None}, + "columns": [], + "rows": [], + "metadata": {}, + "validation_warnings": [], + } + + markdown = _get_markdown(raw_result) + section = markdown[location["md_offset"]: location["md_end_offset"]] + + # Extract metadata from the section header. + metadata = _detect_metadata(section) + + # Parse the table. + rows, columns, cells = _parse_plain_text_table(section) + + if use_llm and cells: + from .llm_reconciler import reconcile + rows, columns, cells = reconcile(statement_type, rows, columns, cells) + + # Cross-validate section totals against line-item sums. 
+ validation_warnings: list[dict] = [] + if cells: + validation_warnings = validate_totals(rows, cells) + for w in validation_warnings: + print(f" [WARN] {statement_type} row {w['row']} col {w['col']}: " + f"'{w['label']}' total={w['actual']}, sum={w['expected']}, " + f"diff={w['diff']}") + + # Convert to row-first schema. + row_objects = _cells_to_row_first(rows, columns, cells) + + return { + "statement_type": statement_type, + "status": "extracted" if cells else "found", + "page_range": { + "start": location["start_page"], + "end": location["end_page"], + }, + "columns": _build_column_metadata(columns), + "rows": row_objects, + "metadata": metadata, + "validation_warnings": validation_warnings, + } + + +# --------------------------------------------------------------------------- +# Summary builder +# --------------------------------------------------------------------------- + +def build_summary(locations: dict[str, Optional[dict]]) -> dict: + """Build the summary.json object from the located statements.""" + entries = [] + for statement_type, location in locations.items(): + if location is None: + entries.append({ + "statement_type": statement_type, + "status": "not_found", + "page_range": {"start": None, "end": None}, + }) + else: + entries.append({ + "statement_type": statement_type, + "status": "extracted", + "page_range": { + "start": location["start_page"], + "end": location["end_page"], + }, + }) + return {"summary": entries} diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/extractor/textract_adapter.py b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/textract_adapter.py new file mode 100644 index 000000000..ee4b4e041 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/textract_adapter.py @@ -0,0 +1,290 @@ +""" +extractor/textract_adapter.py +------------------------------ +Converts AWS Textract JSON blocks into the markdown + page_map format +expected by the existing extraction 
pipeline (Stages 2-5). + +The key idea: reconstruct markdown text with embedded HTML tags from +Textract blocks, so all existing heading search, table parsing, and enrichment +logic in html_table_parser.py and the extract stage works unchanged. + +Public API: + reconstruct_markdown(blocks) -> str + build_page_map(blocks, markdown) -> list[tuple[int, int, int]] + classify_statements_with_llm(markdown) -> list[dict] + +Internal helpers (exposed for testing): + _build_block_index(blocks) -> dict[str, dict] + _table_to_html(table_block, index) -> str +""" + +from __future__ import annotations + +import json +import logging +import re +from collections import defaultdict + +logger = logging.getLogger(__name__) + + +# --------------------------------------------------------------------------- +# 1. Block index +# --------------------------------------------------------------------------- + +def _build_block_index(blocks: list[dict]) -> dict[str, dict]: + """Build a lookup dict from Block Id -> Block for fast traversal.""" + return {block["Id"]: block for block in blocks} + + +# --------------------------------------------------------------------------- +# Internal helpers +# --------------------------------------------------------------------------- + +def _get_child_ids(block: dict, relationship_type: str = "CHILD") -> list[str]: + """ + Get child IDs from a block's Relationships array for the given type. + + Args: + block: A Textract block dict. + relationship_type: The relationship type to look for (e.g. "CHILD"). + + Returns: + List of block IDs, or empty list if no matching relationship. + """ + for rel in block.get("Relationships", []): + if rel.get("Type") == relationship_type: + return rel.get("Ids", []) + return [] + + +def _get_cell_text(cell_block: dict, index: dict[str, dict]) -> str: + """ + Get the text content of a CELL block by traversing CELL -> CHILD -> WORD. + + Joins Word.Text values with spaces. 
+ """ + word_ids = _get_child_ids(cell_block, "CHILD") + words = [] + for wid in word_ids: + word_block = index.get(wid) + if word_block and word_block.get("BlockType") == "WORD": + words.append(word_block.get("Text", "")) + return " ".join(words) + + +# --------------------------------------------------------------------------- +# 2. Table to HTML +# --------------------------------------------------------------------------- + +def _table_to_html(table_block: dict, index: dict[str, dict]) -> str: + """ + Convert a Textract TABLE block into an HTML
string. + + - Gets CELL children from the TABLE block's CHILD relationship + - Groups cells by RowIndex (1-based) + - Sorts cells within each row by ColumnIndex + - Gets cell text by traversing CELL -> WORD children + - Uses
for cells with COLUMN_HEADER entity type, otherwise + - Handles colspan/rowspan attributes when > 1 + """ + cell_ids = _get_child_ids(table_block, "CHILD") + + # Group cells by row + rows: dict[int, list[dict]] = defaultdict(list) + for cell_id in cell_ids: + cell = index.get(cell_id) + if cell and cell.get("BlockType") == "CELL": + row_idx = cell.get("RowIndex", 1) + rows[row_idx].append(cell) + + # Build HTML + html_parts = [""] + + for row_num in sorted(rows.keys()): + cells = sorted(rows[row_num], key=lambda c: c.get("ColumnIndex", 1)) + html_parts.append("") + + for cell in cells: + text = _get_cell_text(cell, index) + entity_types = cell.get("EntityTypes", []) + is_header = "COLUMN_HEADER" in entity_types + + tag = "th" if is_header else "td" + + # Build attributes for colspan/rowspan + attrs = "" + col_span = cell.get("ColumnSpan", 1) + row_span = cell.get("RowSpan", 1) + if col_span > 1: + attrs += f' colspan="{col_span}"' + if row_span > 1: + attrs += f' rowspan="{row_span}"' + + html_parts.append(f"<{tag}{attrs}>{text}") + + html_parts.append("") + + html_parts.append("
") + return "".join(html_parts) + + +# --------------------------------------------------------------------------- +# 3. Reconstruct markdown +# --------------------------------------------------------------------------- + +def reconstruct_markdown(blocks: list[dict]) -> str: + """ + Build a markdown-like document from Textract blocks. + + - Collects LINE blocks (text) and TABLE blocks (convert to HTML) + - Sorts by page number, then by vertical position (BoundingBox.Top) + - Inserts markers at page boundaries + - LINE blocks become \\n{text} + - TABLE blocks become \\n\\n{html}\\n\\n + + The output mimics what Azure Content Understanding produces, so all + existing heading search, table parsing, and enrichment logic works. + """ + index = _build_block_index(blocks) + + # Collect renderable elements: LINEs and TABLEs + elements: list[tuple[int, float, str, str]] = [] + # Each element: (page, top_position, block_type, content) + + for block in blocks: + block_type = block.get("BlockType") + page = block.get("Page", 1) + top = block.get("Geometry", {}).get("BoundingBox", {}).get("Top", 0.0) + + if block_type == "LINE": + text = block.get("Text", "") + elements.append((page, top, "LINE", text)) + elif block_type == "TABLE": + html = _table_to_html(block, index) + elements.append((page, top, "TABLE", html)) + + # Sort by page, then vertical position + elements.sort(key=lambda e: (e[0], e[1])) + + # Build markdown with page markers + parts: list[str] = [] + current_page = 0 + + for page, top, block_type, content in elements: + if page != current_page: + current_page = page + parts.append(f"") + + if block_type == "LINE": + parts.append(f"\n{content}") + elif block_type == "TABLE": + parts.append(f"\n\n{content}\n\n") + + return "".join(parts) + + +# --------------------------------------------------------------------------- +# 4. 
Build page map +# --------------------------------------------------------------------------- + +def build_page_map( + blocks: list[dict], markdown: str +) -> list[tuple[int, int, int]]: + """ + Build a page_map compatible with the existing pipeline. + + Finds markers in the markdown and returns a list of + (start_offset, end_offset, page_number) tuples sorted by offset. + Each entry covers from one marker to the next (or end of string). + """ + marker_re = re.compile(r"") + entries: list[tuple[int, int, int]] = [] + + matches = list(marker_re.finditer(markdown)) + for i, m in enumerate(matches): + page_num = int(m.group(1)) + start = m.start() + if i + 1 < len(matches): + end = matches[i + 1].start() + else: + end = len(markdown) + entries.append((start, end, page_num)) + + entries.sort(key=lambda t: t[0]) + return entries + + +# --------------------------------------------------------------------------- +# 5. Classify statements with LLM +# --------------------------------------------------------------------------- + +def classify_statements_with_llm(markdown: str) -> list[dict]: + """ + Use Azure OpenAI to classify financial statements in the document. + + Sends first 8000 chars of markdown to the LLM and asks it to identify + statement types (balance_sheet, income_statement, cash_flow). + + For each, returns: statement_type, title_raw, currency (ISO 4217), unit, + accounting_standard, is_consolidated, report_language, company_name. + + Returns empty list if LLM is not available (graceful fallback). + """ + # Lazy imports to avoid circular imports and env var issues at module level + try: + from extractor.llm_reconciler import _get_client, _DEPLOYMENT + except Exception: + logger.warning("[textract_adapter] Could not import LLM client, skipping classification") + return [] + + snippet = markdown[:8000] + + prompt = f"""You are a financial document analyst. Given the following extracted text from a financial report, identify each financial statement present. 
+ +For each statement found, return: +- statement_type: one of "balance_sheet", "income_statement", "cash_flow" +- title_raw: the exact title as it appears in the document +- currency: ISO 4217 currency code (e.g. "USD", "EUR", "GBP") +- unit: the unit of values (e.g. "millions", "thousands", "units") +- accounting_standard: e.g. "IFRS", "US GAAP", "GAAP", or null +- is_consolidated: true if consolidated/group statement, false if standalone +- report_language: ISO 639-1 language code (e.g. "en", "fr", "zh") +- company_name: name of the reporting entity + +If you cannot determine a field, use null. + +Document text: +{snippet} + +Respond with a JSON object: {{"statements": [...]}}""" + + try: + client = _get_client() + response = client.chat.completions.create( + model=_DEPLOYMENT(), + response_format={"type": "json_object"}, + messages=[ + {"role": "system", "content": "You are a financial document analyst. Return only valid JSON."}, + {"role": "user", "content": prompt}, + ], + temperature=0.0, + max_tokens=2000, + ) + + content = response.choices[0].message.content + if not content: + logger.warning("[textract_adapter] LLM returned empty content") + return [] + raw = content.strip() + parsed = json.loads(raw) + + if isinstance(parsed, dict): + return parsed.get("statements", []) + elif isinstance(parsed, list): + return parsed + return [] + + except Exception as e: + logger.warning(f"[textract_adapter] LLM classification failed: {e}") + return [] diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/extractor/textract_client.py b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/textract_client.py new file mode 100644 index 000000000..b3cffd089 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/extractor/textract_client.py @@ -0,0 +1,123 @@ +""" +extractor/textract_client.py +---------------------------- +AWS Textract client for document analysis. 
+ +Handles: S3 upload -> StartDocumentAnalysis -> poll -> return blocks -> S3 cleanup. + +Configuration (via env vars or ~/.aws/credentials): + AWS_REGION -- AWS region (must match S3 bucket and Textract) + AWS_S3_BUCKET -- S3 bucket for temporary PDF upload + AWS_S3_PREFIX -- S3 key prefix (default: "textract-input") + + Authentication via either: + - ~/.aws/credentials (local dev -- boto3 reads automatically) + - AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY env vars (production) +""" + +import logging +import os +import time +import uuid + +import boto3 + +logger = logging.getLogger(__name__) + +_POLL_INTERVAL = 3 +_POLL_TIMEOUT = 300 # 5 minutes + + +def _get_s3_client(): + region = os.environ.get("AWS_REGION", "us-east-1") + return boto3.client("s3", region_name=region) + + +def _get_textract_client(): + region = os.environ.get("AWS_REGION", "us-east-1") + return boto3.client("textract", region_name=region) + + +def analyze_document(file_path: str) -> dict: + """ + Submit a PDF for Textract table analysis and return the full result. + + 1. Uploads PDF to S3 + 2. Starts async document analysis with TABLES feature + 3. Polls until complete + 4. Handles pagination for large documents + 5. Cleans up S3 object + 6. Returns the Textract response with all Blocks + + Args: + file_path: Path to the PDF file on disk. + + Returns: + Textract response dict containing Blocks array. 
+ """ + bucket = os.environ["AWS_S3_BUCKET"] + prefix = os.environ.get("AWS_S3_PREFIX", "textract-input") + s3_key = f"{prefix}/{uuid.uuid4().hex}.pdf" + + s3 = _get_s3_client() + textract = _get_textract_client() + + # Step 1: Upload to S3 + logger.info(f"Uploading PDF to s3://{bucket}/{s3_key}") + s3.upload_file(Filename=file_path, Bucket=bucket, Key=s3_key) + + try: + # Step 2: Start analysis + logger.info("Starting Textract document analysis") + start_resp = textract.start_document_analysis( + DocumentLocation={"S3Object": {"Bucket": bucket, "Name": s3_key}}, + FeatureTypes=["TABLES"], + ) + job_id = start_resp["JobId"] + logger.info(f"Textract job started: {job_id}") + + # Step 3: Poll until complete + result = _poll_job(textract, job_id) + + # Step 4: Handle pagination (large documents) + all_blocks = list(result.get("Blocks", [])) + next_token = result.get("NextToken") + while next_token: + page_resp = textract.get_document_analysis( + JobId=job_id, NextToken=next_token + ) + all_blocks.extend(page_resp.get("Blocks", [])) + next_token = page_resp.get("NextToken") + + result["Blocks"] = all_blocks + return result + + finally: + # Step 5: Always clean up S3 + logger.info(f"Cleaning up s3://{bucket}/{s3_key}") + try: + s3.delete_object(Bucket=bucket, Key=s3_key) + except Exception as e: + logger.warning(f"S3 cleanup failed (non-fatal): {e}") + + +def _poll_job(textract, job_id: str) -> dict: + """Poll Textract job until SUCCEEDED or FAILED.""" + start = time.time() + attempt = 0 + + while (time.time() - start) < _POLL_TIMEOUT: + attempt += 1 + resp = textract.get_document_analysis(JobId=job_id) + status = resp["JobStatus"] + logger.info(f" [{attempt}] Textract status: {status}") + + if status == "SUCCEEDED": + return resp + if status == "FAILED": + msg = resp.get("StatusMessage", "Unknown error") + raise RuntimeError(f"Textract job failed: {msg}") + + time.sleep(_POLL_INTERVAL) + + raise TimeoutError(f"Textract job did not complete within {_POLL_TIMEOUT}s") 
diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/function_app.py b/samples/mcs-finance-statement-agent/src/azure-functions/function_app.py new file mode 100644 index 000000000..82c7f17b8 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/function_app.py @@ -0,0 +1,305 @@ +""" +function_app.py +--------------- +Azure Function HTTP router for the Financial Statement Extraction Pipeline. + +This file is a thin routing layer only — all business logic lives in the +extractor/ modules: + + extractor/extract_endpoints.py — PDF extraction (upload, URL, polling) + extractor/excel_endpoint.py — Excel generation and download + extractor/review_endpoints.py — HITL review (Adaptive Cards, corrections) + extractor/job_store.py — Blob storage CRUD for job results + +API Endpoints: + POST /api/extract — Submit PDF for extraction (async, returns jobId) + POST /api/extract-by-url — Submit PDF URL for extraction + GET /api/extract/status/{id} — Poll extraction status and results + POST /api/generate-excel — Generate Excel workbook from results + POST /api/build-review-card — Build Adaptive Card for HITL review + POST /api/parse-card-submission — Parse Adaptive Card submit payload + POST /api/apply-corrections — Apply analyst corrections + GET /api/fx-rate — Fetch FX conversion rates + GET /api/health — Health check +""" + +import json +import logging +import time +import traceback +from datetime import datetime + +import azure.functions as func + +# --------------------------------------------------------------------------- +# Azure Function App instance +# --------------------------------------------------------------------------- + +app = func.FunctionApp(http_auth_level=func.AuthLevel.FUNCTION) + + +# --------------------------------------------------------------------------- +# POST /extract — submit PDF for async extraction +# --------------------------------------------------------------------------- + +@app.route(route="extract", 
methods=["POST"]) +def extract(req: func.HttpRequest) -> func.HttpResponse: + """Accept a PDF upload and start async extraction. Returns 202 + jobId.""" + logging.info("Extract request received") + try: + status, body, headers = _import_extract().handle_extract( + body=req.get_body(), + files=req.files, + form=req.form, + params=dict(req.params), + headers=dict(req.headers), + ) + return func.HttpResponse( + json.dumps(body, ensure_ascii=False), + status_code=status, headers=headers, mimetype="application/json", + ) + except Exception as e: + logging.exception("Failed to start extraction job") + return _error_response(500, f"Failed to start job: {e}") + + +# --------------------------------------------------------------------------- +# POST /extract-by-url — submit PDF URL for async extraction +# --------------------------------------------------------------------------- + +@app.route(route="extract-by-url", methods=["POST"]) +def extract_by_url(req: func.HttpRequest) -> func.HttpResponse: + """Accept a PDF download URL and start async extraction.""" + logging.info("Extract-by-url request received") + try: + body = req.get_json() + except Exception: + return _error_response(400, "Invalid JSON body") + + try: + status, resp, headers = _import_extract().handle_extract_by_url( + body=body, params=dict(req.params), + ) + return func.HttpResponse( + json.dumps(resp, ensure_ascii=False), + status_code=status, headers=headers, mimetype="application/json", + ) + except Exception as e: + logging.exception("Failed to start extraction job (via URL)") + return _error_response(500, f"Failed to start job: {e}") + + +# --------------------------------------------------------------------------- +# GET /extract/status/{jobId} — poll for results +# --------------------------------------------------------------------------- + +@app.route(route="extract/status/{jobId}", methods=["GET"], + auth_level=func.AuthLevel.ANONYMOUS) +def extract_status(req: func.HttpRequest) -> 
func.HttpResponse:
+    """Poll for extraction job status. Anonymous auth — jobId is the token."""
+    job_id = req.route_params.get("jobId")
+    status, body = _import_extract().handle_extract_status(job_id)
+    return func.HttpResponse(
+        json.dumps(body, ensure_ascii=False, indent=2),
+        status_code=status, mimetype="application/json",
+    )
+
+
+# ---------------------------------------------------------------------------
+# POST /generate-excel — build Excel from extraction results
+# ---------------------------------------------------------------------------
+
+@app.route(route="generate-excel", methods=["POST"])
+def generate_excel(req: func.HttpRequest) -> func.HttpResponse:
+    """Generate a formatted Excel workbook and return a download URL."""
+    # The JSON body is optional; fall back to query-string parameters so the
+    # FX options are still honoured when no body is sent.
+    try:
+        body = req.get_json() or {}
+    except Exception:
+        body = {}
+
+    job_id = req.params.get("jobId") or body.get("jobId")
+    fx_target_currency = body.get("fxTargetCurrency") or req.params.get("fxTargetCurrency")
+    fx_spot_rate = body.get("fxSpotRate") or req.params.get("fxSpotRate")
+    fx_avg_rate = body.get("fxAvgRate") or req.params.get("fxAvgRate")
+    fx_rate_date = body.get("fxRateDate") or req.params.get("fxRateDate") or ""
+
+    if not job_id:
+        return _error_response(400, "jobId is required")
+
+    try:
+        fx_spot_rate = float(fx_spot_rate) if fx_spot_rate else None
+        fx_avg_rate = float(fx_avg_rate) if fx_avg_rate else None
+    except (TypeError, ValueError):
+        return _error_response(400, "fxSpotRate and fxAvgRate must be numeric")
+
+    logging.info(f"generate-excel: building Excel for job {job_id}")
+    try:
+        from extractor.excel_endpoint import handle_generate_excel
+        status, body = handle_generate_excel(
+            job_id, fx_target_currency, fx_spot_rate, fx_avg_rate, fx_rate_date,
+        )
+        return func.HttpResponse(
+            json.dumps(body, ensure_ascii=False),
+            status_code=status, mimetype="application/json",
+        )
+    except Exception as e:
+        logging.exception(f"generate-excel failed for job {job_id}")
+        return _error_response(500, str(e))
+
+
+# 
--------------------------------------------------------------------------- +# POST /build-review-card — Adaptive Card for HITL review +# --------------------------------------------------------------------------- + +@app.route(route="build-review-card", methods=["POST"]) +def build_review_card(req: func.HttpRequest) -> func.HttpResponse: + """Generate an Adaptive Card JSON payload for HITL statement review.""" + try: + body = req.get_json() + except Exception: + return _error_response(400, "Invalid JSON body") + + from extractor.review_endpoints import handle_build_review_card + status, resp = handle_build_review_card( + job_id=body.get("jobId"), + session_state_str=body.get("sessionState", ""), + ) + return func.HttpResponse( + json.dumps(resp, ensure_ascii=False), + status_code=status, mimetype="application/json", + ) + + +# --------------------------------------------------------------------------- +# POST /parse-card-submission — advance review session state +# --------------------------------------------------------------------------- + +@app.route(route="parse-card-submission", methods=["POST"]) +def parse_card_submission_route(req: func.HttpRequest) -> func.HttpResponse: + """Parse an Adaptive Card submission and advance the session state machine.""" + try: + body = req.get_json() + except Exception: + return _error_response(400, "Invalid JSON body") + + try: + from extractor.review_endpoints import handle_parse_card_submission + status, resp = handle_parse_card_submission( + session_state_str=body.get("sessionState", ""), + payload_raw=body.get("payload", "{}"), + ) + return func.HttpResponse( + json.dumps(resp, ensure_ascii=False), + status_code=status, mimetype="application/json", + ) + except Exception as e: + logging.exception(f"parse-card-submission error: {e}") + return _error_response(500, f"Internal error: {e}") + + +# --------------------------------------------------------------------------- +# POST /apply-corrections — merge analyst corrections 
+# --------------------------------------------------------------------------- + +@app.route(route="apply-corrections", methods=["POST"]) +def apply_corrections(req: func.HttpRequest) -> func.HttpResponse: + """Apply HITL analyst corrections and return corrected statement data.""" + try: + body = req.get_json() + except Exception: + return _error_response(400, "Invalid JSON body") + + job_id = body.get("jobId") + if not job_id: + return _error_response(400, "Missing jobId") + + from extractor.review_endpoints import handle_apply_corrections + status, resp = handle_apply_corrections( + job_id=job_id, + session_state_str=body.get("sessionState", "{}"), + ) + return func.HttpResponse( + json.dumps(resp, ensure_ascii=False), + status_code=status, mimetype="application/json", + ) + + +# --------------------------------------------------------------------------- +# GET /fx-rate — FX conversion rates +# --------------------------------------------------------------------------- + +@app.route(route="fx-rate", methods=["GET"], auth_level=func.AuthLevel.ANONYMOUS) +def fx_rate(req: func.HttpRequest) -> func.HttpResponse: + """Fetch FX rates for currency conversion.""" + try: + status, body = _import_extract().handle_fx_rate(dict(req.params)) + return func.HttpResponse( + json.dumps(body), status_code=status, mimetype="application/json", + ) + except Exception as e: + logging.exception(f"FX rate fetch failed: {e}") + return _error_response(500, str(e)) + + +# --------------------------------------------------------------------------- +# GET /health — health check +# --------------------------------------------------------------------------- + +@app.route(route="health", methods=["GET"]) +def health(req: func.HttpRequest) -> func.HttpResponse: + """Health check endpoint.""" + return func.HttpResponse( + json.dumps({ + "status": "healthy", + "version": "3.0.0", + "schema_version": "1.2.0", + "timestamp": datetime.utcnow().isoformat() + "Z", + "active_jobs": 0, + }), + 
mimetype="application/json", + ) + + +# --------------------------------------------------------------------------- +# GET /timeout-test — connector timeout diagnostic +# --------------------------------------------------------------------------- + +@app.route(route="timeout-test", methods=["GET"]) +def timeout_test(req: func.HttpRequest) -> func.HttpResponse: + """Test endpoint to measure connector timeout limits.""" + sleep_seconds = int(req.params.get("sleep", "10")) + start = time.time() + logging.info(f"timeout-test: sleeping for {sleep_seconds}s ...") + time.sleep(sleep_seconds) + elapsed = round(time.time() - start, 2) + return func.HttpResponse( + json.dumps({ + "status": "ok", + "requested_sleep": sleep_seconds, + "actual_elapsed": elapsed, + "timestamp": datetime.utcnow().isoformat() + "Z", + }), + mimetype="application/json", + ) + + +# --------------------------------------------------------------------------- +# Helpers +# --------------------------------------------------------------------------- + +def _error_response(status_code: int, message: str) -> func.HttpResponse: + """Build a JSON error response.""" + return func.HttpResponse( + json.dumps({"error": message}), + status_code=status_code, mimetype="application/json", + ) + + +def _import_extract(): + """Lazy import of extract_endpoints to avoid circular imports.""" + from extractor import extract_endpoints + return extract_endpoints diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/host.json b/samples/mcs-finance-statement-agent/src/azure-functions/host.json new file mode 100644 index 000000000..c237ca323 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/host.json @@ -0,0 +1,23 @@ +{ + "version": "2.0", + "logging": { + "applicationInsights": { + "samplingSettings": { + "isEnabled": true, + "excludedTypes": "Request" + } + } + }, + "extensionBundle": { + "id": "Microsoft.Azure.Functions.ExtensionBundle", + "version": "[4.*, 5.0.0)" + }, + 
"extensions": { + "http": { + "routePrefix": "api", + "maxOutstandingRequests": 10, + "maxConcurrentRequests": 5 + } + }, + "functionTimeout": "00:10:00" +} diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/pytest.ini b/samples/mcs-finance-statement-agent/src/azure-functions/pytest.ini new file mode 100644 index 000000000..660541023 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/pytest.ini @@ -0,0 +1,4 @@ +[pytest] +testpaths = tests +python_files = test_*.py +python_functions = test_* diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/requirements.txt b/samples/mcs-finance-statement-agent/src/azure-functions/requirements.txt new file mode 100644 index 000000000..c5e43df34 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/requirements.txt @@ -0,0 +1,13 @@ +azure-functions +azure-identity +azure-storage-blob +azure-ai-documentintelligence>=1.0.0 +requests +python-dotenv +openai>=1.0 +pytest>=7.0 +msal>=1.28.0 +httpx>=0.27.0 +openpyxl>=3.1.0 +boto3>=1.34.0 +pdfplumber>=0.11.0 diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/scripts/print_tables.py b/samples/mcs-finance-statement-agent/src/azure-functions/scripts/print_tables.py new file mode 100644 index 000000000..350fe1ad4 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/scripts/print_tables.py @@ -0,0 +1,71 @@ +""" +scripts/print_tables.py +----------------------- +Developer utility: print all three financial statements as formatted ASCII +tables to the console, reading from the output/*.json files. + +Usage (run from the project root): + python scripts/print_tables.py + +Prerequisites: + output/balance_sheet.json, output/income_statement.json, + output/cash_flow.json must exist (produced by main.py or reprocess.py). 
+""" + +import json +import sys +from pathlib import Path + +sys.stdout.reconfigure(encoding="utf-8") + +# Locate the output directory relative to this script's location. +OUTPUT_DIR = Path(__file__).resolve().parent.parent / "output" + + +def print_table(stype: str) -> None: + """Print one statement as a formatted ASCII table.""" + stmt = json.loads((OUTPUT_DIR / f"{stype}.json").read_text(encoding="utf-8")) + rows = stmt["rows"] + cols = stmt["columns"] + + title = stype.upper().replace("_", " ") + page_range = stmt.get("page_range", {}) + pages = f"pages {page_range.get('start', '?')}-{page_range.get('end', '?')}" + nrows = len(rows) + + sep = "-" * 120 + print(f"\n{'='*120}") + print(f" {title} | {pages} | {nrows} rows") + print(f"{'='*120}") + + # Column headers + col_widths = [22] * len(cols) + header = f" {'Label':<68}" + for i, c in enumerate(cols): + short_name = c[:col_widths[i]] + header += f" {short_name:>{col_widths[i]}}" + print(header) + print(sep) + + for row in rows: + label = row["label"] + values = row.get("values", []) + indent = " " * row.get("indent_level", 0) + prefix = " " + + # Format values + val_strs = [] + for v in values: + val_strs.append(v if v else "") + + line = f"{prefix}{indent}{label:<{68 - len(indent)}}" + for i, v in enumerate(val_strs): + w = col_widths[i] if i < len(col_widths) else 22 + line += f" {v:>{w}}" + print(line) + print(sep) + + +if __name__ == "__main__": + for s in ["balance_sheet", "income_statement", "cash_flow"]: + print_table(s) diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/scripts/reprocess.py b/samples/mcs-finance-statement-agent/src/azure-functions/scripts/reprocess.py new file mode 100644 index 000000000..6cb7e375d --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/scripts/reprocess.py @@ -0,0 +1,67 @@ +""" +scripts/reprocess.py +-------------------- +Developer utility: re-run statement detection and parsing against a cached +raw_result.json without hitting 
the Azure Content Understanding API again. + +Useful for iterating on statement_detector.py changes without incurring API +costs or waiting for the analysis to complete. + +Usage (run from the project root): + python scripts/reprocess.py + python scripts/reprocess.py --llm # with LLM reconciliation + +Prerequisites: + output/raw_result.json must exist (produced by a previous main.py run). +""" + +import argparse +import json +import sys +from pathlib import Path + +# Allow importing the extractor package from the project root. +sys.path.insert(0, str(Path(__file__).resolve().parent.parent)) +sys.stdout.reconfigure(encoding="utf-8") + +from extractor import locate_statements, build_statement_json, build_summary + +OUTPUT_DIR = Path(__file__).resolve().parent.parent / "output" +STATEMENT_TYPES = ["balance_sheet", "cash_flow", "income_statement"] + + +def main(use_llm: bool = False) -> None: + raw_path = OUTPUT_DIR / "raw_result.json" + if not raw_path.exists(): + print(f"ERROR: {raw_path} not found. 
Run main.py first to generate it.") + sys.exit(1) + + raw_result = json.loads(raw_path.read_text(encoding="utf-8")) + + # Re-run location detection and parsing + locations = locate_statements(raw_result, use_llm=use_llm) + for stype in STATEMENT_TYPES: + stmt = build_statement_json(stype, raw_result, locations.get(stype), use_llm=use_llm) + (OUTPUT_DIR / f"{stype}.json").write_text( + json.dumps(stmt, indent=2, ensure_ascii=False), encoding="utf-8" + ) + (OUTPUT_DIR / "summary.json").write_text( + json.dumps(build_summary(locations), indent=2, ensure_ascii=False), encoding="utf-8" + ) + + # Print a quick summary + for stype in STATEMENT_TYPES: + stmt = json.loads((OUTPUT_DIR / f"{stype}.json").read_text(encoding="utf-8")) + pr = stmt.get("page_range", {}) + print(f"{stype}: {len(stmt['rows'])} rows, pages {pr.get('start')}-{pr.get('end')}") + + +if __name__ == "__main__": + parser = argparse.ArgumentParser(description="Reprocess cached Azure CU output") + parser.add_argument( + "--llm", + action="store_true", + help="Enable LLM reconciliation pass (requires Azure OpenAI credentials in .env)", + ) + args = parser.parse_args() + main(use_llm=args.llm) diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/scripts/setup_analyzers.py b/samples/mcs-finance-statement-agent/src/azure-functions/scripts/setup_analyzers.py new file mode 100644 index 000000000..b34aa697e --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/scripts/setup_analyzers.py @@ -0,0 +1,109 @@ +""" +scripts/setup_analyzers.py +-------------------------- +Creates (or updates) the custom Content Understanding analyzers on your +Azure resource. Run this once before using the new pipeline. 
+ +Usage: + python scripts/setup_analyzers.py # create both analyzers + python scripts/setup_analyzers.py --list # list existing analyzers + python scripts/setup_analyzers.py --delete # delete both analyzers + +Prerequisites: + AZURE_CU_ENDPOINT must be set in .env (auth via managed identity) +""" + +import argparse +import json +import sys +from pathlib import Path + +sys.path.insert(0, str(Path(__file__).resolve().parent.parent)) +sys.stdout.reconfigure(encoding="utf-8") + +from extractor.cu_client import ( + create_analyzer, + get_analyzer, + list_analyzers, + delete_analyzer, +) + +TEMPLATE_DIR = Path(__file__).resolve().parent.parent / "analyzer_templates" + +ANALYZERS = { + "financial-statement-locator": TEMPLATE_DIR / "financial_statement_locator.json", + "financial-statement-extractor": TEMPLATE_DIR / "financial_statement_extractor.json", +} + + +def cmd_create(): + """Create or update both analyzers.""" + for analyzer_id, template_path in ANALYZERS.items(): + print(f"\n{'='*60}") + print(f"Creating analyzer: {analyzer_id}") + print(f"Template: {template_path.name}") + print(f"{'='*60}") + + template = json.loads(template_path.read_text(encoding="utf-8")) + + # Check if it already exists. + existing = get_analyzer(analyzer_id) + if existing: + print(f" Analyzer already exists. Updating...") + else: + print(f" Creating new analyzer...") + + try: + result = create_analyzer(analyzer_id, template) + status = result.get("status", "unknown") + print(f" Result: {status}") + if status == "succeeded": + print(f" Analyzer '{analyzer_id}' is ready.") + elif status == "created": + print(f" Analyzer '{analyzer_id}' created (may need a moment to become active).") + else: + print(f" Full response: {json.dumps(result, indent=2, ensure_ascii=False)[:500]}") + except Exception as e: + print(f" ERROR: {e}") + continue + + print(f"\nDone. 
Both analyzers should now be available on your Azure resource.") + + +def cmd_list(): + """List all analyzers on the resource.""" + analyzers = list_analyzers() + if not analyzers: + print("No analyzers found.") + return + print(f"\n{len(analyzers)} analyzer(s) found:\n") + for a in analyzers: + aid = a.get("analyzerId", "?") + desc = a.get("description", "")[:60] + status = a.get("status", "?") + print(f" {aid:<40} status={status:<12} {desc}") + + +def cmd_delete(): + """Delete both financial statement analyzers.""" + for analyzer_id in ANALYZERS: + print(f"Deleting: {analyzer_id}...") + try: + delete_analyzer(analyzer_id) + print(f" Deleted.") + except Exception as e: + print(f" Error: {e}") + + +if __name__ == "__main__": + parser = argparse.ArgumentParser(description="Manage CU custom analyzers") + parser.add_argument("--list", action="store_true", help="List existing analyzers") + parser.add_argument("--delete", action="store_true", help="Delete both analyzers") + args = parser.parse_args() + + if args.list: + cmd_list() + elif args.delete: + cmd_delete() + else: + cmd_create() diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/scripts/validate_output.py b/samples/mcs-finance-statement-agent/src/azure-functions/scripts/validate_output.py new file mode 100644 index 000000000..3b56d3e20 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/scripts/validate_output.py @@ -0,0 +1,166 @@ +""" +scripts/validate_output.py +-------------------------- +Strict validation agent: validates output JSONs against the analytics-ready +schema and spot-checks accuracy against known Meta report values. 
+""" + +import json +import sys +from pathlib import Path + +sys.stdout.reconfigure(encoding="utf-8") +OUTPUT_DIR = Path(__file__).resolve().parent.parent / "output" + +# === Schema definition === +REQUIRED_TOP_KEYS = ["statement_type", "status", "page_range", "columns", "rows", "metadata", "validation_warnings"] +REQUIRED_COL_KEYS = ["label", "period_type", "year"] +REQUIRED_ROW_KEYS = ["label", "canonical_key", "row_type", "indent_level", "values"] +REQUIRED_VAL_KEYS = ["raw", "normalized", "is_null"] +VALID_ROW_TYPES = {"section_header", "line_item", "subtotal", "total"} +VALID_PERIOD_TYPES = {"quarter", "annual", "unknown"} + +# === Known Meta report values for accuracy checks === +ACCURACY_CHECKS = { + "income_statement": { + "revenue": [59893.0, 48385.0, 200966.0, 164501.0], + "net_income": [22768.0, 20838.0, 60458.0, 62360.0], + "total_costs_and_expenses": [35148.0, 25020.0, 117690.0, 95121.0], + }, + "balance_sheet": { + "total_assets": [366021.0, 276054.0], + "total_liabilities": [148778.0, 93417.0], + "cash_and_cash_equivalents": [35873.0, 43889.0], + }, + "cash_flow": { + "impairment_charges_for_facilities_consolidation": [None, 94.0, None, 383.0], + "repurchases_of_class_a_common_stock": [None, None, -26248.0, -30125.0], + "net_cash_provided_by_operating_activities": [36214.0, 27988.0, 115800.0, 91328.0], + "payments_for_held_for_sale_assets": [-635.0, None, -2432.0, None], + "proceeds_from_venture_distribution": [2554.0, None, 2554.0, None], + "net_income": [22768.0, 20838.0, 60458.0, 62360.0], + }, +} + + +def find_row(stmt, key): + for r in stmt["rows"]: + if r.get("canonical_key") == key: + return r + return None + + +def validate(): + issues = [] + suggested_fixes = [] + schema_valid = True + json_valid = True + accuracy_valid = True + + stmts = {} + for stype in ["income_statement", "balance_sheet", "cash_flow"]: + path = OUTPUT_DIR / f"{stype}.json" + if not path.exists(): + issues.append({"issue_type": "missing_file", "field": stype, 
"description": f"{path} not found"}) + json_valid = False + continue + try: + stmts[stype] = json.loads(path.read_text(encoding="utf-8")) + except json.JSONDecodeError as e: + issues.append({"issue_type": "invalid_json", "field": stype, "description": str(e)}) + json_valid = False + + for stype, stmt in stmts.items(): + # --- Top-level keys --- + for k in REQUIRED_TOP_KEYS: + if k not in stmt: + issues.append({"issue_type": "missing_field", "field": f"{stype}.{k}", "description": f"Missing required key: {k}"}) + schema_valid = False + + # --- Columns --- + for ci, col in enumerate(stmt.get("columns", [])): + for k in REQUIRED_COL_KEYS: + if k not in col: + issues.append({"issue_type": "missing_field", "field": f"{stype}.columns[{ci}].{k}", "description": f"Missing: {k}"}) + schema_valid = False + if col.get("period_type") not in VALID_PERIOD_TYPES: + issues.append({"issue_type": "unsupported_value", "field": f"{stype}.columns[{ci}].period_type", "description": f"Invalid: {col.get('period_type')}"}) + schema_valid = False + if col.get("year") is not None and not isinstance(col.get("year"), int): + issues.append({"issue_type": "type_mismatch", "field": f"{stype}.columns[{ci}].year", "description": f"Expected int, got {type(col.get('year')).__name__}"}) + schema_valid = False + + n_cols = len(stmt.get("columns", [])) + + # --- Rows --- + for ri, row in enumerate(stmt.get("rows", [])): + for k in REQUIRED_ROW_KEYS: + if k not in row: + issues.append({"issue_type": "missing_field", "field": f"{stype}.rows[{ri}].{k}", "description": f"Missing: {k}"}) + schema_valid = False + if row.get("row_type") not in VALID_ROW_TYPES: + issues.append({"issue_type": "unsupported_value", "field": f"{stype}.rows[{ri}].row_type", "description": f"Invalid: {row.get('row_type')}"}) + schema_valid = False + if not isinstance(row.get("indent_level", 0), int): + issues.append({"issue_type": "type_mismatch", "field": f"{stype}.rows[{ri}].indent_level", "description": "Must be int"}) + 
schema_valid = False + + vals = row.get("values", []) + if len(vals) != n_cols: + issues.append({"issue_type": "length_mismatch", "field": f"{stype}.rows[{ri}].values", "description": f"Expected {n_cols}, got {len(vals)}"}) + schema_valid = False + + for vi, v in enumerate(vals): + for k in REQUIRED_VAL_KEYS: + if k not in v: + issues.append({"issue_type": "missing_field", "field": f"{stype}.rows[{ri}].values[{vi}].{k}", "description": f"Missing: {k}"}) + schema_valid = False + if not isinstance(v.get("is_null"), bool): + issues.append({"issue_type": "type_mismatch", "field": f"{stype}.rows[{ri}].values[{vi}].is_null", "description": "Must be bool"}) + schema_valid = False + if v.get("normalized") is not None and not isinstance(v.get("normalized"), (int, float)): + issues.append({"issue_type": "type_mismatch", "field": f"{stype}.rows[{ri}].values[{vi}].normalized", "description": f"Must be number, got {type(v.get('normalized')).__name__}"}) + schema_valid = False + # Consistency + if v.get("is_null") and v.get("raw") is not None: + issues.append({"issue_type": "inconsistency", "field": f"{stype}.rows[{ri}].values[{vi}]", "description": "is_null=true but raw is not null"}) + if not v.get("is_null") and v.get("raw") is None: + issues.append({"issue_type": "inconsistency", "field": f"{stype}.rows[{ri}].values[{vi}]", "description": "is_null=false but raw is null"}) + + # --- Metadata --- + if not isinstance(stmt.get("metadata", {}), dict): + issues.append({"issue_type": "type_mismatch", "field": f"{stype}.metadata", "description": "Must be object"}) + schema_valid = False + + # --- ACCURACY CHECKS --- + for stype, checks in ACCURACY_CHECKS.items(): + stmt = stmts.get(stype) + if not stmt: + continue + for key, expected in checks.items(): + row = find_row(stmt, key) + if not row: + issues.append({"issue_type": "missing_row", "field": f"{stype}.{key}", "description": f"Row with canonical_key '{key}' not found"}) + accuracy_valid = False + continue + actual = 
[v["normalized"] for v in row["values"]] + if actual != expected: + issues.append({ + "issue_type": "accuracy", + "field": f"{stype}.{key}", + "description": f"Expected {expected}, got {actual}", + }) + accuracy_valid = False + + result = { + "schema_valid": schema_valid, + "json_valid": json_valid, + "accuracy_valid": accuracy_valid, + "issues": issues, + "suggested_fixes": suggested_fixes, + } + print(json.dumps(result, indent=2)) + + +if __name__ == "__main__": + validate() diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/tests/debug_xiamen_is.py b/samples/mcs-finance-statement-agent/src/azure-functions/tests/debug_xiamen_is.py new file mode 100644 index 000000000..63a70b9b3 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/tests/debug_xiamen_is.py @@ -0,0 +1,93 @@ +"""Debug script: run Xiamen PDF through pipeline stages to find IS detection failure.""" +import os, sys, json, logging + +sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..")) + +# Load env vars from .env file (never hardcode keys) +from dotenv import load_dotenv +load_dotenv(os.path.join(os.path.dirname(__file__), "..", ".env"), override=False) +os.environ.setdefault("AZURE_CU_ENDPOINT", "https://placeholder.cognitiveservices.azure.com") +os.environ.setdefault("AZURE_OPENAI_DEPLOYMENT", "gpt-4.1") +os.environ.setdefault("AZURE_OPENAI_API_VERSION", "2024-12-01-preview") + +logging.basicConfig(level=logging.INFO, format="%(message)s") + +from extractor.stages.contracts import PipelineOptions +from extractor.stages.analyze import run_analyze +from extractor.stages.select import run_select +from extractor.stages.extract import run_extract + +PDF_PATH = os.path.join(os.path.dirname(__file__), "..", "..", "docs", "samples", + "Xiamen ITG Group Corp.,Ltd_QR_2025-09-30T00_00_00_Chinese.pdf") + +print(f"PDF: {PDF_PATH}") +print(f"Exists: {os.path.exists(PDF_PATH)}") +print() + +options = PipelineOptions() + +# Stage 1: Analyze +print("=" * 60) 
+print("STAGE 1: ANALYZE") +print("=" * 60) +analyze_result = run_analyze(PDF_PATH, options) +print(f"Candidates: {len(analyze_result.candidates)}") +for c in analyze_result.candidates: + print(f" type={c.statement_type}, title_raw='{c.title_raw.encode('ascii','replace').decode()}', " + f"title_en='{c.title_english}', pages={c.page_start}-{c.page_end}, " + f"consolidated={c.is_consolidated}") +print(f"Markdown length: {len(analyze_result.markdown)}") +print() + +# Stage 2: Select +print("=" * 60) +print("STAGE 2: SELECT") +print("=" * 60) +select_result = run_select(analyze_result, options.requested_types) +for stype in ["balance_sheet", "income_statement", "cash_flow"]: + if stype in select_result.selected: + c = select_result.selected[stype] + scores = select_result.scores.get(stype, []) + print(f" {stype}: SELECTED score={scores[0].score if scores else '?'} " + f"title='{c.title_raw}' pages={c.page_start}-{c.page_end}") + else: + scores = select_result.scores.get(stype, []) + if scores: + for sc in scores: + print(f" {stype}: REJECTED score={sc.score:.0f} reason={sc.rejection_reason} " + f"title='{sc.candidate.title_raw[:60]}'") + else: + print(f" {stype}: NO CANDIDATES") +print() + +# Stage 3: Extract +print("=" * 60) +print("STAGE 3: EXTRACT") +print("=" * 60) +extract_result = run_extract( + select_result, analyze_result.markdown, + analyze_result.page_map, analyze_result.pages, + requested_types=options.requested_types, +) +for stype in ["balance_sheet", "income_statement", "cash_flow"]: + if stype in extract_result.statements: + es = extract_result.statements[stype] + print(f" {stype}: OK rows={len(es.rows)} pages={es.start_page}-{es.end_page}") + elif stype in extract_result.failures: + print(f" {stype}: FAILED reason={extract_result.failures[stype]}") + else: + print(f" {stype}: SKIPPED (not selected)") + +# Check for IS heading in markdown +print() +print("=" * 60) +print("MARKDOWN SEARCH FOR IS HEADINGS") +print("=" * 60) +md = analyze_result.markdown 
+for pattern in ["合并利润表", "利润表", "INCOME STATEMENT", "PROFIT OR LOSS", "STATEMENT OF PROFIT"]: + idx = md.find(pattern) + if idx >= 0: + context = md[max(0, idx-50):idx+100].replace("\n", " ") + print(f" FOUND '{pattern}' at offset {idx}: ...{context}...") + else: + print(f" NOT FOUND: '{pattern}'") diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/tests/fixtures/textract_sample_response.json b/samples/mcs-finance-statement-agent/src/azure-functions/tests/fixtures/textract_sample_response.json new file mode 100644 index 000000000..8b80a1d3b --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/tests/fixtures/textract_sample_response.json @@ -0,0 +1,886 @@ +{ + "DocumentMetadata": { + "Pages": 1 + }, + "JobStatus": "SUCCEEDED", + "Blocks": [ + { + "Id": "block-page-001", + "BlockType": "PAGE", + "Page": 1, + "Confidence": 99.0, + "Geometry": { + "BoundingBox": { + "Width": 1.0, + "Height": 1.0, + "Left": 0.0, + "Top": 0.0 + }, + "Polygon": [ + {"X": 0.0, "Y": 0.0}, + {"X": 1.0, "Y": 0.0}, + {"X": 1.0, "Y": 1.0}, + {"X": 0.0, "Y": 1.0} + ] + }, + "Relationships": [ + { + "Type": "CHILD", + "Ids": [ + "block-line-heading", + "block-line-units", + "block-table-001" + ] + } + ] + }, + { + "Id": "block-line-heading", + "BlockType": "LINE", + "Page": 1, + "Confidence": 99.12, + "Text": "CONSOLIDATED BALANCE SHEET", + "Geometry": { + "BoundingBox": { + "Width": 0.45, + "Height": 0.022, + "Left": 0.05, + "Top": 0.04 + }, + "Polygon": [ + {"X": 0.05, "Y": 0.04}, + {"X": 0.50, "Y": 0.04}, + {"X": 0.50, "Y": 0.062}, + {"X": 0.05, "Y": 0.062} + ] + }, + "Relationships": [ + { + "Type": "CHILD", + "Ids": [ + "block-word-consolidated", + "block-word-balance", + "block-word-sheet" + ] + } + ] + }, + { + "Id": "block-word-consolidated", + "BlockType": "WORD", + "Page": 1, + "Confidence": 99.3, + "Text": "CONSOLIDATED", + "Geometry": { + "BoundingBox": { + "Width": 0.15, + "Height": 0.02, + "Left": 0.05, + "Top": 0.04 + }, + "Polygon": [ + 
{"X": 0.05, "Y": 0.04}, + {"X": 0.20, "Y": 0.04}, + {"X": 0.20, "Y": 0.06}, + {"X": 0.05, "Y": 0.06} + ] + } + }, + { + "Id": "block-word-balance", + "BlockType": "WORD", + "Page": 1, + "Confidence": 99.5, + "Text": "BALANCE", + "Geometry": { + "BoundingBox": { + "Width": 0.10, + "Height": 0.02, + "Left": 0.21, + "Top": 0.04 + }, + "Polygon": [ + {"X": 0.21, "Y": 0.04}, + {"X": 0.31, "Y": 0.04}, + {"X": 0.31, "Y": 0.06}, + {"X": 0.21, "Y": 0.06} + ] + } + }, + { + "Id": "block-word-sheet", + "BlockType": "WORD", + "Page": 1, + "Confidence": 99.7, + "Text": "SHEET", + "Geometry": { + "BoundingBox": { + "Width": 0.08, + "Height": 0.02, + "Left": 0.32, + "Top": 0.04 + }, + "Polygon": [ + {"X": 0.32, "Y": 0.04}, + {"X": 0.40, "Y": 0.04}, + {"X": 0.40, "Y": 0.06}, + {"X": 0.32, "Y": 0.06} + ] + } + }, + { + "Id": "block-line-units", + "BlockType": "LINE", + "Page": 1, + "Confidence": 98.45, + "Text": "(In millions)", + "Geometry": { + "BoundingBox": { + "Width": 0.15, + "Height": 0.018, + "Left": 0.05, + "Top": 0.068 + }, + "Polygon": [ + {"X": 0.05, "Y": 0.068}, + {"X": 0.20, "Y": 0.068}, + {"X": 0.20, "Y": 0.086}, + {"X": 0.05, "Y": 0.086} + ] + }, + "Relationships": [ + { + "Type": "CHILD", + "Ids": [ + "block-word-in", + "block-word-millions" + ] + } + ] + }, + { + "Id": "block-word-in", + "BlockType": "WORD", + "Page": 1, + "Confidence": 98.1, + "Text": "(In", + "Geometry": { + "BoundingBox": { + "Width": 0.04, + "Height": 0.018, + "Left": 0.05, + "Top": 0.068 + }, + "Polygon": [ + {"X": 0.05, "Y": 0.068}, + {"X": 0.09, "Y": 0.068}, + {"X": 0.09, "Y": 0.086}, + {"X": 0.05, "Y": 0.086} + ] + } + }, + { + "Id": "block-word-millions", + "BlockType": "WORD", + "Page": 1, + "Confidence": 98.9, + "Text": "millions)", + "Geometry": { + "BoundingBox": { + "Width": 0.10, + "Height": 0.018, + "Left": 0.10, + "Top": 0.068 + }, + "Polygon": [ + {"X": 0.10, "Y": 0.068}, + {"X": 0.20, "Y": 0.068}, + {"X": 0.20, "Y": 0.086}, + {"X": 0.10, "Y": 0.086} + ] + } + }, + { + "Id": 
"block-table-001", + "BlockType": "TABLE", + "EntityTypes": ["STRUCTURED_TABLE"], + "Page": 1, + "Confidence": 97.85, + "Geometry": { + "BoundingBox": { + "Width": 0.60, + "Height": 0.30, + "Left": 0.05, + "Top": 0.10 + }, + "Polygon": [ + {"X": 0.05, "Y": 0.10}, + {"X": 0.65, "Y": 0.10}, + {"X": 0.65, "Y": 0.40}, + {"X": 0.05, "Y": 0.40} + ] + }, + "Relationships": [ + { + "Type": "CHILD", + "Ids": [ + "block-cell-r1c1", + "block-cell-r1c2", + "block-cell-r1c3", + "block-cell-r2c1", + "block-cell-r2c2", + "block-cell-r2c3", + "block-cell-r3c1", + "block-cell-r3c2", + "block-cell-r3c3", + "block-cell-r4c1", + "block-cell-r4c2", + "block-cell-r4c3" + ] + }, + { + "Type": "TABLE_TITLE", + "Ids": [ + "block-line-heading" + ] + }, + { + "Type": "TABLE_FOOTER", + "Ids": [ + "block-line-units" + ] + } + ] + }, + { + "Id": "block-cell-r1c1", + "BlockType": "CELL", + "EntityTypes": ["COLUMN_HEADER"], + "Page": 1, + "Confidence": 99.0, + "RowIndex": 1, + "ColumnIndex": 1, + "RowSpan": 1, + "ColumnSpan": 1, + "Geometry": { + "BoundingBox": { + "Width": 0.25, + "Height": 0.05, + "Left": 0.05, + "Top": 0.10 + }, + "Polygon": [ + {"X": 0.05, "Y": 0.10}, + {"X": 0.30, "Y": 0.10}, + {"X": 0.30, "Y": 0.15}, + {"X": 0.05, "Y": 0.15} + ] + } + }, + { + "Id": "block-cell-r1c2", + "BlockType": "CELL", + "EntityTypes": ["COLUMN_HEADER"], + "Page": 1, + "Confidence": 99.0, + "RowIndex": 1, + "ColumnIndex": 2, + "RowSpan": 1, + "ColumnSpan": 1, + "Geometry": { + "BoundingBox": { + "Width": 0.17, + "Height": 0.05, + "Left": 0.30, + "Top": 0.10 + }, + "Polygon": [ + {"X": 0.30, "Y": 0.10}, + {"X": 0.47, "Y": 0.10}, + {"X": 0.47, "Y": 0.15}, + {"X": 0.30, "Y": 0.15} + ] + }, + "Relationships": [ + { + "Type": "CHILD", + "Ids": ["block-word-2024"] + } + ] + }, + { + "Id": "block-word-2024", + "BlockType": "WORD", + "Page": 1, + "Confidence": 99.8, + "Text": "2024", + "Geometry": { + "BoundingBox": { + "Width": 0.06, + "Height": 0.03, + "Left": 0.355, + "Top": 0.11 + }, + "Polygon": [ + {"X": 
0.355, "Y": 0.11}, + {"X": 0.415, "Y": 0.11}, + {"X": 0.415, "Y": 0.14}, + {"X": 0.355, "Y": 0.14} + ] + } + }, + { + "Id": "block-cell-r1c3", + "BlockType": "CELL", + "EntityTypes": ["COLUMN_HEADER"], + "Page": 1, + "Confidence": 99.0, + "RowIndex": 1, + "ColumnIndex": 3, + "RowSpan": 1, + "ColumnSpan": 1, + "Geometry": { + "BoundingBox": { + "Width": 0.18, + "Height": 0.05, + "Left": 0.47, + "Top": 0.10 + }, + "Polygon": [ + {"X": 0.47, "Y": 0.10}, + {"X": 0.65, "Y": 0.10}, + {"X": 0.65, "Y": 0.15}, + {"X": 0.47, "Y": 0.15} + ] + }, + "Relationships": [ + { + "Type": "CHILD", + "Ids": ["block-word-2023"] + } + ] + }, + { + "Id": "block-word-2023", + "BlockType": "WORD", + "Page": 1, + "Confidence": 99.7, + "Text": "2023", + "Geometry": { + "BoundingBox": { + "Width": 0.06, + "Height": 0.03, + "Left": 0.525, + "Top": 0.11 + }, + "Polygon": [ + {"X": 0.525, "Y": 0.11}, + {"X": 0.585, "Y": 0.11}, + {"X": 0.585, "Y": 0.14}, + {"X": 0.525, "Y": 0.14} + ] + } + }, + { + "Id": "block-cell-r2c1", + "BlockType": "CELL", + "Page": 1, + "Confidence": 98.5, + "RowIndex": 2, + "ColumnIndex": 1, + "RowSpan": 1, + "ColumnSpan": 1, + "Geometry": { + "BoundingBox": { + "Width": 0.25, + "Height": 0.05, + "Left": 0.05, + "Top": 0.15 + }, + "Polygon": [ + {"X": 0.05, "Y": 0.15}, + {"X": 0.30, "Y": 0.15}, + {"X": 0.30, "Y": 0.20}, + {"X": 0.05, "Y": 0.20} + ] + }, + "Relationships": [ + { + "Type": "CHILD", + "Ids": ["block-word-total", "block-word-assets"] + } + ] + }, + { + "Id": "block-word-total", + "BlockType": "WORD", + "Page": 1, + "Confidence": 99.1, + "Text": "Total", + "Geometry": { + "BoundingBox": { + "Width": 0.07, + "Height": 0.03, + "Left": 0.06, + "Top": 0.16 + }, + "Polygon": [ + {"X": 0.06, "Y": 0.16}, + {"X": 0.13, "Y": 0.16}, + {"X": 0.13, "Y": 0.19}, + {"X": 0.06, "Y": 0.19} + ] + } + }, + { + "Id": "block-word-assets", + "BlockType": "WORD", + "Page": 1, + "Confidence": 99.4, + "Text": "Assets", + "Geometry": { + "BoundingBox": { + "Width": 0.08, + "Height": 
0.03, + "Left": 0.14, + "Top": 0.16 + }, + "Polygon": [ + {"X": 0.14, "Y": 0.16}, + {"X": 0.22, "Y": 0.16}, + {"X": 0.22, "Y": 0.19}, + {"X": 0.14, "Y": 0.19} + ] + } + }, + { + "Id": "block-cell-r2c2", + "BlockType": "CELL", + "Page": 1, + "Confidence": 98.3, + "RowIndex": 2, + "ColumnIndex": 2, + "RowSpan": 1, + "ColumnSpan": 1, + "Geometry": { + "BoundingBox": { + "Width": 0.17, + "Height": 0.05, + "Left": 0.30, + "Top": 0.15 + }, + "Polygon": [ + {"X": 0.30, "Y": 0.15}, + {"X": 0.47, "Y": 0.15}, + {"X": 0.47, "Y": 0.20}, + {"X": 0.30, "Y": 0.20} + ] + }, + "Relationships": [ + { + "Type": "CHILD", + "Ids": ["block-word-total-assets-2024"] + } + ] + }, + { + "Id": "block-word-total-assets-2024", + "BlockType": "WORD", + "Page": 1, + "Confidence": 97.9, + "Text": "125,435", + "Geometry": { + "BoundingBox": { + "Width": 0.09, + "Height": 0.03, + "Left": 0.335, + "Top": 0.16 + }, + "Polygon": [ + {"X": 0.335, "Y": 0.16}, + {"X": 0.425, "Y": 0.16}, + {"X": 0.425, "Y": 0.19}, + {"X": 0.335, "Y": 0.19} + ] + } + }, + { + "Id": "block-cell-r2c3", + "BlockType": "CELL", + "Page": 1, + "Confidence": 98.1, + "RowIndex": 2, + "ColumnIndex": 3, + "RowSpan": 1, + "ColumnSpan": 1, + "Geometry": { + "BoundingBox": { + "Width": 0.18, + "Height": 0.05, + "Left": 0.47, + "Top": 0.15 + }, + "Polygon": [ + {"X": 0.47, "Y": 0.15}, + {"X": 0.65, "Y": 0.15}, + {"X": 0.65, "Y": 0.20}, + {"X": 0.47, "Y": 0.20} + ] + }, + "Relationships": [ + { + "Type": "CHILD", + "Ids": ["block-word-total-assets-2023"] + } + ] + }, + { + "Id": "block-word-total-assets-2023", + "BlockType": "WORD", + "Page": 1, + "Confidence": 97.6, + "Text": "118,290", + "Geometry": { + "BoundingBox": { + "Width": 0.09, + "Height": 0.03, + "Left": 0.495, + "Top": 0.16 + }, + "Polygon": [ + {"X": 0.495, "Y": 0.16}, + {"X": 0.585, "Y": 0.16}, + {"X": 0.585, "Y": 0.19}, + {"X": 0.495, "Y": 0.19} + ] + } + }, + { + "Id": "block-cell-r3c1", + "BlockType": "CELL", + "Page": 1, + "Confidence": 98.7, + "RowIndex": 3, + 
"ColumnIndex": 1, + "RowSpan": 1, + "ColumnSpan": 1, + "Geometry": { + "BoundingBox": { + "Width": 0.25, + "Height": 0.05, + "Left": 0.05, + "Top": 0.20 + }, + "Polygon": [ + {"X": 0.05, "Y": 0.20}, + {"X": 0.30, "Y": 0.20}, + {"X": 0.30, "Y": 0.25}, + {"X": 0.05, "Y": 0.25} + ] + }, + "Relationships": [ + { + "Type": "CHILD", + "Ids": ["block-word-revenue"] + } + ] + }, + { + "Id": "block-word-revenue", + "BlockType": "WORD", + "Page": 1, + "Confidence": 99.6, + "Text": "Revenue", + "Geometry": { + "BoundingBox": { + "Width": 0.09, + "Height": 0.03, + "Left": 0.06, + "Top": 0.21 + }, + "Polygon": [ + {"X": 0.06, "Y": 0.21}, + {"X": 0.15, "Y": 0.21}, + {"X": 0.15, "Y": 0.24}, + {"X": 0.06, "Y": 0.24} + ] + } + }, + { + "Id": "block-cell-r3c2", + "BlockType": "CELL", + "Page": 1, + "Confidence": 97.8, + "RowIndex": 3, + "ColumnIndex": 2, + "RowSpan": 1, + "ColumnSpan": 1, + "Geometry": { + "BoundingBox": { + "Width": 0.17, + "Height": 0.05, + "Left": 0.30, + "Top": 0.20 + }, + "Polygon": [ + {"X": 0.30, "Y": 0.20}, + {"X": 0.47, "Y": 0.20}, + {"X": 0.47, "Y": 0.25}, + {"X": 0.30, "Y": 0.25} + ] + }, + "Relationships": [ + { + "Type": "CHILD", + "Ids": ["block-word-revenue-2024"] + } + ] + }, + { + "Id": "block-word-revenue-2024", + "BlockType": "WORD", + "Page": 1, + "Confidence": 97.5, + "Text": "59,893", + "Geometry": { + "BoundingBox": { + "Width": 0.08, + "Height": 0.03, + "Left": 0.345, + "Top": 0.21 + }, + "Polygon": [ + {"X": 0.345, "Y": 0.21}, + {"X": 0.425, "Y": 0.21}, + {"X": 0.425, "Y": 0.24}, + {"X": 0.345, "Y": 0.24} + ] + } + }, + { + "Id": "block-cell-r3c3", + "BlockType": "CELL", + "Page": 1, + "Confidence": 97.9, + "RowIndex": 3, + "ColumnIndex": 3, + "RowSpan": 1, + "ColumnSpan": 1, + "Geometry": { + "BoundingBox": { + "Width": 0.18, + "Height": 0.05, + "Left": 0.47, + "Top": 0.20 + }, + "Polygon": [ + {"X": 0.47, "Y": 0.20}, + {"X": 0.65, "Y": 0.20}, + {"X": 0.65, "Y": 0.25}, + {"X": 0.47, "Y": 0.25} + ] + }, + "Relationships": [ + { + "Type": 
"CHILD", + "Ids": ["block-word-revenue-2023"] + } + ] + }, + { + "Id": "block-word-revenue-2023", + "BlockType": "WORD", + "Page": 1, + "Confidence": 97.3, + "Text": "54,210", + "Geometry": { + "BoundingBox": { + "Width": 0.08, + "Height": 0.03, + "Left": 0.505, + "Top": 0.21 + }, + "Polygon": [ + {"X": 0.505, "Y": 0.21}, + {"X": 0.585, "Y": 0.21}, + {"X": 0.585, "Y": 0.24}, + {"X": 0.505, "Y": 0.24} + ] + } + }, + { + "Id": "block-cell-r4c1", + "BlockType": "CELL", + "Page": 1, + "Confidence": 98.9, + "RowIndex": 4, + "ColumnIndex": 1, + "RowSpan": 1, + "ColumnSpan": 1, + "Geometry": { + "BoundingBox": { + "Width": 0.25, + "Height": 0.05, + "Left": 0.05, + "Top": 0.25 + }, + "Polygon": [ + {"X": 0.05, "Y": 0.25}, + {"X": 0.30, "Y": 0.25}, + {"X": 0.30, "Y": 0.30}, + {"X": 0.05, "Y": 0.30} + ] + }, + "Relationships": [ + { + "Type": "CHILD", + "Ids": ["block-word-net", "block-word-income"] + } + ] + }, + { + "Id": "block-word-net", + "BlockType": "WORD", + "Page": 1, + "Confidence": 99.2, + "Text": "Net", + "Geometry": { + "BoundingBox": { + "Width": 0.04, + "Height": 0.03, + "Left": 0.06, + "Top": 0.26 + }, + "Polygon": [ + {"X": 0.06, "Y": 0.26}, + {"X": 0.10, "Y": 0.26}, + {"X": 0.10, "Y": 0.29}, + {"X": 0.06, "Y": 0.29} + ] + } + }, + { + "Id": "block-word-income", + "BlockType": "WORD", + "Page": 1, + "Confidence": 99.5, + "Text": "Income", + "Geometry": { + "BoundingBox": { + "Width": 0.08, + "Height": 0.03, + "Left": 0.11, + "Top": 0.26 + }, + "Polygon": [ + {"X": 0.11, "Y": 0.26}, + {"X": 0.19, "Y": 0.26}, + {"X": 0.19, "Y": 0.29}, + {"X": 0.11, "Y": 0.29} + ] + } + }, + { + "Id": "block-cell-r4c2", + "BlockType": "CELL", + "Page": 1, + "Confidence": 97.6, + "RowIndex": 4, + "ColumnIndex": 2, + "RowSpan": 1, + "ColumnSpan": 1, + "Geometry": { + "BoundingBox": { + "Width": 0.17, + "Height": 0.05, + "Left": 0.30, + "Top": 0.25 + }, + "Polygon": [ + {"X": 0.30, "Y": 0.25}, + {"X": 0.47, "Y": 0.25}, + {"X": 0.47, "Y": 0.30}, + {"X": 0.30, "Y": 0.30} + ] + }, + 
"Relationships": [ + { + "Type": "CHILD", + "Ids": ["block-word-net-income-2024"] + } + ] + }, + { + "Id": "block-word-net-income-2024", + "BlockType": "WORD", + "Page": 1, + "Confidence": 96.8, + "Text": "15,320", + "Geometry": { + "BoundingBox": { + "Width": 0.08, + "Height": 0.03, + "Left": 0.345, + "Top": 0.26 + }, + "Polygon": [ + {"X": 0.345, "Y": 0.26}, + {"X": 0.425, "Y": 0.26}, + {"X": 0.425, "Y": 0.29}, + {"X": 0.345, "Y": 0.29} + ] + } + }, + { + "Id": "block-cell-r4c3", + "BlockType": "CELL", + "Page": 1, + "Confidence": 97.2, + "RowIndex": 4, + "ColumnIndex": 3, + "RowSpan": 1, + "ColumnSpan": 1, + "Geometry": { + "BoundingBox": { + "Width": 0.18, + "Height": 0.05, + "Left": 0.47, + "Top": 0.25 + }, + "Polygon": [ + {"X": 0.47, "Y": 0.25}, + {"X": 0.65, "Y": 0.25}, + {"X": 0.65, "Y": 0.30}, + {"X": 0.47, "Y": 0.30} + ] + }, + "Relationships": [ + { + "Type": "CHILD", + "Ids": ["block-word-net-income-2023"] + } + ] + }, + { + "Id": "block-word-net-income-2023", + "BlockType": "WORD", + "Page": 1, + "Confidence": 96.5, + "Text": "12,890", + "Geometry": { + "BoundingBox": { + "Width": 0.08, + "Height": 0.03, + "Left": 0.505, + "Top": 0.26 + }, + "Polygon": [ + {"X": 0.505, "Y": 0.26}, + {"X": 0.585, "Y": 0.26}, + {"X": 0.585, "Y": 0.29}, + {"X": 0.505, "Y": 0.29} + ] + } + } + ] +} diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/tests/gen_sample_excel.py b/samples/mcs-finance-statement-agent/src/azure-functions/tests/gen_sample_excel.py new file mode 100644 index 000000000..6bca87e64 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/tests/gen_sample_excel.py @@ -0,0 +1,89 @@ +"""Generate a sample Excel using the professional formatter for verification.""" +import os, sys +sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..")) +os.environ.setdefault("AZURE_CU_ENDPOINT", "dummy") +os.environ.setdefault("AZURE_CU_KEY", "dummy") + +from extractor.excel_formatter import build_professional_excel + 
+sample_result = { + "summary": [ + {"statement_type": "balance_sheet", "page_range": {"start": 45, "end": 46}, "quality_score": 0.95}, + {"statement_type": "income_statement", "page_range": {"start": 43, "end": 44}, "quality_score": 0.92}, + {"statement_type": "cash_flow", "page_range": {"start": 47, "end": 48}, "quality_score": 0.88}, + ], + "balance_sheet": { + "statement_metadata": { + "statement_title": "Consolidated Balance Sheet", + "statement_title_raw": "Consolidated Balance Sheet", + "currency": "AUD", "currency_symbol": "$", "unit": "millions", + }, + "columns": [{"label": "Item"}, {"label": "30 Jun 2024"}, {"label": "30 Jun 2023"}], + "rows": [ + {"label_raw": "ASSETS", "label_normalized": "Assets", "row_type": "section_header", "indent_level": 0, "values": []}, + {"label_raw": "Cash and cash equivalents", "label_normalized": "Cash and cash equivalents", "row_type": "line_item", "indent_level": 1, "canonical_key": "cash_and_equivalents", "values": [{"normalized": 12450.0, "raw": "12,450"}, {"normalized": 11230.0, "raw": "11,230"}]}, + {"label_raw": "Trading securities", "label_normalized": "Trading securities", "row_type": "line_item", "indent_level": 1, "values": [{"normalized": 8920.0, "raw": "8,920"}, {"normalized": 7650.0, "raw": "7,650"}]}, + {"label_raw": "Loans and advances", "label_normalized": "Loans and advances", "row_type": "line_item", "indent_level": 1, "values": [{"normalized": 215800.0, "raw": "215,800"}, {"normalized": 198500.0, "raw": "198,500"}]}, + {"label_raw": "Property and equipment", "label_normalized": "Property and equipment", "row_type": "line_item", "indent_level": 1, "values": [{"normalized": 3240.0, "raw": "3,240"}, {"normalized": 3180.0, "raw": "3,180"}]}, + {"label_raw": "Goodwill and intangibles", "label_normalized": "Goodwill and intangibles", "row_type": "line_item", "indent_level": 1, "values": [{"normalized": 9870.0, "raw": "9,870"}, {"normalized": 9870.0, "raw": "9,870"}]}, + {"label_raw": "Other assets", 
"label_normalized": "Other assets", "row_type": "line_item", "indent_level": 1, "values": [{"normalized": 15720.0, "raw": "15,720"}, {"normalized": 14570.0, "raw": "14,570"}]}, + {"label_raw": "Total Assets", "label_normalized": "Total Assets", "row_type": "total", "indent_level": 0, "canonical_key": "total_assets", "values": [{"normalized": 266000.0, "raw": "266,000"}, {"normalized": 245000.0, "raw": "245,000"}]}, + {"label_raw": "LIABILITIES", "label_normalized": "Liabilities", "row_type": "section_header", "indent_level": 0, "values": []}, + {"label_raw": "Deposits", "label_normalized": "Deposits", "row_type": "line_item", "indent_level": 1, "values": [{"normalized": 185000.0, "raw": "185,000"}, {"normalized": 172000.0, "raw": "172,000"}]}, + {"label_raw": "Borrowings", "label_normalized": "Borrowings", "row_type": "line_item", "indent_level": 1, "values": [{"normalized": 42000.0, "raw": "42,000"}, {"normalized": 38500.0, "raw": "38,500"}]}, + {"label_raw": "Other liabilities", "label_normalized": "Other liabilities", "row_type": "line_item", "indent_level": 1, "values": [{"normalized": 12500.0, "raw": "12,500"}, {"normalized": 11200.0, "raw": "11,200"}]}, + {"label_raw": "Total Liabilities", "label_normalized": "Total Liabilities", "row_type": "total", "indent_level": 0, "canonical_key": "total_liabilities", "values": [{"normalized": 239500.0, "raw": "239,500"}, {"normalized": 221700.0, "raw": "221,700"}]}, + {"label_raw": "EQUITY", "label_normalized": "Equity", "row_type": "section_header", "indent_level": 0, "values": []}, + {"label_raw": "Share capital", "label_normalized": "Share capital", "row_type": "line_item", "indent_level": 1, "values": [{"normalized": 12800.0, "raw": "12,800"}, {"normalized": 12400.0, "raw": "12,400"}]}, + {"label_raw": "Retained earnings", "label_normalized": "Retained earnings", "row_type": "line_item", "indent_level": 1, "values": [{"normalized": 13700.0, "raw": "13,700"}, {"normalized": 10900.0, "raw": "10,900"}]}, + 
{"label_raw": "Total Equity", "label_normalized": "Total Equity", "row_type": "total", "indent_level": 0, "canonical_key": "total_equity", "values": [{"normalized": 26500.0, "raw": "26,500"}, {"normalized": 23300.0, "raw": "23,300"}]}, + ], + }, + "income_statement": { + "statement_metadata": { + "statement_title": "Consolidated Income Statement", + "statement_title_raw": "Consolidated Income Statement", + "currency": "AUD", "currency_symbol": "$", "unit": "millions", + }, + "columns": [{"label": "Item"}, {"label": "FY2024"}, {"label": "FY2023"}], + "rows": [ + {"label_raw": "REVENUE", "label_normalized": "Revenue", "row_type": "section_header", "indent_level": 0, "values": []}, + {"label_raw": "Interest income", "label_normalized": "Interest income", "row_type": "line_item", "indent_level": 1, "values": [{"normalized": 18500.0, "raw": "18,500"}, {"normalized": 15200.0, "raw": "15,200"}]}, + {"label_raw": "Fee and commission income", "label_normalized": "Fee and commission income", "row_type": "line_item", "indent_level": 1, "values": [{"normalized": 3200.0, "raw": "3,200"}, {"normalized": 2900.0, "raw": "2,900"}]}, + {"label_raw": "Total operating revenue", "label_normalized": "Total operating revenue", "row_type": "subtotal", "indent_level": 0, "canonical_key": "total_operating_revenue", "values": [{"normalized": 21700.0, "raw": "21,700"}, {"normalized": 18100.0, "raw": "18,100"}]}, + {"label_raw": "EXPENSES", "label_normalized": "Expenses", "row_type": "section_header", "indent_level": 0, "values": []}, + {"label_raw": "Interest expense", "label_normalized": "Interest expense", "row_type": "line_item", "indent_level": 1, "values": [{"normalized": -9800.0, "raw": "-9,800"}, {"normalized": -7500.0, "raw": "-7,500"}]}, + {"label_raw": "Operating costs", "label_normalized": "Operating costs", "row_type": "line_item", "indent_level": 1, "canonical_key": "operating_costs", "values": [{"normalized": -6200.0, "raw": "-6,200"}, {"normalized": -5800.0, "raw": "-5,800"}]}, 
+ {"label_raw": "Impairment charges", "label_normalized": "Impairment charges", "row_type": "line_item", "indent_level": 1, "values": [{"normalized": -1200.0, "raw": "-1,200"}, {"normalized": -950.0, "raw": "-950"}]}, + {"label_raw": "Operating profit", "label_normalized": "Operating profit", "row_type": "subtotal", "indent_level": 0, "canonical_key": "operating_profit", "values": [{"normalized": 4500.0, "raw": "4,500"}, {"normalized": 3850.0, "raw": "3,850"}]}, + {"label_raw": "Tax expense", "label_normalized": "Tax expense", "row_type": "line_item", "indent_level": 1, "values": [{"normalized": -1350.0, "raw": "-1,350"}, {"normalized": -1155.0, "raw": "-1,155"}]}, + {"label_raw": "Net income", "label_normalized": "Net income", "row_type": "total", "indent_level": 0, "canonical_key": "net_income", "values": [{"normalized": 3150.0, "raw": "3,150"}, {"normalized": 2695.0, "raw": "2,695"}]}, + ], + }, + "cash_flow": { + "statement_metadata": { + "statement_title": "Consolidated Cash Flow Statement", + "statement_title_raw": "Consolidated Cash Flow Statement", + "currency": "AUD", "currency_symbol": "$", "unit": "millions", + }, + "columns": [{"label": "Item"}, {"label": "FY2024"}, {"label": "FY2023"}], + "rows": [ + {"label_raw": "OPERATING ACTIVITIES", "label_normalized": "Operating Activities", "row_type": "section_header", "indent_level": 0, "values": []}, + {"label_raw": "Net income", "label_normalized": "Net income", "row_type": "line_item", "indent_level": 1, "values": [{"normalized": 3150.0, "raw": "3,150"}, {"normalized": 2695.0, "raw": "2,695"}]}, + {"label_raw": "Depreciation and amortisation", "label_normalized": "Depreciation and amortisation", "row_type": "line_item", "indent_level": 1, "values": [{"normalized": 850.0, "raw": "850"}, {"normalized": 790.0, "raw": "790"}]}, + {"label_raw": "Net cash from operating activities", "label_normalized": "Net cash from operating activities", "row_type": "subtotal", "indent_level": 0, "values": [{"normalized": 
5200.0, "raw": "5,200"}, {"normalized": 4800.0, "raw": "4,800"}]}, + {"label_raw": "INVESTING ACTIVITIES", "label_normalized": "Investing Activities", "row_type": "section_header", "indent_level": 0, "values": []}, + {"label_raw": "Capital expenditure", "label_normalized": "Capital expenditure", "row_type": "line_item", "indent_level": 1, "values": [{"normalized": -920.0, "raw": "-920"}, {"normalized": -780.0, "raw": "-780"}]}, + {"label_raw": "Net cash from investing", "label_normalized": "Net cash from investing", "row_type": "subtotal", "indent_level": 0, "values": [{"normalized": -920.0, "raw": "-920"}, {"normalized": -780.0, "raw": "-780"}]}, + {"label_raw": "FINANCING ACTIVITIES", "label_normalized": "Financing Activities", "row_type": "section_header", "indent_level": 0, "values": []}, + {"label_raw": "Dividends paid", "label_normalized": "Dividends paid", "row_type": "line_item", "indent_level": 1, "values": [{"normalized": -2060.0, "raw": "-2,060"}, {"normalized": -1850.0, "raw": "-1,850"}]}, + {"label_raw": "Net cash from financing", "label_normalized": "Net cash from financing", "row_type": "subtotal", "indent_level": 0, "values": [{"normalized": -3060.0, "raw": "-3,060"}, {"normalized": -2850.0, "raw": "-2,850"}]}, + {"label_raw": "Net increase in cash", "label_normalized": "Net increase in cash", "row_type": "total", "indent_level": 0, "values": [{"normalized": 1220.0, "raw": "1,220"}, {"normalized": 1170.0, "raw": "1,170"}]}, + ], + }, +} + +output = os.path.join(os.path.dirname(__file__), "..", "..", "Sample_Financial_Report.xlsx") +build_professional_excel(sample_result, output, title="Sample Bank \u2014 Financial Statement Review") +print(f"Generated: {output}") +print(f"Size: {os.path.getsize(output):,} bytes") diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_all_samples.py b/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_all_samples.py new file mode 100644 index 000000000..f320c0010 --- /dev/null 
+++ b/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_all_samples.py @@ -0,0 +1,49 @@ +"""Run extraction pipeline on all sample PDFs and report results.""" +import os, sys, logging + +sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..")) + +# Load env vars from .env file (never hardcode keys) +from dotenv import load_dotenv +load_dotenv(os.path.join(os.path.dirname(__file__), "..", ".env"), override=False) +os.environ.setdefault("AZURE_CU_ENDPOINT", "https://placeholder.cognitiveservices.azure.com") +os.environ.setdefault("AZURE_OPENAI_DEPLOYMENT", "gpt-4.1") +os.environ.setdefault("AZURE_OPENAI_API_VERSION", "2024-12-01-preview") + +logging.basicConfig(level=logging.WARNING, format="%(message)s") + +from extractor.stages.contracts import PipelineOptions +from extractor.pipeline import run as run_pipeline + +SAMPLES_DIR = os.path.join(os.path.dirname(__file__), "..", "..", "docs", "samples") + +pdfs = [f for f in os.listdir(SAMPLES_DIR) if f.endswith(".pdf")] + +print(f"Testing {len(pdfs)} sample PDFs\n") +print(f"{'PDF':<55} {'BS':>5} {'IS':>5} {'CF':>5} {'Status'}") +print("-" * 85) + +for pdf in sorted(pdfs): + pdf_path = os.path.join(SAMPLES_DIR, pdf) + short_name = pdf[:52] + "..." 
if len(pdf) > 55 else pdf + + try: + result = run_pipeline(pdf_path, PipelineOptions()) + bs = result.get("balance_sheet") + is_ = result.get("income_statement") + cf = result.get("cash_flow") + + bs_rows = len(bs.get("rows", [])) if bs else 0 + is_rows = len(is_.get("rows", [])) if is_ else 0 + cf_rows = len(cf.get("rows", [])) if cf else 0 + + missing = [] + if bs_rows == 0: missing.append("BS") + if is_rows == 0: missing.append("IS") + if cf_rows == 0: missing.append("CF") + + status = "OK" if not missing else f"MISSING: {','.join(missing)}" + print(f"{short_name:<55} {bs_rows:>5} {is_rows:>5} {cf_rows:>5} {status}") + + except Exception as e: + print(f"{short_name:<55} {'ERR':>5} {'ERR':>5} {'ERR':>5} {str(e)[:30]}") diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_analyze_textract.py b/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_analyze_textract.py new file mode 100644 index 000000000..1aa9ee342 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_analyze_textract.py @@ -0,0 +1,209 @@ +""" +tests/test_analyze_textract.py +------------------------------- +Tests for Stage 1 (Analyze) with Textract backend. + +Verifies: + 1. Textract backend returns correct AnalyzeResult with mocked dependencies + 2. Default PipelineOptions still uses "cu" backend + +Import strategy: loads modules inside fixtures/tests using importlib, +saving and restoring sys.modules to avoid polluting other test files. +This avoids the eager extractor/__init__.py import that requires +AZURE_CU_ENDPOINT. 
+""" + +import importlib +import importlib.util +import json +import pathlib +import sys +import types +from unittest.mock import patch, MagicMock + +import pytest + + +# --------------------------------------------------------------------------- +# Paths +# --------------------------------------------------------------------------- + +_STAGES_DIR = pathlib.Path(__file__).parent.parent / "extractor" / "stages" +_EXTRACTOR_DIR = pathlib.Path(__file__).parent.parent / "extractor" +_FIXTURE_PATH = pathlib.Path(__file__).parent / "fixtures" / "textract_sample_response.json" + + +# --------------------------------------------------------------------------- +# Context manager to temporarily inject modules for relative imports +# --------------------------------------------------------------------------- + +class _StagesImportContext: + """ + Temporarily injects extractor and extractor.stages into sys.modules + so that analyze.py's relative import (from .contracts import ...) works. + Restores original sys.modules state on exit. 
+ """ + KEYS = [ + "extractor", + "extractor.stages", + "extractor.stages.contracts", + "extractor.stages.analyze", + ] + + def __enter__(self): + self._saved = {k: sys.modules[k] for k in self.KEYS if k in sys.modules} + + # Minimal extractor package + extractor_pkg = types.ModuleType("extractor") + extractor_pkg.__path__ = [str(_EXTRACTOR_DIR)] + extractor_pkg.__package__ = "extractor" + sys.modules["extractor"] = extractor_pkg + + # stages sub-package + stages_spec = importlib.util.spec_from_file_location( + "extractor.stages", _STAGES_DIR / "__init__.py", + ) + stages_mod = importlib.util.module_from_spec(stages_spec) + stages_mod.__path__ = [str(_STAGES_DIR)] + stages_mod.__package__ = "extractor.stages" + sys.modules["extractor.stages"] = stages_mod + stages_spec.loader.exec_module(stages_mod) + + # contracts + contracts_spec = importlib.util.spec_from_file_location( + "extractor.stages.contracts", _STAGES_DIR / "contracts.py", + ) + self.contracts = importlib.util.module_from_spec(contracts_spec) + sys.modules["extractor.stages.contracts"] = self.contracts + contracts_spec.loader.exec_module(self.contracts) + + # analyze + analyze_spec = importlib.util.spec_from_file_location( + "extractor.stages.analyze", _STAGES_DIR / "analyze.py", + ) + self.analyze = importlib.util.module_from_spec(analyze_spec) + sys.modules["extractor.stages.analyze"] = self.analyze + analyze_spec.loader.exec_module(self.analyze) + + return self + + def __exit__(self, *exc): + # Restore original modules + for key in self.KEYS: + if key in self._saved: + sys.modules[key] = self._saved[key] + elif key in sys.modules: + del sys.modules[key] + return False + + +# --------------------------------------------------------------------------- +# Fixtures +# --------------------------------------------------------------------------- + +@pytest.fixture(scope="module") +def textract_response() -> dict: + with open(_FIXTURE_PATH, "r", encoding="utf-8") as f: + return json.load(f) + + 
+MOCK_LLM_CLASSIFICATIONS = [{ + "statement_type": "balance_sheet", + "title_raw": "CONSOLIDATED BALANCE SHEET", + "currency": "USD", + "unit": "millions", + "accounting_standard": "US_GAAP", + "is_consolidated": True, + "report_language": "en", + "company_name": "Test Corp", +}] + + +# --------------------------------------------------------------------------- +# Tests +# --------------------------------------------------------------------------- + +class TestTextractBackendReturnsAnalyzeResult: + """Test that Textract backend produces a valid AnalyzeResult.""" + + def test_textract_backend_returns_analyze_result(self, textract_response): + """Mock textract_client and classify_statements_with_llm, + call run_analyze with backend='textract', verify result.""" + + with _StagesImportContext() as ctx: + PipelineOptions = ctx.contracts.PipelineOptions + AnalyzeResult = ctx.contracts.AnalyzeResult + run_analyze = ctx.analyze.run_analyze + + # Load the real adapter for reconstruct_markdown and build_page_map + adapter_path = _EXTRACTOR_DIR / "textract_adapter.py" + adapter_spec = importlib.util.spec_from_file_location( + "extractor.textract_adapter", adapter_path, + ) + adapter_mod = importlib.util.module_from_spec(adapter_spec) + sys.modules["extractor.textract_adapter"] = adapter_mod + adapter_spec.loader.exec_module(adapter_mod) + + # Mock textract_client.analyze_document to return fixture + mock_textract_client = MagicMock() + mock_textract_client.analyze_document = MagicMock( + return_value=textract_response + ) + sys.modules["extractor.textract_client"] = mock_textract_client + + try: + # Mock classify_statements_with_llm to return known classifications + with patch.object( + adapter_mod, + "classify_statements_with_llm", + return_value=MOCK_LLM_CLASSIFICATIONS, + ): + options = PipelineOptions(backend="textract") + result = run_analyze("/fake/path.pdf", options) + + # Verify result type + assert isinstance(result, AnalyzeResult) + + # Verify candidates + assert 
len(result.candidates) == 1 + c = result.candidates[0] + assert c.statement_type == "balance_sheet" + assert c.currency == "USD" + assert c.company_name == "Test Corp" + assert c.unit == "millions" + assert c.accounting_standard == "US_GAAP" + assert c.is_consolidated is True + assert c.report_language == "en" + assert c.title_raw == "CONSOLIDATED BALANCE SHEET" + # title_english mirrors title_raw for Textract path + assert c.title_english == "CONSOLIDATED BALANCE SHEET" + + # Verify markdown contains expected content + assert "" in result.markdown + assert "Total Assets" in result.markdown + + # Verify page_map has entries + assert len(result.page_map) >= 1 + + # Verify Textract-specific defaults + assert result.pages == [] + assert result.enrichment_lookup == {} + + finally: + # Clean up adapter and client from sys.modules + sys.modules.pop("extractor.textract_adapter", None) + sys.modules.pop("extractor.textract_client", None) + + +class TestCuBackendDefault: + """Test that default PipelineOptions uses CU backend.""" + + def test_cu_backend_still_works_with_default(self): + with _StagesImportContext() as ctx: + options = ctx.contracts.PipelineOptions() + assert options.backend == "cu" + + def test_textract_backend_option(self): + with _StagesImportContext() as ctx: + options = ctx.contracts.PipelineOptions(backend="textract") + assert options.backend == "textract" diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_caching.py b/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_caching.py new file mode 100644 index 000000000..973df597a --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_caching.py @@ -0,0 +1,80 @@ +""" +Tests for PDF hash caching in main.py — covers: + Fix 3: Skip Azure CU API call when cached result exists for the same PDF. 
+"""
+
+import hashlib
+
+from main import _pdf_hash, _load_cached_result, _save_cached_result
+
+
+class TestPdfHash:
+    """_pdf_hash produces a stable SHA-256 hex digest of file contents."""
+
+    def test_same_content_same_hash(self, tmp_path):
+        """Two files with identical content produce the same hash."""
+        f1 = tmp_path / "a.pdf"
+        f2 = tmp_path / "b.pdf"
+        f1.write_bytes(b"identical content")
+        f2.write_bytes(b"identical content")
+        assert _pdf_hash(f1) == _pdf_hash(f2)
+
+    def test_different_content_different_hash(self, tmp_path):
+        """Two files with different content produce different hashes."""
+        f1 = tmp_path / "a.pdf"
+        f2 = tmp_path / "b.pdf"
+        f1.write_bytes(b"content A")
+        f2.write_bytes(b"content B")
+        assert _pdf_hash(f1) != _pdf_hash(f2)
+
+    def test_hash_is_sha256(self, tmp_path):
+        """The hash matches Python's hashlib.sha256."""
+        f = tmp_path / "test.pdf"
+        content = b"test data"
+        f.write_bytes(content)
+        expected = hashlib.sha256(content).hexdigest()
+        assert _pdf_hash(f) == expected
+
+
+class TestCacheRoundTrip:
+    """_save_cached_result and _load_cached_result form a cache round-trip."""
+
+    def test_save_then_load(self, tmp_path, monkeypatch):
+        """A saved result can be loaded back for the same PDF."""
+        # Point CACHE_DIR to tmp_path.
+ monkeypatch.setattr("main.CACHE_DIR", tmp_path / ".cache") + + pdf = tmp_path / "report.pdf" + pdf.write_bytes(b"PDF bytes here") + result = {"status": "succeeded", "contents": [{"markdown": "hello"}]} + + _save_cached_result(pdf, result) + loaded = _load_cached_result(pdf) + + assert loaded == result + + def test_cache_miss_returns_none(self, tmp_path, monkeypatch): + """When no cache exists, returns None.""" + monkeypatch.setattr("main.CACHE_DIR", tmp_path / ".cache") + + pdf = tmp_path / "new_report.pdf" + pdf.write_bytes(b"never cached") + + assert _load_cached_result(pdf) is None + + def test_different_pdf_misses(self, tmp_path, monkeypatch): + """Cache for PDF-A does not match PDF-B.""" + monkeypatch.setattr("main.CACHE_DIR", tmp_path / ".cache") + + pdf_a = tmp_path / "a.pdf" + pdf_b = tmp_path / "b.pdf" + pdf_a.write_bytes(b"content A") + pdf_b.write_bytes(b"content B") + + result = {"status": "succeeded"} + _save_cached_result(pdf_a, result) + + assert _load_cached_result(pdf_b) is None diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_card_builder.py b/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_card_builder.py new file mode 100644 index 000000000..13428366c --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_card_builder.py @@ -0,0 +1,616 @@ +"""Tests for Adaptive Card builder module.""" + +import json +import pytest +from extractor.card_builder import ( + build_navigator_card, + build_statement_review_card, + parse_card_submission, + init_session_state, + advance_session_state, +) + + +# ── helpers ────────────────────────────────────────────────────────────────── + + +def _make_test_statement(num_rows=5, columns=None): + cols = columns or [ + {"column_index": 0, "label": "2025"}, + {"column_index": 1, "label": "2024"}, + ] + rows = [] + for i in range(num_rows): + rows.append( + { + "row_index": i, + "label_raw": f"Item {i}", + "row_type": "line_item", + 
"indent_level": 0, + "section": "current_assets", + "canonical_group": "assets", + "values": [ + { + "raw": str((i + 1) * 100), + "normalized": float((i + 1) * 100), + "is_null": False, + "column_index": j, + } + for j in range(len(cols)) + ], + } + ) + return { + "rows": rows, + "columns": cols, + "statement_metadata": {"page_range": {"start": 1, "end": 2}}, + "validation": {"warnings": [], "errors": []}, + } + + +def _make_test_confidence(level="medium", flagged=None): + return { + "score": {"high": 0.92, "medium": 0.72, "low": 0.45}[level], + "level": level, + "flagged_rows": flagged or [], + "signals": {}, + } + + +def _make_summary(): + return [ + {"statement": "balance_sheet", "rows": 42}, + {"statement": "income_statement", "rows": 20}, + {"statement": "cash_flow", "rows": 15}, + ] + + +def _make_confidence_map(): + return { + "balance_sheet": { + "score": 0.72, + "level": "medium", + "flagged_rows": [5, 12], + "signals": {}, + }, + "income_statement": { + "score": 0.92, + "level": "high", + "flagged_rows": [], + "signals": {}, + }, + "cash_flow": { + "score": 0.45, + "level": "low", + "flagged_rows": [1, 3, 7], + "signals": {}, + }, + } + + +# ── helpers for card introspection ─────────────────────────────────────────── + + +def _collect_texts(obj): + """Recursively collect all 'text' values in a nested dict/list structure.""" + texts = [] + if isinstance(obj, dict): + if "text" in obj: + texts.append(obj["text"]) + for v in obj.values(): + texts.extend(_collect_texts(v)) + elif isinstance(obj, list): + for item in obj: + texts.extend(_collect_texts(item)) + return texts + + +def _collect_elements(obj, element_type): + """Recursively collect all elements matching a given 'type'.""" + results = [] + if isinstance(obj, dict): + if obj.get("type") == element_type: + results.append(obj) + for v in obj.values(): + results.extend(_collect_elements(v, element_type)) + elif isinstance(obj, list): + for item in obj: + results.extend(_collect_elements(item, 
element_type)) + return results + + +def _collect_actions(obj): + """Recursively collect all action objects (type starts with 'Action.').""" + results = [] + if isinstance(obj, dict): + t = obj.get("type", "") + if isinstance(t, str) and t.startswith("Action."): + results.append(obj) + for v in obj.values(): + results.extend(_collect_actions(v)) + elif isinstance(obj, list): + for item in obj: + results.extend(_collect_actions(item)) + return results + + +# ═══════════════════════════════════════════════════════════════════════════════ +# Navigator card tests +# ═══════════════════════════════════════════════════════════════════════════════ + + +class TestNavigatorCard: + def test_navigator_card_structure(self): + card = build_navigator_card("Acme", "GBP", "thousands", _make_confidence_map(), _make_summary()) + assert card["$schema"] == "http://adaptivecards.io/schemas/adaptive-card.json" + assert card["type"] == "AdaptiveCard" + assert card["version"] == "1.5" + assert "body" in card + assert "actions" in card + + def test_navigator_shows_all_statements(self): + card = build_navigator_card("Acme", "GBP", "thousands", _make_confidence_map(), _make_summary()) + texts = _collect_texts(card) + all_text = " ".join(texts) + assert "Balance Sheet" in all_text + assert "Income Statement" in all_text + assert "Cash Flow" in all_text + + def test_navigator_high_confidence_shows_checkmark(self): + card = build_navigator_card("Acme", "GBP", "thousands", _make_confidence_map(), _make_summary()) + texts = _collect_texts(card) + found = [t for t in texts if "High" in t and "\u2713" in t] + assert len(found) >= 1, "Expected at least one '✓ High' text" + + def test_navigator_medium_shows_warning(self): + card = build_navigator_card("Acme", "GBP", "thousands", _make_confidence_map(), _make_summary()) + texts = _collect_texts(card) + found = [t for t in texts if "Medium" in t and "\u26a0" in t and "issues" in t.lower()] + assert len(found) >= 1, "Expected at least one '⚠ Medium (N 
issues)' text" + + def test_navigator_actions(self): + card = build_navigator_card("Acme", "GBP", "thousands", _make_confidence_map(), _make_summary()) + actions = card.get("actions", []) + titles = [a["title"] for a in actions] + assert "Start Review" in titles + assert "Skip Review & Generate Excel" in titles + + +# ═══════════════════════════════════════════════════════════════════════════════ +# Statement review card tests +# ═══════════════════════════════════════════════════════════════════════════════ + + +class TestStatementReviewCard: + def test_review_card_has_header(self): + stmt = _make_test_statement() + conf = _make_test_confidence("medium", [0]) + card = build_statement_review_card("balance_sheet", stmt, conf, {}, 1, 3) + texts = _collect_texts(card) + all_text = " ".join(texts) + assert "Balance Sheet" in all_text + assert "Medium" in all_text + + def test_flagged_rows_have_input_fields(self): + stmt = _make_test_statement() + conf = _make_test_confidence("medium", [0, 1]) + card = build_statement_review_card("balance_sheet", stmt, conf, {}, 1, 3) + inputs = _collect_elements(card, "Input.Text") + # flagged rows 0 and 1 should each have label + N value inputs + ids = [inp["id"] for inp in inputs] + assert "row_0_label" in ids + assert "row_1_label" in ids + assert "row_0_val_0" in ids + + def test_flagged_rows_have_section_dropdown(self): + stmt = _make_test_statement() + conf = _make_test_confidence("medium", [0]) + card = build_statement_review_card("balance_sheet", stmt, conf, {}, 1, 3) + choice_sets = _collect_elements(card, "Input.ChoiceSet") + ids = [cs["id"] for cs in choice_sets] + assert "row_0_section" in ids + + def test_flagged_row_ids_match_pattern(self): + stmt = _make_test_statement() + conf = _make_test_confidence("medium", [2]) + card = build_statement_review_card("balance_sheet", stmt, conf, {}, 1, 3) + inputs = _collect_elements(card, "Input.Text") + choice_sets = _collect_elements(card, "Input.ChoiceSet") + all_ids = [el["id"] 
for el in inputs + choice_sets]
+        # All IDs for row 2 should start with "row_2_"
+        row2_ids = [i for i in all_ids if i.startswith("row_2_")]
+        assert len(row2_ids) >= 3  # label + val_0 + val_1 at minimum
+
+    def test_rows_in_original_order(self):
+        """Rows render in row_index order, not flagged-first."""
+        import re  # local import: `re` is only needed by this test
+        stmt = _make_test_statement(num_rows=6)
+        conf = _make_test_confidence("medium", [1, 3])
+        card = build_statement_review_card("balance_sheet", stmt, conf, {}, 1, 3)
+        # All rows are inside a single Table element as TableRows.
+        # The first TableRow is the header; data rows follow in order.
+        tables = _collect_elements(card, "Table")
+        assert len(tables) == 1, "Expected exactly one Table element"
+        data_rows = tables[0]["rows"][1:]  # skip header row
+        row_order = []
+        for tr in data_rows:
+            # Editable rows have Input.Text with id row_N_label
+            inputs = _collect_elements(tr, "Input.Text")
+            for inp in inputs:
+                m = re.match(r"row_(\d+)_label", inp.get("id", ""))
+                if m:
+                    row_order.append(int(m.group(1)))
+                    break
+            else:
+                # Read-only rows have TextBlock text matching "Item N"
+                texts = _collect_texts(tr)
+                for t in texts:
+                    if t.startswith("Item "):
+                        try:
+                            row_order.append(int(t.split(" ")[1]))
+                        except (ValueError, IndexError):
+                            pass
+                        break
+        assert row_order == [0, 1, 2, 3, 4, 5], f"Expected rows in order, got {row_order}"
+
+    def test_flagged_editable_clean_readonly(self):
+        """Flagged rows have Input fields, clean rows only have TextBlocks."""
+        stmt = _make_test_statement(num_rows=5)
+        conf = _make_test_confidence("medium", [1, 3])
+        card = build_statement_review_card("balance_sheet", stmt, conf, {}, 1, 3)
+        inputs = _collect_elements(card, "Input.Text")
+        ids = [inp["id"] for inp in inputs]
+        # Flagged rows 1 and 3 should have input fields
+        assert "row_1_label" in ids
+        assert "row_3_label" in ids
+        # Clean rows 0, 2, 4 should NOT have input fields
+        assert "row_0_label" not in ids
+        assert "row_2_label" not in ids
+        assert "row_4_label"
not in ids + + def test_high_confidence_readonly(self): + stmt = _make_test_statement() + conf = _make_test_confidence("high") + card = build_statement_review_card("balance_sheet", stmt, conf, {}, 1, 3, editable=False) + inputs = _collect_elements(card, "Input.Text") + # Only the hidden 'action' input should exist — no editable row inputs + visible_inputs = [i for i in inputs if i.get("isVisible") is not False] + assert len(visible_inputs) == 0, "Read-only card should have no visible Input.Text elements" + text_blocks = _collect_elements(card, "TextBlock") + assert len(text_blocks) > 0 + + def test_high_confidence_has_edit_anyway_button(self): + stmt = _make_test_statement() + conf = _make_test_confidence("high") + card = build_statement_review_card("balance_sheet", stmt, conf, {}, 1, 3, editable=False) + actions = _collect_actions(card) + titles = [a.get("title", "") for a in actions] + assert "Edit Anyway" in titles + + def test_edit_all_button_present(self): + """Standard editable mode should have an Edit All button.""" + stmt = _make_test_statement() + conf = _make_test_confidence("medium", [0]) + card = build_statement_review_card("balance_sheet", stmt, conf, {}, 1, 3) + actions = card.get("actions", []) + titles = [a["title"] for a in actions] + assert "Edit All" in titles + + def test_edit_all_all_rows_editable(self): + """edit_all=True makes every row editable.""" + stmt = _make_test_statement(num_rows=5) + conf = _make_test_confidence("medium", [0]) + card = build_statement_review_card("balance_sheet", stmt, conf, {}, 1, 3, edit_all=True) + inputs = _collect_elements(card, "Input.Text") + ids = [inp["id"] for inp in inputs] + for i in range(5): + assert f"row_{i}_label" in ids, f"row_{i}_label missing in edit_all mode" + + def test_edit_all_pagination(self): + """With 40 rows, edit_all page 1 should render only rows 20-39.""" + stmt = _make_test_statement(num_rows=40) + conf = _make_test_confidence("medium", [0]) + card = 
build_statement_review_card("balance_sheet", stmt, conf, {}, 1, 3, edit_all=True, edit_all_page=1) + inputs = _collect_elements(card, "Input.Text") + ids = [inp["id"] for inp in inputs] + # Page 1 should have rows 20-39 + assert "row_20_label" in ids + assert "row_39_label" in ids + # Page 0 rows should not be present + assert "row_0_label" not in ids + assert "row_19_label" not in ids + + def test_edit_all_page_navigation_actions(self): + """Edit All with multiple pages should have page navigation.""" + stmt = _make_test_statement(num_rows=40) + conf = _make_test_confidence("medium", [0]) + # Page 0 of 2 — should have "Page →" but no "← Page" + card = build_statement_review_card("balance_sheet", stmt, conf, {}, 1, 3, edit_all=True, edit_all_page=0) + actions = card.get("actions", []) + action_data = [a.get("data", {}).get("action") for a in actions] + assert "edit_all_page_next" in action_data + assert "edit_all_page_prev" not in action_data + + def test_corrections_prepopulate_values(self): + stmt = _make_test_statement() + conf = _make_test_confidence("medium", [0]) + corrections = {"row_0": {"label": "Trade debtors"}} + card = build_statement_review_card("balance_sheet", stmt, conf, corrections, 1, 3) + inputs = _collect_elements(card, "Input.Text") + label_input = [inp for inp in inputs if inp["id"] == "row_0_label"] + assert len(label_input) == 1 + assert label_input[0]["value"] == "Trade debtors" + + def test_navigation_actions_step1(self): + stmt = _make_test_statement() + conf = _make_test_confidence("medium", [0]) + card = build_statement_review_card("balance_sheet", stmt, conf, {}, 1, 3) + actions = card.get("actions", []) + titles = [a["title"] for a in actions] + assert any("Next" in t for t in titles) + assert not any("Previous" in t for t in titles) + + def test_navigation_actions_step2(self): + stmt = _make_test_statement() + conf = _make_test_confidence("medium", [0]) + card = build_statement_review_card("income_statement", stmt, conf, {}, 2, 3) + 
actions = card.get("actions", []) + titles = [a["title"] for a in actions] + assert any("Previous" in t for t in titles) + assert any("Next" in t for t in titles) + + def test_navigation_actions_last_step(self): + stmt = _make_test_statement() + conf = _make_test_confidence("medium", [0]) + card = build_statement_review_card("cash_flow", stmt, conf, {}, 3, 3) + actions = card.get("actions", []) + titles = [a["title"] for a in actions] + assert any("Previous" in t for t in titles) + assert any("Submit" in t for t in titles) + assert not any("Next" in t for t in titles) + + def test_empty_statement_shows_no_data(self): + stmt = _make_test_statement(num_rows=0) + conf = _make_test_confidence("medium") + card = build_statement_review_card("balance_sheet", stmt, conf, {}, 1, 3) + texts = _collect_texts(card) + all_text = " ".join(texts) + assert "No data extracted" in all_text + + +# ═══════════════════════════════════════════════════════════════════════════════ +# Parse submission tests +# ═══════════════════════════════════════════════════════════════════════════════ + + +class TestParseSubmission: + def test_parse_extracts_action(self): + stmt = _make_test_statement() + payload = {"action": "next"} + action, corrections = parse_card_submission(payload, stmt) + assert action == "next" + + def test_parse_detects_changed_label(self): + stmt = _make_test_statement() + payload = {"action": "next", "row_0_label": "Trade debtors"} + action, corrections = parse_card_submission(payload, stmt) + assert "row_0" in corrections + assert corrections["row_0"]["label"] == "Trade debtors" + + def test_parse_ignores_unchanged_values(self): + stmt = _make_test_statement() + # Original label for row 0 is "Item 0", original val_0 is "100" + payload = {"action": "next", "row_0_label": "Item 0", "row_0_val_0": "100"} + action, corrections = parse_card_submission(payload, stmt) + assert "row_0" not in corrections + + def test_parse_detects_changed_value(self): + stmt = 
_make_test_statement() + payload = {"action": "next", "row_0_val_0": "999"} + action, corrections = parse_card_submission(payload, stmt) + assert "row_0" in corrections + assert corrections["row_0"]["val_0"] == "999" + + def test_parse_detects_changed_section(self): + stmt = _make_test_statement() + payload = {"action": "next", "row_0_section": "equity"} + action, corrections = parse_card_submission(payload, stmt) + assert "row_0" in corrections + assert corrections["row_0"]["section"] == "equity" + + def test_parse_empty_payload_no_corrections(self): + stmt = _make_test_statement() + payload = {"action": "next"} + action, corrections = parse_card_submission(payload, stmt) + assert corrections == {} + + +# ═══════════════════════════════════════════════════════════════════════════════ +# Session state tests +# ═══════════════════════════════════════════════════════════════════════════════ + + +class TestInitSessionState: + def test_basic_init(self): + conf = _make_confidence_map() + state_str = init_session_state(conf, job_id="job-123") + state = json.loads(state_str) + assert state["jobId"] == "job-123" + assert state["phase"] == "navigator" + assert state["step"] == 0 + assert state["statements"] == ["balance_sheet", "income_statement", "cash_flow"] + assert state["corrections"] == {} + + def test_filters_not_found(self): + conf = _make_confidence_map() + conf["cash_flow"]["level"] = "not_found" + state = json.loads(init_session_state(conf)) + assert "cash_flow" not in state["statements"] + assert state["statements"] == ["balance_sheet", "income_statement"] + + def test_filters_missing_key(self): + conf = { + "balance_sheet": {"score": 0.9, "level": "high", "flagged_rows": []}, + } + state = json.loads(init_session_state(conf)) + assert state["statements"] == ["balance_sheet"] + + def test_empty_confidence(self): + state = json.loads(init_session_state({})) + assert state["statements"] == [] + + def test_available_statements_overrides_confidence(self): + """When 
available_statements is provided, confidence levels are ignored.""" + conf = _make_confidence_map() # all 3 have scores + # Only BS and CF actually have data + state = json.loads(init_session_state( + conf, available_statements=["balance_sheet", "cash_flow"] + )) + assert state["statements"] == ["balance_sheet", "cash_flow"] + + def test_available_statements_empty(self): + conf = _make_confidence_map() + state = json.loads(init_session_state(conf, available_statements=[])) + assert state["statements"] == [] + + def test_available_statements_preserves_canonical_order(self): + """Even if available_statements is passed in reverse, output follows STATEMENT_ORDER.""" + conf = _make_confidence_map() + state = json.loads(init_session_state( + conf, available_statements=["cash_flow", "balance_sheet"] + )) + assert state["statements"] == ["balance_sheet", "cash_flow"] + + +class TestAdvanceSessionState: + def _init(self, stmts=None): + """Create a review-ready state at step 1.""" + stmts = stmts or ["balance_sheet", "income_statement", "cash_flow"] + return json.dumps({ + "jobId": "j1", + "phase": "review", + "step": 1, + "statements": stmts, + "corrections": {}, + "editable": False, + "editAll": False, + "editAllPage": 0, + }) + + def _nav(self): + return json.dumps({ + "jobId": "j1", + "phase": "navigator", + "step": 0, + "statements": ["balance_sheet", "income_statement"], + "corrections": {}, + "editable": False, + "editAll": False, + "editAllPage": 0, + }) + + # ── Navigator phase ── + + def test_skip_review(self): + action, state_str = advance_session_state(self._nav(), {"action": "skip_review"}) + assert action == "skip" + + def test_start_review(self): + action, state_str = advance_session_state(self._nav(), {"action": "start_review"}) + assert action == "continue" + state = json.loads(state_str) + assert state["phase"] == "review" + assert state["step"] == 1 + + # ── Review phase: navigation ── + + def test_next_advances_step(self): + action, state_str = 
advance_session_state(self._init(), {"action": "next"}) + assert action == "continue" + state = json.loads(state_str) + assert state["step"] == 2 + + def test_next_past_last_returns_done(self): + s = json.loads(self._init()) + s["step"] = 3 # last of 3 statements + action, state_str = advance_session_state(json.dumps(s), {"action": "next"}) + assert action == "done" + state = json.loads(state_str) + assert state["phase"] == "export" + + def test_previous_decrements_step(self): + s = json.loads(self._init()) + s["step"] = 2 + action, state_str = advance_session_state(json.dumps(s), {"action": "previous"}) + assert action == "continue" + state = json.loads(state_str) + assert state["step"] == 1 + + def test_previous_at_step1_stays(self): + action, state_str = advance_session_state(self._init(), {"action": "previous"}) + state = json.loads(state_str) + assert state["step"] == 1 + + def test_submit_returns_done(self): + action, _ = advance_session_state(self._init(), {"action": "submit"}) + assert action == "done" + + # ── Review phase: edit toggles ── + + def test_edit_enables_editable(self): + action, state_str = advance_session_state(self._init(), {"action": "edit"}) + assert action == "continue" + state = json.loads(state_str) + assert state["editable"] is True + assert state["editAll"] is False + + def test_edit_all_enables_both(self): + action, state_str = advance_session_state(self._init(), {"action": "edit_all"}) + state = json.loads(state_str) + assert state["editable"] is True + assert state["editAll"] is True + + def test_edit_all_page_next(self): + s = json.loads(self._init()) + s["editAll"] = True + s["editAllPage"] = 0 + action, state_str = advance_session_state(json.dumps(s), {"action": "edit_all_page_next"}) + state = json.loads(state_str) + assert state["editAllPage"] == 1 + + def test_edit_all_page_prev_floors_at_zero(self): + s = json.loads(self._init()) + s["editAllPage"] = 0 + action, state_str = advance_session_state(json.dumps(s), {"action": 
"edit_all_page_prev"}) + state = json.loads(state_str) + assert state["editAllPage"] == 0 + + # ── Corrections accumulation ── + + def test_corrections_captured(self): + stmt = _make_test_statement() + payload = {"action": "next", "row_0_label": "Trade debtors"} + action, state_str = advance_session_state(self._init(), payload, stmt) + state = json.loads(state_str) + assert "balance_sheet" in state["corrections"] + assert state["corrections"]["balance_sheet"]["row_0"]["label"] == "Trade debtors" + + def test_corrections_accumulate_across_calls(self): + stmt = _make_test_statement() + # First call: correct row 0 + _, state_str = advance_session_state( + self._init(), {"action": "edit", "row_0_label": "Corrected"}, stmt + ) + # Second call on same step: correct row 1 + _, state_str = advance_session_state( + state_str, {"action": "next", "row_1_val_0": "999"}, stmt + ) + state = json.loads(state_str) + bs = state["corrections"]["balance_sheet"] + assert bs["row_0"]["label"] == "Corrected" + assert bs["row_1"]["val_0"] == "999" + + # ── Unknown action fallback ── + + def test_unknown_action_returns_continue(self): + action, _ = advance_session_state(self._init(), {"action": "bogus"}) + assert action == "continue" diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_confidence_scorer.py b/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_confidence_scorer.py new file mode 100644 index 000000000..69c268a8f --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_confidence_scorer.py @@ -0,0 +1,656 @@ +""" +Tests for extractor/confidence_scorer.py — covers: + - Each of the 6 signal functions independently + - Composite score_statement (perfect statement → high, broken → low) + - flag_rows (each flag condition) + - Boundary conditions (exactly 5% empty labels → 1.0) +""" + +import pytest + +from extractor.confidence_scorer import ( + score_statement, + flag_rows, + _score_subtotal_validation, + 
_score_section_coverage, + _score_row_count, + _score_column_dates, + _score_empty_label_ratio, + _score_leaked_headers, + WEIGHTS, + THRESHOLDS, + ROW_RANGES, + SECTION_GROUPS, +) + + +# --------------------------------------------------------------------------- +# Fixture helpers +# --------------------------------------------------------------------------- + +def _row( + row_index, + label_raw="Cash", + row_type="line_item", + section="current_assets", + values=None, +): + """Build a minimal row dict for testing.""" + if values is None: + values = [{"raw": "100", "normalized": 100.0, "is_null": False}] + return { + "row_index": row_index, + "label_raw": label_raw, + "row_type": row_type, + "section": section, + "values": values, + } + + +def _make_statement(rows=None, columns=None, warnings=None): + """Build a minimal statement dict.""" + if rows is None: + rows = [] + if columns is None: + columns = [{"column_index": 0, "label": "2025"}] + if warnings is None: + warnings = [] + return { + "rows": rows, + "columns": columns, + "validation": {"warnings": warnings, "errors": []}, + } + + +# --------------------------------------------------------------------------- +# _score_subtotal_validation +# --------------------------------------------------------------------------- + +class TestScoreSubtotalValidation: + + def test_no_subtotals_returns_1(self): + stmt = _make_statement(rows=[_row(0), _row(1)]) + assert _score_subtotal_validation(stmt) == 1.0 + + def test_all_subtotals_pass(self): + rows = [ + _row(0, row_type="subtotal"), + _row(1, row_type="subtotal"), + ] + stmt = _make_statement(rows=rows, warnings=[]) + assert _score_subtotal_validation(stmt) == 1.0 + + def test_all_subtotals_fail(self): + rows = [ + _row(0, row_type="subtotal"), + _row(1, row_type="subtotal"), + ] + warnings = [ + {"row_index": 0, "column_index": 0, "difference": 50}, + {"row_index": 1, "column_index": 0, "difference": 20}, + ] + stmt = _make_statement(rows=rows, warnings=warnings) + 
assert _score_subtotal_validation(stmt) == 0.0 + + def test_partial_subtotals_fail(self): + rows = [ + _row(0, row_type="subtotal"), + _row(1, row_type="subtotal"), + _row(2, row_type="subtotal"), + _row(3, row_type="subtotal"), + ] + # Only row_index 0 fails + warnings = [{"row_index": 0, "column_index": 0, "difference": 50}] + stmt = _make_statement(rows=rows, warnings=warnings) + assert _score_subtotal_validation(stmt) == 0.75 + + def test_warning_for_non_subtotal_row_doesnt_affect_score(self): + """Warnings on non-subtotal rows should not count against score.""" + rows = [ + _row(0, row_type="subtotal"), + _row(1, row_type="line_item"), + ] + # Warning is on the line_item row, not the subtotal + warnings = [{"row_index": 1, "column_index": 0, "difference": 10}] + stmt = _make_statement(rows=rows, warnings=warnings) + assert _score_subtotal_validation(stmt) == 1.0 + + def test_empty_rows_returns_1(self): + stmt = _make_statement(rows=[]) + assert _score_subtotal_validation(stmt) == 1.0 + + def test_multiple_warnings_same_subtotal_row_counted_once(self): + """Multiple warnings for the same row_index should count as one failure.""" + rows = [ + _row(0, row_type="subtotal"), + _row(1, row_type="subtotal"), + ] + # Two warnings for row_index=0 (different columns) + warnings = [ + {"row_index": 0, "column_index": 0, "difference": 50}, + {"row_index": 0, "column_index": 1, "difference": 30}, + ] + stmt = _make_statement(rows=rows, warnings=warnings) + assert _score_subtotal_validation(stmt) == 0.5 + + +# --------------------------------------------------------------------------- +# _score_section_coverage +# --------------------------------------------------------------------------- + +class TestScoreSectionCoverage: + + def test_balance_sheet_all_groups_present(self): + rows = [ + _row(0, section="current_assets"), + _row(1, section="non_current_assets"), + _row(2, section="current_liabilities"), + _row(3, section="equity"), + ] + stmt = _make_statement(rows=rows) + 
score = _score_section_coverage(stmt, "balance_sheet") + assert score == 1.0 + + def test_balance_sheet_missing_equity_group(self): + rows = [ + _row(0, section="current_assets"), + _row(1, section="non_current_assets"), + _row(2, section="current_liabilities"), + ] + stmt = _make_statement(rows=rows) + score = _score_section_coverage(stmt, "balance_sheet") + # 2 out of 3 groups matched + assert score == pytest.approx(2 / 3) + + def test_balance_sheet_no_sections(self): + stmt = _make_statement(rows=[]) + score = _score_section_coverage(stmt, "balance_sheet") + assert score == 0.0 + + def test_income_statement_all_groups(self): + rows = [ + _row(0, section="revenue"), + _row(1, section="operating_expenses"), + ] + stmt = _make_statement(rows=rows) + score = _score_section_coverage(stmt, "income_statement") + assert score == 1.0 + + def test_income_statement_expenses_alternate(self): + """'expenses' is an alternative section name for the operating_expenses group.""" + rows = [ + _row(0, section="revenue"), + _row(1, section="expenses"), + ] + stmt = _make_statement(rows=rows) + score = _score_section_coverage(stmt, "income_statement") + assert score == 1.0 + + def test_income_statement_missing_revenue(self): + rows = [_row(0, section="operating_expenses")] + stmt = _make_statement(rows=rows) + score = _score_section_coverage(stmt, "income_statement") + assert score == pytest.approx(1 / 2) + + def test_cash_flow_all_groups(self): + rows = [ + _row(0, section="operating_activities"), + _row(1, section="investing_activities"), + _row(2, section="financing_activities"), + ] + stmt = _make_statement(rows=rows) + score = _score_section_coverage(stmt, "cash_flow") + assert score == 1.0 + + def test_cash_flow_only_operating(self): + rows = [_row(0, section="operating_activities")] + stmt = _make_statement(rows=rows) + score = _score_section_coverage(stmt, "cash_flow") + assert score == pytest.approx(1 / 3) + + def test_balance_sheet_uses_assets_top_level_section(self): + 
"""Top-level 'assets' section satisfies the first group.""" + rows = [ + _row(0, section="assets"), + _row(1, section="liabilities"), + _row(2, section="equity"), + ] + stmt = _make_statement(rows=rows) + score = _score_section_coverage(stmt, "balance_sheet") + assert score == 1.0 + + +# --------------------------------------------------------------------------- +# _score_row_count +# --------------------------------------------------------------------------- + +class TestScoreRowCount: + + def test_balance_sheet_in_range(self): + rows = [_row(i) for i in range(20)] + stmt = _make_statement(rows=rows) + assert _score_row_count(stmt, "balance_sheet") == 1.0 + + def test_balance_sheet_at_min_boundary(self): + rows = [_row(i) for i in range(15)] + stmt = _make_statement(rows=rows) + assert _score_row_count(stmt, "balance_sheet") == 1.0 + + def test_balance_sheet_at_max_boundary(self): + rows = [_row(i) for i in range(80)] + stmt = _make_statement(rows=rows) + assert _score_row_count(stmt, "balance_sheet") == 1.0 + + def test_balance_sheet_too_few(self): + rows = [_row(i) for i in range(5)] + stmt = _make_statement(rows=rows) + assert _score_row_count(stmt, "balance_sheet") == 0.0 + + def test_balance_sheet_too_many(self): + rows = [_row(i) for i in range(100)] + stmt = _make_statement(rows=rows) + assert _score_row_count(stmt, "balance_sheet") == 0.0 + + def test_income_statement_in_range(self): + rows = [_row(i) for i in range(25)] + stmt = _make_statement(rows=rows) + assert _score_row_count(stmt, "income_statement") == 1.0 + + def test_income_statement_too_few(self): + rows = [_row(i) for i in range(3)] + stmt = _make_statement(rows=rows) + assert _score_row_count(stmt, "income_statement") == 0.0 + + def test_cash_flow_in_range(self): + rows = [_row(i) for i in range(20)] + stmt = _make_statement(rows=rows) + assert _score_row_count(stmt, "cash_flow") == 1.0 + + def test_cash_flow_too_many(self): + rows = [_row(i) for i in range(70)] + stmt = 
_make_statement(rows=rows) + assert _score_row_count(stmt, "cash_flow") == 0.0 + + +# --------------------------------------------------------------------------- +# _score_column_dates +# --------------------------------------------------------------------------- + +class TestScoreColumnDates: + + def test_all_columns_have_years(self): + columns = [ + {"column_index": 0, "label": "2025"}, + {"column_index": 1, "label": "2024"}, + ] + stmt = _make_statement(columns=columns) + assert _score_column_dates(stmt) == 1.0 + + def test_no_columns_returns_0(self): + stmt = _make_statement(columns=[]) + assert _score_column_dates(stmt) == 0.0 + + def test_partial_columns_have_years(self): + columns = [ + {"column_index": 0, "label": "December 31, 2025"}, + {"column_index": 1, "label": "Amount"}, + ] + stmt = _make_statement(columns=columns) + assert _score_column_dates(stmt) == 0.5 + + def test_label_containing_year(self): + """Year embedded in longer label still matches.""" + columns = [{"column_index": 0, "label": "Fiscal Year Ended December 31, 2023"}] + stmt = _make_statement(columns=columns) + assert _score_column_dates(stmt) == 1.0 + + def test_label_with_old_year_not_matching_20xx(self): + """Years before 2000 (e.g. 
1999) should not match the 20xx pattern.""" + columns = [{"column_index": 0, "label": "1999"}] + stmt = _make_statement(columns=columns) + assert _score_column_dates(stmt) == 0.0 + + def test_label_with_no_year(self): + columns = [{"column_index": 0, "label": "Amount (USD thousands)"}] + stmt = _make_statement(columns=columns) + assert _score_column_dates(stmt) == 0.0 + + +# --------------------------------------------------------------------------- +# _score_empty_label_ratio +# --------------------------------------------------------------------------- + +class TestScoreEmptyLabelRatio: + + def test_no_blank_labels(self): + rows = [_row(i, label_raw=f"Row {i}") for i in range(20)] + stmt = _make_statement(rows=rows) + assert _score_empty_label_ratio(stmt) == 1.0 + + def test_exactly_5_percent_blank_is_ok(self): + """Exactly 5% blank labels → score 1.0 (uses <=).""" + rows = [_row(i, label_raw=f"Row {i}") for i in range(19)] + rows.append(_row(19, label_raw="")) # 1/20 = 5% + stmt = _make_statement(rows=rows) + assert _score_empty_label_ratio(stmt) == 1.0 + + def test_above_5_percent_blank_fails(self): + """More than 5% blank labels → score 0.0.""" + rows = [_row(i, label_raw=f"Row {i}") for i in range(18)] + rows.append(_row(18, label_raw="")) + rows.append(_row(19, label_raw="")) # 2/20 = 10% + stmt = _make_statement(rows=rows) + assert _score_empty_label_ratio(stmt) == 0.0 + + def test_all_blank_labels_fails(self): + rows = [_row(i, label_raw="") for i in range(5)] + stmt = _make_statement(rows=rows) + assert _score_empty_label_ratio(stmt) == 0.0 + + def test_empty_rows_returns_1(self): + """No rows → no blanks → 1.0.""" + stmt = _make_statement(rows=[]) + assert _score_empty_label_ratio(stmt) == 1.0 + + def test_whitespace_only_label_counts_as_blank(self): + """Labels with only whitespace are treated as blank.""" + rows = [_row(i, label_raw=f"Row {i}") for i in range(19)] + rows.append(_row(19, label_raw=" ")) # 1/20 = 5% + stmt = _make_statement(rows=rows) + 
assert _score_empty_label_ratio(stmt) == 1.0 + + +# --------------------------------------------------------------------------- +# _score_leaked_headers +# --------------------------------------------------------------------------- + +class TestScoreLeakedHeaders: + + def test_no_leaked_headers(self): + rows = [_row(i, label_raw=f"Cash {i}") for i in range(5)] + stmt = _make_statement(rows=rows) + assert _score_leaked_headers(stmt) == 1.0 + + def test_leaked_header_detected(self): + """A row with blank label and all values looking like years/unit-strings → leaked header.""" + rows = [ + _row(0, label_raw="Revenue"), + _row(1, label_raw="", values=[ + {"raw": "2025", "normalized": 2025.0, "is_null": False}, + {"raw": "2024", "normalized": 2024.0, "is_null": False}, + ]), + ] + stmt = _make_statement(rows=rows) + assert _score_leaked_headers(stmt) == 0.0 + + def test_blank_label_with_normal_values_not_leaked(self): + """Blank label with regular financial values is NOT a leaked header.""" + rows = [ + _row(0, label_raw="Revenue"), + _row(1, label_raw="", values=[ + {"raw": "100000", "normalized": 100000.0, "is_null": False}, + ]), + ] + stmt = _make_statement(rows=rows) + assert _score_leaked_headers(stmt) == 1.0 + + def test_leaked_header_unit_string(self): + """Blank label with unit string values (e.g. 
'USD millions') → leaked header.""" + rows = [ + _row(0, label_raw=""), + ] + rows[0]["values"] = [{"raw": "USD millions", "normalized": None, "is_null": True}] + stmt = _make_statement(rows=rows) + assert _score_leaked_headers(stmt) == 0.0 + + def test_empty_rows_returns_1(self): + stmt = _make_statement(rows=[]) + assert _score_leaked_headers(stmt) == 1.0 + + +# --------------------------------------------------------------------------- +# flag_rows +# --------------------------------------------------------------------------- + +class TestFlagRows: + + def test_blank_label_flagged(self): + rows = [_row(0, label_raw="")] + stmt = _make_statement(rows=rows) + flagged = flag_rows(stmt) + assert 0 in flagged + + def test_row_in_validation_warning_flagged(self): + rows = [_row(0), _row(1)] + warnings = [{"row_index": 1, "column_index": 0, "difference": 50}] + stmt = _make_statement(rows=rows, warnings=warnings) + flagged = flag_rows(stmt) + assert 1 in flagged + assert 0 not in flagged + + def test_section_other_with_values_not_flagged(self): + """Rows in 'other' section WITH values should NOT be flagged (e.g., OCI items).""" + rows = [_row(0, section="other")] + stmt = _make_statement(rows=rows) + flagged = flag_rows(stmt) + assert 0 not in flagged + + def test_section_other_without_values_flagged(self): + """Rows in 'other' section WITHOUT values should be flagged.""" + rows = [_row(0, section="other")] + rows[0]["values"] = [{"raw": None, "normalized": None, "is_null": True}] + stmt = _make_statement(rows=rows) + flagged = flag_rows(stmt) + assert 0 in flagged + + def test_parse_failure_flagged(self): + """Row with normalized=None but non-empty raw is a parse failure.""" + rows = [_row(0, values=[{"raw": "abc", "normalized": None, "is_null": False}])] + stmt = _make_statement(rows=rows) + flagged = flag_rows(stmt) + assert 0 in flagged + + def test_total_label_but_line_item_type_flagged(self): + """Label containing 'total' but row_type='line_item' should be 
flagged.""" + rows = [_row(0, label_raw="Total current assets", row_type="line_item")] + stmt = _make_statement(rows=rows) + flagged = flag_rows(stmt) + assert 0 in flagged + + def test_total_label_with_correct_type_not_flagged(self): + """Label containing 'total' with row_type='subtotal' should NOT be flagged (by this rule).""" + rows = [_row(0, label_raw="Total current assets", row_type="subtotal")] + stmt = _make_statement(rows=rows) + flagged = flag_rows(stmt) + # Should not be flagged just for this rule (may be flagged by others) + assert 0 not in flagged + + def test_clean_row_not_flagged(self): + rows = [_row(0, label_raw="Cash", row_type="line_item", section="current_assets")] + stmt = _make_statement(rows=rows) + flagged = flag_rows(stmt) + assert 0 not in flagged + + def test_multiple_flag_conditions_same_row(self): + """A row meeting multiple conditions is still in the set once.""" + rows = [_row(0, label_raw="", section="other")] + stmt = _make_statement(rows=rows) + flagged = flag_rows(stmt) + assert 0 in flagged + assert len([x for x in flagged if x == 0]) == 1 + + def test_normalized_none_but_is_null_true_not_flagged(self): + """If is_null=True, a None normalized is expected (null cell, not parse failure).""" + rows = [_row(0, values=[{"raw": "", "normalized": None, "is_null": True}])] + stmt = _make_statement(rows=rows) + flagged = flag_rows(stmt) + assert 0 not in flagged + + def test_total_label_case_insensitive(self): + """'TOTAL revenue' with row_type='line_item' should be flagged.""" + rows = [_row(0, label_raw="TOTAL revenue", row_type="line_item")] + stmt = _make_statement(rows=rows) + flagged = flag_rows(stmt) + assert 0 in flagged + + def test_flag_rows_returns_set_of_ints(self): + rows = [_row(0, label_raw="")] + stmt = _make_statement(rows=rows) + result = flag_rows(stmt) + assert isinstance(result, set) + + +# --------------------------------------------------------------------------- +# score_statement — composite +# 
--------------------------------------------------------------------------- + +class TestScoreStatement: + + def _make_perfect_balance_sheet(self): + """A well-formed balance sheet that should score high.""" + rows = ( + [_row(i, label_raw=f"Current asset {i}", section="current_assets") for i in range(7)] + + [_row(7 + i, label_raw=f"Non-current asset {i}", section="non_current_assets") for i in range(5)] + + [_row(12, label_raw="Total assets", row_type="subtotal", section="assets")] + + [_row(13 + i, label_raw=f"Liability {i}", section="current_liabilities") for i in range(5)] + + [_row(18 + i, label_raw=f"NC Liability {i}", section="non_current_liabilities") for i in range(3)] + + [_row(21, label_raw="Total liabilities", row_type="subtotal", section="liabilities")] + + [_row(22 + i, label_raw=f"Equity {i}", section="equity") for i in range(3)] + ) + columns = [ + {"column_index": 0, "label": "2025"}, + {"column_index": 1, "label": "2024"}, + ] + return _make_statement(rows=rows, columns=columns, warnings=[]) + + def _make_broken_statement(self): + """A poorly-formed statement that should score low.""" + rows = [ + _row(0, label_raw="", section="other", values=[{"raw": "bad", "normalized": None, "is_null": False}]), + _row(1, label_raw="", section="other"), + _row(2, label_raw="", section="other"), + ] + columns = [{"column_index": 0, "label": "No year here"}] + warnings = [{"row_index": 0, "column_index": 0, "difference": 999}] + return _make_statement(rows=rows, columns=columns, warnings=warnings) + + def test_perfect_balance_sheet_scores_high(self): + stmt = self._make_perfect_balance_sheet() + result = score_statement(stmt, "balance_sheet") + assert result["score"] >= 0.85 + assert result["level"] == "high" + + def test_broken_statement_scores_low(self): + stmt = self._make_broken_statement() + result = score_statement(stmt, "balance_sheet") + assert result["score"] < 0.60 + assert result["level"] == "low" + + def test_result_structure(self): + stmt = 
self._make_perfect_balance_sheet() + result = score_statement(stmt, "balance_sheet") + assert "score" in result + assert "level" in result + assert "signals" in result + assert "flagged_rows" in result + + def test_signals_dict_has_all_keys(self): + stmt = self._make_perfect_balance_sheet() + result = score_statement(stmt, "balance_sheet") + expected_keys = { + "subtotal_validation", + "section_coverage", + "row_count", + "column_dates", + "empty_label_ratio", + "leaked_headers", + } + assert set(result["signals"].keys()) == expected_keys + + def test_all_signals_between_0_and_1(self): + stmt = self._make_perfect_balance_sheet() + result = score_statement(stmt, "balance_sheet") + for signal_name, signal_val in result["signals"].items(): + assert 0.0 <= signal_val <= 1.0, f"Signal {signal_name} out of range: {signal_val}" + + def test_composite_score_between_0_and_1(self): + stmt = self._make_broken_statement() + result = score_statement(stmt, "balance_sheet") + assert 0.0 <= result["score"] <= 1.0 + + def test_flagged_rows_is_list(self): + stmt = self._make_perfect_balance_sheet() + result = score_statement(stmt, "balance_sheet") + assert isinstance(result["flagged_rows"], list) + + def test_medium_level_threshold(self): + """Score between 0.60 and 0.85 → medium.""" + rows = ( + [_row(i, section="current_assets") for i in range(8)] + + [_row(8 + i, section="current_liabilities") for i in range(5)] + + [_row(13 + i, section="equity") for i in range(3)] + ) + # Use a column without a year to hurt column_dates signal + columns = [{"column_index": 0, "label": "No year"}] + stmt = _make_statement(rows=rows, columns=columns) + result = score_statement(stmt, "balance_sheet") + # The level should be one of the three valid levels + assert result["level"] in ("high", "medium", "low") + + def test_score_is_weighted_average(self): + """Verify the score is computed as weighted average of signals.""" + stmt = self._make_perfect_balance_sheet() + result = score_statement(stmt, 
"balance_sheet") + signals = result["signals"] + total_weight = sum(WEIGHTS.values()) + expected = sum(signals[k] * WEIGHTS[k] for k in WEIGHTS) / total_weight + assert result["score"] == pytest.approx(expected, abs=1e-9) + + def test_income_statement_scoring(self): + rows = ( + [_row(i, section="revenue") for i in range(5)] + + [_row(5 + i, section="operating_expenses") for i in range(10)] + ) + columns = [{"column_index": 0, "label": "2025"}] + stmt = _make_statement(rows=rows, columns=columns) + result = score_statement(stmt, "income_statement") + assert 0.0 <= result["score"] <= 1.0 + + def test_cash_flow_scoring(self): + rows = ( + [_row(i, section="operating_activities") for i in range(8)] + + [_row(8 + i, section="investing_activities") for i in range(5)] + + [_row(13 + i, section="financing_activities") for i in range(5)] + ) + columns = [{"column_index": 0, "label": "2025"}] + stmt = _make_statement(rows=rows, columns=columns) + result = score_statement(stmt, "cash_flow") + assert 0.0 <= result["score"] <= 1.0 + + +# --------------------------------------------------------------------------- +# Constants exported from module +# --------------------------------------------------------------------------- + +class TestModuleConstants: + + def test_weights_sum_to_1(self): + """All 6 weights should sum to 1.0.""" + assert sum(WEIGHTS.values()) == pytest.approx(1.0) + + def test_weights_has_6_keys(self): + assert len(WEIGHTS) == 6 + + def test_thresholds_has_high_and_medium(self): + assert "high" in THRESHOLDS + assert "medium" in THRESHOLDS + + def test_row_ranges_has_all_statement_types(self): + assert "balance_sheet" in ROW_RANGES + assert "income_statement" in ROW_RANGES + assert "cash_flow" in ROW_RANGES + + def test_section_groups_has_all_statement_types(self): + assert "balance_sheet" in SECTION_GROUPS + assert "income_statement" in SECTION_GROUPS + assert "cash_flow" in SECTION_GROUPS diff --git 
a/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_corrections.py b/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_corrections.py new file mode 100644 index 000000000..5ac8856fc --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_corrections.py @@ -0,0 +1,411 @@ +""" +Tests for extractor/corrections.py — covers apply_corrections(). + +Each test builds a minimal v1.2-style statement dict and verifies +that the corrections dict is applied correctly. +""" + +import pytest + +from extractor.corrections import apply_corrections + + +# --------------------------------------------------------------------------- +# Helpers +# --------------------------------------------------------------------------- + +def _make_statement(rows): + """Build a minimal v1.2 statement dict with the given rows.""" + return { + "rows": rows, + "columns": [], + "validation": {}, + } + + +def _make_row(row_index, label_raw, values=None, section="assets", + canonical_group="assets", row_type="line_item", + is_derived_total=False): + """Build a minimal row dict.""" + if values is None: + values = [] + return { + "row_index": row_index, + "label_raw": label_raw, + "values": values, + "section": section, + "canonical_group": canonical_group, + "row_type": row_type, + "is_derived_total": is_derived_total, + } + + +def _make_value_cell(raw, normalized, is_null=False, is_zero=None): + """Build a minimal value cell dict.""" + return { + "raw": raw, + "normalized": normalized, + "is_null": is_null, + "is_zero": is_zero, + } + + +# --------------------------------------------------------------------------- +# Test 1: label correction +# --------------------------------------------------------------------------- + +class TestApplyLabelCorrection: + def test_apply_label_correction(self): + """Correcting a label updates label_raw; other rows are unchanged.""" + rows = [ + _make_row(0, "Cassh and cash equivalents"), + _make_row(1, "Trade 
receivables"), + ] + statement = _make_statement(rows) + + corrections = { + "row_0": {"label": "Cash and cash equivalents"}, + } + + result = apply_corrections(statement, corrections) + + assert result["rows"][0]["label_raw"] == "Cash and cash equivalents" + # Row 1 must be untouched + assert result["rows"][1]["label_raw"] == "Trade receivables" + + def test_only_target_row_label_changes(self): + """Only the targeted row's label changes; siblings are not modified.""" + rows = [ + _make_row(0, "Inventorie"), + _make_row(1, "Prepaid expenses"), + _make_row(2, "Other assets"), + ] + statement = _make_statement(rows) + corrections = {"row_1": {"label": "Prepaid expenses (corrected)"}} + + result = apply_corrections(statement, corrections) + + assert result["rows"][0]["label_raw"] == "Inventorie" + assert result["rows"][1]["label_raw"] == "Prepaid expenses (corrected)" + assert result["rows"][2]["label_raw"] == "Other assets" + + +# --------------------------------------------------------------------------- +# Test 2: value correction re-parses +# --------------------------------------------------------------------------- + +class TestApplyValueCorrectionReparses: + def test_apply_value_correction_reparses(self): + """Correcting value '(3,200)' should re-parse to normalized=-3200.0.""" + rows = [ + _make_row(0, "Finance costs", values=[ + _make_value_cell("3,200", 3200.0), + ]), + ] + statement = _make_statement(rows) + corrections = {"row_0": {"val_0": "(3,200)"}} + + result = apply_corrections(statement, corrections) + + cell = result["rows"][0]["values"][0] + assert cell["raw"] == "(3,200)" + assert cell["normalized"] == -3200.0 + assert cell["is_null"] is False + assert cell["is_zero"] is False + + def test_positive_value_reparsed(self): + """A plain positive value is re-parsed correctly.""" + rows = [ + _make_row(0, "Revenue", values=[ + _make_value_cell("0", 0.0), + ]), + ] + statement = _make_statement(rows) + corrections = {"row_0": {"val_0": "1,500,000"}} + + 
result = apply_corrections(statement, corrections) + + cell = result["rows"][0]["values"][0] + assert cell["normalized"] == 1_500_000.0 + assert cell["is_null"] is False + assert cell["is_zero"] is False + + +# --------------------------------------------------------------------------- +# Test 3: section correction updates canonical_group +# --------------------------------------------------------------------------- + +class TestApplySectionCorrectionUpdatesGroup: + def test_apply_section_correction_updates_group(self): + """Correcting section to 'current_liabilities' sets canonical_group='liabilities'.""" + rows = [ + _make_row(0, "Trade payables", section="current_assets", + canonical_group="assets"), + ] + statement = _make_statement(rows) + corrections = {"row_0": {"section": "current_liabilities"}} + + result = apply_corrections(statement, corrections) + + assert result["rows"][0]["section"] == "current_liabilities" + assert result["rows"][0]["canonical_group"] == "liabilities" + + def test_non_current_liabilities_maps_to_liabilities(self): + """Section 'non_current_liabilities' maps to canonical_group 'liabilities'.""" + rows = [_make_row(0, "Long-term debt", section="current_assets", canonical_group="assets")] + statement = _make_statement(rows) + corrections = {"row_0": {"section": "non_current_liabilities"}} + + result = apply_corrections(statement, corrections) + + assert result["rows"][0]["canonical_group"] == "liabilities" + + +# --------------------------------------------------------------------------- +# Test 4: empty value sets null +# --------------------------------------------------------------------------- + +class TestApplyEmptyValueSetsNull: + def test_apply_empty_value_sets_null(self): + """Empty string correction sets is_null=True and normalized=None.""" + rows = [ + _make_row(0, "Deferred tax", values=[ + _make_value_cell("500", 500.0), + ]), + ] + statement = _make_statement(rows) + corrections = {"row_0": {"val_0": ""}} + + result = 
apply_corrections(statement, corrections) + + cell = result["rows"][0]["values"][0] + assert cell["raw"] is None + assert cell["normalized"] is None + assert cell["is_null"] is True + assert cell["is_zero"] is None + + def test_whitespace_only_value_sets_null(self): + """Whitespace-only correction string also sets is_null=True.""" + rows = [ + _make_row(0, "Other income", values=[ + _make_value_cell("100", 100.0), + ]), + ] + statement = _make_statement(rows) + corrections = {"row_0": {"val_0": " "}} + + result = apply_corrections(statement, corrections) + + cell = result["rows"][0]["values"][0] + assert cell["is_null"] is True + assert cell["normalized"] is None + + +# --------------------------------------------------------------------------- +# Test 5: dash value sets zero +# --------------------------------------------------------------------------- + +class TestApplyDashValueSetsZero: + def test_apply_dash_value_sets_zero(self): + """'-' correction → normalized=0.0, is_zero=True, is_null=False.""" + rows = [ + _make_row(0, "Goodwill", values=[ + _make_value_cell("500", 500.0), + ]), + ] + statement = _make_statement(rows) + corrections = {"row_0": {"val_0": "-"}} + + result = apply_corrections(statement, corrections) + + cell = result["rows"][0]["values"][0] + assert cell["normalized"] == 0.0 + assert cell["is_zero"] is True + assert cell["is_null"] is False + + def test_full_width_dash_sets_zero(self): + """Japanese full-width dash '－' also resolves to 0.0.""" + rows = [ + _make_row(0, "Investments", values=[ + _make_value_cell("1,000", 1000.0), + ]), + ] + statement = _make_statement(rows) + corrections = {"row_0": {"val_0": "－"}} + + result = apply_corrections(statement, corrections) + + cell = result["rows"][0]["values"][0] + assert cell["normalized"] == 0.0 + assert cell["is_zero"] is True + + +# --------------------------------------------------------------------------- +# Test 6: row_type correction +# 
--------------------------------------------------------------------------- + +class TestApplyRowTypeCorrection: + def test_apply_row_type_correction(self): + """Changing row_type to 'subtotal' sets is_derived_total=True.""" + rows = [ + _make_row(0, "Total current assets", row_type="line_item", + is_derived_total=False), + ] + statement = _make_statement(rows) + corrections = {"row_0": {"row_type": "subtotal"}} + + result = apply_corrections(statement, corrections) + + assert result["rows"][0]["row_type"] == "subtotal" + assert result["rows"][0]["is_derived_total"] is True + + def test_row_type_total_sets_is_derived_total(self): + """Changing row_type to 'total' also sets is_derived_total=True.""" + rows = [_make_row(0, "Total assets", row_type="line_item", is_derived_total=False)] + statement = _make_statement(rows) + corrections = {"row_0": {"row_type": "total"}} + + result = apply_corrections(statement, corrections) + + assert result["rows"][0]["is_derived_total"] is True + + def test_row_type_line_item_clears_is_derived_total(self): + """Changing row_type back to 'line_item' sets is_derived_total=False.""" + rows = [_make_row(0, "Mislabelled total", row_type="total", is_derived_total=True)] + statement = _make_statement(rows) + corrections = {"row_0": {"row_type": "line_item"}} + + result = apply_corrections(statement, corrections) + + assert result["rows"][0]["is_derived_total"] is False + + +# --------------------------------------------------------------------------- +# Test 7: section "other" maps to canonical_group "other" +# --------------------------------------------------------------------------- + +class TestApplyOtherSectionMapsToOtherGroup: + def test_apply_other_section_maps_to_other_group(self): + """Section='other' → canonical_group='other' (not in _SECTION_TO_GROUP).""" + rows = [ + _make_row(0, "Miscellaneous", section="assets", canonical_group="assets"), + ] + statement = _make_statement(rows) + corrections = {"row_0": {"section": "other"}} + 
+ result = apply_corrections(statement, corrections) + + assert result["rows"][0]["section"] == "other" + assert result["rows"][0]["canonical_group"] == "other" + + def test_unknown_section_also_maps_to_other_group(self): + """An unrecognised section string falls back to canonical_group='other'.""" + rows = [_make_row(0, "Exotic item", section="assets", canonical_group="assets")] + statement = _make_statement(rows) + corrections = {"row_0": {"section": "completely_unknown_section"}} + + result = apply_corrections(statement, corrections) + + assert result["rows"][0]["canonical_group"] == "other" + + +# --------------------------------------------------------------------------- +# Test 8: no corrections returns statement unchanged +# --------------------------------------------------------------------------- + +class TestNoCorrectionsReturnsUnchanged: + def test_no_corrections_returns_unchanged(self): + """Empty corrections dict returns the statement dict without modification.""" + rows = [ + _make_row(0, "Cash", values=[_make_value_cell("1,000", 1000.0)], + section="current_assets", canonical_group="assets"), + ] + statement = _make_statement(rows) + # Take a snapshot of values before calling + original_label = statement["rows"][0]["label_raw"] + original_normalized = statement["rows"][0]["values"][0]["normalized"] + + result = apply_corrections(statement, {}) + + assert result["rows"][0]["label_raw"] == original_label + assert result["rows"][0]["values"][0]["normalized"] == original_normalized + + def test_returns_same_object(self): + """apply_corrections returns the same dict object (in-place mutation).""" + rows = [_make_row(0, "Cash")] + statement = _make_statement(rows) + + result = apply_corrections(statement, {}) + + assert result is statement + + +# --------------------------------------------------------------------------- +# Test 9: invalid row key is silently ignored +# --------------------------------------------------------------------------- + +class 
TestInvalidRowKeyIgnored: + def test_invalid_row_key_ignored(self): + """A correction key that doesn't start with 'row_' is silently skipped.""" + rows = [_make_row(0, "Revenue")] + statement = _make_statement(rows) + corrections = {"invalid_key": {"label": "Should not apply"}} + + result = apply_corrections(statement, corrections) + + assert result["rows"][0]["label_raw"] == "Revenue" + + def test_row_prefix_with_non_integer_suffix_ignored(self): + """A key like 'row_abc' (non-integer suffix) is silently skipped.""" + rows = [_make_row(0, "Revenue")] + statement = _make_statement(rows) + corrections = {"row_abc": {"label": "Should not apply"}} + + result = apply_corrections(statement, corrections) + + assert result["rows"][0]["label_raw"] == "Revenue" + + def test_empty_key_ignored(self): + """An empty string key is silently skipped.""" + rows = [_make_row(0, "Revenue")] + statement = _make_statement(rows) + corrections = {"": {"label": "Should not apply"}} + + result = apply_corrections(statement, corrections) + + assert result["rows"][0]["label_raw"] == "Revenue" + + +# --------------------------------------------------------------------------- +# Test 10: nonexistent row index is silently ignored +# --------------------------------------------------------------------------- + +class TestNonexistentRowIndexIgnored: + def test_nonexistent_row_index_ignored(self): + """Correction for row_999 (not in rows) is silently skipped.""" + rows = [_make_row(0, "Cash")] + statement = _make_statement(rows) + corrections = {"row_999": {"label": "Ghost row"}} + + result = apply_corrections(statement, corrections) + + assert result["rows"][0]["label_raw"] == "Cash" + assert len(result["rows"]) == 1 + + def test_correction_applies_when_row_exists_skips_when_not(self): + """Mixed corrections: existing row updated, missing row skipped.""" + rows = [ + _make_row(0, "Cash"), + _make_row(1, "Receivables"), + ] + statement = _make_statement(rows) + corrections = { + "row_0": {"label": 
"Cash and equivalents"}, + "row_50": {"label": "Should be ignored"}, + } + + result = apply_corrections(statement, corrections) + + assert result["rows"][0]["label_raw"] == "Cash and equivalents" + assert result["rows"][1]["label_raw"] == "Receivables" + assert len(result["rows"]) == 2 diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_dataverse_batch.py b/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_dataverse_batch.py new file mode 100644 index 000000000..653518441 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_dataverse_batch.py @@ -0,0 +1,34 @@ +import sys +import os +sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..')) + +from extractor.dataverse_client import build_batch_payload, build_create_request + + +def test_build_create_request(): + req = build_create_request("cree1_extractionjobs", {"cree1_jobid": "j1"}, "1") + assert req["method"] == "POST" + assert req["url"] == "cree1_extractionjobs" + assert req["body"]["cree1_jobid"] == "j1" + + +def test_build_create_request_with_lookup(): + req = build_create_request( + "cree1_extractedlineitems", + {"cree1_lineitemname": "Cash"}, + "1", + lookups={"cree1_ExtractionJob@odata.bind": "/cree1_extractionjobs(abc)"}, + ) + assert req["body"]["cree1_ExtractionJob@odata.bind"] == "/cree1_extractionjobs(abc)" + + +def test_build_batch_payload(): + requests = [ + build_create_request("cree1_extractedlineitems", {"cree1_lineitemname": f"item_{i}"}, str(i)) + for i in range(3) + ] + boundary, body = build_batch_payload(requests) + assert boundary in body + assert body.count("POST cree1_extractedlineitems") == 3 + assert "item_0" in body + assert "item_2" in body diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_dataverse_parser.py b/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_dataverse_parser.py new file mode 100644 index 000000000..7efe43098 --- /dev/null +++ 
b/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_dataverse_parser.py @@ -0,0 +1,100 @@ +import json +import sys +import os +sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..')) + +from extractor.dataverse_parser import parse_job_row, parse_statement_row, parse_line_item_rows + +SAMPLE_RESULT = { + "balance_sheet": { + "schema_version": "1.2.0", + "document_metadata": { + "source_file_name": "test.pdf", "company_name": "Acme Corp", + "company_name_raw": "Acme Corporation", "report_type": "other", + "report_language": "en", "source_country": None, "source_exchange": None, + "ticker": None, "identifier": None, "source_file_hash": None, + }, + "statement_metadata": { + "statement_type": "balance_sheet", "statement_title": "Consolidated Balance Sheet", + "statement_title_raw": "Consolidated Balance Sheet", "accounting_standard": "IFRS", + "currency": "USD", "currency_symbol": "$", "unit": "millions", "unit_raw": None, + "is_consolidated": True, "is_audited": True, + "page_range": {"start": 10, "end": 12}, "bbox_coordinate_system": "normalized_0_1", + }, + "columns": [ + {"column_index": 0, "label": "2023", "label_raw": "2023", "period_type": "instant", "fiscal_year": 2023, "fiscal_quarter": None, "start_date": None, "end_date": None, "is_comparative": False}, + {"column_index": 1, "label": "2024", "label_raw": "2024", "period_type": "instant", "fiscal_year": 2024, "fiscal_quarter": None, "start_date": None, "end_date": None, "is_comparative": True}, + ], + "rows": [{ + "row_index": 0, "label_raw": "Total Assets", "label_normalized": "Total Assets", "label_language": "en", + "canonical_key": "total_assets", "canonical_group": "assets", "row_type": "total", + "indent_level": 0, "section": "assets", "parent_canonical_key": None, "sign_hint": None, + "is_derived_total": False, "is_required_anchor": True, "source_page": 10, "source_bbox": None, + "values": [ + {"raw": "1,000", "normalized": 1000.0, "is_null": False, "is_zero": False, 
"value_kind": "currency", "confidence": 0.95, "column_index": 0}, + {"raw": "1,200", "normalized": 1200.0, "is_null": False, "is_zero": False, "value_kind": "currency", "confidence": 0.98, "column_index": 1}, + ], + }], + "validation": {"status": "passed", "warnings": [], "errors": []}, + }, + "summary": [{"statement_type": "balance_sheet", "status": "extracted", "row_count": 1, "column_count": 2}], + "confidence": {"balance_sheet": {"score": 0.95, "level": "high"}}, +} + + +def test_parse_job_row(): + row = parse_job_row("job-123", SAMPLE_RESULT) + assert row["cree1_jobid"] == "job-123" + assert row["cree1_companyname"] == "Acme Corp" + assert row["cree1_reportlanguage"] == "en" + assert row["cree1_currency"] == "USD" + assert row["cree1_currencyunit"] == "millions" + assert row["cree1_statementsfound"] == 1 + assert row["cree1_status"] == 833060002 + assert row["cree1_avgconfidence"] == 0.95 + + +def test_parse_statement_row(): + row = parse_statement_row("balance_sheet", SAMPLE_RESULT["balance_sheet"]) + assert row["cree1_statementtitle"] == "Consolidated Balance Sheet" + assert row["cree1_statementname"] == "balance_sheet" + assert row["cree1_statementtype"] == 833060001 + assert row["cree1_pagerangestart"] == 10 + assert row["cree1_pagerangeend"] == 12 + assert row["cree1_isconsolidated"] is True + assert row["cree1_isaudited"] is True + assert row["cree1_reviewcomplete"] is False + assert "schema_version" in row["cree1_rawstatementjson"] + + +def test_parse_line_item_rows(): + items = parse_line_item_rows("balance_sheet", SAMPLE_RESULT["balance_sheet"]) + assert len(items) == 2 + assert items[0]["cree1_lineitemname"] == "Total Assets" + assert items[0]["cree1_rowindex"] == 0 + assert items[0]["cree1_rowtype"] == 833060003 # Total + assert items[0]["cree1_canonicalkey"] == "total_assets" + assert items[0]["cree1_period"] == "2023" + assert items[0]["cree1_valueraw"] == "1,000" + assert items[0]["cree1_valuenormalized"] == 1000.0 + assert 
items[0]["cree1_aiconfidence"] == 0.95 + assert items[0]["cree1_reviewstatus"] == 833060000 + assert items[1]["cree1_period"] == "2024" + assert items[1]["cree1_valuenormalized"] == 1200.0 + + +def test_parse_line_item_rows_null_values(): + stmt = { + **SAMPLE_RESULT["balance_sheet"], + "rows": [{ + "row_index": 0, "label_raw": "Header", "label_normalized": "Header", "label_language": "en", + "canonical_key": "header", "canonical_group": "assets", "row_type": "section_header", + "indent_level": 0, "section": "assets", "parent_canonical_key": None, "sign_hint": None, + "is_derived_total": False, "is_required_anchor": False, "source_page": None, "source_bbox": None, + "values": [{"raw": None, "normalized": None, "is_null": True, "is_zero": None, "value_kind": "currency", "confidence": None, "column_index": 0}], + }], + } + items = parse_line_item_rows("balance_sheet", stmt) + assert len(items) == 1 + assert items[0]["cree1_valuenormalized"] is None + assert items[0]["cree1_rowtype"] == 833060000 # SectionHeader diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_llm_reconciler.py b/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_llm_reconciler.py new file mode 100644 index 000000000..530fa64d0 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_llm_reconciler.py @@ -0,0 +1,311 @@ +""" +Tests for extractor/llm_reconciler.py — covers: + Fix 1: LLM row index validation + Fix 2: rows[] / cells[] sync after ghost swap + Fix 4: Heuristic-first matching before LLM calls +""" + +import json +from unittest.mock import patch, MagicMock + +from extractor.llm_reconciler import ( + suppress_noise_rows, + reconcile_suspect_ghost, + complete_truncated_labels, + _heuristic_match_ghosts, + _heuristic_complete_labels, + _apply_ghost_matches, + _apply_label_completions, + _build_grid, +) + + +# --------------------------------------------------------------------------- +# Helpers — build minimal 
rows/columns/cells triples for testing +# --------------------------------------------------------------------------- + +def _make_cells(data: list[tuple[int, str, list[str]]]) -> tuple[list[str], list[str], list[dict]]: + """ + Build (rows, columns, cells) from a compact description. + + data: list of (row_index, label, [val1, val2, ...]) + """ + rows = [] + columns = ["Period 1", "Period 2"] + cells = [] + for row_idx, label, vals in data: + rows.append(label) + cells.append({"row": row_idx, "col": 0, "content": label, "kind": "content"}) + for vi, val in enumerate(vals): + cells.append({"row": row_idx, "col": vi + 1, "content": val, "kind": "content"}) + return rows, columns, cells + + +# =========================================================================== +# Fix 1: LLM row index validation +# =========================================================================== + +class TestApplyGhostMatchesValidation: + """_apply_ghost_matches must reject invalid row indices and duplicates.""" + + def test_rejects_suspect_row_not_in_valid_set(self): + """An LLM-returned suspect_row outside the known set is rejected.""" + rows, cols, cells = _make_cells([ + (0, "Total revenue", ["100", "200"]), + (1, "fragment text", ["50", "60"]), # suspect + (2, "Real Label", []), # ghost + ]) + grid = _build_grid(cells) + valid_suspects = {1} + valid_ghosts = {2} + + # LLM returns row 0 as suspect — that's a real row, not a suspect. + matches = [{"suspect_row": 0, "ghost_row": 2}] + drops = _apply_ghost_matches( + matches, valid_suspects, valid_ghosts, grid, rows, cells, "test", + ) + assert drops == set() # rejected — row 0 not in valid_suspects + # Original label on row 0 unchanged. 
+ assert rows[0] == "Total revenue" + + def test_rejects_ghost_row_not_in_valid_set(self): + """An LLM-returned ghost_row outside the known set is rejected.""" + rows, cols, cells = _make_cells([ + (0, "Total revenue", ["100", "200"]), + (1, "fragment text", ["50", "60"]), + (2, "Real Label", []), + ]) + grid = _build_grid(cells) + valid_suspects = {1} + valid_ghosts = {2} + + # LLM returns row 0 as ghost — that's a data row, not a ghost. + matches = [{"suspect_row": 1, "ghost_row": 0}] + drops = _apply_ghost_matches( + matches, valid_suspects, valid_ghosts, grid, rows, cells, "test", + ) + assert drops == set() # rejected — row 0 not in valid_ghosts + + def test_rejects_duplicate_ghost_target(self): + """Two suspects cannot map to the same ghost row.""" + rows, cols, cells = _make_cells([ + (0, "frag_a", ["10", "20"]), # suspect + (1, "frag_b", ["30", "40"]), # suspect + (2, "True Label", []), # ghost + ]) + grid = _build_grid(cells) + valid_suspects = {0, 1} + valid_ghosts = {2} + + matches = [ + {"suspect_row": 0, "ghost_row": 2}, + {"suspect_row": 1, "ghost_row": 2}, # duplicate ghost + ] + drops = _apply_ghost_matches( + matches, valid_suspects, valid_ghosts, grid, rows, cells, "test", + ) + # First match accepted, second rejected as duplicate. 
+ assert drops == {2} + assert rows[0] == "True Label" + assert rows[1] == "frag_b" # unchanged — duplicate was rejected + + def test_accepts_valid_match(self): + """A valid match within both sets is applied correctly.""" + rows, cols, cells = _make_cells([ + (0, "fragment", ["50", "60"]), + (1, "Correct Label", []), + ]) + grid = _build_grid(cells) + + matches = [{"suspect_row": 0, "ghost_row": 1}] + drops = _apply_ghost_matches( + matches, {0}, {1}, grid, rows, cells, "test", + ) + assert drops == {1} + assert rows[0] == "Correct Label" + + +# =========================================================================== +# Fix 2: rows[] / cells[] sync after ghost swap +# =========================================================================== + +class TestRowsCellsSync: + """After reconcile_suspect_ghost, rows[] and cells[] must agree.""" + + @patch("extractor.llm_reconciler._get_client") + def test_rows_and_cells_in_sync_after_reconcile(self, mock_get_client): + """Both rows[i] and cells label for row i must contain the ghost label.""" + # Set up a mock LLM response that maps suspect→ghost. + mock_response = MagicMock() + mock_response.choices = [MagicMock()] + mock_response.choices[0].message.content = json.dumps({ + "matches": [{"suspect_row": 1, "ghost_row": 2}] + }) + mock_client = MagicMock() + mock_client.chat.completions.create.return_value = mock_response + mock_get_client.return_value = mock_client + + rows, cols, cells = _make_cells([ + (0, "Revenue", ["100", "200"]), + (1, "securities borrowing", ["50", "60"]), # suspect (lowercase) + (2, "Net decrease in receivables", []), # ghost (no values) + (3, "Depreciation", ["30", "40"]), + ]) + + new_rows, new_cols, new_cells = reconcile_suspect_ghost( + "cash_flow", rows, cols, cells, + ) + + # Find the label in cells for what was row 1. + cell_labels = {c["row"]: c["content"] for c in new_cells if c["col"] == 0} + + # rows and cells must agree on the swapped label. 
+ assert "Net decrease in receivables" in new_rows + assert cell_labels[1] == "Net decrease in receivables" + + +# =========================================================================== +# Fix 4: Heuristic-first matching +# =========================================================================== + +class TestHeuristicMatchGhosts: + """_heuristic_match_ghosts resolves unambiguous pairs without LLM.""" + + def test_single_suspect_single_ghost(self): + """Trivial case: 1 suspect + 1 ghost → matched directly.""" + suspects = [{"row": 5, "label": "borrowing transactions", "values": ["100"]}] + ghosts = [{"row": 3, "label": "Net decrease in receivables under"}] + + matches, remaining_s, remaining_g = _heuristic_match_ghosts(suspects, ghosts) + + assert len(matches) == 1 + assert matches[0] == {"suspect_row": 5, "ghost_row": 3} + assert remaining_s == [] + assert remaining_g == [] + + def test_substring_match(self): + """Fragment is a substring of exactly one ghost label → matched.""" + suspects = [ + {"row": 5, "label": "liabilities", "values": ["100"]}, + {"row": 8, "label": "securities", "values": ["200"]}, + ] + ghosts = [ + {"row": 3, "label": "Increase and decrease in derivative assets and liabilities"}, + {"row": 7, "label": "Purchases of investment securities for banking"}, + ] + + matches, remaining_s, remaining_g = _heuristic_match_ghosts(suspects, ghosts) + + assert len(matches) == 2 + assert {"suspect_row": 5, "ghost_row": 3} in matches + assert {"suspect_row": 8, "ghost_row": 7} in matches + + def test_ambiguous_not_matched(self): + """Fragment matches multiple ghosts → left for LLM.""" + suspects = [{"row": 5, "label": "securities", "values": ["100"]}] + ghosts = [ + {"row": 3, "label": "Purchases of investment securities for banking"}, + {"row": 7, "label": "Proceeds from sales of investment securities"}, + ] + + matches, remaining_s, remaining_g = _heuristic_match_ghosts(suspects, ghosts) + + assert len(matches) == 0 + assert len(remaining_s) == 1 
+ assert len(remaining_g) == 2 + + @patch("extractor.llm_reconciler._get_client") + def test_heuristic_skips_llm_when_all_resolved(self, mock_get_client): + """When heuristics resolve everything, no LLM call is made.""" + rows, cols, cells = _make_cells([ + (0, "Revenue", ["100", "200"]), + (1, "liabilities", ["50", "60"]), # suspect + (2, "Increase in derivative assets and liabilities", []), # ghost + ]) + + reconcile_suspect_ghost("balance_sheet", rows, cols, cells) + + # LLM client should never have been called. + mock_get_client.assert_not_called() + + +class TestHeuristicCompleteLabels: + """_heuristic_complete_labels resolves known IFRS phrases from lookup.""" + + def test_known_phrase_completed(self): + truncated = [ + {"row": 5, "label": "Proceeds from sales and redemption of investment"}, + ] + completions, remaining = _heuristic_complete_labels(truncated) + + assert len(completions) == 1 + assert "securities" in completions[0]["completed_label"] + assert remaining == [] + + def test_unknown_phrase_left_for_llm(self): + truncated = [ + {"row": 5, "label": "Some unusual financial label ending with and"}, + ] + completions, remaining = _heuristic_complete_labels(truncated) + + assert completions == [] + assert len(remaining) == 1 + + +class TestApplyLabelCompletionsValidation: + """_apply_label_completions must reject row indices not in valid set.""" + + def test_rejects_invalid_row(self): + rows, cols, cells = _make_cells([ + (0, "Revenue", ["100"]), + (1, "Proceeds from sales and", ["200"]), + ]) + grid = _build_grid(cells) + valid_rows = {1} # only row 1 was truncated + + # Completion targets row 0 — should be rejected. 
+ completions = [{"row": 0, "completed_label": "CORRUPTED"}] + _apply_label_completions(completions, valid_rows, grid, rows, cells, "test") + + assert rows[0] == "Revenue" # unchanged + + def test_accepts_valid_row(self): + rows, cols, cells = _make_cells([ + (0, "Revenue", ["100"]), + (1, "Proceeds from sales and", ["200"]), + ]) + grid = _build_grid(cells) + valid_rows = {1} + + completions = [{"row": 1, "completed_label": "Proceeds from sales and redemption"}] + _apply_label_completions(completions, valid_rows, grid, rows, cells, "test") + + assert rows[1] == "Proceeds from sales and redemption" + + +# =========================================================================== +# Noise suppression (existing functionality — regression test) +# =========================================================================== + +class TestSuppressNoiseRows: + """suppress_noise_rows removes syntactically impossible labels.""" + + def test_removes_yen_header(self): + rows, cols, cells = _make_cells([ + (0, "(Yen)", []), + (1, "Revenue", ["100", "200"]), + ]) + new_rows, _, new_cells = suppress_noise_rows(rows, cols, cells) + labels = [c["content"] for c in new_cells if c["col"] == 0] + assert "(Yen)" not in labels + assert "Revenue" in labels + + def test_keeps_real_ghost_row(self): + """A row with a proper label but no values must NOT be removed.""" + rows, cols, cells = _make_cells([ + (0, "Total Assets", []), + (1, "Revenue", ["100"]), + ]) + new_rows, _, new_cells = suppress_noise_rows(rows, cols, cells) + labels = [c["content"] for c in new_cells if c["col"] == 0] + assert "Total Assets" in labels diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_pdfplumber_backend.py b/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_pdfplumber_backend.py new file mode 100644 index 000000000..9f05c86e8 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_pdfplumber_backend.py @@ -0,0 +1,299 @@ +"""Tests 
for pdfplumber backend — client, adapter, and integration.""" +import os +import json +from pathlib import Path +from unittest.mock import patch, MagicMock + +import pytest + + +# --- Client tests --- + +class TestPdfplumberClient: + def test_extracts_pages_and_tables(self): + """Test with the actual Xiamen PDF if available.""" + pdf_path = Path(__file__).parent.parent.parent / "docs" / "samples" / "Xiamen ITG Group Corp.,Ltd_QR_2025-09-30T00_00_00_Chinese.pdf" + if not pdf_path.exists(): + pytest.skip("Xiamen sample PDF not available") + + from extractor.pdfplumber_client import extract_document + result = extract_document(str(pdf_path)) + + assert "pages" in result + assert len(result["pages"]) == 16 + + # Check that tables were found + total_tables = sum(len(p["tables"]) for p in result["pages"]) + assert total_tables > 10 # Xiamen has ~20 tables + + # Check page 7 has balance sheet data + page7 = result["pages"][6] + assert len(page7["tables"]) > 0 + first_table = page7["tables"][0] + # Should have Chinese headers + header = first_table[0] + assert any("项目" in str(cell) or "2025" in str(cell) for cell in header) + + +# --- Adapter tests --- + +class TestTableToHtml: + def test_converts_simple_table(self): + from extractor.pdfplumber_adapter import _table_to_html + + table = [ + ["项目", "2025年9月30日", "2024年12月31日"], + ["货币资金", "9,228,601,780.33", "8,479,400,167.04"], + ["交易性金融资产", "10,355,932,397.72", "3,410,092,562.51"], + ] + html = _table_to_html(table) + + assert "
<table>" in html + assert "<td>项目</td>" in html + assert "<td>货币资金</td>" in html + assert "<td>9,228,601,780.33</td>" in html + + def test_handles_none_cells(self): + from extractor.pdfplumber_adapter import _table_to_html + + table = [ + ["项目", "2025", "2024"], + ["流动资产:", None, None], + ] + html = _table_to_html(table) + assert "<td>流动资产:</td>" in html + assert "<td></td>" in html # None becomes empty string + + def test_replaces_newlines_in_cells(self): + from extractor.pdfplumber_adapter import _table_to_html + + table = [ + ["项目", "2025年前三季度\n(1-9月)"], + ["营业收入", "100"], + ] + html = _table_to_html(table) + assert "\n" not in html.split("<table>")[1] # No newlines inside table
+ + def test_empty_table_returns_empty_string(self): + from extractor.pdfplumber_adapter import _table_to_html + + assert _table_to_html([]) == "" + assert _table_to_html([[]]) == "" + + def test_single_row_table(self): + from extractor.pdfplumber_adapter import _table_to_html + + table = [["Header1", "Header2"]] + html = _table_to_html(table) + assert "<td>Header1</td>" in html + assert "<td>Header2</td>" in html + assert "<th>" not in html
+ + class TestReconstructMarkdown: + def test_produces_markdown_with_tables(self): + from extractor.pdfplumber_adapter import reconstruct_markdown + + result = { + "pages": [ + { + "page_number": 1, + "text": "CONSOLIDATED BALANCE SHEET\nCompany XYZ", + "tables": [ + [["Item", "2025", "2024"], ["Assets", "100", "90"]], + ], + } + ] + } + md = reconstruct_markdown(result) + + assert "<!-- Page 1 -->" in md + assert "CONSOLIDATED BALANCE SHEET" in md + assert "<table>" in md + assert "Assets" in md
+ + def test_page_markers_present_for_each_page(self): + from extractor.pdfplumber_adapter import reconstruct_markdown + + result = { + "pages": [ + {"page_number": 1, "text": "page one", "tables": []}, + {"page_number": 2, "text": "page two", "tables": []}, + ] + } + md = reconstruct_markdown(result) + assert "<!-- Page 1 -->" in md + assert "<!-- Page 2 -->" in md + + def test_empty_pages_produce_markers_only(self): + from extractor.pdfplumber_adapter import reconstruct_markdown + + result = { + "pages": [ + {"page_number": 1, "text": "", "tables": []}, + ] + } + md = reconstruct_markdown(result) + assert "<!-- Page 1 -->" in md + assert "<table>
" not in md + + def test_multiple_tables_on_one_page(self): + from extractor.pdfplumber_adapter import reconstruct_markdown + + result = { + "pages": [ + { + "page_number": 1, + "text": "Some text", + "tables": [ + [["A", "B"], ["1", "2"]], + [["C", "D"], ["3", "4"]], + ], + } + ] + } + md = reconstruct_markdown(result) + assert md.count("<table>
") == 2 + + +class TestBuildPageMap: + def test_builds_correct_page_map(self): + from extractor.pdfplumber_adapter import reconstruct_markdown, build_page_map + + result = { + "pages": [ + {"page_number": 1, "text": "page one", "tables": []}, + {"page_number": 2, "text": "page two", "tables": []}, + ] + } + md = reconstruct_markdown(result) + page_map = build_page_map(result, md) + + assert len(page_map) == 2 + for start, end, page_num in page_map: + assert start < end + assert page_num >= 1 + + def test_page_map_covers_full_markdown(self): + from extractor.pdfplumber_adapter import reconstruct_markdown, build_page_map + + result = { + "pages": [ + {"page_number": 1, "text": "page one", "tables": []}, + {"page_number": 2, "text": "page two", "tables": []}, + {"page_number": 3, "text": "page three", "tables": []}, + ] + } + md = reconstruct_markdown(result) + page_map = build_page_map(result, md) + + assert len(page_map) == 3 + # First page starts at 0 + assert page_map[0][0] == 0 + # Last page ends at len(markdown) + assert page_map[-1][1] == len(md) + # Pages are contiguous + for i in range(len(page_map) - 1): + assert page_map[i][1] == page_map[i + 1][0] + + def test_single_page_map(self): + from extractor.pdfplumber_adapter import reconstruct_markdown, build_page_map + + result = { + "pages": [ + {"page_number": 1, "text": "only page", "tables": []}, + ] + } + md = reconstruct_markdown(result) + page_map = build_page_map(result, md) + + assert len(page_map) == 1 + assert page_map[0][2] == 1 + assert page_map[0][0] == 0 + assert page_map[0][1] == len(md) + + +class TestClassifyStatementsWithLlm: + def test_returns_empty_when_llm_unavailable(self): + """Should gracefully return empty list when LLM is not configured.""" + from extractor.pdfplumber_adapter import classify_statements_with_llm + + with patch("extractor.pdfplumber_adapter.logger"): + # This will fail because env vars are not set in test + result = classify_statements_with_llm("some markdown text") + 
assert isinstance(result, list) + + +# --- Integration with real PDF --- + +class TestPdfplumberIntegration: + def test_full_extraction_on_xiamen(self): + """Run full pdfplumber extraction on Xiamen PDF and verify quality.""" + pdf_path = Path(__file__).parent.parent.parent / "docs" / "samples" / "Xiamen ITG Group Corp.,Ltd_QR_2025-09-30T00_00_00_Chinese.pdf" + if not pdf_path.exists(): + pytest.skip("Xiamen sample PDF not available") + + from extractor.pdfplumber_client import extract_document + from extractor.pdfplumber_adapter import reconstruct_markdown, build_page_map + + result = extract_document(str(pdf_path)) + markdown = reconstruct_markdown(result) + page_map = build_page_map(result, markdown) + + # Verify Chinese characters are preserved + assert "项目" in markdown + assert "货币资金" in markdown + assert "营业收入" in markdown or "营业总收入" in markdown + + # Verify numbers are preserved + assert "9,228,601,780.33" in markdown or "9,741,696,217.26" in markdown + + # Verify HTML tables present + assert "
<table>" in markdown + assert markdown.count("<table>
") > 10 + + # Verify page map + assert len(page_map) == 16 + + +# --- Analyze stage integration --- + +class TestAnalyzeStageIntegration: + def test_run_analyze_dispatches_to_pdfplumber(self): + """Verify that run_analyze routes to pdfplumber when backend='pdfplumber'.""" + from extractor.stages.contracts import PipelineOptions + + with patch("extractor.stages.analyze._run_analyze_pdfplumber") as mock_plumber: + mock_plumber.return_value = MagicMock() + from extractor.stages.analyze import run_analyze + + opts = PipelineOptions(backend="pdfplumber") + run_analyze("/tmp/test.pdf", opts) + + mock_plumber.assert_called_once_with("/tmp/test.pdf", opts) + + def test_run_analyze_still_dispatches_to_textract(self): + """Verify textract dispatch still works after pdfplumber addition.""" + from extractor.stages.contracts import PipelineOptions + + with patch("extractor.stages.analyze._run_analyze_textract") as mock_textract: + mock_textract.return_value = MagicMock() + from extractor.stages.analyze import run_analyze + + opts = PipelineOptions(backend="textract") + run_analyze("/tmp/test.pdf", opts) + + mock_textract.assert_called_once_with("/tmp/test.pdf", opts) + + def test_run_analyze_still_dispatches_to_cu(self): + """Verify CU dispatch still works (default).""" + from extractor.stages.contracts import PipelineOptions + + with patch("extractor.stages.analyze._run_analyze_cu") as mock_cu: + mock_cu.return_value = MagicMock() + from extractor.stages.analyze import run_analyze + + opts = PipelineOptions(backend="cu") + run_analyze("/tmp/test.pdf", opts) + + mock_cu.assert_called_once_with("/tmp/test.pdf", opts) diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_pipeline.py b/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_pipeline.py new file mode 100644 index 000000000..c5b9e8b0c --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_pipeline.py @@ -0,0 +1,94 @@ +"""Integration test for the 
pipeline orchestrator.""" +import pytest +from unittest.mock import patch, MagicMock +from extractor.stages.contracts import PipelineOptions + + +def _mock_locator_response(): + """Minimal CU Locator response with 3 statements.""" + def _stmt(stype, title, page_s, page_e): + return {"valueObject": { + "statement_type": {"valueString": stype}, + "title_raw": {"valueString": title}, + "title_english": {"valueString": title}, + "page_start": {"valueInteger": page_s}, + "page_end": {"valueInteger": page_e}, + "company_name": {"valueString": "Test Corp"}, + "is_consolidated": {"valueBoolean": True}, + "currency": {"valueString": "USD"}, + "unit": {"valueString": "millions"}, + }} + + md_tables = ( + "Page 1 content\n" + "Consolidated Balance Sheet\n" + "
" + + "".join(f"" for i in range(20)) + + "
Item2025
Row {i}{i * 100}
\n" + "Consolidated Income Statement\n" + "" + + "".join(f"" for i in range(12)) + + "
Item2025
Rev {i}{i * 50}
\n" + "Consolidated Cash Flow Statement\n" + "" + + "".join(f"" for i in range(15)) + + "
Item2025
CF {i}{i * 30}
\n" + ) + + return { + "result": { + "contents": [{ + "fields": { + "statements": {"valueArray": [ + _stmt("balance_sheet", "Consolidated Balance Sheet", 1, 1), + _stmt("income_statement", "Consolidated Income Statement", 1, 1), + _stmt("cash_flow", "Consolidated Cash Flow Statement", 1, 1), + ]} + }, + "markdown": md_tables, + "pages": [{"pageNumber": 1, "spans": [{"offset": 0, "length": len(md_tables)}]}], + }] + } + } + + +class TestPipelineRun: + @patch("extractor.cu_client.analyze_document") + def test_basic_pipeline_run(self, mock_analyze): + mock_analyze.return_value = _mock_locator_response() + + from extractor.pipeline import run + options = PipelineOptions( + use_enrichment=False, + requested_types=["balance_sheet", "income_statement", "cash_flow"], + source_file_name="test.pdf", + ) + result = run("/tmp/test.pdf", options) + + # Basic structure checks + assert "summary" in result + assert "confidence" in result + assert len(result["summary"]) == 3 + + # At least balance_sheet should be extracted (it has 20 rows) + bs = result.get("balance_sheet") + if bs: + assert "rows" in bs + assert "columns" in bs + + @patch("extractor.cu_client.analyze_document") + def test_pipeline_preserves_output_shape(self, mock_analyze): + """Output dict must have the exact keys the HTTP layer expects.""" + mock_analyze.return_value = _mock_locator_response() + + from extractor.pipeline import run + result = run("/tmp/test.pdf", PipelineOptions(use_enrichment=False)) + + required_keys = {"summary", "balance_sheet", "income_statement", "cash_flow", "confidence"} + assert required_keys.issubset(result.keys()) + + # Summary entries have required fields + for entry in result["summary"]: + assert "statement_type" in entry + assert "status" in entry + assert "page_range" in entry diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_stage_analyze.py b/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_stage_analyze.py new file mode 100644 
index 000000000..d1c2b7cb8 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_stage_analyze.py @@ -0,0 +1,91 @@ +"""Tests for Stage 1: Analyze.""" +import pytest +from extractor.stages.contracts import CandidateStatement, PipelineOptions +from extractor.stages.analyze import parse_locator_statements + + +# --------------------------------------------------------------------------- +# Fixtures +# --------------------------------------------------------------------------- + +def _mock_locator_response(statements: list[dict]) -> dict: + """Build a minimal CU Locator response.""" + value_array = [] + for s in statements: + obj = {} + for key, val in s.items(): + if isinstance(val, bool): + obj[key] = {"valueBoolean": val} + elif isinstance(val, str): + obj[key] = {"valueString": val} + elif isinstance(val, int): + obj[key] = {"valueInteger": val} + value_array.append({"valueObject": obj}) + + return { + "result": { + "contents": [{ + "fields": { + "statements": {"valueArray": value_array} + }, + "markdown": "# Test", + "pages": [], + }] + } + } + + +# --------------------------------------------------------------------------- +# Tests +# --------------------------------------------------------------------------- + +class TestParseLocatorStatements: + def test_empty_response(self): + assert parse_locator_statements({}) == [] + assert parse_locator_statements({"result": {"contents": []}}) == [] + + def test_single_statement(self): + resp = _mock_locator_response([{ + "statement_type": "balance_sheet", + "title_raw": "Consolidated Balance Sheet", + "title_english": "Consolidated Balance Sheet", + "page_start": 45, + "page_end": 46, + "company_name": "Acme Corp", + "is_consolidated": True, + "currency": "USD", + }]) + candidates = parse_locator_statements(resp) + assert len(candidates) == 1 + c = candidates[0] + assert c.statement_type == "balance_sheet" + assert c.page_start == 45 + assert c.page_end == 46 + assert c.is_consolidated is 
True + assert c.currency == "USD" + + def test_multiple_statements(self): + resp = _mock_locator_response([ + {"statement_type": "balance_sheet", "page_start": 10, "page_end": 11}, + {"statement_type": "income_statement", "page_start": 12, "page_end": 13}, + {"statement_type": "cash_flow", "page_start": 14, "page_end": 16}, + ]) + candidates = parse_locator_statements(resp) + assert len(candidates) == 3 + types = [c.statement_type for c in candidates] + assert "balance_sheet" in types + assert "income_statement" in types + assert "cash_flow" in types + + def test_to_dict_round_trip(self): + resp = _mock_locator_response([{ + "statement_type": "cash_flow", + "title_raw": "Cash Flow", + "page_start": 5, + "page_end": 7, + }]) + c = parse_locator_statements(resp)[0] + d = c.to_dict() + assert d["statement_type"] == "cash_flow" + assert d["page_start"] == 5 + assert d["page_end"] == 7 diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_stage_extract.py b/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_stage_extract.py new file mode 100644 index 000000000..bb3e860d1 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_stage_extract.py @@ -0,0 +1,142 @@ +"""Tests for Stage 3: Extract — full-markdown search + anti-contamination merging.""" +import pytest +from extractor.stages.contracts import CandidateStatement, SelectResult, ScoredCandidate +from extractor.stages.extract import ( + _find_heading_offset, + _merge_continuation_tables, + locate_table, +) + + +# --------------------------------------------------------------------------- +# Helpers +# --------------------------------------------------------------------------- + +def _candidate(stype, title_raw="", page_start=0, page_end=0) -> CandidateStatement: + return CandidateStatement( + statement_type=stype, + title_raw=title_raw, + page_start=page_start, + page_end=page_end, + ) + + +# 
---------------------------------------------------------------------------
+# _find_heading_offset (full-markdown search)
+# ---------------------------------------------------------------------------

+class TestFindHeadingOffset:
+    def test_finds_known_heading_pattern(self):
+        md = "Some preamble\nConsolidated Balance Sheet\n<table>...</table>"
+        offset = _find_heading_offset("balance_sheet", "", md)
+        assert offset is not None
+        assert "Balance Sheet" in md[offset:offset + 50]
+
+    def test_finds_chinese_heading(self):
+        md = "Preamble\n合并资产负债表\n<table>data</table>"
+        offset = _find_heading_offset("balance_sheet", "Some Other Title", md)
+        assert offset is not None
+
+    def test_falls_back_to_title_raw(self):
+        md = "Some Custom Heading For Statement\n<table>data</table>"
+        offset = _find_heading_offset("balance_sheet", "Some Custom Heading For Statement", md)
+        assert offset is not None
+
+    def test_returns_none_when_no_table(self):
+        md = "Just plain text, no tables"
+        offset = _find_heading_offset("balance_sheet", "Balance Sheet", md)
+        assert offset is None
+
+    def test_returns_none_when_no_heading(self):
+        md = "Random text\n<table>data</table>"
+        offset = _find_heading_offset("balance_sheet", "", md)
+        assert offset is None
+
+
+# ---------------------------------------------------------------------------
+# _merge_continuation_tables (content-based anti-contamination)
+# ---------------------------------------------------------------------------
+
+class TestMergeContinuationTables:
+    def test_merges_adjacent_tables(self):
+        md = "<table>t1</table>\n<table>t2</table>"
+        end, count = _merge_continuation_tables(
+            "balance_sheet", md, 0, md.index("</table>") + len("</table>"),
+        )
+        assert count == 2
+        assert end == len(md)
+
+    def test_stops_at_other_statement_heading(self):
+        md = "<table>t1</table>\nINCOME STATEMENT\n<table>t2</table>"
+        first_end = md.index("</table>") + len("</table>")
+        end, count = _merge_continuation_tables(
+            "balance_sheet", md, 0, first_end,
+        )
+        assert count == 1
+        assert end == first_end
+
+    def test_stops_at_note_heading(self):
+        md = "<table>t1</table>\nNotes to the Financial Statements\n<table>t2</table>"
+        first_end = md.index("</table>") + len("</table>")
+        end, count = _merge_continuation_tables(
+            "balance_sheet", md, 0, first_end,
+        )
+        assert count == 1
+
+    def test_stops_at_parent_company_marker(self):
+        md = "<table>t1</table>\n母公司\n<table>t2</table>"
+        first_end = md.index("</table>") + len("</table>")
+        end, count = _merge_continuation_tables(
+            "cash_flow", md, 0, first_end,
+        )
+        assert count == 1
+
+    def test_stops_at_max_continuation(self):
+        md = "<table>t1</table>\n<table>t2</table>\n<table>t3</table>\n<table>t4</table>"
+        first_end = md.index("</table>") + len("</table>")
+        end, count = _merge_continuation_tables(
+            "balance_sheet", md, 0, first_end,
+        )
+        assert count == 3  # max is 3
+
+    def test_stops_on_non_continuation_labels(self):
+        md = '<table>t1</table>\n<table><tr><td>revenue</td><td>100</td></tr></table>'
+        first_end = md.index("</table>") + len("</table>")
+        end, count = _merge_continuation_tables(
+            "balance_sheet", md, 0, first_end,
+        )
+        assert count == 1  # "revenue" doesn't belong in a balance sheet
+
+
+# ---------------------------------------------------------------------------
+# locate_table integration
+# ---------------------------------------------------------------------------
+
+class TestLocateTable:
+    def test_finds_table_with_heading(self):
+        md = "Preamble\nConsolidated Balance Sheet\n<table><tr><td>Assets</td></tr></table>"
+        c = _candidate("balance_sheet", "Consolidated Balance Sheet", 1, 1)
+        result = locate_table("balance_sheet", c, md, [])
+        assert result is not None
+        assert result["md_offset"] >= 0
+        assert result["md_end_offset"] > result["md_offset"]
+
+    def test_returns_none_when_not_found(self):
+        md = "No financial data here at all."
+        c = _candidate("balance_sheet", "Something", 1, 1)
+        result = locate_table("balance_sheet", c, md, [])
+        assert result is None
+
+    def test_finds_table_even_with_wrong_page_range(self):
+        """The key test: locator says page 1, but real table is later in markdown."""
+        md = (
+            "Page 1 summary table\n<table><tr><td>Summary</td></tr></table>\n"
+            "... lots of other content ...\n"
+            "合并资产负债表\n"
+            "<table><tr><td>Total Assets</td><td>1000</td></tr></table>"
+        )
+        c = _candidate("balance_sheet", "Balance Sheet", 1, 1)
+        result = locate_table("balance_sheet", c, md, [])
+        assert result is not None
+        # Should find the real balance sheet heading, not the summary table
+        assert "合并资产负债表" in md[result["md_offset"]:result["md_offset"] + 50]
diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_stage_select.py b/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_stage_select.py
new file mode 100644
index 000000000..6227cec07
--- /dev/null
+++ b/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_stage_select.py
@@ -0,0 +1,174 @@
+"""Tests for Stage 2: Select — scoring-based statement selection."""
+import pytest
+from extractor.stages.contracts import (
+    AnalyzeResult,
+    CandidateStatement,
+)
+from extractor.stages.select import score_statement, run_select
+
+
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+
+def _candidate(
+    stype="balance_sheet",
+    title_raw="",
+    title_english="",
+    page_start=0,
+    page_end=0,
+    is_consolidated=None,
+) -> CandidateStatement:
+    return CandidateStatement(
+        statement_type=stype,
+        title_raw=title_raw,
+        title_english=title_english,
+        page_start=page_start,
+        page_end=page_end,
+        is_consolidated=is_consolidated,
+    )
+
+
+def _analyze_result(candidates: list[CandidateStatement]) -> AnalyzeResult:
+    return AnalyzeResult(
+        candidates=candidates,
+        markdown="",
+        pages=[],
+        page_map=[],
+    )
+
+
+# ---------------------------------------------------------------------------
+# Score tests
+# ---------------------------------------------------------------------------
+
+class TestScoreStatement:
+    def test_ghost_match_returns_negative_1000(self):
+        c = _candidate(page_start=0, page_end=0, title_raw="Balance Sheet")
+        assert score_statement(c, "balance_sheet") == -1000
+
+    def 
test_empty_title_returns_negative_500(self): + c = _candidate(page_start=10, page_end=11) + assert score_statement(c, "balance_sheet") == -500 + + def test_consolidated_flag_adds_20(self): + c = _candidate( + title_raw="Balance Sheet", + page_start=10, page_end=11, + is_consolidated=True, + ) + score_with = score_statement(c, "balance_sheet") + c2 = _candidate( + title_raw="Balance Sheet", + page_start=10, page_end=11, + is_consolidated=False, + ) + score_without = score_statement(c2, "balance_sheet") + assert score_with - score_without == 20 + + def test_primary_heading_pattern_adds_50(self): + c = _candidate( + title_raw="Consolidated Statement of Financial Position", + page_start=85, page_end=86, + is_consolidated=True, + ) + score = score_statement(c, "balance_sheet") + # Should get: +50 (pattern) +20 (is_consolidated) +15 (consolidated kw) +10 (type kw) +5 (page>=80) = 100 + assert score >= 95 + + def test_non_primary_pattern_subtracts_100(self): + c = _candidate( + title_raw="Note D.5 Balance Sheet Details", + page_start=90, page_end=91, + ) + score = score_statement(c, "balance_sheet") + assert score < -50 + + def test_parent_company_disqualified(self): + c = _candidate( + title_raw="母公司资产负债表", + page_start=50, page_end=51, + ) + score = score_statement(c, "balance_sheet") + assert score < -50 + + def test_chinese_consolidated_scores_high(self): + c = _candidate( + title_raw="合并资产负债表", + page_start=45, page_end=46, + is_consolidated=True, + ) + score = score_statement(c, "balance_sheet") + assert score > 50 + + def test_temporal_marker_adds_30(self): + c = _candidate( + title_raw="Balance Sheet for the year ended December 31, 2025", + page_start=40, page_end=41, + ) + score = score_statement(c, "balance_sheet") + assert score >= 30 + + def test_incomplete_page_range_penalty(self): + c = _candidate( + title_raw="Cash Flow Statement", + page_start=10, page_end=0, + ) + score = score_statement(c, "cash_flow") + assert score < score_statement( + 
_candidate(title_raw="Cash Flow Statement", page_start=10, page_end=12), + "cash_flow", + ) + + +# --------------------------------------------------------------------------- +# Selection tests +# --------------------------------------------------------------------------- + +class TestRunSelect: + def test_selects_highest_scoring_candidate(self): + candidates = [ + _candidate("balance_sheet", "Note D.5 Details", "", 90, 91), + _candidate("balance_sheet", "Consolidated Balance Sheet", "", 85, 86, True), + ] + result = run_select(_analyze_result(candidates), ["balance_sheet"]) + assert "balance_sheet" in result.selected + assert result.selected["balance_sheet"].title_raw == "Consolidated Balance Sheet" + + def test_rejects_all_below_threshold(self): + candidates = [ + _candidate("balance_sheet", "Note D.5", "", 0, 0), + ] + result = run_select(_analyze_result(candidates), ["balance_sheet"]) + assert "balance_sheet" not in result.selected + assert len(result.rejected.get("balance_sheet", [])) == 1 + + def test_selects_one_per_type(self): + candidates = [ + _candidate("balance_sheet", "Consolidated Balance Sheet", "", 85, 86, True), + _candidate("income_statement", "Consolidated Income Statement", "", 87, 88, True), + _candidate("cash_flow", "Consolidated Cash Flow", "", 89, 91, True), + ] + result = run_select( + _analyze_result(candidates), + ["balance_sheet", "income_statement", "cash_flow"], + ) + assert len(result.selected) == 3 + + def test_empty_candidates_for_type(self): + result = run_select( + _analyze_result([]), + ["balance_sheet"], + ) + assert "balance_sheet" not in result.selected + assert result.scores.get("balance_sheet") == [] + + def test_consolidated_beats_parent(self): + """Xiamen ITG scenario: consolidated should beat parent company.""" + candidates = [ + _candidate("cash_flow", "母公司现金流量表", "", 55, 57), + _candidate("cash_flow", "合并现金流量表", "", 50, 53, True), + ] + result = run_select(_analyze_result(candidates), ["cash_flow"]) + assert "cash_flow" 
in result.selected + assert "合并" in result.selected["cash_flow"].title_raw diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_stage_validate.py b/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_stage_validate.py new file mode 100644 index 000000000..b5ad5048f --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_stage_validate.py @@ -0,0 +1,162 @@ +"""Tests for Stage 5: Validate — structural validation gates.""" +import pytest +from extractor.stages.contracts import ( + EnrichedStatement, + EnrichResult, + QualityStatus, +) +from extractor.stages.validate import ( + _check_required_anchors, + _check_min_row_count, + _check_value_density, + _check_balance_equation, + _check_cross_statement, + _check_period_currency, + run_validate, +) + + +def _row(canonical_key="cash", row_type="line_item", values=None, section="current_assets"): + if values is None: + values = [{"raw": "100", "normalized": 100.0, "is_null": False}] + return { + "row_index": 0, + "canonical_key": canonical_key, + "row_type": row_type, + "section": section, + "label_raw": canonical_key, + "values": values, + } + + +def _doc(rows=None, statement_type="balance_sheet", currency="USD", warnings=None): + return { + "rows": rows or [], + "columns": [{"column_index": 0, "label": "2025"}], + "validation": {"warnings": warnings or []}, + "statement_metadata": {"currency": currency, "statement_type": statement_type}, + } + + +class TestRequiredAnchors: + def test_balance_sheet_all_present(self): + doc = _doc([_row("total_assets", "total"), _row("total_liabilities", "total")]) + check = _check_required_anchors(doc, "balance_sheet") + assert check.passed + assert check.score == 1.0 + + def test_balance_sheet_missing_one(self): + doc = _doc([_row("total_assets", "total")]) + check = _check_required_anchors(doc, "balance_sheet") + assert not check.passed + assert check.score == 0.5 + + def test_income_statement_revenue_present(self): 
+ doc = _doc([_row("revenue")]) + check = _check_required_anchors(doc, "income_statement") + assert check.passed + + +class TestMinRowCount: + def test_balance_sheet_below_minimum(self): + doc = _doc([_row(f"row_{i}") for i in range(4)]) + check = _check_min_row_count(doc, "balance_sheet") + assert not check.passed + assert check.score < 1.0 + assert "4 rows" in check.details + + def test_balance_sheet_at_minimum(self): + doc = _doc([_row(f"row_{i}") for i in range(15)]) + check = _check_min_row_count(doc, "balance_sheet") + assert check.passed + + def test_income_statement_minimum_8(self): + doc = _doc([_row(f"row_{i}") for i in range(8)]) + check = _check_min_row_count(doc, "income_statement") + assert check.passed + + def test_cash_flow_minimum_12(self): + doc = _doc([_row(f"row_{i}") for i in range(11)]) + check = _check_min_row_count(doc, "cash_flow") + assert not check.passed + + +class TestValueDensity: + def test_all_have_values(self): + doc = _doc([_row(f"row_{i}") for i in range(10)]) + check = _check_value_density(doc) + assert check.passed + assert check.score == 1.0 + + def test_below_60_percent(self): + rows = [_row(f"row_{i}") for i in range(4)] + rows += [_row(f"empty_{i}", values=[{"raw": None, "normalized": None, "is_null": True}]) for i in range(8)] + doc = _doc(rows) + check = _check_value_density(doc) + assert not check.passed + + def test_exactly_60_percent(self): + rows = [_row(f"row_{i}") for i in range(6)] + rows += [_row(f"empty_{i}", values=[{"raw": None, "normalized": None, "is_null": True}]) for i in range(4)] + doc = _doc(rows) + check = _check_value_density(doc) + assert check.passed + + +class TestBalanceEquation: + def test_balanced(self): + doc = _doc([ + _row("total_assets", "total", [{"raw": "1000", "normalized": 1000.0, "is_null": False}]), + _row("total_liabilities_and_equity", "total", [{"raw": "1000", "normalized": 1000.0, "is_null": False}]), + ]) + check = _check_balance_equation(doc, "balance_sheet") + assert check.passed 
+ + def test_imbalanced(self): + doc = _doc([ + _row("total_assets", "total", [{"raw": "1000", "normalized": 1000.0, "is_null": False}]), + _row("total_liabilities_and_equity", "total", [{"raw": "500", "normalized": 500.0, "is_null": False}]), + ]) + check = _check_balance_equation(doc, "balance_sheet") + assert not check.passed + + def test_non_balance_sheet_skipped(self): + check = _check_balance_equation(_doc(), "income_statement") + assert check.passed + + +class TestCrossStatement: + def test_matching_net_income(self): + is_doc = _doc([_row("net_income", values=[{"raw": "100", "normalized": 100.0, "is_null": False}])]) + cf_doc = _doc([_row("net_income", values=[{"raw": "100", "normalized": 100.0, "is_null": False}])]) + all_docs = {"income_statement": is_doc, "cash_flow": cf_doc} + check = _check_cross_statement(is_doc, "income_statement", all_docs) + assert check.passed + + def test_mismatched_net_income(self): + is_doc = _doc([_row("net_income", values=[{"raw": "100", "normalized": 100.0, "is_null": False}])]) + cf_doc = _doc([_row("net_income", values=[{"raw": "500", "normalized": 500.0, "is_null": False}])]) + all_docs = {"income_statement": is_doc, "cash_flow": cf_doc} + check = _check_cross_statement(is_doc, "income_statement", all_docs) + assert not check.passed + + +class TestRunValidate: + def test_global_reach_rejected(self): + """Global Reach scenario: 4-row fragment should be rejected.""" + doc = _doc([_row(f"row_{i}") for i in range(4)]) + enrich_result = EnrichResult(statements={"balance_sheet": EnrichedStatement("balance_sheet", doc)}) + result = run_validate(enrich_result) + vs = result.statements["balance_sheet"] + assert vs.status == QualityStatus.REJECTED or vs.quality_score < 0.70 + + def test_healthy_statement_accepted(self): + """A well-formed statement should score high.""" + rows = [_row(f"item_{i}") for i in range(20)] + rows.append(_row("total_assets", "total")) + rows.append(_row("total_liabilities", "total")) + doc = _doc(rows, 
currency="USD") + enrich_result = EnrichResult(statements={"balance_sheet": EnrichedStatement("balance_sheet", doc)}) + result = run_validate(enrich_result) + vs = result.statements["balance_sheet"] + assert vs.quality_score >= 0.70 diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_statement_detector.py b/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_statement_detector.py new file mode 100644 index 000000000..0db485fd4 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_statement_detector.py @@ -0,0 +1,645 @@ +""" +Tests for extractor/statement_detector.py — covers: + Fix 1 (this round): $ sign token merging in parser + Fix 2 (this round): US GAAP section terminators + Fix 3 (this round): Two-tier heading detection (exact + keyword) + Fix 5: Dynamic column count detection + Fix 6: Cross-validate totals + Fix 7: Regex precedence in _SECTION_TOTAL_RE +""" + +import re + +from extractor.statement_detector import ( + _parse_plain_text_table, + _SECTION_TOTAL_RE, + _NOTES_TERMINATOR_RE, + _VALUE_RE, + _score_candidate, + _KEYWORD_SCORES, + _MIN_KEYWORD_SCORE, + _tier1_heading_offsets, + _tier2_keyword_offsets, + _all_heading_offsets, + _has_data_nearby, + validate_totals, + _parse_financial_value, + HEADINGS, +) + + +# =========================================================================== +# Fix 7: _SECTION_TOTAL_RE regex precedence +# =========================================================================== + +class TestSectionTotalRegex: + """Verify anchored regex branches match correctly after the fix.""" + + def test_matches_net_cash(self): + assert _SECTION_TOTAL_RE.search("Net cash flows from operating activities") + + def test_matches_label_ending_in_activities(self): + assert _SECTION_TOTAL_RE.search("Net cash flows from investing activities") + + def test_matches_cash_equivalents(self): + assert _SECTION_TOTAL_RE.search("Cash and cash equivalents at end of period") + + 
def test_matches_financing_activities(self): + assert _SECTION_TOTAL_RE.search("Net cash flows from financing activities") + + def test_does_not_match_mid_label_activities(self): + """The word 'activities' in the middle of a label should not match.""" + assert not _SECTION_TOTAL_RE.search("Income from activities related to banking") + + def test_matches_net_increase_decrease(self): + assert _SECTION_TOTAL_RE.search("Net increase (decrease) in cash and cash equivalents") + + +# =========================================================================== +# Fix 1 (this round): $ sign token merging +# =========================================================================== + +class TestCurrencyTokenMerging: + """Parser merges standalone $ tokens with the following numeric value.""" + + def test_dollar_newline_value_merged(self): + """'$\\n35,873' is normalised to '35,873' and parsed as a value.""" + section = ( + "Balance Sheet\n\n" + "December 31, 2025\n\n" + "Cash and cash equivalents\n\n" + "$\n35,873\n\n" + ) + rows, columns, cells = _parse_plain_text_table(section) + # Should have one row with "Cash and cash equivalents" and value "35,873" + assert len(rows) >= 1 + assert rows[0] == "Cash and cash equivalents" + val_cells = [c for c in cells if c["row"] == 0 and c["col"] > 0] + assert len(val_cells) == 1 + assert val_cells[0]["content"] == "35,873" + + def test_standalone_dollar_followed_by_value(self): + """A standalone '$' token followed by a numeric token is merged.""" + section = ( + "Income Statement\n\n" + "December 31, 2025\n\n" + "Revenue\n\n" + "$\n\n" + "59,893\n\n" + ) + rows, columns, cells = _parse_plain_text_table(section) + assert len(rows) >= 1 + assert rows[0] == "Revenue" + val_cells = [c for c in cells if c["row"] == 0 and c["col"] > 0] + assert len(val_cells) == 1 + assert val_cells[0]["content"] == "59,893" + + def test_multiple_dollar_values_in_row(self): + """Multiple $-prefixed values in a row are all parsed correctly.""" + section = ( + 
"Cash Flow\n\n" + "December 31, 2025\n\n" + "December 31, 2024\n\n" + "Net income\n\n" + "$\n22,768\n\n" + "$\n20,838\n\n" + ) + rows, columns, cells = _parse_plain_text_table(section) + assert rows[0] == "Net income" + val_cells = [c for c in cells if c["row"] == 0 and c["col"] > 0] + assert len(val_cells) == 2 + assert val_cells[0]["content"] == "22,768" + assert val_cells[1]["content"] == "20,838" + + def test_value_re_matches_dollar_prefixed(self): + """_VALUE_RE accepts values with optional leading $.""" + assert _VALUE_RE.match("$35,873") + assert _VALUE_RE.match("$ 35,873") + assert _VALUE_RE.match("$(42,391)") + assert _VALUE_RE.match("35,873") + assert _VALUE_RE.match("(42,391)") + + def test_non_dollar_values_unaffected(self): + """Regular numeric tokens without $ are still parsed normally.""" + section = ( + "Balance Sheet\n\n" + "December 31, 2025\n\n" + "Marketable securities\n\n" + "45,719\n\n" + ) + rows, columns, cells = _parse_plain_text_table(section) + assert rows[0] == "Marketable securities" + val_cells = [c for c in cells if c["row"] == 0 and c["col"] > 0] + assert val_cells[0]["content"] == "45,719" + + +class TestParseFinancialValueCurrency: + """_parse_financial_value strips currency symbols before parsing.""" + + def test_dollar_prefix(self): + assert _parse_financial_value("$35,873") == 35873.0 + + def test_dollar_negative_parens(self): + assert _parse_financial_value("$(42,391)") == -42391.0 + + def test_yen_prefix(self): + assert _parse_financial_value("¥1,000") == 1000.0 + + def test_euro_prefix(self): + assert _parse_financial_value("€500") == 500.0 + + +# =========================================================================== +# Fix 2 (this round): US GAAP section terminators +# =========================================================================== + +class TestNotesTerminatorRegex: + """_NOTES_TERMINATOR_RE catches both IFRS and US GAAP section boundaries.""" + + def test_notes_to(self): + assert 
_NOTES_TERMINATOR_RE.search("[Notes to the Financial Statements]") + + def test_comprehensive_income(self): + assert _NOTES_TERMINATOR_RE.search("Consolidated Statement of Comprehensive Income") + + def test_changes_in_equity(self): + assert _NOTES_TERMINATOR_RE.search("Statement of Changes in Stockholders' Equity") + + def test_segment_results(self): + assert _NOTES_TERMINATOR_RE.search("Segment Results") + + def test_segment_information(self): + assert _NOTES_TERMINATOR_RE.search("Segment Information") + + def test_reconciliation_of_gaap(self): + assert _NOTES_TERMINATOR_RE.search("Reconciliation of GAAP to Non-GAAP Results") + + def test_supplemental_is_not_terminator(self): + """'Supplemental cash flow data' is part of cash flow, NOT a terminator.""" + assert not _NOTES_TERMINATOR_RE.search("Supplemental cash flow data") + + def test_regular_label_no_match(self): + """Normal financial labels should NOT trigger a terminator match.""" + assert not _NOTES_TERMINATOR_RE.search("Total current assets") + assert not _NOTES_TERMINATOR_RE.search("Net cash from operating activities") + + +# =========================================================================== +# Fix 3 (this round): Two-tier heading detection +# =========================================================================== + +class TestScoreCandidate: + """_score_candidate sums keyword weights for matching terms.""" + + def test_balance_sheet_keywords(self): + keywords = _KEYWORD_SCORES["balance_sheet"] + score = _score_candidate("Consolidated Balance Sheet", keywords) + assert score >= 3.0 # "balance sheet" = 3.0 + + def test_cash_flow_keywords(self): + keywords = _KEYWORD_SCORES["cash_flow"] + score = _score_candidate("Statement of Cash Flows", keywords) + assert score >= 3.0 # "cash flows" = 3.0 + + def test_income_statement_keywords(self): + keywords = _KEYWORD_SCORES["income_statement"] + score = _score_candidate("Consolidated Statement of Income", keywords) + assert score >= 3.0 # "statement of 
income" = 3.0 + + def test_narrative_text_low_score(self): + """Narrative text mentioning one keyword should score below threshold.""" + keywords = _KEYWORD_SCORES["balance_sheet"] + score = _score_candidate("The company reported equity growth this quarter", keywords) + assert score < _MIN_KEYWORD_SCORE + + def test_nonstandard_heading_high_score(self): + """A non-standard heading with enough keywords should score high.""" + keywords = _KEYWORD_SCORES["balance_sheet"] + score = _score_candidate("Statement of Assets and Liabilities", keywords) + # "assets" (2.0) + "liabilities" (2.0) = 4.0 + assert score >= _MIN_KEYWORD_SCORE + + +class TestTier1HeadingOffsets: + """Tier 1 exact heading matching.""" + + def test_finds_standard_heading(self): + markdown = "Some intro text\n\n123,456\n\nConsolidated Balance Sheet\n\n100,000\n\n200,000" + offsets = _tier1_heading_offsets(markdown, HEADINGS["balance_sheet"]) + assert len(offsets) == 1 + + def test_no_match_without_data(self): + """A heading without nearby numeric data is rejected (likely TOC).""" + markdown = "Table of Contents\n\nConsolidated Balance Sheet\n\nSee next page for details" + offsets = _tier1_heading_offsets(markdown, HEADINGS["balance_sheet"]) + assert offsets == [] + + def test_priority_order(self): + """More specific heading is preferred over shorter fallback.""" + markdown = ( + "Condensed Consolidated Balance Sheets\n\n100,000\n\n" + "Balance Sheet\n\n200,000" + ) + offsets = _tier1_heading_offsets(markdown, HEADINGS["balance_sheet"]) + assert len(offsets) == 1 + # Should match the more specific "Condensed Consolidated Balance Sheets" + assert markdown[offsets[0]:].startswith("Condensed") + + +class TestTier2KeywordOffsets: + """Tier 2 keyword-based semantic scoring for non-standard headings.""" + + def test_finds_nonstandard_balance_sheet_heading(self): + """A non-standard heading with enough keywords is detected.""" + markdown = ( + "Some narrative text about the company.\n\n" + "Statement of Assets 
and Liabilities and Equity\n\n" + "100,000\n\n200,000\n\n" + ) + offsets = _tier2_keyword_offsets(markdown, "balance_sheet") + assert len(offsets) >= 1 + + def test_ignores_narrative_with_few_keywords(self): + """Narrative text with only one keyword should not match.""" + markdown = ( + "The company owns significant assets in multiple countries.\n\n" + "This paragraph discusses risk factors.\n\n" + ) + offsets = _tier2_keyword_offsets(markdown, "balance_sheet") + assert offsets == [] + + +class TestAllHeadingOffsets: + """_all_heading_offsets integrates both tiers.""" + + def test_tier1_preferred_when_available(self): + """When Tier 1 matches, Tier 2 is not needed.""" + markdown = "Consolidated Balance Sheet\n\n100,000\n\n200,000" + offsets = _all_heading_offsets(markdown, HEADINGS["balance_sheet"], "balance_sheet") + assert len(offsets) >= 1 + + def test_tier2_fallback_when_tier1_fails(self): + """When no exact heading matches, Tier 2 keyword scoring kicks in.""" + markdown = ( + "Some preamble text.\n\n" + "Unaudited Statement of Assets and Liabilities and Equity\n\n" + "100,000\n\n200,000\n\n300,000\n\n" + ) + # Tier 1 won't match "Unaudited Statement of Assets and Liabilities" + offsets = _all_heading_offsets(markdown, HEADINGS["balance_sheet"], "balance_sheet") + assert len(offsets) >= 1 + + +# =========================================================================== +# Fix 5: Dynamic column count +# =========================================================================== + +class TestDynamicColumnCount: + """_parse_plain_text_table derives column count from header tokens.""" + + def test_two_period_headers(self): + """Standard two-period report gets max_value_cols=2.""" + section = ( + "Consolidated Balance Sheet\n\n" + "Note Fiscal Year ended December 31, 2023\n\n" + "Fiscal Year ended December 31, 2024\n\n" + "Total assets\n\n" + "1,000,000\n\n" + "2,000,000\n\n" + ) + rows, columns, cells = _parse_plain_text_table(section) + assert len(columns) == 2 + 
val_cells = [c for c in cells if c["row"] == 0 and c["col"] > 0] + assert len(val_cells) == 2 + + def test_three_period_headers(self): + """Three-period comparative statement should capture all three values. + + Header tokens must contain a standalone year (e.g. '2022') for + _HEADER_TOKEN_RE to recognise them — 'FY2022' would fail the \\b + word-boundary check. + """ + section = ( + "Income Statement\n\n" + "Fiscal Year ended 2022\n\n" + "Fiscal Year ended 2023\n\n" + "Fiscal Year ended 2024\n\n" + "Revenue\n\n" + "100\n\n" + "200\n\n" + "300\n\n" + ) + rows, columns, cells = _parse_plain_text_table(section) + assert len(columns) == 3 + val_cells = [c for c in cells if c["row"] == 0 and c["col"] > 0] + assert len(val_cells) == 3 + assert val_cells[0]["content"] == "100" + assert val_cells[1]["content"] == "200" + assert val_cells[2]["content"] == "300" + + def test_fallback_when_no_year_headers(self): + """When no year-like header is found, falls back to 2 columns.""" + section = ( + "Balance Sheet\n\n" + "Label\n\n" + "Amount\n\n" + "Revenue\n\n" + "100\n\n" + ) + rows, columns, cells = _parse_plain_text_table(section) + assert len(rows) >= 1 + + def test_four_period_meta_style(self): + """Four-period report (Q4 + FY for two years) captures all four values.""" + section = ( + "Income Statement\n\n" + "Three Months Ended December 31,\n\n" + "Twelve Months Ended December 31,\n\n" + "2025\n\n" + "2024\n\n" + "2025\n\n" + "2024\n\n" + "Revenue\n\n" + "59,893\n\n" + "48,385\n\n" + "200,966\n\n" + "164,501\n\n" + ) + rows, columns, cells = _parse_plain_text_table(section) + val_cells = [c for c in cells if c["row"] == 0 and c["col"] > 0] + assert len(val_cells) == 4 + + +# =========================================================================== +# Per-cell currency detection +# =========================================================================== + +class TestPerCellCurrency: + """Parser tracks currency symbols and attaches them to value cells.""" + + def 
test_dollar_currency_on_value_cells(self): + """Values from '$\\n35,873' tokens get currency='USD'.""" + section = ( + "Balance Sheet\n\n" + "December 31, 2025\n\n" + "Cash and cash equivalents\n\n" + "$\n35,873\n\n" + ) + rows, columns, cells = _parse_plain_text_table(section) + val_cells = [c for c in cells if c["row"] == 0 and c["col"] > 0] + assert len(val_cells) == 1 + assert val_cells[0]["currency"] == "USD" + + def test_no_currency_on_plain_values(self): + """Values without a $ prefix get currency=None.""" + section = ( + "Balance Sheet\n\n" + "December 31, 2025\n\n" + "Marketable securities\n\n" + "45,719\n\n" + ) + rows, columns, cells = _parse_plain_text_table(section) + val_cells = [c for c in cells if c["row"] == 0 and c["col"] > 0] + assert val_cells[0]["currency"] is None + + def test_label_cells_have_no_currency(self): + """Label cells (col=0) always have currency=None.""" + section = ( + "Balance Sheet\n\n" + "December 31, 2025\n\n" + "Revenue\n\n" + "$\n59,893\n\n" + ) + rows, columns, cells = _parse_plain_text_table(section) + label_cell = [c for c in cells if c["row"] == 0 and c["col"] == 0][0] + assert label_cell["currency"] is None + + def test_mixed_currency_and_plain_in_same_statement(self): + """Some rows have $, others don't — currency tracked per-cell.""" + section = ( + "Income Statement\n\n" + "December 31, 2025\n\n" + "Revenue\n\n" + "$\n59,893\n\n" + "Cost of revenue\n\n" + "10,905\n\n" + ) + rows, columns, cells = _parse_plain_text_table(section) + revenue_val = [c for c in cells if c["row"] == 0 and c["col"] == 1][0] + cost_val = [c for c in cells if c["row"] == 1 and c["col"] == 1][0] + assert revenue_val["currency"] == "USD" + assert cost_val["currency"] is None + + +# =========================================================================== +# Row type and indent level metadata +# =========================================================================== + +class TestRowTypeAndIndent: + """Parser classifies rows and assigns 
indent levels.""" + + def test_section_header_detected(self): + """A label with no values is classified as section_header.""" + section = ( + "Income Statement\n\n" + "December 31, 2025\n\n" + "Costs and expenses:\n\n" + "Cost of revenue\n\n" + "10,905\n\n" + ) + rows, columns, cells = _parse_plain_text_table(section) + header_cell = [c for c in cells if c["content"] == "Costs and expenses:"][0] + assert header_cell["row_type"] == "section_header" + + def test_line_item_detected(self): + """A regular data row is classified as line_item.""" + section = ( + "Income Statement\n\n" + "December 31, 2025\n\n" + "Revenue\n\n" + "59,893\n\n" + ) + rows, columns, cells = _parse_plain_text_table(section) + revenue_cell = [c for c in cells if c["content"] == "Revenue"][0] + assert revenue_cell["row_type"] == "line_item" + + def test_subtotal_detected(self): + """'Total costs and expenses' is classified as subtotal.""" + section = ( + "Income Statement\n\n" + "December 31, 2025\n\n" + "Total costs and expenses\n\n" + "35,148\n\n" + ) + rows, columns, cells = _parse_plain_text_table(section) + total_cell = [c for c in cells if c["content"] == "Total costs and expenses"][0] + assert total_cell["row_type"] == "subtotal" + + def test_grand_total_detected(self): + """'Total assets' is classified as total (grand total).""" + section = ( + "Balance Sheet\n\n" + "December 31, 2025\n\n" + "Total assets\n\n" + "366,021\n\n" + ) + rows, columns, cells = _parse_plain_text_table(section) + total_cell = [c for c in cells if c["content"] == "Total assets"][0] + assert total_cell["row_type"] == "total" + + def test_indent_level_within_section(self): + """Line items between a section_header and subtotal get indent_level=1.""" + section = ( + "Income Statement\n\n" + "December 31, 2025\n\n" + "Costs and expenses:\n\n" + "Cost of revenue\n\n" + "10,905\n\n" + "Research and development\n\n" + "17,136\n\n" + "Total costs and expenses\n\n" + "28,041\n\n" + ) + rows, columns, cells = 
_parse_plain_text_table(section) + cost_cell = [c for c in cells if c["content"] == "Cost of revenue" and c["col"] == 0][0] + rd_cell = [c for c in cells if c["content"] == "Research and development" and c["col"] == 0][0] + total_cell = [c for c in cells if c["content"] == "Total costs and expenses" and c["col"] == 0][0] + + assert cost_cell["indent_level"] == 1 + assert rd_cell["indent_level"] == 1 + assert total_cell["indent_level"] == 0 # subtotal is not indented + + def test_top_level_items_not_indented(self): + """Items outside a section header block stay at indent_level=0.""" + section = ( + "Income Statement\n\n" + "December 31, 2025\n\n" + "Revenue\n\n" + "59,893\n\n" + "Income from operations\n\n" + "24,745\n\n" + ) + rows, columns, cells = _parse_plain_text_table(section) + rev_cell = [c for c in cells if c["content"] == "Revenue" and c["col"] == 0][0] + inc_cell = [c for c in cells if c["content"] == "Income from operations" and c["col"] == 0][0] + assert rev_cell["indent_level"] == 0 + assert inc_cell["indent_level"] == 0 + + +# =========================================================================== +# Fix 6: Cross-validate totals +# =========================================================================== + +class TestParseFinancialValue: + """_parse_financial_value handles all common financial number formats.""" + + def test_plain_integer(self): + assert _parse_financial_value("12345") == 12345.0 + + def test_comma_separated(self): + assert _parse_financial_value("1,234,567") == 1234567.0 + + def test_parenthesised_negative(self): + assert _parse_financial_value("(42,391)") == -42391.0 + + def test_negative_sign(self): + assert _parse_financial_value("-217,741") == -217741.0 + + def test_decimal(self): + assert _parse_financial_value("12.34") == 12.34 + + def test_nil_dash(self): + assert _parse_financial_value("-") == 0.0 + + def test_empty_string(self): + assert _parse_financial_value("") is None + + def test_non_numeric(self): + assert 
_parse_financial_value("hello") is None + + +class TestValidateTotals: + """validate_totals uses hierarchy-aware logic to check totals.""" + + def _make_cells(self, data): + """Build cells from [(row, label, vals, row_type, indent_level)]. + + Short form: (row, label, vals) defaults to line_item/indent=1. + """ + cells = [] + for entry in data: + row_idx, label, vals = entry[0], entry[1], entry[2] + row_type = entry[3] if len(entry) > 3 else "line_item" + indent = entry[4] if len(entry) > 4 else (1 if row_type == "line_item" else 0) + cells.append({ + "row": row_idx, "col": 0, "content": label, + "kind": "content", "row_type": row_type, "indent_level": indent, + }) + for vi, val in enumerate(vals): + cells.append({ + "row": row_idx, "col": vi + 1, "content": val, + "kind": "content", "row_type": row_type, "indent_level": indent, + }) + rows = [d[1] for d in data] + return rows, cells + + def test_subtotal_sums_indented_children(self): + """Subtotal should equal sum of indented line_items in its section.""" + rows, cells = self._make_cells([ + (0, "Current assets:", [], "section_header", 0), + (1, "Cash", ["100", "200"], "line_item", 1), + (2, "Securities", ["30", "50"], "line_item", 1), + (3, "Total current assets", ["130", "250"], "subtotal", 0), + ]) + warnings = validate_totals(rows, cells) + assert warnings == [] + + def test_subtotal_mismatch_warns(self): + """Subtotal that doesn't match its children triggers a warning.""" + rows, cells = self._make_cells([ + (0, "Current assets:", [], "section_header", 0), + (1, "Cash", ["100", "200"], "line_item", 1), + (2, "Securities", ["30", "50"], "line_item", 1), + (3, "Total current assets", ["999", "250"], "subtotal", 0), + ]) + warnings = validate_totals(rows, cells) + assert len(warnings) >= 1 + assert warnings[0]["col"] == 1 + + def test_total_sums_subtotals_not_children(self): + """Grand total sums subtotals + ungrouped items, not individual children.""" + rows, cells = self._make_cells([ + (0, "Current assets:", 
[], "section_header", 0), + (1, "Cash", ["100"], "line_item", 1), + (2, "Securities", ["50"], "line_item", 1), + (3, "Total current assets", ["150"], "subtotal", 0), + (4, "Goodwill", ["50"], "line_item", 0), # ungrouped + (5, "Total assets", ["200"], "total", 0), # = 150 + 50 + ]) + warnings = validate_totals(rows, cells) + assert warnings == [] + + def test_negative_values_handled(self): + """Parenthesised negatives are parsed and summed correctly.""" + rows, cells = self._make_cells([ + (0, "Section:", [], "section_header", 0), + (1, "Income", ["500"], "line_item", 1), + (2, "Loss", ["(200)"], "line_item", 1), + (3, "Total current assets", ["300"], "subtotal", 0), + ]) + warnings = validate_totals(rows, cells) + assert warnings == [] + + def test_empty_cells_no_crash(self): + """Empty cells list returns no warnings and does not crash.""" + warnings = validate_totals([], []) + assert warnings == [] + + def test_no_totals_found(self): + """When no total rows exist, returns empty list.""" + rows, cells = self._make_cells([ + (0, "Revenue", ["100"], "line_item", 1), + (1, "Cost", ["50"], "line_item", 1), + ]) + warnings = validate_totals(rows, cells) + assert warnings == [] diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_textract_adapter.py b/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_textract_adapter.py new file mode 100644 index 000000000..b83712f2d --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_textract_adapter.py @@ -0,0 +1,177 @@ +""" +tests/test_textract_adapter.py +------------------------------- +Unit tests for extractor/textract_adapter.py. + +Covers the four block-conversion functions (no LLM tests): + - _build_block_index + - _table_to_html + - reconstruct_markdown + - build_page_map + +Import strategy: uses importlib to load textract_adapter directly, bypassing +extractor/__init__.py (which eagerly imports azure_cu_client and requires +AZURE_CU_ENDPOINT). 
This adapter has no module-level extractor imports. +""" + +import importlib +import importlib.util +import json +import pathlib + +import pytest + + +# --------------------------------------------------------------------------- +# Module-level import: load textract_adapter without going through __init__ +# --------------------------------------------------------------------------- + +def _load_adapter(): + """Import extractor.textract_adapter bypassing extractor/__init__.py.""" + spec = importlib.util.spec_from_file_location( + "extractor.textract_adapter", + pathlib.Path(__file__).parent.parent / "extractor" / "textract_adapter.py", + ) + mod = importlib.util.module_from_spec(spec) + spec.loader.exec_module(mod) + return mod + + +adapter = _load_adapter() + + +# --------------------------------------------------------------------------- +# Fixtures +# --------------------------------------------------------------------------- + +FIXTURE_PATH = ( + pathlib.Path(__file__).parent / "fixtures" / "textract_sample_response.json" +) + + +@pytest.fixture(scope="module") +def textract_response() -> dict: + with open(FIXTURE_PATH, "r", encoding="utf-8") as f: + return json.load(f) + + +@pytest.fixture(scope="module") +def blocks(textract_response) -> list[dict]: + return textract_response["Blocks"] + + +# ── TestBlockIndex ─────────────────────────────────────────────────────────── + +class TestBlockIndex: + def test_builds_block_lookup_by_id(self, blocks): + index = adapter._build_block_index(blocks) + + # TABLE block is indexed + assert "block-table-001" in index + assert index["block-table-001"]["BlockType"] == "TABLE" + + # CELL blocks are indexed + assert "block-cell-r1c1" in index + assert index["block-cell-r1c1"]["BlockType"] == "CELL" + + # WORD blocks are indexed + assert "block-word-2024" in index + assert index["block-word-2024"]["Text"] == "2024" + + # Every block is indexed + assert len(index) == len(blocks) + + +# ── TestTableToHtml 
────────────────────────────────────── + +class TestTableToHtml: + def test_converts_table_block_to_html_string(self, blocks): + index = adapter._build_block_index(blocks) + table_block = index["block-table-001"] + + html = adapter._table_to_html(table_block, index) + + assert "<table>" in html + assert "</table>" in html
+ # Column headers present + assert "2024" in html + assert "2023" in html + # Data values present + assert "Total Assets" in html + assert "125,435" in html + + def test_header_cells_use_th_tag(self, blocks): + index = adapter._build_block_index(blocks) + table_block = index["block-table-001"] + + html = adapter._table_to_html(table_block, index) + + # Header row cells should use <th> tags + assert "<th>" in html or "<th " in html + assert "2024" in html + assert "2023" in html + + def test_data_cells_use_td_tag(self, blocks): + index = adapter._build_block_index(blocks) + table_block = index["block-table-001"] + + html = adapter._table_to_html(table_block, index) + + # Data row values should use <td> tags + assert "125,435" in html + assert "118,290" in html + + +# ── TestReconstructMarkdown ────────────────────────────────────────────────── + +class TestReconstructMarkdown: + def test_produces_markdown_with_embedded_tables(self, blocks): + md = adapter.reconstruct_markdown(blocks) + + # Heading text appears + assert "CONSOLIDATED BALANCE SHEET" in md + # Table HTML is embedded + assert "<table>" in md + assert "</table>" in md
+ + def test_lines_appear_before_their_table(self, blocks): + md = adapter.reconstruct_markdown(blocks) + + heading_pos = md.index("CONSOLIDATED BALANCE SHEET") + table_pos = md.index("<table>") + + assert heading_pos < table_pos, ( + f"Heading at {heading_pos} should appear before table at {table_pos}" + ) + + def test_page_markers_are_present(self, blocks): + md = adapter.reconstruct_markdown(blocks) + + # At least one page marker for page 1 + assert "" in md + + +# ── TestBuildPageMap ───────────────────────────────────────────────────────── + +class TestBuildPageMap: + def test_builds_page_map_from_reconstructed_markdown(self, blocks): + md = adapter.reconstruct_markdown(blocks) + page_map = adapter.build_page_map(blocks, md) + + assert len(page_map) >= 1, "Should have at least one page entry" + + for start, end, page_num in page_map: + assert start < end, f"start ({start}) should be < end ({end})" + assert page_num >= 1, f"page_num ({page_num}) should be >= 1" + + def test_page_map_covers_full_markdown(self, blocks): + md = adapter.reconstruct_markdown(blocks) + page_map = adapter.build_page_map(blocks, md) + + # The last entry's end should be the end of the markdown + if page_map: + last_end = page_map[-1][1] + assert last_end == len(md), ( + f"Last page_map end ({last_end}) should equal markdown length ({len(md)})" + ) diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_textract_client.py b/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_textract_client.py new file mode 100644 index 000000000..0267c9e35 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_textract_client.py @@ -0,0 +1,250 @@ +""" +tests/test_textract_client.py +------------------------------ +Unit tests for extractor.textract_client using mocked boto3. + +All AWS calls are intercepted — no real AWS credentials needed.
+ +Import strategy: we import textract_client via importlib to avoid triggering +extractor/__init__.py (which eagerly imports azure_cu_client requiring +AZURE_CU_ENDPOINT). The module is isolated and patched directly. +""" + +import importlib +import importlib.util +import json +import os +import sys +import pytest +from pathlib import Path +from unittest.mock import MagicMock, patch + + +# --------------------------------------------------------------------------- +# Module-level import: load textract_client without going through extractor/__init__ +# --------------------------------------------------------------------------- + +def _load_textract_client(): + """Import extractor.textract_client bypassing extractor/__init__.py eager imports.""" + spec = importlib.util.spec_from_file_location( + "extractor.textract_client", + Path(__file__).parent.parent / "extractor" / "textract_client.py", + ) + mod = importlib.util.module_from_spec(spec) + spec.loader.exec_module(mod) + return mod + + +textract_client = _load_textract_client() + + +# --------------------------------------------------------------------------- +# Fixtures +# --------------------------------------------------------------------------- + +FIXTURE_PATH = Path(__file__).parent / "fixtures" / "textract_sample_response.json" + + +@pytest.fixture() +def sample_response(): + """Load the realistic Textract response fixture.""" + with open(FIXTURE_PATH) as f: + return json.load(f) + + +@pytest.fixture() +def env_vars(monkeypatch): + """Set required environment variables for all tests.""" + monkeypatch.setenv("AWS_S3_BUCKET", "test-bucket") + monkeypatch.setenv("AWS_S3_PREFIX", "textract-input") + monkeypatch.setenv("AWS_REGION", "us-east-1") + + +def _make_boto3_mock(s3_mock, textract_mock): + """Return a boto3 module mock whose .client() dispatches by service name.""" + boto3_mock = MagicMock() + + def client_side_effect(service_name, **kwargs): + if service_name == "s3": + return s3_mock + if service_name 
== "textract": + return textract_mock + raise ValueError(f"Unexpected boto3 service: {service_name}") + + boto3_mock.client.side_effect = client_side_effect + return boto3_mock + + +# --------------------------------------------------------------------------- +# Tests +# --------------------------------------------------------------------------- + + +class TestAnalyzeDocument: + + def test_uploads_pdf_to_s3_and_starts_analysis(self, env_vars, sample_response, tmp_path): + """S3 upload_file is called with correct bucket/key; result contains Blocks.""" + # Arrange + fake_pdf = tmp_path / "report.pdf" + fake_pdf.write_bytes(b"%PDF-1.4 fake") + + s3_mock = MagicMock() + textract_mock = MagicMock() + textract_mock.start_document_analysis.return_value = {"JobId": "job-abc"} + textract_mock.get_document_analysis.return_value = sample_response + boto3_mock = _make_boto3_mock(s3_mock, textract_mock) + + # Act — patch boto3 on the already-loaded module object + with patch.object(textract_client, "boto3", boto3_mock): + result = textract_client.analyze_document(str(fake_pdf)) + + # Assert: S3 upload called with correct kwargs + s3_mock.upload_file.assert_called_once() + upload_call_kwargs = s3_mock.upload_file.call_args + assert upload_call_kwargs.kwargs["Bucket"] == "test-bucket" + assert upload_call_kwargs.kwargs["Filename"] == str(fake_pdf) + assert upload_call_kwargs.kwargs["Key"].startswith("textract-input/") + assert upload_call_kwargs.kwargs["Key"].endswith(".pdf") + + # Assert: Textract start called with correct kwargs + textract_mock.start_document_analysis.assert_called_once() + start_call_kwargs = textract_mock.start_document_analysis.call_args.kwargs + assert start_call_kwargs["FeatureTypes"] == ["TABLES"] + assert start_call_kwargs["DocumentLocation"]["S3Object"]["Bucket"] == "test-bucket" + + # Assert: result contains Blocks + assert "Blocks" in result + assert len(result["Blocks"]) > 0 + + def test_polls_until_succeeded(self, env_vars, tmp_path): + 
"""get_document_analysis is called 3 times: IN_PROGRESS x2, then SUCCEEDED.""" + fake_pdf = tmp_path / "report.pdf" + fake_pdf.write_bytes(b"%PDF-1.4 fake") + + s3_mock = MagicMock() + textract_mock = MagicMock() + textract_mock.start_document_analysis.return_value = {"JobId": "job-poll-test"} + + # IN_PROGRESS, IN_PROGRESS, SUCCEEDED + textract_mock.get_document_analysis.side_effect = [ + {"JobStatus": "IN_PROGRESS"}, + {"JobStatus": "IN_PROGRESS"}, + {"JobStatus": "SUCCEEDED", "Blocks": [{"Id": "b1", "BlockType": "PAGE"}]}, + ] + boto3_mock = _make_boto3_mock(s3_mock, textract_mock) + + with patch.object(textract_client, "boto3", boto3_mock): + with patch.object(textract_client, "time") as mock_time: + # Make time.time() always return 0 so the timeout never triggers + mock_time.time.return_value = 0 + mock_time.sleep = MagicMock() + result = textract_client.analyze_document(str(fake_pdf)) + + # 3 calls to get_document_analysis during polling + assert textract_mock.get_document_analysis.call_count == 3 + assert result["Blocks"] == [{"Id": "b1", "BlockType": "PAGE"}] + + def test_raises_on_failed_job(self, env_vars, tmp_path): + """RuntimeError is raised with 'Textract job failed' when status is FAILED.""" + fake_pdf = tmp_path / "report.pdf" + fake_pdf.write_bytes(b"%PDF-1.4 fake") + + s3_mock = MagicMock() + textract_mock = MagicMock() + textract_mock.start_document_analysis.return_value = {"JobId": "job-fail-test"} + textract_mock.get_document_analysis.return_value = { + "JobStatus": "FAILED", + "StatusMessage": "Unsupported document type", + } + boto3_mock = _make_boto3_mock(s3_mock, textract_mock) + + with patch.object(textract_client, "boto3", boto3_mock): + with patch.object(textract_client, "time") as mock_time: + mock_time.time.return_value = 0 + mock_time.sleep = MagicMock() + with pytest.raises(RuntimeError, match="Textract job failed"): + textract_client.analyze_document(str(fake_pdf)) + + def test_cleans_up_s3_after_success(self, env_vars, 
sample_response, tmp_path): + """delete_object is called after a successful analysis run.""" + fake_pdf = tmp_path / "report.pdf" + fake_pdf.write_bytes(b"%PDF-1.4 fake") + + s3_mock = MagicMock() + textract_mock = MagicMock() + textract_mock.start_document_analysis.return_value = {"JobId": "job-cleanup-ok"} + textract_mock.get_document_analysis.return_value = sample_response + boto3_mock = _make_boto3_mock(s3_mock, textract_mock) + + with patch.object(textract_client, "boto3", boto3_mock): + textract_client.analyze_document(str(fake_pdf)) + + # delete_object must have been called once + s3_mock.delete_object.assert_called_once() + delete_kwargs = s3_mock.delete_object.call_args.kwargs + assert delete_kwargs["Bucket"] == "test-bucket" + assert delete_kwargs["Key"].startswith("textract-input/") + + def test_cleans_up_s3_even_on_failure(self, env_vars, tmp_path): + """delete_object is called even when Textract fails (finally block).""" + fake_pdf = tmp_path / "report.pdf" + fake_pdf.write_bytes(b"%PDF-1.4 fake") + + s3_mock = MagicMock() + textract_mock = MagicMock() + textract_mock.start_document_analysis.return_value = {"JobId": "job-cleanup-fail"} + textract_mock.get_document_analysis.return_value = { + "JobStatus": "FAILED", + "StatusMessage": "Access denied", + } + boto3_mock = _make_boto3_mock(s3_mock, textract_mock) + + with patch.object(textract_client, "boto3", boto3_mock): + with patch.object(textract_client, "time") as mock_time: + mock_time.time.return_value = 0 + mock_time.sleep = MagicMock() + with pytest.raises(RuntimeError): + textract_client.analyze_document(str(fake_pdf)) + + # S3 cleanup must still have been called + s3_mock.delete_object.assert_called_once() + + def test_handles_pagination(self, env_vars, tmp_path): + """Blocks from multiple pages are merged into a single list.""" + fake_pdf = tmp_path / "report.pdf" + fake_pdf.write_bytes(b"%PDF-1.4 fake") + + page1_blocks = [{"Id": "b1", "BlockType": "PAGE"}, {"Id": "b2", "BlockType": "TABLE"}] 
+ page2_blocks = [{"Id": "b3", "BlockType": "CELL"}, {"Id": "b4", "BlockType": "WORD"}] + + s3_mock = MagicMock() + textract_mock = MagicMock() + textract_mock.start_document_analysis.return_value = {"JobId": "job-paginate"} + + # First get_document_analysis call (during polling): SUCCEEDED with NextToken + # Second call (pagination loop): more blocks, no NextToken + textract_mock.get_document_analysis.side_effect = [ + { + "JobStatus": "SUCCEEDED", + "Blocks": page1_blocks, + "NextToken": "token-page-2", + }, + { + "JobStatus": "SUCCEEDED", + "Blocks": page2_blocks, + # No NextToken -- this is the last page + }, + ] + boto3_mock = _make_boto3_mock(s3_mock, textract_mock) + + with patch.object(textract_client, "boto3", boto3_mock): + result = textract_client.analyze_document(str(fake_pdf)) + + # All blocks from both pages must be present + assert len(result["Blocks"]) == 4 + block_ids = [b["Id"] for b in result["Blocks"]] + assert block_ids == ["b1", "b2", "b3", "b4"] + + # Pagination call must pass NextToken + second_call_kwargs = textract_mock.get_document_analysis.call_args_list[1].kwargs + assert second_call_kwargs.get("NextToken") == "token-page-2" diff --git a/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_textract_integration.py b/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_textract_integration.py new file mode 100644 index 000000000..082f14f7b --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/azure-functions/tests/test_textract_integration.py @@ -0,0 +1,202 @@ +""" +tests/test_textract_integration.py +------------------------------------ +End-to-end integration test for the Textract backend path through the full +extraction pipeline (Stages 1-5). 
+ +All external calls are mocked: + - extractor.textract_client.analyze_document → returns textract_sample_response.json + - extractor.textract_adapter.classify_statements_with_llm → returns a balance_sheet classification + +A dummy AZURE_CU_ENDPOINT env var is monkeypatched so that importing the +extractor package (which eagerly loads azure_cu_client) does not fail. +""" + +import json +import os +import sys +import tempfile +from pathlib import Path +from unittest.mock import patch, MagicMock + +import pytest + + +# --------------------------------------------------------------------------- +# Path helpers +# --------------------------------------------------------------------------- + +TESTS_DIR = Path(__file__).parent +EXTRACTOR_DIR = TESTS_DIR.parent / "extractor" +FIXTURE_PATH = TESTS_DIR / "fixtures" / "textract_sample_response.json" + + +# --------------------------------------------------------------------------- +# Fixtures +# --------------------------------------------------------------------------- + +@pytest.fixture(autouse=True) +def set_azure_cu_endpoint(monkeypatch): + """Set a dummy AZURE_CU_ENDPOINT so extractor/__init__.py imports cleanly.""" + monkeypatch.setenv("AZURE_CU_ENDPOINT", "https://dummy.cognitiveservices.azure.com/") + + +@pytest.fixture() +def textract_response() -> dict: + """Load the realistic Textract response fixture.""" + with open(FIXTURE_PATH) as f: + return json.load(f) + + +@pytest.fixture() +def llm_classification(): + """Minimal LLM classification return for a balance sheet.""" + return [ + { + "statement_type": "balance_sheet", + "title_raw": "CONSOLIDATED BALANCE SHEET", + "currency": "USD", + "unit": "millions", + "accounting_standard": "US_GAAP", + "is_consolidated": True, + "report_language": "en", + "company_name": "Test Corp", + } + ] + + +@pytest.fixture() +def fake_pdf(tmp_path) -> str: + """Create a minimal PDF-like file for the pipeline to receive.""" + p = tmp_path / "test_report.pdf" + 
p.write_bytes(b"%PDF-1.4 fake content for testing") + return str(p) + + +# --------------------------------------------------------------------------- +# Tests +# --------------------------------------------------------------------------- + +class TestPipelineOptionsDefaults: + """Verify PipelineOptions defaults independently of the extractor package.""" + + def test_backend_defaults_to_cu(self, monkeypatch): + """PipelineOptions.backend should default to 'cu'.""" + # Import after monkeypatching the env var (autouse fixture handles it) + from extractor.stages.contracts import PipelineOptions + + opts = PipelineOptions() + assert opts.backend == "cu" + + def test_backend_can_be_set_to_textract(self): + """PipelineOptions.backend can be explicitly set to 'textract'.""" + from extractor.stages.contracts import PipelineOptions + + opts = PipelineOptions(backend="textract") + assert opts.backend == "textract" + + +class TestTextractEndToEnd: + """Full pipeline run through the Textract backend path (all external calls mocked).""" + + def test_pipeline_returns_summary_key( + self, fake_pdf, textract_response, llm_classification + ): + """Output dict must contain a 'summary' key with a list.""" + from extractor.pipeline import run + from extractor.stages.contracts import PipelineOptions + + options = PipelineOptions( + backend="textract", + use_enrichment=False, + requested_types=["balance_sheet"], + ) + + with patch("extractor.textract_client.analyze_document", return_value=textract_response), \ + patch("extractor.textract_adapter.classify_statements_with_llm", return_value=llm_classification): + result = run(fake_pdf, options) + + assert "summary" in result, "Pipeline output must contain 'summary' key" + assert isinstance(result["summary"], list), "'summary' must be a list" + + def test_pipeline_returns_balance_sheet_key( + self, fake_pdf, textract_response, llm_classification + ): + """Output dict must contain a 'balance_sheet' key when requested_types includes it.""" + 
from extractor.pipeline import run + from extractor.stages.contracts import PipelineOptions + + options = PipelineOptions( + backend="textract", + use_enrichment=False, + requested_types=["balance_sheet"], + ) + + with patch("extractor.textract_client.analyze_document", return_value=textract_response), \ + patch("extractor.textract_adapter.classify_statements_with_llm", return_value=llm_classification): + result = run(fake_pdf, options) + + assert "balance_sheet" in result, "Pipeline output must contain 'balance_sheet' key" + + def test_textract_analyze_document_is_called( + self, fake_pdf, textract_response, llm_classification + ): + """extractor.textract_client.analyze_document must be called exactly once.""" + from extractor.pipeline import run + from extractor.stages.contracts import PipelineOptions + + options = PipelineOptions( + backend="textract", + use_enrichment=False, + requested_types=["balance_sheet"], + ) + + with patch("extractor.textract_client.analyze_document", return_value=textract_response) as mock_analyze, \ + patch("extractor.textract_adapter.classify_statements_with_llm", return_value=llm_classification): + run(fake_pdf, options) + + mock_analyze.assert_called_once_with(fake_pdf) + + def test_llm_classification_is_called( + self, fake_pdf, textract_response, llm_classification + ): + """classify_statements_with_llm must be called once with the reconstructed markdown.""" + from extractor.pipeline import run + from extractor.stages.contracts import PipelineOptions + + options = PipelineOptions( + backend="textract", + use_enrichment=False, + requested_types=["balance_sheet"], + ) + + with patch("extractor.textract_client.analyze_document", return_value=textract_response), \ + patch("extractor.textract_adapter.classify_statements_with_llm", return_value=llm_classification) as mock_classify: + run(fake_pdf, options) + + mock_classify.assert_called_once() + # The argument must be a non-empty string (the reconstructed markdown) + call_args = 
mock_classify.call_args + markdown_arg = call_args.args[0] if call_args.args else call_args.kwargs.get("markdown", "") + assert isinstance(markdown_arg, str) + assert len(markdown_arg) > 0, "Markdown passed to LLM classifier must not be empty" + + def test_cu_backend_is_not_called_for_textract( + self, fake_pdf, textract_response, llm_classification + ): + """cu_client.analyze_document must NOT be called when backend='textract'.""" + from extractor.pipeline import run + from extractor.stages.contracts import PipelineOptions + + options = PipelineOptions( + backend="textract", + use_enrichment=False, + requested_types=["balance_sheet"], + ) + + with patch("extractor.textract_client.analyze_document", return_value=textract_response), \ + patch("extractor.textract_adapter.classify_statements_with_llm", return_value=llm_classification), \ + patch("extractor.cu_client.analyze_document") as mock_cu: + run(fake_pdf, options) + + mock_cu.assert_not_called() diff --git a/samples/mcs-finance-statement-agent/src/code-app/components.json b/samples/mcs-finance-statement-agent/src/code-app/components.json new file mode 100644 index 000000000..2b0833f09 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/code-app/components.json @@ -0,0 +1,22 @@ +{ + "$schema": "https://ui.shadcn.com/schema.json", + "style": "new-york", + "rsc": false, + "tsx": true, + "tailwind": { + "config": "", + "css": "src/index.css", + "baseColor": "neutral", + "cssVariables": true, + "prefix": "" + }, + "iconLibrary": "lucide", + "aliases": { + "components": "@/components", + "utils": "@/lib/utils", + "ui": "@/components/ui", + "lib": "@/lib", + "hooks": "@/hooks" + }, + "registries": {} +} diff --git a/samples/mcs-finance-statement-agent/src/code-app/eslint.config.js b/samples/mcs-finance-statement-agent/src/code-app/eslint.config.js new file mode 100644 index 000000000..b19330b10 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/code-app/eslint.config.js @@ -0,0 +1,23 @@ +import js 
from '@eslint/js' +import globals from 'globals' +import reactHooks from 'eslint-plugin-react-hooks' +import reactRefresh from 'eslint-plugin-react-refresh' +import tseslint from 'typescript-eslint' +import { defineConfig, globalIgnores } from 'eslint/config' + +export default defineConfig([ + globalIgnores(['dist']), + { + files: ['**/*.{ts,tsx}'], + extends: [ + js.configs.recommended, + tseslint.configs.recommended, + reactHooks.configs['recommended-latest'], + reactRefresh.configs.vite, + ], + languageOptions: { + ecmaVersion: 2020, + globals: globals.browser, + }, + }, +]) diff --git a/samples/mcs-finance-statement-agent/src/code-app/index.html b/samples/mcs-finance-statement-agent/src/code-app/index.html new file mode 100644 index 000000000..88e7d2d5e --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/code-app/index.html @@ -0,0 +1,13 @@ +<!doctype html> +<html lang="en"> + <head> + <meta charset="UTF-8" /> + <link rel="icon" type="image/svg+xml" href="/vite.svg" /> + <meta name="viewport" content="width=device-width, initial-scale=1.0" /> + <title>Power Apps</title> + </head> + <body> + <div id="root"></div>
+ <script type="module" src="/src/main.tsx"></script> + </body> +</html> diff --git a/samples/mcs-finance-statement-agent/src/code-app/package-lock.json b/samples/mcs-finance-statement-agent/src/code-app/package-lock.json new file mode 100644 index 000000000..3a00a9438 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/code-app/package-lock.json @@ -0,0 +1,6626 @@ +{ + "name": "power-apps-template-starter", + "version": "0.0.0", + "lockfileVersion": 3, + "requires": true, + "packages": { + "": { + "name": "power-apps-template-starter", + "version": "0.0.0", + "dependencies": { + "@microsoft/power-apps": "^1.0.3", + "@radix-ui/react-checkbox": "^1.3.3", + "@radix-ui/react-dialog": "^1.1.15", + "@radix-ui/react-dropdown-menu": "^2.1.16", + "@radix-ui/react-label": "^2.1.7", + "@radix-ui/react-popover": "^1.1.15", + "@radix-ui/react-progress": "^1.1.7", + "@radix-ui/react-select": "^2.2.6", + "@radix-ui/react-separator": "^1.1.7", + "@radix-ui/react-slot": "^1.2.3", + "@radix-ui/react-tabs": "^1.1.13", + "@radix-ui/react-tooltip": "^1.2.8", + "@tailwindcss/vite": "^4.1.16", + "@tanstack/react-query": "^5.90.5", + "@tanstack/react-table": "^8.21.3", + "class-variance-authority": "^0.7.1", + "clsx": "^2.1.1", + "cmdk": "^1.1.1", + "date-fns": "^4.1.0", + "lucide-react": "^0.546.0", + "react": "^19.1.1", + "react-day-picker": "^9.11.1", + "react-dom": "^19.1.1", + "react-router-dom": "^7.9.4", + "recharts": "^2.15.4", + "sonner": "^2.0.7", + "tailwind-merge": "^3.3.1", + "tailwindcss": "^4.1.16", + "zustand": "^5.0.10" + }, + "devDependencies": { + "@eslint/js": "^9.36.0", + "@microsoft/power-apps-vite": "^1.0.2", + "@types/node": "^24.6.0", + "@types/react": "^19.1.16", + "@types/react-dom": "^19.1.9", + "@vitejs/plugin-react": "^5.0.4", + "eslint": "^9.36.0", + "eslint-plugin-react-hooks": "^5.2.0", + "eslint-plugin-react-refresh": "^0.4.22", + "globals": "^16.4.0", + "picocolors": "^1.1.1", + "tw-animate-css": "^1.4.0", + "typescript": "~5.9.3", + "typescript-eslint": "^8.45.0", + "vite": "^7.1.7" + } + }, + 
"node_modules/@azure/msal-common": { + "version": "15.8.1", + "resolved": "https://registry.npmjs.org/@azure/msal-common/-/msal-common-15.8.1.tgz", + "integrity": "sha512-ltIlFK5VxeJ5BurE25OsJIfcx1Q3H/IZg2LjV9d4vmH+5t4c1UCyRQ/HgKLgXuCZShs7qfc/TC95GYZfsUsJUQ==", + "license": "MIT", + "engines": { + "node": ">=0.8.0" + } + }, + "node_modules/@azure/msal-node": { + "version": "3.6.3", + "resolved": "https://registry.npmjs.org/@azure/msal-node/-/msal-node-3.6.3.tgz", + "integrity": "sha512-95wjsKGyUcAd5tFmQBo5Ug/kOj+hFh/8FsXuxluEvdfbgg6xCimhSP9qnyq6+xIg78/jREkBD1/BSqd7NIDDYQ==", + "license": "MIT", + "dependencies": { + "@azure/msal-common": "15.8.1", + "jsonwebtoken": "^9.0.0", + "uuid": "^8.3.0" + }, + "engines": { + "node": ">=16" + } + }, + "node_modules/@azure/msal-node-extensions": { + "version": "1.5.17", + "resolved": "https://registry.npmjs.org/@azure/msal-node-extensions/-/msal-node-extensions-1.5.17.tgz", + "integrity": "sha512-Td9EgSAdgJrU19+iLXiiqx/vV7jgJV8L78ewmaJa5qakeh1jLTecFpwIFb84H0Tl9oGfzFqQIprPL4DOWIRR3A==", + "hasInstallScript": true, + "license": "MIT", + "dependencies": { + "@azure/msal-common": "15.8.1", + "@azure/msal-node-runtime": "^0.18.1", + "keytar": "^7.8.0" + }, + "engines": { + "node": ">=16" + } + }, + "node_modules/@azure/msal-node-runtime": { + "version": "0.18.2", + "resolved": "https://registry.npmjs.org/@azure/msal-node-runtime/-/msal-node-runtime-0.18.2.tgz", + "integrity": "sha512-v45fyBQp80BrjZAeGJXl+qggHcbylQiFBihr0ijO2eniDCW9tz5TZBKYsqzH06VuiRaVG/Sa0Hcn4pjhJqFSTw==", + "hasInstallScript": true, + "license": "MIT" + }, + "node_modules/@babel/code-frame": { + "version": "7.29.0", + "resolved": "https://registry.npmjs.org/@babel/code-frame/-/code-frame-7.29.0.tgz", + "integrity": "sha512-9NhCeYjq9+3uxgdtp20LSiJXJvN0FeCtNGpJxuMFZ1Kv3cWUNb6DOhJwUvcVCzKGR66cw4njwM6hrJLqgOwbcw==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/helper-validator-identifier": "^7.28.5", + "js-tokens": "^4.0.0", + "picocolors": 
"^1.1.1" + }, + "engines": { + "node": ">=6.9.0" + } + }, + "node_modules/@babel/compat-data": { + "version": "7.29.0", + "resolved": "https://registry.npmjs.org/@babel/compat-data/-/compat-data-7.29.0.tgz", + "integrity": "sha512-T1NCJqT/j9+cn8fvkt7jtwbLBfLC/1y1c7NtCeXFRgzGTsafi68MRv8yzkYSapBnFA6L3U2VSc02ciDzoAJhJg==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=6.9.0" + } + }, + "node_modules/@babel/core": { + "version": "7.29.0", + "resolved": "https://registry.npmjs.org/@babel/core/-/core-7.29.0.tgz", + "integrity": "sha512-CGOfOJqWjg2qW/Mb6zNsDm+u5vFQ8DxXfbM09z69p5Z6+mE1ikP2jUXw+j42Pf1XTYED2Rni5f95npYeuwMDQA==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/code-frame": "^7.29.0", + "@babel/generator": "^7.29.0", + "@babel/helper-compilation-targets": "^7.28.6", + "@babel/helper-module-transforms": "^7.28.6", + "@babel/helpers": "^7.28.6", + "@babel/parser": "^7.29.0", + "@babel/template": "^7.28.6", + "@babel/traverse": "^7.29.0", + "@babel/types": "^7.29.0", + "@jridgewell/remapping": "^2.3.5", + "convert-source-map": "^2.0.0", + "debug": "^4.1.0", + "gensync": "^1.0.0-beta.2", + "json5": "^2.2.3", + "semver": "^6.3.1" + }, + "engines": { + "node": ">=6.9.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/babel" + } + }, + "node_modules/@babel/generator": { + "version": "7.29.1", + "resolved": "https://registry.npmjs.org/@babel/generator/-/generator-7.29.1.tgz", + "integrity": "sha512-qsaF+9Qcm2Qv8SRIMMscAvG4O3lJ0F1GuMo5HR/Bp02LopNgnZBC/EkbevHFeGs4ls/oPz9v+Bsmzbkbe+0dUw==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/parser": "^7.29.0", + "@babel/types": "^7.29.0", + "@jridgewell/gen-mapping": "^0.3.12", + "@jridgewell/trace-mapping": "^0.3.28", + "jsesc": "^3.0.2" + }, + "engines": { + "node": ">=6.9.0" + } + }, + "node_modules/@babel/helper-compilation-targets": { + "version": "7.28.6", + "resolved": 
"https://registry.npmjs.org/@babel/helper-compilation-targets/-/helper-compilation-targets-7.28.6.tgz", + "integrity": "sha512-JYtls3hqi15fcx5GaSNL7SCTJ2MNmjrkHXg4FSpOA/grxK8KwyZ5bubHsCq8FXCkua6xhuaaBit+3b7+VZRfcA==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/compat-data": "^7.28.6", + "@babel/helper-validator-option": "^7.27.1", + "browserslist": "^4.24.0", + "lru-cache": "^5.1.1", + "semver": "^6.3.1" + }, + "engines": { + "node": ">=6.9.0" + } + }, + "node_modules/@babel/helper-globals": { + "version": "7.28.0", + "resolved": "https://registry.npmjs.org/@babel/helper-globals/-/helper-globals-7.28.0.tgz", + "integrity": "sha512-+W6cISkXFa1jXsDEdYA8HeevQT/FULhxzR99pxphltZcVaugps53THCeiWA8SguxxpSp3gKPiuYfSWopkLQ4hw==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=6.9.0" + } + }, + "node_modules/@babel/helper-module-imports": { + "version": "7.28.6", + "resolved": "https://registry.npmjs.org/@babel/helper-module-imports/-/helper-module-imports-7.28.6.tgz", + "integrity": "sha512-l5XkZK7r7wa9LucGw9LwZyyCUscb4x37JWTPz7swwFE/0FMQAGpiWUZn8u9DzkSBWEcK25jmvubfpw2dnAMdbw==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/traverse": "^7.28.6", + "@babel/types": "^7.28.6" + }, + "engines": { + "node": ">=6.9.0" + } + }, + "node_modules/@babel/helper-module-transforms": { + "version": "7.28.6", + "resolved": "https://registry.npmjs.org/@babel/helper-module-transforms/-/helper-module-transforms-7.28.6.tgz", + "integrity": "sha512-67oXFAYr2cDLDVGLXTEABjdBJZ6drElUSI7WKp70NrpyISso3plG9SAGEF6y7zbha/wOzUByWWTJvEDVNIUGcA==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/helper-module-imports": "^7.28.6", + "@babel/helper-validator-identifier": "^7.28.5", + "@babel/traverse": "^7.28.6" + }, + "engines": { + "node": ">=6.9.0" + }, + "peerDependencies": { + "@babel/core": "^7.0.0" + } + }, + "node_modules/@babel/helper-plugin-utils": { + "version": "7.28.6", + "resolved": 
"https://registry.npmjs.org/@babel/helper-plugin-utils/-/helper-plugin-utils-7.28.6.tgz", + "integrity": "sha512-S9gzZ/bz83GRysI7gAD4wPT/AI3uCnY+9xn+Mx/KPs2JwHJIz1W8PZkg2cqyt3RNOBM8ejcXhV6y8Og7ly/Dug==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=6.9.0" + } + }, + "node_modules/@babel/helper-string-parser": { + "version": "7.27.1", + "resolved": "https://registry.npmjs.org/@babel/helper-string-parser/-/helper-string-parser-7.27.1.tgz", + "integrity": "sha512-qMlSxKbpRlAridDExk92nSobyDdpPijUq2DW6oDnUqd0iOGxmQjyqhMIihI9+zv4LPyZdRje2cavWPbCbWm3eA==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=6.9.0" + } + }, + "node_modules/@babel/helper-validator-identifier": { + "version": "7.28.5", + "resolved": "https://registry.npmjs.org/@babel/helper-validator-identifier/-/helper-validator-identifier-7.28.5.tgz", + "integrity": "sha512-qSs4ifwzKJSV39ucNjsvc6WVHs6b7S03sOh2OcHF9UHfVPqWWALUsNUVzhSBiItjRZoLHx7nIarVjqKVusUZ1Q==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=6.9.0" + } + }, + "node_modules/@babel/helper-validator-option": { + "version": "7.27.1", + "resolved": "https://registry.npmjs.org/@babel/helper-validator-option/-/helper-validator-option-7.27.1.tgz", + "integrity": "sha512-YvjJow9FxbhFFKDSuFnVCe2WxXk1zWc22fFePVNEaWJEu8IrZVlda6N0uHwzZrUM1il7NC9Mlp4MaJYbYd9JSg==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=6.9.0" + } + }, + "node_modules/@babel/helpers": { + "version": "7.29.2", + "resolved": "https://registry.npmjs.org/@babel/helpers/-/helpers-7.29.2.tgz", + "integrity": "sha512-HoGuUs4sCZNezVEKdVcwqmZN8GoHirLUcLaYVNBK2J0DadGtdcqgr3BCbvH8+XUo4NGjNl3VOtSjEKNzqfFgKw==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/template": "^7.28.6", + "@babel/types": "^7.29.0" + }, + "engines": { + "node": ">=6.9.0" + } + }, + "node_modules/@babel/parser": { + "version": "7.29.2", + "resolved": "https://registry.npmjs.org/@babel/parser/-/parser-7.29.2.tgz", + "integrity": 
"sha512-4GgRzy/+fsBa72/RZVJmGKPmZu9Byn8o4MoLpmNe1m8ZfYnz5emHLQz3U4gLud6Zwl0RZIcgiLD7Uq7ySFuDLA==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/types": "^7.29.0" + }, + "bin": { + "parser": "bin/babel-parser.js" + }, + "engines": { + "node": ">=6.0.0" + } + }, + "node_modules/@babel/plugin-transform-react-jsx-self": { + "version": "7.27.1", + "resolved": "https://registry.npmjs.org/@babel/plugin-transform-react-jsx-self/-/plugin-transform-react-jsx-self-7.27.1.tgz", + "integrity": "sha512-6UzkCs+ejGdZ5mFFC/OCUrv028ab2fp1znZmCZjAOBKiBK2jXD1O+BPSfX8X2qjJ75fZBMSnQn3Rq2mrBJK2mw==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/helper-plugin-utils": "^7.27.1" + }, + "engines": { + "node": ">=6.9.0" + }, + "peerDependencies": { + "@babel/core": "^7.0.0-0" + } + }, + "node_modules/@babel/plugin-transform-react-jsx-source": { + "version": "7.27.1", + "resolved": "https://registry.npmjs.org/@babel/plugin-transform-react-jsx-source/-/plugin-transform-react-jsx-source-7.27.1.tgz", + "integrity": "sha512-zbwoTsBruTeKB9hSq73ha66iFeJHuaFkUbwvqElnygoNbj/jHRsSeokowZFN3CZ64IvEqcmmkVe89OPXc7ldAw==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/helper-plugin-utils": "^7.27.1" + }, + "engines": { + "node": ">=6.9.0" + }, + "peerDependencies": { + "@babel/core": "^7.0.0-0" + } + }, + "node_modules/@babel/runtime": { + "version": "7.29.2", + "resolved": "https://registry.npmjs.org/@babel/runtime/-/runtime-7.29.2.tgz", + "integrity": "sha512-JiDShH45zKHWyGe4ZNVRrCjBz8Nh9TMmZG1kh4QTK8hCBTWBi8Da+i7s1fJw7/lYpM4ccepSNfqzZ/QvABBi5g==", + "license": "MIT", + "engines": { + "node": ">=6.9.0" + } + }, + "node_modules/@babel/template": { + "version": "7.28.6", + "resolved": "https://registry.npmjs.org/@babel/template/-/template-7.28.6.tgz", + "integrity": "sha512-YA6Ma2KsCdGb+WC6UpBVFJGXL58MDA6oyONbjyF/+5sBgxY/dwkhLogbMT2GXXyU84/IhRw/2D1Os1B/giz+BQ==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/code-frame": 
"^7.28.6", + "@babel/parser": "^7.28.6", + "@babel/types": "^7.28.6" + }, + "engines": { + "node": ">=6.9.0" + } + }, + "node_modules/@babel/traverse": { + "version": "7.29.0", + "resolved": "https://registry.npmjs.org/@babel/traverse/-/traverse-7.29.0.tgz", + "integrity": "sha512-4HPiQr0X7+waHfyXPZpWPfWL/J7dcN1mx9gL6WdQVMbPnF3+ZhSMs8tCxN7oHddJE9fhNE7+lxdnlyemKfJRuA==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/code-frame": "^7.29.0", + "@babel/generator": "^7.29.0", + "@babel/helper-globals": "^7.28.0", + "@babel/parser": "^7.29.0", + "@babel/template": "^7.28.6", + "@babel/types": "^7.29.0", + "debug": "^4.3.1" + }, + "engines": { + "node": ">=6.9.0" + } + }, + "node_modules/@babel/types": { + "version": "7.29.0", + "resolved": "https://registry.npmjs.org/@babel/types/-/types-7.29.0.tgz", + "integrity": "sha512-LwdZHpScM4Qz8Xw2iKSzS+cfglZzJGvofQICy7W7v4caru4EaAmyUuO6BGrbyQ2mYV11W0U8j5mBhd14dd3B0A==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/helper-string-parser": "^7.27.1", + "@babel/helper-validator-identifier": "^7.28.5" + }, + "engines": { + "node": ">=6.9.0" + } + }, + "node_modules/@clack/core": { + "version": "0.3.5", + "resolved": "https://registry.npmjs.org/@clack/core/-/core-0.3.5.tgz", + "integrity": "sha512-5cfhQNH+1VQ2xLQlmzXMqUoiaH0lRBq9/CLW9lTyMbuKLC3+xEK01tHVvyut++mLOn5urSHmkm6I0Lg9MaJSTQ==", + "license": "MIT", + "dependencies": { + "picocolors": "^1.0.0", + "sisteransi": "^1.0.5" + } + }, + "node_modules/@clack/prompts": { + "version": "0.6.3", + "resolved": "https://registry.npmjs.org/@clack/prompts/-/prompts-0.6.3.tgz", + "integrity": "sha512-AM+kFmAHawpUQv2q9+mcB6jLKxXGjgu/r2EQjEwujgpCdzrST6BJqYw00GRn56/L/Izw5U7ImoLmy00X/r80Pw==", + "bundleDependencies": [ + "is-unicode-supported" + ], + "license": "MIT", + "dependencies": { + "@clack/core": "^0.3.2", + "is-unicode-supported": "*", + "picocolors": "^1.0.0", + "sisteransi": "^1.0.5" + } + }, + 
"node_modules/@clack/prompts/node_modules/is-unicode-supported": { + "version": "1.3.0", + "inBundle": true, + "license": "MIT", + "engines": { + "node": ">=12" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, + "node_modules/@date-fns/tz": { + "version": "1.4.1", + "resolved": "https://registry.npmjs.org/@date-fns/tz/-/tz-1.4.1.tgz", + "integrity": "sha512-P5LUNhtbj6YfI3iJjw5EL9eUAG6OitD0W3fWQcpQjDRc/QIsL0tRNuO1PcDvPccWL1fSTXXdE1ds+l95DV/OFA==", + "license": "MIT" + }, + "node_modules/@esbuild/aix-ppc64": { + "version": "0.27.4", + "resolved": "https://registry.npmjs.org/@esbuild/aix-ppc64/-/aix-ppc64-0.27.4.tgz", + "integrity": "sha512-cQPwL2mp2nSmHHJlCyoXgHGhbEPMrEEU5xhkcy3Hs/O7nGZqEpZ2sUtLaL9MORLtDfRvVl2/3PAuEkYZH0Ty8Q==", + "cpu": [ + "ppc64" + ], + "license": "MIT", + "optional": true, + "os": [ + "aix" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/android-arm": { + "version": "0.27.4", + "resolved": "https://registry.npmjs.org/@esbuild/android-arm/-/android-arm-0.27.4.tgz", + "integrity": "sha512-X9bUgvxiC8CHAGKYufLIHGXPJWnr0OCdR0anD2e21vdvgCI8lIfqFbnoeOz7lBjdrAGUhqLZLcQo6MLhTO2DKQ==", + "cpu": [ + "arm" + ], + "license": "MIT", + "optional": true, + "os": [ + "android" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/android-arm64": { + "version": "0.27.4", + "resolved": "https://registry.npmjs.org/@esbuild/android-arm64/-/android-arm64-0.27.4.tgz", + "integrity": "sha512-gdLscB7v75wRfu7QSm/zg6Rx29VLdy9eTr2t44sfTW7CxwAtQghZ4ZnqHk3/ogz7xao0QAgrkradbBzcqFPasw==", + "cpu": [ + "arm64" + ], + "license": "MIT", + "optional": true, + "os": [ + "android" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/android-x64": { + "version": "0.27.4", + "resolved": "https://registry.npmjs.org/@esbuild/android-x64/-/android-x64-0.27.4.tgz", + "integrity": "sha512-PzPFnBNVF292sfpfhiyiXCGSn9HZg5BcAz+ivBuSsl6Rk4ga1oEXAamhOXRFyMcjwr2DVtm40G65N3GLeH1Lvw==", + "cpu": [ + "x64" + ], 
+ "license": "MIT", + "optional": true, + "os": [ + "android" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/darwin-arm64": { + "version": "0.27.4", + "resolved": "https://registry.npmjs.org/@esbuild/darwin-arm64/-/darwin-arm64-0.27.4.tgz", + "integrity": "sha512-b7xaGIwdJlht8ZFCvMkpDN6uiSmnxxK56N2GDTMYPr2/gzvfdQN8rTfBsvVKmIVY/X7EM+/hJKEIbbHs9oA4tQ==", + "cpu": [ + "arm64" + ], + "license": "MIT", + "optional": true, + "os": [ + "darwin" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/darwin-x64": { + "version": "0.27.4", + "resolved": "https://registry.npmjs.org/@esbuild/darwin-x64/-/darwin-x64-0.27.4.tgz", + "integrity": "sha512-sR+OiKLwd15nmCdqpXMnuJ9W2kpy0KigzqScqHI3Hqwr7IXxBp3Yva+yJwoqh7rE8V77tdoheRYataNKL4QrPw==", + "cpu": [ + "x64" + ], + "license": "MIT", + "optional": true, + "os": [ + "darwin" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/freebsd-arm64": { + "version": "0.27.4", + "resolved": "https://registry.npmjs.org/@esbuild/freebsd-arm64/-/freebsd-arm64-0.27.4.tgz", + "integrity": "sha512-jnfpKe+p79tCnm4GVav68A7tUFeKQwQyLgESwEAUzyxk/TJr4QdGog9sqWNcUbr/bZt/O/HXouspuQDd9JxFSw==", + "cpu": [ + "arm64" + ], + "license": "MIT", + "optional": true, + "os": [ + "freebsd" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/freebsd-x64": { + "version": "0.27.4", + "resolved": "https://registry.npmjs.org/@esbuild/freebsd-x64/-/freebsd-x64-0.27.4.tgz", + "integrity": "sha512-2kb4ceA/CpfUrIcTUl1wrP/9ad9Atrp5J94Lq69w7UwOMolPIGrfLSvAKJp0RTvkPPyn6CIWrNy13kyLikZRZQ==", + "cpu": [ + "x64" + ], + "license": "MIT", + "optional": true, + "os": [ + "freebsd" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/linux-arm": { + "version": "0.27.4", + "resolved": "https://registry.npmjs.org/@esbuild/linux-arm/-/linux-arm-0.27.4.tgz", + "integrity": "sha512-aBYgcIxX/wd5n2ys0yESGeYMGF+pv6g0DhZr3G1ZG4jMfruU9Tl1i2Z+Wnj9/KjGz1lTLCcorqE2viePZqj4Eg==", + "cpu": [ + "arm" + 
], + "license": "MIT", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/linux-arm64": { + "version": "0.27.4", + "resolved": "https://registry.npmjs.org/@esbuild/linux-arm64/-/linux-arm64-0.27.4.tgz", + "integrity": "sha512-7nQOttdzVGth1iz57kxg9uCz57dxQLHWxopL6mYuYthohPKEK0vU0C3O21CcBK6KDlkYVcnDXY099HcCDXd9dA==", + "cpu": [ + "arm64" + ], + "license": "MIT", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/linux-ia32": { + "version": "0.27.4", + "resolved": "https://registry.npmjs.org/@esbuild/linux-ia32/-/linux-ia32-0.27.4.tgz", + "integrity": "sha512-oPtixtAIzgvzYcKBQM/qZ3R+9TEUd1aNJQu0HhGyqtx6oS7qTpvjheIWBbes4+qu1bNlo2V4cbkISr8q6gRBFA==", + "cpu": [ + "ia32" + ], + "license": "MIT", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/linux-loong64": { + "version": "0.27.4", + "resolved": "https://registry.npmjs.org/@esbuild/linux-loong64/-/linux-loong64-0.27.4.tgz", + "integrity": "sha512-8mL/vh8qeCoRcFH2nM8wm5uJP+ZcVYGGayMavi8GmRJjuI3g1v6Z7Ni0JJKAJW+m0EtUuARb6Lmp4hMjzCBWzA==", + "cpu": [ + "loong64" + ], + "license": "MIT", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/linux-mips64el": { + "version": "0.27.4", + "resolved": "https://registry.npmjs.org/@esbuild/linux-mips64el/-/linux-mips64el-0.27.4.tgz", + "integrity": "sha512-1RdrWFFiiLIW7LQq9Q2NES+HiD4NyT8Itj9AUeCl0IVCA459WnPhREKgwrpaIfTOe+/2rdntisegiPWn/r/aAw==", + "cpu": [ + "mips64el" + ], + "license": "MIT", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/linux-ppc64": { + "version": "0.27.4", + "resolved": "https://registry.npmjs.org/@esbuild/linux-ppc64/-/linux-ppc64-0.27.4.tgz", + "integrity": "sha512-tLCwNG47l3sd9lpfyx9LAGEGItCUeRCWeAx6x2Jmbav65nAwoPXfewtAdtbtit/pJFLUWOhpv0FpS6GQAmPrHA==", + 
"cpu": [ + "ppc64" + ], + "license": "MIT", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/linux-riscv64": { + "version": "0.27.4", + "resolved": "https://registry.npmjs.org/@esbuild/linux-riscv64/-/linux-riscv64-0.27.4.tgz", + "integrity": "sha512-BnASypppbUWyqjd1KIpU4AUBiIhVr6YlHx/cnPgqEkNoVOhHg+YiSVxM1RLfiy4t9cAulbRGTNCKOcqHrEQLIw==", + "cpu": [ + "riscv64" + ], + "license": "MIT", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/linux-s390x": { + "version": "0.27.4", + "resolved": "https://registry.npmjs.org/@esbuild/linux-s390x/-/linux-s390x-0.27.4.tgz", + "integrity": "sha512-+eUqgb/Z7vxVLezG8bVB9SfBie89gMueS+I0xYh2tJdw3vqA/0ImZJ2ROeWwVJN59ihBeZ7Tu92dF/5dy5FttA==", + "cpu": [ + "s390x" + ], + "license": "MIT", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/linux-x64": { + "version": "0.27.4", + "resolved": "https://registry.npmjs.org/@esbuild/linux-x64/-/linux-x64-0.27.4.tgz", + "integrity": "sha512-S5qOXrKV8BQEzJPVxAwnryi2+Iq5pB40gTEIT69BQONqR7JH1EPIcQ/Uiv9mCnn05jff9umq/5nqzxlqTOg9NA==", + "cpu": [ + "x64" + ], + "license": "MIT", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/netbsd-arm64": { + "version": "0.27.4", + "resolved": "https://registry.npmjs.org/@esbuild/netbsd-arm64/-/netbsd-arm64-0.27.4.tgz", + "integrity": "sha512-xHT8X4sb0GS8qTqiwzHqpY00C95DPAq7nAwX35Ie/s+LO9830hrMd3oX0ZMKLvy7vsonee73x0lmcdOVXFzd6Q==", + "cpu": [ + "arm64" + ], + "license": "MIT", + "optional": true, + "os": [ + "netbsd" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/netbsd-x64": { + "version": "0.27.4", + "resolved": "https://registry.npmjs.org/@esbuild/netbsd-x64/-/netbsd-x64-0.27.4.tgz", + "integrity": "sha512-RugOvOdXfdyi5Tyv40kgQnI0byv66BFgAqjdgtAKqHoZTbTF2QqfQrFwa7cHEORJf6X2ht+l9ABLMP0dnKYsgg==", 
+ "cpu": [ + "x64" + ], + "license": "MIT", + "optional": true, + "os": [ + "netbsd" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/openbsd-arm64": { + "version": "0.27.4", + "resolved": "https://registry.npmjs.org/@esbuild/openbsd-arm64/-/openbsd-arm64-0.27.4.tgz", + "integrity": "sha512-2MyL3IAaTX+1/qP0O1SwskwcwCoOI4kV2IBX1xYnDDqthmq5ArrW94qSIKCAuRraMgPOmG0RDTA74mzYNQA9ow==", + "cpu": [ + "arm64" + ], + "license": "MIT", + "optional": true, + "os": [ + "openbsd" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/openbsd-x64": { + "version": "0.27.4", + "resolved": "https://registry.npmjs.org/@esbuild/openbsd-x64/-/openbsd-x64-0.27.4.tgz", + "integrity": "sha512-u8fg/jQ5aQDfsnIV6+KwLOf1CmJnfu1ShpwqdwC0uA7ZPwFws55Ngc12vBdeUdnuWoQYx/SOQLGDcdlfXhYmXQ==", + "cpu": [ + "x64" + ], + "license": "MIT", + "optional": true, + "os": [ + "openbsd" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/openharmony-arm64": { + "version": "0.27.4", + "resolved": "https://registry.npmjs.org/@esbuild/openharmony-arm64/-/openharmony-arm64-0.27.4.tgz", + "integrity": "sha512-JkTZrl6VbyO8lDQO3yv26nNr2RM2yZzNrNHEsj9bm6dOwwu9OYN28CjzZkH57bh4w0I2F7IodpQvUAEd1mbWXg==", + "cpu": [ + "arm64" + ], + "license": "MIT", + "optional": true, + "os": [ + "openharmony" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/sunos-x64": { + "version": "0.27.4", + "resolved": "https://registry.npmjs.org/@esbuild/sunos-x64/-/sunos-x64-0.27.4.tgz", + "integrity": "sha512-/gOzgaewZJfeJTlsWhvUEmUG4tWEY2Spp5M20INYRg2ZKl9QPO3QEEgPeRtLjEWSW8FilRNacPOg8R1uaYkA6g==", + "cpu": [ + "x64" + ], + "license": "MIT", + "optional": true, + "os": [ + "sunos" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/win32-arm64": { + "version": "0.27.4", + "resolved": "https://registry.npmjs.org/@esbuild/win32-arm64/-/win32-arm64-0.27.4.tgz", + "integrity": 
"sha512-Z9SExBg2y32smoDQdf1HRwHRt6vAHLXcxD2uGgO/v2jK7Y718Ix4ndsbNMU/+1Qiem9OiOdaqitioZwxivhXYg==", + "cpu": [ + "arm64" + ], + "license": "MIT", + "optional": true, + "os": [ + "win32" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/win32-ia32": { + "version": "0.27.4", + "resolved": "https://registry.npmjs.org/@esbuild/win32-ia32/-/win32-ia32-0.27.4.tgz", + "integrity": "sha512-DAyGLS0Jz5G5iixEbMHi5KdiApqHBWMGzTtMiJ72ZOLhbu/bzxgAe8Ue8CTS3n3HbIUHQz/L51yMdGMeoxXNJw==", + "cpu": [ + "ia32" + ], + "license": "MIT", + "optional": true, + "os": [ + "win32" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@esbuild/win32-x64": { + "version": "0.27.4", + "resolved": "https://registry.npmjs.org/@esbuild/win32-x64/-/win32-x64-0.27.4.tgz", + "integrity": "sha512-+knoa0BDoeXgkNvvV1vvbZX4+hizelrkwmGJBdT17t8FNPwG2lKemmuMZlmaNQ3ws3DKKCxpb4zRZEIp3UxFCg==", + "cpu": [ + "x64" + ], + "license": "MIT", + "optional": true, + "os": [ + "win32" + ], + "engines": { + "node": ">=18" + } + }, + "node_modules/@eslint-community/eslint-utils": { + "version": "4.9.1", + "resolved": "https://registry.npmjs.org/@eslint-community/eslint-utils/-/eslint-utils-4.9.1.tgz", + "integrity": "sha512-phrYmNiYppR7znFEdqgfWHXR6NCkZEK7hwWDHZUjit/2/U0r6XvkDl0SYnoM51Hq7FhCGdLDT6zxCCOY1hexsQ==", + "dev": true, + "license": "MIT", + "dependencies": { + "eslint-visitor-keys": "^3.4.3" + }, + "engines": { + "node": "^12.22.0 || ^14.17.0 || >=16.0.0" + }, + "funding": { + "url": "https://opencollective.com/eslint" + }, + "peerDependencies": { + "eslint": "^6.0.0 || ^7.0.0 || >=8.0.0" + } + }, + "node_modules/@eslint-community/eslint-utils/node_modules/eslint-visitor-keys": { + "version": "3.4.3", + "resolved": "https://registry.npmjs.org/eslint-visitor-keys/-/eslint-visitor-keys-3.4.3.tgz", + "integrity": "sha512-wpc+LXeiyiisxPlEkUzU6svyS1frIO3Mgxj1fdy7Pm8Ygzguax2N3Fa/D/ag1WqbOprdI+uY6wMUl8/a2G+iag==", + "dev": true, + "license": "Apache-2.0", + "engines": { + "node": "^12.22.0 
|| ^14.17.0 || >=16.0.0" + }, + "funding": { + "url": "https://opencollective.com/eslint" + } + }, + "node_modules/@eslint-community/regexpp": { + "version": "4.12.2", + "resolved": "https://registry.npmjs.org/@eslint-community/regexpp/-/regexpp-4.12.2.tgz", + "integrity": "sha512-EriSTlt5OC9/7SXkRSCAhfSxxoSUgBm33OH+IkwbdpgoqsSsUg7y3uh+IICI/Qg4BBWr3U2i39RpmycbxMq4ew==", + "dev": true, + "license": "MIT", + "engines": { + "node": "^12.0.0 || ^14.0.0 || >=16.0.0" + } + }, + "node_modules/@eslint/config-array": { + "version": "0.21.2", + "resolved": "https://registry.npmjs.org/@eslint/config-array/-/config-array-0.21.2.tgz", + "integrity": "sha512-nJl2KGTlrf9GjLimgIru+V/mzgSK0ABCDQRvxw5BjURL7WfH5uoWmizbH7QB6MmnMBd8cIC9uceWnezL1VZWWw==", + "dev": true, + "license": "Apache-2.0", + "dependencies": { + "@eslint/object-schema": "^2.1.7", + "debug": "^4.3.1", + "minimatch": "^3.1.5" + }, + "engines": { + "node": "^18.18.0 || ^20.9.0 || >=21.1.0" + } + }, + "node_modules/@eslint/config-helpers": { + "version": "0.4.2", + "resolved": "https://registry.npmjs.org/@eslint/config-helpers/-/config-helpers-0.4.2.tgz", + "integrity": "sha512-gBrxN88gOIf3R7ja5K9slwNayVcZgK6SOUORm2uBzTeIEfeVaIhOpCtTox3P6R7o2jLFwLFTLnC7kU/RGcYEgw==", + "dev": true, + "license": "Apache-2.0", + "dependencies": { + "@eslint/core": "^0.17.0" + }, + "engines": { + "node": "^18.18.0 || ^20.9.0 || >=21.1.0" + } + }, + "node_modules/@eslint/core": { + "version": "0.17.0", + "resolved": "https://registry.npmjs.org/@eslint/core/-/core-0.17.0.tgz", + "integrity": "sha512-yL/sLrpmtDaFEiUj1osRP4TI2MDz1AddJL+jZ7KSqvBuliN4xqYY54IfdN8qD8Toa6g1iloph1fxQNkjOxrrpQ==", + "dev": true, + "license": "Apache-2.0", + "dependencies": { + "@types/json-schema": "^7.0.15" + }, + "engines": { + "node": "^18.18.0 || ^20.9.0 || >=21.1.0" + } + }, + "node_modules/@eslint/eslintrc": { + "version": "3.3.5", + "resolved": "https://registry.npmjs.org/@eslint/eslintrc/-/eslintrc-3.3.5.tgz", + "integrity": 
"sha512-4IlJx0X0qftVsN5E+/vGujTRIFtwuLbNsVUe7TO6zYPDR1O6nFwvwhIKEKSrl6dZchmYBITazxKoUYOjdtjlRg==", + "dev": true, + "license": "MIT", + "dependencies": { + "ajv": "^6.14.0", + "debug": "^4.3.2", + "espree": "^10.0.1", + "globals": "^14.0.0", + "ignore": "^5.2.0", + "import-fresh": "^3.2.1", + "js-yaml": "^4.1.1", + "minimatch": "^3.1.5", + "strip-json-comments": "^3.1.1" + }, + "engines": { + "node": "^18.18.0 || ^20.9.0 || >=21.1.0" + }, + "funding": { + "url": "https://opencollective.com/eslint" + } + }, + "node_modules/@eslint/eslintrc/node_modules/globals": { + "version": "14.0.0", + "resolved": "https://registry.npmjs.org/globals/-/globals-14.0.0.tgz", + "integrity": "sha512-oahGvuMGQlPw/ivIYBjVSrWAfWLBeku5tpPE2fOPLi+WHffIWbuh2tCjhyQhTBPMf5E9jDEH4FOmTYgYwbKwtQ==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=18" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, + "node_modules/@eslint/js": { + "version": "9.39.4", + "resolved": "https://registry.npmjs.org/@eslint/js/-/js-9.39.4.tgz", + "integrity": "sha512-nE7DEIchvtiFTwBw4Lfbu59PG+kCofhjsKaCWzxTpt4lfRjRMqG6uMBzKXuEcyXhOHoUp9riAm7/aWYGhXZ9cw==", + "dev": true, + "license": "MIT", + "engines": { + "node": "^18.18.0 || ^20.9.0 || >=21.1.0" + }, + "funding": { + "url": "https://eslint.org/donate" + } + }, + "node_modules/@eslint/object-schema": { + "version": "2.1.7", + "resolved": "https://registry.npmjs.org/@eslint/object-schema/-/object-schema-2.1.7.tgz", + "integrity": "sha512-VtAOaymWVfZcmZbp6E2mympDIHvyjXs/12LqWYjVw6qjrfF+VK+fyG33kChz3nnK+SU5/NeHOqrTEHS8sXO3OA==", + "dev": true, + "license": "Apache-2.0", + "engines": { + "node": "^18.18.0 || ^20.9.0 || >=21.1.0" + } + }, + "node_modules/@eslint/plugin-kit": { + "version": "0.4.1", + "resolved": "https://registry.npmjs.org/@eslint/plugin-kit/-/plugin-kit-0.4.1.tgz", + "integrity": "sha512-43/qtrDUokr7LJqoF2c3+RInu/t4zfrpYdoSDfYyhg52rwLV6TnOvdG4fXm7IkSB3wErkcmJS9iEhjVtOSEjjA==", + "dev": true, + "license": 
"Apache-2.0", + "dependencies": { + "@eslint/core": "^0.17.0", + "levn": "^0.4.1" + }, + "engines": { + "node": "^18.18.0 || ^20.9.0 || >=21.1.0" + } + }, + "node_modules/@floating-ui/core": { + "version": "1.7.5", + "resolved": "https://registry.npmjs.org/@floating-ui/core/-/core-1.7.5.tgz", + "integrity": "sha512-1Ih4WTWyw0+lKyFMcBHGbb5U5FtuHJuujoyyr5zTaWS5EYMeT6Jb2AuDeftsCsEuchO+mM2ij5+q9crhydzLhQ==", + "license": "MIT", + "dependencies": { + "@floating-ui/utils": "^0.2.11" + } + }, + "node_modules/@floating-ui/dom": { + "version": "1.7.6", + "resolved": "https://registry.npmjs.org/@floating-ui/dom/-/dom-1.7.6.tgz", + "integrity": "sha512-9gZSAI5XM36880PPMm//9dfiEngYoC6Am2izES1FF406YFsjvyBMmeJ2g4SAju3xWwtuynNRFL2s9hgxpLI5SQ==", + "license": "MIT", + "dependencies": { + "@floating-ui/core": "^1.7.5", + "@floating-ui/utils": "^0.2.11" + } + }, + "node_modules/@floating-ui/react-dom": { + "version": "2.1.8", + "resolved": "https://registry.npmjs.org/@floating-ui/react-dom/-/react-dom-2.1.8.tgz", + "integrity": "sha512-cC52bHwM/n/CxS87FH0yWdngEZrjdtLW/qVruo68qg+prK7ZQ4YGdut2GyDVpoGeAYe/h899rVeOVm6Oi40k2A==", + "license": "MIT", + "dependencies": { + "@floating-ui/dom": "^1.7.6" + }, + "peerDependencies": { + "react": ">=16.8.0", + "react-dom": ">=16.8.0" + } + }, + "node_modules/@floating-ui/utils": { + "version": "0.2.11", + "resolved": "https://registry.npmjs.org/@floating-ui/utils/-/utils-0.2.11.tgz", + "integrity": "sha512-RiB/yIh78pcIxl6lLMG0CgBXAZ2Y0eVHqMPYugu+9U0AeT6YBeiJpf7lbdJNIugFP5SIjwNRgo4DhR1Qxi26Gg==", + "license": "MIT" + }, + "node_modules/@humanfs/core": { + "version": "0.19.1", + "resolved": "https://registry.npmjs.org/@humanfs/core/-/core-0.19.1.tgz", + "integrity": "sha512-5DyQ4+1JEUzejeK1JGICcideyfUbGixgS9jNgex5nqkW+cY7WZhxBigmieN5Qnw9ZosSNVC9KQKyb+GUaGyKUA==", + "dev": true, + "license": "Apache-2.0", + "engines": { + "node": ">=18.18.0" + } + }, + "node_modules/@humanfs/node": { + "version": "0.16.7", + "resolved": 
"https://registry.npmjs.org/@humanfs/node/-/node-0.16.7.tgz", + "integrity": "sha512-/zUx+yOsIrG4Y43Eh2peDeKCxlRt/gET6aHfaKpuq267qXdYDFViVHfMaLyygZOnl0kGWxFIgsBy8QFuTLUXEQ==", + "dev": true, + "license": "Apache-2.0", + "dependencies": { + "@humanfs/core": "^0.19.1", + "@humanwhocodes/retry": "^0.4.0" + }, + "engines": { + "node": ">=18.18.0" + } + }, + "node_modules/@humanwhocodes/module-importer": { + "version": "1.0.1", + "resolved": "https://registry.npmjs.org/@humanwhocodes/module-importer/-/module-importer-1.0.1.tgz", + "integrity": "sha512-bxveV4V8v5Yb4ncFTT3rPSgZBOpCkjfK0y4oVVVJwIuDVBRMDXrPyXRL988i5ap9m9bnyEEjWfm5WkBmtffLfA==", + "dev": true, + "license": "Apache-2.0", + "engines": { + "node": ">=12.22" + }, + "funding": { + "type": "github", + "url": "https://github.com/sponsors/nzakas" + } + }, + "node_modules/@humanwhocodes/retry": { + "version": "0.4.3", + "resolved": "https://registry.npmjs.org/@humanwhocodes/retry/-/retry-0.4.3.tgz", + "integrity": "sha512-bV0Tgo9K4hfPCek+aMAn81RppFKv2ySDQeMoSZuvTASywNTnVJCArCZE2FWqpvIatKu7VMRLWlR1EazvVhDyhQ==", + "dev": true, + "license": "Apache-2.0", + "engines": { + "node": ">=18.18" + }, + "funding": { + "type": "github", + "url": "https://github.com/sponsors/nzakas" + } + }, + "node_modules/@jridgewell/gen-mapping": { + "version": "0.3.13", + "resolved": "https://registry.npmjs.org/@jridgewell/gen-mapping/-/gen-mapping-0.3.13.tgz", + "integrity": "sha512-2kkt/7niJ6MgEPxF0bYdQ6etZaA+fQvDcLKckhy1yIQOzaoKjBBjSj63/aLVjYE3qhRt5dvM+uUyfCg6UKCBbA==", + "license": "MIT", + "dependencies": { + "@jridgewell/sourcemap-codec": "^1.5.0", + "@jridgewell/trace-mapping": "^0.3.24" + } + }, + "node_modules/@jridgewell/remapping": { + "version": "2.3.5", + "resolved": "https://registry.npmjs.org/@jridgewell/remapping/-/remapping-2.3.5.tgz", + "integrity": "sha512-LI9u/+laYG4Ds1TDKSJW2YPrIlcVYOwi2fUC6xB43lueCjgxV4lffOCZCtYFiH6TNOX+tQKXx97T4IKHbhyHEQ==", + "license": "MIT", + "dependencies": { + "@jridgewell/gen-mapping": "^0.3.5", 
+ "@jridgewell/trace-mapping": "^0.3.24" + } + }, + "node_modules/@jridgewell/resolve-uri": { + "version": "3.1.2", + "resolved": "https://registry.npmjs.org/@jridgewell/resolve-uri/-/resolve-uri-3.1.2.tgz", + "integrity": "sha512-bRISgCIjP20/tbWSPWMEi54QVPRZExkuD9lJL+UIxUKtwVJA8wW1Trb1jMs1RFXo1CBTNZ/5hpC9QvmKWdopKw==", + "license": "MIT", + "engines": { + "node": ">=6.0.0" + } + }, + "node_modules/@jridgewell/sourcemap-codec": { + "version": "1.5.5", + "resolved": "https://registry.npmjs.org/@jridgewell/sourcemap-codec/-/sourcemap-codec-1.5.5.tgz", + "integrity": "sha512-cYQ9310grqxueWbl+WuIUIaiUaDcj7WOq5fVhEljNVgRfOUhY9fy2zTvfoqWsnebh8Sl70VScFbICvJnLKB0Og==", + "license": "MIT" + }, + "node_modules/@jridgewell/trace-mapping": { + "version": "0.3.31", + "resolved": "https://registry.npmjs.org/@jridgewell/trace-mapping/-/trace-mapping-0.3.31.tgz", + "integrity": "sha512-zzNR+SdQSDJzc8joaeP8QQoCQr8NuYx2dIIytl1QeBEZHJ9uW6hebsrYgbz8hJwUQao3TWCMtmfV8Nu1twOLAw==", + "license": "MIT", + "dependencies": { + "@jridgewell/resolve-uri": "^3.1.0", + "@jridgewell/sourcemap-codec": "^1.4.14" + } + }, + "node_modules/@microsoft/1ds-core-js": { + "version": "4.3.11", + "resolved": "https://registry.npmjs.org/@microsoft/1ds-core-js/-/1ds-core-js-4.3.11.tgz", + "integrity": "sha512-QyQE/YzFYB+31WEpX9hvDoXZOIXA7308Z5uuL1mSsyDSkNPl24hBWz9O3vZL+/p9shy756eKLI2nFLwwIAhXyw==", + "license": "MIT", + "dependencies": { + "@microsoft/applicationinsights-core-js": "3.3.11", + "@microsoft/applicationinsights-shims": "3.0.1", + "@microsoft/dynamicproto-js": "^2.0.3", + "@nevware21/ts-async": ">= 0.5.4 < 2.x", + "@nevware21/ts-utils": ">= 0.11.8 < 2.x" + } + }, + "node_modules/@microsoft/1ds-post-js": { + "version": "4.3.11", + "resolved": "https://registry.npmjs.org/@microsoft/1ds-post-js/-/1ds-post-js-4.3.11.tgz", + "integrity": "sha512-V0ZeeALy/Pj8HWgNHDsK+yDeCYnJ9bCgTWhcrna/ZiAT+sGfWs6mDBjAVcG03uP7TDjdWLf8w79lgbXJ3+s3DA==", + "license": "MIT", + "dependencies": { + "@microsoft/1ds-core-js": 
"4.3.11", + "@microsoft/applicationinsights-shims": "3.0.1", + "@microsoft/dynamicproto-js": "^2.0.3", + "@nevware21/ts-async": ">= 0.5.4 < 2.x", + "@nevware21/ts-utils": ">= 0.11.8 < 2.x" + } + }, + "node_modules/@microsoft/applicationinsights-core-js": { + "version": "3.3.11", + "resolved": "https://registry.npmjs.org/@microsoft/applicationinsights-core-js/-/applicationinsights-core-js-3.3.11.tgz", + "integrity": "sha512-WlBY1sKDNL62T++NifgFCyDuOoNUNrVILfnHubOzgU/od7MFEQYWU8EZyDcBC/+Z8e3TD6jfixurYtWoUC+6Eg==", + "license": "MIT", + "dependencies": { + "@microsoft/applicationinsights-shims": "3.0.1", + "@microsoft/dynamicproto-js": "^2.0.3", + "@nevware21/ts-async": ">= 0.5.4 < 2.x", + "@nevware21/ts-utils": ">= 0.11.8 < 2.x" + }, + "peerDependencies": { + "tslib": ">= 1.0.0" + } + }, + "node_modules/@microsoft/applicationinsights-shims": { + "version": "3.0.1", + "resolved": "https://registry.npmjs.org/@microsoft/applicationinsights-shims/-/applicationinsights-shims-3.0.1.tgz", + "integrity": "sha512-DKwboF47H1nb33rSUfjqI6ryX29v+2QWcTrRvcQDA32AZr5Ilkr7whOOSsD1aBzwqX0RJEIP1Z81jfE3NBm/Lg==", + "license": "MIT", + "dependencies": { + "@nevware21/ts-utils": ">= 0.9.4 < 2.x" + } + }, + "node_modules/@microsoft/dynamicproto-js": { + "version": "2.0.3", + "resolved": "https://registry.npmjs.org/@microsoft/dynamicproto-js/-/dynamicproto-js-2.0.3.tgz", + "integrity": "sha512-JTWTU80rMy3mdxOjjpaiDQsTLZ6YSGGqsjURsY6AUQtIj0udlF/jYmhdLZu8693ZIC0T1IwYnFa0+QeiMnziBA==", + "license": "MIT", + "dependencies": { + "@nevware21/ts-utils": ">= 0.10.4 < 2.x" + } + }, + "node_modules/@microsoft/power-apps": { + "version": "1.0.17", + "resolved": "https://registry.npmjs.org/@microsoft/power-apps/-/power-apps-1.0.17.tgz", + "integrity": "sha512-mKqroivdI9Nluyc6MkXh2y/DhPLhNJ+ma4Z+60UpgST32ED2PnS5/IZBs8q4i9vyWVAdg4tQeJSsaPUMgTYr/w==", + "license": "See license in LICENSE file", + "dependencies": { + "@microsoft/power-apps-cli": "0.9.1" + } + }, + 
"node_modules/@microsoft/power-apps-actions": { + "version": "1.3.1", + "resolved": "https://registry.npmjs.org/@microsoft/power-apps-actions/-/power-apps-actions-1.3.1.tgz", + "integrity": "sha512-VUGpu5cNSz4V9VUwkXSLH8tZQxGkMfizK5VZjUuyCH8EU577peNRGrpU2lurmlAISnYvQNQ2rJCYI4NEqBKnZA==", + "license": "See license in LICENSE file", + "dependencies": { + "@microsoft/power-apps-common": "1.0.3", + "prettier": "3.8.1", + "ts-morph": "27.0.2", + "zod": "3.24.4", + "zod-to-json-schema": "3.24.5" + }, + "engines": { + "node": ">=22" + } + }, + "node_modules/@microsoft/power-apps-cli": { + "version": "0.9.1", + "resolved": "https://registry.npmjs.org/@microsoft/power-apps-cli/-/power-apps-cli-0.9.1.tgz", + "integrity": "sha512-vb9MXATnIgMf8iUMNT+zL0GMxd9pFsZCZBxb34U9EYAAIkvSwAblJjM3eJtquq5e525dLEST6iJq1zpJwhawHA==", + "license": "See license in LICENSE file", + "dependencies": { + "@azure/msal-node": "3.6.3", + "@azure/msal-node-extensions": "1.5.17", + "@clack/prompts": "0.6.3", + "@microsoft/1ds-core-js": "4.3.11", + "@microsoft/1ds-post-js": "4.3.11", + "@microsoft/power-apps-actions": "1.3.1", + "@microsoft/power-apps-common": "1.0.3", + "chalk": "4.1.2", + "commander": "10.0.1", + "open": "8.4.0" + }, + "bin": { + "power-apps": "dist/Bin.js" + }, + "engines": { + "node": ">=22" + } + }, + "node_modules/@microsoft/power-apps-common": { + "version": "1.0.3", + "resolved": "https://registry.npmjs.org/@microsoft/power-apps-common/-/power-apps-common-1.0.3.tgz", + "integrity": "sha512-tjoQbkJnYWDKL2tuVdURsH1XWyjnjK02U0IjwLTv3QmoZ+Uqrqxc3x5YSqObOwIGE4jdJ312zltPpq0itfAueg==", + "license": "See license in LICENSE file", + "peerDependencies": { + "@microsoft/1ds-core-js": "4.3.11", + "@microsoft/1ds-post-js": "4.3.11" + }, + "peerDependenciesMeta": { + "@microsoft/1ds-core-js": { + "optional": true + }, + "@microsoft/1ds-post-js": { + "optional": true + } + } + }, + "node_modules/@microsoft/power-apps-vite": { + "version": "1.0.2", + "resolved": 
"https://registry.npmjs.org/@microsoft/power-apps-vite/-/power-apps-vite-1.0.2.tgz", + "integrity": "sha512-KlWQqa1eE4uqaVRPlL1lJW9/Qne2Rm+hKnvbYaZo4oDT2NigB5QC1lD+eaBdMDFw/2Lv9SfOn+YfgAJ82Am10A==", + "dev": true, + "license": "See license in LICENSE file", + "dependencies": { + "picocolors": "^1.1.1" + }, + "engines": { + "node": ">=18" + }, + "peerDependencies": { + "vite": ">=5.0.0" + }, + "peerDependenciesMeta": { + "vite": { + "optional": false + } + } + }, + "node_modules/@nevware21/ts-async": { + "version": "0.5.5", + "resolved": "https://registry.npmjs.org/@nevware21/ts-async/-/ts-async-0.5.5.tgz", + "integrity": "sha512-vwqaL05iJPjLeh5igPi8MeeAu10i+Aq7xko1fbo9F5Si6MnVN5505qaV7AhSdk5MCBJVT/UYMk3kgInNjDb4Ig==", + "license": "MIT", + "dependencies": { + "@nevware21/ts-utils": ">= 0.12.2 < 2.x" + } + }, + "node_modules/@nevware21/ts-utils": { + "version": "0.13.0", + "resolved": "https://registry.npmjs.org/@nevware21/ts-utils/-/ts-utils-0.13.0.tgz", + "integrity": "sha512-F3mD+DsUn9OiZmZc5tg0oKqrJCtiCstwx+wE+DNzFYh2cCRUuzTYdK9zGGP/au2BWvbOQ6Tqlbjr2+dT1P3AlQ==", + "license": "MIT" + }, + "node_modules/@radix-ui/number": { + "version": "1.1.1", + "resolved": "https://registry.npmjs.org/@radix-ui/number/-/number-1.1.1.tgz", + "integrity": "sha512-MkKCwxlXTgz6CFoJx3pCwn07GKp36+aZyu/u2Ln2VrA5DcdyCZkASEDBTd8x5whTQQL5CiYf4prXKLcgQdv29g==", + "license": "MIT" + }, + "node_modules/@radix-ui/primitive": { + "version": "1.1.3", + "resolved": "https://registry.npmjs.org/@radix-ui/primitive/-/primitive-1.1.3.tgz", + "integrity": "sha512-JTF99U/6XIjCBo0wqkU5sK10glYe27MRRsfwoiq5zzOEZLHU3A3KCMa5X/azekYRCJ0HlwI0crAXS/5dEHTzDg==", + "license": "MIT" + }, + "node_modules/@radix-ui/react-arrow": { + "version": "1.1.7", + "resolved": "https://registry.npmjs.org/@radix-ui/react-arrow/-/react-arrow-1.1.7.tgz", + "integrity": "sha512-F+M1tLhO+mlQaOWspE8Wstg+z6PwxwRd8oQ8IXceWz92kfAmalTRf0EjrouQeo7QssEPfCn05B4Ihs1K9WQ/7w==", + "license": "MIT", + "dependencies": { + 
"@radix-ui/react-primitive": "2.1.3" + }, + "peerDependencies": { + "@types/react": "*", + "@types/react-dom": "*", + "react": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc", + "react-dom": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + }, + "@types/react-dom": { + "optional": true + } + } + }, + "node_modules/@radix-ui/react-checkbox": { + "version": "1.3.3", + "resolved": "https://registry.npmjs.org/@radix-ui/react-checkbox/-/react-checkbox-1.3.3.tgz", + "integrity": "sha512-wBbpv+NQftHDdG86Qc0pIyXk5IR3tM8Vd0nWLKDcX8nNn4nXFOFwsKuqw2okA/1D/mpaAkmuyndrPJTYDNZtFw==", + "license": "MIT", + "dependencies": { + "@radix-ui/primitive": "1.1.3", + "@radix-ui/react-compose-refs": "1.1.2", + "@radix-ui/react-context": "1.1.2", + "@radix-ui/react-presence": "1.1.5", + "@radix-ui/react-primitive": "2.1.3", + "@radix-ui/react-use-controllable-state": "1.2.2", + "@radix-ui/react-use-previous": "1.1.1", + "@radix-ui/react-use-size": "1.1.1" + }, + "peerDependencies": { + "@types/react": "*", + "@types/react-dom": "*", + "react": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc", + "react-dom": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + }, + "@types/react-dom": { + "optional": true + } + } + }, + "node_modules/@radix-ui/react-collection": { + "version": "1.1.7", + "resolved": "https://registry.npmjs.org/@radix-ui/react-collection/-/react-collection-1.1.7.tgz", + "integrity": "sha512-Fh9rGN0MoI4ZFUNyfFVNU4y9LUz93u9/0K+yLgA2bwRojxM8JU1DyvvMBabnZPBgMWREAJvU2jjVzq+LrFUglw==", + "license": "MIT", + "dependencies": { + "@radix-ui/react-compose-refs": "1.1.2", + "@radix-ui/react-context": "1.1.2", + "@radix-ui/react-primitive": "2.1.3", + "@radix-ui/react-slot": "1.2.3" + }, + "peerDependencies": { + "@types/react": "*", + "@types/react-dom": "*", + "react": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc", + "react-dom": "^16.8 || 
^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + }, + "@types/react-dom": { + "optional": true + } + } + }, + "node_modules/@radix-ui/react-collection/node_modules/@radix-ui/react-slot": { + "version": "1.2.3", + "resolved": "https://registry.npmjs.org/@radix-ui/react-slot/-/react-slot-1.2.3.tgz", + "integrity": "sha512-aeNmHnBxbi2St0au6VBVC7JXFlhLlOnvIIlePNniyUNAClzmtAUEY8/pBiK3iHjufOlwA+c20/8jngo7xcrg8A==", + "license": "MIT", + "dependencies": { + "@radix-ui/react-compose-refs": "1.1.2" + }, + "peerDependencies": { + "@types/react": "*", + "react": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + } + } + }, + "node_modules/@radix-ui/react-compose-refs": { + "version": "1.1.2", + "resolved": "https://registry.npmjs.org/@radix-ui/react-compose-refs/-/react-compose-refs-1.1.2.tgz", + "integrity": "sha512-z4eqJvfiNnFMHIIvXP3CY57y2WJs5g2v3X0zm9mEJkrkNv4rDxu+sg9Jh8EkXyeqBkB7SOcboo9dMVqhyrACIg==", + "license": "MIT", + "peerDependencies": { + "@types/react": "*", + "react": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + } + } + }, + "node_modules/@radix-ui/react-context": { + "version": "1.1.2", + "resolved": "https://registry.npmjs.org/@radix-ui/react-context/-/react-context-1.1.2.tgz", + "integrity": "sha512-jCi/QKUM2r1Ju5a3J64TH2A5SpKAgh0LpknyqdQ4m6DCV0xJ2HG1xARRwNGPQfi1SLdLWZ1OJz6F4OMBBNiGJA==", + "license": "MIT", + "peerDependencies": { + "@types/react": "*", + "react": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + } + } + }, + "node_modules/@radix-ui/react-dialog": { + "version": "1.1.15", + "resolved": "https://registry.npmjs.org/@radix-ui/react-dialog/-/react-dialog-1.1.15.tgz", + "integrity": 
"sha512-TCglVRtzlffRNxRMEyR36DGBLJpeusFcgMVD9PZEzAKnUs1lKCgX5u9BmC2Yg+LL9MgZDugFFs1Vl+Jp4t/PGw==", + "license": "MIT", + "dependencies": { + "@radix-ui/primitive": "1.1.3", + "@radix-ui/react-compose-refs": "1.1.2", + "@radix-ui/react-context": "1.1.2", + "@radix-ui/react-dismissable-layer": "1.1.11", + "@radix-ui/react-focus-guards": "1.1.3", + "@radix-ui/react-focus-scope": "1.1.7", + "@radix-ui/react-id": "1.1.1", + "@radix-ui/react-portal": "1.1.9", + "@radix-ui/react-presence": "1.1.5", + "@radix-ui/react-primitive": "2.1.3", + "@radix-ui/react-slot": "1.2.3", + "@radix-ui/react-use-controllable-state": "1.2.2", + "aria-hidden": "^1.2.4", + "react-remove-scroll": "^2.6.3" + }, + "peerDependencies": { + "@types/react": "*", + "@types/react-dom": "*", + "react": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc", + "react-dom": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + }, + "@types/react-dom": { + "optional": true + } + } + }, + "node_modules/@radix-ui/react-dialog/node_modules/@radix-ui/react-slot": { + "version": "1.2.3", + "resolved": "https://registry.npmjs.org/@radix-ui/react-slot/-/react-slot-1.2.3.tgz", + "integrity": "sha512-aeNmHnBxbi2St0au6VBVC7JXFlhLlOnvIIlePNniyUNAClzmtAUEY8/pBiK3iHjufOlwA+c20/8jngo7xcrg8A==", + "license": "MIT", + "dependencies": { + "@radix-ui/react-compose-refs": "1.1.2" + }, + "peerDependencies": { + "@types/react": "*", + "react": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + } + } + }, + "node_modules/@radix-ui/react-direction": { + "version": "1.1.1", + "resolved": "https://registry.npmjs.org/@radix-ui/react-direction/-/react-direction-1.1.1.tgz", + "integrity": "sha512-1UEWRX6jnOA2y4H5WczZ44gOOjTEmlqv1uNW4GAJEO5+bauCBhv8snY65Iw5/VOS/ghKN9gr2KjnLKxrsvoMVw==", + "license": "MIT", + "peerDependencies": { + "@types/react": "*", + "react": "^16.8 || ^17.0 || ^18.0 || ^19.0 || 
^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + } + } + }, + "node_modules/@radix-ui/react-dismissable-layer": { + "version": "1.1.11", + "resolved": "https://registry.npmjs.org/@radix-ui/react-dismissable-layer/-/react-dismissable-layer-1.1.11.tgz", + "integrity": "sha512-Nqcp+t5cTB8BinFkZgXiMJniQH0PsUt2k51FUhbdfeKvc4ACcG2uQniY/8+h1Yv6Kza4Q7lD7PQV0z0oicE0Mg==", + "license": "MIT", + "dependencies": { + "@radix-ui/primitive": "1.1.3", + "@radix-ui/react-compose-refs": "1.1.2", + "@radix-ui/react-primitive": "2.1.3", + "@radix-ui/react-use-callback-ref": "1.1.1", + "@radix-ui/react-use-escape-keydown": "1.1.1" + }, + "peerDependencies": { + "@types/react": "*", + "@types/react-dom": "*", + "react": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc", + "react-dom": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + }, + "@types/react-dom": { + "optional": true + } + } + }, + "node_modules/@radix-ui/react-dropdown-menu": { + "version": "2.1.16", + "resolved": "https://registry.npmjs.org/@radix-ui/react-dropdown-menu/-/react-dropdown-menu-2.1.16.tgz", + "integrity": "sha512-1PLGQEynI/3OX/ftV54COn+3Sud/Mn8vALg2rWnBLnRaGtJDduNW/22XjlGgPdpcIbiQxjKtb7BkcjP00nqfJw==", + "license": "MIT", + "dependencies": { + "@radix-ui/primitive": "1.1.3", + "@radix-ui/react-compose-refs": "1.1.2", + "@radix-ui/react-context": "1.1.2", + "@radix-ui/react-id": "1.1.1", + "@radix-ui/react-menu": "2.1.16", + "@radix-ui/react-primitive": "2.1.3", + "@radix-ui/react-use-controllable-state": "1.2.2" + }, + "peerDependencies": { + "@types/react": "*", + "@types/react-dom": "*", + "react": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc", + "react-dom": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + }, + "@types/react-dom": { + "optional": true + } + } + }, + "node_modules/@radix-ui/react-focus-guards": { + 
"version": "1.1.3", + "resolved": "https://registry.npmjs.org/@radix-ui/react-focus-guards/-/react-focus-guards-1.1.3.tgz", + "integrity": "sha512-0rFg/Rj2Q62NCm62jZw0QX7a3sz6QCQU0LpZdNrJX8byRGaGVTqbrW9jAoIAHyMQqsNpeZ81YgSizOt5WXq0Pw==", + "license": "MIT", + "peerDependencies": { + "@types/react": "*", + "react": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + } + } + }, + "node_modules/@radix-ui/react-focus-scope": { + "version": "1.1.7", + "resolved": "https://registry.npmjs.org/@radix-ui/react-focus-scope/-/react-focus-scope-1.1.7.tgz", + "integrity": "sha512-t2ODlkXBQyn7jkl6TNaw/MtVEVvIGelJDCG41Okq/KwUsJBwQ4XVZsHAVUkK4mBv3ewiAS3PGuUWuY2BoK4ZUw==", + "license": "MIT", + "dependencies": { + "@radix-ui/react-compose-refs": "1.1.2", + "@radix-ui/react-primitive": "2.1.3", + "@radix-ui/react-use-callback-ref": "1.1.1" + }, + "peerDependencies": { + "@types/react": "*", + "@types/react-dom": "*", + "react": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc", + "react-dom": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + }, + "@types/react-dom": { + "optional": true + } + } + }, + "node_modules/@radix-ui/react-id": { + "version": "1.1.1", + "resolved": "https://registry.npmjs.org/@radix-ui/react-id/-/react-id-1.1.1.tgz", + "integrity": "sha512-kGkGegYIdQsOb4XjsfM97rXsiHaBwco+hFI66oO4s9LU+PLAC5oJ7khdOVFxkhsmlbpUqDAvXw11CluXP+jkHg==", + "license": "MIT", + "dependencies": { + "@radix-ui/react-use-layout-effect": "1.1.1" + }, + "peerDependencies": { + "@types/react": "*", + "react": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + } + } + }, + "node_modules/@radix-ui/react-label": { + "version": "2.1.8", + "resolved": "https://registry.npmjs.org/@radix-ui/react-label/-/react-label-2.1.8.tgz", + "integrity": 
"sha512-FmXs37I6hSBVDlO4y764TNz1rLgKwjJMQ0EGte6F3Cb3f4bIuHB/iLa/8I9VKkmOy+gNHq8rql3j686ACVV21A==", + "license": "MIT", + "dependencies": { + "@radix-ui/react-primitive": "2.1.4" + }, + "peerDependencies": { + "@types/react": "*", + "@types/react-dom": "*", + "react": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc", + "react-dom": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + }, + "@types/react-dom": { + "optional": true + } + } + }, + "node_modules/@radix-ui/react-label/node_modules/@radix-ui/react-primitive": { + "version": "2.1.4", + "resolved": "https://registry.npmjs.org/@radix-ui/react-primitive/-/react-primitive-2.1.4.tgz", + "integrity": "sha512-9hQc4+GNVtJAIEPEqlYqW5RiYdrr8ea5XQ0ZOnD6fgru+83kqT15mq2OCcbe8KnjRZl5vF3ks69AKz3kh1jrhg==", + "license": "MIT", + "dependencies": { + "@radix-ui/react-slot": "1.2.4" + }, + "peerDependencies": { + "@types/react": "*", + "@types/react-dom": "*", + "react": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc", + "react-dom": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + }, + "@types/react-dom": { + "optional": true + } + } + }, + "node_modules/@radix-ui/react-menu": { + "version": "2.1.16", + "resolved": "https://registry.npmjs.org/@radix-ui/react-menu/-/react-menu-2.1.16.tgz", + "integrity": "sha512-72F2T+PLlphrqLcAotYPp0uJMr5SjP5SL01wfEspJbru5Zs5vQaSHb4VB3ZMJPimgHHCHG7gMOeOB9H3Hdmtxg==", + "license": "MIT", + "dependencies": { + "@radix-ui/primitive": "1.1.3", + "@radix-ui/react-collection": "1.1.7", + "@radix-ui/react-compose-refs": "1.1.2", + "@radix-ui/react-context": "1.1.2", + "@radix-ui/react-direction": "1.1.1", + "@radix-ui/react-dismissable-layer": "1.1.11", + "@radix-ui/react-focus-guards": "1.1.3", + "@radix-ui/react-focus-scope": "1.1.7", + "@radix-ui/react-id": "1.1.1", + "@radix-ui/react-popper": "1.2.8", + "@radix-ui/react-portal": "1.1.9", + 
"@radix-ui/react-presence": "1.1.5", + "@radix-ui/react-primitive": "2.1.3", + "@radix-ui/react-roving-focus": "1.1.11", + "@radix-ui/react-slot": "1.2.3", + "@radix-ui/react-use-callback-ref": "1.1.1", + "aria-hidden": "^1.2.4", + "react-remove-scroll": "^2.6.3" + }, + "peerDependencies": { + "@types/react": "*", + "@types/react-dom": "*", + "react": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc", + "react-dom": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + }, + "@types/react-dom": { + "optional": true + } + } + }, + "node_modules/@radix-ui/react-menu/node_modules/@radix-ui/react-slot": { + "version": "1.2.3", + "resolved": "https://registry.npmjs.org/@radix-ui/react-slot/-/react-slot-1.2.3.tgz", + "integrity": "sha512-aeNmHnBxbi2St0au6VBVC7JXFlhLlOnvIIlePNniyUNAClzmtAUEY8/pBiK3iHjufOlwA+c20/8jngo7xcrg8A==", + "license": "MIT", + "dependencies": { + "@radix-ui/react-compose-refs": "1.1.2" + }, + "peerDependencies": { + "@types/react": "*", + "react": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + } + } + }, + "node_modules/@radix-ui/react-popover": { + "version": "1.1.15", + "resolved": "https://registry.npmjs.org/@radix-ui/react-popover/-/react-popover-1.1.15.tgz", + "integrity": "sha512-kr0X2+6Yy/vJzLYJUPCZEc8SfQcf+1COFoAqauJm74umQhta9M7lNJHP7QQS3vkvcGLQUbWpMzwrXYwrYztHKA==", + "license": "MIT", + "dependencies": { + "@radix-ui/primitive": "1.1.3", + "@radix-ui/react-compose-refs": "1.1.2", + "@radix-ui/react-context": "1.1.2", + "@radix-ui/react-dismissable-layer": "1.1.11", + "@radix-ui/react-focus-guards": "1.1.3", + "@radix-ui/react-focus-scope": "1.1.7", + "@radix-ui/react-id": "1.1.1", + "@radix-ui/react-popper": "1.2.8", + "@radix-ui/react-portal": "1.1.9", + "@radix-ui/react-presence": "1.1.5", + "@radix-ui/react-primitive": "2.1.3", + "@radix-ui/react-slot": "1.2.3", + 
"@radix-ui/react-use-controllable-state": "1.2.2", + "aria-hidden": "^1.2.4", + "react-remove-scroll": "^2.6.3" + }, + "peerDependencies": { + "@types/react": "*", + "@types/react-dom": "*", + "react": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc", + "react-dom": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + }, + "@types/react-dom": { + "optional": true + } + } + }, + "node_modules/@radix-ui/react-popover/node_modules/@radix-ui/react-slot": { + "version": "1.2.3", + "resolved": "https://registry.npmjs.org/@radix-ui/react-slot/-/react-slot-1.2.3.tgz", + "integrity": "sha512-aeNmHnBxbi2St0au6VBVC7JXFlhLlOnvIIlePNniyUNAClzmtAUEY8/pBiK3iHjufOlwA+c20/8jngo7xcrg8A==", + "license": "MIT", + "dependencies": { + "@radix-ui/react-compose-refs": "1.1.2" + }, + "peerDependencies": { + "@types/react": "*", + "react": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + } + } + }, + "node_modules/@radix-ui/react-popper": { + "version": "1.2.8", + "resolved": "https://registry.npmjs.org/@radix-ui/react-popper/-/react-popper-1.2.8.tgz", + "integrity": "sha512-0NJQ4LFFUuWkE7Oxf0htBKS6zLkkjBH+hM1uk7Ng705ReR8m/uelduy1DBo0PyBXPKVnBA6YBlU94MBGXrSBCw==", + "license": "MIT", + "dependencies": { + "@floating-ui/react-dom": "^2.0.0", + "@radix-ui/react-arrow": "1.1.7", + "@radix-ui/react-compose-refs": "1.1.2", + "@radix-ui/react-context": "1.1.2", + "@radix-ui/react-primitive": "2.1.3", + "@radix-ui/react-use-callback-ref": "1.1.1", + "@radix-ui/react-use-layout-effect": "1.1.1", + "@radix-ui/react-use-rect": "1.1.1", + "@radix-ui/react-use-size": "1.1.1", + "@radix-ui/rect": "1.1.1" + }, + "peerDependencies": { + "@types/react": "*", + "@types/react-dom": "*", + "react": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc", + "react-dom": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + 
"optional": true + }, + "@types/react-dom": { + "optional": true + } + } + }, + "node_modules/@radix-ui/react-portal": { + "version": "1.1.9", + "resolved": "https://registry.npmjs.org/@radix-ui/react-portal/-/react-portal-1.1.9.tgz", + "integrity": "sha512-bpIxvq03if6UNwXZ+HTK71JLh4APvnXntDc6XOX8UVq4XQOVl7lwok0AvIl+b8zgCw3fSaVTZMpAPPagXbKmHQ==", + "license": "MIT", + "dependencies": { + "@radix-ui/react-primitive": "2.1.3", + "@radix-ui/react-use-layout-effect": "1.1.1" + }, + "peerDependencies": { + "@types/react": "*", + "@types/react-dom": "*", + "react": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc", + "react-dom": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + }, + "@types/react-dom": { + "optional": true + } + } + }, + "node_modules/@radix-ui/react-presence": { + "version": "1.1.5", + "resolved": "https://registry.npmjs.org/@radix-ui/react-presence/-/react-presence-1.1.5.tgz", + "integrity": "sha512-/jfEwNDdQVBCNvjkGit4h6pMOzq8bHkopq458dPt2lMjx+eBQUohZNG9A7DtO/O5ukSbxuaNGXMjHicgwy6rQQ==", + "license": "MIT", + "dependencies": { + "@radix-ui/react-compose-refs": "1.1.2", + "@radix-ui/react-use-layout-effect": "1.1.1" + }, + "peerDependencies": { + "@types/react": "*", + "@types/react-dom": "*", + "react": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc", + "react-dom": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + }, + "@types/react-dom": { + "optional": true + } + } + }, + "node_modules/@radix-ui/react-primitive": { + "version": "2.1.3", + "resolved": "https://registry.npmjs.org/@radix-ui/react-primitive/-/react-primitive-2.1.3.tgz", + "integrity": "sha512-m9gTwRkhy2lvCPe6QJp4d3G1TYEUHn/FzJUtq9MjH46an1wJU+GdoGC5VLof8RX8Ft/DlpshApkhswDLZzHIcQ==", + "license": "MIT", + "dependencies": { + "@radix-ui/react-slot": "1.2.3" + }, + "peerDependencies": { + "@types/react": "*", + "@types/react-dom": "*", + 
"react": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc", + "react-dom": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + }, + "@types/react-dom": { + "optional": true + } + } + }, + "node_modules/@radix-ui/react-primitive/node_modules/@radix-ui/react-slot": { + "version": "1.2.3", + "resolved": "https://registry.npmjs.org/@radix-ui/react-slot/-/react-slot-1.2.3.tgz", + "integrity": "sha512-aeNmHnBxbi2St0au6VBVC7JXFlhLlOnvIIlePNniyUNAClzmtAUEY8/pBiK3iHjufOlwA+c20/8jngo7xcrg8A==", + "license": "MIT", + "dependencies": { + "@radix-ui/react-compose-refs": "1.1.2" + }, + "peerDependencies": { + "@types/react": "*", + "react": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + } + } + }, + "node_modules/@radix-ui/react-progress": { + "version": "1.1.8", + "resolved": "https://registry.npmjs.org/@radix-ui/react-progress/-/react-progress-1.1.8.tgz", + "integrity": "sha512-+gISHcSPUJ7ktBy9RnTqbdKW78bcGke3t6taawyZ71pio1JewwGSJizycs7rLhGTvMJYCQB1DBK4KQsxs7U8dA==", + "license": "MIT", + "dependencies": { + "@radix-ui/react-context": "1.1.3", + "@radix-ui/react-primitive": "2.1.4" + }, + "peerDependencies": { + "@types/react": "*", + "@types/react-dom": "*", + "react": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc", + "react-dom": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + }, + "@types/react-dom": { + "optional": true + } + } + }, + "node_modules/@radix-ui/react-progress/node_modules/@radix-ui/react-context": { + "version": "1.1.3", + "resolved": "https://registry.npmjs.org/@radix-ui/react-context/-/react-context-1.1.3.tgz", + "integrity": "sha512-ieIFACdMpYfMEjF0rEf5KLvfVyIkOz6PDGyNnP+u+4xQ6jny3VCgA4OgXOwNx2aUkxn8zx9fiVcM8CfFYv9Lxw==", + "license": "MIT", + "peerDependencies": { + "@types/react": "*", + "react": "^16.8 || ^17.0 || ^18.0 || ^19.0 || 
^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + } + } + }, + "node_modules/@radix-ui/react-progress/node_modules/@radix-ui/react-primitive": { + "version": "2.1.4", + "resolved": "https://registry.npmjs.org/@radix-ui/react-primitive/-/react-primitive-2.1.4.tgz", + "integrity": "sha512-9hQc4+GNVtJAIEPEqlYqW5RiYdrr8ea5XQ0ZOnD6fgru+83kqT15mq2OCcbe8KnjRZl5vF3ks69AKz3kh1jrhg==", + "license": "MIT", + "dependencies": { + "@radix-ui/react-slot": "1.2.4" + }, + "peerDependencies": { + "@types/react": "*", + "@types/react-dom": "*", + "react": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc", + "react-dom": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + }, + "@types/react-dom": { + "optional": true + } + } + }, + "node_modules/@radix-ui/react-roving-focus": { + "version": "1.1.11", + "resolved": "https://registry.npmjs.org/@radix-ui/react-roving-focus/-/react-roving-focus-1.1.11.tgz", + "integrity": "sha512-7A6S9jSgm/S+7MdtNDSb+IU859vQqJ/QAtcYQcfFC6W8RS4IxIZDldLR0xqCFZ6DCyrQLjLPsxtTNch5jVA4lA==", + "license": "MIT", + "dependencies": { + "@radix-ui/primitive": "1.1.3", + "@radix-ui/react-collection": "1.1.7", + "@radix-ui/react-compose-refs": "1.1.2", + "@radix-ui/react-context": "1.1.2", + "@radix-ui/react-direction": "1.1.1", + "@radix-ui/react-id": "1.1.1", + "@radix-ui/react-primitive": "2.1.3", + "@radix-ui/react-use-callback-ref": "1.1.1", + "@radix-ui/react-use-controllable-state": "1.2.2" + }, + "peerDependencies": { + "@types/react": "*", + "@types/react-dom": "*", + "react": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc", + "react-dom": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + }, + "@types/react-dom": { + "optional": true + } + } + }, + "node_modules/@radix-ui/react-select": { + "version": "2.2.6", + "resolved": 
"https://registry.npmjs.org/@radix-ui/react-select/-/react-select-2.2.6.tgz", + "integrity": "sha512-I30RydO+bnn2PQztvo25tswPH+wFBjehVGtmagkU78yMdwTwVf12wnAOF+AeP8S2N8xD+5UPbGhkUfPyvT+mwQ==", + "license": "MIT", + "dependencies": { + "@radix-ui/number": "1.1.1", + "@radix-ui/primitive": "1.1.3", + "@radix-ui/react-collection": "1.1.7", + "@radix-ui/react-compose-refs": "1.1.2", + "@radix-ui/react-context": "1.1.2", + "@radix-ui/react-direction": "1.1.1", + "@radix-ui/react-dismissable-layer": "1.1.11", + "@radix-ui/react-focus-guards": "1.1.3", + "@radix-ui/react-focus-scope": "1.1.7", + "@radix-ui/react-id": "1.1.1", + "@radix-ui/react-popper": "1.2.8", + "@radix-ui/react-portal": "1.1.9", + "@radix-ui/react-primitive": "2.1.3", + "@radix-ui/react-slot": "1.2.3", + "@radix-ui/react-use-callback-ref": "1.1.1", + "@radix-ui/react-use-controllable-state": "1.2.2", + "@radix-ui/react-use-layout-effect": "1.1.1", + "@radix-ui/react-use-previous": "1.1.1", + "@radix-ui/react-visually-hidden": "1.2.3", + "aria-hidden": "^1.2.4", + "react-remove-scroll": "^2.6.3" + }, + "peerDependencies": { + "@types/react": "*", + "@types/react-dom": "*", + "react": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc", + "react-dom": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + }, + "@types/react-dom": { + "optional": true + } + } + }, + "node_modules/@radix-ui/react-select/node_modules/@radix-ui/react-slot": { + "version": "1.2.3", + "resolved": "https://registry.npmjs.org/@radix-ui/react-slot/-/react-slot-1.2.3.tgz", + "integrity": "sha512-aeNmHnBxbi2St0au6VBVC7JXFlhLlOnvIIlePNniyUNAClzmtAUEY8/pBiK3iHjufOlwA+c20/8jngo7xcrg8A==", + "license": "MIT", + "dependencies": { + "@radix-ui/react-compose-refs": "1.1.2" + }, + "peerDependencies": { + "@types/react": "*", + "react": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + } + } + }, + 
"node_modules/@radix-ui/react-separator": { + "version": "1.1.8", + "resolved": "https://registry.npmjs.org/@radix-ui/react-separator/-/react-separator-1.1.8.tgz", + "integrity": "sha512-sDvqVY4itsKwwSMEe0jtKgfTh+72Sy3gPmQpjqcQneqQ4PFmr/1I0YA+2/puilhggCe2gJcx5EBAYFkWkdpa5g==", + "license": "MIT", + "dependencies": { + "@radix-ui/react-primitive": "2.1.4" + }, + "peerDependencies": { + "@types/react": "*", + "@types/react-dom": "*", + "react": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc", + "react-dom": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + }, + "@types/react-dom": { + "optional": true + } + } + }, + "node_modules/@radix-ui/react-separator/node_modules/@radix-ui/react-primitive": { + "version": "2.1.4", + "resolved": "https://registry.npmjs.org/@radix-ui/react-primitive/-/react-primitive-2.1.4.tgz", + "integrity": "sha512-9hQc4+GNVtJAIEPEqlYqW5RiYdrr8ea5XQ0ZOnD6fgru+83kqT15mq2OCcbe8KnjRZl5vF3ks69AKz3kh1jrhg==", + "license": "MIT", + "dependencies": { + "@radix-ui/react-slot": "1.2.4" + }, + "peerDependencies": { + "@types/react": "*", + "@types/react-dom": "*", + "react": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc", + "react-dom": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + }, + "@types/react-dom": { + "optional": true + } + } + }, + "node_modules/@radix-ui/react-slot": { + "version": "1.2.4", + "resolved": "https://registry.npmjs.org/@radix-ui/react-slot/-/react-slot-1.2.4.tgz", + "integrity": "sha512-Jl+bCv8HxKnlTLVrcDE8zTMJ09R9/ukw4qBs/oZClOfoQk/cOTbDn+NceXfV7j09YPVQUryJPHurafcSg6EVKA==", + "license": "MIT", + "dependencies": { + "@radix-ui/react-compose-refs": "1.1.2" + }, + "peerDependencies": { + "@types/react": "*", + "react": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + } + } + }, + 
"node_modules/@radix-ui/react-tabs": { + "version": "1.1.13", + "resolved": "https://registry.npmjs.org/@radix-ui/react-tabs/-/react-tabs-1.1.13.tgz", + "integrity": "sha512-7xdcatg7/U+7+Udyoj2zodtI9H/IIopqo+YOIcZOq1nJwXWBZ9p8xiu5llXlekDbZkca79a/fozEYQXIA4sW6A==", + "license": "MIT", + "dependencies": { + "@radix-ui/primitive": "1.1.3", + "@radix-ui/react-context": "1.1.2", + "@radix-ui/react-direction": "1.1.1", + "@radix-ui/react-id": "1.1.1", + "@radix-ui/react-presence": "1.1.5", + "@radix-ui/react-primitive": "2.1.3", + "@radix-ui/react-roving-focus": "1.1.11", + "@radix-ui/react-use-controllable-state": "1.2.2" + }, + "peerDependencies": { + "@types/react": "*", + "@types/react-dom": "*", + "react": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc", + "react-dom": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + }, + "@types/react-dom": { + "optional": true + } + } + }, + "node_modules/@radix-ui/react-tooltip": { + "version": "1.2.8", + "resolved": "https://registry.npmjs.org/@radix-ui/react-tooltip/-/react-tooltip-1.2.8.tgz", + "integrity": "sha512-tY7sVt1yL9ozIxvmbtN5qtmH2krXcBCfjEiCgKGLqunJHvgvZG2Pcl2oQ3kbcZARb1BGEHdkLzcYGO8ynVlieg==", + "license": "MIT", + "dependencies": { + "@radix-ui/primitive": "1.1.3", + "@radix-ui/react-compose-refs": "1.1.2", + "@radix-ui/react-context": "1.1.2", + "@radix-ui/react-dismissable-layer": "1.1.11", + "@radix-ui/react-id": "1.1.1", + "@radix-ui/react-popper": "1.2.8", + "@radix-ui/react-portal": "1.1.9", + "@radix-ui/react-presence": "1.1.5", + "@radix-ui/react-primitive": "2.1.3", + "@radix-ui/react-slot": "1.2.3", + "@radix-ui/react-use-controllable-state": "1.2.2", + "@radix-ui/react-visually-hidden": "1.2.3" + }, + "peerDependencies": { + "@types/react": "*", + "@types/react-dom": "*", + "react": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc", + "react-dom": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + 
"@types/react": { + "optional": true + }, + "@types/react-dom": { + "optional": true + } + } + }, + "node_modules/@radix-ui/react-tooltip/node_modules/@radix-ui/react-slot": { + "version": "1.2.3", + "resolved": "https://registry.npmjs.org/@radix-ui/react-slot/-/react-slot-1.2.3.tgz", + "integrity": "sha512-aeNmHnBxbi2St0au6VBVC7JXFlhLlOnvIIlePNniyUNAClzmtAUEY8/pBiK3iHjufOlwA+c20/8jngo7xcrg8A==", + "license": "MIT", + "dependencies": { + "@radix-ui/react-compose-refs": "1.1.2" + }, + "peerDependencies": { + "@types/react": "*", + "react": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + } + } + }, + "node_modules/@radix-ui/react-use-callback-ref": { + "version": "1.1.1", + "resolved": "https://registry.npmjs.org/@radix-ui/react-use-callback-ref/-/react-use-callback-ref-1.1.1.tgz", + "integrity": "sha512-FkBMwD+qbGQeMu1cOHnuGB6x4yzPjho8ap5WtbEJ26umhgqVXbhekKUQO+hZEL1vU92a3wHwdp0HAcqAUF5iDg==", + "license": "MIT", + "peerDependencies": { + "@types/react": "*", + "react": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + } + } + }, + "node_modules/@radix-ui/react-use-controllable-state": { + "version": "1.2.2", + "resolved": "https://registry.npmjs.org/@radix-ui/react-use-controllable-state/-/react-use-controllable-state-1.2.2.tgz", + "integrity": "sha512-BjasUjixPFdS+NKkypcyyN5Pmg83Olst0+c6vGov0diwTEo6mgdqVR6hxcEgFuh4QrAs7Rc+9KuGJ9TVCj0Zzg==", + "license": "MIT", + "dependencies": { + "@radix-ui/react-use-effect-event": "0.0.2", + "@radix-ui/react-use-layout-effect": "1.1.1" + }, + "peerDependencies": { + "@types/react": "*", + "react": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + } + } + }, + "node_modules/@radix-ui/react-use-effect-event": { + "version": "0.0.2", + "resolved": 
"https://registry.npmjs.org/@radix-ui/react-use-effect-event/-/react-use-effect-event-0.0.2.tgz", + "integrity": "sha512-Qp8WbZOBe+blgpuUT+lw2xheLP8q0oatc9UpmiemEICxGvFLYmHm9QowVZGHtJlGbS6A6yJ3iViad/2cVjnOiA==", + "license": "MIT", + "dependencies": { + "@radix-ui/react-use-layout-effect": "1.1.1" + }, + "peerDependencies": { + "@types/react": "*", + "react": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + } + } + }, + "node_modules/@radix-ui/react-use-escape-keydown": { + "version": "1.1.1", + "resolved": "https://registry.npmjs.org/@radix-ui/react-use-escape-keydown/-/react-use-escape-keydown-1.1.1.tgz", + "integrity": "sha512-Il0+boE7w/XebUHyBjroE+DbByORGR9KKmITzbR7MyQ4akpORYP/ZmbhAr0DG7RmmBqoOnZdy2QlvajJ2QA59g==", + "license": "MIT", + "dependencies": { + "@radix-ui/react-use-callback-ref": "1.1.1" + }, + "peerDependencies": { + "@types/react": "*", + "react": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + } + } + }, + "node_modules/@radix-ui/react-use-layout-effect": { + "version": "1.1.1", + "resolved": "https://registry.npmjs.org/@radix-ui/react-use-layout-effect/-/react-use-layout-effect-1.1.1.tgz", + "integrity": "sha512-RbJRS4UWQFkzHTTwVymMTUv8EqYhOp8dOOviLj2ugtTiXRaRQS7GLGxZTLL1jWhMeoSCf5zmcZkqTl9IiYfXcQ==", + "license": "MIT", + "peerDependencies": { + "@types/react": "*", + "react": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + } + } + }, + "node_modules/@radix-ui/react-use-previous": { + "version": "1.1.1", + "resolved": "https://registry.npmjs.org/@radix-ui/react-use-previous/-/react-use-previous-1.1.1.tgz", + "integrity": "sha512-2dHfToCj/pzca2Ck724OZ5L0EVrr3eHRNsG/b3xQJLA2hZpVCS99bLAX+hm1IHXDEnzU6by5z/5MIY794/a8NQ==", + "license": "MIT", + "peerDependencies": { + "@types/react": "*", + "react": "^16.8 || ^17.0 || ^18.0 || 
^19.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + } + } + }, + "node_modules/@radix-ui/react-use-rect": { + "version": "1.1.1", + "resolved": "https://registry.npmjs.org/@radix-ui/react-use-rect/-/react-use-rect-1.1.1.tgz", + "integrity": "sha512-QTYuDesS0VtuHNNvMh+CjlKJ4LJickCMUAqjlE3+j8w+RlRpwyX3apEQKGFzbZGdo7XNG1tXa+bQqIE7HIXT2w==", + "license": "MIT", + "dependencies": { + "@radix-ui/rect": "1.1.1" + }, + "peerDependencies": { + "@types/react": "*", + "react": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + } + } + }, + "node_modules/@radix-ui/react-use-size": { + "version": "1.1.1", + "resolved": "https://registry.npmjs.org/@radix-ui/react-use-size/-/react-use-size-1.1.1.tgz", + "integrity": "sha512-ewrXRDTAqAXlkl6t/fkXWNAhFX9I+CkKlw6zjEwk86RSPKwZr3xpBRso655aqYafwtnbpHLj6toFzmd6xdVptQ==", + "license": "MIT", + "dependencies": { + "@radix-ui/react-use-layout-effect": "1.1.1" + }, + "peerDependencies": { + "@types/react": "*", + "react": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + } + } + }, + "node_modules/@radix-ui/react-visually-hidden": { + "version": "1.2.3", + "resolved": "https://registry.npmjs.org/@radix-ui/react-visually-hidden/-/react-visually-hidden-1.2.3.tgz", + "integrity": "sha512-pzJq12tEaaIhqjbzpCuv/OypJY/BPavOofm+dbab+MHLajy277+1lLm6JFcGgF5eskJ6mquGirhXY2GD/8u8Ug==", + "license": "MIT", + "dependencies": { + "@radix-ui/react-primitive": "2.1.3" + }, + "peerDependencies": { + "@types/react": "*", + "@types/react-dom": "*", + "react": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc", + "react-dom": "^16.8 || ^17.0 || ^18.0 || ^19.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + }, + "@types/react-dom": { + "optional": true + } + } + }, + "node_modules/@radix-ui/rect": { + "version": "1.1.1", + "resolved": 
"https://registry.npmjs.org/@radix-ui/rect/-/rect-1.1.1.tgz", + "integrity": "sha512-HPwpGIzkl28mWyZqG52jiqDJ12waP11Pa1lGoiyUkIEuMLBP0oeK/C89esbXrxsky5we7dfd8U58nm0SgAWpVw==", + "license": "MIT" + }, + "node_modules/@rolldown/pluginutils": { + "version": "1.0.0-rc.3", + "resolved": "https://registry.npmjs.org/@rolldown/pluginutils/-/pluginutils-1.0.0-rc.3.tgz", + "integrity": "sha512-eybk3TjzzzV97Dlj5c+XrBFW57eTNhzod66y9HrBlzJ6NsCrWCp/2kaPS3K9wJmurBC0Tdw4yPjXKZqlznim3Q==", + "dev": true, + "license": "MIT" + }, + "node_modules/@rollup/rollup-android-arm-eabi": { + "version": "4.60.1", + "resolved": "https://registry.npmjs.org/@rollup/rollup-android-arm-eabi/-/rollup-android-arm-eabi-4.60.1.tgz", + "integrity": "sha512-d6FinEBLdIiK+1uACUttJKfgZREXrF0Qc2SmLII7W2AD8FfiZ9Wjd+rD/iRuf5s5dWrr1GgwXCvPqOuDquOowA==", + "cpu": [ + "arm" + ], + "license": "MIT", + "optional": true, + "os": [ + "android" + ] + }, + "node_modules/@rollup/rollup-android-arm64": { + "version": "4.60.1", + "resolved": "https://registry.npmjs.org/@rollup/rollup-android-arm64/-/rollup-android-arm64-4.60.1.tgz", + "integrity": "sha512-YjG/EwIDvvYI1YvYbHvDz/BYHtkY4ygUIXHnTdLhG+hKIQFBiosfWiACWortsKPKU/+dUwQQCKQM3qrDe8c9BA==", + "cpu": [ + "arm64" + ], + "license": "MIT", + "optional": true, + "os": [ + "android" + ] + }, + "node_modules/@rollup/rollup-darwin-arm64": { + "version": "4.60.1", + "resolved": "https://registry.npmjs.org/@rollup/rollup-darwin-arm64/-/rollup-darwin-arm64-4.60.1.tgz", + "integrity": "sha512-mjCpF7GmkRtSJwon+Rq1N8+pI+8l7w5g9Z3vWj4T7abguC4Czwi3Yu/pFaLvA3TTeMVjnu3ctigusqWUfjZzvw==", + "cpu": [ + "arm64" + ], + "license": "MIT", + "optional": true, + "os": [ + "darwin" + ] + }, + "node_modules/@rollup/rollup-darwin-x64": { + "version": "4.60.1", + "resolved": "https://registry.npmjs.org/@rollup/rollup-darwin-x64/-/rollup-darwin-x64-4.60.1.tgz", + "integrity": "sha512-haZ7hJ1JT4e9hqkoT9R/19XW2QKqjfJVv+i5AGg57S+nLk9lQnJ1F/eZloRO3o9Scy9CM3wQ9l+dkXtcBgN5Ew==", + "cpu": [ + "x64" + ], + 
"license": "MIT", + "optional": true, + "os": [ + "darwin" + ] + }, + "node_modules/@rollup/rollup-freebsd-arm64": { + "version": "4.60.1", + "resolved": "https://registry.npmjs.org/@rollup/rollup-freebsd-arm64/-/rollup-freebsd-arm64-4.60.1.tgz", + "integrity": "sha512-czw90wpQq3ZsAVBlinZjAYTKduOjTywlG7fEeWKUA7oCmpA8xdTkxZZlwNJKWqILlq0wehoZcJYfBvOyhPTQ6w==", + "cpu": [ + "arm64" + ], + "license": "MIT", + "optional": true, + "os": [ + "freebsd" + ] + }, + "node_modules/@rollup/rollup-freebsd-x64": { + "version": "4.60.1", + "resolved": "https://registry.npmjs.org/@rollup/rollup-freebsd-x64/-/rollup-freebsd-x64-4.60.1.tgz", + "integrity": "sha512-KVB2rqsxTHuBtfOeySEyzEOB7ltlB/ux38iu2rBQzkjbwRVlkhAGIEDiiYnO2kFOkJp+Z7pUXKyrRRFuFUKt+g==", + "cpu": [ + "x64" + ], + "license": "MIT", + "optional": true, + "os": [ + "freebsd" + ] + }, + "node_modules/@rollup/rollup-linux-arm-gnueabihf": { + "version": "4.60.1", + "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-arm-gnueabihf/-/rollup-linux-arm-gnueabihf-4.60.1.tgz", + "integrity": "sha512-L+34Qqil+v5uC0zEubW7uByo78WOCIrBvci69E7sFASRl0X7b/MB6Cqd1lky/CtcSVTydWa2WZwFuWexjS5o6g==", + "cpu": [ + "arm" + ], + "license": "MIT", + "optional": true, + "os": [ + "linux" + ] + }, + "node_modules/@rollup/rollup-linux-arm-musleabihf": { + "version": "4.60.1", + "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-arm-musleabihf/-/rollup-linux-arm-musleabihf-4.60.1.tgz", + "integrity": "sha512-n83O8rt4v34hgFzlkb1ycniJh7IR5RCIqt6mz1VRJD6pmhRi0CXdmfnLu9dIUS6buzh60IvACM842Ffb3xd6Gg==", + "cpu": [ + "arm" + ], + "license": "MIT", + "optional": true, + "os": [ + "linux" + ] + }, + "node_modules/@rollup/rollup-linux-arm64-gnu": { + "version": "4.60.1", + "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-arm64-gnu/-/rollup-linux-arm64-gnu-4.60.1.tgz", + "integrity": "sha512-Nql7sTeAzhTAja3QXeAI48+/+GjBJ+QmAH13snn0AJSNL50JsDqotyudHyMbO2RbJkskbMbFJfIJKWA6R1LCJQ==", + "cpu": [ + "arm64" + ], + "license": "MIT", + 
"optional": true, + "os": [ + "linux" + ] + }, + "node_modules/@rollup/rollup-linux-arm64-musl": { + "version": "4.60.1", + "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-arm64-musl/-/rollup-linux-arm64-musl-4.60.1.tgz", + "integrity": "sha512-+pUymDhd0ys9GcKZPPWlFiZ67sTWV5UU6zOJat02M1+PiuSGDziyRuI/pPue3hoUwm2uGfxdL+trT6Z9rxnlMA==", + "cpu": [ + "arm64" + ], + "license": "MIT", + "optional": true, + "os": [ + "linux" + ] + }, + "node_modules/@rollup/rollup-linux-loong64-gnu": { + "version": "4.60.1", + "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-loong64-gnu/-/rollup-linux-loong64-gnu-4.60.1.tgz", + "integrity": "sha512-VSvgvQeIcsEvY4bKDHEDWcpW4Yw7BtlKG1GUT4FzBUlEKQK0rWHYBqQt6Fm2taXS+1bXvJT6kICu5ZwqKCnvlQ==", + "cpu": [ + "loong64" + ], + "license": "MIT", + "optional": true, + "os": [ + "linux" + ] + }, + "node_modules/@rollup/rollup-linux-loong64-musl": { + "version": "4.60.1", + "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-loong64-musl/-/rollup-linux-loong64-musl-4.60.1.tgz", + "integrity": "sha512-4LqhUomJqwe641gsPp6xLfhqWMbQV04KtPp7/dIp0nzPxAkNY1AbwL5W0MQpcalLYk07vaW9Kp1PBhdpZYYcEw==", + "cpu": [ + "loong64" + ], + "license": "MIT", + "optional": true, + "os": [ + "linux" + ] + }, + "node_modules/@rollup/rollup-linux-ppc64-gnu": { + "version": "4.60.1", + "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-ppc64-gnu/-/rollup-linux-ppc64-gnu-4.60.1.tgz", + "integrity": "sha512-tLQQ9aPvkBxOc/EUT6j3pyeMD6Hb8QF2BTBnCQWP/uu1lhc9AIrIjKnLYMEroIz/JvtGYgI9dF3AxHZNaEH0rw==", + "cpu": [ + "ppc64" + ], + "license": "MIT", + "optional": true, + "os": [ + "linux" + ] + }, + "node_modules/@rollup/rollup-linux-ppc64-musl": { + "version": "4.60.1", + "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-ppc64-musl/-/rollup-linux-ppc64-musl-4.60.1.tgz", + "integrity": "sha512-RMxFhJwc9fSXP6PqmAz4cbv3kAyvD1etJFjTx4ONqFP9DkTkXsAMU4v3Vyc5BgzC+anz7nS/9tp4obsKfqkDHg==", + "cpu": [ + "ppc64" + ], + "license": "MIT", + 
"optional": true, + "os": [ + "linux" + ] + }, + "node_modules/@rollup/rollup-linux-riscv64-gnu": { + "version": "4.60.1", + "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-riscv64-gnu/-/rollup-linux-riscv64-gnu-4.60.1.tgz", + "integrity": "sha512-QKgFl+Yc1eEk6MmOBfRHYF6lTxiiiV3/z/BRrbSiW2I7AFTXoBFvdMEyglohPj//2mZS4hDOqeB0H1ACh3sBbg==", + "cpu": [ + "riscv64" + ], + "license": "MIT", + "optional": true, + "os": [ + "linux" + ] + }, + "node_modules/@rollup/rollup-linux-riscv64-musl": { + "version": "4.60.1", + "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-riscv64-musl/-/rollup-linux-riscv64-musl-4.60.1.tgz", + "integrity": "sha512-RAjXjP/8c6ZtzatZcA1RaQr6O1TRhzC+adn8YZDnChliZHviqIjmvFwHcxi4JKPSDAt6Uhf/7vqcBzQJy0PDJg==", + "cpu": [ + "riscv64" + ], + "license": "MIT", + "optional": true, + "os": [ + "linux" + ] + }, + "node_modules/@rollup/rollup-linux-s390x-gnu": { + "version": "4.60.1", + "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-s390x-gnu/-/rollup-linux-s390x-gnu-4.60.1.tgz", + "integrity": "sha512-wcuocpaOlaL1COBYiA89O6yfjlp3RwKDeTIA0hM7OpmhR1Bjo9j31G1uQVpDlTvwxGn2nQs65fBFL5UFd76FcQ==", + "cpu": [ + "s390x" + ], + "license": "MIT", + "optional": true, + "os": [ + "linux" + ] + }, + "node_modules/@rollup/rollup-linux-x64-gnu": { + "version": "4.60.1", + "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-x64-gnu/-/rollup-linux-x64-gnu-4.60.1.tgz", + "integrity": "sha512-77PpsFQUCOiZR9+LQEFg9GClyfkNXj1MP6wRnzYs0EeWbPcHs02AXu4xuUbM1zhwn3wqaizle3AEYg5aeoohhg==", + "cpu": [ + "x64" + ], + "license": "MIT", + "optional": true, + "os": [ + "linux" + ] + }, + "node_modules/@rollup/rollup-linux-x64-musl": { + "version": "4.60.1", + "resolved": "https://registry.npmjs.org/@rollup/rollup-linux-x64-musl/-/rollup-linux-x64-musl-4.60.1.tgz", + "integrity": "sha512-5cIATbk5vynAjqqmyBjlciMJl1+R/CwX9oLk/EyiFXDWd95KpHdrOJT//rnUl4cUcskrd0jCCw3wpZnhIHdD9w==", + "cpu": [ + "x64" + ], + "license": "MIT", + "optional": true, + 
"os": [ + "linux" + ] + }, + "node_modules/@rollup/rollup-openbsd-x64": { + "version": "4.60.1", + "resolved": "https://registry.npmjs.org/@rollup/rollup-openbsd-x64/-/rollup-openbsd-x64-4.60.1.tgz", + "integrity": "sha512-cl0w09WsCi17mcmWqqglez9Gk8isgeWvoUZ3WiJFYSR3zjBQc2J5/ihSjpl+VLjPqjQ/1hJRcqBfLjssREQILw==", + "cpu": [ + "x64" + ], + "license": "MIT", + "optional": true, + "os": [ + "openbsd" + ] + }, + "node_modules/@rollup/rollup-openharmony-arm64": { + "version": "4.60.1", + "resolved": "https://registry.npmjs.org/@rollup/rollup-openharmony-arm64/-/rollup-openharmony-arm64-4.60.1.tgz", + "integrity": "sha512-4Cv23ZrONRbNtbZa37mLSueXUCtN7MXccChtKpUnQNgF010rjrjfHx3QxkS2PI7LqGT5xXyYs1a7LbzAwT0iCA==", + "cpu": [ + "arm64" + ], + "license": "MIT", + "optional": true, + "os": [ + "openharmony" + ] + }, + "node_modules/@rollup/rollup-win32-arm64-msvc": { + "version": "4.60.1", + "resolved": "https://registry.npmjs.org/@rollup/rollup-win32-arm64-msvc/-/rollup-win32-arm64-msvc-4.60.1.tgz", + "integrity": "sha512-i1okWYkA4FJICtr7KpYzFpRTHgy5jdDbZiWfvny21iIKky5YExiDXP+zbXzm3dUcFpkEeYNHgQ5fuG236JPq0g==", + "cpu": [ + "arm64" + ], + "license": "MIT", + "optional": true, + "os": [ + "win32" + ] + }, + "node_modules/@rollup/rollup-win32-ia32-msvc": { + "version": "4.60.1", + "resolved": "https://registry.npmjs.org/@rollup/rollup-win32-ia32-msvc/-/rollup-win32-ia32-msvc-4.60.1.tgz", + "integrity": "sha512-u09m3CuwLzShA0EYKMNiFgcjjzwqtUMLmuCJLeZWjjOYA3IT2Di09KaxGBTP9xVztWyIWjVdsB2E9goMjZvTQg==", + "cpu": [ + "ia32" + ], + "license": "MIT", + "optional": true, + "os": [ + "win32" + ] + }, + "node_modules/@rollup/rollup-win32-x64-gnu": { + "version": "4.60.1", + "resolved": "https://registry.npmjs.org/@rollup/rollup-win32-x64-gnu/-/rollup-win32-x64-gnu-4.60.1.tgz", + "integrity": "sha512-k+600V9Zl1CM7eZxJgMyTUzmrmhB/0XZnF4pRypKAlAgxmedUA+1v9R+XOFv56W4SlHEzfeMtzujLJD22Uz5zg==", + "cpu": [ + "x64" + ], + "license": "MIT", + "optional": true, + "os": [ + "win32" + ] + }, + 
"node_modules/@rollup/rollup-win32-x64-msvc": { + "version": "4.60.1", + "resolved": "https://registry.npmjs.org/@rollup/rollup-win32-x64-msvc/-/rollup-win32-x64-msvc-4.60.1.tgz", + "integrity": "sha512-lWMnixq/QzxyhTV6NjQJ4SFo1J6PvOX8vUx5Wb4bBPsEb+8xZ89Bz6kOXpfXj9ak9AHTQVQzlgzBEc1SyM27xQ==", + "cpu": [ + "x64" + ], + "license": "MIT", + "optional": true, + "os": [ + "win32" + ] + }, + "node_modules/@tabby_ai/hijri-converter": { + "version": "1.0.5", + "resolved": "https://registry.npmjs.org/@tabby_ai/hijri-converter/-/hijri-converter-1.0.5.tgz", + "integrity": "sha512-r5bClKrcIusDoo049dSL8CawnHR6mRdDwhlQuIgZRNty68q0x8k3Lf1BtPAMxRf/GgnHBnIO4ujd3+GQdLWzxQ==", + "license": "MIT", + "engines": { + "node": ">=16.0.0" + } + }, + "node_modules/@tailwindcss/node": { + "version": "4.2.2", + "resolved": "https://registry.npmjs.org/@tailwindcss/node/-/node-4.2.2.tgz", + "integrity": "sha512-pXS+wJ2gZpVXqFaUEjojq7jzMpTGf8rU6ipJz5ovJV6PUGmlJ+jvIwGrzdHdQ80Sg+wmQxUFuoW1UAAwHNEdFA==", + "license": "MIT", + "dependencies": { + "@jridgewell/remapping": "^2.3.5", + "enhanced-resolve": "^5.19.0", + "jiti": "^2.6.1", + "lightningcss": "1.32.0", + "magic-string": "^0.30.21", + "source-map-js": "^1.2.1", + "tailwindcss": "4.2.2" + } + }, + "node_modules/@tailwindcss/oxide": { + "version": "4.2.2", + "resolved": "https://registry.npmjs.org/@tailwindcss/oxide/-/oxide-4.2.2.tgz", + "integrity": "sha512-qEUA07+E5kehxYp9BVMpq9E8vnJuBHfJEC0vPC5e7iL/hw7HR61aDKoVoKzrG+QKp56vhNZe4qwkRmMC0zDLvg==", + "license": "MIT", + "engines": { + "node": ">= 20" + }, + "optionalDependencies": { + "@tailwindcss/oxide-android-arm64": "4.2.2", + "@tailwindcss/oxide-darwin-arm64": "4.2.2", + "@tailwindcss/oxide-darwin-x64": "4.2.2", + "@tailwindcss/oxide-freebsd-x64": "4.2.2", + "@tailwindcss/oxide-linux-arm-gnueabihf": "4.2.2", + "@tailwindcss/oxide-linux-arm64-gnu": "4.2.2", + "@tailwindcss/oxide-linux-arm64-musl": "4.2.2", + "@tailwindcss/oxide-linux-x64-gnu": "4.2.2", + "@tailwindcss/oxide-linux-x64-musl": 
"4.2.2", + "@tailwindcss/oxide-wasm32-wasi": "4.2.2", + "@tailwindcss/oxide-win32-arm64-msvc": "4.2.2", + "@tailwindcss/oxide-win32-x64-msvc": "4.2.2" + } + }, + "node_modules/@tailwindcss/oxide-android-arm64": { + "version": "4.2.2", + "resolved": "https://registry.npmjs.org/@tailwindcss/oxide-android-arm64/-/oxide-android-arm64-4.2.2.tgz", + "integrity": "sha512-dXGR1n+P3B6748jZO/SvHZq7qBOqqzQ+yFrXpoOWWALWndF9MoSKAT3Q0fYgAzYzGhxNYOoysRvYlpixRBBoDg==", + "cpu": [ + "arm64" + ], + "license": "MIT", + "optional": true, + "os": [ + "android" + ], + "engines": { + "node": ">= 20" + } + }, + "node_modules/@tailwindcss/oxide-darwin-arm64": { + "version": "4.2.2", + "resolved": "https://registry.npmjs.org/@tailwindcss/oxide-darwin-arm64/-/oxide-darwin-arm64-4.2.2.tgz", + "integrity": "sha512-iq9Qjr6knfMpZHj55/37ouZeykwbDqF21gPFtfnhCCKGDcPI/21FKC9XdMO/XyBM7qKORx6UIhGgg6jLl7BZlg==", + "cpu": [ + "arm64" + ], + "license": "MIT", + "optional": true, + "os": [ + "darwin" + ], + "engines": { + "node": ">= 20" + } + }, + "node_modules/@tailwindcss/oxide-darwin-x64": { + "version": "4.2.2", + "resolved": "https://registry.npmjs.org/@tailwindcss/oxide-darwin-x64/-/oxide-darwin-x64-4.2.2.tgz", + "integrity": "sha512-BlR+2c3nzc8f2G639LpL89YY4bdcIdUmiOOkv2GQv4/4M0vJlpXEa0JXNHhCHU7VWOKWT/CjqHdTP8aUuDJkuw==", + "cpu": [ + "x64" + ], + "license": "MIT", + "optional": true, + "os": [ + "darwin" + ], + "engines": { + "node": ">= 20" + } + }, + "node_modules/@tailwindcss/oxide-freebsd-x64": { + "version": "4.2.2", + "resolved": "https://registry.npmjs.org/@tailwindcss/oxide-freebsd-x64/-/oxide-freebsd-x64-4.2.2.tgz", + "integrity": "sha512-YUqUgrGMSu2CDO82hzlQ5qSb5xmx3RUrke/QgnoEx7KvmRJHQuZHZmZTLSuuHwFf0DJPybFMXMYf+WJdxHy/nQ==", + "cpu": [ + "x64" + ], + "license": "MIT", + "optional": true, + "os": [ + "freebsd" + ], + "engines": { + "node": ">= 20" + } + }, + "node_modules/@tailwindcss/oxide-linux-arm-gnueabihf": { + "version": "4.2.2", + "resolved": 
"https://registry.npmjs.org/@tailwindcss/oxide-linux-arm-gnueabihf/-/oxide-linux-arm-gnueabihf-4.2.2.tgz", + "integrity": "sha512-FPdhvsW6g06T9BWT0qTwiVZYE2WIFo2dY5aCSpjG/S/u1tby+wXoslXS0kl3/KXnULlLr1E3NPRRw0g7t2kgaQ==", + "cpu": [ + "arm" + ], + "license": "MIT", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": ">= 20" + } + }, + "node_modules/@tailwindcss/oxide-linux-arm64-gnu": { + "version": "4.2.2", + "resolved": "https://registry.npmjs.org/@tailwindcss/oxide-linux-arm64-gnu/-/oxide-linux-arm64-gnu-4.2.2.tgz", + "integrity": "sha512-4og1V+ftEPXGttOO7eCmW7VICmzzJWgMx+QXAJRAhjrSjumCwWqMfkDrNu1LXEQzNAwz28NCUpucgQPrR4S2yw==", + "cpu": [ + "arm64" + ], + "license": "MIT", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": ">= 20" + } + }, + "node_modules/@tailwindcss/oxide-linux-arm64-musl": { + "version": "4.2.2", + "resolved": "https://registry.npmjs.org/@tailwindcss/oxide-linux-arm64-musl/-/oxide-linux-arm64-musl-4.2.2.tgz", + "integrity": "sha512-oCfG/mS+/+XRlwNjnsNLVwnMWYH7tn/kYPsNPh+JSOMlnt93mYNCKHYzylRhI51X+TbR+ufNhhKKzm6QkqX8ag==", + "cpu": [ + "arm64" + ], + "license": "MIT", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": ">= 20" + } + }, + "node_modules/@tailwindcss/oxide-linux-x64-gnu": { + "version": "4.2.2", + "resolved": "https://registry.npmjs.org/@tailwindcss/oxide-linux-x64-gnu/-/oxide-linux-x64-gnu-4.2.2.tgz", + "integrity": "sha512-rTAGAkDgqbXHNp/xW0iugLVmX62wOp2PoE39BTCGKjv3Iocf6AFbRP/wZT/kuCxC9QBh9Pu8XPkv/zCZB2mcMg==", + "cpu": [ + "x64" + ], + "license": "MIT", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": ">= 20" + } + }, + "node_modules/@tailwindcss/oxide-linux-x64-musl": { + "version": "4.2.2", + "resolved": "https://registry.npmjs.org/@tailwindcss/oxide-linux-x64-musl/-/oxide-linux-x64-musl-4.2.2.tgz", + "integrity": "sha512-XW3t3qwbIwiSyRCggeO2zxe3KWaEbM0/kW9e8+0XpBgyKU4ATYzcVSMKteZJ1iukJ3HgHBjbg9P5YPRCVUxlnQ==", + "cpu": [ + "x64" + ], + "license": 
"MIT", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": ">= 20" + } + }, + "node_modules/@tailwindcss/oxide-wasm32-wasi": { + "version": "4.2.2", + "resolved": "https://registry.npmjs.org/@tailwindcss/oxide-wasm32-wasi/-/oxide-wasm32-wasi-4.2.2.tgz", + "integrity": "sha512-eKSztKsmEsn1O5lJ4ZAfyn41NfG7vzCg496YiGtMDV86jz1q/irhms5O0VrY6ZwTUkFy/EKG3RfWgxSI3VbZ8Q==", + "bundleDependencies": [ + "@napi-rs/wasm-runtime", + "@emnapi/core", + "@emnapi/runtime", + "@tybys/wasm-util", + "@emnapi/wasi-threads", + "tslib" + ], + "cpu": [ + "wasm32" + ], + "license": "MIT", + "optional": true, + "dependencies": { + "@emnapi/core": "^1.8.1", + "@emnapi/runtime": "^1.8.1", + "@emnapi/wasi-threads": "^1.1.0", + "@napi-rs/wasm-runtime": "^1.1.1", + "@tybys/wasm-util": "^0.10.1", + "tslib": "^2.8.1" + }, + "engines": { + "node": ">=14.0.0" + } + }, + "node_modules/@tailwindcss/oxide-win32-arm64-msvc": { + "version": "4.2.2", + "resolved": "https://registry.npmjs.org/@tailwindcss/oxide-win32-arm64-msvc/-/oxide-win32-arm64-msvc-4.2.2.tgz", + "integrity": "sha512-qPmaQM4iKu5mxpsrWZMOZRgZv1tOZpUm+zdhhQP0VhJfyGGO3aUKdbh3gDZc/dPLQwW4eSqWGrrcWNBZWUWaXQ==", + "cpu": [ + "arm64" + ], + "license": "MIT", + "optional": true, + "os": [ + "win32" + ], + "engines": { + "node": ">= 20" + } + }, + "node_modules/@tailwindcss/oxide-win32-x64-msvc": { + "version": "4.2.2", + "resolved": "https://registry.npmjs.org/@tailwindcss/oxide-win32-x64-msvc/-/oxide-win32-x64-msvc-4.2.2.tgz", + "integrity": "sha512-1T/37VvI7WyH66b+vqHj/cLwnCxt7Qt3WFu5Q8hk65aOvlwAhs7rAp1VkulBJw/N4tMirXjVnylTR72uI0HGcA==", + "cpu": [ + "x64" + ], + "license": "MIT", + "optional": true, + "os": [ + "win32" + ], + "engines": { + "node": ">= 20" + } + }, + "node_modules/@tailwindcss/vite": { + "version": "4.2.2", + "resolved": "https://registry.npmjs.org/@tailwindcss/vite/-/vite-4.2.2.tgz", + "integrity": "sha512-mEiF5HO1QqCLXoNEfXVA1Tzo+cYsrqV7w9Juj2wdUFyW07JRenqMG225MvPwr3ZD9N1bFQj46X7r33iHxLUW0w==", + 
"license": "MIT", + "dependencies": { + "@tailwindcss/node": "4.2.2", + "@tailwindcss/oxide": "4.2.2", + "tailwindcss": "4.2.2" + }, + "peerDependencies": { + "vite": "^5.2.0 || ^6 || ^7 || ^8" + } + }, + "node_modules/@tanstack/query-core": { + "version": "5.96.0", + "resolved": "https://registry.npmjs.org/@tanstack/query-core/-/query-core-5.96.0.tgz", + "integrity": "sha512-sfO3uQeol1BU7cRP6NYY7nAiX3GiNY20lI/dtSbKLwcIkYw/X+w/tEsQAkc544AfIhBX/IvH/QYtPHrPhyAKGw==", + "license": "MIT", + "funding": { + "type": "github", + "url": "https://github.com/sponsors/tannerlinsley" + } + }, + "node_modules/@tanstack/react-query": { + "version": "5.96.0", + "resolved": "https://registry.npmjs.org/@tanstack/react-query/-/react-query-5.96.0.tgz", + "integrity": "sha512-6qbjdm1K5kizVKv9TNqhIN3doq2anRhdF2XaFMFSn4m8L22S69RV+FilvlyVT4RoJyMxtPU5rs4RpdFa/PEC7A==", + "license": "MIT", + "dependencies": { + "@tanstack/query-core": "5.96.0" + }, + "funding": { + "type": "github", + "url": "https://github.com/sponsors/tannerlinsley" + }, + "peerDependencies": { + "react": "^18 || ^19" + } + }, + "node_modules/@tanstack/react-table": { + "version": "8.21.3", + "resolved": "https://registry.npmjs.org/@tanstack/react-table/-/react-table-8.21.3.tgz", + "integrity": "sha512-5nNMTSETP4ykGegmVkhjcS8tTLW6Vl4axfEGQN3v0zdHYbK4UfoqfPChclTrJ4EoK9QynqAu9oUf8VEmrpZ5Ww==", + "license": "MIT", + "dependencies": { + "@tanstack/table-core": "8.21.3" + }, + "engines": { + "node": ">=12" + }, + "funding": { + "type": "github", + "url": "https://github.com/sponsors/tannerlinsley" + }, + "peerDependencies": { + "react": ">=16.8", + "react-dom": ">=16.8" + } + }, + "node_modules/@tanstack/table-core": { + "version": "8.21.3", + "resolved": "https://registry.npmjs.org/@tanstack/table-core/-/table-core-8.21.3.tgz", + "integrity": "sha512-ldZXEhOBb8Is7xLs01fR3YEc3DERiz5silj8tnGkFZytt1abEvl/GhUmCE0PMLaMPTa3Jk4HbKmRlHmu+gCftg==", + "license": "MIT", + "engines": { + "node": ">=12" + }, + "funding": { + "type": 
"github", + "url": "https://github.com/sponsors/tannerlinsley" + } + }, + "node_modules/@ts-morph/common": { + "version": "0.28.1", + "resolved": "https://registry.npmjs.org/@ts-morph/common/-/common-0.28.1.tgz", + "integrity": "sha512-W74iWf7ILp1ZKNYXY5qbddNaml7e9Sedv5lvU1V8lftlitkc9Pq1A+jlH23ltDgWYeZFFEqGCD1Ies9hqu3O+g==", + "license": "MIT", + "dependencies": { + "minimatch": "^10.0.1", + "path-browserify": "^1.0.1", + "tinyglobby": "^0.2.14" + } + }, + "node_modules/@ts-morph/common/node_modules/balanced-match": { + "version": "4.0.4", + "resolved": "https://registry.npmjs.org/balanced-match/-/balanced-match-4.0.4.tgz", + "integrity": "sha512-BLrgEcRTwX2o6gGxGOCNyMvGSp35YofuYzw9h1IMTRmKqttAZZVU67bdb9Pr2vUHA8+j3i2tJfjO6C6+4myGTA==", + "license": "MIT", + "engines": { + "node": "18 || 20 || >=22" + } + }, + "node_modules/@ts-morph/common/node_modules/brace-expansion": { + "version": "5.0.5", + "resolved": "https://registry.npmjs.org/brace-expansion/-/brace-expansion-5.0.5.tgz", + "integrity": "sha512-VZznLgtwhn+Mact9tfiwx64fA9erHH/MCXEUfB/0bX/6Fz6ny5EGTXYltMocqg4xFAQZtnO3DHWWXi8RiuN7cQ==", + "license": "MIT", + "dependencies": { + "balanced-match": "^4.0.2" + }, + "engines": { + "node": "18 || 20 || >=22" + } + }, + "node_modules/@ts-morph/common/node_modules/minimatch": { + "version": "10.2.5", + "resolved": "https://registry.npmjs.org/minimatch/-/minimatch-10.2.5.tgz", + "integrity": "sha512-MULkVLfKGYDFYejP07QOurDLLQpcjk7Fw+7jXS2R2czRQzR56yHRveU5NDJEOviH+hETZKSkIk5c+T23GjFUMg==", + "license": "BlueOak-1.0.0", + "dependencies": { + "brace-expansion": "^5.0.5" + }, + "engines": { + "node": "18 || 20 || >=22" + }, + "funding": { + "url": "https://github.com/sponsors/isaacs" + } + }, + "node_modules/@types/babel__core": { + "version": "7.20.5", + "resolved": "https://registry.npmjs.org/@types/babel__core/-/babel__core-7.20.5.tgz", + "integrity": "sha512-qoQprZvz5wQFJwMDqeseRXWv3rqMvhgpbXFfVyWhbx9X47POIA6i/+dXefEmZKoAgOaTdaIgNSMqMIU61yRyzA==", + "dev": true, + 
"license": "MIT", + "dependencies": { + "@babel/parser": "^7.20.7", + "@babel/types": "^7.20.7", + "@types/babel__generator": "*", + "@types/babel__template": "*", + "@types/babel__traverse": "*" + } + }, + "node_modules/@types/babel__generator": { + "version": "7.27.0", + "resolved": "https://registry.npmjs.org/@types/babel__generator/-/babel__generator-7.27.0.tgz", + "integrity": "sha512-ufFd2Xi92OAVPYsy+P4n7/U7e68fex0+Ee8gSG9KX7eo084CWiQ4sdxktvdl0bOPupXtVJPY19zk6EwWqUQ8lg==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/types": "^7.0.0" + } + }, + "node_modules/@types/babel__template": { + "version": "7.4.4", + "resolved": "https://registry.npmjs.org/@types/babel__template/-/babel__template-7.4.4.tgz", + "integrity": "sha512-h/NUaSyG5EyxBIp8YRxo4RMe2/qQgvyowRwVMzhYhBCONbW8PUsg4lkFMrhgZhUe5z3L3MiLDuvyJ/CaPa2A8A==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/parser": "^7.1.0", + "@babel/types": "^7.0.0" + } + }, + "node_modules/@types/babel__traverse": { + "version": "7.28.0", + "resolved": "https://registry.npmjs.org/@types/babel__traverse/-/babel__traverse-7.28.0.tgz", + "integrity": "sha512-8PvcXf70gTDZBgt9ptxJ8elBeBjcLOAcOtoO/mPJjtji1+CdGbHgm77om1GrsPxsiE+uXIpNSK64UYaIwQXd4Q==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/types": "^7.28.2" + } + }, + "node_modules/@types/d3-array": { + "version": "3.2.2", + "resolved": "https://registry.npmjs.org/@types/d3-array/-/d3-array-3.2.2.tgz", + "integrity": "sha512-hOLWVbm7uRza0BYXpIIW5pxfrKe0W+D5lrFiAEYR+pb6w3N2SwSMaJbXdUfSEv+dT4MfHBLtn5js0LAWaO6otw==", + "license": "MIT" + }, + "node_modules/@types/d3-color": { + "version": "3.1.3", + "resolved": "https://registry.npmjs.org/@types/d3-color/-/d3-color-3.1.3.tgz", + "integrity": "sha512-iO90scth9WAbmgv7ogoq57O9YpKmFBbmoEoCHDB2xMBY0+/KVrqAaCDyCE16dUspeOvIxFFRI+0sEtqDqy2b4A==", + "license": "MIT" + }, + "node_modules/@types/d3-ease": { + "version": "3.0.2", + "resolved": 
"https://registry.npmjs.org/@types/d3-ease/-/d3-ease-3.0.2.tgz", + "integrity": "sha512-NcV1JjO5oDzoK26oMzbILE6HW7uVXOHLQvHshBUW4UMdZGfiY6v5BeQwh9a9tCzv+CeefZQHJt5SRgK154RtiA==", + "license": "MIT" + }, + "node_modules/@types/d3-interpolate": { + "version": "3.0.4", + "resolved": "https://registry.npmjs.org/@types/d3-interpolate/-/d3-interpolate-3.0.4.tgz", + "integrity": "sha512-mgLPETlrpVV1YRJIglr4Ez47g7Yxjl1lj7YKsiMCb27VJH9W8NVM6Bb9d8kkpG/uAQS5AmbA48q2IAolKKo1MA==", + "license": "MIT", + "dependencies": { + "@types/d3-color": "*" + } + }, + "node_modules/@types/d3-path": { + "version": "3.1.1", + "resolved": "https://registry.npmjs.org/@types/d3-path/-/d3-path-3.1.1.tgz", + "integrity": "sha512-VMZBYyQvbGmWyWVea0EHs/BwLgxc+MKi1zLDCONksozI4YJMcTt8ZEuIR4Sb1MMTE8MMW49v0IwI5+b7RmfWlg==", + "license": "MIT" + }, + "node_modules/@types/d3-scale": { + "version": "4.0.9", + "resolved": "https://registry.npmjs.org/@types/d3-scale/-/d3-scale-4.0.9.tgz", + "integrity": "sha512-dLmtwB8zkAeO/juAMfnV+sItKjlsw2lKdZVVy6LRr0cBmegxSABiLEpGVmSJJ8O08i4+sGR6qQtb6WtuwJdvVw==", + "license": "MIT", + "dependencies": { + "@types/d3-time": "*" + } + }, + "node_modules/@types/d3-shape": { + "version": "3.1.8", + "resolved": "https://registry.npmjs.org/@types/d3-shape/-/d3-shape-3.1.8.tgz", + "integrity": "sha512-lae0iWfcDeR7qt7rA88BNiqdvPS5pFVPpo5OfjElwNaT2yyekbM0C9vK+yqBqEmHr6lDkRnYNoTBYlAgJa7a4w==", + "license": "MIT", + "dependencies": { + "@types/d3-path": "*" + } + }, + "node_modules/@types/d3-time": { + "version": "3.0.4", + "resolved": "https://registry.npmjs.org/@types/d3-time/-/d3-time-3.0.4.tgz", + "integrity": "sha512-yuzZug1nkAAaBlBBikKZTgzCeA+k1uy4ZFwWANOfKw5z5LRhV0gNA7gNkKm7HoK+HRN0wX3EkxGk0fpbWhmB7g==", + "license": "MIT" + }, + "node_modules/@types/d3-timer": { + "version": "3.0.2", + "resolved": "https://registry.npmjs.org/@types/d3-timer/-/d3-timer-3.0.2.tgz", + "integrity": 
"sha512-Ps3T8E8dZDam6fUyNiMkekK3XUsaUEik+idO9/YjPtfj2qruF8tFBXS7XhtE4iIXBLxhmLjP3SXpLhVf21I9Lw==", + "license": "MIT" + }, + "node_modules/@types/estree": { + "version": "1.0.8", + "resolved": "https://registry.npmjs.org/@types/estree/-/estree-1.0.8.tgz", + "integrity": "sha512-dWHzHa2WqEXI/O1E9OjrocMTKJl2mSrEolh1Iomrv6U+JuNwaHXsXx9bLu5gG7BUWFIN0skIQJQ/L1rIex4X6w==", + "license": "MIT" + }, + "node_modules/@types/json-schema": { + "version": "7.0.15", + "resolved": "https://registry.npmjs.org/@types/json-schema/-/json-schema-7.0.15.tgz", + "integrity": "sha512-5+fP8P8MFNC+AyZCDxrB2pkZFPGzqQWUzpSeuuVLvm8VMcorNYavBqoFcxK8bQz4Qsbn4oUEEem4wDLfcysGHA==", + "dev": true, + "license": "MIT" + }, + "node_modules/@types/node": { + "version": "24.12.0", + "resolved": "https://registry.npmjs.org/@types/node/-/node-24.12.0.tgz", + "integrity": "sha512-GYDxsZi3ChgmckRT9HPU0WEhKLP08ev/Yfcq2AstjrDASOYCSXeyjDsHg4v5t4jOj7cyDX3vmprafKlWIG9MXQ==", + "devOptional": true, + "license": "MIT", + "dependencies": { + "undici-types": "~7.16.0" + } + }, + "node_modules/@types/react": { + "version": "19.2.14", + "resolved": "https://registry.npmjs.org/@types/react/-/react-19.2.14.tgz", + "integrity": "sha512-ilcTH/UniCkMdtexkoCN0bI7pMcJDvmQFPvuPvmEaYA/NSfFTAgdUSLAoVjaRJm7+6PvcM+q1zYOwS4wTYMF9w==", + "devOptional": true, + "license": "MIT", + "dependencies": { + "csstype": "^3.2.2" + } + }, + "node_modules/@types/react-dom": { + "version": "19.2.3", + "resolved": "https://registry.npmjs.org/@types/react-dom/-/react-dom-19.2.3.tgz", + "integrity": "sha512-jp2L/eY6fn+KgVVQAOqYItbF0VY/YApe5Mz2F0aykSO8gx31bYCZyvSeYxCHKvzHG5eZjc+zyaS5BrBWya2+kQ==", + "devOptional": true, + "license": "MIT", + "peerDependencies": { + "@types/react": "^19.2.0" + } + }, + "node_modules/@typescript-eslint/eslint-plugin": { + "version": "8.58.0", + "resolved": "https://registry.npmjs.org/@typescript-eslint/eslint-plugin/-/eslint-plugin-8.58.0.tgz", + "integrity": 
"sha512-RLkVSiNuUP1C2ROIWfqX+YcUfLaSnxGE/8M+Y57lopVwg9VTYYfhuz15Yf1IzCKgZj6/rIbYTmJCUSqr76r0Wg==", + "dev": true, + "license": "MIT", + "dependencies": { + "@eslint-community/regexpp": "^4.12.2", + "@typescript-eslint/scope-manager": "8.58.0", + "@typescript-eslint/type-utils": "8.58.0", + "@typescript-eslint/utils": "8.58.0", + "@typescript-eslint/visitor-keys": "8.58.0", + "ignore": "^7.0.5", + "natural-compare": "^1.4.0", + "ts-api-utils": "^2.5.0" + }, + "engines": { + "node": "^18.18.0 || ^20.9.0 || >=21.1.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/typescript-eslint" + }, + "peerDependencies": { + "@typescript-eslint/parser": "^8.58.0", + "eslint": "^8.57.0 || ^9.0.0 || ^10.0.0", + "typescript": ">=4.8.4 <6.1.0" + } + }, + "node_modules/@typescript-eslint/eslint-plugin/node_modules/ignore": { + "version": "7.0.5", + "resolved": "https://registry.npmjs.org/ignore/-/ignore-7.0.5.tgz", + "integrity": "sha512-Hs59xBNfUIunMFgWAbGX5cq6893IbWg4KnrjbYwX3tx0ztorVgTDA6B2sxf8ejHJ4wz8BqGUMYlnzNBer5NvGg==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">= 4" + } + }, + "node_modules/@typescript-eslint/parser": { + "version": "8.58.0", + "resolved": "https://registry.npmjs.org/@typescript-eslint/parser/-/parser-8.58.0.tgz", + "integrity": "sha512-rLoGZIf9afaRBYsPUMtvkDWykwXwUPL60HebR4JgTI8mxfFe2cQTu3AGitANp4b9B2QlVru6WzjgB2IzJKiCSA==", + "dev": true, + "license": "MIT", + "dependencies": { + "@typescript-eslint/scope-manager": "8.58.0", + "@typescript-eslint/types": "8.58.0", + "@typescript-eslint/typescript-estree": "8.58.0", + "@typescript-eslint/visitor-keys": "8.58.0", + "debug": "^4.4.3" + }, + "engines": { + "node": "^18.18.0 || ^20.9.0 || >=21.1.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/typescript-eslint" + }, + "peerDependencies": { + "eslint": "^8.57.0 || ^9.0.0 || ^10.0.0", + "typescript": ">=4.8.4 <6.1.0" + } + }, + 
"node_modules/@typescript-eslint/project-service": { + "version": "8.58.0", + "resolved": "https://registry.npmjs.org/@typescript-eslint/project-service/-/project-service-8.58.0.tgz", + "integrity": "sha512-8Q/wBPWLQP1j16NxoPNIKpDZFMaxl7yWIoqXWYeWO+Bbd2mjgvoF0dxP2jKZg5+x49rgKdf7Ck473M8PC3V9lg==", + "dev": true, + "license": "MIT", + "dependencies": { + "@typescript-eslint/tsconfig-utils": "^8.58.0", + "@typescript-eslint/types": "^8.58.0", + "debug": "^4.4.3" + }, + "engines": { + "node": "^18.18.0 || ^20.9.0 || >=21.1.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/typescript-eslint" + }, + "peerDependencies": { + "typescript": ">=4.8.4 <6.1.0" + } + }, + "node_modules/@typescript-eslint/scope-manager": { + "version": "8.58.0", + "resolved": "https://registry.npmjs.org/@typescript-eslint/scope-manager/-/scope-manager-8.58.0.tgz", + "integrity": "sha512-W1Lur1oF50FxSnNdGp3Vs6P+yBRSmZiw4IIjEeYxd8UQJwhUF0gDgDD/W/Tgmh73mxgEU3qX0Bzdl/NGuSPEpQ==", + "dev": true, + "license": "MIT", + "dependencies": { + "@typescript-eslint/types": "8.58.0", + "@typescript-eslint/visitor-keys": "8.58.0" + }, + "engines": { + "node": "^18.18.0 || ^20.9.0 || >=21.1.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/typescript-eslint" + } + }, + "node_modules/@typescript-eslint/tsconfig-utils": { + "version": "8.58.0", + "resolved": "https://registry.npmjs.org/@typescript-eslint/tsconfig-utils/-/tsconfig-utils-8.58.0.tgz", + "integrity": "sha512-doNSZEVJsWEu4htiVC+PR6NpM+pa+a4ClH9INRWOWCUzMst/VA9c4gXq92F8GUD1rwhNvRLkgjfYtFXegXQF7A==", + "dev": true, + "license": "MIT", + "engines": { + "node": "^18.18.0 || ^20.9.0 || >=21.1.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/typescript-eslint" + }, + "peerDependencies": { + "typescript": ">=4.8.4 <6.1.0" + } + }, + "node_modules/@typescript-eslint/type-utils": { + "version": "8.58.0", + "resolved": 
"https://registry.npmjs.org/@typescript-eslint/type-utils/-/type-utils-8.58.0.tgz", + "integrity": "sha512-aGsCQImkDIqMyx1u4PrVlbi/krmDsQUs4zAcCV6M7yPcPev+RqVlndsJy9kJ8TLihW9TZ0kbDAzctpLn5o+lOg==", + "dev": true, + "license": "MIT", + "dependencies": { + "@typescript-eslint/types": "8.58.0", + "@typescript-eslint/typescript-estree": "8.58.0", + "@typescript-eslint/utils": "8.58.0", + "debug": "^4.4.3", + "ts-api-utils": "^2.5.0" + }, + "engines": { + "node": "^18.18.0 || ^20.9.0 || >=21.1.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/typescript-eslint" + }, + "peerDependencies": { + "eslint": "^8.57.0 || ^9.0.0 || ^10.0.0", + "typescript": ">=4.8.4 <6.1.0" + } + }, + "node_modules/@typescript-eslint/types": { + "version": "8.58.0", + "resolved": "https://registry.npmjs.org/@typescript-eslint/types/-/types-8.58.0.tgz", + "integrity": "sha512-O9CjxypDT89fbHxRfETNoAnHj/i6IpRK0CvbVN3qibxlLdo5p5hcLmUuCCrHMpxiWSwKyI8mCP7qRNYuOJ0Uww==", + "dev": true, + "license": "MIT", + "engines": { + "node": "^18.18.0 || ^20.9.0 || >=21.1.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/typescript-eslint" + } + }, + "node_modules/@typescript-eslint/typescript-estree": { + "version": "8.58.0", + "resolved": "https://registry.npmjs.org/@typescript-eslint/typescript-estree/-/typescript-estree-8.58.0.tgz", + "integrity": "sha512-7vv5UWbHqew/dvs+D3e1RvLv1v2eeZ9txRHPnEEBUgSNLx5ghdzjHa0sgLWYVKssH+lYmV0JaWdoubo0ncGYLA==", + "dev": true, + "license": "MIT", + "dependencies": { + "@typescript-eslint/project-service": "8.58.0", + "@typescript-eslint/tsconfig-utils": "8.58.0", + "@typescript-eslint/types": "8.58.0", + "@typescript-eslint/visitor-keys": "8.58.0", + "debug": "^4.4.3", + "minimatch": "^10.2.2", + "semver": "^7.7.3", + "tinyglobby": "^0.2.15", + "ts-api-utils": "^2.5.0" + }, + "engines": { + "node": "^18.18.0 || ^20.9.0 || >=21.1.0" + }, + "funding": { + "type": "opencollective", + "url": 
"https://opencollective.com/typescript-eslint" + }, + "peerDependencies": { + "typescript": ">=4.8.4 <6.1.0" + } + }, + "node_modules/@typescript-eslint/typescript-estree/node_modules/balanced-match": { + "version": "4.0.4", + "resolved": "https://registry.npmjs.org/balanced-match/-/balanced-match-4.0.4.tgz", + "integrity": "sha512-BLrgEcRTwX2o6gGxGOCNyMvGSp35YofuYzw9h1IMTRmKqttAZZVU67bdb9Pr2vUHA8+j3i2tJfjO6C6+4myGTA==", + "dev": true, + "license": "MIT", + "engines": { + "node": "18 || 20 || >=22" + } + }, + "node_modules/@typescript-eslint/typescript-estree/node_modules/brace-expansion": { + "version": "5.0.5", + "resolved": "https://registry.npmjs.org/brace-expansion/-/brace-expansion-5.0.5.tgz", + "integrity": "sha512-VZznLgtwhn+Mact9tfiwx64fA9erHH/MCXEUfB/0bX/6Fz6ny5EGTXYltMocqg4xFAQZtnO3DHWWXi8RiuN7cQ==", + "dev": true, + "license": "MIT", + "dependencies": { + "balanced-match": "^4.0.2" + }, + "engines": { + "node": "18 || 20 || >=22" + } + }, + "node_modules/@typescript-eslint/typescript-estree/node_modules/minimatch": { + "version": "10.2.5", + "resolved": "https://registry.npmjs.org/minimatch/-/minimatch-10.2.5.tgz", + "integrity": "sha512-MULkVLfKGYDFYejP07QOurDLLQpcjk7Fw+7jXS2R2czRQzR56yHRveU5NDJEOviH+hETZKSkIk5c+T23GjFUMg==", + "dev": true, + "license": "BlueOak-1.0.0", + "dependencies": { + "brace-expansion": "^5.0.5" + }, + "engines": { + "node": "18 || 20 || >=22" + }, + "funding": { + "url": "https://github.com/sponsors/isaacs" + } + }, + "node_modules/@typescript-eslint/typescript-estree/node_modules/semver": { + "version": "7.7.4", + "resolved": "https://registry.npmjs.org/semver/-/semver-7.7.4.tgz", + "integrity": "sha512-vFKC2IEtQnVhpT78h1Yp8wzwrf8CM+MzKMHGJZfBtzhZNycRFnXsHk6E5TxIkkMsgNS7mdX3AGB7x2QM2di4lA==", + "dev": true, + "license": "ISC", + "bin": { + "semver": "bin/semver.js" + }, + "engines": { + "node": ">=10" + } + }, + "node_modules/@typescript-eslint/utils": { + "version": "8.58.0", + "resolved": 
"https://registry.npmjs.org/@typescript-eslint/utils/-/utils-8.58.0.tgz", + "integrity": "sha512-RfeSqcFeHMHlAWzt4TBjWOAtoW9lnsAGiP3GbaX9uVgTYYrMbVnGONEfUCiSss+xMHFl+eHZiipmA8WkQ7FuNA==", + "dev": true, + "license": "MIT", + "dependencies": { + "@eslint-community/eslint-utils": "^4.9.1", + "@typescript-eslint/scope-manager": "8.58.0", + "@typescript-eslint/types": "8.58.0", + "@typescript-eslint/typescript-estree": "8.58.0" + }, + "engines": { + "node": "^18.18.0 || ^20.9.0 || >=21.1.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/typescript-eslint" + }, + "peerDependencies": { + "eslint": "^8.57.0 || ^9.0.0 || ^10.0.0", + "typescript": ">=4.8.4 <6.1.0" + } + }, + "node_modules/@typescript-eslint/visitor-keys": { + "version": "8.58.0", + "resolved": "https://registry.npmjs.org/@typescript-eslint/visitor-keys/-/visitor-keys-8.58.0.tgz", + "integrity": "sha512-XJ9UD9+bbDo4a4epraTwG3TsNPeiB9aShrUneAVXy8q4LuwowN+qu89/6ByLMINqvIMeI9H9hOHQtg/ijrYXzQ==", + "dev": true, + "license": "MIT", + "dependencies": { + "@typescript-eslint/types": "8.58.0", + "eslint-visitor-keys": "^5.0.0" + }, + "engines": { + "node": "^18.18.0 || ^20.9.0 || >=21.1.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/typescript-eslint" + } + }, + "node_modules/@typescript-eslint/visitor-keys/node_modules/eslint-visitor-keys": { + "version": "5.0.1", + "resolved": "https://registry.npmjs.org/eslint-visitor-keys/-/eslint-visitor-keys-5.0.1.tgz", + "integrity": "sha512-tD40eHxA35h0PEIZNeIjkHoDR4YjjJp34biM0mDvplBe//mB+IHCqHDGV7pxF+7MklTvighcCPPZC7ynWyjdTA==", + "dev": true, + "license": "Apache-2.0", + "engines": { + "node": "^20.19.0 || ^22.13.0 || >=24" + }, + "funding": { + "url": "https://opencollective.com/eslint" + } + }, + "node_modules/@vitejs/plugin-react": { + "version": "5.2.0", + "resolved": "https://registry.npmjs.org/@vitejs/plugin-react/-/plugin-react-5.2.0.tgz", + "integrity": 
"sha512-YmKkfhOAi3wsB1PhJq5Scj3GXMn3WvtQ/JC0xoopuHoXSdmtdStOpFrYaT1kie2YgFBcIe64ROzMYRjCrYOdYw==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/core": "^7.29.0", + "@babel/plugin-transform-react-jsx-self": "^7.27.1", + "@babel/plugin-transform-react-jsx-source": "^7.27.1", + "@rolldown/pluginutils": "1.0.0-rc.3", + "@types/babel__core": "^7.20.5", + "react-refresh": "^0.18.0" + }, + "engines": { + "node": "^20.19.0 || >=22.12.0" + }, + "peerDependencies": { + "vite": "^4.2.0 || ^5.0.0 || ^6.0.0 || ^7.0.0 || ^8.0.0" + } + }, + "node_modules/acorn": { + "version": "8.16.0", + "resolved": "https://registry.npmjs.org/acorn/-/acorn-8.16.0.tgz", + "integrity": "sha512-UVJyE9MttOsBQIDKw1skb9nAwQuR5wuGD3+82K6JgJlm/Y+KI92oNsMNGZCYdDsVtRHSak0pcV5Dno5+4jh9sw==", + "dev": true, + "license": "MIT", + "bin": { + "acorn": "bin/acorn" + }, + "engines": { + "node": ">=0.4.0" + } + }, + "node_modules/acorn-jsx": { + "version": "5.3.2", + "resolved": "https://registry.npmjs.org/acorn-jsx/-/acorn-jsx-5.3.2.tgz", + "integrity": "sha512-rq9s+JNhf0IChjtDXxllJ7g41oZk5SlXtp0LHwyA5cejwn7vKmKp4pPri6YEePv2PU65sAsegbXtIinmDFDXgQ==", + "dev": true, + "license": "MIT", + "peerDependencies": { + "acorn": "^6.0.0 || ^7.0.0 || ^8.0.0" + } + }, + "node_modules/ajv": { + "version": "6.14.0", + "resolved": "https://registry.npmjs.org/ajv/-/ajv-6.14.0.tgz", + "integrity": "sha512-IWrosm/yrn43eiKqkfkHis7QioDleaXQHdDVPKg0FSwwd/DuvyX79TZnFOnYpB7dcsFAMmtFztZuXPDvSePkFw==", + "dev": true, + "license": "MIT", + "dependencies": { + "fast-deep-equal": "^3.1.1", + "fast-json-stable-stringify": "^2.0.0", + "json-schema-traverse": "^0.4.1", + "uri-js": "^4.2.2" + }, + "funding": { + "type": "github", + "url": "https://github.com/sponsors/epoberezkin" + } + }, + "node_modules/ansi-styles": { + "version": "4.3.0", + "resolved": "https://registry.npmjs.org/ansi-styles/-/ansi-styles-4.3.0.tgz", + "integrity": 
"sha512-zbB9rCJAT1rbjiVDb2hqKFHNYLxgtk8NURxZ3IZwD3F6NtxbXZQCnnSi1Lkx+IDohdPlFp222wVALIheZJQSEg==", + "license": "MIT", + "dependencies": { + "color-convert": "^2.0.1" + }, + "engines": { + "node": ">=8" + }, + "funding": { + "url": "https://github.com/chalk/ansi-styles?sponsor=1" + } + }, + "node_modules/argparse": { + "version": "2.0.1", + "resolved": "https://registry.npmjs.org/argparse/-/argparse-2.0.1.tgz", + "integrity": "sha512-8+9WqebbFzpX9OR+Wa6O29asIogeRMzcGtAINdpMHHyAg10f05aSFVBbcEqGf/PXw1EjAZ+q2/bEBg3DvurK3Q==", + "dev": true, + "license": "Python-2.0" + }, + "node_modules/aria-hidden": { + "version": "1.2.6", + "resolved": "https://registry.npmjs.org/aria-hidden/-/aria-hidden-1.2.6.tgz", + "integrity": "sha512-ik3ZgC9dY/lYVVM++OISsaYDeg1tb0VtP5uL3ouh1koGOaUMDPpbFIei4JkFimWUFPn90sbMNMXQAIVOlnYKJA==", + "license": "MIT", + "dependencies": { + "tslib": "^2.0.0" + }, + "engines": { + "node": ">=10" + } + }, + "node_modules/balanced-match": { + "version": "1.0.2", + "resolved": "https://registry.npmjs.org/balanced-match/-/balanced-match-1.0.2.tgz", + "integrity": "sha512-3oSeUO0TMV67hN1AmbXsK4yaqU7tjiHlbxRDZOpH0KW9+CeX4bRAaX0Anxt0tx2MrpRpWwQaPwIlISEJhYU5Pw==", + "dev": true, + "license": "MIT" + }, + "node_modules/base64-js": { + "version": "1.5.1", + "resolved": "https://registry.npmjs.org/base64-js/-/base64-js-1.5.1.tgz", + "integrity": "sha512-AKpaYlHn8t4SVbOHCy+b5+KKgvR4vrsD8vbvrbiQJps7fKDTkjkDry6ji0rUJjC0kzbNePLwzxq8iypo41qeWA==", + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/feross" + }, + { + "type": "patreon", + "url": "https://www.patreon.com/feross" + }, + { + "type": "consulting", + "url": "https://feross.org/support" + } + ], + "license": "MIT" + }, + "node_modules/baseline-browser-mapping": { + "version": "2.10.13", + "resolved": "https://registry.npmjs.org/baseline-browser-mapping/-/baseline-browser-mapping-2.10.13.tgz", + "integrity": 
"sha512-BL2sTuHOdy0YT1lYieUxTw/QMtPBC3pmlJC6xk8BBYVv6vcw3SGdKemQ+Xsx9ik2F/lYDO9tqsFQH1r9PFuHKw==", + "dev": true, + "license": "Apache-2.0", + "bin": { + "baseline-browser-mapping": "dist/cli.cjs" + }, + "engines": { + "node": ">=6.0.0" + } + }, + "node_modules/bl": { + "version": "4.1.0", + "resolved": "https://registry.npmjs.org/bl/-/bl-4.1.0.tgz", + "integrity": "sha512-1W07cM9gS6DcLperZfFSj+bWLtaPGSOHWhPiGzXmvVJbRLdG82sH/Kn8EtW1VqWVA54AKf2h5k5BbnIbwF3h6w==", + "license": "MIT", + "dependencies": { + "buffer": "^5.5.0", + "inherits": "^2.0.4", + "readable-stream": "^3.4.0" + } + }, + "node_modules/brace-expansion": { + "version": "1.1.13", + "resolved": "https://registry.npmjs.org/brace-expansion/-/brace-expansion-1.1.13.tgz", + "integrity": "sha512-9ZLprWS6EENmhEOpjCYW2c8VkmOvckIJZfkr7rBW6dObmfgJ/L1GpSYW5Hpo9lDz4D1+n0Ckz8rU7FwHDQiG/w==", + "dev": true, + "license": "MIT", + "dependencies": { + "balanced-match": "^1.0.0", + "concat-map": "0.0.1" + } + }, + "node_modules/browserslist": { + "version": "4.28.2", + "resolved": "https://registry.npmjs.org/browserslist/-/browserslist-4.28.2.tgz", + "integrity": "sha512-48xSriZYYg+8qXna9kwqjIVzuQxi+KYWp2+5nCYnYKPTr0LvD89Jqk2Or5ogxz0NUMfIjhh2lIUX/LyX9B4oIg==", + "dev": true, + "funding": [ + { + "type": "opencollective", + "url": "https://opencollective.com/browserslist" + }, + { + "type": "tidelift", + "url": "https://tidelift.com/funding/github/npm/browserslist" + }, + { + "type": "github", + "url": "https://github.com/sponsors/ai" + } + ], + "license": "MIT", + "dependencies": { + "baseline-browser-mapping": "^2.10.12", + "caniuse-lite": "^1.0.30001782", + "electron-to-chromium": "^1.5.328", + "node-releases": "^2.0.36", + "update-browserslist-db": "^1.2.3" + }, + "bin": { + "browserslist": "cli.js" + }, + "engines": { + "node": "^6 || ^7 || ^8 || ^9 || ^10 || ^11 || ^12 || >=13.7" + } + }, + "node_modules/buffer": { + "version": "5.7.1", + "resolved": "https://registry.npmjs.org/buffer/-/buffer-5.7.1.tgz", + 
"integrity": "sha512-EHcyIPBQ4BSGlvjB16k5KgAJ27CIsHY/2JBmCRReo48y9rQ3MaUzWX3KVlBa4U7MyX02HdVj0K7C3WaB3ju7FQ==", + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/feross" + }, + { + "type": "patreon", + "url": "https://www.patreon.com/feross" + }, + { + "type": "consulting", + "url": "https://feross.org/support" + } + ], + "license": "MIT", + "dependencies": { + "base64-js": "^1.3.1", + "ieee754": "^1.1.13" + } + }, + "node_modules/buffer-equal-constant-time": { + "version": "1.0.1", + "resolved": "https://registry.npmjs.org/buffer-equal-constant-time/-/buffer-equal-constant-time-1.0.1.tgz", + "integrity": "sha512-zRpUiDwd/xk6ADqPMATG8vc9VPrkck7T07OIx0gnjmJAnHnTVXNQG3vfvWNuiZIkwu9KrKdA1iJKfsfTVxE6NA==", + "license": "BSD-3-Clause" + }, + "node_modules/callsites": { + "version": "3.1.0", + "resolved": "https://registry.npmjs.org/callsites/-/callsites-3.1.0.tgz", + "integrity": "sha512-P8BjAsXvZS+VIDUI11hHCQEv74YT67YUi5JJFNWIqL235sBmjX4+qx9Muvls5ivyNENctx46xQLQ3aTuE7ssaQ==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=6" + } + }, + "node_modules/caniuse-lite": { + "version": "1.0.30001782", + "resolved": "https://registry.npmjs.org/caniuse-lite/-/caniuse-lite-1.0.30001782.tgz", + "integrity": "sha512-dZcaJLJeDMh4rELYFw1tvSn1bhZWYFOt468FcbHHxx/Z/dFidd1I6ciyFdi3iwfQCyOjqo9upF6lGQYtMiJWxw==", + "dev": true, + "funding": [ + { + "type": "opencollective", + "url": "https://opencollective.com/browserslist" + }, + { + "type": "tidelift", + "url": "https://tidelift.com/funding/github/npm/caniuse-lite" + }, + { + "type": "github", + "url": "https://github.com/sponsors/ai" + } + ], + "license": "CC-BY-4.0" + }, + "node_modules/chalk": { + "version": "4.1.2", + "resolved": "https://registry.npmjs.org/chalk/-/chalk-4.1.2.tgz", + "integrity": "sha512-oKnbhFyRIXpUuez8iBMmyEa4nbj4IOQyuhc/wy9kY7/WVPcwIO9VA668Pu8RkO7+0G76SLROeyw9CpQ061i4mA==", + "license": "MIT", + "dependencies": { + "ansi-styles": "^4.1.0", + "supports-color": "^7.1.0" + 
}, + "engines": { + "node": ">=10" + }, + "funding": { + "url": "https://github.com/chalk/chalk?sponsor=1" + } + }, + "node_modules/chownr": { + "version": "1.1.4", + "resolved": "https://registry.npmjs.org/chownr/-/chownr-1.1.4.tgz", + "integrity": "sha512-jJ0bqzaylmJtVnNgzTeSOs8DPavpbYgEr/b0YL8/2GO3xJEhInFmhKMUnEJQjZumK7KXGFhUy89PrsJWlakBVg==", + "license": "ISC" + }, + "node_modules/class-variance-authority": { + "version": "0.7.1", + "resolved": "https://registry.npmjs.org/class-variance-authority/-/class-variance-authority-0.7.1.tgz", + "integrity": "sha512-Ka+9Trutv7G8M6WT6SeiRWz792K5qEqIGEGzXKhAE6xOWAY6pPH8U+9IY3oCMv6kqTmLsv7Xh/2w2RigkePMsg==", + "license": "Apache-2.0", + "dependencies": { + "clsx": "^2.1.1" + }, + "funding": { + "url": "https://polar.sh/cva" + } + }, + "node_modules/clsx": { + "version": "2.1.1", + "resolved": "https://registry.npmjs.org/clsx/-/clsx-2.1.1.tgz", + "integrity": "sha512-eYm0QWBtUrBWZWG0d386OGAw16Z995PiOVo2B7bjWSbHedGl5e0ZWaq65kOGgUSNesEIDkB9ISbTg/JK9dhCZA==", + "license": "MIT", + "engines": { + "node": ">=6" + } + }, + "node_modules/cmdk": { + "version": "1.1.1", + "resolved": "https://registry.npmjs.org/cmdk/-/cmdk-1.1.1.tgz", + "integrity": "sha512-Vsv7kFaXm+ptHDMZ7izaRsP70GgrW9NBNGswt9OZaVBLlE0SNpDq8eu/VGXyF9r7M0azK3Wy7OlYXsuyYLFzHg==", + "license": "MIT", + "dependencies": { + "@radix-ui/react-compose-refs": "^1.1.1", + "@radix-ui/react-dialog": "^1.1.6", + "@radix-ui/react-id": "^1.1.0", + "@radix-ui/react-primitive": "^2.0.2" + }, + "peerDependencies": { + "react": "^18 || ^19 || ^19.0.0-rc", + "react-dom": "^18 || ^19 || ^19.0.0-rc" + } + }, + "node_modules/code-block-writer": { + "version": "13.0.3", + "resolved": "https://registry.npmjs.org/code-block-writer/-/code-block-writer-13.0.3.tgz", + "integrity": "sha512-Oofo0pq3IKnsFtuHqSF7TqBfr71aeyZDVJ0HpmqB7FBM2qEigL0iPONSCZSO9pE9dZTAxANe5XHG9Uy0YMv8cg==", + "license": "MIT" + }, + "node_modules/color-convert": { + "version": "2.0.1", + "resolved": 
"https://registry.npmjs.org/color-convert/-/color-convert-2.0.1.tgz", + "integrity": "sha512-RRECPsj7iu/xb5oKYcsFHSppFNnsj/52OVTRKb4zP5onXwVF3zVmmToNcOfGC+CRDpfK/U584fMg38ZHCaElKQ==", + "license": "MIT", + "dependencies": { + "color-name": "~1.1.4" + }, + "engines": { + "node": ">=7.0.0" + } + }, + "node_modules/color-name": { + "version": "1.1.4", + "resolved": "https://registry.npmjs.org/color-name/-/color-name-1.1.4.tgz", + "integrity": "sha512-dOy+3AuW3a2wNbZHIuMZpTcgjGuLU/uBL/ubcZF9OXbDo8ff4O8yVp5Bf0efS8uEoYo5q4Fx7dY9OgQGXgAsQA==", + "license": "MIT" + }, + "node_modules/commander": { + "version": "10.0.1", + "resolved": "https://registry.npmjs.org/commander/-/commander-10.0.1.tgz", + "integrity": "sha512-y4Mg2tXshplEbSGzx7amzPwKKOCGuoSRP/CjEdwwk0FOGlUbq6lKuoyDZTNZkmxHdJtp54hdfY/JUrdL7Xfdug==", + "license": "MIT", + "engines": { + "node": ">=14" + } + }, + "node_modules/concat-map": { + "version": "0.0.1", + "resolved": "https://registry.npmjs.org/concat-map/-/concat-map-0.0.1.tgz", + "integrity": "sha512-/Srv4dswyQNBfohGpz9o6Yb3Gz3SrUDqBH5rTuhGR7ahtlbYKnVxw2bCFMRljaA7EXHaXZ8wsHdodFvbkhKmqg==", + "dev": true, + "license": "MIT" + }, + "node_modules/convert-source-map": { + "version": "2.0.0", + "resolved": "https://registry.npmjs.org/convert-source-map/-/convert-source-map-2.0.0.tgz", + "integrity": "sha512-Kvp459HrV2FEJ1CAsi1Ku+MY3kasH19TFykTz2xWmMeq6bk2NU3XXvfJ+Q61m0xktWwt+1HSYf3JZsTms3aRJg==", + "dev": true, + "license": "MIT" + }, + "node_modules/cookie": { + "version": "1.1.1", + "resolved": "https://registry.npmjs.org/cookie/-/cookie-1.1.1.tgz", + "integrity": "sha512-ei8Aos7ja0weRpFzJnEA9UHJ/7XQmqglbRwnf2ATjcB9Wq874VKH9kfjjirM6UhU2/E5fFYadylyhFldcqSidQ==", + "license": "MIT", + "engines": { + "node": ">=18" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/express" + } + }, + "node_modules/cross-spawn": { + "version": "7.0.6", + "resolved": "https://registry.npmjs.org/cross-spawn/-/cross-spawn-7.0.6.tgz", + 
"integrity": "sha512-uV2QOWP2nWzsy2aMp8aRibhi9dlzF5Hgh5SHaB9OiTGEyDTiJJyx0uy51QXdyWbtAHNua4XJzUKca3OzKUd3vA==", + "dev": true, + "license": "MIT", + "dependencies": { + "path-key": "^3.1.0", + "shebang-command": "^2.0.0", + "which": "^2.0.1" + }, + "engines": { + "node": ">= 8" + } + }, + "node_modules/csstype": { + "version": "3.2.3", + "resolved": "https://registry.npmjs.org/csstype/-/csstype-3.2.3.tgz", + "integrity": "sha512-z1HGKcYy2xA8AGQfwrn0PAy+PB7X/GSj3UVJW9qKyn43xWa+gl5nXmU4qqLMRzWVLFC8KusUX8T/0kCiOYpAIQ==", + "license": "MIT" + }, + "node_modules/d3-array": { + "version": "3.2.4", + "resolved": "https://registry.npmjs.org/d3-array/-/d3-array-3.2.4.tgz", + "integrity": "sha512-tdQAmyA18i4J7wprpYq8ClcxZy3SC31QMeByyCFyRt7BVHdREQZ5lpzoe5mFEYZUWe+oq8HBvk9JjpibyEV4Jg==", + "license": "ISC", + "dependencies": { + "internmap": "1 - 2" + }, + "engines": { + "node": ">=12" + } + }, + "node_modules/d3-color": { + "version": "3.1.0", + "resolved": "https://registry.npmjs.org/d3-color/-/d3-color-3.1.0.tgz", + "integrity": "sha512-zg/chbXyeBtMQ1LbD/WSoW2DpC3I0mpmPdW+ynRTj/x2DAWYrIY7qeZIHidozwV24m4iavr15lNwIwLxRmOxhA==", + "license": "ISC", + "engines": { + "node": ">=12" + } + }, + "node_modules/d3-ease": { + "version": "3.0.1", + "resolved": "https://registry.npmjs.org/d3-ease/-/d3-ease-3.0.1.tgz", + "integrity": "sha512-wR/XK3D3XcLIZwpbvQwQ5fK+8Ykds1ip7A2Txe0yxncXSdq1L9skcG7blcedkOX+ZcgxGAmLX1FrRGbADwzi0w==", + "license": "BSD-3-Clause", + "engines": { + "node": ">=12" + } + }, + "node_modules/d3-format": { + "version": "3.1.2", + "resolved": "https://registry.npmjs.org/d3-format/-/d3-format-3.1.2.tgz", + "integrity": "sha512-AJDdYOdnyRDV5b6ArilzCPPwc1ejkHcoyFarqlPqT7zRYjhavcT3uSrqcMvsgh2CgoPbK3RCwyHaVyxYcP2Arg==", + "license": "ISC", + "engines": { + "node": ">=12" + } + }, + "node_modules/d3-interpolate": { + "version": "3.0.1", + "resolved": "https://registry.npmjs.org/d3-interpolate/-/d3-interpolate-3.0.1.tgz", + "integrity": 
"sha512-3bYs1rOD33uo8aqJfKP3JWPAibgw8Zm2+L9vBKEHJ2Rg+viTR7o5Mmv5mZcieN+FRYaAOWX5SJATX6k1PWz72g==", + "license": "ISC", + "dependencies": { + "d3-color": "1 - 3" + }, + "engines": { + "node": ">=12" + } + }, + "node_modules/d3-path": { + "version": "3.1.0", + "resolved": "https://registry.npmjs.org/d3-path/-/d3-path-3.1.0.tgz", + "integrity": "sha512-p3KP5HCf/bvjBSSKuXid6Zqijx7wIfNW+J/maPs+iwR35at5JCbLUT0LzF1cnjbCHWhqzQTIN2Jpe8pRebIEFQ==", + "license": "ISC", + "engines": { + "node": ">=12" + } + }, + "node_modules/d3-scale": { + "version": "4.0.2", + "resolved": "https://registry.npmjs.org/d3-scale/-/d3-scale-4.0.2.tgz", + "integrity": "sha512-GZW464g1SH7ag3Y7hXjf8RoUuAFIqklOAq3MRl4OaWabTFJY9PN/E1YklhXLh+OQ3fM9yS2nOkCoS+WLZ6kvxQ==", + "license": "ISC", + "dependencies": { + "d3-array": "2.10.0 - 3", + "d3-format": "1 - 3", + "d3-interpolate": "1.2.0 - 3", + "d3-time": "2.1.1 - 3", + "d3-time-format": "2 - 4" + }, + "engines": { + "node": ">=12" + } + }, + "node_modules/d3-shape": { + "version": "3.2.0", + "resolved": "https://registry.npmjs.org/d3-shape/-/d3-shape-3.2.0.tgz", + "integrity": "sha512-SaLBuwGm3MOViRq2ABk3eLoxwZELpH6zhl3FbAoJ7Vm1gofKx6El1Ib5z23NUEhF9AsGl7y+dzLe5Cw2AArGTA==", + "license": "ISC", + "dependencies": { + "d3-path": "^3.1.0" + }, + "engines": { + "node": ">=12" + } + }, + "node_modules/d3-time": { + "version": "3.1.0", + "resolved": "https://registry.npmjs.org/d3-time/-/d3-time-3.1.0.tgz", + "integrity": "sha512-VqKjzBLejbSMT4IgbmVgDjpkYrNWUYJnbCGo874u7MMKIWsILRX+OpX/gTk8MqjpT1A/c6HY2dCA77ZN0lkQ2Q==", + "license": "ISC", + "dependencies": { + "d3-array": "2 - 3" + }, + "engines": { + "node": ">=12" + } + }, + "node_modules/d3-time-format": { + "version": "4.1.0", + "resolved": "https://registry.npmjs.org/d3-time-format/-/d3-time-format-4.1.0.tgz", + "integrity": "sha512-dJxPBlzC7NugB2PDLwo9Q8JiTR3M3e4/XANkreKSUxF8vvXKqm1Yfq4Q5dl8budlunRVlUUaDUgFt7eA8D6NLg==", + "license": "ISC", + "dependencies": { + "d3-time": "1 - 3" + }, + "engines": { + 
"node": ">=12" + } + }, + "node_modules/d3-timer": { + "version": "3.0.1", + "resolved": "https://registry.npmjs.org/d3-timer/-/d3-timer-3.0.1.tgz", + "integrity": "sha512-ndfJ/JxxMd3nw31uyKoY2naivF+r29V+Lc0svZxe1JvvIRmi8hUsrMvdOwgS1o6uBHmiz91geQ0ylPP0aj1VUA==", + "license": "ISC", + "engines": { + "node": ">=12" + } + }, + "node_modules/date-fns": { + "version": "4.1.0", + "resolved": "https://registry.npmjs.org/date-fns/-/date-fns-4.1.0.tgz", + "integrity": "sha512-Ukq0owbQXxa/U3EGtsdVBkR1w7KOQ5gIBqdH2hkvknzZPYvBxb/aa6E8L7tmjFtkwZBu3UXBbjIgPo/Ez4xaNg==", + "license": "MIT", + "funding": { + "type": "github", + "url": "https://github.com/sponsors/kossnocorp" + } + }, + "node_modules/date-fns-jalali": { + "version": "4.1.0-0", + "resolved": "https://registry.npmjs.org/date-fns-jalali/-/date-fns-jalali-4.1.0-0.tgz", + "integrity": "sha512-hTIP/z+t+qKwBDcmmsnmjWTduxCg+5KfdqWQvb2X/8C9+knYY6epN/pfxdDuyVlSVeFz0sM5eEfwIUQ70U4ckg==", + "license": "MIT" + }, + "node_modules/debug": { + "version": "4.4.3", + "resolved": "https://registry.npmjs.org/debug/-/debug-4.4.3.tgz", + "integrity": "sha512-RGwwWnwQvkVfavKVt22FGLw+xYSdzARwm0ru6DhTVA3umU5hZc28V3kO4stgYryrTlLpuvgI9GiijltAjNbcqA==", + "dev": true, + "license": "MIT", + "dependencies": { + "ms": "^2.1.3" + }, + "engines": { + "node": ">=6.0" + }, + "peerDependenciesMeta": { + "supports-color": { + "optional": true + } + } + }, + "node_modules/decimal.js-light": { + "version": "2.5.1", + "resolved": "https://registry.npmjs.org/decimal.js-light/-/decimal.js-light-2.5.1.tgz", + "integrity": "sha512-qIMFpTMZmny+MMIitAB6D7iVPEorVw6YQRWkvarTkT4tBeSLLiHzcwj6q0MmYSFCiVpiqPJTJEYIrpcPzVEIvg==", + "license": "MIT" + }, + "node_modules/decompress-response": { + "version": "6.0.0", + "resolved": "https://registry.npmjs.org/decompress-response/-/decompress-response-6.0.0.tgz", + "integrity": "sha512-aW35yZM6Bb/4oJlZncMH2LCoZtJXTRxES17vE3hoRiowU2kWHaJKFkSBDnDR+cm9J+9QhXmREyIfv0pji9ejCQ==", + "license": "MIT", + "dependencies": { + 
"mimic-response": "^3.1.0" + }, + "engines": { + "node": ">=10" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, + "node_modules/deep-extend": { + "version": "0.6.0", + "resolved": "https://registry.npmjs.org/deep-extend/-/deep-extend-0.6.0.tgz", + "integrity": "sha512-LOHxIOaPYdHlJRtCQfDIVZtfw/ufM8+rVj649RIHzcm/vGwQRXFt6OPqIFWsm2XEMrNIEtWR64sY1LEKD2vAOA==", + "license": "MIT", + "engines": { + "node": ">=4.0.0" + } + }, + "node_modules/deep-is": { + "version": "0.1.4", + "resolved": "https://registry.npmjs.org/deep-is/-/deep-is-0.1.4.tgz", + "integrity": "sha512-oIPzksmTg4/MriiaYGO+okXDT7ztn/w3Eptv/+gSIdMdKsJo0u4CfYNFJPy+4SKMuCqGw2wxnA+URMg3t8a/bQ==", + "dev": true, + "license": "MIT" + }, + "node_modules/define-lazy-prop": { + "version": "2.0.0", + "resolved": "https://registry.npmjs.org/define-lazy-prop/-/define-lazy-prop-2.0.0.tgz", + "integrity": "sha512-Ds09qNh8yw3khSjiJjiUInaGX9xlqZDY7JVryGxdxV7NPeuqQfplOpQ66yJFZut3jLa5zOwkXw1g9EI2uKh4Og==", + "license": "MIT", + "engines": { + "node": ">=8" + } + }, + "node_modules/detect-libc": { + "version": "2.1.2", + "resolved": "https://registry.npmjs.org/detect-libc/-/detect-libc-2.1.2.tgz", + "integrity": "sha512-Btj2BOOO83o3WyH59e8MgXsxEQVcarkUOpEYrubB0urwnN10yQ364rsiByU11nZlqWYZm05i/of7io4mzihBtQ==", + "license": "Apache-2.0", + "engines": { + "node": ">=8" + } + }, + "node_modules/detect-node-es": { + "version": "1.1.0", + "resolved": "https://registry.npmjs.org/detect-node-es/-/detect-node-es-1.1.0.tgz", + "integrity": "sha512-ypdmJU/TbBby2Dxibuv7ZLW3Bs1QEmM7nHjEANfohJLvE0XVujisn1qPJcZxg+qDucsr+bP6fLD1rPS3AhJ7EQ==", + "license": "MIT" + }, + "node_modules/dom-helpers": { + "version": "5.2.1", + "resolved": "https://registry.npmjs.org/dom-helpers/-/dom-helpers-5.2.1.tgz", + "integrity": "sha512-nRCa7CK3VTrM2NmGkIy4cbK7IZlgBE/PYMn55rrXefr5xXDP0LdtfPnblFDoVdcAfslJ7or6iqAUnx0CCGIWQA==", + "license": "MIT", + "dependencies": { + "@babel/runtime": "^7.8.7", + "csstype": "^3.0.2" + } + }, 
+ "node_modules/ecdsa-sig-formatter": { + "version": "1.0.11", + "resolved": "https://registry.npmjs.org/ecdsa-sig-formatter/-/ecdsa-sig-formatter-1.0.11.tgz", + "integrity": "sha512-nagl3RYrbNv6kQkeJIpt6NJZy8twLB/2vtz6yN9Z4vRKHN4/QZJIEbqohALSgwKdnksuY3k5Addp5lg8sVoVcQ==", + "license": "Apache-2.0", + "dependencies": { + "safe-buffer": "^5.0.1" + } + }, + "node_modules/electron-to-chromium": { + "version": "1.5.329", + "resolved": "https://registry.npmjs.org/electron-to-chromium/-/electron-to-chromium-1.5.329.tgz", + "integrity": "sha512-/4t+AS1l4S3ZC0Ja7PHFIWeBIxGA3QGqV8/yKsP36v7NcyUCl+bIcmw6s5zVuMIECWwBrAK/6QLzTmbJChBboQ==", + "dev": true, + "license": "ISC" + }, + "node_modules/end-of-stream": { + "version": "1.4.5", + "resolved": "https://registry.npmjs.org/end-of-stream/-/end-of-stream-1.4.5.tgz", + "integrity": "sha512-ooEGc6HP26xXq/N+GCGOT0JKCLDGrq2bQUZrQ7gyrJiZANJ/8YDTxTpQBXGMn+WbIQXNVpyWymm7KYVICQnyOg==", + "license": "MIT", + "dependencies": { + "once": "^1.4.0" + } + }, + "node_modules/enhanced-resolve": { + "version": "5.20.1", + "resolved": "https://registry.npmjs.org/enhanced-resolve/-/enhanced-resolve-5.20.1.tgz", + "integrity": "sha512-Qohcme7V1inbAfvjItgw0EaxVX5q2rdVEZHRBrEQdRZTssLDGsL8Lwrznl8oQ/6kuTJONLaDcGjkNP247XEhcA==", + "license": "MIT", + "dependencies": { + "graceful-fs": "^4.2.4", + "tapable": "^2.3.0" + }, + "engines": { + "node": ">=10.13.0" + } + }, + "node_modules/esbuild": { + "version": "0.27.4", + "resolved": "https://registry.npmjs.org/esbuild/-/esbuild-0.27.4.tgz", + "integrity": "sha512-Rq4vbHnYkK5fws5NF7MYTU68FPRE1ajX7heQ/8QXXWqNgqqJ/GkmmyxIzUnf2Sr/bakf8l54716CcMGHYhMrrQ==", + "hasInstallScript": true, + "license": "MIT", + "bin": { + "esbuild": "bin/esbuild" + }, + "engines": { + "node": ">=18" + }, + "optionalDependencies": { + "@esbuild/aix-ppc64": "0.27.4", + "@esbuild/android-arm": "0.27.4", + "@esbuild/android-arm64": "0.27.4", + "@esbuild/android-x64": "0.27.4", + "@esbuild/darwin-arm64": "0.27.4", + 
"@esbuild/darwin-x64": "0.27.4", + "@esbuild/freebsd-arm64": "0.27.4", + "@esbuild/freebsd-x64": "0.27.4", + "@esbuild/linux-arm": "0.27.4", + "@esbuild/linux-arm64": "0.27.4", + "@esbuild/linux-ia32": "0.27.4", + "@esbuild/linux-loong64": "0.27.4", + "@esbuild/linux-mips64el": "0.27.4", + "@esbuild/linux-ppc64": "0.27.4", + "@esbuild/linux-riscv64": "0.27.4", + "@esbuild/linux-s390x": "0.27.4", + "@esbuild/linux-x64": "0.27.4", + "@esbuild/netbsd-arm64": "0.27.4", + "@esbuild/netbsd-x64": "0.27.4", + "@esbuild/openbsd-arm64": "0.27.4", + "@esbuild/openbsd-x64": "0.27.4", + "@esbuild/openharmony-arm64": "0.27.4", + "@esbuild/sunos-x64": "0.27.4", + "@esbuild/win32-arm64": "0.27.4", + "@esbuild/win32-ia32": "0.27.4", + "@esbuild/win32-x64": "0.27.4" + } + }, + "node_modules/escalade": { + "version": "3.2.0", + "resolved": "https://registry.npmjs.org/escalade/-/escalade-3.2.0.tgz", + "integrity": "sha512-WUj2qlxaQtO4g6Pq5c29GTcWGDyd8itL8zTlipgECz3JesAiiOKotd8JU6otB3PACgG6xkJUyVhboMS+bje/jA==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=6" + } + }, + "node_modules/escape-string-regexp": { + "version": "4.0.0", + "resolved": "https://registry.npmjs.org/escape-string-regexp/-/escape-string-regexp-4.0.0.tgz", + "integrity": "sha512-TtpcNJ3XAzx3Gq8sWRzJaVajRs0uVxA2YAkdb1jm2YkPz4G6egUFAyA3n5vtEIZefPk5Wa4UXbKuS5fKkJWdgA==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=10" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, + "node_modules/eslint": { + "version": "9.39.4", + "resolved": "https://registry.npmjs.org/eslint/-/eslint-9.39.4.tgz", + "integrity": "sha512-XoMjdBOwe/esVgEvLmNsD3IRHkm7fbKIUGvrleloJXUZgDHig2IPWNniv+GwjyJXzuNqVjlr5+4yVUZjycJwfQ==", + "dev": true, + "license": "MIT", + "dependencies": { + "@eslint-community/eslint-utils": "^4.8.0", + "@eslint-community/regexpp": "^4.12.1", + "@eslint/config-array": "^0.21.2", + "@eslint/config-helpers": "^0.4.2", + "@eslint/core": "^0.17.0", + 
"@eslint/eslintrc": "^3.3.5", + "@eslint/js": "9.39.4", + "@eslint/plugin-kit": "^0.4.1", + "@humanfs/node": "^0.16.6", + "@humanwhocodes/module-importer": "^1.0.1", + "@humanwhocodes/retry": "^0.4.2", + "@types/estree": "^1.0.6", + "ajv": "^6.14.0", + "chalk": "^4.0.0", + "cross-spawn": "^7.0.6", + "debug": "^4.3.2", + "escape-string-regexp": "^4.0.0", + "eslint-scope": "^8.4.0", + "eslint-visitor-keys": "^4.2.1", + "espree": "^10.4.0", + "esquery": "^1.5.0", + "esutils": "^2.0.2", + "fast-deep-equal": "^3.1.3", + "file-entry-cache": "^8.0.0", + "find-up": "^5.0.0", + "glob-parent": "^6.0.2", + "ignore": "^5.2.0", + "imurmurhash": "^0.1.4", + "is-glob": "^4.0.0", + "json-stable-stringify-without-jsonify": "^1.0.1", + "lodash.merge": "^4.6.2", + "minimatch": "^3.1.5", + "natural-compare": "^1.4.0", + "optionator": "^0.9.3" + }, + "bin": { + "eslint": "bin/eslint.js" + }, + "engines": { + "node": "^18.18.0 || ^20.9.0 || >=21.1.0" + }, + "funding": { + "url": "https://eslint.org/donate" + }, + "peerDependencies": { + "jiti": "*" + }, + "peerDependenciesMeta": { + "jiti": { + "optional": true + } + } + }, + "node_modules/eslint-plugin-react-hooks": { + "version": "5.2.0", + "resolved": "https://registry.npmjs.org/eslint-plugin-react-hooks/-/eslint-plugin-react-hooks-5.2.0.tgz", + "integrity": "sha512-+f15FfK64YQwZdJNELETdn5ibXEUQmW1DZL6KXhNnc2heoy/sg9VJJeT7n8TlMWouzWqSWavFkIhHyIbIAEapg==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=10" + }, + "peerDependencies": { + "eslint": "^3.0.0 || ^4.0.0 || ^5.0.0 || ^6.0.0 || ^7.0.0 || ^8.0.0-0 || ^9.0.0" + } + }, + "node_modules/eslint-plugin-react-refresh": { + "version": "0.4.26", + "resolved": "https://registry.npmjs.org/eslint-plugin-react-refresh/-/eslint-plugin-react-refresh-0.4.26.tgz", + "integrity": "sha512-1RETEylht2O6FM/MvgnyvT+8K21wLqDNg4qD51Zj3guhjt433XbnnkVttHMyaVyAFD03QSV4LPS5iE3VQmO7XQ==", + "dev": true, + "license": "MIT", + "peerDependencies": { + "eslint": ">=8.40" + } + }, + 
"node_modules/eslint-scope": { + "version": "8.4.0", + "resolved": "https://registry.npmjs.org/eslint-scope/-/eslint-scope-8.4.0.tgz", + "integrity": "sha512-sNXOfKCn74rt8RICKMvJS7XKV/Xk9kA7DyJr8mJik3S7Cwgy3qlkkmyS2uQB3jiJg6VNdZd/pDBJu0nvG2NlTg==", + "dev": true, + "license": "BSD-2-Clause", + "dependencies": { + "esrecurse": "^4.3.0", + "estraverse": "^5.2.0" + }, + "engines": { + "node": "^18.18.0 || ^20.9.0 || >=21.1.0" + }, + "funding": { + "url": "https://opencollective.com/eslint" + } + }, + "node_modules/eslint-visitor-keys": { + "version": "4.2.1", + "resolved": "https://registry.npmjs.org/eslint-visitor-keys/-/eslint-visitor-keys-4.2.1.tgz", + "integrity": "sha512-Uhdk5sfqcee/9H/rCOJikYz67o0a2Tw2hGRPOG2Y1R2dg7brRe1uG0yaNQDHu+TO/uQPF/5eCapvYSmHUjt7JQ==", + "dev": true, + "license": "Apache-2.0", + "engines": { + "node": "^18.18.0 || ^20.9.0 || >=21.1.0" + }, + "funding": { + "url": "https://opencollective.com/eslint" + } + }, + "node_modules/espree": { + "version": "10.4.0", + "resolved": "https://registry.npmjs.org/espree/-/espree-10.4.0.tgz", + "integrity": "sha512-j6PAQ2uUr79PZhBjP5C5fhl8e39FmRnOjsD5lGnWrFU8i2G776tBK7+nP8KuQUTTyAZUwfQqXAgrVH5MbH9CYQ==", + "dev": true, + "license": "BSD-2-Clause", + "dependencies": { + "acorn": "^8.15.0", + "acorn-jsx": "^5.3.2", + "eslint-visitor-keys": "^4.2.1" + }, + "engines": { + "node": "^18.18.0 || ^20.9.0 || >=21.1.0" + }, + "funding": { + "url": "https://opencollective.com/eslint" + } + }, + "node_modules/esquery": { + "version": "1.7.0", + "resolved": "https://registry.npmjs.org/esquery/-/esquery-1.7.0.tgz", + "integrity": "sha512-Ap6G0WQwcU/LHsvLwON1fAQX9Zp0A2Y6Y/cJBl9r/JbW90Zyg4/zbG6zzKa2OTALELarYHmKu0GhpM5EO+7T0g==", + "dev": true, + "license": "BSD-3-Clause", + "dependencies": { + "estraverse": "^5.1.0" + }, + "engines": { + "node": ">=0.10" + } + }, + "node_modules/esrecurse": { + "version": "4.3.0", + "resolved": "https://registry.npmjs.org/esrecurse/-/esrecurse-4.3.0.tgz", + "integrity": 
"sha512-KmfKL3b6G+RXvP8N1vr3Tq1kL/oCFgn2NYXEtqP8/L3pKapUA4G8cFVaoF3SU323CD4XypR/ffioHmkti6/Tag==", + "dev": true, + "license": "BSD-2-Clause", + "dependencies": { + "estraverse": "^5.2.0" + }, + "engines": { + "node": ">=4.0" + } + }, + "node_modules/estraverse": { + "version": "5.3.0", + "resolved": "https://registry.npmjs.org/estraverse/-/estraverse-5.3.0.tgz", + "integrity": "sha512-MMdARuVEQziNTeJD8DgMqmhwR11BRQ/cBP+pLtYdSTnf3MIO8fFeiINEbX36ZdNlfU/7A9f3gUw49B3oQsvwBA==", + "dev": true, + "license": "BSD-2-Clause", + "engines": { + "node": ">=4.0" + } + }, + "node_modules/esutils": { + "version": "2.0.3", + "resolved": "https://registry.npmjs.org/esutils/-/esutils-2.0.3.tgz", + "integrity": "sha512-kVscqXk4OCp68SZ0dkgEKVi6/8ij300KBWTJq32P/dYeWTSwK41WyTxalN1eRmA5Z9UU/LX9D7FWSmV9SAYx6g==", + "dev": true, + "license": "BSD-2-Clause", + "engines": { + "node": ">=0.10.0" + } + }, + "node_modules/eventemitter3": { + "version": "4.0.7", + "resolved": "https://registry.npmjs.org/eventemitter3/-/eventemitter3-4.0.7.tgz", + "integrity": "sha512-8guHBZCwKnFhYdHr2ysuRWErTwhoN2X8XELRlrRwpmfeY2jjuUN4taQMsULKUVo1K4DvZl+0pgfyoysHxvmvEw==", + "license": "MIT" + }, + "node_modules/expand-template": { + "version": "2.0.3", + "resolved": "https://registry.npmjs.org/expand-template/-/expand-template-2.0.3.tgz", + "integrity": "sha512-XYfuKMvj4O35f/pOXLObndIRvyQ+/+6AhODh+OKWj9S9498pHHn/IMszH+gt0fBCRWMNfk1ZSp5x3AifmnI2vg==", + "license": "(MIT OR WTFPL)", + "engines": { + "node": ">=6" + } + }, + "node_modules/fast-deep-equal": { + "version": "3.1.3", + "resolved": "https://registry.npmjs.org/fast-deep-equal/-/fast-deep-equal-3.1.3.tgz", + "integrity": "sha512-f3qQ9oQy9j2AhBe/H9VC91wLmKBCCU/gDOnKNAYG5hswO7BLKj09Hc5HYNz9cGI++xlpDCIgDaitVs03ATR84Q==", + "dev": true, + "license": "MIT" + }, + "node_modules/fast-equals": { + "version": "5.4.0", + "resolved": "https://registry.npmjs.org/fast-equals/-/fast-equals-5.4.0.tgz", + "integrity": 
"sha512-jt2DW/aNFNwke7AUd+Z+e6pz39KO5rzdbbFCg2sGafS4mk13MI7Z8O5z9cADNn5lhGODIgLwug6TZO2ctf7kcw==", + "license": "MIT", + "engines": { + "node": ">=6.0.0" + } + }, + "node_modules/fast-json-stable-stringify": { + "version": "2.1.0", + "resolved": "https://registry.npmjs.org/fast-json-stable-stringify/-/fast-json-stable-stringify-2.1.0.tgz", + "integrity": "sha512-lhd/wF+Lk98HZoTCtlVraHtfh5XYijIjalXck7saUtuanSDyLMxnHhSXEDJqHxD7msR8D0uCmqlkwjCV8xvwHw==", + "dev": true, + "license": "MIT" + }, + "node_modules/fast-levenshtein": { + "version": "2.0.6", + "resolved": "https://registry.npmjs.org/fast-levenshtein/-/fast-levenshtein-2.0.6.tgz", + "integrity": "sha512-DCXu6Ifhqcks7TZKY3Hxp3y6qphY5SJZmrWMDrKcERSOXWQdMhU9Ig/PYrzyw/ul9jOIyh0N4M0tbC5hodg8dw==", + "dev": true, + "license": "MIT" + }, + "node_modules/fdir": { + "version": "6.5.0", + "resolved": "https://registry.npmjs.org/fdir/-/fdir-6.5.0.tgz", + "integrity": "sha512-tIbYtZbucOs0BRGqPJkshJUYdL+SDH7dVM8gjy+ERp3WAUjLEFJE+02kanyHtwjWOnwrKYBiwAmM0p4kLJAnXg==", + "license": "MIT", + "engines": { + "node": ">=12.0.0" + }, + "peerDependencies": { + "picomatch": "^3 || ^4" + }, + "peerDependenciesMeta": { + "picomatch": { + "optional": true + } + } + }, + "node_modules/file-entry-cache": { + "version": "8.0.0", + "resolved": "https://registry.npmjs.org/file-entry-cache/-/file-entry-cache-8.0.0.tgz", + "integrity": "sha512-XXTUwCvisa5oacNGRP9SfNtYBNAMi+RPwBFmblZEF7N7swHYQS6/Zfk7SRwx4D5j3CH211YNRco1DEMNVfZCnQ==", + "dev": true, + "license": "MIT", + "dependencies": { + "flat-cache": "^4.0.0" + }, + "engines": { + "node": ">=16.0.0" + } + }, + "node_modules/find-up": { + "version": "5.0.0", + "resolved": "https://registry.npmjs.org/find-up/-/find-up-5.0.0.tgz", + "integrity": "sha512-78/PXT1wlLLDgTzDs7sjq9hzz0vXD+zn+7wypEe4fXQxCmdmqfGsEPQxmiCSQI3ajFV91bVSsvNtrJRiW6nGng==", + "dev": true, + "license": "MIT", + "dependencies": { + "locate-path": "^6.0.0", + "path-exists": "^4.0.0" + }, + "engines": { + "node": ">=10" + }, + 
"funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, + "node_modules/flat-cache": { + "version": "4.0.1", + "resolved": "https://registry.npmjs.org/flat-cache/-/flat-cache-4.0.1.tgz", + "integrity": "sha512-f7ccFPK3SXFHpx15UIGyRJ/FJQctuKZ0zVuN3frBo4HnK3cay9VEW0R6yPYFHC0AgqhukPzKjq22t5DmAyqGyw==", + "dev": true, + "license": "MIT", + "dependencies": { + "flatted": "^3.2.9", + "keyv": "^4.5.4" + }, + "engines": { + "node": ">=16" + } + }, + "node_modules/flatted": { + "version": "3.4.2", + "resolved": "https://registry.npmjs.org/flatted/-/flatted-3.4.2.tgz", + "integrity": "sha512-PjDse7RzhcPkIJwy5t7KPWQSZ9cAbzQXcafsetQoD7sOJRQlGikNbx7yZp2OotDnJyrDcbyRq3Ttb18iYOqkxA==", + "dev": true, + "license": "ISC" + }, + "node_modules/fs-constants": { + "version": "1.0.0", + "resolved": "https://registry.npmjs.org/fs-constants/-/fs-constants-1.0.0.tgz", + "integrity": "sha512-y6OAwoSIf7FyjMIv94u+b5rdheZEjzR63GTyZJm5qh4Bi+2YgwLCcI/fPFZkL5PSixOt6ZNKm+w+Hfp/Bciwow==", + "license": "MIT" + }, + "node_modules/fsevents": { + "version": "2.3.3", + "resolved": "https://registry.npmjs.org/fsevents/-/fsevents-2.3.3.tgz", + "integrity": "sha512-5xoDfX+fL7faATnagmWPpbFtwh/R77WmMMqqHGS65C3vvB0YHrgF+B1YmZ3441tMj5n63k0212XNoJwzlhffQw==", + "hasInstallScript": true, + "license": "MIT", + "optional": true, + "os": [ + "darwin" + ], + "engines": { + "node": "^8.16.0 || ^10.6.0 || >=11.0.0" + } + }, + "node_modules/gensync": { + "version": "1.0.0-beta.2", + "resolved": "https://registry.npmjs.org/gensync/-/gensync-1.0.0-beta.2.tgz", + "integrity": "sha512-3hN7NaskYvMDLQY55gnW3NQ+mesEAepTqlg+VEbj7zzqEMBVNhzcGYYeqFo/TlYz6eQiFcp1HcsCZO+nGgS8zg==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=6.9.0" + } + }, + "node_modules/get-nonce": { + "version": "1.0.1", + "resolved": "https://registry.npmjs.org/get-nonce/-/get-nonce-1.0.1.tgz", + "integrity": "sha512-FJhYRoDaiatfEkUK8HKlicmu/3SGFD51q3itKDGoSTysQJBnfOcxU5GxnhE1E6soB76MbT0MBtnKJuXyAx+96Q==", + "license": 
"MIT", + "engines": { + "node": ">=6" + } + }, + "node_modules/github-from-package": { + "version": "0.0.0", + "resolved": "https://registry.npmjs.org/github-from-package/-/github-from-package-0.0.0.tgz", + "integrity": "sha512-SyHy3T1v2NUXn29OsWdxmK6RwHD+vkj3v8en8AOBZ1wBQ/hCAQ5bAQTD02kW4W9tUp/3Qh6J8r9EvntiyCmOOw==", + "license": "MIT" + }, + "node_modules/glob-parent": { + "version": "6.0.2", + "resolved": "https://registry.npmjs.org/glob-parent/-/glob-parent-6.0.2.tgz", + "integrity": "sha512-XxwI8EOhVQgWp6iDL+3b0r86f4d6AX6zSU55HfB4ydCEuXLXc5FcYeOu+nnGftS4TEju/11rt4KJPTMgbfmv4A==", + "dev": true, + "license": "ISC", + "dependencies": { + "is-glob": "^4.0.3" + }, + "engines": { + "node": ">=10.13.0" + } + }, + "node_modules/globals": { + "version": "16.5.0", + "resolved": "https://registry.npmjs.org/globals/-/globals-16.5.0.tgz", + "integrity": "sha512-c/c15i26VrJ4IRt5Z89DnIzCGDn9EcebibhAOjw5ibqEHsE1wLUgkPn9RDmNcUKyU87GeaL633nyJ+pplFR2ZQ==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=18" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, + "node_modules/graceful-fs": { + "version": "4.2.11", + "resolved": "https://registry.npmjs.org/graceful-fs/-/graceful-fs-4.2.11.tgz", + "integrity": "sha512-RbJ5/jmFcNNCcDV5o9eTnBLJ/HszWV0P73bc+Ff4nS/rJj+YaS6IGyiOL0VoBYX+l1Wrl3k63h/KrH+nhJ0XvQ==", + "license": "ISC" + }, + "node_modules/has-flag": { + "version": "4.0.0", + "resolved": "https://registry.npmjs.org/has-flag/-/has-flag-4.0.0.tgz", + "integrity": "sha512-EykJT/Q1KjTWctppgIAgfSO0tKVuZUjhgMr17kqTumMl6Afv3EISleU7qZUzoXDFTAHTDC4NOoG/ZxU3EvlMPQ==", + "license": "MIT", + "engines": { + "node": ">=8" + } + }, + "node_modules/ieee754": { + "version": "1.2.1", + "resolved": "https://registry.npmjs.org/ieee754/-/ieee754-1.2.1.tgz", + "integrity": "sha512-dcyqhDvX1C46lXZcVqCpK+FtMRQVdIMN6/Df5js2zouUsqG7I6sFxitIC+7KYK29KdXOLHdu9zL4sFnoVQnqaA==", + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/feross" 
+ }, + { + "type": "patreon", + "url": "https://www.patreon.com/feross" + }, + { + "type": "consulting", + "url": "https://feross.org/support" + } + ], + "license": "BSD-3-Clause" + }, + "node_modules/ignore": { + "version": "5.3.2", + "resolved": "https://registry.npmjs.org/ignore/-/ignore-5.3.2.tgz", + "integrity": "sha512-hsBTNUqQTDwkWtcdYI2i06Y/nUBEsNEDJKjWdigLvegy8kDuJAS8uRlpkkcQpyEXL0Z/pjDy5HBmMjRCJ2gq+g==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">= 4" + } + }, + "node_modules/import-fresh": { + "version": "3.3.1", + "resolved": "https://registry.npmjs.org/import-fresh/-/import-fresh-3.3.1.tgz", + "integrity": "sha512-TR3KfrTZTYLPB6jUjfx6MF9WcWrHL9su5TObK4ZkYgBdWKPOFoSoQIdEuTuR82pmtxH2spWG9h6etwfr1pLBqQ==", + "dev": true, + "license": "MIT", + "dependencies": { + "parent-module": "^1.0.0", + "resolve-from": "^4.0.0" + }, + "engines": { + "node": ">=6" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, + "node_modules/imurmurhash": { + "version": "0.1.4", + "resolved": "https://registry.npmjs.org/imurmurhash/-/imurmurhash-0.1.4.tgz", + "integrity": "sha512-JmXMZ6wuvDmLiHEml9ykzqO6lwFbof0GG4IkcGaENdCRDDmMVnny7s5HsIgHCbaq0w2MyPhDqkhTUgS2LU2PHA==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=0.8.19" + } + }, + "node_modules/inherits": { + "version": "2.0.4", + "resolved": "https://registry.npmjs.org/inherits/-/inherits-2.0.4.tgz", + "integrity": "sha512-k/vGaX4/Yla3WzyMCvTQOXYeIHvqOKtnqBduzTHpzpQZzAskKMhZ2K+EnBiSM9zGSoIFeMpXKxa4dYeZIQqewQ==", + "license": "ISC" + }, + "node_modules/ini": { + "version": "1.3.8", + "resolved": "https://registry.npmjs.org/ini/-/ini-1.3.8.tgz", + "integrity": "sha512-JV/yugV2uzW5iMRSiZAyDtQd+nxtUnjeLt0acNdw98kKLrvuRVyB80tsREOE7yvGVgalhZ6RNXCmEHkUKBKxew==", + "license": "ISC" + }, + "node_modules/internmap": { + "version": "2.0.3", + "resolved": "https://registry.npmjs.org/internmap/-/internmap-2.0.3.tgz", + "integrity": 
"sha512-5Hh7Y1wQbvY5ooGgPbDaL5iYLAPzMTUrjMulskHLH6wnv/A+1q5rgEaiuqEjB+oxGXIVZs1FF+R/KPN3ZSQYYg==", + "license": "ISC", + "engines": { + "node": ">=12" + } + }, + "node_modules/is-docker": { + "version": "2.2.1", + "resolved": "https://registry.npmjs.org/is-docker/-/is-docker-2.2.1.tgz", + "integrity": "sha512-F+i2BKsFrH66iaUFc0woD8sLy8getkwTwtOBjvs56Cx4CgJDeKQeqfz8wAYiSb8JOprWhHH5p77PbmYCvvUuXQ==", + "license": "MIT", + "bin": { + "is-docker": "cli.js" + }, + "engines": { + "node": ">=8" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, + "node_modules/is-extglob": { + "version": "2.1.1", + "resolved": "https://registry.npmjs.org/is-extglob/-/is-extglob-2.1.1.tgz", + "integrity": "sha512-SbKbANkN603Vi4jEZv49LeVJMn4yGwsbzZworEoyEiutsN3nJYdbO36zfhGJ6QEDpOZIFkDtnq5JRxmvl3jsoQ==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=0.10.0" + } + }, + "node_modules/is-glob": { + "version": "4.0.3", + "resolved": "https://registry.npmjs.org/is-glob/-/is-glob-4.0.3.tgz", + "integrity": "sha512-xelSayHH36ZgE7ZWhli7pW34hNbNl8Ojv5KVmkJD4hBdD3th8Tfk9vYasLM+mXWOZhFkgZfxhLSnrwRr4elSSg==", + "dev": true, + "license": "MIT", + "dependencies": { + "is-extglob": "^2.1.1" + }, + "engines": { + "node": ">=0.10.0" + } + }, + "node_modules/is-wsl": { + "version": "2.2.0", + "resolved": "https://registry.npmjs.org/is-wsl/-/is-wsl-2.2.0.tgz", + "integrity": "sha512-fKzAra0rGJUUBwGBgNkHZuToZcn+TtXHpeCgmkMJMMYx1sQDYaCSyjJBSCa2nH1DGm7s3n1oBnohoVTBaN7Lww==", + "license": "MIT", + "dependencies": { + "is-docker": "^2.0.0" + }, + "engines": { + "node": ">=8" + } + }, + "node_modules/isexe": { + "version": "2.0.0", + "resolved": "https://registry.npmjs.org/isexe/-/isexe-2.0.0.tgz", + "integrity": "sha512-RHxMLp9lnKHGHRng9QFhRCMbYAcVpn69smSGcq3f36xjgVVWThj4qqLbTLlq7Ssj8B+fIQ1EuCEGI2lKsyQeIw==", + "dev": true, + "license": "ISC" + }, + "node_modules/jiti": { + "version": "2.6.1", + "resolved": "https://registry.npmjs.org/jiti/-/jiti-2.6.1.tgz", + 
"integrity": "sha512-ekilCSN1jwRvIbgeg/57YFh8qQDNbwDb9xT/qu2DAHbFFZUicIl4ygVaAvzveMhMVr3LnpSKTNnwt8PoOfmKhQ==", + "license": "MIT", + "bin": { + "jiti": "lib/jiti-cli.mjs" + } + }, + "node_modules/js-tokens": { + "version": "4.0.0", + "resolved": "https://registry.npmjs.org/js-tokens/-/js-tokens-4.0.0.tgz", + "integrity": "sha512-RdJUflcE3cUzKiMqQgsCu06FPu9UdIJO0beYbPhHN4k6apgJtifcoCtT9bcxOpYBtpD2kCM6Sbzg4CausW/PKQ==", + "license": "MIT" + }, + "node_modules/js-yaml": { + "version": "4.1.1", + "resolved": "https://registry.npmjs.org/js-yaml/-/js-yaml-4.1.1.tgz", + "integrity": "sha512-qQKT4zQxXl8lLwBtHMWwaTcGfFOZviOJet3Oy/xmGk2gZH677CJM9EvtfdSkgWcATZhj/55JZ0rmy3myCT5lsA==", + "dev": true, + "license": "MIT", + "dependencies": { + "argparse": "^2.0.1" + }, + "bin": { + "js-yaml": "bin/js-yaml.js" + } + }, + "node_modules/jsesc": { + "version": "3.1.0", + "resolved": "https://registry.npmjs.org/jsesc/-/jsesc-3.1.0.tgz", + "integrity": "sha512-/sM3dO2FOzXjKQhJuo0Q173wf2KOo8t4I8vHy6lF9poUp7bKT0/NHE8fPX23PwfhnykfqnC2xRxOnVw5XuGIaA==", + "dev": true, + "license": "MIT", + "bin": { + "jsesc": "bin/jsesc" + }, + "engines": { + "node": ">=6" + } + }, + "node_modules/json-buffer": { + "version": "3.0.1", + "resolved": "https://registry.npmjs.org/json-buffer/-/json-buffer-3.0.1.tgz", + "integrity": "sha512-4bV5BfR2mqfQTJm+V5tPPdf+ZpuhiIvTuAB5g8kcrXOZpTT/QwwVRWBywX1ozr6lEuPdbHxwaJlm9G6mI2sfSQ==", + "dev": true, + "license": "MIT" + }, + "node_modules/json-schema-traverse": { + "version": "0.4.1", + "resolved": "https://registry.npmjs.org/json-schema-traverse/-/json-schema-traverse-0.4.1.tgz", + "integrity": "sha512-xbbCH5dCYU5T8LcEhhuh7HJ88HXuW3qsI3Y0zOZFKfZEHcpWiHU/Jxzk629Brsab/mMiHQti9wMP+845RPe3Vg==", + "dev": true, + "license": "MIT" + }, + "node_modules/json-stable-stringify-without-jsonify": { + "version": "1.0.1", + "resolved": "https://registry.npmjs.org/json-stable-stringify-without-jsonify/-/json-stable-stringify-without-jsonify-1.0.1.tgz", + "integrity": 
"sha512-Bdboy+l7tA3OGW6FjyFHWkP5LuByj1Tk33Ljyq0axyzdk9//JSi2u3fP1QSmd1KNwq6VOKYGlAu87CisVir6Pw==", + "dev": true, + "license": "MIT" + }, + "node_modules/json5": { + "version": "2.2.3", + "resolved": "https://registry.npmjs.org/json5/-/json5-2.2.3.tgz", + "integrity": "sha512-XmOWe7eyHYH14cLdVPoyg+GOH3rYX++KpzrylJwSW98t3Nk+U8XOl8FWKOgwtzdb8lXGf6zYwDUzeHMWfxasyg==", + "dev": true, + "license": "MIT", + "bin": { + "json5": "lib/cli.js" + }, + "engines": { + "node": ">=6" + } + }, + "node_modules/jsonwebtoken": { + "version": "9.0.3", + "resolved": "https://registry.npmjs.org/jsonwebtoken/-/jsonwebtoken-9.0.3.tgz", + "integrity": "sha512-MT/xP0CrubFRNLNKvxJ2BYfy53Zkm++5bX9dtuPbqAeQpTVe0MQTFhao8+Cp//EmJp244xt6Drw/GVEGCUj40g==", + "license": "MIT", + "dependencies": { + "jws": "^4.0.1", + "lodash.includes": "^4.3.0", + "lodash.isboolean": "^3.0.3", + "lodash.isinteger": "^4.0.4", + "lodash.isnumber": "^3.0.3", + "lodash.isplainobject": "^4.0.6", + "lodash.isstring": "^4.0.1", + "lodash.once": "^4.0.0", + "ms": "^2.1.1", + "semver": "^7.5.4" + }, + "engines": { + "node": ">=12", + "npm": ">=6" + } + }, + "node_modules/jsonwebtoken/node_modules/semver": { + "version": "7.7.4", + "resolved": "https://registry.npmjs.org/semver/-/semver-7.7.4.tgz", + "integrity": "sha512-vFKC2IEtQnVhpT78h1Yp8wzwrf8CM+MzKMHGJZfBtzhZNycRFnXsHk6E5TxIkkMsgNS7mdX3AGB7x2QM2di4lA==", + "license": "ISC", + "bin": { + "semver": "bin/semver.js" + }, + "engines": { + "node": ">=10" + } + }, + "node_modules/jwa": { + "version": "2.0.1", + "resolved": "https://registry.npmjs.org/jwa/-/jwa-2.0.1.tgz", + "integrity": "sha512-hRF04fqJIP8Abbkq5NKGN0Bbr3JxlQ+qhZufXVr0DvujKy93ZCbXZMHDL4EOtodSbCWxOqR8MS1tXA5hwqCXDg==", + "license": "MIT", + "dependencies": { + "buffer-equal-constant-time": "^1.0.1", + "ecdsa-sig-formatter": "1.0.11", + "safe-buffer": "^5.0.1" + } + }, + "node_modules/jws": { + "version": "4.0.1", + "resolved": "https://registry.npmjs.org/jws/-/jws-4.0.1.tgz", + "integrity": 
"sha512-EKI/M/yqPncGUUh44xz0PxSidXFr/+r0pA70+gIYhjv+et7yxM+s29Y+VGDkovRofQem0fs7Uvf4+YmAdyRduA==", + "license": "MIT", + "dependencies": { + "jwa": "^2.0.1", + "safe-buffer": "^5.0.1" + } + }, + "node_modules/keytar": { + "version": "7.9.0", + "resolved": "https://registry.npmjs.org/keytar/-/keytar-7.9.0.tgz", + "integrity": "sha512-VPD8mtVtm5JNtA2AErl6Chp06JBfy7diFQ7TQQhdpWOl6MrCRB+eRbvAZUsbGQS9kiMq0coJsy0W0vHpDCkWsQ==", + "hasInstallScript": true, + "license": "MIT", + "dependencies": { + "node-addon-api": "^4.3.0", + "prebuild-install": "^7.0.1" + } + }, + "node_modules/keyv": { + "version": "4.5.4", + "resolved": "https://registry.npmjs.org/keyv/-/keyv-4.5.4.tgz", + "integrity": "sha512-oxVHkHR/EJf2CNXnWxRLW6mg7JyCCUcG0DtEGmL2ctUo1PNTin1PUil+r/+4r5MpVgC/fn1kjsx7mjSujKqIpw==", + "dev": true, + "license": "MIT", + "dependencies": { + "json-buffer": "3.0.1" + } + }, + "node_modules/levn": { + "version": "0.4.1", + "resolved": "https://registry.npmjs.org/levn/-/levn-0.4.1.tgz", + "integrity": "sha512-+bT2uH4E5LGE7h/n3evcS/sQlJXCpIp6ym8OWJ5eV6+67Dsql/LaaT7qJBAt2rzfoa/5QBGBhxDix1dMt2kQKQ==", + "dev": true, + "license": "MIT", + "dependencies": { + "prelude-ls": "^1.2.1", + "type-check": "~0.4.0" + }, + "engines": { + "node": ">= 0.8.0" + } + }, + "node_modules/lightningcss": { + "version": "1.32.0", + "resolved": "https://registry.npmjs.org/lightningcss/-/lightningcss-1.32.0.tgz", + "integrity": "sha512-NXYBzinNrblfraPGyrbPoD19C1h9lfI/1mzgWYvXUTe414Gz/X1FD2XBZSZM7rRTrMA8JL3OtAaGifrIKhQ5yQ==", + "license": "MPL-2.0", + "dependencies": { + "detect-libc": "^2.0.3" + }, + "engines": { + "node": ">= 12.0.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/parcel" + }, + "optionalDependencies": { + "lightningcss-android-arm64": "1.32.0", + "lightningcss-darwin-arm64": "1.32.0", + "lightningcss-darwin-x64": "1.32.0", + "lightningcss-freebsd-x64": "1.32.0", + "lightningcss-linux-arm-gnueabihf": "1.32.0", + "lightningcss-linux-arm64-gnu": 
"1.32.0", + "lightningcss-linux-arm64-musl": "1.32.0", + "lightningcss-linux-x64-gnu": "1.32.0", + "lightningcss-linux-x64-musl": "1.32.0", + "lightningcss-win32-arm64-msvc": "1.32.0", + "lightningcss-win32-x64-msvc": "1.32.0" + } + }, + "node_modules/lightningcss-android-arm64": { + "version": "1.32.0", + "resolved": "https://registry.npmjs.org/lightningcss-android-arm64/-/lightningcss-android-arm64-1.32.0.tgz", + "integrity": "sha512-YK7/ClTt4kAK0vo6w3X+Pnm0D2cf2vPHbhOXdoNti1Ga0al1P4TBZhwjATvjNwLEBCnKvjJc2jQgHXH0NEwlAg==", + "cpu": [ + "arm64" + ], + "license": "MPL-2.0", + "optional": true, + "os": [ + "android" + ], + "engines": { + "node": ">= 12.0.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/parcel" + } + }, + "node_modules/lightningcss-darwin-arm64": { + "version": "1.32.0", + "resolved": "https://registry.npmjs.org/lightningcss-darwin-arm64/-/lightningcss-darwin-arm64-1.32.0.tgz", + "integrity": "sha512-RzeG9Ju5bag2Bv1/lwlVJvBE3q6TtXskdZLLCyfg5pt+HLz9BqlICO7LZM7VHNTTn/5PRhHFBSjk5lc4cmscPQ==", + "cpu": [ + "arm64" + ], + "license": "MPL-2.0", + "optional": true, + "os": [ + "darwin" + ], + "engines": { + "node": ">= 12.0.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/parcel" + } + }, + "node_modules/lightningcss-darwin-x64": { + "version": "1.32.0", + "resolved": "https://registry.npmjs.org/lightningcss-darwin-x64/-/lightningcss-darwin-x64-1.32.0.tgz", + "integrity": "sha512-U+QsBp2m/s2wqpUYT/6wnlagdZbtZdndSmut/NJqlCcMLTWp5muCrID+K5UJ6jqD2BFshejCYXniPDbNh73V8w==", + "cpu": [ + "x64" + ], + "license": "MPL-2.0", + "optional": true, + "os": [ + "darwin" + ], + "engines": { + "node": ">= 12.0.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/parcel" + } + }, + "node_modules/lightningcss-freebsd-x64": { + "version": "1.32.0", + "resolved": "https://registry.npmjs.org/lightningcss-freebsd-x64/-/lightningcss-freebsd-x64-1.32.0.tgz", + 
"integrity": "sha512-JCTigedEksZk3tHTTthnMdVfGf61Fky8Ji2E4YjUTEQX14xiy/lTzXnu1vwiZe3bYe0q+SpsSH/CTeDXK6WHig==", + "cpu": [ + "x64" + ], + "license": "MPL-2.0", + "optional": true, + "os": [ + "freebsd" + ], + "engines": { + "node": ">= 12.0.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/parcel" + } + }, + "node_modules/lightningcss-linux-arm-gnueabihf": { + "version": "1.32.0", + "resolved": "https://registry.npmjs.org/lightningcss-linux-arm-gnueabihf/-/lightningcss-linux-arm-gnueabihf-1.32.0.tgz", + "integrity": "sha512-x6rnnpRa2GL0zQOkt6rts3YDPzduLpWvwAF6EMhXFVZXD4tPrBkEFqzGowzCsIWsPjqSK+tyNEODUBXeeVHSkw==", + "cpu": [ + "arm" + ], + "license": "MPL-2.0", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": ">= 12.0.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/parcel" + } + }, + "node_modules/lightningcss-linux-arm64-gnu": { + "version": "1.32.0", + "resolved": "https://registry.npmjs.org/lightningcss-linux-arm64-gnu/-/lightningcss-linux-arm64-gnu-1.32.0.tgz", + "integrity": "sha512-0nnMyoyOLRJXfbMOilaSRcLH3Jw5z9HDNGfT/gwCPgaDjnx0i8w7vBzFLFR1f6CMLKF8gVbebmkUN3fa/kQJpQ==", + "cpu": [ + "arm64" + ], + "license": "MPL-2.0", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": ">= 12.0.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/parcel" + } + }, + "node_modules/lightningcss-linux-arm64-musl": { + "version": "1.32.0", + "resolved": "https://registry.npmjs.org/lightningcss-linux-arm64-musl/-/lightningcss-linux-arm64-musl-1.32.0.tgz", + "integrity": "sha512-UpQkoenr4UJEzgVIYpI80lDFvRmPVg6oqboNHfoH4CQIfNA+HOrZ7Mo7KZP02dC6LjghPQJeBsvXhJod/wnIBg==", + "cpu": [ + "arm64" + ], + "license": "MPL-2.0", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": ">= 12.0.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/parcel" + } + }, + 
"node_modules/lightningcss-linux-x64-gnu": { + "version": "1.32.0", + "resolved": "https://registry.npmjs.org/lightningcss-linux-x64-gnu/-/lightningcss-linux-x64-gnu-1.32.0.tgz", + "integrity": "sha512-V7Qr52IhZmdKPVr+Vtw8o+WLsQJYCTd8loIfpDaMRWGUZfBOYEJeyJIkqGIDMZPwPx24pUMfwSxxI8phr/MbOA==", + "cpu": [ + "x64" + ], + "license": "MPL-2.0", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": ">= 12.0.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/parcel" + } + }, + "node_modules/lightningcss-linux-x64-musl": { + "version": "1.32.0", + "resolved": "https://registry.npmjs.org/lightningcss-linux-x64-musl/-/lightningcss-linux-x64-musl-1.32.0.tgz", + "integrity": "sha512-bYcLp+Vb0awsiXg/80uCRezCYHNg1/l3mt0gzHnWV9XP1W5sKa5/TCdGWaR/zBM2PeF/HbsQv/j2URNOiVuxWg==", + "cpu": [ + "x64" + ], + "license": "MPL-2.0", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": ">= 12.0.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/parcel" + } + }, + "node_modules/lightningcss-win32-arm64-msvc": { + "version": "1.32.0", + "resolved": "https://registry.npmjs.org/lightningcss-win32-arm64-msvc/-/lightningcss-win32-arm64-msvc-1.32.0.tgz", + "integrity": "sha512-8SbC8BR40pS6baCM8sbtYDSwEVQd4JlFTOlaD3gWGHfThTcABnNDBda6eTZeqbofalIJhFx0qKzgHJmcPTnGdw==", + "cpu": [ + "arm64" + ], + "license": "MPL-2.0", + "optional": true, + "os": [ + "win32" + ], + "engines": { + "node": ">= 12.0.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/parcel" + } + }, + "node_modules/lightningcss-win32-x64-msvc": { + "version": "1.32.0", + "resolved": "https://registry.npmjs.org/lightningcss-win32-x64-msvc/-/lightningcss-win32-x64-msvc-1.32.0.tgz", + "integrity": "sha512-Amq9B/SoZYdDi1kFrojnoqPLxYhQ4Wo5XiL8EVJrVsB8ARoC1PWW6VGtT0WKCemjy8aC+louJnjS7U18x3b06Q==", + "cpu": [ + "x64" + ], + "license": "MPL-2.0", + "optional": true, + "os": [ + "win32" + ], + "engines": { 
+ "node": ">= 12.0.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/parcel" + } + }, + "node_modules/locate-path": { + "version": "6.0.0", + "resolved": "https://registry.npmjs.org/locate-path/-/locate-path-6.0.0.tgz", + "integrity": "sha512-iPZK6eYjbxRu3uB4/WZ3EsEIMJFMqAoopl3R+zuq0UjcAm/MO6KCweDgPfP3elTztoKP3KtnVHxTn2NHBSDVUw==", + "dev": true, + "license": "MIT", + "dependencies": { + "p-locate": "^5.0.0" + }, + "engines": { + "node": ">=10" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, + "node_modules/lodash": { + "version": "4.17.23", + "resolved": "https://registry.npmjs.org/lodash/-/lodash-4.17.23.tgz", + "integrity": "sha512-LgVTMpQtIopCi79SJeDiP0TfWi5CNEc/L/aRdTh3yIvmZXTnheWpKjSZhnvMl8iXbC1tFg9gdHHDMLoV7CnG+w==", + "license": "MIT" + }, + "node_modules/lodash.includes": { + "version": "4.3.0", + "resolved": "https://registry.npmjs.org/lodash.includes/-/lodash.includes-4.3.0.tgz", + "integrity": "sha512-W3Bx6mdkRTGtlJISOvVD/lbqjTlPPUDTMnlXZFnVwi9NKJ6tiAk6LVdlhZMm17VZisqhKcgzpO5Wz91PCt5b0w==", + "license": "MIT" + }, + "node_modules/lodash.isboolean": { + "version": "3.0.3", + "resolved": "https://registry.npmjs.org/lodash.isboolean/-/lodash.isboolean-3.0.3.tgz", + "integrity": "sha512-Bz5mupy2SVbPHURB98VAcw+aHh4vRV5IPNhILUCsOzRmsTmSQ17jIuqopAentWoehktxGd9e/hbIXq980/1QJg==", + "license": "MIT" + }, + "node_modules/lodash.isinteger": { + "version": "4.0.4", + "resolved": "https://registry.npmjs.org/lodash.isinteger/-/lodash.isinteger-4.0.4.tgz", + "integrity": "sha512-DBwtEWN2caHQ9/imiNeEA5ys1JoRtRfY3d7V9wkqtbycnAmTvRRmbHKDV4a0EYc678/dia0jrte4tjYwVBaZUA==", + "license": "MIT" + }, + "node_modules/lodash.isnumber": { + "version": "3.0.3", + "resolved": "https://registry.npmjs.org/lodash.isnumber/-/lodash.isnumber-3.0.3.tgz", + "integrity": "sha512-QYqzpfwO3/CWf3XP+Z+tkQsfaLL/EnUlXWVkIk5FUPc4sBdTehEqZONuyRt2P67PXAk+NXmTBcc97zw9t1FQrw==", + "license": "MIT" + }, + 
"node_modules/lodash.isplainobject": { + "version": "4.0.6", + "resolved": "https://registry.npmjs.org/lodash.isplainobject/-/lodash.isplainobject-4.0.6.tgz", + "integrity": "sha512-oSXzaWypCMHkPC3NvBEaPHf0KsA5mvPrOPgQWDsbg8n7orZ290M0BmC/jgRZ4vcJ6DTAhjrsSYgdsW/F+MFOBA==", + "license": "MIT" + }, + "node_modules/lodash.isstring": { + "version": "4.0.1", + "resolved": "https://registry.npmjs.org/lodash.isstring/-/lodash.isstring-4.0.1.tgz", + "integrity": "sha512-0wJxfxH1wgO3GrbuP+dTTk7op+6L41QCXbGINEmD+ny/G/eCqGzxyCsh7159S+mgDDcoarnBw6PC1PS5+wUGgw==", + "license": "MIT" + }, + "node_modules/lodash.merge": { + "version": "4.6.2", + "resolved": "https://registry.npmjs.org/lodash.merge/-/lodash.merge-4.6.2.tgz", + "integrity": "sha512-0KpjqXRVvrYyCsX1swR/XTK0va6VQkQM6MNo7PqW77ByjAhoARA8EfrP1N4+KlKj8YS0ZUCtRT/YUuhyYDujIQ==", + "dev": true, + "license": "MIT" + }, + "node_modules/lodash.once": { + "version": "4.1.1", + "resolved": "https://registry.npmjs.org/lodash.once/-/lodash.once-4.1.1.tgz", + "integrity": "sha512-Sb487aTOCr9drQVL8pIxOzVhafOjZN9UU54hiN8PU3uAiSV7lx1yYNpbNmex2PK6dSJoNTSJUUswT651yww3Mg==", + "license": "MIT" + }, + "node_modules/loose-envify": { + "version": "1.4.0", + "resolved": "https://registry.npmjs.org/loose-envify/-/loose-envify-1.4.0.tgz", + "integrity": "sha512-lyuxPGr/Wfhrlem2CL/UcnUc1zcqKAImBDzukY7Y5F/yQiNdko6+fRLevlw1HgMySw7f611UIY408EtxRSoK3Q==", + "license": "MIT", + "dependencies": { + "js-tokens": "^3.0.0 || ^4.0.0" + }, + "bin": { + "loose-envify": "cli.js" + } + }, + "node_modules/lru-cache": { + "version": "5.1.1", + "resolved": "https://registry.npmjs.org/lru-cache/-/lru-cache-5.1.1.tgz", + "integrity": "sha512-KpNARQA3Iwv+jTA0utUVVbrh+Jlrr1Fv0e56GGzAFOXN7dk/FviaDW8LHmK52DlcH4WP2n6gI8vN1aesBFgo9w==", + "dev": true, + "license": "ISC", + "dependencies": { + "yallist": "^3.0.2" + } + }, + "node_modules/lucide-react": { + "version": "0.546.0", + "resolved": "https://registry.npmjs.org/lucide-react/-/lucide-react-0.546.0.tgz", + 
"integrity": "sha512-Z94u6fKT43lKeYHiVyvyR8fT7pwCzDu7RyMPpTvh054+xahSgj4HFQ+NmflvzdXsoAjYGdCguGaFKYuvq0ThCQ==", + "license": "ISC", + "peerDependencies": { + "react": "^16.5.1 || ^17.0.0 || ^18.0.0 || ^19.0.0" + } + }, + "node_modules/magic-string": { + "version": "0.30.21", + "resolved": "https://registry.npmjs.org/magic-string/-/magic-string-0.30.21.tgz", + "integrity": "sha512-vd2F4YUyEXKGcLHoq+TEyCjxueSeHnFxyyjNp80yg0XV4vUhnDer/lvvlqM/arB5bXQN5K2/3oinyCRyx8T2CQ==", + "license": "MIT", + "dependencies": { + "@jridgewell/sourcemap-codec": "^1.5.5" + } + }, + "node_modules/mimic-response": { + "version": "3.1.0", + "resolved": "https://registry.npmjs.org/mimic-response/-/mimic-response-3.1.0.tgz", + "integrity": "sha512-z0yWI+4FDrrweS8Zmt4Ej5HdJmky15+L2e6Wgn3+iK5fWzb6T3fhNFq2+MeTRb064c6Wr4N/wv0DzQTjNzHNGQ==", + "license": "MIT", + "engines": { + "node": ">=10" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, + "node_modules/minimatch": { + "version": "3.1.5", + "resolved": "https://registry.npmjs.org/minimatch/-/minimatch-3.1.5.tgz", + "integrity": "sha512-VgjWUsnnT6n+NUk6eZq77zeFdpW2LWDzP6zFGrCbHXiYNul5Dzqk2HHQ5uFH2DNW5Xbp8+jVzaeNt94ssEEl4w==", + "dev": true, + "license": "ISC", + "dependencies": { + "brace-expansion": "^1.1.7" + }, + "engines": { + "node": "*" + } + }, + "node_modules/minimist": { + "version": "1.2.8", + "resolved": "https://registry.npmjs.org/minimist/-/minimist-1.2.8.tgz", + "integrity": "sha512-2yyAR8qBkN3YuheJanUpWC5U3bb5osDywNB8RzDVlDwDHbocAJveqqj1u8+SVD7jkWT4yvsHCpWqqWqAxb0zCA==", + "license": "MIT", + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, + "node_modules/mkdirp-classic": { + "version": "0.5.3", + "resolved": "https://registry.npmjs.org/mkdirp-classic/-/mkdirp-classic-0.5.3.tgz", + "integrity": "sha512-gKLcREMhtuZRwRAfqP3RFW+TK4JqApVBtOIftVgjuABpAtpxhPGaDcfvbhNvD0B8iD1oUr/txX35NjcaY6Ns/A==", + "license": "MIT" + }, + "node_modules/ms": { + "version": "2.1.3", + "resolved": 
"https://registry.npmjs.org/ms/-/ms-2.1.3.tgz", + "integrity": "sha512-6FlzubTLZG3J2a/NVCAleEhjzq5oxgHyaCU9yYXvcLsvoVaHJq/s5xXI6/XXP6tz7R9xAOtHnSO/tXtF3WRTlA==", + "license": "MIT" + }, + "node_modules/nanoid": { + "version": "3.3.11", + "resolved": "https://registry.npmjs.org/nanoid/-/nanoid-3.3.11.tgz", + "integrity": "sha512-N8SpfPUnUp1bK+PMYW8qSWdl9U+wwNWI4QKxOYDy9JAro3WMX7p2OeVRF9v+347pnakNevPmiHhNmZ2HbFA76w==", + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/ai" + } + ], + "license": "MIT", + "bin": { + "nanoid": "bin/nanoid.cjs" + }, + "engines": { + "node": "^10 || ^12 || ^13.7 || ^14 || >=15.0.1" + } + }, + "node_modules/napi-build-utils": { + "version": "2.0.0", + "resolved": "https://registry.npmjs.org/napi-build-utils/-/napi-build-utils-2.0.0.tgz", + "integrity": "sha512-GEbrYkbfF7MoNaoh2iGG84Mnf/WZfB0GdGEsM8wz7Expx/LlWf5U8t9nvJKXSp3qr5IsEbK04cBGhol/KwOsWA==", + "license": "MIT" + }, + "node_modules/natural-compare": { + "version": "1.4.0", + "resolved": "https://registry.npmjs.org/natural-compare/-/natural-compare-1.4.0.tgz", + "integrity": "sha512-OWND8ei3VtNC9h7V60qff3SVobHr996CTwgxubgyQYEpg290h9J0buyECNNJexkFm5sOajh5G116RYA1c8ZMSw==", + "dev": true, + "license": "MIT" + }, + "node_modules/node-abi": { + "version": "3.89.0", + "resolved": "https://registry.npmjs.org/node-abi/-/node-abi-3.89.0.tgz", + "integrity": "sha512-6u9UwL0HlAl21+agMN3YAMXcKByMqwGx+pq+P76vii5f7hTPtKDp08/H9py6DY+cfDw7kQNTGEj/rly3IgbNQA==", + "license": "MIT", + "dependencies": { + "semver": "^7.3.5" + }, + "engines": { + "node": ">=10" + } + }, + "node_modules/node-abi/node_modules/semver": { + "version": "7.7.4", + "resolved": "https://registry.npmjs.org/semver/-/semver-7.7.4.tgz", + "integrity": "sha512-vFKC2IEtQnVhpT78h1Yp8wzwrf8CM+MzKMHGJZfBtzhZNycRFnXsHk6E5TxIkkMsgNS7mdX3AGB7x2QM2di4lA==", + "license": "ISC", + "bin": { + "semver": "bin/semver.js" + }, + "engines": { + "node": ">=10" + } + }, + "node_modules/node-addon-api": { + "version": 
"4.3.0", + "resolved": "https://registry.npmjs.org/node-addon-api/-/node-addon-api-4.3.0.tgz", + "integrity": "sha512-73sE9+3UaLYYFmDsFZnqCInzPyh3MqIwZO9cw58yIqAZhONrrabrYyYe3TuIqtIiOuTXVhsGau8hcrhhwSsDIQ==", + "license": "MIT" + }, + "node_modules/node-releases": { + "version": "2.0.36", + "resolved": "https://registry.npmjs.org/node-releases/-/node-releases-2.0.36.tgz", + "integrity": "sha512-TdC8FSgHz8Mwtw9g5L4gR/Sh9XhSP/0DEkQxfEFXOpiul5IiHgHan2VhYYb6agDSfp4KuvltmGApc8HMgUrIkA==", + "dev": true, + "license": "MIT" + }, + "node_modules/object-assign": { + "version": "4.1.1", + "resolved": "https://registry.npmjs.org/object-assign/-/object-assign-4.1.1.tgz", + "integrity": "sha512-rJgTQnkUnH1sFw8yT6VSU3zD3sWmu6sZhIseY8VX+GRu3P6F7Fu+JNDoXfklElbLJSnc3FUQHVe4cU5hj+BcUg==", + "license": "MIT", + "engines": { + "node": ">=0.10.0" + } + }, + "node_modules/once": { + "version": "1.4.0", + "resolved": "https://registry.npmjs.org/once/-/once-1.4.0.tgz", + "integrity": "sha512-lNaJgI+2Q5URQBkccEKHTQOPaXdUxnZZElQTZY0MFUAuaEqe1E+Nyvgdz/aIyNi6Z9MzO5dv1H8n58/GELp3+w==", + "license": "ISC", + "dependencies": { + "wrappy": "1" + } + }, + "node_modules/open": { + "version": "8.4.0", + "resolved": "https://registry.npmjs.org/open/-/open-8.4.0.tgz", + "integrity": "sha512-XgFPPM+B28FtCCgSb9I+s9szOC1vZRSwgWsRUA5ylIxRTgKozqjOCrVOqGsYABPYK5qnfqClxZTFBa8PKt2v6Q==", + "license": "MIT", + "dependencies": { + "define-lazy-prop": "^2.0.0", + "is-docker": "^2.1.1", + "is-wsl": "^2.2.0" + }, + "engines": { + "node": ">=12" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, + "node_modules/optionator": { + "version": "0.9.4", + "resolved": "https://registry.npmjs.org/optionator/-/optionator-0.9.4.tgz", + "integrity": "sha512-6IpQ7mKUxRcZNLIObR0hz7lxsapSSIYNZJwXPGeF0mTVqGKFIXj1DQcMoT22S3ROcLyY/rz0PWaWZ9ayWmad9g==", + "dev": true, + "license": "MIT", + "dependencies": { + "deep-is": "^0.1.3", + "fast-levenshtein": "^2.0.6", + "levn": "^0.4.1", + "prelude-ls": 
"^1.2.1", + "type-check": "^0.4.0", + "word-wrap": "^1.2.5" + }, + "engines": { + "node": ">= 0.8.0" + } + }, + "node_modules/p-limit": { + "version": "3.1.0", + "resolved": "https://registry.npmjs.org/p-limit/-/p-limit-3.1.0.tgz", + "integrity": "sha512-TYOanM3wGwNGsZN2cVTYPArw454xnXj5qmWF1bEoAc4+cU/ol7GVh7odevjp1FNHduHc3KZMcFduxU5Xc6uJRQ==", + "dev": true, + "license": "MIT", + "dependencies": { + "yocto-queue": "^0.1.0" + }, + "engines": { + "node": ">=10" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, + "node_modules/p-locate": { + "version": "5.0.0", + "resolved": "https://registry.npmjs.org/p-locate/-/p-locate-5.0.0.tgz", + "integrity": "sha512-LaNjtRWUBY++zB5nE/NwcaoMylSPk+S+ZHNB1TzdbMJMny6dynpAGt7X/tl/QYq3TIeE6nxHppbo2LGymrG5Pw==", + "dev": true, + "license": "MIT", + "dependencies": { + "p-limit": "^3.0.2" + }, + "engines": { + "node": ">=10" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, + "node_modules/parent-module": { + "version": "1.0.1", + "resolved": "https://registry.npmjs.org/parent-module/-/parent-module-1.0.1.tgz", + "integrity": "sha512-GQ2EWRpQV8/o+Aw8YqtfZZPfNRWZYkbidE9k5rpl/hC3vtHHBfGm2Ifi6qWV+coDGkrUKZAxE3Lot5kcsRlh+g==", + "dev": true, + "license": "MIT", + "dependencies": { + "callsites": "^3.0.0" + }, + "engines": { + "node": ">=6" + } + }, + "node_modules/path-browserify": { + "version": "1.0.1", + "resolved": "https://registry.npmjs.org/path-browserify/-/path-browserify-1.0.1.tgz", + "integrity": "sha512-b7uo2UCUOYZcnF/3ID0lulOJi/bafxa1xPe7ZPsammBSpjSWQkjNxlt635YGS2MiR9GjvuXCtz2emr3jbsz98g==", + "license": "MIT" + }, + "node_modules/path-exists": { + "version": "4.0.0", + "resolved": "https://registry.npmjs.org/path-exists/-/path-exists-4.0.0.tgz", + "integrity": "sha512-ak9Qy5Q7jYb2Wwcey5Fpvg2KoAc/ZIhLSLOSBmRmygPsGwkVVt0fZa0qrtMz+m6tJTAHfZQ8FnmB4MG4LWy7/w==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=8" + } + }, + "node_modules/path-key": { + 
"version": "3.1.1", + "resolved": "https://registry.npmjs.org/path-key/-/path-key-3.1.1.tgz", + "integrity": "sha512-ojmeN0qd+y0jszEtoY48r0Peq5dwMEkIlCOu6Q5f41lfkswXuKtYrhgoTpLnyIcHm24Uhqx+5Tqm2InSwLhE6Q==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=8" + } + }, + "node_modules/picocolors": { + "version": "1.1.1", + "resolved": "https://registry.npmjs.org/picocolors/-/picocolors-1.1.1.tgz", + "integrity": "sha512-xceH2snhtb5M9liqDsmEw56le376mTZkEX/jEb/RxNFyegNul7eNslCXP9FDj/Lcu0X8KEyMceP2ntpaHrDEVA==", + "license": "ISC" + }, + "node_modules/picomatch": { + "version": "4.0.4", + "resolved": "https://registry.npmjs.org/picomatch/-/picomatch-4.0.4.tgz", + "integrity": "sha512-QP88BAKvMam/3NxH6vj2o21R6MjxZUAd6nlwAS/pnGvN9IVLocLHxGYIzFhg6fUQ+5th6P4dv4eW9jX3DSIj7A==", + "license": "MIT", + "engines": { + "node": ">=12" + }, + "funding": { + "url": "https://github.com/sponsors/jonschlinkert" + } + }, + "node_modules/postcss": { + "version": "8.5.8", + "resolved": "https://registry.npmjs.org/postcss/-/postcss-8.5.8.tgz", + "integrity": "sha512-OW/rX8O/jXnm82Ey1k44pObPtdblfiuWnrd8X7GJ7emImCOstunGbXUpp7HdBrFQX6rJzn3sPT397Wp5aCwCHg==", + "funding": [ + { + "type": "opencollective", + "url": "https://opencollective.com/postcss/" + }, + { + "type": "tidelift", + "url": "https://tidelift.com/funding/github/npm/postcss" + }, + { + "type": "github", + "url": "https://github.com/sponsors/ai" + } + ], + "license": "MIT", + "dependencies": { + "nanoid": "^3.3.11", + "picocolors": "^1.1.1", + "source-map-js": "^1.2.1" + }, + "engines": { + "node": "^10 || ^12 || >=14" + } + }, + "node_modules/prebuild-install": { + "version": "7.1.3", + "resolved": "https://registry.npmjs.org/prebuild-install/-/prebuild-install-7.1.3.tgz", + "integrity": "sha512-8Mf2cbV7x1cXPUILADGI3wuhfqWvtiLA1iclTDbFRZkgRQS0NqsPZphna9V+HyTEadheuPmjaJMsbzKQFOzLug==", + "deprecated": "No longer maintained. 
Please contact the author of the relevant native addon; alternatives are available.", + "license": "MIT", + "dependencies": { + "detect-libc": "^2.0.0", + "expand-template": "^2.0.3", + "github-from-package": "0.0.0", + "minimist": "^1.2.3", + "mkdirp-classic": "^0.5.3", + "napi-build-utils": "^2.0.0", + "node-abi": "^3.3.0", + "pump": "^3.0.0", + "rc": "^1.2.7", + "simple-get": "^4.0.0", + "tar-fs": "^2.0.0", + "tunnel-agent": "^0.6.0" + }, + "bin": { + "prebuild-install": "bin.js" + }, + "engines": { + "node": ">=10" + } + }, + "node_modules/prelude-ls": { + "version": "1.2.1", + "resolved": "https://registry.npmjs.org/prelude-ls/-/prelude-ls-1.2.1.tgz", + "integrity": "sha512-vkcDPrRZo1QZLbn5RLGPpg/WmIQ65qoWWhcGKf/b5eplkkarX0m9z8ppCat4mlOqUsWpyNuYgO3VRyrYHSzX5g==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">= 0.8.0" + } + }, + "node_modules/prettier": { + "version": "3.8.1", + "resolved": "https://registry.npmjs.org/prettier/-/prettier-3.8.1.tgz", + "integrity": "sha512-UOnG6LftzbdaHZcKoPFtOcCKztrQ57WkHDeRD9t/PTQtmT0NHSeWWepj6pS0z/N7+08BHFDQVUrfmfMRcZwbMg==", + "license": "MIT", + "bin": { + "prettier": "bin/prettier.cjs" + }, + "engines": { + "node": ">=14" + }, + "funding": { + "url": "https://github.com/prettier/prettier?sponsor=1" + } + }, + "node_modules/prop-types": { + "version": "15.8.1", + "resolved": "https://registry.npmjs.org/prop-types/-/prop-types-15.8.1.tgz", + "integrity": "sha512-oj87CgZICdulUohogVAR7AjlC0327U4el4L6eAvOqCeudMDVU0NThNaV+b9Df4dXgSP1gXMTnPdhfe/2qDH5cg==", + "license": "MIT", + "dependencies": { + "loose-envify": "^1.4.0", + "object-assign": "^4.1.1", + "react-is": "^16.13.1" + } + }, + "node_modules/prop-types/node_modules/react-is": { + "version": "16.13.1", + "resolved": "https://registry.npmjs.org/react-is/-/react-is-16.13.1.tgz", + "integrity": "sha512-24e6ynE2H+OKt4kqsOvNd8kBpV65zoxbA4BVsEOB3ARVWQki/DHzaUoC5KuON/BiccDaCCTZBuOcfZs70kR8bQ==", + "license": "MIT" + }, + "node_modules/pump": { + "version": 
"3.0.4", + "resolved": "https://registry.npmjs.org/pump/-/pump-3.0.4.tgz", + "integrity": "sha512-VS7sjc6KR7e1ukRFhQSY5LM2uBWAUPiOPa/A3mkKmiMwSmRFUITt0xuj+/lesgnCv+dPIEYlkzrcyXgquIHMcA==", + "license": "MIT", + "dependencies": { + "end-of-stream": "^1.1.0", + "once": "^1.3.1" + } + }, + "node_modules/punycode": { + "version": "2.3.1", + "resolved": "https://registry.npmjs.org/punycode/-/punycode-2.3.1.tgz", + "integrity": "sha512-vYt7UD1U9Wg6138shLtLOvdAu+8DsC/ilFtEVHcH+wydcSpNE20AfSOduf6MkRFahL5FY7X1oU7nKVZFtfq8Fg==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=6" + } + }, + "node_modules/rc": { + "version": "1.2.8", + "resolved": "https://registry.npmjs.org/rc/-/rc-1.2.8.tgz", + "integrity": "sha512-y3bGgqKj3QBdxLbLkomlohkvsA8gdAiUQlSBJnBhfn+BPxg4bc62d8TcBW15wavDfgexCgccckhcZvywyQYPOw==", + "license": "(BSD-2-Clause OR MIT OR Apache-2.0)", + "dependencies": { + "deep-extend": "^0.6.0", + "ini": "~1.3.0", + "minimist": "^1.2.0", + "strip-json-comments": "~2.0.1" + }, + "bin": { + "rc": "cli.js" + } + }, + "node_modules/rc/node_modules/strip-json-comments": { + "version": "2.0.1", + "resolved": "https://registry.npmjs.org/strip-json-comments/-/strip-json-comments-2.0.1.tgz", + "integrity": "sha512-4gB8na07fecVVkOI6Rs4e7T6NOTki5EmL7TUduTs6bu3EdnSycntVJ4re8kgZA+wx9IueI2Y11bfbgwtzuE0KQ==", + "license": "MIT", + "engines": { + "node": ">=0.10.0" + } + }, + "node_modules/react": { + "version": "19.2.4", + "resolved": "https://registry.npmjs.org/react/-/react-19.2.4.tgz", + "integrity": "sha512-9nfp2hYpCwOjAN+8TZFGhtWEwgvWHXqESH8qT89AT/lWklpLON22Lc8pEtnpsZz7VmawabSU0gCjnj8aC0euHQ==", + "license": "MIT", + "engines": { + "node": ">=0.10.0" + } + }, + "node_modules/react-day-picker": { + "version": "9.14.0", + "resolved": "https://registry.npmjs.org/react-day-picker/-/react-day-picker-9.14.0.tgz", + "integrity": "sha512-tBaoDWjPwe0M5pGrum4H0SR6Lyk+BO9oHnp9JbKpGKW2mlraNPgP9BMfsg5pWpwrssARmeqk7YBl2oXutZTaHA==", + "license": "MIT", + "dependencies": { + 
"@date-fns/tz": "^1.4.1", + "@tabby_ai/hijri-converter": "1.0.5", + "date-fns": "^4.1.0", + "date-fns-jalali": "4.1.0-0" + }, + "engines": { + "node": ">=18" + }, + "funding": { + "type": "individual", + "url": "https://github.com/sponsors/gpbl" + }, + "peerDependencies": { + "react": ">=16.8.0" + } + }, + "node_modules/react-dom": { + "version": "19.2.4", + "resolved": "https://registry.npmjs.org/react-dom/-/react-dom-19.2.4.tgz", + "integrity": "sha512-AXJdLo8kgMbimY95O2aKQqsz2iWi9jMgKJhRBAxECE4IFxfcazB2LmzloIoibJI3C12IlY20+KFaLv+71bUJeQ==", + "license": "MIT", + "dependencies": { + "scheduler": "^0.27.0" + }, + "peerDependencies": { + "react": "^19.2.4" + } + }, + "node_modules/react-is": { + "version": "18.3.1", + "resolved": "https://registry.npmjs.org/react-is/-/react-is-18.3.1.tgz", + "integrity": "sha512-/LLMVyas0ljjAtoYiPqYiL8VWXzUUdThrmU5+n20DZv+a+ClRoevUzw5JxU+Ieh5/c87ytoTBV9G1FiKfNJdmg==", + "license": "MIT" + }, + "node_modules/react-refresh": { + "version": "0.18.0", + "resolved": "https://registry.npmjs.org/react-refresh/-/react-refresh-0.18.0.tgz", + "integrity": "sha512-QgT5//D3jfjJb6Gsjxv0Slpj23ip+HtOpnNgnb2S5zU3CB26G/IDPGoy4RJB42wzFE46DRsstbW6tKHoKbhAxw==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=0.10.0" + } + }, + "node_modules/react-remove-scroll": { + "version": "2.7.2", + "resolved": "https://registry.npmjs.org/react-remove-scroll/-/react-remove-scroll-2.7.2.tgz", + "integrity": "sha512-Iqb9NjCCTt6Hf+vOdNIZGdTiH1QSqr27H/Ek9sv/a97gfueI/5h1s3yRi1nngzMUaOOToin5dI1dXKdXiF+u0Q==", + "license": "MIT", + "dependencies": { + "react-remove-scroll-bar": "^2.3.7", + "react-style-singleton": "^2.2.3", + "tslib": "^2.1.0", + "use-callback-ref": "^1.3.3", + "use-sidecar": "^1.1.3" + }, + "engines": { + "node": ">=10" + }, + "peerDependencies": { + "@types/react": "*", + "react": "^16.8.0 || ^17.0.0 || ^18.0.0 || ^19.0.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + } + } + }, + 
"node_modules/react-remove-scroll-bar": { + "version": "2.3.8", + "resolved": "https://registry.npmjs.org/react-remove-scroll-bar/-/react-remove-scroll-bar-2.3.8.tgz", + "integrity": "sha512-9r+yi9+mgU33AKcj6IbT9oRCO78WriSj6t/cF8DWBZJ9aOGPOTEDvdUDz1FwKim7QXWwmHqtdHnRJfhAxEG46Q==", + "license": "MIT", + "dependencies": { + "react-style-singleton": "^2.2.2", + "tslib": "^2.0.0" + }, + "engines": { + "node": ">=10" + }, + "peerDependencies": { + "@types/react": "*", + "react": "^16.8.0 || ^17.0.0 || ^18.0.0 || ^19.0.0" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + } + } + }, + "node_modules/react-router": { + "version": "7.13.2", + "resolved": "https://registry.npmjs.org/react-router/-/react-router-7.13.2.tgz", + "integrity": "sha512-tX1Aee+ArlKQP+NIUd7SE6Li+CiGKwQtbS+FfRxPX6Pe4vHOo6nr9d++u5cwg+Z8K/x8tP+7qLmujDtfrAoUJA==", + "license": "MIT", + "dependencies": { + "cookie": "^1.0.1", + "set-cookie-parser": "^2.6.0" + }, + "engines": { + "node": ">=20.0.0" + }, + "peerDependencies": { + "react": ">=18", + "react-dom": ">=18" + }, + "peerDependenciesMeta": { + "react-dom": { + "optional": true + } + } + }, + "node_modules/react-router-dom": { + "version": "7.13.2", + "resolved": "https://registry.npmjs.org/react-router-dom/-/react-router-dom-7.13.2.tgz", + "integrity": "sha512-aR7SUORwTqAW0JDeiWF07e9SBE9qGpByR9I8kJT5h/FrBKxPMS6TiC7rmVO+gC0q52Bx7JnjWe8Z1sR9faN4YA==", + "license": "MIT", + "dependencies": { + "react-router": "7.13.2" + }, + "engines": { + "node": ">=20.0.0" + }, + "peerDependencies": { + "react": ">=18", + "react-dom": ">=18" + } + }, + "node_modules/react-smooth": { + "version": "4.0.4", + "resolved": "https://registry.npmjs.org/react-smooth/-/react-smooth-4.0.4.tgz", + "integrity": "sha512-gnGKTpYwqL0Iii09gHobNolvX4Kiq4PKx6eWBCYYix+8cdw+cGo3do906l1NBPKkSWx1DghC1dlWG9L2uGd61Q==", + "license": "MIT", + "dependencies": { + "fast-equals": "^5.0.1", + "prop-types": "^15.8.1", + "react-transition-group": "^4.4.5" + }, + 
"peerDependencies": { + "react": "^16.8.0 || ^17.0.0 || ^18.0.0 || ^19.0.0", + "react-dom": "^16.8.0 || ^17.0.0 || ^18.0.0 || ^19.0.0" + } + }, + "node_modules/react-style-singleton": { + "version": "2.2.3", + "resolved": "https://registry.npmjs.org/react-style-singleton/-/react-style-singleton-2.2.3.tgz", + "integrity": "sha512-b6jSvxvVnyptAiLjbkWLE/lOnR4lfTtDAl+eUC7RZy+QQWc6wRzIV2CE6xBuMmDxc2qIihtDCZD5NPOFl7fRBQ==", + "license": "MIT", + "dependencies": { + "get-nonce": "^1.0.0", + "tslib": "^2.0.0" + }, + "engines": { + "node": ">=10" + }, + "peerDependencies": { + "@types/react": "*", + "react": "^16.8.0 || ^17.0.0 || ^18.0.0 || ^19.0.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + } + } + }, + "node_modules/react-transition-group": { + "version": "4.4.5", + "resolved": "https://registry.npmjs.org/react-transition-group/-/react-transition-group-4.4.5.tgz", + "integrity": "sha512-pZcd1MCJoiKiBR2NRxeCRg13uCXbydPnmB4EOeRrY7480qNWO8IIgQG6zlDkm6uRMsURXPuKq0GWtiM59a5Q6g==", + "license": "BSD-3-Clause", + "dependencies": { + "@babel/runtime": "^7.5.5", + "dom-helpers": "^5.0.1", + "loose-envify": "^1.4.0", + "prop-types": "^15.6.2" + }, + "peerDependencies": { + "react": ">=16.6.0", + "react-dom": ">=16.6.0" + } + }, + "node_modules/readable-stream": { + "version": "3.6.2", + "resolved": "https://registry.npmjs.org/readable-stream/-/readable-stream-3.6.2.tgz", + "integrity": "sha512-9u/sniCrY3D5WdsERHzHE4G2YCXqoG5FTHUiCC4SIbr6XcLZBY05ya9EKjYek9O5xOAwjGq+1JdGBAS7Q9ScoA==", + "license": "MIT", + "dependencies": { + "inherits": "^2.0.3", + "string_decoder": "^1.1.1", + "util-deprecate": "^1.0.1" + }, + "engines": { + "node": ">= 6" + } + }, + "node_modules/recharts": { + "version": "2.15.4", + "resolved": "https://registry.npmjs.org/recharts/-/recharts-2.15.4.tgz", + "integrity": "sha512-UT/q6fwS3c1dHbXv2uFgYJ9BMFHu3fwnd7AYZaEQhXuYQ4hgsxLvsUXzGdKeZrW5xopzDCvuA2N41WJ88I7zIw==", + "license": "MIT", + "dependencies": { + "clsx": 
"^2.0.0", + "eventemitter3": "^4.0.1", + "lodash": "^4.17.21", + "react-is": "^18.3.1", + "react-smooth": "^4.0.4", + "recharts-scale": "^0.4.4", + "tiny-invariant": "^1.3.1", + "victory-vendor": "^36.6.8" + }, + "engines": { + "node": ">=14" + }, + "peerDependencies": { + "react": "^16.0.0 || ^17.0.0 || ^18.0.0 || ^19.0.0", + "react-dom": "^16.0.0 || ^17.0.0 || ^18.0.0 || ^19.0.0" + } + }, + "node_modules/recharts-scale": { + "version": "0.4.5", + "resolved": "https://registry.npmjs.org/recharts-scale/-/recharts-scale-0.4.5.tgz", + "integrity": "sha512-kivNFO+0OcUNu7jQquLXAxz1FIwZj8nrj+YkOKc5694NbjCvcT6aSZiIzNzd2Kul4o4rTto8QVR9lMNtxD4G1w==", + "license": "MIT", + "dependencies": { + "decimal.js-light": "^2.4.1" + } + }, + "node_modules/resolve-from": { + "version": "4.0.0", + "resolved": "https://registry.npmjs.org/resolve-from/-/resolve-from-4.0.0.tgz", + "integrity": "sha512-pb/MYmXstAkysRFx8piNI1tGFNQIFA3vkE3Gq4EuA1dF6gHp/+vgZqsCGJapvy8N3Q+4o7FwvquPJcnZ7RYy4g==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=4" + } + }, + "node_modules/rollup": { + "version": "4.60.1", + "resolved": "https://registry.npmjs.org/rollup/-/rollup-4.60.1.tgz", + "integrity": "sha512-VmtB2rFU/GroZ4oL8+ZqXgSA38O6GR8KSIvWmEFv63pQ0G6KaBH9s07PO8XTXP4vI+3UJUEypOfjkGfmSBBR0w==", + "license": "MIT", + "dependencies": { + "@types/estree": "1.0.8" + }, + "bin": { + "rollup": "dist/bin/rollup" + }, + "engines": { + "node": ">=18.0.0", + "npm": ">=8.0.0" + }, + "optionalDependencies": { + "@rollup/rollup-android-arm-eabi": "4.60.1", + "@rollup/rollup-android-arm64": "4.60.1", + "@rollup/rollup-darwin-arm64": "4.60.1", + "@rollup/rollup-darwin-x64": "4.60.1", + "@rollup/rollup-freebsd-arm64": "4.60.1", + "@rollup/rollup-freebsd-x64": "4.60.1", + "@rollup/rollup-linux-arm-gnueabihf": "4.60.1", + "@rollup/rollup-linux-arm-musleabihf": "4.60.1", + "@rollup/rollup-linux-arm64-gnu": "4.60.1", + "@rollup/rollup-linux-arm64-musl": "4.60.1", + "@rollup/rollup-linux-loong64-gnu": 
"4.60.1", + "@rollup/rollup-linux-loong64-musl": "4.60.1", + "@rollup/rollup-linux-ppc64-gnu": "4.60.1", + "@rollup/rollup-linux-ppc64-musl": "4.60.1", + "@rollup/rollup-linux-riscv64-gnu": "4.60.1", + "@rollup/rollup-linux-riscv64-musl": "4.60.1", + "@rollup/rollup-linux-s390x-gnu": "4.60.1", + "@rollup/rollup-linux-x64-gnu": "4.60.1", + "@rollup/rollup-linux-x64-musl": "4.60.1", + "@rollup/rollup-openbsd-x64": "4.60.1", + "@rollup/rollup-openharmony-arm64": "4.60.1", + "@rollup/rollup-win32-arm64-msvc": "4.60.1", + "@rollup/rollup-win32-ia32-msvc": "4.60.1", + "@rollup/rollup-win32-x64-gnu": "4.60.1", + "@rollup/rollup-win32-x64-msvc": "4.60.1", + "fsevents": "~2.3.2" + } + }, + "node_modules/safe-buffer": { + "version": "5.2.1", + "resolved": "https://registry.npmjs.org/safe-buffer/-/safe-buffer-5.2.1.tgz", + "integrity": "sha512-rp3So07KcdmmKbGvgaNxQSJr7bGVSVk5S9Eq1F+ppbRo70+YeaDxkw5Dd8NPN+GD6bjnYm2VuPuCXmpuYvmCXQ==", + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/feross" + }, + { + "type": "patreon", + "url": "https://www.patreon.com/feross" + }, + { + "type": "consulting", + "url": "https://feross.org/support" + } + ], + "license": "MIT" + }, + "node_modules/scheduler": { + "version": "0.27.0", + "resolved": "https://registry.npmjs.org/scheduler/-/scheduler-0.27.0.tgz", + "integrity": "sha512-eNv+WrVbKu1f3vbYJT/xtiF5syA5HPIMtf9IgY/nKg0sWqzAUEvqY/xm7OcZc/qafLx/iO9FgOmeSAp4v5ti/Q==", + "license": "MIT" + }, + "node_modules/semver": { + "version": "6.3.1", + "resolved": "https://registry.npmjs.org/semver/-/semver-6.3.1.tgz", + "integrity": "sha512-BR7VvDCVHO+q2xBEWskxS6DJE1qRnb7DxzUrogb71CWoSficBxYsiAGd+Kl0mmq/MprG9yArRkyrQxTO6XjMzA==", + "dev": true, + "license": "ISC", + "bin": { + "semver": "bin/semver.js" + } + }, + "node_modules/set-cookie-parser": { + "version": "2.7.2", + "resolved": "https://registry.npmjs.org/set-cookie-parser/-/set-cookie-parser-2.7.2.tgz", + "integrity": 
"sha512-oeM1lpU/UvhTxw+g3cIfxXHyJRc/uidd3yK1P242gzHds0udQBYzs3y8j4gCCW+ZJ7ad0yctld8RYO+bdurlvw==", + "license": "MIT" + }, + "node_modules/shebang-command": { + "version": "2.0.0", + "resolved": "https://registry.npmjs.org/shebang-command/-/shebang-command-2.0.0.tgz", + "integrity": "sha512-kHxr2zZpYtdmrN1qDjrrX/Z1rR1kG8Dx+gkpK1G4eXmvXswmcE1hTWBWYUzlraYw1/yZp6YuDY77YtvbN0dmDA==", + "dev": true, + "license": "MIT", + "dependencies": { + "shebang-regex": "^3.0.0" + }, + "engines": { + "node": ">=8" + } + }, + "node_modules/shebang-regex": { + "version": "3.0.0", + "resolved": "https://registry.npmjs.org/shebang-regex/-/shebang-regex-3.0.0.tgz", + "integrity": "sha512-7++dFhtcx3353uBaq8DDR4NuxBetBzC7ZQOhmTQInHEd6bSrXdiEyzCvG07Z44UYdLShWUyXt5M/yhz8ekcb1A==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=8" + } + }, + "node_modules/simple-concat": { + "version": "1.0.1", + "resolved": "https://registry.npmjs.org/simple-concat/-/simple-concat-1.0.1.tgz", + "integrity": "sha512-cSFtAPtRhljv69IK0hTVZQ+OfE9nePi/rtJmw5UjHeVyVroEqJXP1sFztKUy1qU+xvz3u/sfYJLa947b7nAN2Q==", + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/feross" + }, + { + "type": "patreon", + "url": "https://www.patreon.com/feross" + }, + { + "type": "consulting", + "url": "https://feross.org/support" + } + ], + "license": "MIT" + }, + "node_modules/simple-get": { + "version": "4.0.1", + "resolved": "https://registry.npmjs.org/simple-get/-/simple-get-4.0.1.tgz", + "integrity": "sha512-brv7p5WgH0jmQJr1ZDDfKDOSeWWg+OVypG99A/5vYGPqJ6pxiaHLy8nxtFjBA7oMa01ebA9gfh1uMCFqOuXxvA==", + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/feross" + }, + { + "type": "patreon", + "url": "https://www.patreon.com/feross" + }, + { + "type": "consulting", + "url": "https://feross.org/support" + } + ], + "license": "MIT", + "dependencies": { + "decompress-response": "^6.0.0", + "once": "^1.3.1", + "simple-concat": "^1.0.0" + } + }, + "node_modules/sisteransi": { 
+ "version": "1.0.5", + "resolved": "https://registry.npmjs.org/sisteransi/-/sisteransi-1.0.5.tgz", + "integrity": "sha512-bLGGlR1QxBcynn2d5YmDX4MGjlZvy2MRBDRNHLJ8VI6l6+9FUiyTFNJ0IveOSP0bcXgVDPRcfGqA0pjaqUpfVg==", + "license": "MIT" + }, + "node_modules/sonner": { + "version": "2.0.7", + "resolved": "https://registry.npmjs.org/sonner/-/sonner-2.0.7.tgz", + "integrity": "sha512-W6ZN4p58k8aDKA4XPcx2hpIQXBRAgyiWVkYhT7CvK6D3iAu7xjvVyhQHg2/iaKJZ1XVJ4r7XuwGL+WGEK37i9w==", + "license": "MIT", + "peerDependencies": { + "react": "^18.0.0 || ^19.0.0 || ^19.0.0-rc", + "react-dom": "^18.0.0 || ^19.0.0 || ^19.0.0-rc" + } + }, + "node_modules/source-map-js": { + "version": "1.2.1", + "resolved": "https://registry.npmjs.org/source-map-js/-/source-map-js-1.2.1.tgz", + "integrity": "sha512-UXWMKhLOwVKb728IUtQPXxfYU+usdybtUrK/8uGE8CQMvrhOpwvzDBwj0QhSL7MQc7vIsISBG8VQ8+IDQxpfQA==", + "license": "BSD-3-Clause", + "engines": { + "node": ">=0.10.0" + } + }, + "node_modules/string_decoder": { + "version": "1.3.0", + "resolved": "https://registry.npmjs.org/string_decoder/-/string_decoder-1.3.0.tgz", + "integrity": "sha512-hkRX8U1WjJFd8LsDJ2yQ/wWWxaopEsABU1XfkM8A+j0+85JAGppt16cr1Whg6KIbb4okU6Mql6BOj+uup/wKeA==", + "license": "MIT", + "dependencies": { + "safe-buffer": "~5.2.0" + } + }, + "node_modules/strip-json-comments": { + "version": "3.1.1", + "resolved": "https://registry.npmjs.org/strip-json-comments/-/strip-json-comments-3.1.1.tgz", + "integrity": "sha512-6fPc+R4ihwqP6N/aIv2f1gMH8lOVtWQHoqC4yK6oSDVVocumAsfCqjkXnqiYMhmMwS/mEHLp7Vehlt3ql6lEig==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=8" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, + "node_modules/supports-color": { + "version": "7.2.0", + "resolved": "https://registry.npmjs.org/supports-color/-/supports-color-7.2.0.tgz", + "integrity": "sha512-qpCAvRl9stuOHveKsn7HncJRvv501qIacKzQlO/+Lwxc9+0q2wLyv4Dfvt80/DPn2pqOBsJdDiogXGR9+OvwRw==", + "license": "MIT", + "dependencies": { + 
"has-flag": "^4.0.0" + }, + "engines": { + "node": ">=8" + } + }, + "node_modules/tailwind-merge": { + "version": "3.5.0", + "resolved": "https://registry.npmjs.org/tailwind-merge/-/tailwind-merge-3.5.0.tgz", + "integrity": "sha512-I8K9wewnVDkL1NTGoqWmVEIlUcB9gFriAEkXkfCjX5ib8ezGxtR3xD7iZIxrfArjEsH7F1CHD4RFUtxefdqV/A==", + "license": "MIT", + "funding": { + "type": "github", + "url": "https://github.com/sponsors/dcastil" + } + }, + "node_modules/tailwindcss": { + "version": "4.2.2", + "resolved": "https://registry.npmjs.org/tailwindcss/-/tailwindcss-4.2.2.tgz", + "integrity": "sha512-KWBIxs1Xb6NoLdMVqhbhgwZf2PGBpPEiwOqgI4pFIYbNTfBXiKYyWoTsXgBQ9WFg/OlhnvHaY+AEpW7wSmFo2Q==", + "license": "MIT" + }, + "node_modules/tapable": { + "version": "2.3.2", + "resolved": "https://registry.npmjs.org/tapable/-/tapable-2.3.2.tgz", + "integrity": "sha512-1MOpMXuhGzGL5TTCZFItxCc0AARf1EZFQkGqMm7ERKj8+Hgr5oLvJOVFcC+lRmR8hCe2S3jC4T5D7Vg/d7/fhA==", + "license": "MIT", + "engines": { + "node": ">=6" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/webpack" + } + }, + "node_modules/tar-fs": { + "version": "2.1.4", + "resolved": "https://registry.npmjs.org/tar-fs/-/tar-fs-2.1.4.tgz", + "integrity": "sha512-mDAjwmZdh7LTT6pNleZ05Yt65HC3E+NiQzl672vQG38jIrehtJk/J3mNwIg+vShQPcLF/LV7CMnDW6vjj6sfYQ==", + "license": "MIT", + "dependencies": { + "chownr": "^1.1.1", + "mkdirp-classic": "^0.5.2", + "pump": "^3.0.0", + "tar-stream": "^2.1.4" + } + }, + "node_modules/tar-stream": { + "version": "2.2.0", + "resolved": "https://registry.npmjs.org/tar-stream/-/tar-stream-2.2.0.tgz", + "integrity": "sha512-ujeqbceABgwMZxEJnk2HDY2DlnUZ+9oEcb1KzTVfYHio0UE6dG71n60d8D2I4qNvleWrrXpmjpt7vZeF1LnMZQ==", + "license": "MIT", + "dependencies": { + "bl": "^4.0.3", + "end-of-stream": "^1.4.1", + "fs-constants": "^1.0.0", + "inherits": "^2.0.3", + "readable-stream": "^3.1.1" + }, + "engines": { + "node": ">=6" + } + }, + "node_modules/tiny-invariant": { + "version": "1.3.3", + 
"resolved": "https://registry.npmjs.org/tiny-invariant/-/tiny-invariant-1.3.3.tgz", + "integrity": "sha512-+FbBPE1o9QAYvviau/qC5SE3caw21q3xkvWKBtja5vgqOWIHHJ3ioaq1VPfn/Szqctz2bU/oYeKd9/z5BL+PVg==", + "license": "MIT" + }, + "node_modules/tinyglobby": { + "version": "0.2.15", + "resolved": "https://registry.npmjs.org/tinyglobby/-/tinyglobby-0.2.15.tgz", + "integrity": "sha512-j2Zq4NyQYG5XMST4cbs02Ak8iJUdxRM0XI5QyxXuZOzKOINmWurp3smXu3y5wDcJrptwpSjgXHzIQxR0omXljQ==", + "license": "MIT", + "dependencies": { + "fdir": "^6.5.0", + "picomatch": "^4.0.3" + }, + "engines": { + "node": ">=12.0.0" + }, + "funding": { + "url": "https://github.com/sponsors/SuperchupuDev" + } + }, + "node_modules/ts-api-utils": { + "version": "2.5.0", + "resolved": "https://registry.npmjs.org/ts-api-utils/-/ts-api-utils-2.5.0.tgz", + "integrity": "sha512-OJ/ibxhPlqrMM0UiNHJ/0CKQkoKF243/AEmplt3qpRgkW8VG7IfOS41h7V8TjITqdByHzrjcS/2si+y4lIh8NA==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=18.12" + }, + "peerDependencies": { + "typescript": ">=4.8.4" + } + }, + "node_modules/ts-morph": { + "version": "27.0.2", + "resolved": "https://registry.npmjs.org/ts-morph/-/ts-morph-27.0.2.tgz", + "integrity": "sha512-fhUhgeljcrdZ+9DZND1De1029PrE+cMkIP7ooqkLRTrRLTqcki2AstsyJm0vRNbTbVCNJ0idGlbBrfqc7/nA8w==", + "license": "MIT", + "dependencies": { + "@ts-morph/common": "~0.28.1", + "code-block-writer": "^13.0.3" + } + }, + "node_modules/tslib": { + "version": "2.8.1", + "resolved": "https://registry.npmjs.org/tslib/-/tslib-2.8.1.tgz", + "integrity": "sha512-oJFu94HQb+KVduSUQL7wnpmqnfmLsOA/nAh6b6EH0wCEoK0/mPeXU6c3wKDV83MkOuHPRHtSXKKU99IBazS/2w==", + "license": "0BSD" + }, + "node_modules/tunnel-agent": { + "version": "0.6.0", + "resolved": "https://registry.npmjs.org/tunnel-agent/-/tunnel-agent-0.6.0.tgz", + "integrity": "sha512-McnNiV1l8RYeY8tBgEpuodCC1mLUdbSN+CYBL7kJsJNInOP8UjDDEwdk6Mw60vdLLrr5NHKZhMAOSrR2NZuQ+w==", + "license": "Apache-2.0", + "dependencies": { + "safe-buffer": "^5.0.1" + 
}, + "engines": { + "node": "*" + } + }, + "node_modules/tw-animate-css": { + "version": "1.4.0", + "resolved": "https://registry.npmjs.org/tw-animate-css/-/tw-animate-css-1.4.0.tgz", + "integrity": "sha512-7bziOlRqH0hJx80h/3mbicLW7o8qLsH5+RaLR2t+OHM3D0JlWGODQKQ4cxbK7WlvmUxpcj6Kgu6EKqjrGFe3QQ==", + "dev": true, + "license": "MIT", + "funding": { + "url": "https://github.com/sponsors/Wombosvideo" + } + }, + "node_modules/type-check": { + "version": "0.4.0", + "resolved": "https://registry.npmjs.org/type-check/-/type-check-0.4.0.tgz", + "integrity": "sha512-XleUoc9uwGXqjWwXaUTZAmzMcFZ5858QA2vvx1Ur5xIcixXIP+8LnFDgRplU30us6teqdlskFfu+ae4K79Ooew==", + "dev": true, + "license": "MIT", + "dependencies": { + "prelude-ls": "^1.2.1" + }, + "engines": { + "node": ">= 0.8.0" + } + }, + "node_modules/typescript": { + "version": "5.9.3", + "resolved": "https://registry.npmjs.org/typescript/-/typescript-5.9.3.tgz", + "integrity": "sha512-jl1vZzPDinLr9eUt3J/t7V6FgNEw9QjvBPdysz9KfQDD41fQrC2Y4vKQdiaUpFT4bXlb1RHhLpp8wtm6M5TgSw==", + "dev": true, + "license": "Apache-2.0", + "bin": { + "tsc": "bin/tsc", + "tsserver": "bin/tsserver" + }, + "engines": { + "node": ">=14.17" + } + }, + "node_modules/typescript-eslint": { + "version": "8.58.0", + "resolved": "https://registry.npmjs.org/typescript-eslint/-/typescript-eslint-8.58.0.tgz", + "integrity": "sha512-e2TQzKfaI85fO+F3QywtX+tCTsu/D3WW5LVU6nz8hTFKFZ8yBJ6mSYRpXqdR3mFjPWmO0eWsTa5f+UpAOe/FMA==", + "dev": true, + "license": "MIT", + "dependencies": { + "@typescript-eslint/eslint-plugin": "8.58.0", + "@typescript-eslint/parser": "8.58.0", + "@typescript-eslint/typescript-estree": "8.58.0", + "@typescript-eslint/utils": "8.58.0" + }, + "engines": { + "node": "^18.18.0 || ^20.9.0 || >=21.1.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/typescript-eslint" + }, + "peerDependencies": { + "eslint": "^8.57.0 || ^9.0.0 || ^10.0.0", + "typescript": ">=4.8.4 <6.1.0" + } + }, + "node_modules/undici-types": { + 
"version": "7.16.0", + "resolved": "https://registry.npmjs.org/undici-types/-/undici-types-7.16.0.tgz", + "integrity": "sha512-Zz+aZWSj8LE6zoxD+xrjh4VfkIG8Ya6LvYkZqtUQGJPZjYl53ypCaUwWqo7eI0x66KBGeRo+mlBEkMSeSZ38Nw==", + "devOptional": true, + "license": "MIT" + }, + "node_modules/update-browserslist-db": { + "version": "1.2.3", + "resolved": "https://registry.npmjs.org/update-browserslist-db/-/update-browserslist-db-1.2.3.tgz", + "integrity": "sha512-Js0m9cx+qOgDxo0eMiFGEueWztz+d4+M3rGlmKPT+T4IS/jP4ylw3Nwpu6cpTTP8R1MAC1kF4VbdLt3ARf209w==", + "dev": true, + "funding": [ + { + "type": "opencollective", + "url": "https://opencollective.com/browserslist" + }, + { + "type": "tidelift", + "url": "https://tidelift.com/funding/github/npm/browserslist" + }, + { + "type": "github", + "url": "https://github.com/sponsors/ai" + } + ], + "license": "MIT", + "dependencies": { + "escalade": "^3.2.0", + "picocolors": "^1.1.1" + }, + "bin": { + "update-browserslist-db": "cli.js" + }, + "peerDependencies": { + "browserslist": ">= 4.21.0" + } + }, + "node_modules/uri-js": { + "version": "4.4.1", + "resolved": "https://registry.npmjs.org/uri-js/-/uri-js-4.4.1.tgz", + "integrity": "sha512-7rKUyy33Q1yc98pQ1DAmLtwX109F7TIfWlW1Ydo8Wl1ii1SeHieeh0HHfPeL2fMXK6z0s8ecKs9frCuLJvndBg==", + "dev": true, + "license": "BSD-2-Clause", + "dependencies": { + "punycode": "^2.1.0" + } + }, + "node_modules/use-callback-ref": { + "version": "1.3.3", + "resolved": "https://registry.npmjs.org/use-callback-ref/-/use-callback-ref-1.3.3.tgz", + "integrity": "sha512-jQL3lRnocaFtu3V00JToYz/4QkNWswxijDaCVNZRiRTO3HQDLsdu1ZtmIUvV4yPp+rvWm5j0y0TG/S61cuijTg==", + "license": "MIT", + "dependencies": { + "tslib": "^2.0.0" + }, + "engines": { + "node": ">=10" + }, + "peerDependencies": { + "@types/react": "*", + "react": "^16.8.0 || ^17.0.0 || ^18.0.0 || ^19.0.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + } + } + }, + "node_modules/use-sidecar": { + "version": "1.1.3", + 
"resolved": "https://registry.npmjs.org/use-sidecar/-/use-sidecar-1.1.3.tgz", + "integrity": "sha512-Fedw0aZvkhynoPYlA5WXrMCAMm+nSWdZt6lzJQ7Ok8S6Q+VsHmHpRWndVRJ8Be0ZbkfPc5LRYH+5XrzXcEeLRQ==", + "license": "MIT", + "dependencies": { + "detect-node-es": "^1.1.0", + "tslib": "^2.0.0" + }, + "engines": { + "node": ">=10" + }, + "peerDependencies": { + "@types/react": "*", + "react": "^16.8.0 || ^17.0.0 || ^18.0.0 || ^19.0.0 || ^19.0.0-rc" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + } + } + }, + "node_modules/util-deprecate": { + "version": "1.0.2", + "resolved": "https://registry.npmjs.org/util-deprecate/-/util-deprecate-1.0.2.tgz", + "integrity": "sha512-EPD5q1uXyFxJpCrLnCc1nHnq3gOa6DZBocAIiI2TaSCA7VCJ1UJDMagCzIkXNsUYfD1daK//LTEQ8xiIbrHtcw==", + "license": "MIT" + }, + "node_modules/uuid": { + "version": "8.3.2", + "resolved": "https://registry.npmjs.org/uuid/-/uuid-8.3.2.tgz", + "integrity": "sha512-+NYs2QeMWy+GWFOEm9xnn6HCDp0l7QBD7ml8zLUmJ+93Q5NF0NocErnwkTkXVFNiX3/fpC6afS8Dhb/gz7R7eg==", + "license": "MIT", + "bin": { + "uuid": "dist/bin/uuid" + } + }, + "node_modules/victory-vendor": { + "version": "36.9.2", + "resolved": "https://registry.npmjs.org/victory-vendor/-/victory-vendor-36.9.2.tgz", + "integrity": "sha512-PnpQQMuxlwYdocC8fIJqVXvkeViHYzotI+NJrCuav0ZYFoq912ZHBk3mCeuj+5/VpodOjPe1z0Fk2ihgzlXqjQ==", + "license": "MIT AND ISC", + "dependencies": { + "@types/d3-array": "^3.0.3", + "@types/d3-ease": "^3.0.0", + "@types/d3-interpolate": "^3.0.1", + "@types/d3-scale": "^4.0.2", + "@types/d3-shape": "^3.1.0", + "@types/d3-time": "^3.0.0", + "@types/d3-timer": "^3.0.0", + "d3-array": "^3.1.6", + "d3-ease": "^3.0.1", + "d3-interpolate": "^3.0.1", + "d3-scale": "^4.0.2", + "d3-shape": "^3.1.0", + "d3-time": "^3.0.0", + "d3-timer": "^3.0.1" + } + }, + "node_modules/vite": { + "version": "7.3.1", + "resolved": "https://registry.npmjs.org/vite/-/vite-7.3.1.tgz", + "integrity": 
"sha512-w+N7Hifpc3gRjZ63vYBXA56dvvRlNWRczTdmCBBa+CotUzAPf5b7YMdMR/8CQoeYE5LX3W4wj6RYTgonm1b9DA==", + "license": "MIT", + "dependencies": { + "esbuild": "^0.27.0", + "fdir": "^6.5.0", + "picomatch": "^4.0.3", + "postcss": "^8.5.6", + "rollup": "^4.43.0", + "tinyglobby": "^0.2.15" + }, + "bin": { + "vite": "bin/vite.js" + }, + "engines": { + "node": "^20.19.0 || >=22.12.0" + }, + "funding": { + "url": "https://github.com/vitejs/vite?sponsor=1" + }, + "optionalDependencies": { + "fsevents": "~2.3.3" + }, + "peerDependencies": { + "@types/node": "^20.19.0 || >=22.12.0", + "jiti": ">=1.21.0", + "less": "^4.0.0", + "lightningcss": "^1.21.0", + "sass": "^1.70.0", + "sass-embedded": "^1.70.0", + "stylus": ">=0.54.8", + "sugarss": "^5.0.0", + "terser": "^5.16.0", + "tsx": "^4.8.1", + "yaml": "^2.4.2" + }, + "peerDependenciesMeta": { + "@types/node": { + "optional": true + }, + "jiti": { + "optional": true + }, + "less": { + "optional": true + }, + "lightningcss": { + "optional": true + }, + "sass": { + "optional": true + }, + "sass-embedded": { + "optional": true + }, + "stylus": { + "optional": true + }, + "sugarss": { + "optional": true + }, + "terser": { + "optional": true + }, + "tsx": { + "optional": true + }, + "yaml": { + "optional": true + } + } + }, + "node_modules/which": { + "version": "2.0.2", + "resolved": "https://registry.npmjs.org/which/-/which-2.0.2.tgz", + "integrity": "sha512-BLI3Tl1TW3Pvl70l3yq3Y64i+awpwXqsGBYWkkqMtnbXgrMD+yj7rhW0kuEDxzJaYXGjEW5ogapKNMEKNMjibA==", + "dev": true, + "license": "ISC", + "dependencies": { + "isexe": "^2.0.0" + }, + "bin": { + "node-which": "bin/node-which" + }, + "engines": { + "node": ">= 8" + } + }, + "node_modules/word-wrap": { + "version": "1.2.5", + "resolved": "https://registry.npmjs.org/word-wrap/-/word-wrap-1.2.5.tgz", + "integrity": "sha512-BN22B5eaMMI9UMtjrGd5g5eCYPpCPDUy0FJXbYsaT5zYxjFOckS53SQDE3pWkVoWpHXVb3BrYcEN4Twa55B5cA==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=0.10.0" + } + }, + 
"node_modules/wrappy": { + "version": "1.0.2", + "resolved": "https://registry.npmjs.org/wrappy/-/wrappy-1.0.2.tgz", + "integrity": "sha512-l4Sp/DRseor9wL6EvV2+TuQn63dMkPjZ/sp9XkghTEbV9KlPS1xUsZ3u7/IQO4wxtcFB4bgpQPRcR3QCvezPcQ==", + "license": "ISC" + }, + "node_modules/yallist": { + "version": "3.1.1", + "resolved": "https://registry.npmjs.org/yallist/-/yallist-3.1.1.tgz", + "integrity": "sha512-a4UGQaWPH59mOXUYnAG2ewncQS4i4F43Tv3JoAM+s2VDAmS9NsK8GpDMLrCHPksFT7h3K6TOoUNn2pb7RoXx4g==", + "dev": true, + "license": "ISC" + }, + "node_modules/yocto-queue": { + "version": "0.1.0", + "resolved": "https://registry.npmjs.org/yocto-queue/-/yocto-queue-0.1.0.tgz", + "integrity": "sha512-rVksvsnNCdJ/ohGc6xgPwyN8eheCxsiLM8mxuE/t/mOVqJewPuO1miLpTHQiRgTKCLexL4MeAFVagts7HmNZ2Q==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=10" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, + "node_modules/zod": { + "version": "3.24.4", + "resolved": "https://registry.npmjs.org/zod/-/zod-3.24.4.tgz", + "integrity": "sha512-OdqJE9UDRPwWsrHjLN2F8bPxvwJBK22EHLWtanu0LSYr5YqzsaaW3RMgmjwr8Rypg5k+meEJdSPXJZXE/yqOMg==", + "license": "MIT", + "funding": { + "url": "https://github.com/sponsors/colinhacks" + } + }, + "node_modules/zod-to-json-schema": { + "version": "3.24.5", + "resolved": "https://registry.npmjs.org/zod-to-json-schema/-/zod-to-json-schema-3.24.5.tgz", + "integrity": "sha512-/AuWwMP+YqiPbsJx5D6TfgRTc4kTLjsh5SOcd4bLsfUg2RcEXrFMJl1DGgdHy2aCfsIA/cr/1JM0xcB2GZji8g==", + "license": "ISC", + "peerDependencies": { + "zod": "^3.24.1" + } + }, + "node_modules/zustand": { + "version": "5.0.12", + "resolved": "https://registry.npmjs.org/zustand/-/zustand-5.0.12.tgz", + "integrity": "sha512-i77ae3aZq4dhMlRhJVCYgMLKuSiZAaUPAct2AksxQ+gOtimhGMdXljRT21P5BNpeT4kXlLIckvkPM029OljD7g==", + "license": "MIT", + "engines": { + "node": ">=12.20.0" + }, + "peerDependencies": { + "@types/react": ">=18.0.0", + "immer": ">=9.0.6", + "react": ">=18.0.0", + 
"use-sync-external-store": ">=1.2.0" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + }, + "immer": { + "optional": true + }, + "react": { + "optional": true + }, + "use-sync-external-store": { + "optional": true + } + } + } + } +} diff --git a/samples/mcs-finance-statement-agent/src/code-app/package.json b/samples/mcs-finance-statement-agent/src/code-app/package.json new file mode 100644 index 000000000..ee6a1bb9f --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/code-app/package.json @@ -0,0 +1,60 @@ +{ + "name": "power-apps-template-starter", + "private": true, + "version": "0.0.0", + "type": "module", + "scripts": { + "dev": "vite", + "build": "tsc -b && vite build", + "lint": "eslint .", + "preview": "vite preview" + }, + "dependencies": { + "@microsoft/power-apps": "^1.0.3", + "@radix-ui/react-checkbox": "^1.3.3", + "@radix-ui/react-dialog": "^1.1.15", + "@radix-ui/react-dropdown-menu": "^2.1.16", + "@radix-ui/react-label": "^2.1.7", + "@radix-ui/react-popover": "^1.1.15", + "@radix-ui/react-progress": "^1.1.7", + "@radix-ui/react-select": "^2.2.6", + "@radix-ui/react-separator": "^1.1.7", + "@radix-ui/react-slot": "^1.2.3", + "@radix-ui/react-tabs": "^1.1.13", + "@radix-ui/react-tooltip": "^1.2.8", + "@tailwindcss/vite": "^4.1.16", + "@tanstack/react-query": "^5.90.5", + "@tanstack/react-table": "^8.21.3", + "class-variance-authority": "^0.7.1", + "clsx": "^2.1.1", + "cmdk": "^1.1.1", + "date-fns": "^4.1.0", + "lucide-react": "^0.546.0", + "react": "^19.1.1", + "react-day-picker": "^9.11.1", + "react-dom": "^19.1.1", + "react-router-dom": "^7.9.4", + "recharts": "^2.15.4", + "sonner": "^2.0.7", + "tailwind-merge": "^3.3.1", + "tailwindcss": "^4.1.16", + "zustand": "^5.0.10" + }, + "devDependencies": { + "@eslint/js": "^9.36.0", + "@microsoft/power-apps-vite": "^1.0.2", + "@types/node": "^24.6.0", + "@types/react": "^19.1.16", + "@types/react-dom": "^19.1.9", + "@vitejs/plugin-react": "^5.0.4", + "eslint": "^9.36.0", + 
"eslint-plugin-react-hooks": "^5.2.0", + "eslint-plugin-react-refresh": "^0.4.22", + "globals": "^16.4.0", + "picocolors": "^1.1.1", + "tw-animate-css": "^1.4.0", + "typescript": "~5.9.3", + "typescript-eslint": "^8.45.0", + "vite": "^7.1.7" + } +} diff --git a/samples/mcs-finance-statement-agent/src/code-app/public/power-apps.svg b/samples/mcs-finance-statement-agent/src/code-app/public/power-apps.svg new file mode 100644 index 000000000..45685924d --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/code-app/public/power-apps.svg @@ -0,0 +1,55 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/samples/mcs-finance-statement-agent/src/code-app/src/App.tsx b/samples/mcs-finance-statement-agent/src/code-app/src/App.tsx new file mode 100644 index 000000000..45aefb915 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/code-app/src/App.tsx @@ -0,0 +1,14 @@ +import { SonnerProvider } from "@/providers/sonner-provider" +import { QueryProvider } from "./providers/query-provider" +import { RouterProvider } from "react-router-dom" +import { router } from "@/router" + +export default function App() { + return ( + + + + + + ) +} diff --git a/samples/mcs-finance-statement-agent/src/code-app/src/assets/react.svg b/samples/mcs-finance-statement-agent/src/code-app/src/assets/react.svg new file mode 100644 index 000000000..6c87de9bb --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/code-app/src/assets/react.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/samples/mcs-finance-statement-agent/src/code-app/src/components/mode-toggle.tsx b/samples/mcs-finance-statement-agent/src/code-app/src/components/mode-toggle.tsx new file mode 100644 index 000000000..ca82ad5f0 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/code-app/src/components/mode-toggle.tsx @@ -0,0 +1,37 @@ +import { Moon, Sun } from "lucide-react" + +import { Button } from "@/components/ui/button" 
+import { + DropdownMenu, + DropdownMenuContent, + DropdownMenuItem, + DropdownMenuTrigger, +} from "@/components/ui/dropdown-menu" +import { useTheme } from "@/hooks/use-theme" + +export function ModeToggle() { + const { setTheme } = useTheme() + + return ( + + + + + + setTheme("light")}> + Light + + setTheme("dark")}> + Dark + + setTheme("system")}> + System + + + + ) +} \ No newline at end of file diff --git a/samples/mcs-finance-statement-agent/src/code-app/src/components/review/approve-button.tsx b/samples/mcs-finance-statement-agent/src/code-app/src/components/review/approve-button.tsx new file mode 100644 index 000000000..bee0f2899 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/code-app/src/components/review/approve-button.tsx @@ -0,0 +1,111 @@ +import { useState } from "react"; +import { CheckCircle2, Loader2 } from "lucide-react"; +import { updateJobStatus } from "@/services/dataverse-service"; + +interface ApproveButtonProps { + jobRecordId: string; + jobId: string; + onApproved: () => void; + correctionCount: number; + fxTargetCurrency?: string | null; + fxSpotRate?: number | null; + fxAvgRate?: number | null; + fxRateDate?: string; +} + +export function ApproveButton({ + jobRecordId, + onApproved, + correctionCount, +}: ApproveButtonProps) { + const [applyLoading, setApplyLoading] = useState(false); + const [appliedCount, setAppliedCount] = useState(0); + const [approveLoading, setApproveLoading] = useState(false); + const [status, setStatus] = useState<"idle" | "approved">("idle"); + + const pendingCorrections = correctionCount - appliedCount; + const hasUnapplied = pendingCorrections > 0; + + const handleApplyCorrections = async () => { + setApplyLoading(true); + try { + // Corrections are already auto-saved per cell to Dataverse. + // This gives visual confirmation that all are persisted. + // A small delay to represent the bulk confirmation. 
+ await new Promise((r) => setTimeout(r, 400)); + setAppliedCount(correctionCount); + } catch (err) { + console.error("Failed to apply corrections:", err); + } finally { + setApplyLoading(false); + } + }; + + const handleApproveAndGenerate = async () => { + setApproveLoading(true); + try { + // 1. Mark job as Approved in Dataverse + await updateJobStatus(jobRecordId, "Approved"); + setStatus("approved"); + onApproved(); + + // Excel generation is now handled via agent chat after approval + } catch (err) { + console.error("Failed to approve:", err); + } finally { + setApproveLoading(false); + } + }; + + if (status === "approved") { + return ( +
+
+ + Approved +
+ + Return to agent chat and say "generate Excel" + +
+ ); + } + + return ( +
+ {/* Apply Corrections button — always visible, disabled when nothing to apply */} + + + {/* Approve & Generate Excel button */} + +
+ ); +} diff --git a/samples/mcs-finance-statement-agent/src/code-app/src/components/review/editable-cell.tsx b/samples/mcs-finance-statement-agent/src/code-app/src/components/review/editable-cell.tsx new file mode 100644 index 000000000..bfb3e63d5 --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/code-app/src/components/review/editable-cell.tsx @@ -0,0 +1,97 @@ +import { useState, useRef, useEffect } from "react"; + +interface EditableCellProps { + value: string | null; + correctedValue: string | null; + confidence: number | null; + flagReason?: string; + onSave: (newValue: string) => void; +} + +export function EditableCell({ value, correctedValue, confidence, flagReason, onSave }: EditableCellProps) { + const [editing, setEditing] = useState(false); + const [draft, setDraft] = useState(correctedValue || value || ""); + const inputRef = useRef(null); + + useEffect(() => { + if (editing && inputRef.current) { + inputRef.current.focus(); + inputRef.current.select(); + } + }, [editing]); + + // Reset draft when props change (e.g., unit switch) + useEffect(() => { + if (!editing) setDraft(correctedValue || value || ""); + }, [correctedValue, value]); + + const displayValue = correctedValue || value || ""; + const isCorrected = correctedValue !== null && correctedValue !== value; + + let bgClass = ""; + if (confidence !== null) { + if (confidence < 0.5) bgClass = "bg-red-50"; + else if (confidence < 0.7) bgClass = "bg-orange-50"; + else if (confidence < 0.85) bgClass = "bg-yellow-50"; + } + + if (editing) { + return ( + setDraft(e.target.value)} + onBlur={() => { + setEditing(false); + if (draft !== (correctedValue || value || "")) { + onSave(draft); + } + }} + onKeyDown={(e) => { + if (e.key === "Enter") { + setEditing(false); + if (draft !== (correctedValue || value || "")) { + onSave(draft); + } + } + if (e.key === "Escape") { + setEditing(false); + setDraft(correctedValue || value || ""); + } + }} + className="w-full px-1.5 py-0.5 text-sm border 
border-[#6B4EE6] rounded outline-none bg-white" + /> + ); + } + + // Determine flag badge text and color + let badgeText: string | null = null; + let badgeColor = "#d97706"; // amber-600 default + + if (isCorrected) { + badgeText = "\u2713 Resolved"; + badgeColor = "#059669"; // emerald-600 + } else if (flagReason) { + badgeText = flagReason; + } + + return ( +
setEditing(true)} + className={`px-1.5 py-0.5 text-sm cursor-pointer rounded ${bgClass} ${ + isCorrected ? "text-[#6B4EE6] font-medium" : "text-[#1a1a2e]" + } hover:ring-1 hover:ring-[#6B4EE6]`} + title={confidence !== null ? `Confidence: ${(confidence * 100).toFixed(0)}%` : undefined} + > + {displayValue || "\u2014"} + {badgeText && ( +
+ {badgeText} +
+ )} +
+ ); +} diff --git a/samples/mcs-finance-statement-agent/src/code-app/src/components/review/line-item-table.tsx b/samples/mcs-finance-statement-agent/src/code-app/src/components/review/line-item-table.tsx new file mode 100644 index 000000000..b4a9e3a8c --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/code-app/src/components/review/line-item-table.tsx @@ -0,0 +1,310 @@ +import { useMemo } from "react"; +import { + useReactTable, + getCoreRowModel, + flexRender, + createColumnHelper, +} from "@tanstack/react-table"; +import { AlertTriangle } from "lucide-react"; +import { EditableCell } from "./editable-cell"; +import { useAppStore } from "@/store/app-store"; +import { updateLineItemCorrection } from "@/services/dataverse-service"; +import type { ReviewLineItem } from "@/types/extraction"; + +interface PivotedRow { + row: ReviewLineItem; + values: Map; + isFlagged: boolean; +} + +const columnHelper = createColumnHelper(); + +const UNIT_DIVISORS: Record = { + ones: 1, thousands: 1_000, millions: 1_000_000, billions: 1_000_000_000, +}; + +interface LineItemTableProps { + items: ReviewLineItem[]; + fxRate: number | null; // The rate to apply (spot for BS, avg for IS/CF) + fxTargetCurrency: string | null; + displayUnit?: string; // "ones" | "thousands" | "millions" | "billions" +} + +/** Format period header: "Quarterly Q4 2025" / "Annual FY 2025" / raw label */ +function formatPeriodHeader(label: string, periodType: string): string { + if (!periodType || !label) return label; + const pt = periodType.toLowerCase(); + if (pt === "quarterly" || pt === "quarter") { + return label.match(/Q\d/i) ? 
`Quarterly ${label}` : `Quarterly ${label}`; + } + if (pt === "annual" || pt === "year") { + return `Annual FY ${label}`; + } + return label; +} + +/** Compute delta percentage between two numeric values */ +function computeDelta(current: number | null, prior: number | null): string | null { + if (current === null || prior === null || prior === 0) return null; + const pct = ((current - prior) / Math.abs(prior)) * 100; + const sign = pct >= 0 ? "+" : ""; + return `${sign}${pct.toFixed(1)}%`; +} + +interface LineItemTablePropsWithSource extends LineItemTableProps { + sourceUnit?: string; // The unit Dataverse values are actually stored in (from job.currencyUnit) +} + +export function LineItemTable({ items, fxRate, fxTargetCurrency, displayUnit = "ones", sourceUnit = "ones" }: LineItemTablePropsWithSource) { + // Dataverse values are in sourceUnit. To display in targetUnit: + // displayValue = dataverseValue × (sourceDivisor / targetDivisor) + const sourceDivisor = UNIT_DIVISORS[sourceUnit] || 1; + const targetDivisor = UNIT_DIVISORS[displayUnit] || 1; + const unitScaleFactor = sourceDivisor / targetDivisor; + const updateLineItem = useAppStore((s) => s.updateLineItem); + + // Get unique periods — deduplicate by period label (not columnIndex) + const periods = useMemo(() => { + const seen = new Map(); + items.forEach((item) => { + const key = `${item.periodType}|${item.period}`; + if (!seen.has(key)) { + seen.set(key, { colIdx: item.columnIndex, label: item.period, periodType: item.periodType }); + } + }); + return Array.from(seen.values()).sort((a, b) => a.colIdx - b.colIdx); + }, [items]); + + // Delta columns disabled — shown only in Excel if needed + const deltaPairs: { currentIdx: number; priorIdx: number; label: string }[] = []; + + // Pivot: group by rowIndex, spread periods as columns + const pivotedRows = useMemo(() => { + const grouped = new Map(); + items.forEach((item) => { + if (!grouped.has(item.rowIndex)) { + const isFlagged = item.reviewStatus === 
"Flagged" || (item.aiConfidence !== null && item.aiConfidence < 0.7); + grouped.set(item.rowIndex, { row: item, values: new Map(), isFlagged }); + } + const entry = grouped.get(item.rowIndex)!; + entry.values.set(item.columnIndex, item); + // A row is flagged if ANY cell in it is flagged + if (item.reviewStatus === "Flagged" || (item.aiConfidence !== null && item.aiConfidence < 0.7)) { + entry.isFlagged = true; + } + }); + return Array.from(grouped.values()).sort((a, b) => a.row.rowIndex - b.row.rowIndex); + }, [items]); + + const columns = useMemo(() => { + const cols: any[] = [ + columnHelper.display({ + id: "label", + header: "Line Item", + cell: ({ row }) => { + const item = row.original.row; + const indent = item.indentLevel * 16; + const isHeader = item.rowType === "SectionHeader"; + const isTotal = item.rowType === "Total" || item.rowType === "Subtotal"; + const isFlagged = row.original.isFlagged; + return ( +
+ {isFlagged && !isHeader && ( + + )} + {item.lineItemName || item.labelRaw} +
+ ); + }, + size: 300, + }), + ]; + + // Add period columns with delta columns interleaved + let deltaIdx = 0; + periods.forEach((p) => { + // Period value column + cols.push( + columnHelper.display({ + id: `period_${p.colIdx}`, + header: fxTargetCurrency + ? `${formatPeriodHeader(p.label, p.periodType)} (${fxTargetCurrency})` + : formatPeriodHeader(p.label, p.periodType), + cell: ({ row }) => { + const valueItem = row.original.values.get(p.colIdx); + if (!valueItem) return
{"\u2014"}
; + + // For display: use valueRaw (full precision) when showing in original unit, + // use valueNormalized × scaleFactor when user changes unit + const rawNumeric = valueItem.valueRaw ? parseFloat(valueItem.valueRaw.replace(/,/g, "")) : null; + const scaledValue = displayUnit === "ones" && rawNumeric !== null && !isNaN(rawNumeric) + ? rawNumeric // Full precision from original extraction + : valueItem.valueNormalized !== null + ? valueItem.valueNormalized * unitScaleFactor + : null; + const showConverted = fxRate && fxTargetCurrency && scaledValue !== null; + const convertedValue = showConverted ? scaledValue! * fxRate : null; + + // Compute flag reason for this cell + let flagReason: string | undefined; + if (valueItem.aiConfidence !== null && valueItem.aiConfidence < 0.5) { + flagReason = `\u26A0 Low confidence: ${Math.round(valueItem.aiConfidence * 100)}%`; + } else if (valueItem.aiConfidence !== null && valueItem.aiConfidence < 0.7) { + flagReason = `\u26A0 Confidence: ${Math.round(valueItem.aiConfidence * 100)}%`; + } else if ( + (valueItem.valueRaw === null || valueItem.valueRaw === "") && + valueItem.rowType === "LineItem" + ) { + flagReason = "\u26A0 Value expected"; + } + + return ( +
+ {convertedValue !== null ? ( + <> + {/* Show converted value as primary */} +
+ {convertedValue.toLocaleString(undefined, { minimumFractionDigits: 2, maximumFractionDigits: 2 })} +
+ {/* Original value in small gray text */} +
+ {valueItem.valueRaw || "\u2014"} +
+ + ) : ( + { + // Reverse-scale: user edits in display unit, save in Dataverse unit + const numericVal = parseFloat(newValue.replace(/,/g, "")); + const dataverseUnitVal = !isNaN(numericVal) + ? (numericVal / unitScaleFactor).toString() + : newValue; + updateLineItem(valueItem.recordId, { + analystCorrectedValue: dataverseUnitVal, + reviewStatus: "Corrected", + }); + updateLineItemCorrection(valueItem.recordId, dataverseUnitVal).catch(console.error); + }} + /> + )} +
+ ); + }, + size: 140, + }) + ); + + // Check if this period starts a delta pair + const pair = deltaPairs[deltaIdx]; + if (pair && pair.currentIdx === p.colIdx) { + cols.push( + columnHelper.display({ + id: `delta_${pair.currentIdx}_${pair.priorIdx}`, + header: pair.label, + cell: ({ row }) => { + if (row.original.row.rowType === "SectionHeader") return null; + const currItem = row.original.values.get(pair.currentIdx); + const priorItem = row.original.values.get(pair.priorIdx); + const delta = computeDelta( + currItem?.valueNormalized ?? null, + priorItem?.valueNormalized ?? null, + ); + if (!delta) return
{"\u2014"}
; + const isPositive = delta.startsWith("+"); + const isLarge = Math.abs(parseFloat(delta)) > 100; + return ( +
+ {delta} +
+ ); + }, + size: 130, + }) + ); + deltaIdx++; + } + }); + + return cols; + }, [periods, deltaPairs, updateLineItem, fxRate, fxTargetCurrency, unitScaleFactor, displayUnit]); + + const table = useReactTable({ + data: pivotedRows, + columns, + getCoreRowModel: getCoreRowModel(), + }); + + if (items.length === 0) { + return ( +
+ No line items for this statement. +
+ ); + } + + return ( +
+
+ + {table.getHeaderGroups().map((headerGroup) => ( + + {headerGroup.headers.map((header) => ( + + ))} + + ))} + + + {table.getRowModel().rows.map((row) => { + const isHeader = row.original.row.rowType === "SectionHeader"; + const isFlagged = row.original.isFlagged; + return ( + + {row.getVisibleCells().map((cell) => ( + + ))} + + ); + })} + +
+ {flexRender(header.column.columnDef.header, header.getContext())} +
+ {flexRender(cell.column.columnDef.cell, cell.getContext())} +
+ + ); +} diff --git a/samples/mcs-finance-statement-agent/src/code-app/src/components/review/statement-tabs.tsx b/samples/mcs-finance-statement-agent/src/code-app/src/components/review/statement-tabs.tsx new file mode 100644 index 000000000..32e993b2f --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/code-app/src/components/review/statement-tabs.tsx @@ -0,0 +1,44 @@ +import type { ReviewStatement } from "@/types/extraction"; + +interface StatementTabsProps { + statements: ReviewStatement[]; + activeTab: string; + onTabChange: (statementType: string) => void; + itemCounts: Record; + flagCounts: Record; +} + +export function StatementTabs({ statements, activeTab, onTabChange, itemCounts, flagCounts }: StatementTabsProps) { + const TAB_LABELS: Record = { + IncomeStatement: "Income Statement", + BalanceSheet: "Balance Sheet", + CashFlow: "Cash Flow", + }; + + return ( +
+ {statements.map((stmt) => { + const isActive = activeTab === stmt.statementType; + const count = itemCounts[stmt.statementType] || 0; + const flags = flagCounts[stmt.statementType] || 0; + return ( + + ); + })} +
+ ); +} diff --git a/samples/mcs-finance-statement-agent/src/code-app/src/components/review/summary-bar.tsx b/samples/mcs-finance-statement-agent/src/code-app/src/components/review/summary-bar.tsx new file mode 100644 index 000000000..ac08f7d5a --- /dev/null +++ b/samples/mcs-finance-statement-agent/src/code-app/src/components/review/summary-bar.tsx @@ -0,0 +1,285 @@ +import { useState } from "react"; +import type { ReviewJob, ReviewLineItem, FxConversionState } from "@/types/extraction"; + +interface SummaryBarProps { + job: ReviewJob; + lineItems: ReviewLineItem[]; + onJumpToFlags?: () => void; + onJumpToNextUnresolved?: () => void; + fx: FxConversionState; + onFxCurrencyChange: (currency: string | null) => void; + onFxRateChange: (spotRate: number, avgRate: number) => void; + displayUnit?: string; + onDisplayUnitChange?: (unit: string) => void; +} + +const CURRENCIES = [ + { code: "USD", symbol: "$", name: "US Dollar" }, + { code: "AUD", symbol: "A$", name: "Australian Dollar" }, + { code: "EUR", symbol: "\u20ac", name: "Euro" }, + { code: "GBP", symbol: "\u00a3", name: "British Pound" }, + { code: "HKD", symbol: "HK$", name: "Hong Kong Dollar" }, + { code: "SGD", symbol: "S$", name: "Singapore Dollar" }, + { code: "JPY", symbol: "\u00a5", name: "Japanese Yen" }, +]; + +const DISPLAY_UNITS = [ + { code: "ones", label: "Ones (as reported)" }, + { code: "thousands", label: "Thousands" }, + { code: "millions", label: "Millions" }, + { code: "billions", label: "Billions" }, +]; + +export function SummaryBar({ job, lineItems, onJumpToFlags, onJumpToNextUnresolved, fx, onFxCurrencyChange, onFxRateChange, displayUnit, onDisplayUnitChange }: SummaryBarProps) { + const [editingSpotRate, setEditingSpotRate] = useState(false); + const [editingAvgRate, setEditingAvgRate] = useState(false); + const [tempSpotRate, setTempSpotRate] = useState(""); + const [tempAvgRate, setTempAvgRate] = useState(""); + + const totalItems = lineItems.length; + + // A flagged item is one that 
was originally flagged OR has low confidence + const flaggedItems = lineItems.filter( + (i) => i.reviewStatus === "Flagged" || (i.aiConfidence !== null && i.aiConfidence < 0.7) + ); + const totalFlagged = flaggedItems.length; + + // Resolved = items that were flagged but now corrected or accepted + const resolvedCount = flaggedItems.filter( + (i) => i.reviewStatus === "Corrected" + ).length; + + // Still-unresolved flagged count (for the old display) + const unresolvedCount = totalFlagged - resolvedCount; + + // Breakdown of flag reasons + const lowConfidenceCount = lineItems.filter( + (i) => i.aiConfidence !== null && i.aiConfidence < 0.7 && i.reviewStatus !== "Corrected" + ).length; + + const correctedCount = lineItems.filter((i) => i.reviewStatus === "Corrected").length; + const itemsWithConf = lineItems.filter((i) => i.aiConfidence !== null); + const avgConf = itemsWithConf.length > 0 + ? itemsWithConf.reduce((sum, i) => sum + i.aiConfidence!, 0) / itemsWithConf.length + : 0; + const confPct = Math.round(avgConf * 100); + const confColor = confPct >= 85 ? "#22c55e" : confPct >= 60 ? 
"#f59e0b" : "#dc2626"; + + const allResolved = totalFlagged > 0 && resolvedCount >= totalFlagged; + + // Don't show source currency in the dropdown + const availableCurrencies = CURRENCIES.filter((c) => c.code !== job.currency); + + function handleSpotRateSave() { + const val = parseFloat(tempSpotRate); + if (!isNaN(val) && val > 0) { + onFxRateChange(val, fx.avgRate || val); + } + setEditingSpotRate(false); + } + + function handleAvgRateSave() { + const val = parseFloat(tempAvgRate); + if (!isNaN(val) && val > 0) { + onFxRateChange(fx.spotRate || val, val); + } + setEditingAvgRate(false); + } + + // Build progress bar string (visual block representation) + function renderProgressBlocks() { + const totalBlocks = 10; + const filledBlocks = Math.round((resolvedCount / Math.max(totalFlagged, 1)) * totalBlocks); + const filled = "\u2588".repeat(filledBlocks); + const empty = "\u2591".repeat(totalBlocks - filledBlocks); + return `${filled}${empty}`; + } + + return ( +
+ {/* Main header row */} +
+ {/* Company + periods */} +
+ {job.companyName} + + Periods: {job.periods} · Statements: {job.statementsFound} + {job.currency && <> · {job.currency} {job.currencyUnit && `(${job.currencyUnit})`}} + +
+ + {/* Confidence bar */} +
+
+
+
+ + {confPct}% + +
+ + {/* Counts */} +
+ {totalItems} items + · + + · + {correctedCount} corrected +
+ + {/* Status badge */} + + {job.status} + +
+ + {/* Unit + FX Conversion bar */} +
+ {/* Display unit selector */} + Unit: + + + | + + FX Conversion: + + {/* Currency selector */} + + + {/* Rates (shown only when conversion active) */} + {fx.targetCurrency && ( + <> + {fx.isLoading ? ( + Fetching rates... + ) : ( + <> + {/* Spot rate (BS) */} +
+ Spot (BS): + {editingSpotRate ? ( + setTempSpotRate(e.target.value)} + onBlur={handleSpotRateSave} + onKeyDown={(e) => e.key === "Enter" && handleSpotRateSave()} + autoFocus + className="w-20 border border-[#6B4EE6] rounded px-1 py-0.5 text-xs focus:outline-none" + /> + ) : ( + + )} +
+ + {/* Average rate (IS/CF) */} +
+ Avg (IS/CF): + {editingAvgRate ? ( + setTempAvgRate(e.target.value)} + onBlur={handleAvgRateSave} + onKeyDown={(e) => e.key === "Enter" && handleAvgRateSave()} + autoFocus + className="w-20 border border-[#6B4EE6] rounded px-1 py-0.5 text-xs focus:outline-none" + /> + ) : ( + + )} +
+ + + as at {fx.rateDate} · {fx.rateSource} + + + )} + + )} +
+ + {/* Alert / Progress banner */} + {totalFlagged > 0 && ( +
+ {allResolved ? ( +
+ {"\u2713"} All items reviewed + Ready to approve. +
+ ) : ( +
+ {/* Progress bar */} + 0 ? "#d97706" : "#059669" }}> + {renderProgressBlocks()} {resolvedCount}/{totalFlagged} + + + Review Progress: {resolvedCount} of {totalFlagged} resolved + + {/* Breakdown */} + {lowConfidenceCount > 0 && ( + <> + · + {lowConfidenceCount} low confidence + + )} + {/* Jump to next unresolved */} + +
+ )} +
+ )} +
+ );
+}
diff --git a/samples/mcs-finance-statement-agent/src/code-app/src/components/ui/badge.tsx b/samples/mcs-finance-statement-agent/src/code-app/src/components/ui/badge.tsx
new file mode 100644
index 000000000..fd3a406ba
--- /dev/null
+++ b/samples/mcs-finance-statement-agent/src/code-app/src/components/ui/badge.tsx
@@ -0,0 +1,46 @@
+import * as React from "react"
+import { Slot } from "@radix-ui/react-slot"
+import { cva, type VariantProps } from "class-variance-authority"
+
+import { cn } from "@/lib/utils"
+
+const badgeVariants = cva(
+  "inline-flex items-center justify-center rounded-full border px-2 py-0.5 text-xs font-medium w-fit whitespace-nowrap shrink-0 [&>svg]:size-3 gap-1 [&>svg]:pointer-events-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive transition-[color,box-shadow] overflow-hidden",
+  {
+    variants: {
+      variant: {
+        default:
+          "border-transparent bg-primary text-primary-foreground [a&]:hover:bg-primary/90",
+        secondary:
+          "border-transparent bg-secondary text-secondary-foreground [a&]:hover:bg-secondary/90",
+        destructive:
+          "border-transparent bg-destructive text-white [a&]:hover:bg-destructive/90 focus-visible:ring-destructive/20 dark:focus-visible:ring-destructive/40 dark:bg-destructive/60",
+        outline:
+          "text-foreground [a&]:hover:bg-accent [a&]:hover:text-accent-foreground",
+      },
+    },
+    defaultVariants: {
+      variant: "default",
+    },
+  }
+)
+
+function Badge({
+  className,
+  variant,
+  asChild = false,
+  ...props
+}: React.ComponentProps<"span"> &
+  VariantProps<typeof badgeVariants> & { asChild?: boolean }) {
+  const Comp = asChild ? Slot : "span"
+
+  return (
+    <Comp
+      data-slot="badge"
+      className={cn(badgeVariants({ variant }), className)}
+      {...props}
+    />
+  )
+}
+
+export { Badge, badgeVariants }
diff --git a/samples/mcs-finance-statement-agent/src/code-app/src/components/ui/button.tsx b/samples/mcs-finance-statement-agent/src/code-app/src/components/ui/button.tsx
new file mode 100644
index 000000000..21409a066
--- /dev/null
+++ b/samples/mcs-finance-statement-agent/src/code-app/src/components/ui/button.tsx
@@ -0,0 +1,60 @@
+import * as React from "react"
+import { Slot } from "@radix-ui/react-slot"
+import { cva, type VariantProps } from "class-variance-authority"
+
+import { cn } from "@/lib/utils"
+
+const buttonVariants = cva(
+  "inline-flex items-center justify-center gap-2 whitespace-nowrap rounded-md text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none [&_svg:not([class*='size-'])]:size-4 shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive",
+  {
+    variants: {
+      variant: {
+        default: "bg-primary text-primary-foreground hover:bg-primary/90",
+        destructive:
+          "bg-destructive text-white hover:bg-destructive/90 focus-visible:ring-destructive/20 dark:focus-visible:ring-destructive/40 dark:bg-destructive/60",
+        outline:
+          "border bg-background shadow-xs hover:bg-accent hover:text-accent-foreground dark:bg-input/30 dark:border-input dark:hover:bg-input/50",
+        secondary:
+          "bg-secondary text-secondary-foreground hover:bg-secondary/80",
+        ghost:
+          "hover:bg-accent hover:text-accent-foreground dark:hover:bg-accent/50",
+        link: "text-primary underline-offset-4 hover:underline",
+      },
+      size: {
+        default: "h-9 px-4 py-2 has-[>svg]:px-3",
+        sm: "h-8 rounded-md gap-1.5 px-3 has-[>svg]:px-2.5",
+        lg: "h-10 rounded-md px-6 has-[>svg]:px-4",
+        icon: "size-9",
+        "icon-sm": "size-8",
+        "icon-lg": "size-10",
+      },
+    },
+    defaultVariants: {
+      variant: "default",
+      size: "default",
+    },
+  }
+)
+
+function Button({
+  className,
+  variant,
+  size,
+  asChild = false,
+  ...props
+}: React.ComponentProps<"button"> &
+  VariantProps<typeof buttonVariants> & {
+    asChild?: boolean
+  }) {
+  const Comp = asChild ? Slot : "button"
+
+  return (
+    <Comp
+      data-slot="button"
+      className={cn(buttonVariants({ variant, size, className }))}
+      {...props}
+    />
+  )
+}
+
+export { Button, buttonVariants }
diff --git a/samples/mcs-finance-statement-agent/src/code-app/src/components/ui/calendar.tsx b/samples/mcs-finance-statement-agent/src/code-app/src/components/ui/calendar.tsx
new file mode 100644
index 000000000..5f62ae943
--- /dev/null
+++ b/samples/mcs-finance-statement-agent/src/code-app/src/components/ui/calendar.tsx
@@ -0,0 +1,211 @@
+import * as React from "react"
+import {
+  ChevronDownIcon,
+  ChevronLeftIcon,
+  ChevronRightIcon,
+} from "lucide-react"
+import { DayButton, DayPicker, getDefaultClassNames } from "react-day-picker"
+
+import { cn } from "@/lib/utils"
+import { Button, buttonVariants } from "@/components/ui/button"
+
+function Calendar({
+  className,
+  classNames,
+  showOutsideDays = true,
+  captionLayout = "label",
+  buttonVariant = "ghost",
+  formatters,
+  components,
+  ...props
+}: React.ComponentProps<typeof DayPicker> & {
+  buttonVariant?: React.ComponentProps<typeof Button>["variant"]
+}) {
+  const defaultClassNames = getDefaultClassNames()
+
+  return (
+    <DayPicker
+      showOutsideDays={showOutsideDays}
+      className={cn(
+        "bg-background group/calendar p-3 [--cell-size:--spacing(8)] [[data-slot=card-content]_&]:bg-transparent [[data-slot=popover-content]_&]:bg-transparent",
+        String.raw`rtl:**:[.rdp-button\_next>svg]:rotate-180`,
+        String.raw`rtl:**:[.rdp-button\_previous>svg]:rotate-180`,
+        className
+      )}
+      captionLayout={captionLayout}
+      formatters={{
+        formatMonthDropdown: (date) =>
+          date.toLocaleString("default", { month: "short" }),
+        ...formatters,
+      }}
+      classNames={{
+        root: cn("w-fit", defaultClassNames.root),
+        months: cn(
+          "flex gap-4 flex-col md:flex-row relative",
+          defaultClassNames.months
+        ),
+        month: cn("flex flex-col w-full gap-4", defaultClassNames.month),
+        nav: cn(
+          "flex items-center gap-1 w-full absolute top-0 inset-x-0 justify-between",
+          defaultClassNames.nav
+        ),
+        button_previous: cn(
+          buttonVariants({ variant: buttonVariant }),
+          "size-(--cell-size) aria-disabled:opacity-50 p-0 select-none",
+          defaultClassNames.button_previous
+        ),
+        button_next: cn(
+          buttonVariants({ variant: buttonVariant }),
+          "size-(--cell-size) aria-disabled:opacity-50 p-0 select-none",
+          defaultClassNames.button_next
+        ),
+        month_caption: cn(
+          "flex items-center justify-center h-(--cell-size) w-full px-(--cell-size)",
+          defaultClassNames.month_caption
+        ),
+        dropdowns: cn(
+          "w-full flex items-center text-sm font-medium justify-center h-(--cell-size) gap-1.5",
+          defaultClassNames.dropdowns
+        ),
+        dropdown_root: cn(
+          "relative has-focus:border-ring border border-input shadow-xs has-focus:ring-ring/50 has-focus:ring-[3px] rounded-md",
+          defaultClassNames.dropdown_root
+        ),
+        dropdown: cn(
+          "absolute bg-popover inset-0 opacity-0",
+          defaultClassNames.dropdown
+        ),
+        caption_label: cn(
+          "select-none font-medium",
+          captionLayout === "label"
+            ? "text-sm"
+            : "rounded-md pl-2 pr-1 flex items-center gap-1 text-sm h-8 [&>svg]:text-muted-foreground [&>svg]:size-3.5",
+          defaultClassNames.caption_label
+        ),
+        table: "w-full border-collapse",
+        weekdays: cn("flex", defaultClassNames.weekdays),
+        weekday: cn(
+          "text-muted-foreground rounded-md flex-1 font-normal text-[0.8rem] select-none",
+          defaultClassNames.weekday
+        ),
+        week: cn("flex w-full mt-2", defaultClassNames.week),
+        week_number_header: cn(
+          "select-none w-(--cell-size)",
+          defaultClassNames.week_number_header
+        ),
+        week_number: cn(
+          "text-[0.8rem] select-none text-muted-foreground",
+          defaultClassNames.week_number
+        ),
+        day: cn(
+          "relative w-full h-full p-0 text-center [&:first-child[data-selected=true]_button]:rounded-l-md [&:last-child[data-selected=true]_button]:rounded-r-md group/day aspect-square select-none",
+          defaultClassNames.day
+        ),
+        range_start: cn(
+          "rounded-l-md bg-accent",
+          defaultClassNames.range_start
+        ),
+        range_middle: cn("rounded-none", defaultClassNames.range_middle),
+        range_end: cn("rounded-r-md bg-accent", defaultClassNames.range_end),
+        today: cn(
+          "bg-accent text-accent-foreground rounded-md data-[selected=true]:rounded-none",
+          defaultClassNames.today
+        ),
+        outside: cn(
+          "text-muted-foreground aria-selected:text-muted-foreground",
+          defaultClassNames.outside
+        ),
+        disabled: cn(
+          "text-muted-foreground opacity-50",
+          defaultClassNames.disabled
+        ),
+        hidden: cn("invisible", defaultClassNames.hidden),
+        ...classNames,
+      }}
+      components={{
+        Root: ({ className, rootRef, ...props }) => {
+          return (
+            <div
+              data-slot="calendar"
+              ref={rootRef}
+              className={cn(className)}
+              {...props}
+            />
+          )
+        },
+        Chevron: ({ className, orientation, ...props }) => {
+          if (orientation === "left") {
+            return (
+              <ChevronLeftIcon className={cn("size-4", className)} {...props} />
+            )
+          }
+
+          if (orientation === "right") {
+            return (
+              <ChevronRightIcon
+                className={cn("size-4", className)}
+                {...props}
+              />
+            )
+          }
+
+          return (
+            <ChevronDownIcon className={cn("size-4", className)} {...props} />
+          )
+        },
+        DayButton: CalendarDayButton,
+        WeekNumber: ({ children, ...props }) => {
+          return (
+            <td {...props}>
+              <div className="flex size-(--cell-size) items-center justify-center text-center">
+                {children}
+              </div>
+            </td>
+          )
+        },
+        ...components,
+      }}
+      {...props}
+    />
+  )
+}
+
+function CalendarDayButton({
+  className,
+  day,
+  modifiers,
+  ...props
+}: React.ComponentProps<typeof DayButton>) {
+  const defaultClassNames = getDefaultClassNames()
+
+  const ref = React.useRef<HTMLButtonElement>(null)
+  React.useEffect(() => {
+    if (modifiers.focused) ref.current?.focus()
+  }, [modifiers.focused])
+
+  return (