@himanshu-josh

This PR adds confidence scoring to the table processing pipeline. Both cell-level and table-level confidence metrics are captured and exposed in the JSON output, giving consumers metadata about the quality and reliability of table extraction results.

  • Added confidence: float | None = None field to TableCell class. Enables storage of individual cell confidence scores from OCR/text extraction.

  • Added table_confidence: float = 0.0 field to BaseTable class. Enables storage of aggregated table-level confidence scores. Affects all table types: Table, TableOfContents, Form.

  • Added calculate_table_confidence() method. Calculates average confidence across all cells in a table

  • Added calculate_cell_confidence() method. Calculates average confidence for individual table cells

  • Updated assign_ocr_lines() method. Preserves confidence data from OCR results: {"text": t, "confidence": cell_text.confidence}

  • Added block-specific JSON output classes:

    1. JSONTableCellOutput: Contains only the confidence field
    2. JSONTableOutput: Contains only the table_confidence field

  • Updated extract_json() method. Implements block-specific confidence field inclusion:

    1. TableCell blocks → JSONTableCellOutput with confidence field
    2. Table blocks → JSONTableOutput with table_confidence field
    3. Other blocks → JSONBlockOutput with no confidence fields
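
A minimal sketch of how the pieces listed above fit together. It is not the code from this PR: it uses plain dataclasses rather than the project's own schema classes, the `text_lines` and `cells` attributes and the `block_to_json` helper are hypothetical names chosen for illustration, and the averaging rules (skip missing scores, `None` for an empty cell, `0.0` for an empty table) are assumptions derived from the field defaults described above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TableCell:
    # Hypothetical storage for OCR lines, shaped like {"text": t, "confidence": c}
    text_lines: list = field(default_factory=list)
    confidence: Optional[float] = None


@dataclass
class Table:
    # Hypothetical container; stands in for any BaseTable subclass
    cells: list = field(default_factory=list)
    table_confidence: float = 0.0


def calculate_cell_confidence(cell: TableCell) -> Optional[float]:
    """Average the confidence of the OCR lines in one cell (assumed rule)."""
    scores = [
        line["confidence"]
        for line in cell.text_lines
        if line.get("confidence") is not None
    ]
    return sum(scores) / len(scores) if scores else None


def calculate_table_confidence(table: Table) -> float:
    """Average the cell confidences, ignoring cells without a score."""
    scores = [c.confidence for c in table.cells if c.confidence is not None]
    return sum(scores) / len(scores) if scores else 0.0


def block_to_json(block) -> dict:
    """Emit a confidence field only for the block types that carry one."""
    if isinstance(block, TableCell):
        return {"block_type": "TableCell", "confidence": block.confidence}
    if isinstance(block, Table):
        return {"block_type": "Table", "table_confidence": block.table_confidence}
    return {"block_type": type(block).__name__}
```

In this sketch a cell with no OCR lines keeps `confidence=None` and does not pull the table average down; whether the PR treats empty cells the same way is not stated in the description.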

    This feature could be useful when surya is used together with a fallback model: a low confidence score can trigger a fallback to that model.
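
The fallback idea could look roughly like the sketch below. The function name `choose_table_result`, the `fallback_fn` callable, and the `0.8` threshold are all hypothetical; the only thing taken from this PR is the `table_confidence` key in the JSON output of a table block.

```python
from typing import Callable


def choose_table_result(
    primary: dict,
    fallback_fn: Callable[[], dict],
    threshold: float = 0.8,  # assumed cut-off; tune per dataset
) -> dict:
    """Keep the primary table output unless its confidence is below threshold."""
    # A missing score is treated as 0.0, so it always triggers the fallback.
    if primary.get("table_confidence", 0.0) >= threshold:
        return primary
    # Only invoke the (presumably more expensive) fallback model when needed.
    return fallback_fn()
```

Passing the fallback as a callable means the second model runs only for tables that actually fall below the threshold.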

@github-actions
Contributor

github-actions bot commented Sep 19, 2025

CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅

@himanshu-josh
Author

I have read the CLA Document and I hereby sign the CLA

github-actions bot added a commit that referenced this pull request Sep 19, 2025
@himanshu-josh changed the title from "table ocr confidence added" to "feat: table ocr confidence added" Sep 19, 2025