feat: table ocr confidence added #891

himanshu-josh · 2025-09-19T09:47:35Z

This PR adds confidence scoring capabilities to the table processing pipeline, enabling both cell-level and table-level confidence metrics to be captured and exposed in the JSON output. This enhancement provides valuable metadata about the quality and reliability of table extraction results.

Added a confidence: float | None = None field to the TableCell class. It stores the individual cell confidence score from OCR/text extraction.
Added a table_confidence: float = 0.0 field to the BaseTable class. It stores the aggregated table-level confidence score and affects all table types: Table, TableOfContents, and Form.
Added a calculate_table_confidence() method, which calculates the average confidence across all cells in a table.
Added a calculate_cell_confidence() method, which calculates the average confidence for an individual table cell.
Updated the assign_ocr_lines() method to preserve confidence data from OCR results: {"text": t, "confidence": cell_text.confidence}
Added block-specific JSON output classes:
1. JSONTableCellOutput: contains only the confidence field
2. JSONTableOutput: contains only the table_confidence field
Updated the extract_json() method so that confidence fields are included per block type:
TableCell blocks → JSONTableCellOutput with the confidence field
Table blocks → JSONTableOutput with the table_confidence field
Other blocks → JSONBlockOutput with no confidence fields
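
The changes listed above can be illustrated with a rough, standalone sketch. It does not reproduce the code in this PR: the dataclasses stand in for the real block classes, to_json() is a hypothetical helper standing in for extract_json(), and skipping cells with no score (and falling back to 0.0 for an empty table) is an assumption about how missing values are handled.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TableCell:
    # Each entry mirrors the {"text": ..., "confidence": ...} shape kept by assign_ocr_lines().
    text_lines: list[dict] = field(default_factory=list)
    confidence: float | None = None


@dataclass
class BaseTable:
    cells: list[TableCell] = field(default_factory=list)
    table_confidence: float = 0.0


def calculate_cell_confidence(cell: TableCell) -> float | None:
    """Average the per-line OCR confidences of one cell (None if no line has a score)."""
    scores = [
        line["confidence"]
        for line in cell.text_lines
        if line.get("confidence") is not None
    ]
    return sum(scores) / len(scores) if scores else None


def calculate_table_confidence(table: BaseTable) -> float:
    """Average the cell confidences; cells without a score are skipped (assumption)."""
    scores = [c.confidence for c in table.cells if c.confidence is not None]
    return sum(scores) / len(scores) if scores else 0.0


def to_json(block: object) -> dict:
    """Hypothetical stand-in for extract_json(): pick output fields by block type."""
    if isinstance(block, TableCell):
        return {"block_type": "TableCell", "confidence": block.confidence}
    if isinstance(block, BaseTable):
        return {"block_type": "Table", "table_confidence": block.table_confidence}
    # Every other block type gets no confidence fields at all.
    return {"block_type": type(block).__name__}
```

In this sketch the confidence keys are simply absent for non-table blocks rather than set to null, which matches the "no confidence fields" behaviour described for JSONBlockOutput.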

This feature could be useful when Surya is used together with a fallback model: when the reported confidence is low, the pipeline can fall back to that model.
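
As a minimal sketch of that fallback idea: the function name, the callable signatures, and the 0.8 threshold below are all hypothetical and not part of this PR. The only thing taken from the PR is that a table's JSON output carries a table_confidence value.

```python
from typing import Any, Callable

# Hypothetical extractor type: takes a page (any object) and returns table JSON as a dict.
Extractor = Callable[[Any], dict]


def extract_with_fallback(
    page: Any,
    primary: Extractor,
    fallback: Extractor,
    threshold: float = 0.8,  # illustrative value, should be tuned per dataset
) -> dict:
    """Run the primary extractor and switch to the fallback when confidence is low."""
    result = primary(page)
    # A missing score is treated as 0.0, so the fallback is used in that case too.
    if result.get("table_confidence", 0.0) < threshold:
        return fallback(page)
    return result
```

Treating a missing score as low confidence is a conservative choice; a caller that trusts the primary model more could invert that default.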

github-actions · 2025-09-19T09:47:49Z

CLA Assistant Lite bot: All contributors have signed the CLA ✍️ ✅

himanshu-josh · 2025-09-19T09:48:31Z

I have read the CLA Document and I hereby sign the CLA

table ocr confidence added

3d07e72

github-actions bot added a commit that referenced this pull request Sep 19, 2025

@himanshu-josh has signed the CLA in #891

e3ace59

himanshu-josh changed the title ~~table ocr confidence added~~ feat: table ocr confidence added Sep 19, 2025
