Summary
Implement a new chunking strategy that preserves tables as cohesive units during document processing.
Benefits
- Maintains semantic integrity of tabular data
- Enhances utility for data-heavy documents (e.g., financial reports, scientific papers)
- Prevents splitting of logically grouped table rows or columns
Implementation Notes
- Detect and isolate tables as distinct chunks
- Ensure table boundaries are preserved during chunking
- Optionally tag chunks as `type: table` for downstream processing
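A minimal sketch of the notes above, assuming the input has already been converted to Markdown and that tables appear as GitHub-style pipe tables (lines starting with `|`). PDF or HTML sources would need a different table detector. The names `chunk_preserving_tables`, `Chunk`, and `max_chars` are hypothetical and used only for illustration. Each table is emitted as one chunk tagged `{"type": "table"}`. Surrounding prose is packed greedily by paragraph up to `max_chars`.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)

# Assumption: a table row is any line whose first non-space character is "|".
_TABLE_ROW = re.compile(r"^\s*\|")

def _segment(lines):
    """Group consecutive lines into ("table" | "text", [lines]) blocks."""
    blocks = []
    for line in lines:
        kind = "table" if _TABLE_ROW.match(line) else "text"
        if blocks and blocks[-1][0] == kind:
            blocks[-1][1].append(line)
        else:
            blocks.append((kind, [line]))
    return blocks

def chunk_preserving_tables(document: str, max_chars: int = 1000) -> list[Chunk]:
    """Hypothetical chunker: tables stay whole, prose is split at paragraph breaks."""
    chunks = []
    for kind, lines in _segment(document.splitlines()):
        if kind == "table":
            # Table boundaries are preserved: never split, even if longer than max_chars.
            chunks.append(Chunk("\n".join(lines), {"type": "table"}))
            continue
        # Pack non-empty paragraphs greedily; a single oversized paragraph stays whole.
        paragraphs = [p.strip() for p in "\n".join(lines).split("\n\n") if p.strip()]
        buf = ""
        for para in paragraphs:
            candidate = f"{buf}\n\n{para}" if buf else para
            if buf and len(candidate) > max_chars:
                chunks.append(Chunk(buf, {"type": "text"}))
                buf = para
            else:
                buf = candidate
        if buf:
            chunks.append(Chunk(buf, {"type": "text"}))
    return chunks

if __name__ == "__main__":
    doc = (
        "Intro paragraph.\n\n"
        "| Year | Revenue |\n|---|---|\n| 2023 | 10 |\n| 2024 | 12 |\n\n"
        "Closing remarks."
    )
    for c in chunk_preserving_tables(doc, max_chars=20):
        print(c.metadata["type"], repr(c.text))
```

With `max_chars=20`, the four-row table is still returned as a single chunk, even though it exceeds the limit. This is the trade-off the issue asks for. A stricter variant could split very large tables by rows and repeat the header row in each piece. That behavior is not specified here.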
Impact
Improves accuracy and relevance of extracted content from structured documents, and enables better downstream use in RAG pipelines or data extraction workflows.