While this sample was originally created for multi-page PDF documents, other related use cases (such as ID document or receipt extraction) may operate on single-page images, photographs, or scans instead.
Today some parts of the pipeline support images, but others assume PDF. It would be great to round out support for images as source documents, particularly for the common JPEG and PNG formats, which have good native support in e.g. Amazon Textract, SageMaker Ground Truth, and web browsers.
1. (Believe so, but need to double-check) Core Textract state machine component supports running OCR on image files
2. Notebook entity recognition data prep flow supports image files
3. (Need to check) OCR pipeline trigger and Textract orchestration support image files
4. (Known gap) A2I human review UI supports image files
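Each component in the list above needs some way to tell whether an incoming document is a PDF or a single image before deciding how to handle it. A minimal sketch of one option is to check the leading file signature bytes rather than trusting the file extension. This is not taken from the sample's code: `sniff_document_type` and `is_single_image` are hypothetical helper names, and only the PDF, PNG, and JPEG formats mentioned in this issue are covered.

```python
from typing import Optional

# Well-known leading file signatures ("magic bytes") for the formats
# discussed in this issue. Other formats (e.g. TIFF) are deliberately omitted.
_SIGNATURES = (
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
)


def sniff_document_type(data: bytes) -> Optional[str]:
    """Return 'pdf', 'png' or 'jpeg' based on the leading bytes, else None.

    Hypothetical helper for illustration; not part of the sample.
    """
    for signature, kind in _SIGNATURES:
        if data.startswith(signature):
            return kind
    return None


def is_single_image(data: bytes) -> bool:
    """True when the document is a PNG or JPEG, i.e. a one-page image."""
    return sniff_document_type(data) in ("png", "jpeg")
```

A caller could read only the first few bytes of an object (for example `header = f.read(8)`) and branch on the result, treating `None` as an unsupported document.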
Fix thumbnailing endpoint and model inference wrapper's logic to
correctly process single image files (as well as PDFs). Fixes #18.
Relates to #5.
Co-authored-by: David <[email protected]>