Is it GLM-OCR or PP-DocLayoutV3 #178
Open
Description
Hello,
I’m encountering issues with document information extraction and would like to clarify whether the problem could be related to the OCR stage or to the PP-DocLayoutV3 component.
I am working with complex industrial documents and have tested the pipeline across more than 30 systems/models without success. In every case, the extraction results were either incomplete or inaccurate.
Could you please confirm:
- Whether extraction failures are more likely caused by OCR inaccuracies or by limitations in PP-DocLayoutV3?
- Whether there are any recommended debugging steps or diagnostics to help isolate whether the issue originates from OCR or from layout detection?
- Whether PP-DocLayoutV3 has known limitations when handling highly complex or dense industrial documents?
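To make the second question concrete, here is a minimal sketch of how I imagine scoring the two stages separately, assuming I hand-label region boxes and transcriptions for a few pages. It does not call GLM-OCR or PP-DocLayoutV3; the helper names (`iou`, `layout_recall`, `cer`) and the 0.5 IoU threshold are my own assumptions, and the predicted boxes and OCR text would have to be exported from the pipeline by whatever means it provides.

```python
from typing import List, Tuple

# Hypothetical box format for this sketch: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def layout_recall(gt_boxes: List[Box], pred_boxes: List[Box], thr: float = 0.5) -> float:
    """Fraction of hand-labelled regions overlapped by some predicted region.

    Simplification: one predicted box may cover several labelled boxes.
    """
    if not gt_boxes:
        return 1.0
    hits = sum(1 for g in gt_boxes if any(iou(g, p) >= thr for p in pred_boxes))
    return hits / len(gt_boxes)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate of OCR output against a hand transcription."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return edit_distance(reference, hypothesis) / len(reference)
```

My reasoning is that if OCR run on hand-cropped regions still gives a high `cer`, the recognition stage is the likely culprit, whereas a low `layout_recall` combined with a low `cer` on hand crops would point at layout detection. Please correct me if this split does not match how the pipeline actually works.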
Any guidance on how to better troubleshoot or improve performance in such scenarios would be greatly appreciated.
Thank you.