Feature: OCR tool Tesseract currently only supports English #1731

Open
5 tasks
JBosse-Artefactual opened this issue Feb 12, 2025 · 0 comments

Comments

@JBosse-Artefactual

Please describe the problem you'd like to be solved
Tesseract is currently invoked without any language options, so OCR always runs with the default English language model. There is no way to select a different language.

Describe the solution you'd like to see implemented
Provide a way to configure Tesseract so that it can run OCR on documents written in languages other than English.
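
As a rough illustration of what this could look like, the sketch below builds a Tesseract command line with the `-l` option, which selects one or more language models (several can be joined with `+`, for example `fra+eng`). The function names `build_tesseract_command` and `run_ocr` are hypothetical and are not taken from the Archivematica codebase; how Archivematica actually invokes Tesseract is not covered in this issue. It also assumes the matching traineddata files are installed on the machine running OCR.

```python
import subprocess
from typing import List, Sequence


def build_tesseract_command(
    image_path: str,
    output_base: str,
    languages: Sequence[str] = ("eng",),
) -> List[str]:
    """Build a tesseract command for one or more language codes (hypothetical helper)."""
    if not languages:
        raise ValueError("at least one language code is required")
    # Tesseract accepts several languages joined with '+', e.g. "fra+eng".
    return ["tesseract", image_path, output_base, "-l", "+".join(languages)]


def run_ocr(
    image_path: str,
    output_base: str,
    languages: Sequence[str] = ("eng",),
) -> None:
    """Run tesseract; needs the binary and the requested traineddata files installed."""
    subprocess.run(
        build_tesseract_command(image_path, output_base, languages),
        check=True,  # raise CalledProcessError if tesseract exits with an error
    )
```

For example, `run_ocr("page.tif", "page", ["fra", "eng"])` would ask Tesseract to recognise French and English text. The installed language models can be listed with `tesseract --list-langs`.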

Describe alternatives you've considered
Run Tesseract, or another OCR tool, outside of Archivematica.

Additional context


For Artefactual use:

Before you close this issue, you must check off the following:

  • All pull requests related to this issue are properly linked
  • All pull requests related to this issue have been merged
  • A testing plan for this issue has been implemented and passed (testing plan information should be included in the issue body or comments)
  • Documentation regarding this issue has been written and merged
  • Details about this issue have been added to the release notes