Escimages: layout a bit off when combining .pbm #36
Comments
I think that There are a few approaches you could use here-
Long term, we need a reliable way to convert receipt files to raster images. HTML has the most accurate output we get right now, so
I've found the solution. After your reply I googled a bit and found this answer, https://superuser.com/a/290679, which suggests using +append instead of -append with ImageMagick's convert. I will try that soon and let you know.
I've tried changing -append to +append, but the results were not as expected. Also, if I OCR the images separately, it becomes too slow for my purpose.
I've gone through the source code a bit and found a possible solution.
This command is currently only going to extract individual images, so I think you should initially add the whitespace back with ImageMagick. This documentation should hopefully get you started. I think we could solve this properly in one of several ways:
Thoughts?
I reckon any of those options would do the trick. The first one seems the easiest, so I would go for that.
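The suggestion above to add the whitespace back before combining could be sketched roughly as follows. This is only an illustration, not how escimages or ImageMagick work internally: the helper names `stack_with_gap` and `to_plain_pbm`, the in-memory bitmap representation, and the default gap of 8 rows are all assumptions made for this example.

```python
from typing import List

# A bitmap is a list of pixel rows; 1 = black, 0 = white (the PBM convention).
Bitmap = List[List[int]]


def stack_with_gap(images: List[Bitmap], gap_rows: int = 8) -> Bitmap:
    """Stack bitmaps top to bottom, with `gap_rows` white rows between them.

    Hypothetical helper: the gap size is a guess and would need tuning
    against the real receipt output.
    """
    if not images:
        return []
    width = max((len(row) for img in images for row in img), default=0)
    out: Bitmap = []
    for index, img in enumerate(images):
        if index > 0:
            # Insert the blank separator rows before every image but the first.
            out.extend([0] * width for _ in range(gap_rows))
        for row in img:
            # Right-pad narrower rows with white so all rows share one width.
            out.append(list(row) + [0] * (width - len(row)))
    return out


def to_plain_pbm(bitmap: Bitmap) -> str:
    """Serialise a bitmap as plain (ASCII, "P1") PBM text."""
    height = len(bitmap)
    width = len(bitmap[0]) if bitmap else 0
    bits = "".join(str(px) for row in bitmap for px in row)
    # Whitespace inside the raster is ignored, so wrap at 70 characters
    # to stay within the line length the plain PBM format asks for.
    wrapped = [bits[i:i + 70] for i in range(0, len(bits), 70)]
    return "\n".join(["P1", f"{width} {height}", *wrapped]) + "\n"


if __name__ == "__main__":
    line_a = [[1, 1, 1]]
    line_b = [[1], [1]]
    print(to_plain_pbm(stack_with_gap([line_a, line_b], gap_rows=2)))
```

For reference, ImageMagick's -append stacks images top to bottom, while +append joins them left to right, so only -append matches the vertical layout of a receipt. A sketch like this keeps the vertical stacking and just makes the gap between pieces explicit.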
Hi,
I have an example where the output differs a bit from the input.
It cuts the receipt a little bit short, as if there needs to be a blank line in between.
I have attached the .bin and combined .pbm files, and you should be able to see that under "soda" the dashes ---- are a little bit too high, which gives Tesseract trouble with that line.
Is there any way to avoid this?
Tesseract output
.bin and .pbm files