'±' is recognised as '+' #4286
Which model / language did you use?
Python 3.12 reprex:

    # Load the JSON file
    json_path = "new_pred.json"

    # Extract all coordinates without filtering
    coordinates = [annotation["box"] for annotation in annotations]

    # Load the image
    image_path = "new_pred.jpg"

    # Load the image with PIL to get its dimensions
    image_pil = Image.open(image_path)

    # Function to crop image based on coordinates and perform OCR with boundary check
    def crop_and_ocr_with_boundary_check(image, coordinates, image_width, image_height):

    # Perform OCR on the annotated regions with boundary check
    ocr_results, skipped_coordinates = crop_and_ocr_with_boundary_check(image, coordinates, image_width, image_height)

    # Convert OCR results to a DataFrame
    ocr_df = pd.DataFrame(ocr_results)

    # Print debugging information
    print(f"Total annotations in JSON: {len(annotations)}")

    # Display the DataFrame
    print(ocr_df)

    # Optionally, save the results to a CSV file
    ocr_df.to_csv("ocr_results.csv", index=False)

    # Print image dimensions
    print(f"Image dimensions: {image_width}x{image_height}")
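The reprex mentions a boundary check before cropping, but the body of that function is not shown. As a rough illustration only, the check could look something like the sketch below. The helper name `split_boxes_by_bounds` is hypothetical, and the sketch assumes each `box` is `[x_min, y_min, x_max, y_max]` in pixels, which the report does not confirm.

```python
def split_boxes_by_bounds(coordinates, image_width, image_height):
    """Separate boxes that lie fully inside the image from those that do not.

    Assumption: each box is [x_min, y_min, x_max, y_max] in pixels.
    """
    valid, skipped = [], []
    for box in coordinates:
        x_min, y_min, x_max, y_max = box
        # A usable box has positive width and height and stays within the image.
        inside = (
            0 <= x_min < x_max <= image_width
            and 0 <= y_min < y_max <= image_height
        )
        (valid if inside else skipped).append(box)
    return valid, skipped


# Made-up boxes for a 100x100 image, purely for illustration.
boxes = [[10, 10, 50, 40], [90, 20, 130, 60], [30, 30, 30, 80]]
valid, skipped = split_boxes_by_bounds(boxes, image_width=100, image_height=100)
print("valid:", valid)
print("skipped:", skipped)
```

Only the boxes in `valid` would then be cropped and passed to OCR. Reporting the `skipped` list next to the results makes it easier to tell a missing recognition apart from a region that was never processed.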
Please also add your image (or its URL if it is online) to this issue report.
Python output: Total annotations in JSON: 50
Current Behavior
No response
Expected Behavior
No response
Suggested Fix
No response
tesseract -v
tesseract v5.4.0.20240606
leptonica-1.84.1
libgif 5.2.1 : libjpeg 8d (libjpeg-turbo 3.0.1) : libpng 1.6.43 : libtiff 4.6.0 : zlib 1.3 : libwebp 1.4.0 : libopenjp2 2.5.2
Found AVX2
Found AVX
Found FMA
Found SSE4.1
Found libarchive 3.7.4 zlib/1.3.1 liblzma/5.6.1 bz2lib/1.0.8 liblz4/1.9.4 libzstd/1.5.6
Operating System
Windows 11
Other Operating System
No response
uname -a
No response
Compiler
No response
CPU
No response
Virtualization / Containers
No response
Other Information
Tesseract recognises '±' as '+'. In some places, it does not recognise the character at all.
Python 3.12
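To quantify the two failure modes described above ('±' read as '+', and '±' missing entirely), the OCR output for each region could be compared with the expected text. The following is a minimal sketch, not part of the original script: the function `classify_plus_minus` and the sample strings are hypothetical and do not come from the attached image.

```python
from collections import Counter


def normalise(text):
    """Collapse runs of whitespace so spacing differences are ignored."""
    return " ".join(text.split())


def classify_plus_minus(expected, recognised):
    """Describe how a '±' in `expected` appears in the OCR output `recognised`."""
    if "±" not in expected:
        return "not applicable"
    got = normalise(recognised)
    if got == normalise(expected):
        return "correct"
    if got == normalise(expected.replace("±", "+")):
        return "read as +"
    if got == normalise(expected.replace("±", "")):
        return "dropped"
    return "other mismatch"


# Hypothetical (expected, recognised) pairs, for illustration only.
samples = [
    ("10 ± 0.5", "10 + 0.5"),
    ("±2%", "2%"),
    ("5 ± 1", "5 ± 1"),
]
tally = Counter(classify_plus_minus(exp, rec) for exp, rec in samples)
print(dict(tally))
```

A tally like this, attached together with the image, would show how often each case occurs and in which regions.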