OCR-Dotted-Matrix

OCR to detect and recognize dot-matrix text inkjet-printed on medical PVC bags.

Example images:

TEXT DETECTION with CRAFT (Character-Region Awareness For Text detection)

The code pre-processes the images with OpenCV functions to improve text detection with CRAFT (https://github.com/clovaai/CRAFT-pytorch/blob/master/README.md#craft-character-region-awareness-for-text-detection). The weights of the pre-trained network are available at https://drive.google.com/file/d/1Jk4eGD7crsqCCg9C9VjCLkMN3ze8kutZ/view.

The label to recognize is a single text string, so the CRAFT parameters are set to detect one block of text. You can change --text_threshold, --low_text, and --link_threshold to obtain different detection results, but the label and the recognition method must then be adapted accordingly.
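If different thresholds make the detector return several boxes instead of one, one possible adaptation is to merge them into a single enclosing rectangle before recognition. The sketch below is not taken from Test_Image.py; the function name `union_box` and the `margin` parameter are hypothetical, and it assumes each detection is a sequence of (x, y) corner points.

```python
import math
from typing import Iterable, Sequence, Tuple

Point = Tuple[float, float]


def union_box(polys: Iterable[Sequence[Point]], margin: int = 0) -> Tuple[int, int, int, int]:
    """Return (x_min, y_min, x_max, y_max) enclosing every detected polygon.

    `margin` (hypothetical parameter) pads the rectangle on every side;
    the top-left corner is clamped so it never goes below zero.
    """
    xs = [x for poly in polys for x, _ in poly]
    ys = [y for poly in polys for _, y in poly]
    if not xs:
        raise ValueError("no detections to merge")
    return (
        max(0, math.floor(min(xs)) - margin),
        max(0, math.floor(min(ys)) - margin),
        math.ceil(max(xs)) + margin,
        math.ceil(max(ys)) + margin,
    )


if __name__ == "__main__":
    boxes = [
        [(10, 20), (50, 20), (50, 40), (10, 40)],
        [(60, 22), (120, 22), (120, 45), (60, 45)],
    ]
    print(union_box(boxes, margin=5))
```

The bottom-right corner is not clamped here, so the caller still has to limit it to the image size before cropping.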

Craft results:

TEXT RECOGNITION with TESSERACT

The code extracts the area around the text from the original image and corrects the text orientation.

The cropped image:

Morphological transformations (OpenCV functions) and character rescaling with different parameters are applied to the cropped image.

Pre-processed cropped image:

Recognition uses the Tesseract OCR engine (https://tesseract-ocr.github.io/) with the default page segmentation mode (PSM 3). In the experiments, LCDDot_FT_500.traineddata gave the best results for this case. Two methods are used to check the recognized text against the label:

SequenceMatcher is a class in the Python module difflib that compares pairs of input sequences. Its ratio() method returns a similarity score (a float in [0, 1]) between the two strings, computed as 2*M/T, where M is the number of matching characters and T is the total number of characters in both strings.
Regular expressions are provided by the Python module re. The function re.match() checks for a match only at the beginning of the string.
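The two checks can be sketched as follows. This is a minimal example rather than the exact code in Test_Image.py: the function name `check_label` and the returned keys are hypothetical, and escaping the label with re.escape() is an assumption made here so that characters such as "." are matched literally.

```python
import re
from difflib import SequenceMatcher


def check_label(expected: str, recognized: str) -> dict:
    """Compare the OCR output with the expected label in two ways."""
    # Similarity score 2*M/T, rounded to two decimals.
    ratio = SequenceMatcher(None, expected, recognized).ratio()
    # re.match() anchors at the start of `recognized` only, so trailing
    # characters after a correct label would still count as a match.
    matched = re.match(re.escape(expected), recognized) is not None
    return {"ratio": round(ratio, 2), "match": matched}


if __name__ == "__main__":
    label = "LOTTO:L21X45SCAD.:10-2023"
    print(check_label(label, "LOTTO:L21X45SCAD.:10-2023"))  # {'ratio': 1.0, 'match': True}
    print(check_label(label, "LOTTO:L21X4SCAD.:1625555"))   # {'ratio': 0.78, 'match': False}
```

In the second call, 19 characters match out of 25 + 24 in total, so the score is 2*19/49, which rounds to 0.78.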

All results are saved in a JSON file:

[
    [
        {
            "Name_original_file": "A_0.png",
            "Name_preprocess": "_preprocess_150.jpg",
            "check_label": "LOTTO:L21X45SCAD.:10-2023",
            "tesseract_LCDDot_FT_500_psm3_result": "LOTTO:L21X45SCAD.:10-2023",
            "LCDDot_FT_500_psm3_sequence_matcher_ratio_result": 1.0,
            "LCDDot_FT_500_psm3_bool_re_result": true
        }
    ],
    [
        {
            "Name_original_file": "A_0.png",
            "Name_preprocess": "_preprocess_160.jpg",
            "check_label": "LOTTO:L21X45SCAD.:10-2023",
            "tesseract_LCDDot_FT_500_psm3_result": "LOTTO:L21X4SCAD.:1625555",
            "LCDDot_FT_500_psm3_sequence_matcher_ratio_result": 0.78,
            "LCDDot_FT_500_psm3_bool_re_result": false
        }
    ],
    [
        {
            "Name_original_file": "A_0.png",
            "Name_preprocess": "_preprocess_170.jpg",
            "check_label": "LOTTO:L21X45SCAD.:10-2023",
            "tesseract_LCDDot_FT_500_psm3_result": "LOTTO:L21X45SCAD.:10-2023",
            "LCDDot_FT_500_psm3_sequence_matcher_ratio_result": 1.0,
            "LCDDot_FT_500_psm3_bool_re_result": true
        }
    ]
]
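A result file with this layout (a list of one-element lists of records) can be written and inspected with the standard json module. The sketch below is illustrative only: the helper names `save_results` and `passing_variants` are hypothetical, and the key names are copied from the example records.

```python
import json
from pathlib import Path

RE_KEY = "LCDDot_FT_500_psm3_bool_re_result"


def save_results(records: list, path: str) -> None:
    """Write the nested list of result records as indented JSON."""
    Path(path).write_text(json.dumps(records, indent=4), encoding="utf-8")


def passing_variants(path: str, key: str = RE_KEY) -> list:
    """Return the pre-processed file names whose regex check succeeded."""
    groups = json.loads(Path(path).read_text(encoding="utf-8"))
    return [
        record["Name_preprocess"]
        for group in groups
        for record in group
        if record.get(key) is True
    ]
```

Applied to the three records shown, `passing_variants` would return the names of the 150 and 170 variants.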

Getting started

Install dependencies

Requirements

PyTorch>=1.9.0
torchvision>=0.2.2
opencv-python>=4.5.2

conda env create -f environment.yml

Run script

python Test_Image.py --image [folder path to test images] --folder_res [folder path to save result images] --label [string label to check]

Arguments

--image: folder path to test images
--label: string label to check
--folder_res: folder path to save result images
--trained_model: pretrained model
--text_threshold: text confidence threshold
--low_text: text low-bound score
--link_threshold: link confidence threshold
--cuda: use CUDA for inference (default: True)
--canvas_size: max image size for inference
--mag_ratio: image magnification ratio
--poly: enable polygon type result
--show_time: show processing time
--test_folder: folder path to input images
--refine: use link refiner for sentence-level dataset
--refiner_model: pretrained refiner model
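To make the argument types concrete, here is a sketch of how a subset of these options could be declared with argparse. It is not copied from Test_Image.py: which options are required, the absence of threshold defaults, and the `str2bool` helper (so that `--cuda False` works on a CPU-only machine) are assumptions for illustration.

```python
import argparse


def str2bool(value: str) -> bool:
    """Interpret common textual spellings of a boolean flag value."""
    return value.lower() in ("yes", "y", "true", "t", "1")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Dot-matrix OCR test script (sketch)")
    parser.add_argument("--image", required=True, help="folder path to test images")
    parser.add_argument("--label", required=True, help="string label to check")
    parser.add_argument("--folder_res", required=True, help="folder path to save result images")
    # No defaults are given for the thresholds because the README does not state them.
    parser.add_argument("--text_threshold", type=float, help="text confidence threshold")
    parser.add_argument("--low_text", type=float, help="text low-bound score")
    parser.add_argument("--link_threshold", type=float, help="link confidence threshold")
    parser.add_argument("--cuda", type=str2bool, default=True, help="use CUDA for inference")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```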
