Weakly supervised street text detection, localisation and segmentation in PyTorch. This is not the most accurate or fastest approach; I am working on improving both. [I am yet to release the latest code, with improved performance using ResNet-18 and inference optimization]
Some cherry-picked examples of localized text
The weakly supervised algorithm first trains a character-agnostic text detection network on images of individual characters and of non-textual scenes. This network is then used to label unlabelled images, producing corresponding segmented masks. These automatically labelled images are in turn used to train a text segmentation network, and the bounding boxes are derived from the segmented masks.
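The last step, deriving bounding boxes from a segmented mask, can be sketched as a connected-components pass. This is a minimal illustration, not this repository's code; the function name and the `min_area` filter are assumptions.

```python
# Sketch: one bounding box per connected blob in a binary text mask.
# Illustrative only; not taken from this repository.
import numpy as np
from scipy import ndimage

def boxes_from_mask(mask, min_area=4):
    """Return (x_min, y_min, x_max, y_max) boxes, one per connected component.

    Components whose bounding box covers fewer than min_area pixels are
    treated as noise and dropped.
    """
    labelled, _ = ndimage.label(mask > 0)
    boxes = []
    for sl in ndimage.find_objects(labelled):
        ys, xs = sl
        if (ys.stop - ys.start) * (xs.stop - xs.start) >= min_area:
            boxes.append((xs.start, ys.start, xs.stop, ys.stop))
    return boxes
```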
-
Install the required python packages by running
pip install -r requirements.txt
-
Download the Chars74k dataset and place it in the root directory
-
Place unlabelled street view text images in the folder called Images. I used the UCSD SVT dataset and a selection of images from the NEOCR dataset. [I will upload my split and share it soon]
-
Place assorted images without any text in the folder called Background. A combination of indoor/outdoor scenes without text is recommended. [I will upload my split and share it soon]
-
Train a character recognition network by running
python3 train_charmodel.py
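For orientation, a character-vs-background classifier of the kind this step trains might look like the sketch below. The layer sizes, input resolution and class count (62 Chars74k classes plus a background class) are assumptions, not read from train_charmodel.py.

```python
# Minimal sketch of a character classifier; architecture is assumed,
# not this repository's actual model.
import torch
import torch.nn as nn

class CharModel(nn.Module):
    def __init__(self, num_classes=63):  # assumed: 62 Chars74k classes + background
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # 32x32 input is downsampled twice -> 8x8 feature maps
        self.classifier = nn.Linear(64 * 8 * 8, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))
```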
-
Label the images using the following command
python3 label_images.py
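Conceptually, this labelling step slides the trained character network over each unlabelled image and keeps a binary mask of windows scored as text. The sketch below illustrates the idea with a generic `score_fn` standing in for the real network; window size, stride and threshold are assumptions.

```python
# Hedged sketch of sliding-window labelling; not this repository's code.
import numpy as np

def label_image(image, score_fn, win=32, stride=16, thresh=0.5):
    """image: (H, W) array; score_fn: window -> probability of text."""
    h, w = image.shape
    mask = np.zeros((h, w), dtype=np.uint8)
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            # Mark the whole window as text if the classifier fires on it
            if score_fn(image[y:y + win, x:x + win]) > thresh:
                mask[y:y + win, x:x + win] = 1
    return mask
```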
-
Train the localisation, detection and segmentation network by running the command
python3 train_localizationmodel.py
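The network trained here is a pixel-wise text segmenter; a minimal fully convolutional sketch of that kind of model is shown below. The architecture is an illustrative assumption, not this repository's actual model.

```python
# Minimal fully convolutional segmentation sketch; architecture assumed.
import torch
import torch.nn as nn

seg_net = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 1),  # per-pixel text logit
    nn.Sigmoid(),         # text probability map, same H x W as the input
)
```

Because the model is fully convolutional, the output probability map keeps the input's spatial size, so masks (and from them, bounding boxes) fall out directly.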