Helper scripts to train ML models required by Seeneva app

Seeneva ML scripts

This repository contains helper scripts to preprocess training data. No dataset is provided.

Clone

This repository contains submodules.

Clone it using:

git clone --recurse-submodules https://github.com/Seeneva/ml-scripts.git

Or, if you have already cloned the repository without them, initialize and fetch the submodules using:

git submodule update --init --recursive

Setup

The repository contains ./setup.py and ./requirements.txt, which list the required Python dependencies.

Using Python Virtual Environments

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Using system Python

pip install -r requirements.txt

The YOLO model is used to detect objects on comic book pages.

Supported classes

Class ID  Name           Description
0         speech_baloon  Single speech balloon
1         panel          Single panel on the page
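
For reference, a YOLO-format annotation file holds one object per line: the class ID followed by the box center and size, all normalized to the image dimensions. The sketch below is a minimal illustration and not part of this repository; the names CLASS_NAMES, Box, parse_yolo_line and to_pixel_box are hypothetical. It maps one annotation line to a supported class and to a box in pixels.

```python
from typing import NamedTuple, Tuple

# Class IDs and names copied from the table above.
CLASS_NAMES = {0: "speech_baloon", 1: "panel"}


class Box(NamedTuple):
    class_id: int
    x_center: float
    y_center: float
    width: float
    height: float


def parse_yolo_line(line: str) -> Box:
    """Parse one 'class_id x_center y_center width height' annotation line."""
    class_id, x, y, w, h = line.split()
    box = Box(int(class_id), float(x), float(y), float(w), float(h))
    if box.class_id not in CLASS_NAMES:
        raise ValueError(f"unsupported class ID: {box.class_id}")
    return box


def to_pixel_box(box: Box, image_width: int, image_height: int) -> Tuple[int, int, int, int]:
    """Convert a normalized box to (left, top, right, bottom) in pixels."""
    left = (box.x_center - box.width / 2) * image_width
    top = (box.y_center - box.height / 2) * image_height
    right = (box.x_center + box.width / 2) * image_width
    bottom = (box.y_center + box.height / 2) * image_height
    return round(left), round(top), round(right), round(bottom)
```

For example, on a hypothetical 1000x1500 page the line "0 0.5 0.5 0.25 0.5" describes a speech balloon whose pixel box is (375, 375, 625, 1125).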

Prepare and train

  • Put all pages of your comic book dataset into the ./yolo/dataset directory. This directory must not contain any subdirectories.
  • Annotate each comic book page in the directory using the YOLO format and the supported classes. You can use tools like labelImg.
  • Run the ./yolo_width_split.py script to split each 'Double Page Spread' image into several separate images. The original wide images will be moved into the ./yolo/data_wide_backup directory.
  • (Optionally) Run the ./yolo_stats.py script to calculate dataset details.
  • Run the ./yolo_train_data.py script to generate the files required to train the YOLO model. All files will be placed into the ./yolo directory.
  • Create a YOLO yolo-obj.cfg file and put it into the ./yolo directory.
  • Train your model using YOLOv4-tiny in darknet.
  • (Optionally) Convert the model into TensorFlow Lite format:
git clone -b config https://github.com/Seeneva/tensorflow-yolov4-tflite.git converter

python ./converter/save_model.py --weights ${YOUR_YOLO_BACKUP_PATH}/yolo-obj_final.weights --output ${YOUR_TF_BACKUP_PATH}/tf --score_thres 0.7 --input_size 480x736 --model yolov4 --tiny --framework tflite

python ./converter/convert_tflite.py --weights ${YOUR_TF_BACKUP_PATH}/tf --output ${YOUR_TF_BACKUP_PATH}/tf/seeneva.tflite --input_size 480x736 --quantize_mode float16
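
Before running the scripts above, it can help to verify the two dataset requirements from the first steps: the directory is flat, and every page has an annotation. The sketch below is a minimal illustration and not one of this repository's scripts. It assumes that each image has a same-named *.txt annotation file next to it (which is how labelImg saves YOLO annotations) and that pages use the extensions listed in the hypothetical IMAGE_SUFFIXES set; check_dataset is likewise a hypothetical name.

```python
from pathlib import Path

# Assumed image extensions; adjust them to match your dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def check_dataset(dataset_dir):
    """Return a list of problems found in a flat YOLO dataset directory."""
    problems = []
    for entry in sorted(Path(dataset_dir).iterdir()):
        if entry.is_dir():
            # The dataset directory must not contain subdirectories.
            problems.append(f"unexpected subdirectory: {entry.name}")
        elif entry.suffix.lower() in IMAGE_SUFFIXES:
            # Each page is expected to have a same-named .txt annotation file.
            if not entry.with_suffix(".txt").is_file():
                problems.append(f"missing annotation for: {entry.name}")
    return problems
```

Calling check_dataset("./yolo/dataset") returns an empty list when the directory is flat and every page is annotated.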

The OCR model is used to recognize text inside speech balloons.

Setup

You should install Tesseract on your system and make sure that your environment can run make commands.

Prepare and train

  • Run ./yolo_extract_objects.py --class_id 0 to crop all speech balloons from the YOLO dataset and place them into the ./yolo/objects/0 directory.
  • Crop each text line from the cropped speech balloons and save it as a separate *.png file in the ./tesseract/${LANG_NAME}_seeneva-ground-truth directory.
  • Create a *.gt.txt file for each text line *.png file in the ./tesseract/${LANG_NAME}_seeneva-ground-truth directory. You can use ./tesseract_cteate_txt.py to automate this.

(Image: what your directory should look like.)

  • Write the text shown in each line *.png into the matching *.gt.txt file. Usually all letters should be uppercase. For example, if image 1.png shows the word PROGRAMMED, write PROGRAMMED into the 1.gt.txt file.
  • Run ./tesseract_check_data.py to check that dataset is fine.
  • Run ./tesseract_train.sh to start training.
  • (Optionally) Convert the trained model into the fast (integer) format using:
combine_tessdata -c ./tesstrain/data/${LANG_NAME}_seeneva.traineddata
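
The ground-truth layout described in the steps above (one *.gt.txt per line *.png, a single line of usually uppercase text) can be sanity-checked with a few lines of Python. The sketch below is a minimal illustration under those assumptions. It is not the repository's ./tesseract_check_data.py, whose actual checks may differ, and check_ground_truth is a hypothetical name.

```python
from pathlib import Path


def check_ground_truth(ground_truth_dir):
    """Return problems found in a directory of line images and *.gt.txt files."""
    problems = []
    for image in sorted(Path(ground_truth_dir).glob("*.png")):
        # 1.png is expected to be paired with 1.gt.txt in the same directory.
        text_file = image.with_name(image.stem + ".gt.txt")
        if not text_file.is_file():
            problems.append(f"{image.name}: missing {text_file.name}")
            continue
        text = text_file.read_text(encoding="utf-8").strip()
        if not text:
            problems.append(f"{text_file.name}: empty transcription")
        elif "\n" in text:
            problems.append(f"{text_file.name}: more than one line of text")
        elif text != text.upper():
            # Uppercase is only the usual convention, so treat this as a warning.
            problems.append(f"{text_file.name}: contains lowercase letters")
    return problems
```

Calling it on the ./tesseract/${LANG_NAME}_seeneva-ground-truth directory (with LANG_NAME expanded) returns an empty list when every line image has a single-line uppercase transcription.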

License

Copyright © 2021 Sergei Solodovnikov under the Apache License 2.0.

Note that dependencies may have different licenses. See 3RD-PARTY-LICENSES for more information.
