The aim of this repository is to create an OCR model for Old Persian cuneiform and to match broken tablets or inscriptions.
This repository is inspired by the eBL project and is part of the Electronic Persian Old Library organization.
eBL has developed models for Babylonian cuneiform; here I am developing models for Old Persian cuneiform.
- Dataset Preparation: Annotate images with bounding boxes around each character.
- Environment Setup: Install and import the necessary libraries.
- YOLO Training: Train a YOLO model to detect characters in the dataset.
- OCR Model Training: Train a text recognition model (a CNN) to recognize characters.
- Inference: Use YOLO for character detection and the OCR model for recognition.
- Visualization: Display the detected and recognized text on the images.
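The inference and visualization steps above end with a set of per-character detections that must be put back into reading order. A minimal sketch of that last step (this is not the repository's code; it assumes detection and recognition have already produced `(box, label)` pairs, and groups them into lines by their box centres):

```python
def assemble_text(detections, line_tol=20):
    """Order detected characters into reading order (left-to-right,
    top-to-bottom) using the centres of their bounding boxes.

    detections: list of ((x1, y1, x2, y2), label) pairs in pixel coords.
    line_tol:   vertical tolerance (px) for grouping boxes into one line.
    """
    # Centre of each box.
    chars = [(((x1 + x2) / 2, (y1 + y2) / 2), label)
             for (x1, y1, x2, y2), label in detections]
    # Sort by vertical position, then group boxes into lines.
    chars.sort(key=lambda c: c[0][1])
    lines, current, current_y = [], [], None
    for (cx, cy), label in chars:
        if current_y is None or abs(cy - current_y) <= line_tol:
            current.append((cx, label))
            current_y = cy if current_y is None else (current_y + cy) / 2
        else:
            lines.append(current)
            current, current_y = [(cx, label)], cy
    if current:
        lines.append(current)
    # Within each line, sort characters left to right.
    return [" ".join(lbl for _, lbl in sorted(line)) for line in lines]
```

The `line_tol` threshold is a simplifying assumption; on real tablet photographs, line grouping would need to account for skew and uneven sign sizes.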
The primary dataset is taken from: https://www.kaggle.com/datasets/hosseinmousavi/achaemenid-inscription-ocr
Please note that I am at the beginning of this project; I will prepare a higher-quality dataset (real images) in the future.
Use online annotation tools such as CVAT or labelImg (on GitHub).
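Whichever annotation tool you use, YOLO training expects one `.txt` label file per image with normalized `class cx cy w h` lines. A small conversion sketch, assuming the tool exports pixel-coordinate boxes (as labelImg does in Pascal VOC mode):

```python
def to_yolo(box, img_w, img_h):
    """Convert a pixel bounding box (x_min, y_min, x_max, y_max)
    to YOLO's normalized (cx, cy, w, h) format."""
    x1, y1, x2, y2 = box
    cx = (x1 + x2) / 2 / img_w   # box centre x, as a fraction of width
    cy = (y1 + y2) / 2 / img_h   # box centre y, as a fraction of height
    w = (x2 - x1) / img_w        # box width, normalized
    h = (y2 - y1) / img_h        # box height, normalized
    return cx, cy, w, h

def write_yolo_labels(path, boxes, img_w, img_h):
    """Write one 'class cx cy w h' line per annotated character."""
    with open(path, "w") as f:
        for class_id, box in boxes:
            cx, cy, w, h = to_yolo(box, img_w, img_h)
            f.write(f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}\n")
```

Here each `class_id` is the index of one cuneiform sign in the project's class list; the mapping itself is up to the annotator.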
Please open this notebook in Google Colab: https://github.com/Melanee-Melanee/Persian-Old-Cuneiform/blob/main/Persian_Old_Cuneiform_OCR.ipynb
(Photo from the Apadana Palace, Shush; Ref)
This project leverages Optical Character Recognition (OCR) and Natural Language Processing (NLP) technologies to read and match the texts. Specifically, I use OCR to convert the cuneiform signs in the images into machine-readable text. I will then apply algorithms from Prof. Enrique Jiménez's project to detect and match segments of different tablets or inscriptions, aiding the reconstruction of fragmented texts.
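The matching step can be illustrated on recognized sign sequences with Python's standard difflib. This is only a stand-in for the eBL project's actual matching algorithms, which are not reproduced here; the sign names are made-up examples:

```python
from difflib import SequenceMatcher

def matching_segments(signs_a, signs_b, min_len=3):
    """Find shared runs of signs between two fragments' OCR output.

    signs_a, signs_b: lists of recognized sign names (strings).
    min_len:          ignore overlaps shorter than this many signs.
    """
    sm = SequenceMatcher(None, signs_a, signs_b, autojunk=False)
    return [signs_a[m.a:m.a + m.size]
            for m in sm.get_matching_blocks()
            if m.size >= min_len]
```

For example, two fragments that both contain the run `xa ša a ya` would be reported as sharing that segment, which is the kind of overlap that suggests the fragments belong to the same text.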
For more details, please check my new article on Medium and the eBL documentation.
This repository is still under development. To contribute, contact me by email: [email protected]
Notice: To create pull requests for this repository, please target a branch other than "main" (issue, refactor, or feature).