The aim of this repository is to create an OCR model for Old Persian cuneiform and to match broken tablets or inscriptions.
This repository is inspired by the eBL project and is part of the Electronic Persian Old Library organization.
eBL has developed models for Babylonian cuneiform; here I am developing models for Old Persian cuneiform.
- Dataset Preparation: Annotate images with bounding boxes around each character.
- Environment Setup: Install and import the necessary libraries.
- YOLO Training: Train a YOLO model to detect characters in the dataset.
- OCR Model Training: Train a text recognition model (a CNN) to recognize characters.
- Inference: Use YOLO for character detection and the OCR model for recognition.
- Visualization: Display the detected and recognized text on the images.
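The inference and visualization steps above end with a set of per-character detections that must be put back into reading order. A minimal sketch of that last step (this is not the repository's code; it assumes detection and recognition have already produced `(box, label)` pairs, and groups them into lines by their box centres):

```python
def assemble_text(detections, line_tol=20):
    """Order detected characters into reading order (left-to-right,
    top-to-bottom) using the centres of their bounding boxes.

    detections: list of ((x1, y1, x2, y2), label) pairs in pixel coords.
    line_tol:   vertical tolerance (px) for grouping boxes into one line.
    """
    # Centre of each box.
    chars = [(((x1 + x2) / 2, (y1 + y2) / 2), label)
             for (x1, y1, x2, y2), label in detections]
    # Sort by vertical position, then group boxes into lines.
    chars.sort(key=lambda c: c[0][1])
    lines, current, current_y = [], [], None
    for (cx, cy), label in chars:
        if current_y is None or abs(cy - current_y) <= line_tol:
            current.append((cx, label))
            current_y = cy if current_y is None else (current_y + cy) / 2
        else:
            lines.append(current)
            current, current_y = [(cx, label)], cy
    if current:
        lines.append(current)
    # Within each line, sort characters left to right.
    return [" ".join(lbl for _, lbl in sorted(line)) for line in lines]
```

The `line_tol` threshold is a simplifying assumption; on real tablet photographs, line grouping would need to account for skew and uneven sign sizes.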
The primary dataset is taken from: https://www.kaggle.com/datasets/hosseinmousavi/achaemenid-inscription-ocr
Please note that I am at the beginning of this project; I will prepare a higher-quality dataset (real images) in the future.
Use online annotation tools such as CVAT or labelImg (on GitHub).
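Whichever annotation tool you use, YOLO training expects one `.txt` label file per image with normalized `class cx cy w h` lines. A small conversion sketch, assuming the tool exports pixel-coordinate boxes (as labelImg does in Pascal VOC mode):

```python
def to_yolo(box, img_w, img_h):
    """Convert a pixel bounding box (x_min, y_min, x_max, y_max)
    to YOLO's normalized (cx, cy, w, h) format."""
    x1, y1, x2, y2 = box
    cx = (x1 + x2) / 2 / img_w   # box centre x, as a fraction of width
    cy = (y1 + y2) / 2 / img_h   # box centre y, as a fraction of height
    w = (x2 - x1) / img_w        # box width, normalized
    h = (y2 - y1) / img_h        # box height, normalized
    return cx, cy, w, h

def write_yolo_labels(path, boxes, img_w, img_h):
    """Write one 'class cx cy w h' line per annotated character."""
    with open(path, "w") as f:
        for class_id, box in boxes:
            cx, cy, w, h = to_yolo(box, img_w, img_h)
            f.write(f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}\n")
```

Here each `class_id` is the index of one cuneiform sign in the project's class list; the mapping itself is up to the annotator.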
Please open this notebook in Google Colab: https://github.com/Melanee-Melanee/Persian-Old-Cuneiform/blob/main/Persian_Old_Cuneiform_OCR.ipynb
(Photo from the Apadana Palace, Shush; Ref)
This project leverages Optical Character Recognition (OCR) and Natural Language Processing (NLP) technologies to read and match the texts. Specifically, I use OCR to convert the cuneiform signs in the images into machine-readable text. I will then apply algorithms from Prof. Enrique Jiménez's project to detect and match segments of different tablets or inscriptions, aiding the reconstruction of fragmented texts.
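The matching step can be illustrated on recognized sign sequences with Python's standard difflib. This is only a stand-in for the eBL project's actual matching algorithms, which are not reproduced here; the sign names are made-up examples:

```python
from difflib import SequenceMatcher

def matching_segments(signs_a, signs_b, min_len=3):
    """Find shared runs of signs between two fragments' OCR output.

    signs_a, signs_b: lists of recognized sign names (strings).
    min_len:          ignore overlaps shorter than this many signs.
    """
    sm = SequenceMatcher(None, signs_a, signs_b, autojunk=False)
    return [signs_a[m.a:m.a + m.size]
            for m in sm.get_matching_blocks()
            if m.size >= min_len]
```

For example, two fragments that both contain the run `xa ša a ya` would be reported as sharing that segment, which is the kind of overlap that suggests the fragments belong to the same text.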
For more details, please check my new article on Medium and the eBL documentation.
This repository is still under development. To contribute, contact me by email: [email protected]
Notice: To create pull requests for this repository, please target a branch other than "main" (issue, refactor, or feature).