Official PyTorch implementation of The Learnable Typewriter: A Generative Approach to Text Line Analysis.
Authors: Yannis Siglidis, Nicolas Gonthier, Julien Gaubil, Tom Monnier, Mathieu Aubry.
Research Institute: Imagine, LIGM, Ecole des Ponts, Univ Gustave Eiffel, CNRS, Marne-la-Vallée, France
ICDAR 2024 (Best Paper Award).
conda create --name ltw pytorch==1.9.1 torchvision==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
conda activate ltw
python -m pip install -r requirements.txt
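To quickly check that the environment is functional (an optional sanity check, not part of the original setup):
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"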
Dropbox: Download & extract datasets.zip and runs.zip in the parent folder.
Hugging Face: python scripts/download-hf.py
For minimal inference and plotting, we provide a standalone notebook.
To reproduce the figures of the paper, run the scripts/figures.ipynb notebook.
Helper scripts are also provided to perform evaluation on the corresponding datasets:
python scripts/eval.py -i <MODEL-PATH> {--eval, --eval_best}
and to produce figures and sprites for selected samples:
python scripts/eval.py -i <MODEL-PATH> -s {train, val, test} -id 0 0 0 -is 1 2 3 --plot_sprites
Training and model configuration are handled through Hydra. We supply the corresponding config files for all our baseline experiments:
python scripts/train.py supervised-google.yaml
python scripts/train.py unsupervised-google.yaml
python scripts/train.py supervised-copiale.yaml
python scripts/train.py unsupervised-copiale.yaml
python scripts/train.py supervised-fontenay.yaml
and fine-tune with:
python scripts/fontenay.py -i fontenay/fontenay/<MODEL_NAME> -o fontenay/fontenay-ft/ --max_epochs 150 -k "training.optimizer.lr=0.001"
Additional command-line overrides can be applied to any of the above experiment configs using the standard Hydra syntax.
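For example, to override the learning rate of a supervised run (a sketch: training.optimizer.lr is the key used in the fine-tuning command above; other keys depend on your config):
python scripts/train.py supervised-google.yaml training.optimizer.lr=0.0005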
Applying the Learnable Typewriter to a new dataset is straightforward.
First create a config file:
configs/<DATASET_ID>.yaml
...
DATASET-TAG:
path: <DATASET-NAME>/
sep: '' # How the character separator is denoted in the annotation.
space: ' ' # How the space is denoted in the annotation.
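For instance, a filled-in entry for a hypothetical dataset stored under datasets/my-dataset/ (all names illustrative) would look like:
my-dataset:
  path: my-dataset/
  sep: ''
  space: ' '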
Then create the dataset folder:
datasets/<DATASET-NAME>
├── annotation.json
└── images
├── <image_id>.jpg
└── ...
The annotation.json file should be a dictionary with entries of the form:
"<image_id>": {
"split": "train", # {"train", "val", "test"} - "val" is ignored in the unsupervised case.
"label": "A beautiful calico cat." # The text that corresponds to this line.
},
If you train unsupervised and do not need evaluation, you can omit the annotation.json file entirely.
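As a minimal sketch of how to build this file, assuming a folder of line images and a matching table of transcriptions (the dataset name and the transcriptions dictionary below are placeholders):
import json
from pathlib import Path

dataset = Path('datasets/my-dataset')  # hypothetical dataset folder
transcriptions = {'0001': 'A beautiful calico cat.'}  # placeholder: <image_id> -> line text

annotation = {}
for image in sorted((dataset / 'images').glob('*.jpg')):
    image_id = image.stem
    annotation[image_id] = {
        'split': 'train',  # one of 'train', 'val', 'test'
        'label': transcriptions[image_id],  # transcription of this line
    }

with open(dataset / 'annotation.json', 'w', encoding='utf-8') as f:
    json.dump(annotation, f, ensure_ascii=False, indent=2)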
Logging is done through TensorBoard. To visualize results, run:
tensorboard --logdir ./<run_dir>/
If you want to dive in deeper, check out our experimental features.
@misc{the-learnable-typewriter,
title = {The Learnable Typewriter: A Generative Approach to Text Line Analysis},
author = {Siglidis, Ioannis and Gonthier, Nicolas and Gaubil, Julien and Monnier, Tom and Aubry, Mathieu},
publisher = {arXiv},
year = {2023},
url = {https://arxiv.org/abs/2302.01660},
keywords = {Computer Vision and Pattern Recognition (cs.CV), FOS: Computer and information sciences},
doi = {10.48550/ARXIV.2302.01660},
copyright = {Creative Commons Attribution 4.0 International}
}
If you like this project, also have a look at related work produced by our team:
- Efstathiou et al. - An Interpretable Deep Learning Approach for Morphological Script Type Analysis (ICWP 2024)
- Monnier et al. - Share With Thy Neighbors: Single-View Reconstruction by Cross-Instance Consistency (ECCV 2022)
- Loiseau et al. - Representing Shape Collections with Alignment-Aware Linear Models (3DV 2021)
- Monnier et al. - Unsupervised Layered Image Decomposition into Object Prototypes (ICCV 2021)
- Monnier et al. - Deep Transformation Invariant Clustering (NeurIPS 2020)
- Deprelle et al. - Learning elementary structures for 3D shape generation and matching (NeurIPS 2019)
- Groueix et al. - 3D-CODED: 3D Correspondences by Deep Deformation (ECCV 2018)
- Groueix et al. - AtlasNet: A Papier-Mâché Approach to Learning 3D Surface Generation (CVPR 2018)
We would like to thank Malamatenia Vlachou and Dominique Stutzmann for sharing ideas, insights and data for applying our method in paleography; Vickie Ye and Dmitriy Smirnov for useful insights and discussions; Romain Loiseau, Mathis Petrovich, Elliot Vincent, Sonat Baltacı for manuscript feedback and constructive insights. This work was partly supported by the European Research Council (ERC project DISCOVER, number 101076028), ANR project EnHerit ANR-17-CE23-0008, ANR project VHS ANR-21-CE38-0008 and HPC resources from GENCI-IDRIS (2022-AD011012780R1, AD011012905).