sauravrt/DHCD_Dataset

 
 


Devnagari Handwritten Character (DHC) Dataset

This repository contains the DHC dataset.

Description

The Devnagari script comprises 36 characters and 10 numerals.
A sample of the 36-character set.

A sample of the 10-numeral set.

The DHC dataset contains 46 classes [36 character classes and 10 digit classes] (क .. + १ .. ) of Devnagari script. Each class has 2000 images, which are divided into two sets: a training set of 1700 images and a test set of 300 images. With 92,000 images across 46 classes, this dataset is larger in both samples and classes than the well-known MNIST dataset (70,000 images, 10 classes), which was the initial inspiration for its creation.
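The split sizes follow directly from the per-class counts above. The short sketch below only does that arithmetic; the constant names are made up for illustration and are not part of this repository.

```python
# Per-class counts as stated in the description above.
NUM_CHARACTER_CLASSES = 36
NUM_DIGIT_CLASSES = 10
TRAIN_PER_CLASS = 1700
TEST_PER_CLASS = 300

# MNIST ships 60,000 training and 10,000 test images in 10 classes.
MNIST_TOTAL = 60_000 + 10_000

num_classes = NUM_CHARACTER_CLASSES + NUM_DIGIT_CLASSES
train_total = num_classes * TRAIN_PER_CLASS
test_total = num_classes * TEST_PER_CLASS
total = train_total + test_total

print(f"classes: {num_classes}")         # classes: 46
print(f"training images: {train_total}")  # training images: 78200
print(f"test images: {test_total}")       # test images: 13800
print(f"all images: {total}")             # all images: 92000
print(f"larger than MNIST: {total > MNIST_TOTAL}")  # larger than MNIST: True
```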

This repo contains a dataloader for PyTorch, which can easily be ported to other libraries such as TensorFlow, Keras, Caffe, etc.
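To give a rough idea of what a library-agnostic loader could look like, here is a minimal sketch that indexes image paths and integer labels using only the standard library. It is not the dataloader shipped in this repo. The class name `DHCDIndex` and the directory layout `<root>/<split>/<class_name>/<file>.png` are assumptions made for illustration, so adjust them to match the actual files.

```python
from pathlib import Path


class DHCDIndex:
    """Map-style index of (image_path, label) pairs for one split.

    ASSUMPTION: images are stored as <root>/<split>/<class_name>/<file>.png.
    This layout is hypothetical and may not match the repository.
    """

    def __init__(self, root, split="train", pattern="*.png"):
        split_dir = Path(root) / split
        # Sort class folders so the label numbering is reproducible.
        self.classes = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
        self.class_to_idx = {name: i for i, name in enumerate(self.classes)}
        # Store paths as strings; decoding the image is left to the caller.
        self.samples = [
            (str(path), self.class_to_idx[name])
            for name in self.classes
            for path in sorted((split_dir / name).glob(pattern))
        ]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        return self.samples[index]
```

Because the class only implements `__len__` and `__getitem__`, the same index can be iterated directly, or wrapped by whichever framework is in use, with image decoding and tensor conversion added in `__getitem__` as needed.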

Besides the general character classification task, the dataset can also be explored for other problems such as style transfer, disentanglement, and semi-supervised learning, as there is a lot of variation within each class.

Contributors

Schoolchildren in classes 6 and 7 (in 2015) at Mount Everest Higher Secondary School, Bhaktapur, Nepal, contributed to this dataset by volunteering to write the characters, which were then scanned manually. Besides the manual scanning, other pre-processing tasks were also performed; details can be found in the paper.

If you use this dataset in your work, please cite it as follows:

Bibtex

@inproceedings{acharya2015deep,
  title={Deep learning based large scale handwritten Devanagari character recognition},
  author={Acharya, Shailesh and Pant, Ashok Kumar and Gyawali, Prashnna Kumar},
  booktitle={Software, Knowledge, Information Management and Applications (SKIMA), 2015 9th International Conference on},
  pages={1--6},
  year={2015},
  organization={IEEE}
}