Bengli Female VS Male Names Dataset

An NLP dataset containing 2030 samples of Bengali names, each labeled with the corresponding gender (female or male). It is a small, simple toy dataset that NLP beginners can use to practice sequence classification and related problems such as gender recognition from names.

Background

In the Bengali language, a person's name depends largely on their gender. Female names typically end with one of the suffixes "A", "I", "EE" ["আ", "ই", "ঈ"], while male names differ noticeably from female names in both phoneme patterns and ending suffix. In my observation, there is a significant possibility that these differences in pattern can be used for gender classification based on names.
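
The suffix observation above can be turned into a rough baseline. The sketch below is only illustrative: the function name `has_female_suffix` is hypothetical, and it assumes that word-final vowels in written Bengali usually appear as dependent vowel signs (া, ি, ী) rather than as the independent letters listed above, so both forms are checked. It is a heuristic, not a classifier, and some male names also end in these vowels.

```python
# Independent vowels from the text (আ, ই, ঈ) plus the matching dependent
# vowel signs (া, ি, ী). Including the vowel signs is an assumption made
# for this sketch, not something stated by the dataset author.
FEMALE_SUFFIXES = (
    "\u0986", "\u0987", "\u0988",  # আ, ই, ঈ
    "\u09be", "\u09bf", "\u09c0",  # া, ি, ী
)


def has_female_suffix(name: str) -> bool:
    """Return True if the name ends with one of the listed vowel forms."""
    return name.strip().endswith(FEMALE_SUFFIXES)


if __name__ == "__main__":
    for example in ("শাহানা", "করিম"):
        print(example, has_female_suffix(example))
```

Comparing this rule against the labels in the CSV is a quick way to see how much of the signal the suffix alone carries before training a model.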

Download dataset

You can download the dataset and other resources for the latest release from this link: Download Bengli Female VS Male name dataset

Or download the latest version using wget:

wget --no-check-certificate \
https://raw.githubusercontent.com/faruk-ahmad/bengli-female-vs-male-names/master/dataset/bengli-female-vs-male-names.csv \
-O bengli-female-vs-male-names.csv

Find the full documentation here:

Documentation and dataset specifications

Dataset Format

The dataset is in CSV format, with two columns:

Name
Gender

Each row has two attributes: the name, followed by the gender. The name is UTF-8 encoded. The gender is encoded as 0 or 1, as follows:


male	0
female	1
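
As a minimal sketch of reading this format with the Python standard library, the helper below parses rows into `(name, label)` pairs. It assumes the file starts with a header row spelled exactly `Name,Gender`, which is inferred from the column list above and should be checked against the actual file. The function name `read_names` and the inline sample rows are hypothetical.

```python
import csv
import io

# Label encoding as described above: 0 = male, 1 = female.
LABELS = {0: "male", 1: "female"}


def read_names(handle):
    """Parse CSV text with 'Name' and 'Gender' columns into (name, label) pairs."""
    reader = csv.DictReader(handle)
    return [(row["Name"].strip(), int(row["Gender"])) for row in reader]


# Tiny in-memory sample standing in for the real file (hypothetical rows).
sample = io.StringIO("Name,Gender\nকরিম,0\nশাহানা,1\n")
rows = read_names(sample)

for name, label in rows:
    print(name, LABELS[label])
```

For the downloaded file, the same function can be called on `open("bengli-female-vs-male-names.csv", encoding="utf-8", newline="")`.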

Dataset Statistics

The number of samples per class is as follows:


male	1029
female	1001
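
A short sketch of what these counts imply for evaluation: the classes are nearly balanced, so a model that always predicts the larger class gives the accuracy floor any real classifier should beat. The variable names are illustrative; the counts are the ones listed above.

```python
# Class counts as listed in the statistics above.
counts = {"male": 1029, "female": 1001}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# Accuracy of always predicting the most frequent class.
majority_baseline = max(shares.values())

print(f"total samples: {total}")
for label, share in shares.items():
    print(f"{label}: {share:.1%}")
print(f"majority-class baseline accuracy: {majority_baseline:.1%}")
```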

Possible Use Cases

Sequence classification using RNN, LSTM, etc. [see the sample notebook in the notebooks directory]
Sequence modeling using other types of machine learning algorithms
Gender recognition based on names
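
For the sequence-classification use case, names usually have to be converted into fixed-length integer sequences before they can be fed to an RNN or LSTM. The following is a minimal sketch of one common way to do that at the character level, not the approach used in the sample notebook. The helper names `build_vocab` and `encode`, the reserved ids, and the example names are all assumptions for illustration. It works on Unicode code points, so a Bengali consonant and its vowel sign become separate tokens.

```python
PAD_ID = 0  # used to pad short names up to max_len
UNK_ID = 1  # used for characters not seen when building the vocabulary


def build_vocab(names):
    """Map every character found in the names to an integer id starting at 2."""
    chars = sorted({ch for name in names for ch in name})
    return {ch: idx + 2 for idx, ch in enumerate(chars)}


def encode(name, vocab, max_len):
    """Turn a name into exactly max_len ids, truncating or padding as needed."""
    ids = [vocab.get(ch, UNK_ID) for ch in name][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


if __name__ == "__main__":
    names = ["করিম", "শাহানা"]  # hypothetical examples
    vocab = build_vocab(names)
    for name in names:
        print(name, encode(name, vocab, max_len=8))
```

The resulting id lists can be stacked into an array and passed to an embedding layer in the framework of your choice.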

Contribute

If you would like to contribute to this dataset, you are welcome to do so in the following ways:

Add more data samples to the dataset. To do so, add your names to the female.txt and male.txt files in the db directory, one name per line, and send a pull request. I will merge your update into the CSV file.
Create notebooks or scripts for different use cases of this dataset, put them in the notebooks directory, and send a pull request.
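
To illustrate how newline-separated name lists relate to the CSV, here is a hypothetical sketch of such a merge. It is not the maintainer's actual script: the function names, the `Name,Gender` header, and the rule that a name appearing in both lists keeps its first label are all assumptions.

```python
import csv
import io


def merge_name_lists(male_lines, female_lines):
    """Combine two lists of lines into unique (name, label) rows.

    Labels follow the dataset encoding: 0 = male, 1 = female.
    Blank lines are skipped; a repeated name keeps its first label.
    """
    rows = []
    seen = set()
    for label, lines in ((0, male_lines), (1, female_lines)):
        for line in lines:
            name = line.strip()
            if name and name not in seen:
                seen.add(name)
                rows.append((name, label))
    return rows


def write_csv(rows, handle):
    """Write the rows as CSV with an assumed 'Name,Gender' header."""
    writer = csv.writer(handle)
    writer.writerow(["Name", "Gender"])
    writer.writerows(rows)


if __name__ == "__main__":
    merged = merge_name_lists(["করিম\n", "\n"], ["শাহানা\n"])
    buffer = io.StringIO()
    write_csv(merged, buffer)
    print(buffer.getvalue())
```

Running a check like this locally before opening a pull request helps catch blank lines and duplicate names early.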

Disclaimer

The names were collected from the internet, using sources such as Wikipedia and baby-name suggestion websites. If someone's name appears in the dataset, that is entirely unintentional.

MdAbuRummanRefat/bengli-female-vs-male-names
