We prepared our own Bangla Q&A dataset for our research and are making it public so that others can use it in their research and projects.
We selected widely read public papers, books, and poems available on the Internet as sources and picked passages on various subjects. After collecting the data, we first created the data files in a specific format. Our data is in an Excel file in which the first column contains the questions, the second column contains the corresponding answers, the third column contains the name of the text file of the passage that the question refers to, and the fourth column contains the part of the sentence in which the answer is found.
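The four-column layout described above could be read along the following lines. This is only a minimal sketch, not the authors' loader: the names `QAExample`, `rows_to_examples`, and `load_examples` are hypothetical, and whether the sheet has a header row is an assumption exposed as the `has_header` flag. Reading the Excel file itself assumes `pandas` and `openpyxl` are installed.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence


@dataclass
class QAExample:
    # Field names are illustrative; the source only describes column order.
    question: str        # column 1: the question
    answer: str          # column 2: the answer
    passage_file: str    # column 3: name of the text file holding the passage
    answer_context: str  # column 4: part of the sentence containing the answer


def rows_to_examples(rows: Iterable[Sequence[object]]) -> List[QAExample]:
    """Convert four-column rows into QAExample records, skipping incomplete rows."""
    examples = []
    for row in rows:
        # Skip rows that are too short or have an empty cell in the first four columns.
        if len(row) < 4 or any(cell is None for cell in row[:4]):
            continue
        question, answer, passage_file, context = (str(c).strip() for c in row[:4])
        examples.append(QAExample(question, answer, passage_file, context))
    return examples


def load_examples(xlsx_path: str, has_header: bool = False) -> List[QAExample]:
    """Read the first sheet of an Excel file (assumes pandas and openpyxl)."""
    import pandas as pd  # imported lazily so rows_to_examples works without pandas

    frame = pd.read_excel(xlsx_path, header=0 if has_header else None)
    # Keep the first four columns and drop rows with missing cells.
    frame = frame.iloc[:, :4].dropna()
    return rows_to_examples(frame.itertuples(index=False, name=None))
```

With a hypothetical file path, `load_examples("path/to/data.xlsx")` would return a list of records whose `passage_file` field can then be used to look up the passage text for each question.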
There are 3 folders that contain the training, validation, and total data. You will find instructions for using the data inside each folder.
Our dataset contains comprehension passages of various lengths and many questions with single-word, multi-word, and even full-sentence answers. To date, no work on Bangla QA has been done at this scale. From this perspective, we consider our dataset capable of serving as a benchmark dataset for Bangla QA.
If you use the dataset, please cite us:
@inproceedings{haque2020factoid,
  title={Factoid Question Answering over Bangla Comprehension},
  author={Haque, Md Asiful and Sultana, Shamima and Islam, Md Jayedul and Islam, Md Ashraful and Ovi, Jesan Ahammed},
  booktitle={International Symposium on Multidisciplinary Studies and Innovative Technologies},
  pages={623--630},
  year={2020},
  organization={IEEE}
}