This repository contains the code and input data needed to re-create the results and visualizations in "Some Languages are More Equal than Others: Probing Deeper into the Linguistic Disparity in the NLP World", to be presented at AACL 2022.
Before running the code, navigate to the Google Drive link and add 'acl-publication-info.74k.parquet' to your Input folder. You do not have to download the file; adding a link to it is sufficient.
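Before running anything else, it can help to confirm that the parquet file is actually visible from your environment. The following is a minimal Python sketch, not code from this repository. It assumes the Input folder is reachable as a local directory named `Input` (a hypothetical path; on a mounted Google Drive or a hosted notebook the location will differ) and that pandas is installed together with a parquet engine such as pyarrow. `find_acl_parquet` and `load_acl_metadata` are hypothetical helper names used only for illustration.

```python
from pathlib import Path

# Hypothetical location of the Input folder: adjust to wherever it is mounted.
INPUT_DIR = Path("Input")
PARQUET_NAME = "acl-publication-info.74k.parquet"


def find_acl_parquet(input_dir: Path = INPUT_DIR) -> Path:
    """Return the path to the ACL metadata file, or raise a helpful error."""
    path = Path(input_dir) / PARQUET_NAME
    if not path.is_file():
        raise FileNotFoundError(
            f"{PARQUET_NAME} not found in {input_dir!s}; "
            "add it to your Input folder before running the code."
        )
    return path


def load_acl_metadata(input_dir: Path = INPUT_DIR):
    """Load the parquet file into a pandas DataFrame."""
    # Imported lazily so the path check above works even without pandas.
    # pandas.read_parquet needs a parquet engine such as pyarrow or fastparquet.
    import pandas as pd

    return pd.read_parquet(find_acl_parquet(input_dir))
```

For example, calling `df = load_acl_metadata()` and then inspecting `df.shape` gives a quick confirmation that the file was found and read; the columns are whatever the parquet file itself defines.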
Throughout the code and comments, the following short names refer to these works:
Joshi: Pratik Joshi, Sebastin Santy, Amar Budhiraja, Kalika Bali, and Monojit Choudhury. 2020. The state and fate of linguistic diversity and inclusion in the NLP world. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6282–6293, Online. Association for Computational Linguistics.
Blasi: Damian Blasi, Antonios Anastasopoulos, and Graham Neubig. 2022. Systematic inequalities in language technology performance across the world’s languages. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5486–5505, Dublin, Ireland. Association for Computational Linguistics.
Rohatgi: Shaurya Rohatgi. 2022. ACL Anthology Corpus with Full Text. GitHub.
Please use Generate Diagrams.ipynb as the main file.
@inproceedings{ranathunga2022some,
title={Some Languages are More Equal than Others: Probing Deeper into the Linguistic Disparity in the NLP World},
author={Ranathunga, Surangika and de Silva, Nisansa},
booktitle={Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing},
pages={823--848},
year={2022}
}
The code and data are released under the CC BY-NC 4.0 license. By using this code and data, you agree to the license terms.