AmbER (Ambiguous Entity Retrieval) sets are collections of queries that individually test a retriever's ability to perform entity disambiguation. Each AmbER set contains queries about entities that share a name. See our ACL-IJCNLP 2021 paper "Evaluating Entity Disambiguation and the Role of Popularity in Retrieval-Based NLP" to learn more about AmbER sets.
To install the required packages, run pip install -r requirements.txt
Alternatively, you can use Poetry: run poetry install, followed by poetry shell to activate the environment.
AmbER sets are generated from Wikidata tuples and are aligned to a Wikipedia dump. To reproduce our pipeline, see the generate_amber_sets directory. This step is optional, as the generated AmbER sets are provided in the data directory.
To evaluate your retriever's predictions on AmbER sets, see the evaluation directory.
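As a rough illustration of what evaluating a retriever can involve, the sketch below computes top-k retrieval accuracy over a handful of queries. It is not the code in the evaluation directory: the function name accuracy_at_k, the dictionary layout, and the query and document IDs are all hypothetical, chosen only for this example. Consult the evaluation directory for the actual input format and metrics.

```python
from typing import Dict, List, Set


def accuracy_at_k(
    predictions: Dict[str, List[str]],
    gold: Dict[str, Set[str]],
    k: int = 1,
) -> float:
    """Fraction of queries with at least one gold document in the top-k predictions.

    Both arguments use a hypothetical layout: query ID -> ranked list of
    predicted document IDs, and query ID -> set of gold document IDs.
    """
    if not gold:
        return 0.0
    hits = 0
    for query_id, gold_docs in gold.items():
        # Queries missing from the predictions count as misses.
        top_k = predictions.get(query_id, [])[:k]
        if any(doc_id in gold_docs for doc_id in top_k):
            hits += 1
    return hits / len(gold)


# Toy data (hypothetical): two queries about different entities that share a name.
gold = {"q1": {"doc_a"}, "q2": {"doc_b"}}
predictions = {"q1": ["doc_a", "doc_c"], "q2": ["doc_a", "doc_b"]}

acc1 = accuracy_at_k(predictions, gold, k=1)
acc2 = accuracy_at_k(predictions, gold, k=2)
print(acc1, acc2)
```

In this toy case the retriever ranks the same document first for both queries, so it is correct for only one of them at k=1. This is the kind of confusion between same-named entities that the queries are meant to expose.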
@inproceedings{chen-etal-2021-evaluating,
title = "Evaluating Entity Disambiguation and the Role of Popularity in Retrieval-Based {NLP}",
author = "Chen, Anthony and
Gudipati, Pallavi and
Longpre, Shayne and
Ling, Xiao and
Singh, Sameer",
booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
month = aug,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.acl-long.345",
doi = "10.18653/v1/2021.acl-long.345",
pages = "4472--4485",
}
The AmbER sets data in the data directory is licensed under the Creative Commons Zero v1.0 Universal License. All code provided in this repository is licensed under the Apache License Version 2.0.
For questions or comments on AmbER sets, please open a pull request or issue or contact Anthony Chen at [email protected].