Benchmarks
Hugging Face Leaderboard, Papers with Code (all NLP tasks)
MLPerf™ Inference Benchmark Suite Github Paper
ExplainaBoard (beta version)
Beyond Accuracy: Behavioral Testing of NLP Models with CheckList
Homepage, paper, papers with code
General Language Understanding Evaluation (GLUE)
Homepage
Paper
Github
RACE: Large-scale ReAding Comprehension Dataset From Examinations
NLU + QA, understanding and reasoning
2017 Paper
BIG-bench
Collection of many NLP tasks
Github
XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization
Github
Stanford Question Answering Dataset (SQuAD)
Homepage
arxiv.org: SQuAD 1, SQuAD 2
Microsoft NewsQA
Homepage Paper Github
PAQ: 65 Million Probably-Asked Questions and What You Can Do With Them
Dataset and benchmark in one; could be useful for fine-tuning QA/CBQA models
Paper
The Stanford Natural Language Inference (SNLI) Corpus
Homepage Paper
The Multi-Genre NLI Corpus
Homepage
They might have annotation artifacts and bias in their labeling:
Annotation Artifacts in Natural Language Inference Data
GEM is a benchmark environment for Natural Language Generation with a focus on its Evaluation
Homepage
GLGE: A New General Language Generation Evaluation Benchmark
Natural Language Generation, 24 tasks across 3 difficulty levels; contains MASS, BART, and ProphetNet baselines
arxiv.org 2021 Link
Microsoft Research, College of Computer Science Sichuan University, Dayiheng Liu, et al.
Public Repository and guide https://microsoft.github.io/glge/
BERTScore: Evaluating Text Generation with BERT
automatic evaluation metric for text generation
Cite: Link, Paper, Github
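To give a rough idea of what an embedding-based metric like BERTScore computes, here is a minimal sketch of greedy token matching with cosine similarity. It is not the official implementation: the function names `cosine` and `greedy_match_score` are hypothetical, the two-dimensional vectors are hand-made toy values standing in for contextual BERT embeddings, and extras described in the paper (such as idf weighting and baseline rescaling) are left out.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def greedy_match_score(candidate_vecs, reference_vecs):
    """Return (precision, recall, f1) from greedy token matching.

    Precision: each candidate token is matched to its most similar
    reference token. Recall: each reference token is matched to its
    most similar candidate token. F1 is their harmonic mean.
    """
    precision = sum(
        max(cosine(c, r) for r in reference_vecs) for c in candidate_vecs
    ) / len(candidate_vecs)
    recall = sum(
        max(cosine(r, c) for c in candidate_vecs) for r in reference_vecs
    ) / len(reference_vecs)
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy "token embeddings" (assumption: real usage would take these
# from a pretrained BERT-style model, one vector per token).
reference = [[1.0, 0.0], [0.0, 1.0]]
candidate = [[1.0, 0.0], [1.0, 1.0]]

precision, recall, f1 = greedy_match_score(candidate, reference)
print(f"P={precision:.4f} R={recall:.4f} F1={f1:.4f}")
```

For real evaluations, use the code released with the paper (see the Github link above) rather than this sketch.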
SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference
Situations With Adversarial Generations
Homepage Paper
HellaSwag: Can a Machine Really Finish Your Sentence?
Like SWAG but harder
Homepage 2019 Paper
Stanford typed dependencies manual
Penn-Treebank
Papers with Code, Kaggle
Leaderboard
Low Resource Named Entity Recognition
Dependency Parsing
Semantic Similarity
Semantic Parsing
Semantic Textual Similarity
question-answering
natural-language-understanding
reading-comprehension
natural-language-inference
sentiment-analysis
language-modelling
text-classification