V2.0
Pre-release
We wrapped up our first public expansion effort on October 15, 2021. During this effort, the community contributed over 1,500 tasks! 🎉
We will continue to publish more releases as we improve the data and add more experiments.
If you're wondering whether you can still contribute to the repo, the answer is a solid YES!
Note: v1 data is accessible here: https://instructions.apps.allenai.org/
What's Changed
- natural instructions-v1 by @danyaljj in #1
- Updating Task Category in Readme by @swarooprm in #2
- Repeat-copy task by @danyaljj in #4
- Changed output format in examples to non-list, faq by @swarooprm in #9
- Update README.md by @swarooprm in #12
- task 73 by @amirrezamirzaei in #11
- Shailaja tasks by @shailaja183 in #14
- Added 4 tasks using SPLASH dataset and 1 using the CoNaLa dataset. by @kurbster in #15
- Update README.md: clarifying "meaningful contribution" by @danyaljj in #23
- babi dataset task 1- 3 NI tasks (question generation, answer generation, identify supporting fact for QA) by @shailaja183 in #22
- Update README.md: commits not connected to Github profile by @danyaljj in #30
- Adding tasks using examples from CoNaLa dataset. by @kurbster in #29
- Removing repititive examples. by @kurbster in #40
- Repeat instance fix by @shailaja183 in #42
- squad2.0 by @shailaja183 in #32
- Fixing several outstanding issues by @danyaljj in #38
- Task103 facts to story by @Mihir3009 in #34
- Updated files for task 102 and 103 by @Mihir3009 in #49
- Update README.md by @swarooprm in #50
- fixing issue #13, #43, #31, #24 by @swarooprm in #53
- removing randomization (issue #55) by @swarooprm in #56
- Synthetic data by @nrjvarshney in #27
- task108 abuse detection by @Maitreyapatel in #41
- ASSET two tasks 111 and 112 by @Palipoor in #51
- Synthetic frequency tasks by @nrjvarshney in #57
- Task 105 sentence generation by @Mirzyaaliii in #62
- Add task117_spl_translation_en_de task by @Mehrad0711 in #66
- task 104 by @amirrezamirzaei in #35
- Update crowdsourcing.md by @danyaljj in #69
- Update task018_mctaco_temporal_reasoning_presence.json by @danyaljj in #67
- Update task019_mctaco_temporal_reasoning_category.json by @danyaljj in #68
- Task115 help classification by @Maitreyapatel in #59
- PIQA Dataset - Two subtasks by @kuntalkumarpal in #17
- Task 67-72: AbductiveNLI by @aarunku5 in #10
- Task 119, 120 and 121: ZEST dataset by @yeganehkordi in #79
- Task 65 -66: Time travel by @nrjvarshney in #8
- Task 126-131: SCAN Dataset by @eshaanpathak in #81
- Task 116 Com2sense task - Commonsense reasoning by @Palipoor in #60
- Task 137-140: Detoxifying LMs Prompt Completion Dataset by @eshaanpathak in #84
- Update README.md by @swarooprm in #86
- Task 141-143: Odd Man Out Dataset by @eshaanpathak in #88
- Task 151-154: Theory of Mind QA Dataset by @eshaanpathak in #91
- Task 155: Count number of nouns/verbs in the given sentence by @nrjvarshney in #92
- Add task 144 from SubjQA by @Palipoor in #89
- Task110_logic2text by @Mihir3009 in #100
- Task 156: CODAH Dataset by @eshaanpathak in #96
- Task 122-125: New Conala tasks by @kurbster in #80
- Task 133-136: WinoWhy (ACL 2020 paper) by @colinzhaoust in #83
- Task 164-165: MCScript by @Mirzyaaliii in #105
- Task166: ClariQ by @Mirzyaaliii in #106
- Task 177: Para-nmt by @Mirzyaaliii in #110
- Task 132: DAIS Dataset by @Mirzyaaliii in #82
- Task 182: DuoRC dataset by @yeganehkordi in #115
- Task 157: Count vowels consonants by @nrjvarshney in #97
- Task 158 - 159: Frequency of words by @nrjvarshney in #98
- Task 160 : Replace letter in sentence by @nrjvarshney in #99
- Task 161-163: Count words that contain/start_with/end_with the given letter by @nrjvarshney in #101
- Task 109 - Spam SMS Detection by @SavanDoshi in #117
- task 170, 191-192: HotpotQA by @Mihir3009 in #107
- Tasks 179 - 181 from the paper "Effective Crowd-Annotation of Participants, Interventions, and Outcomes..." by @Palipoor in #114
- Task 176 and Task 184 from dataset BREAK by @Palipoor in #109
- Task 167-169: StrategyQA dataset by @yeganehkordi in #104
- fix: typo spelling grammar by @slowy07 in #125
- Update README.md by @danyaljj in #126
- Fixed unit test by @kurbster in #124
- Task 118-119: semeval math challenges by @amirrezamirzaei in #76
- Task - 166 ClariQ Correction by @Mirzyaaliii in #134
- Update README.md by @danyaljj in #139
- Task 195-196: Sentiment140 by @Mihir3009 in #122
- Update README.md by @danyaljj in #146
- Task - 178: QuaRTz by @Mirzyaaliii in #111
- Tasks 205-208: New Synthetic tasks by @kurbster in #123
- Add tasks 171-175 by @Mehrad0711 in #108
- adding tasks 145-150 by @liusiyi641 in #90
- Task 193-194: DuoRC by @yeganehkordi in #121
- task 224 by @amirrezamirzaei in #137
- Changing the Expected Schema by @swarooprm in #147
- Task209 stance detection by @Maitreyapatel in #128
- Task 210-212: logic2text by @Mihir3009 in #129
- Task-223: QuaRTz by @Mirzyaaliii in #136
- Task 227: ClariQ by @Mirzyaaliii in #141
- Task 243- 245: Set operation tasks by @nrjvarshney in #148
- Task 228-229: ARC by @Mirzyaaliii in #142
- Task 239-242: TweetQA by @Mihir3009 in #145
- Task 246-248: DREAM by @yeganehkordi in #149
- Task 276 and 275 and 249 from enhanced wsc dataset by @Palipoor in #150
- Task 268: CaseHOLD by @yeganehkordi in #155
- Tasks 271-273: Europarl by @Mihir3009 in #157
- Task 274: overruling by @yeganehkordi in #158
- Task 283: DREAM by @yeganehkordi in #163
- Task 287: CaseHOLD by @yeganehkordi in #166
- task 290 : TellMyWhy by @amirrezamirzaei in #168
- Task 301-303: ReCoRD by @yeganehkordi in #172
- Task 281 Points of Correspondence detection by @Palipoor in #161
- Task - 229: ARC (Corrected file) by @Mirzyaaliii in #181
- task 225-226: Stackoverflow english learner tasks by @Maitreyapatel in #140
- Task 339: ReCoRD by @yeganehkordi in #186
- Remove extra file by @Mirzyaaliii in #184
- Update README.md by @swarooprm in #189
- Task 183 add rhyming task by @ethankim00 in #116
- Task 184-190 SNLI Tasks, 197-204 MNLI tasks by @aarunku5 in #120
- task 282 : Scruples by @amirrezamirzaei in #162
- tasks 250-253: more from SPL by @Mehrad0711 in #151
- tasks 254-263: SPL translation tasks by @Mehrad0711 in #152
- Task 277 - 280: StereoSet tasks by @XudongOliverShen in #159
- task 286 : OLID by @amirrezamirzaei in #165
- Task 318-321: StereoSet by @eshaanpathak in #179
- Task 380-381 from BoolQ by @Palipoor in #201
- Task 348-349: Squad2.0 by @shailaja183 in #190
- Task 344-347 and Task 382: HybridQA by @yeganehkordi in #188
- task 291 & 295 : semeval 2020 task4 by @amirrezamirzaei in #169
- task 322 - 328: Jigsaw by @XudongOliverShen in #180
- Add task 375 from paper "Exploring the Role of Argument Structure in Online Debate Persuasion" by @Palipoor in #198
- Task 312-315: EuroParl (Swedish and English) by @Mirzyaaliii in #177
- Task 394-396: PersianQA by @yeganehkordi in #210
- Update test_all.py by @danyaljj in #213
- Task 332: TellMeWhy by @amirrezamirzaei in #183
- Task 379: AGNews Topic Classification by @gkaramanolakis in #200
- Task 363: SST2 by @gkaramanolakis in #195
- Task 391-393 from cod3s paper by @Palipoor in #209
- Task 364 from "Towards Controllable Biases in Language Generation" by @Palipoor in #196
- Task 312-315 (Corrected files) by @Mirzyaaliii in #214
- Update README.md by @danyaljj in #218
- Tasks 213-222 ROCStories by @aarunku5 in #132
- Tasks 284-285: IMDB by @Mihir3009 in #164
- Tasks 288-289: Gigaword by @Mihir3009 in #167
- Task 305-308: Jeopardy by @eshaanpathak in #175
- Task 353-359: CaSiNo Dataset by @kushalchawla in #193
- Update README.md by @danyaljj in #221
- Enhancements to README and test_all.py by @eshaanpathak in #224
- Task 376-378 Word length tasks by @nrjvarshney in #199
- Task 316 - 317: CrowS-Pairs by @XudongOliverShen in #178
- Task 386-387 : Semeval2018 task3 by @amirrezamirzaei in #204
- Fix type keys by @amirrezamirzaei in #228
- bringing the branch up to date by @danyaljj in #234
- Update test_all.py: check the types of several recently added keys. by @danyaljj in #205
- Updated FAQ by @danyaljj in #236
- Update README.md: deadline update September -> October by @danyaljj in #256
- Minor Edits to Task 137-140 by @eshaanpathak in #265
- Tasks 230-238 IIRC Tasks by @aarunku5 in #144
- Task 296-300: StoryCloze by @Mirzyaaliii in #171
- Task 304 and 401 from paper "Where’s My Head? Definition, Data Set, and Models for Numeric Fused-Head Identification and Resolution" by @Palipoor in #173
- task 333 - 338: HateEval by @XudongOliverShen in #185
- Tasks 365-374 Synthetic Data and Tasks 517-520 by @kurbster in #197
- task 403 : Creak by @amirrezamirzaei in #223
- Task 463-464: ParsiNLU-entailment by @yeganehkordi in #239
- Task 465-466: ParsiNLU-QQP by @yeganehkordi in #240
- Task 475: Yelp Polarity Classification by @gkaramanolakis in #244
- Task 383 and Task 456-459: MATRES by @yeganehkordi in #202
- Task 460-462: Qasper by @yeganehkordi in #230
- Task 467-468: ParsiNLU-reading_comprehension by @yeganehkordi in #241
- Task 476-487: CLS Dataset - Multilingual Sentiment Classification of Product Reviews by @gkaramanolakis in #245
- Add tasks 513 and 514 by @Palipoor in #261
- Task512: Twitter Emotion Classification by @gkaramanolakis in #260
- Task 418-423 PerSent Dataset by @kuntalkumarpal in #229
- Task 424-427: HindiEnglish Corpora by @Mirzyaaliii in #231
- Tasks 573-578 air_dialog and curiosity_dialog instructions by @jayavardhan3112 in #271
- task 400 : PAWS by @amirrezamirzaei in #215
- Tasks 607-610 by @arlenfan in #287
- Task 534 : farstail entailment by @amirrezamirzaei in #275
- Task 352: CODA-19 by @cosmicishan in #192
- Task 329-331: GAP by @yeganehkordi in #182
- Task 530-533 EuroParl Spanish dataset by @cosmicishan in #274
- Task 524-529: ParsiNLU-sentiment-analysis by @yeganehkordi in #272
- Task 535-564 ALT & Discofuse Datasets by @PhaniRohithaKaza in #269
- Task 515-516 : Senteval by @amirrezamirzaei in #262
- task 397-399 : Semeval2018 task1 by @amirrezamirzaei in #212
- Task 360-362 : "Yes, and" response generation and classification. by @Maitreyapatel in #194
- Task 309-311: RACE by @Mirzyaaliii in #176
- Task 615 Movieqa by @cosmicishan in #297
- Task 582 Natural Questions by @cosmicishan in #278
- Task 579-581 Socialiqa by @cosmicishan in #277
- Minor Edits To My Previously Merged Tasks by @eshaanpathak in #313
- Task 292-294: StoryCommonsense by @Mirzyaaliii in #170
- Task 428-431 : Sent eval by @amirrezamirzaei in #232
- Tasks 469-472: from MRQA and hasPart KB datasets by @Maitreyapatel in #242
- Task 473-474: ParsiNLU-multiple-choice by @yeganehkordi in #243
- Task 521 - classification from QANTA by @Palipoor in #263
- Tasks 640-644: esnli Dataset and refresd Dataset by @ilobo22 in #305
- task 340 - 343: WinoMT by @XudongOliverShen in #187
- task 350 - 351: WinoMT by @XudongOliverShen in #191
- Tasks 510-511: Reddit Tifu dataset by @shailaja183 in #259
- task 611 : Mutual by @amirrezamirzaei in #290
- Task 616 CoLA by @cosmicishan in #298
- Task 269-270: Counterfactual story generation by @Maitreyapatel in #156
- Tasks 438-441: English-Gujarati Language by @Mihir3009 in #235
- Task 405 from NarrativeQA by @Palipoor in #226
- Task 522 News Editorials by @Palipoor in #267
- add tasks 402 and 404 by @Palipoor in #225
- Tasks 388-390: Torque (merge issue fix) by @shailaja183 in #258
- Tasks 747-749 and 614 from Glucose by @Palipoor in #296
- Task 668: TLDR; summarizing paper abstracts by @Sujan242 in #311
- Task - 649: RACE by @Mirzyaaliii in #309
- Task 650-663: ParsiNLU-translation by @yeganehkordi in #310
- Tasks 442-452 : opus_paracrawl and com_qa datasets by @Arutselvan in #237
- Tasks 664-667 and 685-737: Measuring Massive Multitask Language Understanding by @Sujan242 in #308
- task 827-828 : copa by @amirrezamirzaei in #324
- Task 933 Style Transfer (Simplify Sentences) by @ghlai9665 in #348
- Task 617-618 Amazon Review by @cosmicishan in #299
- Task 612 Yoruba BBC headline by @cosmicishan in #291
- Task 843-849: financial_phrasebook, pubmed_QA Datasets. by @JalanshMunshi in #328
- Tasks 672-675 : Amazon/Yelp reviews and Google welformed query by @paik1 in #314
- Tasks 853-856: HIPPOCAMPUS and conv_ai_2 by @NeilFranks in #330
- Task 770-819: PAWS-X and PEC by @andersonjwan in #322
- Tasks 406-417 : Mickey tasks by @Maitreyapatel in #227
- Task 672, 750-754, 861-868: NumerSense Task by @tanay2001 in #307
- Tasks 629-635: dbpedia_14 and allegro_reviews datasets by @divya0627 in #303
- Tasks 834-837: Math Dataset & Viquiquad by @chiragvartak in #326
- Tasks 1087-1089 and 1146-1147 - Synthetic Data - Answer Generation by @Ravsehajsinghpuri in #363
- Tasks 453-455 : Swag Dataset by @kuntalkumarpal in #238
- Task 761-765: app_review and emea by @vijaykumawat256 in #320
- Task 613 Politifact by @cosmicishan in #293
- Tasks 850-852: Synthetic data by @kurbster in #329
- Tasks 625-628 by @ananth-duggirala in #302
- Tasks 565-572 by @BhavyaSri1001 in #270
- Task 493-496: Amazon Polarity and Semeval by @SavanDoshi in #253
- Tasks 1090-1114: TED translation task for 25 language directions. by @davidstap in #364
- Task 432-437: alt (language translation) by @SavanDoshi in #233
- Task 585 Preposition by @cosmicishan in #280
- Task 738 from perspectrum by @Palipoor in #316
- Task 489-492: mwsc Dataset by @NSVPC in #252
- Tasks 264-266: Paper Reviews by @Mihir3009 in #153
- Tasks 743-746 EURLEX,YELP and ARITHMETIC by @RaghulRajM in #318
- Task 583-584 Universal Dependency English Parts of Speech Tagging by @mishra-sid in #279
- Task 586-590 Amazon food review by @cosmicishan in #281
- Tasks 1148-1153 by @Ravsehajsinghpuri in #370
- Task 1152-1159: BARD analogical reasoning dataset by @ashok-arjun in #374
- Task 669-671: AmbigQA by @Mirzyaaliii in #312
- Task hierarchy by @swarooprm in #361
- Revert "Task hierarchy" by @danyaljj in #391
- Create task-hierarchy.md in doc--In progress by @swarooprm in #389
- Tasks 935-937: Defeasible reasoning task/dataset by @Sujan242 in #352
- Task 1314: Country Abbreviation by @sumantapatro in #397
- Task 384-385 Socialiqa(replaced) by @cosmicishan in #379
- Tasks 927-928 : style transfer by @tanay2001 in #345
- Tasks 938-954 Indic Glue by @tanay2001 in #353
- Tasks: 1130-1145 X-CSR commonsense multiple QA by @Maitreyapatel in #368
- Tasks 1315-136 and 1188-1194 by @Ravsehajsinghpuri in #387
- Updating the task hierarchy for two tasks by @danyaljj in #400
- Task 497-509 : Scruples-Anecdote-Dilemmas-datasets by @kuntalkumarpal in #255
- Tasks 890-893: GCWD and gap by @selvagmj in #338
- Task 1115-1129 ALT by @SavanDoshi in #367
- Task 591-594 SciQ dataset by @kuntalkumarpal in #282
- adding language indicators by @danyaljj in #419
- flag repeated lines. by @danyaljj in #422
- Fix file issues outside the tasks folder by @swarooprm in #424
- Update test_all.py by @danyaljj in #425
- tasks 1331-1333 by @Ravsehajsinghpuri in #403
- Verify language naming conventions by @danyaljj in #423
- Task 1195: Disfluent to fluent sentence conversion by @ashok-arjun in #388
- Tasks 1398-1402 by @swarooprm in #417
- Tasks 1403-1406 by @Ravsehajsinghpuri in #426
- Task 1285-1307: Keypoint Matching - 23 topics by @ashok-arjun in #394
- Initial Task Hierarchy Revision & Auto Update Readme by @ghlai9665 in #428
- Fix the inconsistencies of the definitions by @danyaljj in #432
- Task 1418: bless lexical semantics task by @atharva-naik in #433
- Task 957-959: E2E by @ghlai9665 in #356
- Updating the tests: check if the tasks are sorted correctly. by @danyaljj in #434
- Update README.md by @danyaljj in #439
- Task 1419 - 1424 : MathQA by @mishra-sid in #438
- Task 1187 Politifact 2(Reviewed) by @cosmicishan in #437
- Task1186: natural-ness evaluation by @Sujan242 in #371
- Tasks 1425-1428 and 1317-1322 by @sumantapatro in #398
- Updating the readme [to be merged on October 15] by @danyaljj in #381
New Contributors
- @danyaljj made their first contribution in #1
- @swarooprm made their first contribution in #2
- @amirrezamirzaei made their first contribution in #11
- @shailaja183 made their first contribution in #14
- @kurbster made their first contribution in #15
- @Mihir3009 made their first contribution in #34
- @nrjvarshney made their first contribution in #27
- @Maitreyapatel made their first contribution in #41
- @Palipoor made their first contribution in #51
- @Mirzyaaliii made their first contribution in #62
- @Mehrad0711 made their first contribution in #66
- @kuntalkumarpal made their first contribution in #17
- @aarunku5 made their first contribution in #10
- @yeganehkordi made their first contribution in #79
- @eshaanpathak made their first contribution in #81
- @colinzhaoust made their first contribution in #83
- @SavanDoshi made their first contribution in #117
- @slowy07 made their first contribution in #125
- @liusiyi641 made their first contribution in #90
- @ethankim00 made their first contribution in #116
- @XudongOliverShen made their first contribution in #159
- @gkaramanolakis made their first contribution in #200
- @kushalchawla made their first contribution in #193
- @jayavardhan3112 made their first contribution in #271
- @arlenfan made their first contribution in #287
- @cosmicishan made their first contribution in #192
- @PhaniRohithaKaza made their first contribution in #269
- @ilobo22 made their first contribution in #305
- @Sujan242 made their first contribution in #311
- @Arutselvan made their first contribution in #237
- @ghlai9665 made their first contribution in #348
- @JalanshMunshi made their first contribution in #328
- @paik1 made their first contribution in #314
- @NeilFranks made their first contribution in #330
- @andersonjwan made their first contribution in #322
- @tanay2001 made their first contribution in #307
- @divya0627 made their first contribution in #303
- @chiragvartak made their first contribution in #326
- @Ravsehajsinghpuri made their first contribution in #363
- @vijaykumawat256 made their first contribution in #320
- @ananth-duggirala made their first contribution in #302
- @BhavyaSri1001 made their first contribution in #270
- @davidstap made their first contribution in #364
- @NSVPC made their first contribution in #252
- @RaghulRajM made their first contribution in #318
- @mishra-sid made their first contribution in #279
- @ashok-arjun made their first contribution in #374
- @sumantapatro made their first contribution in #397
- @selvagmj made their first contribution in #338
- @atharva-naik made their first contribution in #433
Full Changelog: https://github.com/allenai/natural-instructions-expansion/commits/v2.0