Add hotpot QA (#703)
* Add hotpot QA

* Remove unused deps from hotpotqa

* fix remote tests for new datasets (#704)

* Update datasets/hotpot_qa/hotpot_qa.py

Co-authored-by: Quentin Lhoest <[email protected]>

* Add hotpot QA

* Remove unused deps from hotpotqa

Co-authored-by: Quentin Lhoest <[email protected]>
ghomasHudson and lhoestq authored Oct 2, 2020
1 parent 1ac5ecd commit d9da44a
Showing 4 changed files with 148 additions and 0 deletions.
1 change: 1 addition & 0 deletions datasets/hotpot_qa/dataset_infos.json
@@ -0,0 +1 @@
{"distractor": {"description": "HotpotQA is a new dataset with 113k Wikipedia-based question-answer pairs with four key features: (1) the questions require finding and reasoning over multiple supporting documents to answer; (2) the questions are diverse and not constrained to any pre-existing knowledge bases or knowledge schemas; (3) we provide sentence-level supporting facts required for reasoning, allowingQA systems to reason with strong supervisionand explain the predictions; (4) we offer a new type of factoid comparison questions to testQA systems\u2019 ability to extract relevant facts and perform necessary comparison.\n", "citation": "\n@inproceedings{yang2018hotpotqa,\n title={{HotpotQA}: A Dataset for Diverse, Explainable Multi-hop Question Answering},\n author={Yang, Zhilin and Qi, Peng and Zhang, Saizheng and Bengio, Yoshua and Cohen, William W. and Salakhutdinov, Ruslan and Manning, Christopher D.},\n booktitle={Conference on Empirical Methods in Natural Language Processing ({EMNLP})},\n year={2018}\n}\n", "homepage": "https://hotpotqa.github.io/", "license": "", "features": {"id": {"dtype": "string", "id": null, "_type": "Value"}, "question": {"dtype": "string", "id": null, "_type": "Value"}, "answer": {"dtype": "string", "id": null, "_type": "Value"}, "type": {"dtype": "string", "id": null, "_type": "Value"}, "level": {"dtype": "string", "id": null, "_type": "Value"}, "supporting_facts": {"feature": {"title": {"dtype": "string", "id": null, "_type": "Value"}, "sent_id": {"dtype": "int32", "id": null, "_type": "Value"}}, "length": -1, "id": null, "_type": "Sequence"}, "context": {"feature": {"title": {"dtype": "string", "id": null, "_type": "Value"}, "sentences": {"feature": {"dtype": "string", "id": null, "_type": "Value"}, "length": -1, "id": null, "_type": "Sequence"}}, "length": -1, "id": null, "_type": "Sequence"}}, "post_processed": null, "supervised_keys": null, "builder_name": "hotpot_qa", "config_name": "distractor", "version": {"version_str": 
"1.0.0", "description": null, "major": 1, "minor": 0, "patch": 0}, "splits": {"train": {"name": "train", "num_bytes": 552949315, "num_examples": 90447, "dataset_name": "hotpot_qa"}, "validation": {"name": "validation", "num_bytes": 45716111, "num_examples": 7405, "dataset_name": "hotpot_qa"}}, "download_checksums": {"http://curtis.ml.cmu.edu/datasets/hotpot/hotpot_train_v1.1.json": {"num_bytes": 566426227, "checksum": "26650cf50234ef5fb2e664ed70bbecdfd87815e6bffc257e068efea5cf7cd316"}, "http://curtis.ml.cmu.edu/datasets/hotpot/hotpot_dev_distractor_v1.json": {"num_bytes": 46320117, "checksum": "4e9ecb5c8d3b719f624d66b60f8d56bf227f03914f5f0753d6fa1b359d7104ea"}}, "download_size": 612746344, "post_processing_size": null, "dataset_size": 598665426, "size_in_bytes": 1211411770}, "fullwiki": {"description": "HotpotQA is a new dataset with 113k Wikipedia-based question-answer pairs with four key features: (1) the questions require finding and reasoning over multiple supporting documents to answer; (2) the questions are diverse and not constrained to any pre-existing knowledge bases or knowledge schemas; (3) we provide sentence-level supporting facts required for reasoning, allowingQA systems to reason with strong supervisionand explain the predictions; (4) we offer a new type of factoid comparison questions to testQA systems\u2019 ability to extract relevant facts and perform necessary comparison.\n", "citation": "\n@inproceedings{yang2018hotpotqa,\n title={{HotpotQA}: A Dataset for Diverse, Explainable Multi-hop Question Answering},\n author={Yang, Zhilin and Qi, Peng and Zhang, Saizheng and Bengio, Yoshua and Cohen, William W. 
and Salakhutdinov, Ruslan and Manning, Christopher D.},\n booktitle={Conference on Empirical Methods in Natural Language Processing ({EMNLP})},\n year={2018}\n}\n", "homepage": "https://hotpotqa.github.io/", "license": "", "features": {"id": {"dtype": "string", "id": null, "_type": "Value"}, "question": {"dtype": "string", "id": null, "_type": "Value"}, "answer": {"dtype": "string", "id": null, "_type": "Value"}, "type": {"dtype": "string", "id": null, "_type": "Value"}, "level": {"dtype": "string", "id": null, "_type": "Value"}, "supporting_facts": {"feature": {"title": {"dtype": "string", "id": null, "_type": "Value"}, "sent_id": {"dtype": "int32", "id": null, "_type": "Value"}}, "length": -1, "id": null, "_type": "Sequence"}, "context": {"feature": {"title": {"dtype": "string", "id": null, "_type": "Value"}, "sentences": {"feature": {"dtype": "string", "id": null, "_type": "Value"}, "length": -1, "id": null, "_type": "Sequence"}}, "length": -1, "id": null, "_type": "Sequence"}}, "post_processed": null, "supervised_keys": null, "builder_name": "hotpot_qa", "config_name": "fullwiki", "version": {"version_str": "1.0.0", "description": null, "major": 1, "minor": 0, "patch": 0}, "splits": {"train": {"name": "train", "num_bytes": 552949315, "num_examples": 90447, "dataset_name": "hotpot_qa"}, "validation": {"name": "validation", "num_bytes": 46848601, "num_examples": 7405, "dataset_name": "hotpot_qa"}, "test": {"name": "test", "num_bytes": 46000102, "num_examples": 7405, "dataset_name": "hotpot_qa"}}, "download_checksums": {"http://curtis.ml.cmu.edu/datasets/hotpot/hotpot_train_v1.1.json": {"num_bytes": 566426227, "checksum": "26650cf50234ef5fb2e664ed70bbecdfd87815e6bffc257e068efea5cf7cd316"}, "http://curtis.ml.cmu.edu/datasets/hotpot/hotpot_dev_fullwiki_v1.json": {"num_bytes": 47454698, "checksum": "2f1f3e594a3066a3084cc57950ca2713c24712adaad03af6ccce18d1846d5618"}, "http://curtis.ml.cmu.edu/datasets/hotpot/hotpot_test_fullwiki_v1.json": {"num_bytes": 46213747, 
"checksum": "c61a5274b9aa6deca3f7d2dc4d7757684c158fbd2264f759307699fb53801c2b"}}, "download_size": 660094672, "post_processing_size": null, "dataset_size": 645798018, "size_in_bytes": 1305892690}}
Binary file not shown.
Binary file not shown.
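The aggregate size fields in the `dataset_infos.json` entries above appear to be plain sums of their parts, which allows a quick sanity check. The sketch below is only an illustration: the numbers are copied from the two configs, while the `INFOS` layout and the `check_sizes` helper are hypothetical and not part of the `datasets` library.

```python
# Numbers copied from the dataset_infos.json entries above.
# The dictionary layout is a simplification made up for this sketch.
INFOS = {
    "distractor": {
        "splits": {"train": 552949315, "validation": 45716111},
        "downloads": [566426227, 46320117],
        "download_size": 612746344,
        "dataset_size": 598665426,
        "size_in_bytes": 1211411770,
    },
    "fullwiki": {
        "splits": {"train": 552949315, "validation": 46848601, "test": 46000102},
        "downloads": [566426227, 47454698, 46213747],
        "download_size": 660094672,
        "dataset_size": 645798018,
        "size_in_bytes": 1305892690,
    },
}


def check_sizes(info):
    """Return True if the three aggregate size fields match their parts."""
    return (
        sum(info["splits"].values()) == info["dataset_size"]
        and sum(info["downloads"]) == info["download_size"]
        and info["download_size"] + info["dataset_size"] == info["size_in_bytes"]
    )


for name, info in INFOS.items():
    print(name, check_sizes(info))
```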
147 changes: 147 additions & 0 deletions datasets/hotpot_qa/hotpot_qa.py
@@ -0,0 +1,147 @@
# coding=utf-8
# Copyright 2020 The TensorFlow Datasets Authors and the HuggingFace Datasets Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Lint as: python3
"""HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering."""

from __future__ import absolute_import, division, print_function

import json
import os
import textwrap

import datasets


_CITATION = """
@inproceedings{yang2018hotpotqa,
title={{HotpotQA}: A Dataset for Diverse, Explainable Multi-hop Question Answering},
author={Yang, Zhilin and Qi, Peng and Zhang, Saizheng and Bengio, Yoshua and Cohen, William W. and Salakhutdinov, Ruslan and Manning, Christopher D.},
booktitle={Conference on Empirical Methods in Natural Language Processing ({EMNLP})},
year={2018}
}
"""

_DESCRIPTION = """\
HotpotQA is a new dataset with 113k Wikipedia-based question-answer pairs with four key features:
(1) the questions require finding and reasoning over multiple supporting documents to answer;
(2) the questions are diverse and not constrained to any pre-existing knowledge bases or knowledge schemas;
(3) we provide sentence-level supporting facts required for reasoning, allowing QA systems to reason with strong supervision and explain the predictions;
(4) we offer a new type of factoid comparison questions to test QA systems’ ability to extract relevant facts and perform necessary comparison.
"""

_URL_BASE = "http://curtis.ml.cmu.edu/datasets/hotpot/"


class HotpotQA(datasets.GeneratorBasedBuilder):
    """HotpotQA is a Dataset for Diverse, Explainable Multi-hop Question Answering."""

    BUILDER_CONFIGS = [
        datasets.BuilderConfig(
            name="distractor",
            version=datasets.Version("1.0.0"),
            description=textwrap.dedent(
                """
                In the distractor setting, a question-answering system reads 10 paragraphs to provide an answer to a question.
                It must also justify its answer with supporting facts. This setting challenges the model to find the true
                supporting facts in the presence of noise. For each example, we employ bigram tf-idf (Chen et al., 2017) to retrieve
                8 paragraphs from Wikipedia as distractors, using the question as the query. We mix them with the 2 gold paragraphs
                (the ones used to collect the question and answer) to construct the distractor setting.
                """
            ),
        ),
        datasets.BuilderConfig(
            name="fullwiki",
            version=datasets.Version("1.0.0"),
            description=textwrap.dedent(
                """
                In the fullwiki setting, a question-answering system must find the answer to a question in the scope of the
                entire Wikipedia. We fully test the model’s ability to locate relevant facts as well as to reason about them
                by requiring it to answer the question given the first paragraphs of all Wikipedia articles without the gold
                paragraphs specified. This fullwiki setting tests the systems’ ability to perform multi-hop
                reasoning in the wild.
                """
            ),
        ),
    ]

    def _info(self):
        return datasets.DatasetInfo(
            description=_DESCRIPTION,
            features=datasets.Features(
                {
                    "id": datasets.Value("string"),
                    "question": datasets.Value("string"),
                    "answer": datasets.Value("string"),
                    "type": datasets.Value("string"),
                    "level": datasets.Value("string"),
                    "supporting_facts": datasets.features.Sequence(
                        {
                            "title": datasets.Value("string"),
                            "sent_id": datasets.Value("int32"),
                        }
                    ),
                    "context": datasets.features.Sequence(
                        {
                            "title": datasets.Value("string"),
                            "sentences": datasets.features.Sequence(datasets.Value("string")),
                        }
                    ),
                }
            ),
            supervised_keys=None,
            homepage="https://hotpotqa.github.io/",
            citation=_CITATION,
        )

    def _split_generators(self, dl_manager):
        """Returns SplitGenerators, one per downloaded split file."""
        # _URL_BASE ends with "/", so plain concatenation yields valid URLs on every OS.
        paths = {
            datasets.Split.TRAIN: _URL_BASE + "hotpot_train_v1.1.json",
            datasets.Split.VALIDATION: _URL_BASE + "hotpot_dev_" + self.config.name + "_v1.json",
        }
        if self.config.name == "fullwiki":
            paths[datasets.Split.TEST] = _URL_BASE + "hotpot_test_fullwiki_v1.json"
        files = dl_manager.download(paths)

        split_generators = []
        for split in files:
            split_generators.append(datasets.SplitGenerator(name=split, gen_kwargs={"data_file": files[split]}))

        return split_generators

    def _generate_examples(self, data_file):
        """Yields (index, example) pairs read from one HotpotQA JSON file."""
        with open(data_file, encoding="utf-8") as json_file:
            data = json.load(json_file)
        for idx, example in enumerate(data):
            # Test set has missing keys
            for k in ["answer", "type", "level"]:
                if k not in example:
                    example[k] = None

            if "supporting_facts" not in example:
                example["supporting_facts"] = []

            yield idx, {
                "id": example["_id"],
                "question": example["question"],
                "answer": example["answer"],
                "type": example["type"],
                "level": example["level"],
                "supporting_facts": [{"title": f[0], "sent_id": f[1]} for f in example["supporting_facts"]],
                "context": [{"title": f[0], "sentences": f[1]} for f in example["context"]],
            }
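
The record handling in `_generate_examples` above can be tried without downloading anything. The following is a minimal sketch under stated assumptions: `normalize` and `supporting_sentences` are hypothetical helper names, and the sample record is invented for illustration rather than taken from HotpotQA. It mirrors the defaults used for test records (missing `answer`, `type`, `level`, and `supporting_facts`) and then looks up the sentence text each supporting fact points to.

```python
def normalize(example):
    """Mirror what _generate_examples yields for one raw record (a plain dict)."""
    return {
        "id": example["_id"],
        "question": example["question"],
        # Test records lack these keys, so fall back to None / empty list.
        "answer": example.get("answer"),
        "type": example.get("type"),
        "level": example.get("level"),
        "supporting_facts": [
            {"title": title, "sent_id": sent_id}
            for title, sent_id in example.get("supporting_facts", [])
        ],
        "context": [
            {"title": title, "sentences": sentences}
            for title, sentences in example["context"]
        ],
    }


def supporting_sentences(record):
    """Look up the text of each supporting fact in the record's context."""
    by_title = {p["title"]: p["sentences"] for p in record["context"]}
    found = []
    for fact in record["supporting_facts"]:
        sentences = by_title.get(fact["title"], [])
        # Skip ids that point past the end of a paragraph.
        if 0 <= fact["sent_id"] < len(sentences):
            found.append(sentences[fact["sent_id"]])
    return found


# Hypothetical record, invented for illustration; not taken from HotpotQA.
raw = {
    "_id": "example-0",
    "question": "Which city is older, City A or City B?",
    "answer": "City A",
    "type": "comparison",
    "level": "easy",
    "supporting_facts": [["City A", 0], ["City B", 1]],
    "context": [
        ["City A", ["City A was founded in 900.", "It lies on a river."]],
        ["City B", ["City B is a port.", "It was founded in 1200."]],
    ],
}
record = normalize(raw)
print(supporting_sentences(record))

# A test-style record without answer, type, level, or supporting facts.
test_raw = {"_id": "example-1", "question": raw["question"], "context": raw["context"]}
test_record = normalize(test_raw)
print(test_record["answer"], test_record["supporting_facts"])
```

The list-of-dicts shape here matches what the generator yields. After loading through `datasets`, a `Sequence` of dicts is exposed as a dict of lists, so `example["context"]["title"]` would be a list of titles rather than a field of one paragraph.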

1 comment on commit d9da44a

@github-actions

Show benchmarks

PyArrow==0.17.1

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

metric | new / old (diff)
read_batch_formatted_as_numpy after write_array2d | 0.017527 / 0.011353 (0.006174)
read_batch_formatted_as_numpy after write_flattened_sequence | 0.014777 / 0.011008 (0.003769)
read_batch_formatted_as_numpy after write_nested_sequence | 0.051437 / 0.038508 (0.012929)
read_batch_unformated after write_array2d | 0.034597 / 0.023109 (0.011488)
read_batch_unformated after write_flattened_sequence | 0.217267 / 0.275898 (-0.058631)
read_batch_unformated after write_nested_sequence | 0.246115 / 0.323480 (-0.077365)
read_col_formatted_as_numpy after write_array2d | 0.010330 / 0.007986 (0.002344)
read_col_formatted_as_numpy after write_flattened_sequence | 0.004562 / 0.004328 (0.000234)
read_col_formatted_as_numpy after write_nested_sequence | 0.008406 / 0.004250 (0.004156)
read_col_unformated after write_array2d | 0.053463 / 0.037052 (0.016411)
read_col_unformated after write_flattened_sequence | 0.222515 / 0.258489 (-0.035974)
read_col_unformated after write_nested_sequence | 0.243155 / 0.293841 (-0.050686)
read_formatted_as_numpy after write_array2d | 0.143942 / 0.128546 (0.015396)
read_formatted_as_numpy after write_flattened_sequence | 0.116801 / 0.075646 (0.041155)
read_formatted_as_numpy after write_nested_sequence | 0.501532 / 0.419271 (0.082261)
read_unformated after write_array2d | 0.506351 / 0.043533 (0.462818)
read_unformated after write_flattened_sequence | 0.212345 / 0.255139 (-0.042794)
read_unformated after write_nested_sequence | 0.228948 / 0.283200 (-0.054252)
write_array2d | 0.093217 / 0.141683 (-0.048466)
write_flattened_sequence | 1.876133 / 1.452155 (0.423978)
write_nested_sequence | 1.931852 / 1.492716 (0.439136)

Benchmark: benchmark_indices_mapping.json

metric | new / old (diff)
select | 0.041797 / 0.037411 (0.004386)
shard | 0.021068 / 0.014526 (0.006543)
shuffle | 0.102649 / 0.176557 (-0.073908)
sort | 0.096523 / 0.737135 (-0.640612)
train_test_split | 0.031745 / 0.296338 (-0.264594)

Benchmark: benchmark_iterating.json

metric | new / old (diff)
read 5000 | 0.193678 / 0.215209 (-0.021531)
read 50000 | 1.952667 / 2.077655 (-0.124987)
read_batch 50000 10 | 1.230830 / 1.504120 (-0.273289)
read_batch 50000 100 | 1.169883 / 1.541195 (-0.371311)
read_batch 50000 1000 | 1.242331 / 1.468490 (-0.226159)
read_formatted numpy 5000 | 6.145316 / 4.584777 (1.560539)
read_formatted pandas 5000 | 5.092701 / 3.745712 (1.346988)
read_formatted tensorflow 5000 | 7.366202 / 5.269862 (2.096341)
read_formatted torch 5000 | 6.347208 / 4.565676 (1.781532)
read_formatted_batch numpy 5000 10 | 0.621731 / 0.424275 (0.197456)
read_formatted_batch numpy 5000 1000 | 0.011368 / 0.007607 (0.003761)
shuffled read 5000 | 0.219290 / 0.226044 (-0.006754)
shuffled read 50000 | 2.309151 / 2.268929 (0.040222)
shuffled read_batch 50000 10 | 1.717875 / 55.444624 (-53.726750)
shuffled read_batch 50000 100 | 1.605285 / 6.876477 (-5.271192)
shuffled read_batch 50000 1000 | 1.689025 / 2.142072 (-0.453048)
shuffled read_formatted numpy 5000 | 6.196161 / 4.805227 (1.390934)
shuffled read_formatted_batch numpy 5000 10 | 5.591751 / 6.500664 (-0.908913)
shuffled read_formatted_batch numpy 5000 1000 | 9.548811 / 0.075469 (9.473342)

Benchmark: benchmark_map_filter.json

metric | new / old (diff)
filter | 12.506757 / 1.841788 (10.664969)
map fast-tokenizer batched | 15.369316 / 8.074308 (7.295007)
map identity | 13.402427 / 10.191392 (3.211035)
map identity batched | 0.455622 / 0.680424 (-0.224802)
map no-op batched | 0.286103 / 0.534201 (-0.248098)
map no-op batched numpy | 0.764376 / 0.579283 (0.185093)
map no-op batched pandas | 0.551309 / 0.434364 (0.116945)
map no-op batched pytorch | 0.734956 / 0.540337 (0.194618)
map no-op batched tensorflow | 1.665953 / 1.386936 (0.279017)
PyArrow==1.0
Show updated benchmarks!

Benchmark: benchmark_array_xd.json

metric | new / old (diff)
read_batch_formatted_as_numpy after write_array2d | 0.017876 / 0.011353 (0.006523)
read_batch_formatted_as_numpy after write_flattened_sequence | 0.014210 / 0.011008 (0.003202)
read_batch_formatted_as_numpy after write_nested_sequence | 0.052336 / 0.038508 (0.013828)
read_batch_unformated after write_array2d | 0.035057 / 0.023109 (0.011948)
read_batch_unformated after write_flattened_sequence | 0.374513 / 0.275898 (0.098615)
read_batch_unformated after write_nested_sequence | 0.400441 / 0.323480 (0.076961)
read_col_formatted_as_numpy after write_array2d | 0.010121 / 0.007986 (0.002136)
read_col_formatted_as_numpy after write_flattened_sequence | 0.004673 / 0.004328 (0.000344)
read_col_formatted_as_numpy after write_nested_sequence | 0.006988 / 0.004250 (0.002738)
read_col_unformated after write_array2d | 0.054601 / 0.037052 (0.017548)
read_col_unformated after write_flattened_sequence | 0.359662 / 0.258489 (0.101173)
read_col_unformated after write_nested_sequence | 0.404682 / 0.293841 (0.110841)
read_formatted_as_numpy after write_array2d | 0.147778 / 0.128546 (0.019232)
read_formatted_as_numpy after write_flattened_sequence | 0.122191 / 0.075646 (0.046545)
read_formatted_as_numpy after write_nested_sequence | 0.482443 / 0.419271 (0.063171)
read_unformated after write_array2d | 0.432205 / 0.043533 (0.388672)
read_unformated after write_flattened_sequence | 0.361153 / 0.255139 (0.106014)
read_unformated after write_nested_sequence | 0.368247 / 0.283200 (0.085048)
write_array2d | 0.104349 / 0.141683 (-0.037334)
write_flattened_sequence | 1.915295 / 1.452155 (0.463140)
write_nested_sequence | 1.908143 / 1.492716 (0.415426)

Benchmark: benchmark_indices_mapping.json

metric | new / old (diff)
select | 0.045429 / 0.037411 (0.008018)
shard | 0.021986 / 0.014526 (0.007460)
shuffle | 0.028861 / 0.176557 (-0.147696)
sort | 0.095587 / 0.737135 (-0.641548)
train_test_split | 0.029998 / 0.296338 (-0.266340)

Benchmark: benchmark_iterating.json

metric | new / old (diff)
read 5000 | 0.258761 / 0.215209 (0.043552)
read 50000 | 2.588346 / 2.077655 (0.510692)
read_batch 50000 10 | 1.913693 / 1.504120 (0.409573)
read_batch 50000 100 | 1.870653 / 1.541195 (0.329459)
read_batch 50000 1000 | 1.949421 / 1.468490 (0.480931)
read_formatted numpy 5000 | 6.045328 / 4.584777 (1.460551)
read_formatted pandas 5000 | 4.876144 / 3.745712 (1.130432)
read_formatted tensorflow 5000 | 7.204059 / 5.269862 (1.934197)
read_formatted torch 5000 | 6.455869 / 4.565676 (1.890193)
read_formatted_batch numpy 5000 10 | 0.597559 / 0.424275 (0.173284)
read_formatted_batch numpy 5000 1000 | 0.010746 / 0.007607 (0.003139)
shuffled read 5000 | 0.290650 / 0.226044 (0.064605)
shuffled read 50000 | 3.067995 / 2.268929 (0.799067)
shuffled read_batch 50000 10 | 2.294160 / 55.444624 (-53.150464)
shuffled read_batch 50000 100 | 2.199183 / 6.876477 (-4.677293)
shuffled read_batch 50000 1000 | 2.301052 / 2.142072 (0.158979)
shuffled read_formatted numpy 5000 | 6.001374 / 4.805227 (1.196147)
shuffled read_formatted_batch numpy 5000 10 | 4.905584 / 6.500664 (-1.595080)
shuffled read_formatted_batch numpy 5000 1000 | 8.724180 / 0.075469 (8.648711)

Benchmark: benchmark_map_filter.json

metric | new / old (diff)
filter | 12.758600 / 1.841788 (10.916812)
map fast-tokenizer batched | 15.770581 / 8.074308 (7.696273)
map identity | 13.931882 / 10.191392 (3.740490)
map identity batched | 0.906324 / 0.680424 (0.225901)
map no-op batched | 0.597727 / 0.534201 (0.063526)
map no-op batched numpy | 0.771599 / 0.579283 (0.192316)
map no-op batched pandas | 0.558116 / 0.434364 (0.123752)
map no-op batched pytorch | 0.721865 / 0.540337 (0.181527)
map no-op batched tensorflow | 1.613277 / 1.386936 (0.226341)
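
Each `new / old (diff)` cell in the benchmarks above appears to report `new - old` as the diff, up to rounding in the last digit. A small sketch for parsing such cells and checking that relation follows; the regular expression, the `parse_cells` helper, and the `2e-6` tolerance are assumptions made for this illustration and are not part of the benchmark tooling. The row is the `benchmark_indices_mapping.json` result from the PyArrow==0.17.1 run.

```python
import re

# One cell looks like "0.041797 / 0.037411 (0.004386)".
CELL = re.compile(r"(-?\d+\.\d+) / (-?\d+\.\d+) \((-?\d+\.\d+)\)")


def parse_cells(row):
    """Return a (new, old, diff) tuple of floats for every cell in a row."""
    return [tuple(float(x) for x in m.groups()) for m in CELL.finditer(row)]


# benchmark_indices_mapping.json values from the PyArrow==0.17.1 run above.
row = (
    "new / old (diff) 0.041797 / 0.037411 (0.004386) "
    "0.021068 / 0.014526 (0.006543) 0.102649 / 0.176557 (-0.073908) "
    "0.096523 / 0.737135 (-0.640612) 0.031745 / 0.296338 (-0.264594)"
)
names = ["select", "shard", "shuffle", "sort", "train_test_split"]

for name, (new, old, diff) in zip(names, parse_cells(row)):
    # The printed values are rounded, so allow a small tolerance.
    consistent = abs((new - old) - diff) < 2e-6
    print(f"{name}: new/old ratio {new / old:.2f}, diff consistent: {consistent}")
```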
