Seed Questions in the Dataset #1

joshinh · 2020-11-30T21:51:07Z

@danyaljj Firstly thanks for releasing the dataset!

If I understood correctly, every perturbed example in this dataset has been created from some seed example from the original BoolQ dataset. But there seem to be a few clusters (e.g., id=25894) which have no seed questions. Is this behaviour expected or am I missing something?

danyaljj · 2020-12-02T02:41:45Z

Hey there 👋

You're right, some clusters in the data don't have instances that belong to BoolQ.
Here is why: after perturbing the BoolQ [subset] instances, we have a verification step where humans check whether a given question is well-formed (i.e., humans agree on its answer). A few of the BoolQ [subset] instances were deleted at this stage since our human annotators didn't like them.

Does that answer your question?

joshinh · 2020-12-02T04:15:53Z

Thanks, that makes sense! I was under the impression that the verification step is only performed for perturbed instances. I suppose that is also the reason why some clusters have size = 1.
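
For anyone who wants to check this on their own copy of the data, here is a minimal sketch that lists clusters with no seed question and clusters of size 1. The field names `cluster_id` and `is_seed` and the helper `summarize_clusters` are hypothetical, since the thread does not describe the file schema; adapt them to whatever the released files actually use. The records below are toy values, not taken from the dataset.

```python
from collections import defaultdict


def summarize_clusters(records):
    """Return (ids of clusters with no seed, ids of clusters with one member).

    Assumes each record is a dict with the hypothetical keys
    "cluster_id" and "is_seed" (True for an original BoolQ instance).
    """
    clusters = defaultdict(list)
    for rec in records:
        clusters[rec["cluster_id"]].append(rec)

    # Clusters whose seed was dropped, e.g. during human verification.
    no_seed = sorted(
        cid for cid, members in clusters.items()
        if not any(m["is_seed"] for m in members)
    )
    # Clusters where only one instance survived.
    singletons = sorted(
        cid for cid, members in clusters.items() if len(members) == 1
    )
    return no_seed, singletons


# Toy records, made up for illustration only.
toy = [
    {"cluster_id": 1, "is_seed": True},
    {"cluster_id": 1, "is_seed": False},
    {"cluster_id": 2, "is_seed": False},
    {"cluster_id": 2, "is_seed": False},
    {"cluster_id": 3, "is_seed": False},
]
no_seed, singletons = summarize_clusters(toy)
print("no seed:", no_seed)
print("size 1:", singletons)
```

In the toy data, cluster 2 has lost its seed but keeps two perturbed instances, while cluster 3 has a single perturbed instance left, so it shows up in both lists.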
