Call for Participation:
TextGraphs-15: Third Shared Task on Explanation Regeneration
https://competitions.codalab.org/competitions/29228
We invite participation in the 3rd Shared Task on Explanation Regeneration
associated with the 15th Workshop on Graph-Based Natural Language Processing
(TextGraphs-15).
All systems participating in the shared task will be invited to submit system description papers.
Each system description paper will be peer-reviewed by two other participating teams and will be
presented as a poster at the main workshop: https://sites.google.com/view/textgraphs2021.
Overview
========
Multi-hop inference is the task of combining more than one piece of information
to solve an inference task, such as question answering. This can take many
forms, from combining free-text sentences read from books or the web, to
combining linked facts from a structured knowledge base. The Shared Task
on Explanation Regeneration asks participants to develop methods that
reconstruct large explanations for science questions, using a corpus of
gold explanations that provides supervision and instrumentation for this
multi-hop inference task. Each explanation is represented as an
"explanation graph", a set of atomic facts (between 1 and 16 per explanation,
drawn from a knowledge base of 9,000 facts) that, together, form a detailed
explanation of the reasoning required to answer and explain a question.
Explanation Regeneration is a stepping-stone towards general multi-hop inference
over language. In this shared task we frame explanation regeneration as a
ranking task, where the inputs to a given system consist of questions and their
correct answers. Participating systems must then rank atomic facts from a
provided semi-structured knowledge base such that the combination of top-ranked
facts provides a detailed explanation for the answer. This requires combining
scientific and common-sense/world knowledge with compositional inference.
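As a concrete illustration of this ranking setup, the sketch below scores every
fact by TF-IDF cosine similarity to the concatenated question and answer text.
It is only a minimal lexical baseline sketch, assuming the facts are available
as plain-text sentences; the use of scikit-learn and the toy data are
illustrative and not part of the official task setup.

    # Minimal lexical ranking sketch (illustrative only, not the official baseline).
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def rank_facts(question_and_answer, facts):
        # Score each knowledge-base fact by TF-IDF cosine similarity to the
        # question text concatenated with its correct answer.
        vectorizer = TfidfVectorizer()
        fact_vectors = vectorizer.fit_transform(facts)
        query_vector = vectorizer.transform([question_and_answer])
        scores = cosine_similarity(query_vector, fact_vectors)[0]
        # Return fact indices, best-scoring first.
        return sorted(range(len(facts)), key=lambda i: -scores[i])

    facts = [
        "a girl means a human girl",
        "humans are living organisms",
        "eating is when an organism takes in nutrients in the form of food",
        "photosynthesis is when plants convert sunlight into food",
    ]
    query = ("Which of the following is an example of an organism taking in "
             "nutrients? a girl eating an apple")
    print([facts[i] for i in rank_facts(query, facts)])

In previous years, the strongest submissions replaced this kind of lexical
scorer with trained language-model rerankers.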
While large language models (BERT, ERNIE) achieved the highest performance in
the 2019 and 2020 shared tasks, substantially advancing the state-of-the-art
over previous methods, absolute performance remains modest, highlighting the
difficulty of generating detailed explanations through multi-hop reasoning.
*NEW FOR 2021:*
Many-fact multi-hop inference is challenging because there are often multiple
ways of assembling a good explanation for a given question. This 2021
instantiation of the shared task focuses on the theme of determining relevance
versus completeness in large multi-hop explanations. To this end, this year
we include a very large dataset of approximately 250,000 expert-annotated
relevancy ratings for facts ranked highly by baseline language models from
previous years (e.g. BERT, RoBERTa).
Submissions using a variety of methods (graph-based or otherwise) are
encouraged. Submissions that evaluate how well existing models designed for
2-hop multi-hop question answering datasets (e.g. HotpotQA, QASC) perform
at many-fact multi-hop explanation regeneration are welcome.
Example
=======
For example, for the question: "Which of the following is an example of an
organism taking in nutrients?" with the correct answer: "a girl eating an
apple", an ideal system would rank the following explanatory statements
at the top of its ranked list of facts:
1. A girl means a human girl.
2. Humans are living organisms.
3. Eating is when an organism takes in nutrients in the form of food.
4. Fruits are kinds of foods.
5. An apple is a kind of fruit.
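One way to picture the resulting "explanation graph" is to treat each fact as a
node and connect two facts whenever they share a content word (e.g. "organism",
"food"). The sketch below does this with networkx; the lexical-overlap linking
rule, the stopword list, and the trailing-"s" trimming are simplifying
assumptions for illustration, not the corpus's actual annotation scheme.

    # Toy explanation-graph construction; the linking rule is an assumption.
    import itertools
    import networkx as nx

    facts = [
        "a girl means a human girl",
        "humans are living organisms",
        "eating is when an organism takes in nutrients in the form of food",
        "fruits are kinds of foods",
        "an apple is a kind of fruit",
    ]
    stopwords = {"a", "an", "are", "in", "is", "kind", "kinds", "means", "of",
                 "the", "when", "takes"}

    def content_words(sentence):
        # Crude normalisation: lowercase, strip punctuation, drop stopwords,
        # and trim a trailing "s" as a naive stand-in for lemmatisation.
        words = {word.strip(".,").lower() for word in sentence.split()}
        return {word.rstrip("s") for word in words if word not in stopwords}

    graph = nx.Graph()
    graph.add_nodes_from(facts)
    for fact_a, fact_b in itertools.combinations(facts, 2):
        shared = content_words(fact_a) & content_words(fact_b)
        if shared:
            graph.add_edge(fact_a, fact_b, shared=sorted(shared))

    # Each edge records the words that tie two facts together; here the five
    # facts form a chain through "human", "organism", "food", and "fruit".
    for fact_a, fact_b, data in graph.edges(data=True):
        print(data["shared"], ":", fact_a, "<->", fact_b)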
Important Dates
===============
* 2021-02-15: Training data release
* 2021-03-10: Test data release; Evaluation start
* 2021-03-24: Evaluation end
* 2021-04-01: System description paper deadline
* 2021-04-13: Deadline for reviews of system description papers
* 2021-04-15: Author notifications
* 2021-04-26: Camera-ready description paper deadline
* 2021-06-11: TextGraphs-15 workshop
Data
====
The data used in this shared task contains approximately 5,100 questions drawn
from the AI2 Reasoning Challenge (ARC) dataset (Clark et al., 2018), together
with explanation sentences for their correct answers drawn from the
WorldTree V2 corpus (Xie et al., 2020, Jansen et al., 2018). For 2021, this
has been augmented with a new set of approximately 250,000 expert-annotated
relevancy ratings.
The knowledge base supporting these questions contains approximately 9,000 facts.
To encourage a variety of solving methods, the knowledge base is available both as
plain-text sentences (unstructured) and as semi-structured tables. Facts are a
combination of scientific knowledge as well as common-sense/world knowledge.
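For systems that prefer to work from the semi-structured tables, the sketch
below shows one way they could be flattened back into plain-text facts. It
assumes the tables are distributed as tab-separated files with one fact per
row; the directory layout, file extension, and column handling here are
assumptions, so please consult the task repository for the actual format.

    # Hypothetical loader: flatten TSV tables into plain-text facts.
    import csv
    from pathlib import Path

    def load_table_facts(table_dir):
        facts = []
        for path in sorted(Path(table_dir).glob("*.tsv")):
            with open(path, newline="", encoding="utf-8") as handle:
                reader = csv.reader(handle, delimiter="\t")
                next(reader, None)  # skip the header row (an assumed layout)
                for row in reader:
                    cells = [cell.strip() for cell in row if cell.strip()]
                    if cells:
                        facts.append(" ".join(cells))
        return facts

    # Example usage with a hypothetical directory name:
    # facts = load_table_facts("tables/")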
Please see the shared task website for more details:
https://competitions.codalab.org/competitions/29228
https://github.com/cognitiveailab/tg2021task