
Having various network heads #24

Open
rodgzilla opened this issue Jul 13, 2018 · 1 comment

@rodgzilla (Contributor)

Hi!

In the research paper, the authors tackle many different problems using the same base architecture; this is one of the main strengths of the article. Unfortunately, the current version of the code only supports multiple choice tasks such as ROCStories.

This is what I would like to fix in a future patch. By providing model heads dedicated to tasks other than multiple choice, we can allow many more people to use this code.

I have already started working on this and I would like to get your opinions on a few design choices.

This is the new version of the DoubleHeadModel class:

class DoubleHeadModel(nn.Module):
    """ Transformer with language model and task specific heads """
    # Assumes the existing imports of model_pytorch.py plus `import collections.abc`.
    def __init__(self, cfg, clf_token, task_head_type, vocab=40990, n_ctx=512):
        super(DoubleHeadModel, self).__init__()
        self.transformer = TransformerModel(cfg, vocab=vocab, n_ctx=n_ctx)
        self.lm_head = LMHead(self.transformer, cfg)
        if isinstance(task_head_type, str):
            if task_head_type == 'multiple_choice':
                self.task_head = MultipleChoiceHead(clf_token, cfg)
            elif task_head_type == 'similarity':
                self.task_head = SimilarityHead(clf_token, cfg)
            elif task_head_type == 'inference':
                # the three classes correspond to entailment, contradiction and neutral.
                self.task_head = ClfHead(clf_token, cfg, 3)
            else:
                raise ValueError("task_head_type is expected to be 'multiple_choice', "
                                 "'similarity', 'inference' or ('classification', n_class), "
                                 f"got {task_head_type}.")
        elif isinstance(task_head_type, collections.abc.Sequence) and len(task_head_type) == 2 and \
             task_head_type[0] == 'classification':
            n_class = task_head_type[1]
            self.task_head = ClfHead(clf_token, cfg, n_class)
        else:
            raise ValueError("task_head_type is expected to be 'multiple_choice', "
                             "'similarity', 'inference' or ('classification', n_class), "
                             f"got {task_head_type}.")

    def forward(self, x):
        h = self.transformer(x)
        lm_logits = self.lm_head(h)
        task_logits = self.task_head(h, x)

        return lm_logits, task_logits

The __init__ method takes a new argument task_head_type which can be one of the following values (a short usage sketch follows the list):

  • "multiple_choice" for multiple choice problems (corresponds to current ClfHead) such as ROCStories.
  • "similarity" for similarity tasks such Quora Question Pairs (QQP) and the Semantic Textual Similarity benchmark (STS-B).
  • "inference" for Natural Language Inference (NLI) tasks such as SNLI, QNLI and MNLI. Inference problems are treated as classification problems with 3 classes: entailment, contradiction and neutral.
  • ("classification", n_class) for classification tasks such as the Corpus of Linguistic Acceptability (CoLA) and the Stanford Sentiment Treebank (SST-2).

The code for the various heads is the following:

class MultipleChoiceHead(nn.Module):
    """ Multiple Choice Head for the transformer """

    def __init__(self, clf_token, cfg):
        super(MultipleChoiceHead, self).__init__()
        self.n_embd = cfg.n_embd
        self.clf_token = clf_token
        self.dropout = nn.Dropout2d(cfg.clf_pdrop)  # reproduces the noise_shape parameter of the TF implementation
        self.linear = nn.Linear(cfg.n_embd, 1)

        nn.init.normal_(self.linear.weight, std = 0.02)
        nn.init.normal_(self.linear.bias, 0)

    def forward(self, h, x):
        # Classification logits
        clf_h = h.view(-1, self.n_embd)
        flat = x[..., 0].contiguous().view(-1)
        # Keep only the hidden states located at the classification token
        clf_h = clf_h[flat == self.clf_token, :]
        clf_h = clf_h.view(-1, x.size(1), self.n_embd, 1)
        # Dropout2d shares the dropout mask across the choice dimension
        clf_h = self.dropout(clf_h.transpose(1, 2)).transpose(1, 2)
        clf_h = clf_h.contiguous().view(-1, self.n_embd)
        clf_logits = self.linear(clf_h)

        return clf_logits.view(-1, x.size(1))

class ClfHead(nn.Module):
    """Classification Head for the transformer """

    def __init__(self, clf_token, cfg, n_class):
        super(ClfHead, self).__init__()
        self.n_embd = cfg.n_embd
        self.clf_token = clf_token
        self.dropout = nn.Dropout(cfg.clf_pdrop)
        self.linear = nn.Linear(cfg.n_embd, n_class)

        nn.init.normal_(self.linear.weight, std = 0.02)
        nn.init.normal_(self.linear.bias, 0)

    def forward(self, h, x):
        clf_h = h.view(-1, self.n_embd)
        flat = x[..., 0].contiguous().view(-1)
        clf_h = clf_h[flat == self.clf_token, :]
        clf_h = self.dropout(clf_h)
        clf_logits = self.linear(clf_h)

        return clf_logits

class SimilarityHead(nn.Module):
    """ Similarity Head for the transformer """

    def __init__(self, clf_token, cfg):
        super(SimilarityHead, self).__init__()
        self.n_embd = cfg.n_embd
        self.clf_token = clf_token
        self.dropout = nn.Dropout(cfg.clf_pdrop)
        self.linear = nn.Linear(cfg.n_embd, 1)

        nn.init.normal_(self.linear.weight, std = 0.02)
        nn.init.normal_(self.linear.bias, 0)

    def forward(self, h, x):
        sim_h = h.view(-1, self.n_embd)
        flat = x[..., 0].contiguous().view(-1)
        sim_h = sim_h[flat == self.clf_token, :]
        sim_h = self.dropout(sim_h)
        # Following the paper, the representations of the two sentence orderings
        # are added element-wise before the final projection (assumes x.size(1) == 2).
        sim_h = sim_h.view(-1, 2, self.n_embd).sum(dim=1)
        sim_logits = self.linear(sim_h)

        return sim_logits
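
For context, here is a rough sketch of how the two outputs could be combined in the training loop (X, Y, lm_targets and lm_coef are hypothetical names; the construction of the shifted LM targets and the masking of padding tokens are omitted):

import torch.nn.functional as F

lm_logits, task_logits = model(X)
task_loss = F.cross_entropy(task_logits, Y)
# Auxiliary language modeling loss, computed against the input tokens shifted by one
lm_loss = F.cross_entropy(lm_logits, lm_targets)
# The LM loss is weighted by lm_coef, as in the paper
loss = task_loss + lm_coef * lm_loss
loss.backward()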

Do you think that this new design is reasonable?

If this code seems OK, I would like to test it before creating a pull request. Unfortunately, I will not have the time to test SimilarityHead. Would anyone like to work with me on this?

@thomwolf (Member)

Looks good to me! I will merge your PR.

I can help you test the SimilarityHead, but not before the end of August, so if someone wants to tackle this question during the summer, please do!

There are a few discussions related to this on OpenAI's repo that are probably worth following:
