Hello,
I am using the GPT2 models available on HF and running into a few issues. Firstly, there seems to be a problem with the tokenizer. Trying to calculate perplexity using the evaluate module, as follows:
from evaluate import load
perplexity = load("perplexity", module_type="metric")
results = perplexity.compute(predictions=["Hola, como estas?"], model_id="PlanTL-GOB-ES/gpt2-base-bne", device="cpu")
Gives the following error:
...
File "/ikerlariak/aormazabal024/PhD/Poetry-Generation/demo/poetry-env-traganarru/lib/python3.8/site-packages/torch/nn/functional.py", line 2199, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self
This seems to be related to the special tokens <pad>, <s>, </s> and <unk> not being properly set (even though the evaluate module relies on them), as the only special token added in the tokenizer is <|endoftext|>. One can manually fix this for the local snapshot:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("PlanTL-GOB-ES/gpt2-base-bne")
tokenizer.pad_token = '<pad>'
tokenizer.bos_token = '</s>'
tokenizer.eos_token = '</s>'
tokenizer.unk_token = '<unk>'
tokenizer.save_pretrained('[snapshot-path]')
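For reference, a quick way to confirm this is the cause is to compare the special token ids against the model's embedding size; any token left unset or mapped to an id at or above the number of embedding rows will trigger the IndexError above. This is only a sketch of the check I have in mind, not something from the model card:

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "PlanTL-GOB-ES/gpt2-base-bne"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Number of rows in the input embedding matrix; any token id >= this
# value makes torch.embedding fail with "index out of range in self".
vocab_size = model.get_input_embeddings().num_embeddings
print("embedding rows:", vocab_size)

for name in ("pad_token", "bos_token", "eos_token", "unk_token"):
    token = getattr(tokenizer, name)
    token_id = getattr(tokenizer, name + "_id")
    print(f"{name}: {token!r} -> id {token_id}")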
However, even after fixing this, I am getting quite high perplexities compared to the 10-13 reported in the paper, for every sentence I try (assuming per-word perplexity is what is reported). Is it possible something went wrong when converting from fairseq to HF, and are the original fairseq models available somewhere to compare against? Or maybe I am making a mistake when calculating the perplexity: was any tokenization applied to the text apart from BPE (e.g. replacing newlines with a special token, which is fairly standard in fairseq)?
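In case it helps narrow things down, this is roughly how I am cross-checking the numbers with a plain transformers loop instead of the evaluate module. It is just a sketch on a single example sentence, and note that it gives per-BPE-token perplexity, which will differ from a per-word figure:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "PlanTL-GOB-ES/gpt2-base-bne"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

text = "Hola, como estas?"
input_ids = tokenizer(text, return_tensors="pt").input_ids

with torch.no_grad():
    # Passing labels=input_ids returns the mean cross-entropy over the
    # shifted tokens; exponentiating it gives the per-token perplexity.
    loss = model(input_ids, labels=input_ids).loss

print("perplexity:", torch.exp(loss).item())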