Computing rouge is significantly slow #5708
Comments
I wrote a small test script, but cannot reproduce the latency that you're observing.
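A minimal sketch of such a script, assuming AllenNLP's built-in ROUGE metric (the shapes and vocab size here are illustrative):

```python
import timeit

import torch
from allennlp.training.metrics import ROUGE

vocab_size = 50265
batch_size = 64
pred_seq_len = 150
tgt_seq_len = 150

# Random token IDs are enough to exercise the metric's n-gram counting.
predictions = torch.randint(0, vocab_size, (batch_size, pred_seq_len))
targets = torch.randint(0, vocab_size, (batch_size, tgt_seq_len))

rouge = ROUGE()
print("Rouge timing:", timeit.timeit(lambda: rouge(predictions, targets), number=1))
```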
Yields:
Can you share more details on the setup that you're observing this on? Are you evaluating a specific BART model?
@AkshitaB I get similar timings with your inputs. But if you change the input shapes to this:

```python
vocab_size = 50265
batch_size = 64
pred_seq_len = 150
tgt_seq_len = 787

predictions = torch.randint(0, vocab_size, (batch_size, pred_seq_len))
targets = torch.randint(0, vocab_size, (batch_size, tgt_seq_len))
```

I get:
@vikigenius I can reproduce the above. I'll look into it.
@AkshitaB this is just a friendly ping to make sure you haven't forgotten about this issue 😜
Checklist

- I have verified that the issue exists against the main branch of AllenNLP.
- I have included in the "Environment" section below the output of pip freeze.

Description
The ROUGE metric computation used in the model is extremely slow. I am testing this with the BART model.
Particularly the call here: https://github.com/allenai/allennlp-models/blob/3e3b3ecf8531d8c4d900fdf616926426b401b9ee/allennlp_models/generation/models/bart.py#L260
Related issues or possible duplicates
Environment
OS: Linux
Python version: 3.9
Output of pip freeze:

Steps to reproduce
Specifically, I am measuring the time it takes to call ROUGE there, like this:
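A minimal sketch of that measurement, assuming an already-instantiated model; the _rouge attribute comes from the bart.py line linked above, and the private _exclude_indices field is taken from AllenNLP's ROUGE source:

```python
import time

from allennlp.training.metrics import ROUGE

class TimedROUGE(ROUGE):
    # Drop-in replacement that prints the wall-clock time of each call.
    def __call__(self, predictions, gold_targets):
        start = time.perf_counter()
        super().__call__(predictions, gold_targets)
        print("GPU Rouge timing:", time.perf_counter() - start)

# `model` is the instantiated Bart model; re-use its exclude_indices so
# the timed metric behaves identically to the original one.
model._rouge = TimedROUGE(exclude_indices=model._rouge._exclude_indices)
```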
For a batch size of 64, this takes:
GPU Rouge timing: 187.72676797099984 # That's more than 3 minutes.
I noticed that the ROUGE computation is done on the GPU, and maybe that was slowing things down, so I decided to check:
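Continuing the sketch above, the CPU variant just moves both tensors off the GPU before making the same call:

```python
class TimedCPUROUGE(ROUGE):
    # Same as TimedROUGE above, but computes the metric on CPU tensors.
    def __call__(self, predictions, gold_targets):
        predictions, gold_targets = predictions.cpu(), gold_targets.cpu()
        start = time.perf_counter()
        super().__call__(predictions, gold_targets)
        print("CPU Rouge timing:", time.perf_counter() - start)

model._rouge = TimedCPUROUGE(exclude_indices=model._rouge._exclude_indices)
```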
This takes:
CPU Rouge timing: 64.7368660709999 # Much faster but still very slow
So I went ahead and used the ROUGE implementation from the datasets library, which is basically a wrapper around rouge_score by Google (a sketch of this comparison follows the timing below). This takes:
HFT Rouge timing: 1.1103893849999622 # That's just one second
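A minimal sketch of that comparison, assuming the facebook/bart-large tokenizer for the decoding step; predictions and targets are the same token-ID tensors as in the earlier snippets:

```python
import time

from datasets import load_metric
from transformers import BartTokenizer

# The datasets metric scores decoded strings, so (unlike the AllenNLP
# metric) this timing even includes the decoding step.
tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
pred_str = tokenizer.batch_decode(predictions, skip_special_tokens=True)
target_str = tokenizer.batch_decode(targets, skip_special_tokens=True)

hf_rouge = load_metric("rouge")
start = time.perf_counter()
hf_rouge.compute(predictions=pred_str, references=target_str)
print("HFT Rouge timing:", time.perf_counter() - start)
```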
The ROUGE implementation used in AllenNLP here is roughly 170 times slower than the HuggingFace one (187.7 s vs. 1.1 s), even though it doesn't have to perform the decoding step, perform stemming, or compute the RougeLSum score.
I have used this model before, but I don't remember it being this slow. It seems like a recent thing.
Calling rouge for a single batch of size 64 should not be taking 3 minutes no matter what is going on internally.