Is it expected the same sentence gives different features? #115

Open · towr opened this issue May 5, 2021 · 0 comments

towr commented May 5, 2021

I'm a bit puzzled by something I encountered while trying to encode sentences as embeddings: when I ran the sentences through the model one at a time, I got slightly different results than when I ran them in batches.

I've reduced an example down to:

from transformers import pipeline
import numpy as np

p = pipeline('feature-extraction', model='allenai/scibert_scivocab_uncased')
s = 'the scurvy dog walked home alone'.split()

# grow the sentence one word at a time and compare three runs:
# the same text twice on its own, plus once as a batch of two identical texts
for l in range(1, len(s) + 1):
    txt = ' '.join(s[:l])

    res1 = p(txt)           # single text, first run
    res2 = p(txt)           # single text, second run
    res1_2 = p([txt, txt])  # the same text twice, as one batch
    print(l, txt, len(res1[0]))
    print(all(np.allclose(i, j) for i, j in zip(res1[0], res2[0])),      # run 1 vs run 2
          all(np.allclose(i, j) for i, j in zip(res2[0], res1_2[0])),    # single run vs first batch element
          all(np.allclose(i, j) for i, j in zip(res1_2[0], res1_2[1])))  # first vs second batch element

The output I get is:

1 the 3
True False True
2 the scurvy 6
True True True
3 the scurvy dog 7
True False False
4 the scurvy dog walked 9
True False True
5 the scurvy dog walked home 10
True True False
6 the scurvy dog walked home alone 11
True True True

So running a single sentence through the model seems to give the same output each time, but if I run a batch containing the same sentence twice, the results are sometimes different, both between the two batch outputs and compared to the single-sentence case.

Is this expected/explainable?
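
For reference, here is a minimal sketch of the same kind of check done with the tokenizer and model directly rather than through the pipeline, just to illustrate what I mean by running the same sentence as a batch of two (this uses the standard AutoTokenizer/AutoModel API, nothing pipeline-specific):

import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained('allenai/scibert_scivocab_uncased')
model = AutoModel.from_pretrained('allenai/scibert_scivocab_uncased')
model.eval()

txt = 'the scurvy dog walked home alone'
with torch.no_grad():
    single = model(**tok(txt, return_tensors='pt')).last_hidden_state          # batch of 1
    batched = model(**tok([txt, txt], return_tensors='pt')).last_hidden_state  # batch of 2

print(torch.allclose(single[0], batched[0]),   # single run vs first batch element
      torch.allclose(batched[0], batched[1]))  # the two identical batch elements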

Further context: I'm running on CPU (a laptop), with Python 3.8.9 in a freshly installed venv.
The difference is usually confined to a few indices of the embedding and can be up to about 1e-3. It's negligible when comparing the embeddings by cosine distance, but I'd like to understand where it comes from before dismissing it.
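
For completeness, this is roughly how I'm comparing by cosine similarity (reusing res1 and res1_2 from the snippet above, i.e. the full six-word sentence; cosine here is just the dot product over the norms):

import numpy as np

def cosine(a, b):
    # plain cosine similarity between two embedding vectors
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# per-token similarity between the single run and the first batch element;
# these come out at ~1.0 even where np.allclose reports a difference
sims = [cosine(u, v) for u, v in zip(res1[0], res1_2[0])]
print(min(sims))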
