bug in function embedding from sentiment.jl #129

Open
mboedigh opened this issue Feb 28, 2019 · 3 comments

The following code seems to have a bug: it reshapes a matrix in an apparent attempt to transpose it, but reshape only changes the dimensions while preserving Julia's column-major element order, so the word vectors end up scrambled:

function embedding(embedding_matrix, x)
    # inefficient: grows the matrix one column at a time
    temp = embedding_matrix[:, Int64(x[1]) + 1]
    for i = 2:length(x)
        temp = hcat(temp, embedding_matrix[:, Int64(x[i]) + 1])
    end
    # bug: reshape keeps column-major element order, so this is not a transpose
    return reshape(temp, reverse(size(temp)))
end
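
To see why the reshape is not a transpose, here is a minimal standalone demonstration on a toy matrix (no TextAnalysis required):

M = [1 2 3; 4 5 6]            # 2×3; think of each column as one word's vector
reshape(M, reverse(size(M)))  # 3×2, refilled column-major: [1 5; 4 3; 2 6]
M'                            # 3×2 true transpose: [1 4; 2 5; 3 6]

After the reshape, the word vectors are interleaved instead of becoming rows.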

I propose the following:

function embedding(embedding_matrix, x)
    # select all columns at once, then transpose so each row is one word's vector
    return embedding_matrix[:, Int64.(x) .+ 1]'
end
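
As a quick check that the one-liner really transposes while the original scrambles, here is a toy comparison (embedding_old and embedding_fixed are just local names I'm using for the two versions above):

E = [1.0 2.0 3.0 4.0;
     5.0 6.0 7.0 8.0;
     9.0 10.0 11.0 12.0]   # one 3-dim vector per column, vocabulary of 4
x = [0, 2]                 # zero-based word ids, as sentiment.jl uses them

embedding_old(E, x) = reshape(hcat((E[:, Int64(xi) + 1] for xi in x)...), reverse((size(E, 1), length(x))))
embedding_fixed(E, x) = E[:, Int64.(x) .+ 1]'

embedding_old(E, x)    # [1.0 9.0 7.0; 5.0 3.0 11.0] -- vectors scrambled
embedding_fixed(E, x)  # [1.0 5.0 9.0; 3.0 7.0 11.0] -- rows are the word vectors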
After this change the results of the following are much improved:

using TextAnalysis
d_good = StringDocument("A very nice thing that everyone likes")
prepare!(d_good, strip_case | strip_punctuation)
d_bad = StringDocument("A boring long dull movie that no one likes")
prepare!(d_bad, strip_case | strip_punctuation)
s = SentimentAnalyzer()
s(d_good)
s(d_bad)
aviks (Member) commented Mar 1, 2019

Good catch, thanks.

aviks (Member) commented Mar 2, 2019

On further consideration, this change causes the existing test to fail:

d = StringDocument("a horrible thing that everyone hates")

Investigating...

mboedigh (Author) commented Mar 4, 2019

I believe there is a deeper problem in the word embedding. Bad words such as "hate" get good scores.

using TextAnalysis
d_bad = StringDocument("a horrible thing that everyone hates")
prepare!(d_bad, strip_case | strip_punctuation)
s = SentimentAnalyzer()
words = tokens(d_bad)                    # score each word on its own
[s(StringDocument(w)) for w in words]

I note that there are 88587 words in the dictionary:

length(s.model.words)    # 88587

but the lookup table has only 5000 entries:

size(s.model.weight[:embedding_1]["embedding_1"]["embeddings:0"], 2)    # 5000

Invoking s(d_illegal), where d_illegal is any StringDocument containing a word whose index maps above 5000, will cause an error. I don't know exactly where the weights come from, so I can't track it down further.
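
Until the origin of the weights is sorted out, a guard along these lines might avoid the crash. This is only a sketch: it assumes s.model.words acts as a word => index mapping (if it is a plain word list, build the mapping by enumerating it first), and safe_sentiment is a hypothetical helper name:

using TextAnalysis

function safe_sentiment(s::SentimentAnalyzer, d::StringDocument)
    # width of the embedding lookup table (5000 in the shipped model)
    table_width = size(s.model.weight[:embedding_1]["embedding_1"]["embeddings:0"], 2)
    # keep only tokens whose id fits inside the table; unknown words are dropped too
    kept = [w for w in tokens(d) if get(s.model.words, w, table_width + 1) <= table_width]
    return s(StringDocument(join(kept, " ")))
end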
