How can i select a certain word vector? #166

Open
mandal4 opened this issue Jan 29, 2020 · 1 comment
Comments

@mandal4
mandal4 commented Jan 29, 2020

Hi, I'm new to NLP, and I'd like to select the word vectors for specific categories.
I want the word vectors for the MS-COCO class names, such as 'Person', 'Bus', 'Bird'...
I downloaded the pretrained file, but it has no description of category labels, and I couldn't work out how to select a particular word vector.

Could anyone help?

@JaganKaartik

If you have downloaded a pre-trained file, e.g. glove.42B.300d.txt or any other set of GloVe vectors,
one way to extract specific word vectors is to use the glove2word2vec script from Gensim (gensim.scripts.glove2word2vec).

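from gensim.test.utils import get_tmpfile
from gensim.models import KeyedVectors
from gensim.scripts.glove2word2vec import glove2word2vec

# Path to the GloVe file you downloaded. datapath() is not used here because
# it resolves names inside Gensim's own test-data directory.
glove_file = 'glove.42B.300d.txt'
tmp_file = get_tmpfile("glove_test_word2vec.txt")

# Write a copy of the GloVe file in word2vec text format (adds the header line)
_ = glove2word2vec(glove_file, tmp_file)

# Load the converted file as a KeyedVectors model
model = KeyedVectors.load_word2vec_format(tmp_file)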

This script converts the GloVe vector file into the word2vec text format, which Gensim can load.

Now use model['key'] to look up the vector for the word you want.

e.g.

model['person']
# array([ 9.6294e-02,  7.3925e-01, -4.1032e-01,....],dtype=float32) 
# type: numpy.ndarray
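
If only a handful of class names are needed, another option is to read the GloVe text file directly and keep just the matching lines, without loading the whole vocabulary. The following is a minimal sketch, not part of Gensim: `load_selected_vectors` is a hypothetical helper, and it assumes the usual GloVe text layout (one word per line followed by space-separated floats) and an uncased vocabulary, so names like 'Person' are lowercased before lookup.

```python
def load_selected_vectors(glove_path, wanted_words):
    """Stream a GloVe text file and return {word: vector} for the wanted words.

    Hypothetical helper for illustration. Assumes each line looks like
    "word v1 v2 ... vN" and that the vocabulary is lowercase.
    """
    wanted = {w.lower() for w in wanted_words}
    found = {}
    with open(glove_path, encoding="utf-8") as f:
        for line in f:
            # Split off the word; the rest of the line holds the vector values
            word, _, rest = line.rstrip("\n").partition(" ")
            if word in wanted:
                found[word] = [float(x) for x in rest.split()]
                # Stop early once every requested word has been seen
                if len(found) == len(wanted):
                    break
    return found


# Example call (the path is whatever file you downloaded):
# vectors = load_selected_vectors("glove.42B.300d.txt", ["Person", "Bus", "Bird"])
# vectors["person"] is then a plain list of floats
```

Words missing from the file are simply absent from the returned dict. Multi-word class names such as 'traffic light' will not match a single line, so they need separate handling, for example by looking up each word on its own.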
