
Why zero out embeddings for special words if they are absent in vocab #19

Silenthinker opened this issue Dec 20, 2017 · 2 comments

@Silenthinker
Hi,

I noticed that in main.py, you zero out the embeddings for special words if they are absent from the vocabulary:

# zero out the embeddings for padding and other special words if they are absent in vocab
for idx, item in enumerate([Constants.PAD_WORD, Constants.UNK_WORD, Constants.BOS_WORD, Constants.EOS_WORD]):
    emb[idx].zero_()

Is there any reason for doing so? Why not use random normal vectors?

Thanks.

@dasguptar (Owner)

Hi @Silenthinker

As far as I remember, when initialising the embeddings, I realised that the PAD_WORD embedding needs to be zeroed out. At the time, I was unsure what to do with the other special words, so I left them zeroed out as well. I believe you can try initialising them normally; it should be fine.

Do let me know if you get a chance to try out random normal initialization!
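For reference, a minimal sketch of that alternative, assuming the four special tokens sit at indices 0–3 of the pretrained embedding matrix as in main.py (the matrix shape and the 0.05 standard deviation below are illustrative stand-ins, not values from the repo):

```python
import torch

PAD, UNK, BOS, EOS = 0, 1, 2, 3      # special-token indices, as in Constants
emb = torch.randn(10000, 300)        # stand-in for the pretrained GloVe matrix

# the padding row must stay at zero so padded positions contribute nothing,
# but the other special tokens can start from small random normal vectors
emb[PAD].zero_()
for idx in (UNK, BOS, EOS):
    emb[idx].normal_(mean=0.0, std=0.05)
```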

@Silenthinker (Author)

Thanks for your reply. I'll try it out.

However, it is unclear to me what the role of PAD_WORD is, since I couldn't find anywhere it is actually used to pad sentences. Did I miss it somewhere?

Thanks.
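For context, a minimal sketch of the conventional role of a padding token (general PyTorch usage, not code from this repository): shorter sentences in a batch are filled with the PAD index so the batch forms a rectangular tensor, and a zero PAD embedding keeps those filler positions from contributing anything.

```python
import torch

PAD = 0
batch = [[5, 9, 12], [7, 3]]                       # two token-index sequences
max_len = max(len(s) for s in batch)
padded = torch.tensor([s + [PAD] * (max_len - len(s)) for s in batch])
# tensor([[ 5,  9, 12],
#         [ 7,  3,  0]])

# padding_idx tells the layer to keep a zero vector for PAD and never update it
emb = torch.nn.Embedding(num_embeddings=100, embedding_dim=4, padding_idx=PAD)
vectors = emb(padded)                              # padded positions map to zero vectors
```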
