Glove Word Vectors #3

Closed
KarahanS opened this issue Mar 29, 2023 · 3 comments

Comments

@KarahanS
Contributor

KarahanS commented Mar 29, 2023

The feature, motivation and pitch

As our third algorithm, we have to train word vectors using GloVe. The official implementation can be found here.

We got terrible results after training, and the iterations take a surprisingly short amount of time. Preprocessing seems OK, as it reports the correct number of unique words in the vocabulary.
Possible solutions?

  • Is there a problem with how we feed our corpus to the script?
  • Can the problem be related to the remote server? Should we try another computer?
    • Maybe I can try running the script on a Mac.
    • Maybe we can try running it on our Windows machines. That may not be straightforward as things stand, but there is a Windows adaptation: https://github.com/hfxunlp/GloVe-win
  • Should we ask for help? Uras hoca and others have trained GloVe for Turkish word vectors (https://github.com/inzva/Turkish-GloVe), so maybe they can help us.
  • Using pre-trained word embeddings is always an option, but an ugly one.
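One way to narrow down the corpus-feeding question: each GloVe training iteration loops over every nonzero co-occurrence record, so very fast iterations suggest that the cooccurrence file is much smaller than the corpus would imply. Below is a minimal Python sketch that summarizes that file. It assumes the record layout of the official implementation's `cooccur` tool (two 32-bit ints and one 64-bit double per record, native byte order); please verify this against `cooccur.c` before trusting the numbers. The helper name `cooccurrence_stats` is hypothetical.

```python
import struct
from pathlib import Path

# ASSUMPTION: each record is (int word1, int word2, double val), i.e. 16 bytes
# in native byte order, as written by the official cooccur tool.
RECORD = struct.Struct("=iid")
CHUNK_RECORDS = 1 << 16  # records per read, so large files are streamed


def cooccurrence_stats(path):
    """Count records, distinct word ids and total weight in a cooccurrence file."""
    n_records = 0
    words = set()
    total = 0.0
    with Path(path).open("rb") as f:
        while chunk := f.read(RECORD.size * CHUNK_RECORDS):
            if len(chunk) % RECORD.size:
                raise ValueError("file is not a whole number of records")
            for word1, word2, val in RECORD.iter_unpack(chunk):
                n_records += 1
                words.add(word1)
                words.add(word2)
                total += val
    return {"records": n_records, "distinct_words": len(words), "total_weight": total}
```

For example, `cooccurrence_stats("cooccurrence.shuf.bin")` (that file name is an assumption based on the demo script's defaults). If `distinct_words` is far below the vocabulary size reported during preprocessing, most of the corpus probably never reached `cooccur`.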

Alternatives

No response

Additional context

No response

@KarahanS
Contributor Author

stanfordnlp/GloVe#210

@KarahanS
Contributor Author

KarahanS commented Apr 1, 2023

I skimmed through the paper; these are the parameters it suggests for 300-dimensional word vectors:

  • Iteration count: 100
  • $\alpha$: 0.75
  • $x_{\max}$: 100
  • Window size: 10 (we should set it to 5 so that we can compare against word2vec and fastText)
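For reference, $\alpha$ and $x_{\max}$ enter the paper's objective $J = \sum_{i,j} f(X_{ij})\,(w_i^\top \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij})^2$ through the weighting function $f(x) = (x / x_{\max})^{\alpha}$ for $x < x_{\max}$ and $f(x) = 1$ otherwise. A minimal pure-Python sketch of $f$ and of one term of $J$, using the suggested values as defaults; the function names are hypothetical and this is not the official C code.

```python
import math


def glove_weight(x, x_max=100.0, alpha=0.75):
    """Weighting f(x): (x / x_max) ** alpha below x_max, capped at 1 otherwise."""
    if x < x_max:
        return (x / x_max) ** alpha
    return 1.0


def glove_pair_loss(w_i, w_tilde_j, b_i, b_tilde_j, x_ij, x_max=100.0, alpha=0.75):
    """One term of J for a nonzero co-occurrence count x_ij."""
    dot = sum(a * b for a, b in zip(w_i, w_tilde_j))
    diff = dot + b_i + b_tilde_j - math.log(x_ij)
    return glove_weight(x_ij, x_max, alpha) * diff * diff
```

With these defaults, a pair seen 50 times gets weight $0.5^{0.75} \approx 0.59$, while any pair seen 100 times or more gets weight 1, so very frequent pairs do not dominate the loss.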

@CahidArda
Contributor

GloVe embeddings were added.
