Suitable for both Eastern and Western Armenian? #1

Open
AngledLuffa opened this issue May 13, 2021 · 0 comments

I'm preparing models for the next release of Stanza, Stanford's Python NLP library: https://stanfordnlp.github.io/stanza

The next release of Universal Dependencies distinguishes between Eastern and Western Armenian. Is it suitable to use the word vectors you host for both dialects?

https://github.com/UniversalDependencies/UD_Armenian-ArmTDP/
https://github.com/UniversalDependencies/UD_Western_Armenian-ArmTDP/

For example, when I count how many words from the UD datasets are present in the GloVe 200 file, 97% of the words in the Eastern Armenian dataset appear there, but only 88% of the words in the Western Armenian dataset do. This makes me think the vectors' coverage of Western Armenian is a bit of an issue. If these vectors are not ideal for Western Armenian, do you have any recommendations for others that would be, or do you have any intention of adding more Western Armenian words to the dataset?
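
A coverage count of this kind can be sketched roughly as follows. This is only an illustration, not the exact script behind the numbers above. It assumes the treebank is in CoNLL-U format (word form in the second tab-separated column) and the vectors are in GloVe-style text format (one word per line, followed by space-separated floats, no header line). The function names and the tiny in-memory sample are made up for the example; real use would pass open file handles (opened with `encoding="utf-8"`) instead of lists. Since "percent of words" could mean distinct words or running tokens, the sketch reports both.

```python
def read_conllu_forms(lines):
    """Return the FORM column of every regular token line in CoNLL-U input."""
    forms = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # sentence separators and comment lines
        fields = line.split("\t")
        # Skip multiword token ranges ("1-2") and empty nodes ("1.1").
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        forms.append(fields[1])
    return forms


def read_vector_vocab(lines):
    """Return the set of words in a GloVe-style text vector file."""
    vocab = set()
    for line in lines:
        word = line.rstrip("\n").split(" ", 1)[0]
        if word:
            vocab.add(word)
    return vocab


def coverage(forms, vocab):
    """Return (type_coverage, token_coverage) as fractions in [0, 1]."""
    if not forms:
        return 0.0, 0.0
    types = set(forms)
    type_cov = sum(1 for w in types if w in vocab) / len(types)
    token_cov = sum(1 for w in forms if w in vocab) / len(forms)
    return type_cov, token_cov


# Hypothetical toy data standing in for a treebank and a vector file.
sample_conllu = [
    "# text = a b a",
    "1\ta\t_",
    "2\tb\t_",
    "3\ta\t_",
    "",
    "1-2\tcd\t_",
    "1\tc\t_",
    "2\td\t_",
]
sample_vectors = ["a 0.1 0.2", "c 0.3 0.4"]

forms = read_conllu_forms(sample_conllu)
vocab = read_vector_vocab(sample_vectors)
type_cov, token_cov = coverage(forms, vocab)
print(f"type coverage: {type_cov:.1%}, token coverage: {token_cov:.1%}")
```

The sketch matches word forms exactly as they appear in the treebank. Whether to lowercase or otherwise normalize them first depends on how the vectors were trained, which is not stated here.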

Thanks!
