CrossBilingualEmbeddings

An NLP team project for finding cross-bilingual embeddings (a dictionary) for English and Hindi.

Hinton [4] introduced the idea of learning distributed representations for symbolic input. Bengio et al. [5] built on it to create a neural-network-based probabilistic language model, and this line of work led to word2vec [6]. These developments made vectors a common representation for textual data, with word2vec, phrase2vec, and doc2vec among the resulting tools. In its simplest form, a word2vec model assigns each word a vector based on the contexts in which the word appears in the training data. The trained model therefore defines a multidimensional vector space that contains the words of a particular language.
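To make the idea of "vectors that depend on context" concrete, here is a minimal sketch. It does not use word2vec's neural training; it swaps in plain co-occurrence counts, which is a much simpler technique, only to show that words appearing in similar contexts end up with similar vectors. The toy corpus and the function names (`cooccurrence_vectors`, `cosine`) are made up for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy corpus, invented for this example (not the project's data).
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the dog",
    "stocks fell on the market",
]

vecs = cooccurrence_vectors(corpus)
sim_cat_dog = cosine(vecs["cat"], vecs["dog"])
sim_cat_market = cosine(vecs["cat"], vecs["market"])
print(f"cat~dog: {sim_cat_dog:.3f}  cat~market: {sim_cat_market:.3f}")
```

In this toy corpus "cat" and "dog" share more context words than "cat" and "market", so their similarity comes out higher. A trained word2vec model captures the same kind of signal with dense, low-dimensional vectors.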

These vectors are called the feature embeddings of words in that language. If a single vector space contains vector representations of words from two different languages, those representations are called cross-bilingual embeddings. Note that such embeddings are not obtained directly by, say, training a word2vec model on a corpus that simply mixes text from both languages. Bilingual embeddings require that the feature vectors of words from both languages be arranged in a meaningful way relative to each other.

Examples of bilingual embeddings: in a bilingual embedding setting, words like run and दौड़ना will have vector representations that are very close to each other, or the two vectors can be mapped to each other through some transformation function. A word like book can have a vector that lies somewhere between the vectors of किताब and दर्ज, since it has senses corresponding to both Hindi words.
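The "transformation function" idea can be sketched as follows. This is an illustration under stated assumptions, not necessarily the method this project uses: it assumes the transformation is linear, fits it by least squares on a small seed dictionary, and translates a new word by mapping its vector and picking the nearest Hindi vector by cosine similarity. All vectors are invented 2-D toy values (the Hindi space is the English space rotated by 90 degrees), and the helper names are hypothetical.

```python
from math import sqrt


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular; add more seed pairs")
    return [[d / det, -b / det], [-c / det, a / det]]


def fit_linear_map(src, tgt):
    """Least-squares W minimizing ||src W - tgt|| via the normal equations (2-D only)."""
    xt = transpose(src)
    return matmul(inverse_2x2(matmul(xt, src)), matmul(xt, tgt))


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def translate(word, en_vecs, hi_vecs, w):
    """Map the English vector into the Hindi space and return the nearest Hindi word."""
    mapped = matmul([en_vecs[word]], w)[0]
    return max(hi_vecs, key=lambda h: cosine(mapped, hi_vecs[h]))


# Toy 2-D embeddings, invented for this example.
en_vecs = {"run": [1.0, 0.0], "book": [0.0, 1.0], "water": [1.0, 1.0]}
hi_vecs = {"दौड़ना": [0.0, 1.0], "किताब": [-1.0, 0.0], "पानी": [-1.0, 1.0]}

# Seed dictionary: word pairs assumed to be known translations.
seed = [("run", "दौड़ना"), ("book", "किताब")]
W = fit_linear_map([en_vecs[e] for e, _ in seed], [hi_vecs[h] for _, h in seed])

# "water" was not in the seed dictionary; it is translated through W.
result = translate("water", en_vecs, hi_vecs, W)
print(result)
```

With real embeddings the vectors have hundreds of dimensions, so the 2x2 inverse would be replaced by a general least-squares solver, and the seed dictionary would contain many more pairs.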

References

[1] GloVe: Global Vectors for Word Representation -- Jeffrey Pennington, Richard Socher, Christopher D. Manning http://nlp.stanford.edu/projects/glove/

[2] Bilingual Word Embeddings from Parallel and Non-parallel Corpora for Cross-Language Text Classification -- Aditya Mogadala, Achim Rettinger https://www.aclweb.org/anthology/N/N16/N16-1083.pdf

[3] Bilingual Word Embeddings for Phrase-Based Machine Translation -- Will Y. Zou, Richard Socher, Daniel Cer, Christopher D. Manning http://ai.stanford.edu/~wzou/emnlp2013_ZouSocherCerManning.pdf

[4] Hinton, Geoffrey E. "Learning distributed representations of concepts." Proceedings of the eighth annual conference of the cognitive science society. Vol. 1. 1986. http://www.cogsci.ucsd.edu/~ajyu/Teaching/Cogs202_sp13/Readings/hinton86.pdf

[5] Bengio, Yoshua, et al. "A neural probabilistic language model." Journal of Machine Learning Research 3.Feb (2003): 1137-1155. http://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf

[6] Mikolov, Tomas, et al. "Efficient estimation of word representations in vector space." arXiv preprint arXiv:1301.3781 (2013). https://arxiv.org/abs/1301.3781

[7] Dyer, Chris, et al. "cdec: A decoder, alignment, and learning framework for finite-state and context-free translation models." Proceedings of the ACL 2010 System Demonstrations. Association for Computational Linguistics, 2010. http://cs.jhu.edu/~jonny/pub/P10-4002.pdf

