chakin is a downloader for pre-trained word vectors. Supported many vectors
This library lets you download pre-trained word vectors without troublesome work.
To install chakin, simply:
$ pip install chakin
You can download pre-trained word vectors as follows:
$ python
>>> import chakin
>>> chakin.search(lang='English')
Name Dimension Corpus VocabularySize
2 fastText(en) 300 Wikipedia 2.5M
11 GloVe.6B.50d 50 Wikipedia+Gigaword 5 (6B) 400K
12 GloVe.6B.100d 100 Wikipedia+Gigaword 5 (6B) 400K
13 GloVe.6B.200d 200 Wikipedia+Gigaword 5 (6B) 400K
14 GloVe.6B.300d 300 Wikipedia+Gigaword 5 (6B) 400K
15 GloVe.42B.300d 300 Common Crawl(42B) 1.9M
16 GloVe.840B.300d 300 Common Crawl(840B) 2.2M
17 GloVe.Twitter.25d 25 Twitter(27B) 1.2M
18 GloVe.Twitter.50d 50 Twitter(27B) 1.2M
19 GloVe.Twitter.100d 100 Twitter(27B) 1.2M
20 GloVe.Twitter.200d 200 Twitter(27B) 1.2M
21 word2vec.GoogleNews 300 Google News(100B) 3.0M
>>> chakin.download(number=2, save_dir='./') # select fastText(en)
Test: 100% || | Time: 0:00:02 60.7 MiB/s
'./wiki.en.vec'
So far, chakin supports following word vectors:
Name | Dimension | Corpus | VocabularySize | Method | Language |
---|---|---|---|---|---|
fastText(ar) | 300 | Wikipedia | 610K | fastText | Arabic |
fastText(de) | 300 | Wikipedia | 2.3M | fastText | German |
fastText(en) | 300 | Wikipedia | 2.5M | fastText | English |
fastText(es) | 300 | Wikipedia | 985K | fastText | Spanish |
fastText(fr) | 300 | Wikipedia | 1.2M | fastText | French |
fastText(it) | 300 | Wikipedia | 871K | fastText | Italian |
fastText(ja) | 300 | Wikipedia | 580K | fastText | Japanese |
fastText(ko) | 300 | Wikipedia | 880K | fastText | Korean |
fastText(pt) | 300 | Wikipedia | 592K | fastText | Portuguese |
fastText(ru) | 300 | Wikipedia | 1.9M | fastText | Russian |
fastText(zh) | 300 | Wikipedia | 330K | fastText | Chinese |
GloVe.6B.50d | 50 | Wikipedia+Gigaword 5 (6B) | 400K | GloVe | English |
GloVe.6B.100d | 100 | Wikipedia+Gigaword 5 (6B) | 400K | GloVe | English |
GloVe.6B.200d | 200 | Wikipedia+Gigaword 5 (6B) | 400K | GloVe | English |
GloVe.6B.300d | 300 | Wikipedia+Gigaword 5 (6B) | 400K | GloVe | English |
GloVe.42B.300d | 300 | Common Crawl(42B) | 1.9M | GloVe | English |
GloVe.840B.300d | 300 | Common Crawl(840B) | 2.2M | GloVe | English |
GloVe.Twitter.25d | 25 | Twitter(27B) | 1.2M | GloVe | English |
GloVe.Twitter.50d | 50 | Twitter(27B) | 1.2M | GloVe | English |
GloVe.Twitter.100d | 100 | Twitter(27B) | 1.2M | GloVe | English |
GloVe.Twitter.200d | 200 | Twitter(27B) | 1.2M | GloVe | English |
word2vec.GoogleNews | 300 | Google News(100B) | 3.0M | word2vec | English |
word2vec.Wiki-NEologd.50d | 50 | Wikipedia | 335K | word2vec + NEologd | Japanese |