I took a look at the fastText binary format.
It is not actually a word-embedding format; it is essentially an entire serialized model, which has to be executed to produce word embeddings.
This code loads the format, but actually getting word embeddings out of it would require building up the ngram/subword tables etc., and then running the computations that calculate the word vectors.
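To illustrate what "building up the ngram/subword tables" involves, here is a minimal sketch of the subword machinery, based on my reading of the fastText source: each word is wrapped in `<`/`>` markers, split into character n-grams of length `minn..maxn`, and each n-gram is mapped into a hash bucket with a 32-bit FNV-1a hash. Function names are my own, and the sketch is simplified to treat the string character-by-character (the real implementation groups UTF-8 bytes).

```python
def char_ngrams(word, minn=3, maxn=6):
    """Character n-grams for a word, fastText-style.

    The word is wrapped in '<' and '>' boundary markers; only n-grams
    of length minn..maxn are kept, so the full wrapped word appears
    only if it is short enough to fit within maxn.
    """
    wrapped = f"<{word}>"
    ngrams = []
    for i in range(len(wrapped)):
        for n in range(minn, maxn + 1):
            if i + n <= len(wrapped):
                ngrams.append(wrapped[i:i + n])
    return ngrams


def fasttext_hash(ngram):
    """32-bit FNV-1a hash, used to map n-grams into hash buckets."""
    h = 2166136261
    for byte in ngram.encode("utf-8"):
        h = ((h ^ byte) * 16777619) & 0xFFFFFFFF
    return h


# A model with `bucket` hash buckets stores each n-gram's vector at
# row (nwords + hash % bucket) of the input matrix, after the rows
# for the in-vocabulary words themselves.
bucket = 2_000_000
rows = [fasttext_hash(g) % bucket for g in char_ngrams("where")]
```

For `"where"` this yields n-grams like `<wh`, `whe`, ..., `re>`, each of which indexes its own row in the input matrix.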
The file basically has to be loaded in its entirety, because there is no index: you have to read each section to find where the next one starts.
It loads really fast, though, since most of the data is stored in contiguous matrices.
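As a sketch of the sequential layout: the file starts with a magic/version header, followed by the model args, the dictionary, and the matrices, each of which can only be located by reading everything before it. The code below writes and re-reads a toy header to show the pattern; the magic and version constants are from my reading of the fastText source (`fasttext.cc`) and only a few illustrative args fields are included, so treat the exact layout as an assumption.

```python
import io
import struct

MAGIC = 793712314   # FASTTEXT_FILEFORMAT_MAGIC_INT32 (assumed from source)
VERSION = 12        # FASTTEXT_VERSION (assumed from source)


def write_header(buf, dim=100, minn=3, maxn=6):
    # int32 magic, int32 version, then the args section begins.
    # Only a few illustrative args fields are written here; the real
    # format has more (ws, epoch, bucket, loss, ...).
    buf.write(struct.pack("<ii", MAGIC, VERSION))
    buf.write(struct.pack("<iii", dim, minn, maxn))


def read_header(buf):
    # Everything must be read in order; there are no section offsets.
    magic, version = struct.unpack("<ii", buf.read(8))
    if magic != MAGIC:
        raise ValueError("not a fastText binary model")
    dim, minn, maxn = struct.unpack("<iii", buf.read(12))
    return {"version": version, "dim": dim, "minn": minn, "maxn": maxn}


buf = io.BytesIO()
write_header(buf, dim=300)
buf.seek(0)
args = read_header(buf)
```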
Once it is fully loaded, though, the embeddings can be computed per word on demand, so it is possible to avoid processing the whole vocabulary.
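The per-word computation can be sketched as follows: with the input matrix in memory, a word's embedding is the average of its own row and its n-gram rows, so only the requested words ever need to be computed. Names and shapes here are illustrative, not fastText's actual API.

```python
import numpy as np

rng = np.random.default_rng(0)
nwords, bucket, dim = 5, 100, 4
# Word rows come first, then the hash-bucket rows for n-grams.
input_matrix = rng.normal(size=(nwords + bucket, dim))


def word_vector(word_id, ngram_hashes):
    """Average the word's own row with its subword (n-gram) rows."""
    rows = [input_matrix[word_id]]
    rows += [input_matrix[nwords + (h % bucket)] for h in ngram_hashes]
    return np.mean(rows, axis=0)


vec = word_vector(2, [12345, 67890])
```

Because each lookup touches only one word row plus a handful of bucket rows, computing vectors for a few query words never requires iterating over the full vocabulary.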