I took a look at the fastText binary format.
It is not actually a word-embedding format; it is essentially an entire serialized model, which has to be executed to produce word embeddings.
This code loads the format, but actually getting word embeddings out of it would require building up the ngram/subword tables etc., and then running the computations that calculate the word vectors.
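To illustrate what "building up the ngram/subword tables" involves, here is a minimal sketch of the subword machinery, based on my reading of the fastText source: each word is wrapped in `<`/`>` markers, split into character n-grams of length `minn..maxn`, and each n-gram is mapped into a hash bucket with a 32-bit FNV-1a hash. Function names are my own, and the sketch is simplified to treat the string character-by-character (the real implementation groups UTF-8 bytes).

```python
def char_ngrams(word, minn=3, maxn=6):
    """Character n-grams for a word, fastText-style.

    The word is wrapped in '<' and '>' boundary markers; only n-grams
    of length minn..maxn are kept, so the full wrapped word appears
    only if it is short enough to fit within maxn.
    """
    wrapped = f"<{word}>"
    ngrams = []
    for i in range(len(wrapped)):
        for n in range(minn, maxn + 1):
            if i + n <= len(wrapped):
                ngrams.append(wrapped[i:i + n])
    return ngrams


def fasttext_hash(ngram):
    """32-bit FNV-1a hash, used to map n-grams into hash buckets."""
    h = 2166136261
    for byte in ngram.encode("utf-8"):
        h = ((h ^ byte) * 16777619) & 0xFFFFFFFF
    return h


# A model with `bucket` hash buckets stores each n-gram's vector at
# row (nwords + hash % bucket) of the input matrix, after the rows
# for the in-vocabulary words themselves.
bucket = 2_000_000
rows = [fasttext_hash(g) % bucket for g in char_ngrams("where")]
```

For `"where"` this yields n-grams like `<wh`, `whe`, ..., `re>`, each of which indexes its own row in the input matrix.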
The file basically has to be loaded in its entirety, because there is no index: you have to read each section to find where the next one starts.
It loads really fast, though, since most of the data is stored in contiguous matrices.
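As a sketch of the sequential layout: the file starts with a magic/version header, followed by the model args, the dictionary, and the matrices, each of which can only be located by reading everything before it. The code below writes and re-reads a toy header to show the pattern; the magic and version constants are from my reading of the fastText source (`fasttext.cc`) and only a few illustrative args fields are included, so treat the exact layout as an assumption.

```python
import io
import struct

MAGIC = 793712314   # FASTTEXT_FILEFORMAT_MAGIC_INT32 (assumed from source)
VERSION = 12        # FASTTEXT_VERSION (assumed from source)


def write_header(buf, dim=100, minn=3, maxn=6):
    # int32 magic, int32 version, then the args section begins.
    # Only a few illustrative args fields are written here; the real
    # format has more (ws, epoch, bucket, loss, ...).
    buf.write(struct.pack("<ii", MAGIC, VERSION))
    buf.write(struct.pack("<iii", dim, minn, maxn))


def read_header(buf):
    # Everything must be read in order; there are no section offsets.
    magic, version = struct.unpack("<ii", buf.read(8))
    if magic != MAGIC:
        raise ValueError("not a fastText binary model")
    dim, minn, maxn = struct.unpack("<iii", buf.read(12))
    return {"version": version, "dim": dim, "minn": minn, "maxn": maxn}


buf = io.BytesIO()
write_header(buf, dim=300)
buf.seek(0)
args = read_header(buf)
```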
Once it is fully loaded, though, the embeddings can be computed per word on demand, so it is possible to avoid processing the whole vocabulary.
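The per-word computation can be sketched as follows: with the input matrix in memory, a word's embedding is the average of its own row and its n-gram rows, so only the requested words ever need to be computed. Names and shapes here are illustrative, not fastText's actual API.

```python
import numpy as np

rng = np.random.default_rng(0)
nwords, bucket, dim = 5, 100, 4
# Word rows come first, then the hash-bucket rows for n-grams.
input_matrix = rng.normal(size=(nwords + bucket, dim))


def word_vector(word_id, ngram_hashes):
    """Average the word's own row with its subword (n-gram) rows."""
    rows = [input_matrix[word_id]]
    rows += [input_matrix[nwords + (h % bucket)] for h in ngram_hashes]
    return np.mean(rows, axis=0)


vec = word_vector(2, [12345, 67890])
```

Because each lookup touches only one word row plus a handful of bucket rows, computing vectors for a few query words never requires iterating over the full vocabulary.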