BoNLTK aims to provide out-of-the-box support for the various NLP tasks that an application developer might need for the Boyig (Tibetan) language.
pip install bonltk
Coming soon
-
Tokenizers:
- Hugging Face tokenizers
- SentencePiece tokenizer
- Compare the above tokenizers with botok
-
Word Vectors:
- Word2Vec with gensim
- ELMo
-
Language Models:
- Hugging Face Transformers
- ULMFiT language model with fastai
-
Text Similarity:
- Sentence similarity using ULMFiT, as in iNLTK
- Implement the text similarity techniques described [here](https://medium.com/@adriensieg/text-similarities-da019229c894)
- Compare all the text similarity algorithms
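
Until those planned features land, here is a minimal sketch of two simple similarity baselines (Jaccard and cosine over token counts) that model-based approaches could be compared against. It is not part of BoNLTK: the function names `syllables`, `jaccard`, and `cosine` are hypothetical, and the naive split on the tsheg (U+0F0B) is only an assumed stand-in for a real tokenizer such as botok.

```python
import math
from collections import Counter

TSHEG = "\u0f0b"  # Tibetan syllable delimiter
SHAD = "\u0f0d"   # Tibetan clause/sentence delimiter


def syllables(text):
    """Naive syllable split on tsheg (assumption: stand-in for a real tokenizer)."""
    parts = []
    for chunk in text.replace(SHAD, " ").split():
        parts.extend(s for s in chunk.split(TSHEG) if s)
    return parts


def jaccard(a, b):
    """Jaccard similarity between two token lists (set overlap)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def cosine(a, b):
    """Cosine similarity between the token-count vectors of two token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    s1 = syllables("བཀྲ་ཤིས་བདེ་ལེགས།")
    s2 = syllables("བཀྲ་ཤིས།")
    print(jaccard(s1, s2), cosine(s1, s2))
```

Both functions take plain token lists, so the same comparison can be rerun with any tokenizer's output in place of `syllables`.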