- kaldifeat, Feature extraction compatible with Kaldi using PyTorch, supporting CUDA, batch processing, chunk processing, and autograd
- kaldiio, an I/O utility implemented in pure Python for several file formats used in Kaldi
- sentencepiece, SentencePiece is an unsupervised text tokenizer and detokenizer
- subword-nmt, preprocessing scripts to segment text into subword units
- fastBPE, C++ implementation of Neural Machine Translation of Rare Words with Subword Units, with Python API
- dictionary, a pronunciation lexicon covering both English and Chinese words in a unified phoneset
- python-pinyin, converts Chinese characters to pinyin
- text processing, Inverse Text Normalization & Text Normalization
- warp-ctc, PyTorch bindings for Warp-ctc