The Tools of ASR

training or inference

feature

  • kaldifeat, Feature extraction compatible with Kaldi using PyTorch, supporting CUDA, batch processing, chunk processing, and autograd

  • kaldiio, a pure-Python I/O utility for several file formats used in Kaldi

  • sentencepiece, an unsupervised text tokenizer and detokenizer

  • subword-nmt, preprocessing scripts to segment text into subword units

  • fastBPE, a C++ implementation of "Neural Machine Translation of Rare Words with Subword Units", with a Python API

  • dictionary, a pronunciation lexicon covering both English and Chinese words in a unified phone set

  • python-pinyin, converts Chinese characters (hanzi) to pinyin

  • text processing, text normalization (TN) and inverse text normalization (ITN)
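
Several of the tools above (subword-nmt, fastBPE, and one of the modes of sentencepiece) segment text into subword units with byte-pair encoding (BPE). The following is a minimal pure-Python sketch of how BPE merges are learned, for illustration only. The function names (`get_pair_counts`, `merge_pair`, `learn_bpe`) and the toy vocabulary are assumptions made for this example and are not the API of any of the listed libraries.

```python
import re
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency.

    `vocab` maps a space-separated symbol sequence (one word) to its count.
    """
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every occurrence of `pair` (as whole symbols) with the merged symbol."""
    bigram = re.escape(" ".join(pair))
    # The lookarounds ensure we only match complete symbols, not substrings of them.
    pattern = re.compile(r"(?<!\S)" + bigram + r"(?!\S)")
    merged = "".join(pair)
    return {pattern.sub(merged, word): freq for word, freq in vocab.items()}


def learn_bpe(vocab, num_merges):
    """Greedily merge the most frequent pair `num_merges` times.

    Ties are broken by first occurrence, which is a choice made for this sketch.
    """
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab


# Hypothetical toy corpus: each word is split into characters plus an end-of-word marker.
toy_vocab = {
    "l o w </w>": 5,
    "l o w e r </w>": 2,
    "n e w e s t </w>": 6,
    "w i d e s t </w>": 3,
}
merges, segmented = learn_bpe(toy_vocab, 3)
print(merges)
print(segmented)
```

In this toy corpus the suffix shared by "newest" and "widest" is merged into a single unit after three steps, which is the behaviour that lets a subword vocabulary cover rare words. The real tools add vocabulary thresholds, efficient pair-count updates, and their own file formats, none of which are shown here.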

loss function

  • warp-ctc, PyTorch bindings for Warp-ctc
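
For context on what a CTC loss computes, here is a minimal pure-Python sketch of the CTC forward (alpha) recursion, returning the negative log-likelihood of a label sequence. It is an illustration under simplifying assumptions, not the warp-ctc API: the function name `ctc_neg_log_likelihood` is hypothetical, inputs are plain probability lists, and the recursion runs in probability space, so it will underflow on long utterances (production implementations typically work in log space and on the GPU).

```python
import math


def ctc_neg_log_likelihood(probs, target, blank=0):
    """Negative log-likelihood of `target` under CTC.

    probs:  list of T frames, each a list of V per-label probabilities.
    target: list of label ids without blanks.
    """
    # Interleave blanks: [a, b] -> [blank, a, blank, b, blank]
    ext = [blank]
    for label in target:
        ext += [label, blank]
    num_states, num_frames = len(ext), len(probs)

    # Initialise: a path may start in the first blank or the first label.
    alpha = [0.0] * num_states
    alpha[0] = probs[0][blank]
    if num_states > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, num_frames):
        new_alpha = [0.0] * num_states
        for s in range(num_states):
            total = alpha[s]                      # stay in the same state
            if s >= 1:
                total += alpha[s - 1]             # advance by one state
            # Skip over a blank only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new_alpha[s] = total * probs[t][ext[s]]
        alpha = new_alpha

    # Valid paths end in the last label or the trailing blank.
    likelihood = alpha[-1] + (alpha[-2] if num_states > 1 else 0.0)
    return math.inf if likelihood == 0.0 else -math.log(likelihood)


# Two frames, two symbols (0 = blank, 1 = "a"), uniform probabilities.
# The alignments "a-", "-a" and "aa" all collapse to "a": 3 * 0.25 = 0.75.
uniform = [[0.5, 0.5], [0.5, 0.5]]
loss = ctc_neg_log_likelihood(uniform, [1])
print(loss)
```

The loss sums over every frame-level alignment that collapses to the target, which is why no frame-level labels are needed for training.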