kaldi-ctc

Automatic speech recognition (ASR) with Connectionist Temporal Classification (CTC). Training and decoding are extremely fast.

Introduction

kaldi-ctc is based on kaldi, warp-ctc and cudnn.

Component	Role
kaldi	Base toolkit; data preparation and building the decoding WFST
warp-ctc	Fast parallel implementation of CTC
cudnn (= 5.x)	Fast recurrent neural networks (LSTM, GRU, ReLU, Tanh)

Compilation

# install dependencies
cd tools
make -j
make openblas
# Install cudnn; see the reference script `extras/install_cudnn.sh`
bash extras/install_cudnn.sh  # only downloads cudnn; copy the include/lib[64] dirs into your system's CUDA path yourself.

cd ../src
# replace `YOUR_CUDNN_ROOT` with the path to your cudnn installation
./configure --cudnn-root=YOUR_CUDNN_ROOT --openblas-root=../tools/OpenBLAS/install
make depend -j
make -j
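Since the build expects cudnn 5.x, it can help to check the version under `YOUR_CUDNN_ROOT` before running `./configure`. The helper below is only a sketch and not part of kaldi-ctc: `cudnn_major` is a hypothetical name, and it assumes the header sits at `<root>/include/cudnn.h` and defines `CUDNN_MAJOR`, as the cudnn 5.x headers do.

```shell
# Print the cudnn major version found under a cudnn root directory.
# Assumption: the header is at <root>/include/cudnn.h and contains
# a line of the form "#define CUDNN_MAJOR 5".
cudnn_major() {
    header="$1/include/cudnn.h"
    [ -f "$header" ] || { echo "no cudnn.h under $1" >&2; return 1; }
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR" { print $3; exit }' "$header"
}

# Usage (YOUR_CUDNN_ROOT is the same placeholder passed to ./configure):
#   [ "$(cudnn_major YOUR_CUDNN_ROOT)" = "5" ] || echo "expected cudnn 5.x"
```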

Example scripts

Make sure your GPU has enough memory; the default settings run on a GTX TITAN X or GTX 1080 (>= 8 GB).
On older GPUs with less memory, use a smaller minibatch_size (default 16) or max_allow_frames (default 2000), or a larger frame_subsampling_factor (default 1).
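One way to see whether the defaults are likely to fit is to list each GPU's total memory before training. This is a rough sketch rather than anything shipped with kaldi-ctc: `low_mem_gpus` is a hypothetical helper, and its 8000 MiB default is an assumed stand-in for the ">= 8 GB" guideline (cards sold as 8 GB usually report slightly less).

```shell
# Read one total-memory value (MiB) per line on stdin, in GPU index order,
# and report the GPUs that fall below a threshold.
# Assumption: the 8000 MiB default approximates the ">= 8 GB" guideline.
low_mem_gpus() {
    awk -v min="${1:-8000}" '$1 + 0 < min { printf "GPU %d: %d MiB\n", NR - 1, $1 }'
}

# Usage on a machine with the NVIDIA driver installed:
#   nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | low_mem_gpus
```

Any GPU it reports is a candidate for the smaller minibatch_size or max_allow_frames settings described above.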

librispeech

CTC-monophone

cd egs/librispeech/ctc
bash run.sh --stage -2 --num-gpus 4  # change 4 to the number of GPUs you have
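If you would rather not hard-code the GPU count, it can be derived from `nvidia-smi -L`, which prints one `GPU <n>: ...` line per device. The `count_gpus` helper below is a hypothetical convenience, not part of the example scripts.

```shell
# Count GPUs from `nvidia-smi -L` output supplied on stdin
# (one "GPU <n>: ..." line per device).
count_gpus() {
    grep -c '^GPU [0-9]'
}

# Usage:
#   num_gpus=$(nvidia-smi -L | count_gpus)
#   bash run.sh --stage -2 --num-gpus "$num_gpus"
```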

Edit Distance Accuracy

steps/ctc/report/generate_plots.py exp/ctc/cudnn_google_fs3 reports/ctc-google

WER RESULTS (LM tgsmall)

Models	Real Time Factor(RTF)	test_clean	dev_clean	test_other	dev_other
chain		6.20	5.83	14.73	14.56
CTC-monophone	(0.05 ~ 0.06) / `frame_subsampling_factor`	8.63	9.02	20.75	22.16
CTC-character
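The CTC-monophone RTF in the table scales inversely with `frame_subsampling_factor`. As a small worked example, the sketch below evaluates that relation for a given factor; `rtf_range` is a hypothetical helper and the factor 3 is only an illustrative value.

```shell
# Estimate the CTC-monophone RTF range for a given frame_subsampling_factor,
# using the (0.05 ~ 0.06) / frame_subsampling_factor relation from the table.
rtf_range() {
    awk -v f="$1" 'BEGIN { printf "%.4f ~ %.4f\n", 0.05 / f, 0.06 / f }'
}

rtf_range 3   # prints "0.0167 ~ 0.0200"
```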

The training transcriptions currently contain many out-of-vocabulary (OOV) words:

awk 'FNR==NR{T[$1]=1;} FNR<NR{for(i=2;i<=NF;i++) {if (!($i in T)) print $i;}}' data/lang_nosp/words.txt   data/train_960/text | sort -u | wc -l
14291
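To see what the one-liner counts, here is the same awk program run on a toy lexicon and transcript. The file contents are made up for illustration; only the awk program is taken from the command above.

```shell
# Toy reproduction of the OOV count (file contents are illustrative only).
tmp=$(mktemp -d)
printf 'hello 1\nworld 2\n' > "$tmp/words.txt"                      # lexicon: word in column 1
printf 'utt1 hello there\nutt2 world there friend\n' > "$tmp/text"  # transcripts: utterance id, then words

# First file (FNR==NR): load the vocabulary into T.
# Second file (FNR<NR): print every transcript word (columns 2..NF) not in T.
oov_count=$(awk 'FNR==NR{T[$1]=1;} FNR<NR{for(i=2;i<=NF;i++) {if (!($i in T)) print $i;}}' \
    "$tmp/words.txt" "$tmp/text" | sort -u | wc -l | tr -d ' ')
echo "$oov_count"   # 2 distinct OOV words: "friend" and "there"
rm -rf "$tmp"
```

`sort -u` collapses repeated OOV words, so the result is the number of distinct OOV word types rather than the number of OOV tokens.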

The CTC system gets better results than the chain system on a larger corpus.

TODO

Clean up the librispeech corpus (fix OOVs) and fine-tune parameters

CTC-character example script

FLAT START TRAINING CTC-RNN ACOUSTIC MODELS, CTC-triphone

google - FLAT START TRAINING OF CD-CTC-SMBR LSTM RNN ACOUSTIC MODELS

