NeuralSP: Neural network based Speech Processing

How to install

# Set path to CUDA, NCCL
CUDAROOT=/usr/local/cuda
NCCL_ROOT=/usr/local/nccl

export CPATH=$NCCL_ROOT/include:$CPATH
export LD_LIBRARY_PATH=$NCCL_ROOT/lib/:$CUDAROOT/lib64:$LD_LIBRARY_PATH
export LIBRARY_PATH=$NCCL_ROOT/lib/:$LIBRARY_PATH
export CUDA_HOME=$CUDAROOT
export CUDA_PATH=$CUDAROOT
export CPATH=$CUDA_PATH/include:$CPATH  # for warp-rnnt

# Install miniconda, python libraries, and other tools
cd tools
make KALDI=/path/to/kaldi

Key features

Corpus

ASR
- AISHELL-1
- CSJ
- Librispeech
- Switchboard (+ Fisher)
- TEDLIUM2/TEDLIUM3
- TIMIT
- WSJ
LM
- Penn Tree Bank
- WikiText2

Front-end

Frame stacking
Sequence summary network [link]
SpecAugment [link]
Adaptive SpecAugment [link]
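
As a rough illustration of frame stacking, the sketch below concatenates consecutive feature frames and subsamples the sequence. It is a plain-Python sketch under stated assumptions, not the toolkit's implementation: the function name, the parameter names `n_stacks` and `n_skips`, and the choice to pad the tail by repeating the last frame are all assumptions made for illustration.

```python
def stack_frames(frames, n_stacks=3, n_skips=3):
    """Concatenate `n_stacks` consecutive frames, advancing `n_skips` frames per output.

    `frames` is a list of equal-length feature vectors (lists of floats).
    """
    stacked = []
    for start in range(0, len(frames), n_skips):
        window = frames[start:start + n_stacks]
        # Assumption: pad a short tail by repeating its last frame,
        # so that every output vector has the same width.
        while len(window) < n_stacks:
            window = window + [window[-1]]
        stacked.append([x for frame in window for x in frame])
    return stacked
```

With `n_stacks=3` and `n_skips=3`, a sequence of T frames of dimension D becomes roughly T/3 frames of dimension 3D, which shortens the sequence the encoder has to process.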

Encoder

RNN encoder
- (CNN-)BLSTM, (CNN-)LSTM, (CNN-)BGRU, (CNN-)GRU
- Latency-controlled BRNN [link]
- Random state passing (RSP) [link]
Transformer encoder [link]
- Chunk hopping mechanism [link]
- Relative positional encoding [link]
- Causal mask
Conformer encoder [link]
Time-depth separable (TDS) convolution encoder [link] [link]
Gated CNN encoder (GLU) [link]
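
The causal mask listed for the Transformer encoder can be sketched as follows. This is a generic illustration in plain Python, not the toolkit's code; the function names are hypothetical. Position i may attend to positions j <= i only, and masked scores are set to negative infinity so that they receive zero weight after a softmax.

```python
def causal_mask(length):
    """Return a length x length boolean matrix; True where attention is allowed."""
    return [[j <= i for j in range(length)] for i in range(length)]


def apply_mask(scores, mask, neg=float("-inf")):
    """Replace disallowed attention scores with `neg`."""
    return [
        [s if keep else neg for s, keep in zip(score_row, mask_row)]
        for score_row, mask_row in zip(scores, mask)
    ]
```

Because no frame looks at future frames, an encoder masked this way can in principle run in a streaming fashion.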

Connectionist Temporal Classification (CTC) decoder

Beam search
Shallow fusion
Forced alignment
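
For readers unfamiliar with CTC beam search, here is a minimal sketch of the standard prefix beam search in plain Python. It is a textbook-style illustration, not the toolkit's decoder: the function names and the `beam_size` and `blank` parameters are assumptions, and there is no LM fusion or length normalization.

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf inputs."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_prefix_beam_search(log_probs, beam_size=4, blank=0):
    """log_probs: T x V list of per-frame log-probabilities."""
    # prefix -> (log prob of paths ending in blank, log prob ending in non-blank)
    beams = {(): (0.0, NEG_INF)}
    for frame in log_probs:
        nxt = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_b, p_nb) in beams.items():
            for v, lp in enumerate(frame):
                if v == blank:
                    # Blank keeps the prefix unchanged.
                    b, nb = nxt[prefix]
                    nxt[prefix] = (logsumexp(b, p_b + lp, p_nb + lp), nb)
                elif prefix and prefix[-1] == v:
                    # Repeated label without a blank collapses into the same prefix.
                    b, nb = nxt[prefix]
                    nxt[prefix] = (b, logsumexp(nb, p_nb + lp))
                    # Repeated label after a blank extends the prefix.
                    new = prefix + (v,)
                    b, nb = nxt[new]
                    nxt[new] = (b, logsumexp(nb, p_b + lp))
                else:
                    new = prefix + (v,)
                    b, nb = nxt[new]
                    nxt[new] = (b, logsumexp(nb, p_b + lp, p_nb + lp))
        ranked = sorted(nxt.items(), key=lambda kv: logsumexp(*kv[1]), reverse=True)
        beams = dict(ranked[:beam_size])
    prefix, scores = max(beams.items(), key=lambda kv: logsumexp(*kv[1]))
    return list(prefix), logsumexp(*scores)
```

Unlike greedy decoding, this sums the probability of all alignments that collapse to the same label sequence before comparing hypotheses.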

RNN-Transducer (RNN-T) decoder [link]

Beam search
Shallow fusion
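
Shallow fusion, listed for several decoders here, combines the ASR score with an external LM score log-linearly at each decoding step. The sketch below shows only that combination; the function name and the `lm_weight` parameter are hypothetical, and the default value is an arbitrary placeholder rather than a recommended setting.

```python
import math


def fuse_step(asr_log_probs, lm_log_probs, lm_weight=0.3):
    """Return per-token scores: log p_asr(y) + lm_weight * log p_lm(y)."""
    return [a + lm_weight * l for a, l in zip(asr_log_probs, lm_log_probs)]


def best_token(scores):
    """Index of the highest-scoring token."""
    return max(range(len(scores)), key=scores.__getitem__)


asr = [math.log(0.5), math.log(0.4), math.log(0.1)]
lm = [math.log(0.1), math.log(0.8), math.log(0.1)]
print(best_token(fuse_step(asr, lm, lm_weight=0.0)))
print(best_token(fuse_step(asr, lm, lm_weight=0.5)))
```

In a beam search the fused scores would be accumulated per hypothesis; with `lm_weight=0.0` the ranking falls back to the ASR model alone.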

Attention-based decoder

RNN decoder
- Shallow fusion
- Cold fusion [link]
- Deep fusion [link]
- Forward-backward attention decoding [link]
- Ensemble decoding
Attention type
- location-based
- content-based
- dot-product
- GMM attention
Streaming RNN decoder specific
- Hard monotonic attention [link]
- Monotonic chunkwise attention (MoChA) [link]
- Delay constrained training (DeCoT) [link]
- Minimum latency training (MinLT) [link]
- CTC-synchronous training (CTC-ST) [link]
Transformer decoder [link]
Streaming Transformer decoder specific
- Monotonic Multihead Attention [link] [link]
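
Of the attention types listed above, dot-product attention is the simplest to write down. The following is a generic single-query sketch in plain Python, with hypothetical function names and no learned projections or scaling; it is meant to show the mechanism rather than reproduce the toolkit's modules.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_attention(query, keys, values):
    """Return (context vector, attention weights) for one decoder query.

    `keys` and `values` hold one vector per encoder frame.
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return context, weights
```

The streaming variants in the list (hard monotonic attention, MoChA) constrain which encoder frames may receive weight, which this global version does not do.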

Language model (LM)

RNNLM (recurrent neural network language model)
Gated convolutional LM [link]
Transformer LM
Transformer-XL LM [link]
Adaptive softmax [link]

Output units

Phoneme
Grapheme
Wordpiece (BPE, sentencepiece)
Word
Word-char mix
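
One common reading of a word-char mix, assumed here rather than taken from this README, is that in-vocabulary words are kept as single units while out-of-vocabulary words are split into characters. A minimal sketch under that assumption, with a hypothetical function name:

```python
def to_word_char_mix(words, vocab):
    """Keep in-vocabulary words whole; split out-of-vocabulary words into characters.

    Assumption: this is only one possible definition of the unit, and no
    word-boundary marker is added.
    """
    units = []
    for word in words:
        if word in vocab:
            units.append(word)
        else:
            units.extend(word)  # a string extends the list one character at a time
    return units


print(to_word_char_mix(["the", "zyx"], {"the"}))
```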

Multi-task learning (MTL)

Multi-task learning (MTL) with different output units is supported to alleviate data sparseness.

Hybrid CTC/attention [link]
Hierarchical Attention (e.g., word attention + character attention) [link]
Hierarchical CTC (e.g., word CTC + character CTC) [link]
Hierarchical CTC+Attention (e.g., word attention + character CTC) [link]
Forward-backward attention [link]
LM objective
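
The hybrid CTC/attention objective in the list above is usually written as a linear interpolation of the two losses. The sketch below shows that interpolation only; the function name and the `ctc_weight` parameter are hypothetical, and the default value is a placeholder rather than a setting taken from this repository.

```python
def hybrid_loss(ctc_loss, att_loss, ctc_weight=0.3):
    """Interpolate the losses: ctc_weight * L_ctc + (1 - ctc_weight) * L_att."""
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * att_loss
```

Setting `ctc_weight` to 0 or 1 recovers pure attention or pure CTC training. The hierarchical variants would add further weighted terms for the auxiliary unit.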

ASR Performance

AISHELL-1 (CER)

Model	dev	test
Transformer	5.0	5.4
Conformer	4.7	5.2
Streaming MMA	5.5	6.1

CSJ (WER)

Model	eval1	eval2	eval3
BLSTM LAS	6.5	5.1	5.6
LC-BLSTM MoChA	7.4	5.6	6.4

Switchboard 300h (WER)

Model	SWB	CH
BLSTM LAS	9.1	18.8

Switchboard+Fisher 2000h (WER)

Model	SWB	CH
BLSTM LAS	7.8	13.8

Librispeech (WER)

Model	dev-clean	dev-other	test-clean	test-other
BLSTM LAS	2.5	7.2	2.6	7.5
BLSTM RNN-T	2.9	8.5	3.2	9.0
Transformer	2.1	5.3	2.4	5.7
UniLSTM RNN-T	3.7	11.7	4.0	11.6
UniLSTM MoChA	4.1	11.0	4.2	11.2
LC-BLSTM RNN-T	3.3	9.8	3.5	10.2
LC-BLSTM MoChA	3.3	8.8	3.5	9.1
Streaming MMA	2.5	6.9	2.7	7.1

TEDLIUM2 (WER)

Model	dev	test
BLSTM LAS	8.1	7.5
LC-BLSTM RNN-T	8.9	8.5
LC-BLSTM MoChA	10.6	8.6
UniLSTM RNN-T	11.6	11.7
UniLSTM MoChA	13.6	11.6

WSJ (WER)

Model	test_dev93	test_eval92
BLSTM LAS	8.8	6.2
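
The WER and CER figures in the tables above are edit-distance-based error rates. As a reminder of how such a metric is computed, here is a minimal sketch; it is not the scoring script used to produce these numbers, it applies no text normalization, and the function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def error_rate(refs, hyps):
    """Error rate in percent: total edits divided by total reference length."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return 100.0 * errors / total
```

Passing lists of words gives WER; passing lists of characters gives CER.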

LM Performance

Penn Tree Bank (PPL)

Model	valid	test
RNNLM	87.99	86.06
+ cache=100	79.58	79.12
+ cache=500	77.36	76.94

WikiText2 (PPL)

Model	valid	test
RNNLM	104.53	98.73
+ cache=100	90.86	85.87
+ cache=2000	76.10	72.77
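
Perplexity (PPL), the metric in the tables above, is the exponential of the average negative log-likelihood per token. A minimal sketch, assuming natural-log token probabilities and a hypothetical function name:

```python
import math


def perplexity(token_log_probs):
    """exp of the mean negative log-probability over all evaluated tokens."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))
```

A model that assigns probability 0.25 to every token has a perplexity of 4, so lower values indicate a better fit to the evaluation text.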
