5th place solution for the Kaggle Bengali.AI speech recognition challenge
Competition training data
https://www.kaggle.com/competitions/bengaliai-speech/data
!unzip bengali-speech.zip
!mv bengali-speech/* data/
Competition metadata for the training data
https://www.kaggle.com/datasets/imtiazprio/bengaliai-speech-train-nisqa
Download and place it in data/ as well.
IndicCorp v2
MIT License (https://github.com/AI4Bharat/IndicBERT/blob/main/LICENSE)
https://objectstore.e2enetworks.net/ai4b-public-nlu-nlg/indic-corp-frozen-for-the-paper-oct-2022/bn.txt
!mv bn.txt language_model/base_files/
IndicCorp processed & tokenized (https://github.com/Open-Speech-EkStep/vakyansh-models#punctuation-models):
MIT License (same as above) (https://github.com/Open-Speech-EkStep/vakyansh-models/blob/main/LICENSE)
https://storage.googleapis.com/vakyansh-open-models/language_model_text/bengali.zip
!unzip bengali.zip
!mv bengali/* language_model/base_files/
OpenSLR 53:
Apache License 2.0 (https://github.com/danpovey/openslr/blob/master/LICENSE)
https://us.openslr.org/resources/53/utt_spk_text.tsv
!mv utt_spk_text.tsv language_model/base_files/
DL Sprint competition data:
https://www.kaggle.com/competitions/dlsprint/data
!unzip dl-sprint.zip
!mv dl-sprint/train.csv dl-sprint/train_dl_sprint.csv
!mv dl-sprint/train_dl_sprint.csv language_model/base_files/
Run preprocessing/filtering_v1_mos.ipynb
This notebook filters the training data based on the MOS scores calculated by the competition hosts and creates train_21.csv and val_21.csv in the data/ folder.
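For orientation, the MOS filtering boils down to roughly the sketch below; file and column names such as `mos_pred` are assumptions here, the notebook is authoritative:

```python
import pandas as pd

# Competition transcripts plus the host-provided NISQA quality scores.
train = pd.read_csv("data/train.csv")
mos = pd.read_csv("data/train_nisqa.csv")  # assumed filename and columns

# Keep only utterances whose predicted MOS clears a quality threshold.
merged = train.merge(mos[["id", "mos_pred"]], on="id")
filtered = merged[merged["mos_pred"] >= 2.0]  # threshold is illustrative

# Split off a small validation set and write the files the later steps expect.
val = filtered.sample(frac=0.02, random_state=42)
filtered.drop(val.index).to_csv("data/train_21.csv", index=False)
val.to_csv("data/val_21.csv", index=False)
```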
Next, run experiments/train_w2w_baseline_v7_v5_v3_v2.ipynb
This notebook performs stage 1 training. The resulting model is used to pseudo-label the data and calculate WER scores in the next step.
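Pseudo-labeling with the stage 1 checkpoint looks roughly like this (the checkpoint path is illustrative, and greedy decoding is shown for brevity):

```python
import torch
import librosa
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Stage 1 checkpoint from train_w2w_baseline_v7_v5_v3_v2.ipynb (path assumed).
processor = Wav2Vec2Processor.from_pretrained("checkpoints/stage1")
model = Wav2Vec2ForCTC.from_pretrained("checkpoints/stage1").eval()

def pseudo_label(path: str) -> str:
    """Greedy CTC transcription of a single audio file."""
    audio, _ = librosa.load(path, sr=16_000)
    inputs = processor(audio, sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    return processor.batch_decode(torch.argmax(logits, dim=-1))[0]
```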
Now run preprocessing/filtering_v2_wer.ipynb
It calculates per-utterance WER scores with the stage 1 model and keeps only samples with low WER, which improves the quality of the training data.
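The idea, sketched with `jiwer` (the pseudo-label column, the output filename, and the threshold are assumptions):

```python
import pandas as pd
from jiwer import wer

df = pd.read_csv("data/train_21.csv")

# Per-utterance WER between the reference transcript and the stage 1
# model's pseudo label.
df["wer"] = [
    wer(ref, hyp) for ref, hyp in zip(df["sentence"], df["pseudo_label"])
]

# Drop utterances the stage 1 model transcribes poorly; these are often
# noisy recordings or mislabeled transcripts.
df[df["wer"] <= 0.5].to_csv("data/train_filtered.csv", index=False)
```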
Now the final models can be trained:
IndicWav2Vec backbone:
train_w2w_baseline_v35.ipynb
This model will be trained for 210k steps.
1b backbone:
train_w2w_baseline_v32.ipynb
This model will be trained for 130k steps (training longer gives better results). A sketch of loading both backbones follows below.
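Both runs fine-tune a pretrained wav2vec2 backbone with a CTC head. The HF model ids below are assumptions; the notebooks define the actual checkpoints and hyperparameters:

```python
from transformers import Wav2Vec2ForCTC

# IndicWav2Vec backbone (id assumed; see train_w2w_baseline_v35.ipynb).
indic = Wav2Vec2ForCTC.from_pretrained(
    "ai4bharat/indicwav2vec_v1_bengali",
    ctc_loss_reduction="mean",
    ignore_mismatched_sizes=True,  # CTC head is re-initialized for the vocab
)

# XLS-R 1b backbone (see train_w2w_baseline_v32.ipynb).
xlsr_1b = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-xls-r-1b",
    ctc_loss_reduction="mean",
)
```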
Now the ensemble model can be trained:
train_w2w_baseline_v34_ensemble.ipynb
Use the 6k training step checkpoint.
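What "ensemble" means exactly is defined in the notebook; a common pattern for CTC models that share a vocabulary is to average their logits before decoding, illustrated here:

```python
import torch

def ensemble_logits(models, input_values, weights=None):
    """Weighted average of CTC logits from models sharing one vocabulary."""
    weights = weights or [1.0 / len(models)] * len(models)
    with torch.no_grad():
        all_logits = [m(input_values).logits for m in models]
    return sum(w * l for w, l in zip(weights, all_logits))
```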
Run language_model/language_model_current_v12.ipynb
After creating the binary file, copy the unigrams file from lms/new_model_arpa to lms/new_model_bin_mixed.
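For context, a pyctcdecode beam-search decoder consumes both the KenLM binary and the unigram list; a sketch with assumed file and checkpoint names:

```python
from pyctcdecode import build_ctcdecoder
from transformers import Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("checkpoints/final")  # illustrative

# CTC labels in vocabulary-id order.
vocab = processor.tokenizer.get_vocab()
labels = [tok for tok, _ in sorted(vocab.items(), key=lambda kv: kv[1])]

# Unigrams copied next to the KenLM binary, as described above.
with open("lms/new_model_bin_mixed/unigrams.txt", encoding="utf-8") as f:
    unigrams = [line.strip() for line in f]

decoder = build_ctcdecoder(
    labels,
    kenlm_model_path="lms/new_model_bin_mixed/model.bin",  # assumed name
    unigrams=unigrams,
)
```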
Punctuation model checkpoint was taken from:
https://github.com/xashru/punctuation-restoration
The checkpoint can be found here:
https://drive.google.com/file/d/1X2udyT1XYrmCNvWtFpT_6jrWsQejGCBW/view?usp=sharing
The inference function was slightly modified to ensure that the last predicted punctuation mark is | or ?. For details, see the inference notebook linked below.
The inference notebook can be found here:
https://www.kaggle.com/code/benbla/5th-place-solution
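The modification amounts to a small post-processing step; a sketch of the idea, assuming "|" above denotes the Bengali danda "।":

```python
def fix_trailing_punct(text: str) -> str:
    """Force the prediction to end in a sentence-final mark ('।' or '?')."""
    text = text.rstrip()
    # Replace any other trailing punctuation with the danda.
    while text and text[-1] in ",;:!-":
        text = text[:-1].rstrip()
    if text and text[-1] not in ("।", "?"):
        text += "।"
    return text
```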
Training and validation loss/WER scores may differ in early epochs because seed_everything was not set in the original version; the differences become negligible after a few thousand training steps.
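For fully reproducible reruns, seed everything before training, e.g. with transformers.set_seed (equivalent in spirit to the seed_everything mentioned above):

```python
from transformers import set_seed

# Seeds the Python, NumPy and PyTorch RNGs in one call.
set_seed(42)
```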
In the original version, WandB was used to track the experiments; it is disabled in this repository via the environment variable below. Tracking can be reactivated by commenting out this line:
os.environ["WANDB_DISABLED"] = "true"