Code used for the paper "Resource-Efficient Speech Quality Prediction through Quantization Aware Training and Binary Activation Maps".
- Install conda env:
conda env create -f env.yaml
- Activate environment:
conda activate sqp_experiments
- Install current project:
pip install -e .
You can generate a training dataset with the following steps:
1. Audio data: run the download and single-process data generation scripts from the Interspeech 2020 DNS Challenge; in noisyspeech_synthesizer.cfg, set total_hours: 50 to match the paper and adjust the destination paths as needed
2. EGS files: generate the JSON file lists (EGS) using the denoiser/audio.py script from the DEMUCS Denoiser repo (see that repo for instructions)
3. Labels: compute speech quality labels using the script included here, e.g.:
python compute_labels.py /path/to/egs_dir /path/to/output
(the default settings match the paper; run with --help for more info)
4. Repeat step 3 for the test set labels
5. Edit config/dataset/*.yaml with the correct paths for audio data and labels
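Before computing labels, it can help to sanity-check the generated EGS files. The sketch below is not part of this repo. It assumes each EGS file is a JSON list of [path, num_samples] pairs (the layout written by denoiser/audio.py, to the best of our knowledge) and that the audio is sampled at 16 kHz. The helper names egs_total_hours and missing_files are hypothetical.

```python
import json
from pathlib import Path

SAMPLE_RATE = 16_000  # assumption: DNS Challenge audio sampled at 16 kHz


def egs_total_hours(egs_path, sample_rate=SAMPLE_RATE):
    """Return (number of files, total duration in hours) for one EGS JSON file."""
    entries = json.loads(Path(egs_path).read_text())
    # Each entry is assumed to be a [path, num_samples] pair.
    total_samples = sum(length for _path, length in entries)
    return len(entries), total_samples / sample_rate / 3600


def missing_files(egs_path):
    """List audio paths referenced in the EGS file that do not exist on disk."""
    entries = json.loads(Path(egs_path).read_text())
    return [path for path, _length in entries if not Path(path).is_file()]
```

Calling egs_total_hours on the training file list should report roughly the total_hours value set in noisyspeech_synthesizer.cfg, and missing_files should return an empty list once the paths are correct.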
Train baseline model with default hyperparameters:
python train.py
Train BAM model with β = 5 for 50 epochs:
python train.py model=dnsmos_binary model.conv_activation_param=5 epochs=50
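The key=value arguments above resemble Hydra-style overrides, where a bare group name (model=...) selects a whole sub-config and a dotted key (model.conv_activation_param=...) sets one field inside it. The sketch below illustrates that composition idea in plain Python, assuming this is how train.py reads its arguments. It is not the repo's code: GROUPS, compose, parse_value, and all default values are hypothetical placeholders.

```python
import copy

# Hypothetical config groups; the real options live in the repo's config/ directory.
GROUPS = {
    "model": {
        "baseline": {"name": "baseline"},
        "dnsmos_binary": {"name": "dnsmos_binary", "conv_activation_param": 1},
    }
}


def parse_value(raw):
    """Interpret a raw override string as int, then float, else keep it as str."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def compose(defaults, overrides, groups=GROUPS):
    """Apply 'key=value' overrides to a copy of the default config."""
    config = copy.deepcopy(defaults)
    for item in overrides:
        key, _, raw = item.partition("=")
        if key in groups:
            # Group selection: swap in a whole sub-config by name.
            config[key] = copy.deepcopy(groups[key][raw])
            continue
        # Dotted key: walk down the nested dicts and set the leaf value.
        *parents, leaf = key.split(".")
        node = config
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw)
    return config


config = compose(
    {"model": {"name": "baseline"}, "epochs": 100},  # placeholder defaults
    ["model=dnsmos_binary", "model.conv_activation_param=5", "epochs=50"],
)
```

Order matters in this sketch: the group selection has to come before any dotted override on the same group, otherwise the swapped-in sub-config would overwrite the field.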
For a list of possible arguments and configurations, run:
python train.py --help
Perform post-training quantization on trained models and comparatively evaluate on validation or test data:
python quantize.py valid
python quantize.py test
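For readers new to post-training quantization, the sketch below shows the basic idea on a list of floats: map the observed value range onto 8-bit integers with a scale and zero point, then map back and measure the error. It is a generic illustration of affine int8 quantization, not the scheme implemented in quantize.py (which is not described here). The function names are hypothetical.

```python
def quantize_int8(values):
    """Affine (asymmetric) quantization of a list of floats to int8."""
    lo, hi = min(values), max(values)
    lo, hi = min(lo, 0.0), max(hi, 0.0)  # keep 0.0 exactly representable
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 for all-zero input
    zero_point = round(-128 - lo / scale)
    # Round to the nearest integer level and clamp to the int8 range.
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Map int8 values back to approximate floats."""
    return [(x - zero_point) * scale for x in q]


weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
q, scale, zero_point = quantize_int8(weights)
restored = dequantize(q, scale, zero_point)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Here max_error stays within one quantization step (scale). Comparing a model's outputs before and after such a round trip is the kind of evaluation the commands above perform on validation or test data.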