Add greedy search and modified_beam_search for non-streaming ASR (#95)
* WIP: Add greedy search for non-streaming ASR

* Use kaldi_native_io to read wave samples

* First working version of greedy search for offline ASR.

* Refactor the code

* Support decoding from wav.scp and feats.scp

* Update CI

* Add doc for offline ASR with sherpa's C++ frontend.

* remove sentencepiece

* Fix style issues

* support conda install

* build for windows

* Add FAQs for installation

* Add version info

* Fix building on macOS

* Add more doc

* Typo fixes
csukuangfj authored Aug 21, 2022
1 parent e5e2590 commit 8a0a0e7
Showing 96 changed files with 3,096 additions and 98 deletions.
1 change: 1 addition & 0 deletions .flake8
@@ -11,6 +11,7 @@ per-file-ignores =
exclude =
  .git,
  ./cmake,
  ./scripts,
  ./triton,
  ./sherpa/python/sherpa/__init__.py,
  ./sherpa/python/sherpa/decode.py,
39 changes: 39 additions & 0 deletions .github/scripts/generate_feats_scp.py
@@ -0,0 +1,39 @@
#!/usr/bin/env python3
"""
Usage:
./generate_feats_scp.py scp:wav.scp ark,scp:feats.ark,feats.scp
It generates `feats.ark` and `feats.scp` from `wav.scp`.
Unlike Kaldi's `compute-fbank-feats`, this script uses
normalized samples in the range [-1, 1] to compute features.
"""

import sys

import kaldi_native_io as kio
import kaldifeat
import torch


def main():
    rspecifier = sys.argv[1]
    wspecifier = sys.argv[2]

    opts = kaldifeat.FbankOptions()
    opts.frame_opts.dither = 0
    opts.mel_opts.num_bins = 80
    fbank = kaldifeat.Fbank(opts)

    with kio.SequentialWaveReader(rspecifier) as ki:
        with kio.FloatMatrixWriter(wspecifier) as ko:
            for key, value in ki:
                tensor = torch.from_numpy(value.data.numpy()).clone().squeeze(0)
                tensor = tensor / 32768
                features = fbank(tensor)
                ko.write(key, features.numpy())


if __name__ == "__main__":
    main()
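
Note on the normalization step: the division by 32768 in the script suggests the reader yields sample values in the 16-bit PCM range. The sketch below illustrates that scaling with the standard library only. `normalize_int16` is a hypothetical helper name used for illustration, not something in this commit, and the sample values are made up.

```python
from array import array


def normalize_int16(samples):
    """Scale 16-bit PCM sample values into floats in [-1, 1)."""
    # Dividing by 32768 (2**15) maps the int16 range [-32768, 32767]
    # onto [-1.0, 1.0), mirroring `tensor / 32768` in the script above.
    return [s / 32768 for s in samples]


# Illustrative samples covering both ends of the int16 range.
pcm = array("h", [-32768, -16384, 0, 16384, 32767])
normalized = normalize_int16(pcm)
print(normalized)
```

The largest positive sample maps to slightly less than 1.0 because the int16 range is asymmetric.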
6 changes: 6 additions & 0 deletions .github/scripts/generate_wav_scp.sh
@@ -0,0 +1,6 @@
#!/usr/bin/env bash
cat > wav.scp <<EOF
wav1 icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1089-134686-0001.wav
wav2 icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0001.wav
wav3 icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0002.wav
EOF
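
Note on the `wav.scp` layout: each line written by the heredoc is a key followed by a path, separated by whitespace. A minimal sketch of reading that layout in plain Python follows. `parse_scp` is a hypothetical helper for illustration only; the commit itself reads these files through `kaldi_native_io`. The sketch assumes every non-empty line has exactly a key and a path.

```python
def parse_scp(text):
    """Parse Kaldi-style scp text into a {key: path} dict."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first run of whitespace: key first, the rest is the path.
        key, path = line.split(maxsplit=1)
        entries[key] = path
    return entries


wav_scp = """\
wav1 icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1089-134686-0001.wav
wav2 icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0001.wav
wav3 icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0002.wav
"""

entries = parse_scp(wav_scp)
for key, path in entries.items():
    print(key, path)
```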
6 changes: 6 additions & 0 deletions .github/scripts/generate_wav_scp_aishell.sh
@@ -0,0 +1,6 @@
#!/usr/bin/env bash
cat > wav_aishell.scp <<EOF
wav1 icefall-aishell-pruned-transducer-stateless3-2022-06-20/test_wavs/BAC009S0764W0121.wav
wav2 icefall-aishell-pruned-transducer-stateless3-2022-06-20/test_wavs/BAC009S0764W0122.wav
wav3 icefall-aishell-pruned-transducer-stateless3-2022-06-20/test_wavs/BAC009S0764W0123.wav
EOF
133 changes: 133 additions & 0 deletions .github/workflows/build-conda-cpu.yaml
@@ -0,0 +1,133 @@
name: build_conda_cpu

on:
  push:
    tags:
      - '*'

jobs:
  generate_build_matrix:
    # see https://github.com/pytorch/pytorch/pull/50633
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.set-matrix.outputs.matrix }}
    steps:
      - uses: actions/checkout@v2
        with:
          fetch-depth: 0
      - name: Generating build matrix
        id: set-matrix
        run: |
          # outputting for debugging purposes
          python scripts/github_actions/generate_build_matrix.py
          MATRIX=$(python scripts/github_actions/generate_build_matrix.py)
          echo "::set-output name=matrix::${MATRIX}"
  build_conda_cpu:
    needs: generate_build_matrix
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        ${{ fromJson(needs.generate_build_matrix.outputs.matrix) }}

    steps:
      # refer to https://github.com/actions/checkout
      - uses: actions/checkout@v2
        with:
          fetch-depth: 0

      - uses: conda-incubator/setup-miniconda@v2
        with:
          auto-update-conda: true
          python-version: ${{ matrix.python-version }}
          activate-environment: sherpa

      - name: Install conda dependencies
        shell: bash -l {0}
        run: |
          conda install -y -q anaconda-client
          conda install -y -q conda-build
          conda install -y -q -c k2-fsa -c kaldifeat -c kaldi_native_io -c pytorch k2 kaldifeat kaldi_native_io pytorch=${{ matrix.torch }} cpuonly
      - name: Display MKL
        if: startsWith(matrix.os, 'macos') || startsWith(matrix.os, 'ubuntu')
        shell: bash -l {0}
        run: |
          ls -lh $CONDA_PREFIX/lib/libmkl*
      - name: Display Python version
        shell: bash -l {0}
        run: |
          python -c "import sys; print(sys.version)"
          which python
      - name: Display conda info
        shell: bash -l {0}
        run: |
          conda env list
          conda info
          which conda
          python --version
          which python
          python -m torch.utils.collect_env
      - name: Build sherpa
        if: startsWith(matrix.os, 'ubuntu') || startsWith(matrix.os, 'macos')
        shell: bash -l {0}
        env:
          SHERPA_PYTHON_VERSION: ${{ matrix.python-version}}
          SHERPA_TORCH_VERSION: ${{ matrix.torch }}
          SHERPA_CONDA_TOKEN: ${{ secrets.SHERPA_CONDA_TOKEN}}
        run: |
          ./scripts/build_conda_cpu.sh
      - name: Build sherpa
        if: startsWith(matrix.os, 'windows')
        shell: bash -l {0}
        env:
          SHERPA_PYTHON_VERSION: ${{ matrix.python-version}}
          SHERPA_TORCH_VERSION: ${{ matrix.torch }}
          SHERPA_CONDA_TOKEN: ${{ secrets.SHERPA_CONDA_TOKEN}}
        run: |
          # ./scripts/build_conda_cpu_windows.sh
          ./scripts/build_conda_cpu.sh
      - name: Display generated files
        if: startsWith(matrix.os, 'ubuntu')
        run: |
          ls -lh /usr/share/miniconda/envs/sherpa/conda-bld/linux-64
      - name: Upload generated files
        if: startsWith(matrix.os, 'ubuntu')
        uses: actions/upload-artifact@v2
        with:
          name: cpu-torch-${{ matrix.torch }}-python-${{ matrix.python-version }}-${{ matrix.os }}
          path: /usr/share/miniconda/envs/sherpa/conda-bld/linux-64/*.tar.bz2

      - name: Display generated files
        if: startsWith(matrix.os, 'windows')
        shell: bash -l {0}
        run: |
          ls -lh /c/Miniconda/envs/sherpa/conda-bld
          ls -lh /c/Miniconda/envs/sherpa/conda-bld/*/*
          ls -lh /c/Miniconda/envs/sherpa/conda-bld/win-64/*
      - name: Upload generated files
        if: startsWith(matrix.os, 'windows')
        uses: actions/upload-artifact@v2
        with:
          name: cpu-torch-${{ matrix.torch }}-python-${{ matrix.python-version }}-${{ matrix.os }}
          path: c:/Miniconda/envs/sherpa/conda-bld/win-64/*.tar.bz2

      - name: Display generated files
        if: startsWith(matrix.os, 'macos')
        run: |
          ls -lh /usr/local/miniconda/envs/sherpa/conda-bld/osx-64
      - name: Upload generated files
        if: startsWith(matrix.os, 'macos')
        uses: actions/upload-artifact@v2
        with:
          name: cpu-torch-${{ matrix.torch }}-python-${{ matrix.python-version }}-${{ matrix.os }}
          path: /usr/local/miniconda/envs/sherpa/conda-bld/osx-64/*.tar.bz2
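
Note on the build matrix: the workflow captures the output of `scripts/github_actions/generate_build_matrix.py` and passes it to `fromJson(...)`, then reads `matrix.os`, `matrix.python-version`, and `matrix.torch`. That script is not shown in this excerpt, so the following is only a sketch of what a generator of that shape could look like. The function name, the `include` layout, and every OS and version value below are assumptions for illustration.

```python
import json
from itertools import product


def generate_build_matrix(oses, python_versions, torch_versions):
    """Return a GitHub Actions matrix as a dict with an `include` list."""
    # One entry per (os, python, torch) combination, using the keys the
    # workflow reads: matrix.os, matrix.python-version, matrix.torch.
    include = [
        {"os": os_name, "python-version": py, "torch": torch}
        for os_name, py, torch in product(oses, python_versions, torch_versions)
    ]
    return {"include": include}


# Hypothetical values; the real script may use different ones.
matrix = generate_build_matrix(
    oses=["ubuntu-latest", "macos-latest"],
    python_versions=["3.8"],
    torch_versions=["1.12.1"],
)

# Print a single-line JSON string so a shell step can capture it
# with $(...) and hand it to fromJson.
print(json.dumps(matrix))
```

Printing a single line keeps the JSON safe to capture in a shell variable and echo back as a step output.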
2 changes: 1 addition & 1 deletion .github/workflows/build-doc.yml
@@ -56,7 +56,7 @@ jobs:
       - name: Install PyTorch ${{ matrix.torch }}
         shell: bash
         run: |
-          python3 -m pip install -qq --upgrade pip
+          python3 -m pip install -qq --upgrade pip kaldi_native_io
           python3 -m pip install -qq wheel twine typing_extensions websockets sentencepiece>=0.1.96 soundfile
           python3 -m pip install -qq torch==${{ matrix.torch }}+cpu numpy -f https://download.pytorch.org/whl/torch_stable.html