Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WebAssembly exmaple for speaker diarization #1411

Merged
merged 2 commits into from
Oct 10, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .github/workflows/wasm-simd-hf-space-de-tts.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@ jobs:
- name: Install emsdk
uses: mymindstorm/setup-emsdk@v14
with:
version: 3.1.51
version: 3.1.53
actions-cache-folder: 'emsdk-cache'

- name: View emsdk version
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/wasm-simd-hf-space-en-asr-zipformer.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@ jobs:
- name: Install emsdk
uses: mymindstorm/setup-emsdk@v14
with:
version: 3.1.51
version: 3.1.53
actions-cache-folder: 'emsdk-cache'

- name: View emsdk version
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/wasm-simd-hf-space-en-tts.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@ jobs:
- name: Install emsdk
uses: mymindstorm/setup-emsdk@v14
with:
version: 3.1.51
version: 3.1.53
actions-cache-folder: 'emsdk-cache'

- name: View emsdk version
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/wasm-simd-hf-space-silero-vad.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@ jobs:
- name: Install emsdk
uses: mymindstorm/setup-emsdk@v14
with:
version: 3.1.51
version: 3.1.53
actions-cache-folder: 'emsdk-cache'

- name: View emsdk version
Expand Down
167 changes: 167 additions & 0 deletions .github/workflows/wasm-simd-hf-space-speaker-diarization.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,167 @@
name: wasm-simd-hf-space-speaker-diarization

on:
push:
branches:
- wasm
- wasm-speaker-diarization
tags:
- 'v[0-9]+.[0-9]+.[0-9]+*'

workflow_dispatch:

concurrency:
group: wasm-simd-hf-space-speaker-diarization-${{ github.ref }}
cancel-in-progress: true

jobs:
wasm-simd-hf-space-speaker-diarization:
runs-on: ${{ matrix.os }}
strategy:
fail-fast: false
matrix:
os: [ubuntu-latest]

steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0

- name: Install emsdk
uses: mymindstorm/setup-emsdk@v14
with:
version: 3.1.53
actions-cache-folder: 'emsdk-cache'

- name: View emsdk version
shell: bash
run: |
emcc -v
echo "--------------------"
emcc --check

- name: Download model files
shell: bash
run: |
cd wasm/speaker-diarization/assets/
ls -lh
echo "----------"

curl -SL -O https://github.com/k2-fsa/sherpa-onnx/releases/download/speaker-segmentation-models/sherpa-onnx-pyannote-segmentation-3-0.tar.bz2
tar xvf sherpa-onnx-pyannote-segmentation-3-0.tar.bz2
rm sherpa-onnx-pyannote-segmentation-3-0.tar.bz2
mv sherpa-onnx-pyannote-segmentation-3-0/model.onnx ./segmentation.onnx
rm -rf sherpa-onnx-pyannote-segmentation-3-0

curl -SL -O https://github.com/k2-fsa/sherpa-onnx/releases/download/speaker-recongition-models/3dspeaker_speech_eres2net_base_sv_zh-cn_3dspeaker_16k.onnx
mv 3dspeaker_speech_eres2net_base_sv_zh-cn_3dspeaker_16k.onnx ./embedding.onnx

echo "----------"

ls -lh

- name: Build sherpa-onnx for WebAssembly
shell: bash
run: |
./build-wasm-simd-speaker-diarization.sh

- name: collect files
shell: bash
run: |
SHERPA_ONNX_VERSION=v$(grep "SHERPA_ONNX_VERSION" ./CMakeLists.txt | cut -d " " -f 2 | cut -d '"' -f 2)

dst=sherpa-onnx-wasm-simd-${SHERPA_ONNX_VERSION}-speaker-diarization
mv build-wasm-simd-speaker-diarization/install/bin/wasm/speaker-diarization $dst
ls -lh $dst
tar cjfv $dst.tar.bz2 ./$dst

- name: Upload wasm files
uses: actions/upload-artifact@v4
with:
name: sherpa-onnx-wasm-simd-speaker-diarization
path: ./sherpa-onnx-wasm-simd-*.tar.bz2

- name: Release
if: (github.repository_owner == 'csukuangfj' || github.repository_owner == 'k2-fsa') && github.event_name == 'push' && contains(github.ref, 'refs/tags/')
uses: svenstaro/upload-release-action@v2
with:
file_glob: true
overwrite: true
file: ./*.tar.bz2

- name: Publish to ModelScope
# if: false
env:
MS_TOKEN: ${{ secrets.MODEL_SCOPE_GIT_TOKEN }}
uses: nick-fields/retry@v2
with:
max_attempts: 20
timeout_seconds: 200
shell: bash
command: |
SHERPA_ONNX_VERSION=v$(grep "SHERPA_ONNX_VERSION" ./CMakeLists.txt | cut -d " " -f 2 | cut -d '"' -f 2)

git config --global user.email "[email protected]"
git config --global user.name "Fangjun Kuang"

rm -rf ms
export GIT_LFS_SKIP_SMUDGE=1
export GIT_CLONE_PROTECTION_ACTIVE=false

git clone https://www.modelscope.cn/studios/csukuangfj/web-assembly-speaker-diarization-sherpa-onnx.git ms
cd ms
rm -fv *.js
rm -fv *.data
git fetch
git pull
git merge -m "merge remote" --ff origin main

cp -v ../sherpa-onnx-wasm-simd-${SHERPA_ONNX_VERSION}-*/* .

git status
git lfs track "*.data"
git lfs track "*.wasm"
ls -lh

git add .
git commit -m "update model"
git push https://oauth2:${MS_TOKEN}@www.modelscope.cn/studios/csukuangfj/web-assembly-speaker-diarization-sherpa-onnx.git

- name: Publish to huggingface
env:
HF_TOKEN: ${{ secrets.HF_TOKEN }}
uses: nick-fields/retry@v2
with:
max_attempts: 20
timeout_seconds: 200
shell: bash
command: |
SHERPA_ONNX_VERSION=v$(grep "SHERPA_ONNX_VERSION" ./CMakeLists.txt | cut -d " " -f 2 | cut -d '"' -f 2)

git config --global user.email "[email protected]"
git config --global user.name "Fangjun Kuang"

rm -rf huggingface
export GIT_LFS_SKIP_SMUDGE=1
export GIT_CLONE_PROTECTION_ACTIVE=false

git clone https://csukuangfj:[email protected]/spaces/k2-fsa/web-assembly-speaker-diarization-sherpa-onnx huggingface
ls -lh

cd huggingface
rm -fv *.js
rm -fv *.data
git fetch
git pull
git merge -m "merge remote" --ff origin main

cp -v ../sherpa-onnx-wasm-simd-${SHERPA_ONNX_VERSION}-*/* .

git status
git lfs track "*.data"
git lfs track "*.wasm"
ls -lh

git add .
git commit -m "update model"
git push https://csukuangfj:[email protected]/spaces/k2-fsa/web-assembly-speaker-diarization-sherpa-onnx main
2 changes: 1 addition & 1 deletion .github/workflows/wasm-simd-hf-space-vad-asr.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,7 @@ jobs:
- name: Install emsdk
uses: mymindstorm/setup-emsdk@v14
with:
version: 3.1.51
version: 3.1.53
actions-cache-folder: 'emsdk-cache'

- name: View emsdk version
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@ jobs:
- name: Install emsdk
uses: mymindstorm/setup-emsdk@v14
with:
version: 3.1.51
version: 3.1.53
actions-cache-folder: 'emsdk-cache'

- name: View emsdk version
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@ jobs:
- name: Install emsdk
uses: mymindstorm/setup-emsdk@v14
with:
version: 3.1.51
version: 3.1.53
actions-cache-folder: 'emsdk-cache'

- name: View emsdk version
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@ jobs:
- name: Install emsdk
uses: mymindstorm/setup-emsdk@v14
with:
version: 3.1.51
version: 3.1.53
actions-cache-folder: 'emsdk-cache'

- name: View emsdk version
Expand Down
14 changes: 13 additions & 1 deletion CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@ option(SHERPA_ONNX_ENABLE_WEBSOCKET "Whether to build webscoket server/client" O
option(SHERPA_ONNX_ENABLE_GPU "Enable ONNX Runtime GPU support" OFF)
option(SHERPA_ONNX_ENABLE_DIRECTML "Enable ONNX Runtime DirectML support" OFF)
option(SHERPA_ONNX_ENABLE_WASM "Whether to enable WASM" OFF)
option(SHERPA_ONNX_ENABLE_WASM_SPEAKER_DIARIZATION "Whether to enable WASM for speaker diarization" OFF)
option(SHERPA_ONNX_ENABLE_WASM_TTS "Whether to enable WASM for TTS" OFF)
option(SHERPA_ONNX_ENABLE_WASM_ASR "Whether to enable WASM for ASR" OFF)
option(SHERPA_ONNX_ENABLE_WASM_KWS "Whether to enable WASM for KWS" OFF)
Expand Down Expand Up @@ -135,6 +136,7 @@ message(STATUS "SHERPA_ONNX_ENABLE_C_API ${SHERPA_ONNX_ENABLE_C_API}")
message(STATUS "SHERPA_ONNX_ENABLE_WEBSOCKET ${SHERPA_ONNX_ENABLE_WEBSOCKET}")
message(STATUS "SHERPA_ONNX_ENABLE_GPU ${SHERPA_ONNX_ENABLE_GPU}")
message(STATUS "SHERPA_ONNX_ENABLE_WASM ${SHERPA_ONNX_ENABLE_WASM}")
message(STATUS "SHERPA_ONNX_ENABLE_WASM_SPEAKER_DIARIZATION ${SHERPA_ONNX_ENABLE_WASM_SPEAKER_DIARIZATION}")
message(STATUS "SHERPA_ONNX_ENABLE_WASM_TTS ${SHERPA_ONNX_ENABLE_WASM_TTS}")
message(STATUS "SHERPA_ONNX_ENABLE_WASM_ASR ${SHERPA_ONNX_ENABLE_WASM_ASR}")
message(STATUS "SHERPA_ONNX_ENABLE_WASM_KWS ${SHERPA_ONNX_ENABLE_WASM_KWS}")
Expand Down Expand Up @@ -196,9 +198,19 @@ else()
add_definitions(-DSHERPA_ONNX_ENABLE_DIRECTML=0)
endif()

if(SHERPA_ONNX_ENABLE_WASM_SPEAKER_DIARIZATION)
if(NOT SHERPA_ONNX_ENABLE_SPEAKER_DIARIZATION)
message(FATAL_ERROR "Please set SHERPA_ONNX_ENABLE_SPEAKER_DIARIZATION to ON if you want to build WASM for speaker diarization")
endif()

if(NOT SHERPA_ONNX_ENABLE_WASM)
message(FATAL_ERROR "Please set SHERPA_ONNX_ENABLE_WASM to ON if you enable WASM for speaker diarization")
endif()
endif()

if(SHERPA_ONNX_ENABLE_WASM_TTS)
if(NOT SHERPA_ONNX_ENABLE_TTS)
message(FATAL_ERROR "Please set SHERPA_ONNX_ENABLE_TTS to ON if you want to build wasm TTS")
message(FATAL_ERROR "Please set SHERPA_ONNX_ENABLE_TTS to ON if you want to build WASM for TTS")
endif()

if(NOT SHERPA_ONNX_ENABLE_WASM)
Expand Down
5 changes: 5 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -116,6 +116,7 @@ We also have spaces built using WebAssembly. They are listed below:
|VAD + speech recognition (English + Chinese, 及多种中文方言) with Paraformer-small |[Click me][wasm-hf-vad-asr-zh-en-paraformer-small]| [地址][wasm-ms-vad-asr-zh-en-paraformer-small]|
|Speech synthesis (English) |[Click me][wasm-hf-tts-piper-en]| [地址][wasm-ms-tts-piper-en]|
|Speech synthesis (German) |[Click me][wasm-hf-tts-piper-de]| [地址][wasm-ms-tts-piper-de]|
|Speaker diarization |[Click me][wasm-hf-speaker-diarization]|[地址][wasm-ms-speaker-diarization]|

### Links for pre-built Android APKs

Expand Down Expand Up @@ -173,6 +174,7 @@ We also have spaces built using WebAssembly. They are listed below:
| Speaker identification (Speaker ID) | [Address][sid-models] |
| Spoken language identification (Language ID)| See multi-lingual [Whisper][Whisper] ASR models from [Speech recognition][asr-models]|
| Punctuation | [Address][punct-models] |
| Speaker segmentation | [Address][speaker-segmentation-models] |

### Useful links

Expand Down Expand Up @@ -261,6 +263,8 @@ Video demo in Chinese: [爆了!炫神教你开打字挂!真正影响胜率
[wasm-ms-tts-piper-en]: https://modelscope.cn/studios/k2-fsa/web-assembly-tts-sherpa-onnx-en
[wasm-hf-tts-piper-de]: https://huggingface.co/spaces/k2-fsa/web-assembly-tts-sherpa-onnx-de
[wasm-ms-tts-piper-de]: https://modelscope.cn/studios/k2-fsa/web-assembly-tts-sherpa-onnx-de
[wasm-hf-speaker-diarization]: https://huggingface.co/spaces/k2-fsa/web-assembly-speaker-diarization-sherpa-onnx
[wasm-ms-speaker-diarization]: https://www.modelscope.cn/studios/csukuangfj/web-assembly-speaker-diarization-sherpa-onnx
[apk-streaming-asr]: https://k2-fsa.github.io/sherpa/onnx/android/apk.html
[apk-streaming-asr-cn]: https://k2-fsa.github.io/sherpa/onnx/android/apk-cn.html
[apk-tts]: https://k2-fsa.github.io/sherpa/onnx/tts/apk-engine.html
Expand Down Expand Up @@ -303,5 +307,6 @@ Video demo in Chinese: [爆了!炫神教你开打字挂!真正影响胜率
[sid-models]: https://github.com/k2-fsa/sherpa-onnx/releases/tag/speaker-recongition-models
[slid-models]: https://github.com/k2-fsa/sherpa-onnx/releases/tag/speaker-recongition-models
[punct-models]: https://github.com/k2-fsa/sherpa-onnx/releases/tag/punctuation-models
[speaker-segmentation-models]: https://github.com/k2-fsa/sherpa-onnx/releases/tag/speaker-segmentation-models
[GigaSpeech]: https://github.com/SpeechColab/GigaSpeech
[WenetSpeech]: https://github.com/wenet-e2e/WenetSpeech
4 changes: 2 additions & 2 deletions build-wasm-simd-asr.sh
Original file line number Diff line number Diff line change
Expand Up @@ -14,8 +14,8 @@ if [ x"$EMSCRIPTEN" == x"" ]; then
echo "git clone https://github.com/emscripten-core/emsdk.git"
echo "cd emsdk"
echo "git pull"
echo "./emsdk install latest"
echo "./emsdk activate latest"
echo "./emsdk install 3.1.53"
echo "./emsdk activate 3.1.53"
echo "source ./emsdk_env.sh"
exit 1
else
Expand Down
4 changes: 2 additions & 2 deletions build-wasm-simd-kws.sh
Original file line number Diff line number Diff line change
Expand Up @@ -9,8 +9,8 @@ if [ x"$EMSCRIPTEN" == x"" ]; then
echo "git clone https://github.com/emscripten-core/emsdk.git"
echo "cd emsdk"
echo "git pull"
echo "./emsdk install latest"
echo "./emsdk activate latest"
echo "./emsdk install 3.1.53"
echo "./emsdk activate 3.1.53"
echo "source ./emsdk_env.sh"
exit 1
else
Expand Down
4 changes: 2 additions & 2 deletions build-wasm-simd-nodejs.sh
Original file line number Diff line number Diff line change
Expand Up @@ -16,8 +16,8 @@ if [ x"$EMSCRIPTEN" == x"" ]; then
echo "git clone https://github.com/emscripten-core/emsdk.git"
echo "cd emsdk"
echo "git pull"
echo "./emsdk install latest"
echo "./emsdk activate latest"
echo "./emsdk install 3.1.53"
echo "./emsdk activate 3.1.53"
echo "source ./emsdk_env.sh"
exit 1
else
Expand Down
Loading
Loading