- Upgraded PortAudio
- Modified the MFC example to enable loopback recording (through the WASAPI host API)
- Fixed the MFC example project settings. The MFC example can now switch between Debug and Release builds without rebuilding the Sherpa-Onnx libraries (see Fixed issues)
```
git clone https://github.com/luke-lin-vmc/sherpa-onnx
cd sherpa-onnx
mkdir build
cd build

cmake -DCMAKE_BUILD_TYPE=Release ..
cmake --build . --config Release

cmake -DCMAKE_BUILD_TYPE=Debug ..
cmake --build . --config Debug
```
```
cd ..\mfc-examples
msbuild .\mfc-examples.sln /property:Configuration=Release /property:Platform=x64
```
(Alternatively, open mfc-examples.sln in Visual C++ and build the Release configuration.)
```
cd ..\mfc-examples
msbuild .\mfc-examples.sln /property:Configuration=Debug /property:Platform=x64
```
(Alternatively, open mfc-examples.sln in Visual C++ and build the Debug configuration.)

The resulting StreamingSpeechRecognition.exe files will be under .\x64\Release or .\x64\Debug.
- Download and decompress the models and tokenizer from https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20.tar.bz2
- Rename the model files and tokenizer to decoder.onnx, encoder.onnx, joiner.onnx, and tokens.txt, and put them alongside StreamingSpeechRecognition.exe. An example layout is shown below.
```
D:\ASR\Sherpa-onnx\zipformer>tree /F
Folder PATH listing for volume OSDisk
Volume serial number is C2C8-D7B9
C:.
    decoder.onnx
    encoder.onnx
    joiner.onnx
    StreamingSpeechRecognition.exe
    tokens.txt
```
Note: You may change the model to support different languages. All zipformer models can be found at https://github.com/k2-fsa/sherpa-onnx/releases/tag/asr-models. This app is a streaming (aka on-line or real-time) pipeline, so only streaming models (whose names contain "streaming", such as sherpa-onnx-streaming-zipformer-en-2023-02-21.tar.bz2) can be used.
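The download-and-rename steps above can be sketched as a script. The file names inside the archive are an assumption (sherpa-onnx releases typically ship files such as encoder-epoch-99-avg-1.onnx); check the actual names after extraction. The download/extract lines are shown as comments, and empty placeholder files stand in for the real models so only the rename step is exercised. POSIX-style commands are used; on Windows, use ren/copy or run the script under Git Bash.

```shell
# For real use, first download and extract the archive:
#   curl -LO https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20.tar.bz2
#   tar xjf sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20.tar.bz2

# Placeholder files so this sketch runs without the download; the
# original names are assumed from typical sherpa-onnx releases.
mkdir -p zipformer
touch zipformer/encoder-epoch-99-avg-1.onnx \
      zipformer/decoder-epoch-99-avg-1.onnx \
      zipformer/joiner-epoch-99-avg-1.onnx \
      zipformer/tokens.txt

# Rename to the fixed names the MFC example expects.
mv zipformer/encoder-epoch-99-avg-1.onnx zipformer/encoder.onnx
mv zipformer/decoder-epoch-99-avg-1.onnx zipformer/decoder.onnx
mv zipformer/joiner-epoch-99-avg-1.onnx  zipformer/joiner.onnx

ls zipformer
```

Copy StreamingSpeechRecognition.exe into the same folder afterwards, as shown in the tree listing above.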
- You may use paraformer models instead of zipformer models
- Download and decompress the models and tokenizer from https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-paraformer-bilingual-zh-en.tar.bz2
- Rename the model files and tokenizer to paraformer-decoder.onnx, paraformer-encoder.onnx, and tokens.txt, and put them alongside StreamingSpeechRecognition.exe. An example layout is shown below.
```
D:\ASR\Sherpa-onnx\paraformer>tree /F
Folder PATH listing for volume OSDisk
Volume serial number is C2C8-D7B9
C:.
    paraformer-decoder.onnx
    paraformer-encoder.onnx
    StreamingSpeechRecognition.exe
    tokens.txt
```
Note: You may change the model to support different languages. All paraformer models can be found at https://github.com/k2-fsa/sherpa-onnx/releases/tag/asr-models. This app is a streaming (aka on-line or real-time) pipeline, so only streaming models (whose names contain "streaming", such as sherpa-onnx-streaming-paraformer-trilingual-zh-cantonese-en.tar.bz2) can be used.
Currently, the models output Simplified Chinese only. To get Traditional Chinese output, you may convert "tokens.txt" to Traditional Chinese. Several methods can do the conversion; one of them is MS Word: https://answers.microsoft.com/en-us/msoffice/forum/all/in-ms-365-word-how-to-convert-traditional-chinese/f46cdf06-f404-429a-86cb-74f4d8bfb114
```
.\x64\Release\StreamingSpeechRecognition.exe
```
- With the original repo, you might get a link error while building the Debug MFC example. This is because the Debug MFC example needs to link against the debug "onnxruntime.lib"; however, the two CMake scripts below put both the debug and release "onnxruntime.lib" into the same location. As a result, "Build Release Libs and Examples" overwrites the debug "onnxruntime.lib" with the release one, and the link error occurs when the Debug MFC example links against the release "onnxruntime.lib":
- https://github.com/k2-fsa/sherpa-onnx/blob/master/cmake/onnxruntime-win-x64-static-debug.cmake#L67
- https://github.com/k2-fsa/sherpa-onnx/blob/master/cmake/onnxruntime-win-x86-static.cmake#L63
- Solution
  - Modify the CMake scripts to copy the debug "onnxruntime.lib" to lib/Debug and the release "onnxruntime.lib" to lib/Release.
  - Make the MFC example link a different "onnxruntime.lib" for each configuration (Debug/Release).
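The fix can be sketched as follows. The lib/Debug and lib/Release folder names match the solution above; the build-debug and build-release paths and the .lib files are placeholders standing in for the libraries produced by the two CMake builds.

```shell
# Keep the Debug and Release onnxruntime.lib in separate per-configuration
# folders so one build cannot overwrite the other's library.
# Placeholder files stand in for the real libraries in this sketch.
mkdir -p build-debug build-release lib/Debug lib/Release
touch build-debug/onnxruntime.lib build-release/onnxruntime.lib

cp build-debug/onnxruntime.lib   lib/Debug/onnxruntime.lib
cp build-release/onnxruntime.lib lib/Release/onnxruntime.lib

# The MFC project can then reference something like
#   $(SolutionDir)lib\$(Configuration)\onnxruntime.lib
# (illustrative MSBuild-style path) to pick the right library per configuration.
ls lib/Debug lib/Release
```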
- PortAudio returns "Invalid device" if using the "Speaker (Synaptics Audio) [Loopback]" device
- https://k2-fsa.github.io/sherpa/onnx/install/windows.html#bit-windows-x64
- https://github.com/k2-fsa/sherpa-onnx/tree/v1.11.2/mfc-examples
| Speech recognition | Speech synthesis |
|---|---|
| ✔️ | ✔️ |
| Speaker identification | Speaker diarization | Speaker verification |
|---|---|---|
| ✔️ | ✔️ | ✔️ |
| Spoken Language identification | Audio tagging | Voice activity detection |
|---|---|---|
| ✔️ | ✔️ | ✔️ |
| Keyword spotting | Add punctuation | Speech enhancement |
|---|---|---|
| ✔️ | ✔️ | ✔️ |
| Architecture | Android | iOS | Windows | macOS | linux | HarmonyOS |
|---|---|---|---|---|---|---|
| x64 | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | |
| x86 | ✔️ | ✔️ | ||||
| arm64 | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| arm32 | ✔️ | ✔️ | ✔️ | |||
| riscv64 | ✔️ |
| 1. C++ | 2. C | 3. Python | 4. JavaScript |
|---|---|---|---|
| ✔️ | ✔️ | ✔️ | ✔️ |
| 5. Java | 6. C# | 7. Kotlin | 8. Swift |
|---|---|---|---|
| ✔️ | ✔️ | ✔️ | ✔️ |
| 9. Go | 10. Dart | 11. Rust | 12. Pascal |
|---|---|---|---|
| ✔️ | ✔️ | ✔️ | ✔️ |
For Rust support, please see sherpa-rs
It also supports WebAssembly.
This repository supports running the following functions locally
- Speech-to-text (i.e., ASR); both streaming and non-streaming are supported
- Text-to-speech (i.e., TTS)
- Speaker diarization
- Speaker identification
- Speaker verification
- Spoken language identification
- Audio tagging
- VAD (e.g., silero-vad)
- Keyword spotting
on the following platforms and operating systems:
- x86, x86_64, 32-bit ARM, 64-bit ARM (arm64, aarch64), RISC-V (riscv64), RK NPU
- Linux, macOS, Windows, openKylin
- Android, WearOS
- iOS
- HarmonyOS
- NodeJS
- WebAssembly
- NVIDIA Jetson Orin NX (supports running on both CPU and GPU)
- NVIDIA Jetson Nano B01 (supports running on both CPU and GPU)
- Raspberry Pi
- RV1126
- LicheePi4A
- VisionFive 2
- 旭日X3派 (Horizon Sunrise X3 Pi)
- 爱芯派 (AXera-Pi)
- etc.
with the following APIs
- C++, C, Python, Go, C#
- Java, Kotlin, JavaScript
- Swift, Rust
- Dart, Object Pascal
You can visit the following Huggingface spaces to try sherpa-onnx without installing anything. All you need is a browser.
| Description | URL |
|---|---|
| Speaker diarization | Click me |
| Speech recognition | Click me |
| Speech recognition with Whisper | Click me |
| Speech synthesis | Click me |
| Generate subtitles | Click me |
| Audio tagging | Click me |
| Spoken language identification with Whisper | Click me |
We also have spaces built using WebAssembly. They are listed below:
| Description | Huggingface space | ModelScope space |
|---|---|---|
| Voice activity detection with silero-vad | Click me | Address |
| Real-time speech recognition (Chinese + English) with Zipformer | Click me | Address |
| Real-time speech recognition (Chinese + English) with Paraformer | Click me | Address |
| Real-time speech recognition (Chinese + English + Cantonese) with Paraformer-large | Click me | Address |
| Real-time speech recognition (English) | Click me | Address |
| VAD + speech recognition (Chinese + English + Korean + Japanese + Cantonese) with SenseVoice | Click me | Address |
| VAD + speech recognition (English) with Whisper tiny.en | Click me | Address |
| VAD + speech recognition (English) with Moonshine tiny | Click me | Address |
| VAD + speech recognition (English) with Zipformer trained with GigaSpeech | Click me | Address |
| VAD + speech recognition (Chinese) with Zipformer trained with WenetSpeech | Click me | Address |
| VAD + speech recognition (Japanese) with Zipformer trained with ReazonSpeech | Click me | Address |
| VAD + speech recognition (Thai) with Zipformer trained with GigaSpeech2 | Click me | Address |
| VAD + speech recognition (Chinese, various dialects) with a TeleSpeech-ASR CTC model | Click me | Address |
| VAD + speech recognition (English + Chinese, incl. various Chinese dialects) with Paraformer-large | Click me | Address |
| VAD + speech recognition (English + Chinese, incl. various Chinese dialects) with Paraformer-small | Click me | Address |
| VAD + speech recognition (multilingual, incl. various Chinese dialects) with Dolphin-base | Click me | Address |
| Speech synthesis (English) | Click me | Address |
| Speech synthesis (German) | Click me | Address |
| Speaker diarization | Click me | Address |
You can find pre-built Android APKs for this repository in the following table
| Description | URL | For users in China |
|---|---|---|
| Speaker diarization | Address | Click here |
| Streaming speech recognition | Address | Click here |
| Text-to-speech | Address | Click here |
| Voice activity detection (VAD) | Address | Click here |
| VAD + non-streaming speech recognition | Address | Click here |
| Two-pass speech recognition | Address | Click here |
| Audio tagging | Address | Click here |
| Audio tagging (WearOS) | Address | Click here |
| Speaker identification | Address | Click here |
| Spoken language identification | Address | Click here |
| Keyword spotting | Address | Click here |
| Description | URL | For users in China |
|---|---|---|
| Streaming speech recognition | Address | Click here |
| Description | URL | For users in China |
|---|---|---|
| Android (arm64-v8a, armeabi-v7a, x86_64) | Address | Click here |
| Linux (x64) | Address | Click here |
| macOS (x64) | Address | Click here |
| macOS (arm64) | Address | Click here |
| Windows (x64) | Address | Click here |
Note: You need to build from source for iOS.
| Description | URL |
|---|---|
| Speech recognition (speech to text, ASR) | Address |
| Text-to-speech (TTS) | Address |
| VAD | Address |
| Keyword spotting | Address |
| Audio tagging | Address |
| Speaker identification (Speaker ID) | Address |
| Spoken language identification (Language ID) | See multi-lingual Whisper ASR models from Speech recognition |
| Punctuation | Address |
| Speaker segmentation | Address |
| Speech enhancement | Address |
Please see
- https://k2-fsa.github.io/sherpa/onnx/pretrained_models/online-transducer/index.html
- https://k2-fsa.github.io/sherpa/onnx/pretrained_models/online-paraformer/index.html
- https://k2-fsa.github.io/sherpa/onnx/pretrained_models/online-ctc/index.html
for more models. The following table lists only some of them.
| Name | Supported Languages | Description |
|---|---|---|
| sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20 | Chinese, English | See also |
| sherpa-onnx-streaming-zipformer-small-bilingual-zh-en-2023-02-16 | Chinese, English | See also |
| sherpa-onnx-streaming-zipformer-zh-14M-2023-02-23 | Chinese | Suitable for Cortex A7 CPU. See also |
| sherpa-onnx-streaming-zipformer-en-20M-2023-02-17 | English | Suitable for Cortex A7 CPU. See also |
| sherpa-onnx-streaming-zipformer-korean-2024-06-16 | Korean | See also |
| sherpa-onnx-streaming-zipformer-fr-2023-04-14 | French | See also |
Please see
- https://k2-fsa.github.io/sherpa/onnx/pretrained_models/offline-transducer/index.html
- https://k2-fsa.github.io/sherpa/onnx/pretrained_models/offline-paraformer/index.html
- https://k2-fsa.github.io/sherpa/onnx/pretrained_models/offline-ctc/index.html
- https://k2-fsa.github.io/sherpa/onnx/pretrained_models/telespeech/index.html
- https://k2-fsa.github.io/sherpa/onnx/pretrained_models/whisper/index.html
for more models. The following table lists only some of them.
| Name | Supported Languages | Description |
|---|---|---|
| Whisper tiny.en | English | See also |
| Moonshine tiny | English | See also |
| sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17 | Chinese, Cantonese, English, Korean, Japanese | Supports various Chinese dialects. See also |
| sherpa-onnx-paraformer-zh-2024-03-09 | Chinese, English | Also supports various Chinese dialects. See also |
| sherpa-onnx-zipformer-ja-reazonspeech-2024-08-01 | Japanese | See also |
| sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24 | Russian | See also |
| sherpa-onnx-nemo-ctc-giga-am-russian-2024-10-24 | Russian | See also |
| sherpa-onnx-zipformer-ru-2024-09-18 | Russian | See also |
| sherpa-onnx-zipformer-korean-2024-06-24 | Korean | See also |
| sherpa-onnx-zipformer-thai-2024-06-20 | Thai | See also |
| sherpa-onnx-telespeech-ctc-int8-zh-2024-06-04 | Chinese | Supports various dialects. See also |
- Documentation: https://k2-fsa.github.io/sherpa/onnx/
- Bilibili demo videos: https://search.bilibili.com/all?keyword=%E6%96%B0%E4%B8%80%E4%BB%A3Kaldi
Please see https://k2-fsa.github.io/sherpa/social-groups.html for the next-gen Kaldi WeChat and QQ groups.
Talk to any LLM with hands-free voice interaction, voice interruption, and a Live2D talking face, running locally across platforms
See also Open-LLM-VTuber/Open-LLM-VTuber#50
Streaming ASR and TTS based on FastAPI
It shows how to use the ASR and TTS Python APIs with FastAPI.
Uses streaming ASR in C# with graphical user interface.
Video demo in Chinese: [Open source] Windows real-time subtitle software (a must-have for online classes and meetings)
It uses the JavaScript API of sherpa-onnx along with Electron
Video demo in Chinese: a League of Legends typing tool that lets you communicate with everyone in the game without barriers
A server based on nodejs providing Restful API for speech recognition.
A modular, fully offline-capable, low-footprint chatbot/smart speaker
It uses QT. Both ASR and TTS are used.
It extends ./flutter-examples/streaming_asr by downloading models inside the app to reduce the size of the app.
Note: [Team B] Sherpa AI backend also uses sherpa-onnx in a Flutter APP.
sherpa-onnx in Unity. See also #1695, #1892, and #1859
Backend service for xiaozhi-esp32; helps you quickly build an ESP32 device control server.
See also
Pure Python, GUI-focused home automation/consumer grade SCADA.
It uses TTS from sherpa-onnx. See also ✨ Speak command that uses the new globally configured TTS model.
Enable custom wake words for XiaoAi speakers.
Video demo in Chinese: "XiaoAi, start up~ ˶╹ꇴ╹˶!"
It is a YouTube video showing how the author used AI to have a conversation with Paimon.
