- Upgraded PortAudio
- Modified the MFC example to enable loopback recording (through the WASAPI host API)
- Fixed the MFC example project settings. The MFC example can now switch between Debug and Release builds without rebuilding the Sherpa-Onnx libraries (see Fixed issues)
```
git clone https://github.com/luke-lin-vmc/sherpa-onnx
cd sherpa-onnx
mkdir build
cd build

cmake -DCMAKE_BUILD_TYPE=Release ..
cmake --build . --config Release

cmake -DCMAKE_BUILD_TYPE=Debug ..
cmake --build . --config Debug
```
```
cd ..\mfc-examples
msbuild .\mfc-examples.sln /property:Configuration=Release /property:Platform=x64
```
(Alternatively, open mfc-examples.sln in Visual C++ and build the Release configuration.)
```
cd ..\mfc-examples
msbuild .\mfc-examples.sln /property:Configuration=Debug /property:Platform=x64
```
(Alternatively, open mfc-examples.sln in Visual C++ and build the Debug configuration.)

The resulting StreamingSpeechRecognition.exe files will be under .\x64\Release or .\x64\Debug.
- Download and decompress the models and tokenizer from https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20.tar.bz2
- Rename the model files and tokenizer to decoder.onnx, encoder.onnx, joiner.onnx, and tokens.txt, and put them alongside StreamingSpeechRecognition.exe. An example layout is shown below.
```
D:\ASR\Sherpa-onnx\zipformer>tree /F
Folder PATH listing for volume OSDisk
Volume serial number is C2C8-D7B9
C:.
    decoder.onnx
    encoder.onnx
    joiner.onnx
    StreamingSpeechRecognition.exe
    tokens.txt
```
Note: You may change the model to support different languages. All zipformer models can be found at https://github.com/k2-fsa/sherpa-onnx/releases/tag/asr-models. This app is a streaming (aka on-line or real-time) pipeline, so only streaming models (whose names contain "streaming", such as sherpa-onnx-streaming-zipformer-en-2023-02-21.tar.bz2) can be used.
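The download-and-rename steps above can be sketched as a script. The file names inside the archive are an assumption (sherpa-onnx releases typically ship files such as encoder-epoch-99-avg-1.onnx); check the actual names after extraction. The download/extract lines are shown as comments, and empty placeholder files stand in for the real models so only the rename step is exercised. POSIX-style commands are used; on Windows, use ren/copy or run the script under Git Bash.

```shell
# For real use, first download and extract the archive:
#   curl -LO https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20.tar.bz2
#   tar xjf sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20.tar.bz2

# Placeholder files so this sketch runs without the download; the
# original names are assumed from typical sherpa-onnx releases.
mkdir -p zipformer
touch zipformer/encoder-epoch-99-avg-1.onnx \
      zipformer/decoder-epoch-99-avg-1.onnx \
      zipformer/joiner-epoch-99-avg-1.onnx \
      zipformer/tokens.txt

# Rename to the fixed names the MFC example expects.
mv zipformer/encoder-epoch-99-avg-1.onnx zipformer/encoder.onnx
mv zipformer/decoder-epoch-99-avg-1.onnx zipformer/decoder.onnx
mv zipformer/joiner-epoch-99-avg-1.onnx  zipformer/joiner.onnx

ls zipformer
```

Copy StreamingSpeechRecognition.exe into the same folder afterwards, as shown in the tree listing above.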
- You may use paraformer models instead of zipformer models
- Download and decompress the models and tokenizer from https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-paraformer-bilingual-zh-en.tar.bz2
- Rename the model files and tokenizer to paraformer-decoder.onnx, paraformer-encoder.onnx, and tokens.txt, and put them alongside StreamingSpeechRecognition.exe. An example layout is shown below.
```
D:\ASR\Sherpa-onnx\paraformer>tree /F
Folder PATH listing for volume OSDisk
Volume serial number is C2C8-D7B9
C:.
    paraformer-decoder.onnx
    paraformer-encoder.onnx
    StreamingSpeechRecognition.exe
    tokens.txt
```
Note: You may change the model to support different languages. All paraformer models can be found at https://github.com/k2-fsa/sherpa-onnx/releases/tag/asr-models. This app is a streaming (aka on-line or real-time) pipeline, so only streaming models (whose names contain "streaming", such as sherpa-onnx-streaming-paraformer-trilingual-zh-cantonese-en.tar.bz2) can be used.
Currently, the models output Simplified Chinese only. To get Traditional Chinese output, you may convert "tokens.txt" to Traditional Chinese. Several methods can do the conversion; one of them is MS Word: https://answers.microsoft.com/en-us/msoffice/forum/all/in-ms-365-word-how-to-convert-traditional-chinese/f46cdf06-f404-429a-86cb-74f4d8bfb114
```
.\x64\Release\StreamingSpeechRecognition.exe
```
- With the original repo, you might get a link error while building the Debug MFC example. This is because the Debug MFC example needs to link against the debug "onnxruntime.lib"; however, the two CMake scripts below put both the debug and release "onnxruntime.lib" into the same location. As a result, "Build Release Libs and Examples" overwrites the debug "onnxruntime.lib" with the release one, and the link error occurs when the Debug MFC example links against the release "onnxruntime.lib":
- https://github.com/k2-fsa/sherpa-onnx/blob/master/cmake/onnxruntime-win-x64-static-debug.cmake#L67
- https://github.com/k2-fsa/sherpa-onnx/blob/master/cmake/onnxruntime-win-x86-static.cmake#L63
- Solution
  - Modify the CMake scripts to copy the debug "onnxruntime.lib" to lib/Debug and the release "onnxruntime.lib" to lib/Release.
  - Make the MFC example link a different "onnxruntime.lib" for each configuration (Debug/Release).
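The fix can be sketched as follows. The lib/Debug and lib/Release folder names match the solution above; the build-debug and build-release paths and the .lib files are placeholders standing in for the libraries produced by the two CMake builds.

```shell
# Keep the Debug and Release onnxruntime.lib in separate per-configuration
# folders so one build cannot overwrite the other's library.
# Placeholder files stand in for the real libraries in this sketch.
mkdir -p build-debug build-release lib/Debug lib/Release
touch build-debug/onnxruntime.lib build-release/onnxruntime.lib

cp build-debug/onnxruntime.lib   lib/Debug/onnxruntime.lib
cp build-release/onnxruntime.lib lib/Release/onnxruntime.lib

# The MFC project can then reference something like
#   $(SolutionDir)lib\$(Configuration)\onnxruntime.lib
# (illustrative MSBuild-style path) to pick the right library per configuration.
ls lib/Debug lib/Release
```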
- PortAudio returns "Invalid device" if using the "Speaker (Synaptics Audio) [Loopback]" device
- https://k2-fsa.github.io/sherpa/onnx/install/windows.html#bit-windows-x64
- https://github.com/k2-fsa/sherpa-onnx/tree/v1.11.2/mfc-examples
| Speech recognition | Speech synthesis |
|---|---|
| ✔️ | ✔️ |
| Speaker identification | Speaker diarization | Speaker verification |
|---|---|---|
| ✔️ | ✔️ | ✔️ |
| Spoken Language identification | Audio tagging | Voice activity detection |
|---|---|---|
| ✔️ | ✔️ | ✔️ |
| Keyword spotting | Add punctuation | Speech enhancement |
|---|---|---|
| ✔️ | ✔️ | ✔️ |
| Architecture | Android | iOS | Windows | macOS | linux | HarmonyOS |
|---|---|---|---|---|---|---|
| x64 | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | |
| x86 | ✔️ | ✔️ | ||||
| arm64 | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| arm32 | ✔️ | ✔️ | ✔️ | |||
| riscv64 | ✔️ |
| 1. C++ | 2. C | 3. Python | 4. JavaScript |
|---|---|---|---|
| ✔️ | ✔️ | ✔️ | ✔️ |
| 5. Java | 6. C# | 7. Kotlin | 8. Swift |
|---|---|---|---|
| ✔️ | ✔️ | ✔️ | ✔️ |
| 9. Go | 10. Dart | 11. Rust | 12. Pascal |
|---|---|---|---|
| ✔️ | ✔️ | ✔️ | ✔️ |
For Rust support, please see sherpa-rs
It also supports WebAssembly.
This repository supports running the following functions locally
- Speech-to-text (i.e., ASR); both streaming and non-streaming are supported
- Text-to-speech (i.e., TTS)
- Speaker diarization
- Speaker identification
- Speaker verification
- Spoken language identification
- Audio tagging
- VAD (e.g., silero-vad)
- Keyword spotting
on the following platforms and operating systems:
- x86, x86_64, 32-bit ARM, 64-bit ARM (arm64, aarch64), RISC-V (riscv64), RK NPU
- Linux, macOS, Windows, openKylin
- Android, WearOS
- iOS
- HarmonyOS
- NodeJS
- WebAssembly
- NVIDIA Jetson Orin NX (supports running on both CPU and GPU)
- NVIDIA Jetson Nano B01 (supports running on both CPU and GPU)
- Raspberry Pi
- RV1126
- LicheePi4A
- VisionFive 2
- 旭日X3派 (Horizon Sunrise X3 Pi)
- 爱芯派 (AXera-Pi)
- etc.
with the following APIs
- C++, C, Python, Go, C#
- Java, Kotlin, JavaScript
- Swift, Rust
- Dart, Object Pascal
You can visit the following Huggingface spaces to try sherpa-onnx without installing anything. All you need is a browser.
| Description | URL |
|---|---|
| Speaker diarization | Click me |
| Speech recognition | Click me |
| Speech recognition with Whisper | Click me |
| Speech synthesis | Click me |
| Generate subtitles | Click me |
| Audio tagging | Click me |
| Spoken language identification with Whisper | Click me |
We also have spaces built using WebAssembly. They are listed below:
| Description | Huggingface space | ModelScope space |
|---|---|---|
| Voice activity detection with silero-vad | Click me | Address |
| Real-time speech recognition (Chinese + English) with Zipformer | Click me | Address |
| Real-time speech recognition (Chinese + English) with Paraformer | Click me | Address |
| Real-time speech recognition (Chinese + English + Cantonese) with Paraformer-large | Click me | Address |
| Real-time speech recognition (English) | Click me | Address |
| VAD + speech recognition (Chinese + English + Korean + Japanese + Cantonese) with SenseVoice | Click me | Address |
| VAD + speech recognition (English) with Whisper tiny.en | Click me | Address |
| VAD + speech recognition (English) with Moonshine tiny | Click me | Address |
| VAD + speech recognition (English) with Zipformer trained with GigaSpeech | Click me | Address |
| VAD + speech recognition (Chinese) with Zipformer trained with WenetSpeech | Click me | Address |
| VAD + speech recognition (Japanese) with Zipformer trained with ReazonSpeech | Click me | Address |
| VAD + speech recognition (Thai) with Zipformer trained with GigaSpeech2 | Click me | Address |
| VAD + speech recognition (Chinese, various dialects) with a TeleSpeech-ASR CTC model | Click me | Address |
| VAD + speech recognition (English + Chinese, incl. various Chinese dialects) with Paraformer-large | Click me | Address |
| VAD + speech recognition (English + Chinese, incl. various Chinese dialects) with Paraformer-small | Click me | Address |
| VAD + speech recognition (multilingual, incl. various Chinese dialects) with Dolphin-base | Click me | Address |
| Speech synthesis (English) | Click me | Address |
| Speech synthesis (German) | Click me | Address |
| Speaker diarization | Click me | Address |
You can find pre-built Android APKs for this repository in the following table
| Description | URL | For users in China |
|---|---|---|
| Speaker diarization | Address | Click here |
| Streaming speech recognition | Address | Click here |
| Text-to-speech | Address | Click here |
| Voice activity detection (VAD) | Address | Click here |
| VAD + non-streaming speech recognition | Address | Click here |
| Two-pass speech recognition | Address | Click here |
| Audio tagging | Address | Click here |
| Audio tagging (WearOS) | Address | Click here |
| Speaker identification | Address | Click here |
| Spoken language identification | Address | Click here |
| Keyword spotting | Address | Click here |
| Description | URL | For users in China |
|---|---|---|
| Streaming speech recognition | Address | Click here |
| Description | URL | For users in China |
|---|---|---|
| Android (arm64-v8a, armeabi-v7a, x86_64) | Address | Click here |
| Linux (x64) | Address | Click here |
| macOS (x64) | Address | Click here |
| macOS (arm64) | Address | Click here |
| Windows (x64) | Address | Click here |
Note: You need to build from source for iOS.
| Description | URL |
|---|---|
| Speech recognition (speech to text, ASR) | Address |
| Text-to-speech (TTS) | Address |
| VAD | Address |
| Keyword spotting | Address |
| Audio tagging | Address |
| Speaker identification (Speaker ID) | Address |
| Spoken language identification (Language ID) | See multi-lingual Whisper ASR models from Speech recognition |
| Punctuation | Address |
| Speaker segmentation | Address |
| Speech enhancement | Address |
Please see
- https://k2-fsa.github.io/sherpa/onnx/pretrained_models/online-transducer/index.html
- https://k2-fsa.github.io/sherpa/onnx/pretrained_models/online-paraformer/index.html
- https://k2-fsa.github.io/sherpa/onnx/pretrained_models/online-ctc/index.html
for more models. The following table lists only some of them.
| Name | Supported Languages | Description |
|---|---|---|
| sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20 | Chinese, English | See also |
| sherpa-onnx-streaming-zipformer-small-bilingual-zh-en-2023-02-16 | Chinese, English | See also |
| sherpa-onnx-streaming-zipformer-zh-14M-2023-02-23 | Chinese | Suitable for Cortex A7 CPU. See also |
| sherpa-onnx-streaming-zipformer-en-20M-2023-02-17 | English | Suitable for Cortex A7 CPU. See also |
| sherpa-onnx-streaming-zipformer-korean-2024-06-16 | Korean | See also |
| sherpa-onnx-streaming-zipformer-fr-2023-04-14 | French | See also |
Please see
- https://k2-fsa.github.io/sherpa/onnx/pretrained_models/offline-transducer/index.html
- https://k2-fsa.github.io/sherpa/onnx/pretrained_models/offline-paraformer/index.html
- https://k2-fsa.github.io/sherpa/onnx/pretrained_models/offline-ctc/index.html
- https://k2-fsa.github.io/sherpa/onnx/pretrained_models/telespeech/index.html
- https://k2-fsa.github.io/sherpa/onnx/pretrained_models/whisper/index.html
for more models. The following table lists only some of them.
| Name | Supported Languages | Description |
|---|---|---|
| Whisper tiny.en | English | See also |
| Moonshine tiny | English | See also |
| sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17 | Chinese, Cantonese, English, Korean, Japanese | Supports various Chinese dialects. See also |
| sherpa-onnx-paraformer-zh-2024-03-09 | Chinese, English | Also supports various Chinese dialects. See also |
| sherpa-onnx-zipformer-ja-reazonspeech-2024-08-01 | Japanese | See also |
| sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24 | Russian | See also |
| sherpa-onnx-nemo-ctc-giga-am-russian-2024-10-24 | Russian | See also |
| sherpa-onnx-zipformer-ru-2024-09-18 | Russian | See also |
| sherpa-onnx-zipformer-korean-2024-06-24 | Korean | See also |
| sherpa-onnx-zipformer-thai-2024-06-20 | Thai | See also |
| sherpa-onnx-telespeech-ctc-int8-zh-2024-06-04 | Chinese | Supports various dialects. See also |
- Documentation: https://k2-fsa.github.io/sherpa/onnx/
- Bilibili demo videos: https://search.bilibili.com/all?keyword=%E6%96%B0%E4%B8%80%E4%BB%A3Kaldi
Please see https://k2-fsa.github.io/sherpa/social-groups.html for the next-gen Kaldi WeChat and QQ groups.
Talk to any LLM with hands-free voice interaction, voice interruption, and a Live2D talking face, running locally across platforms
See also Open-LLM-VTuber/Open-LLM-VTuber#50
Streaming ASR and TTS based on FastAPI
It shows how to use the ASR and TTS Python APIs with FastAPI.
Uses streaming ASR in C# with graphical user interface.
Video demo in Chinese: [Open source] Windows real-time subtitle software (a must-have for online classes and meetings)
It uses the JavaScript API of sherpa-onnx along with Electron
Video demo in Chinese: a League of Legends typing tool that lets you communicate with everyone in the game without barriers
A server based on nodejs providing Restful API for speech recognition.
A modular, fully offline-capable, low-footprint chatbot/smart speaker
It uses QT. Both ASR and TTS are used.
It extends ./flutter-examples/streaming_asr by downloading models inside the app to reduce the size of the app.
Note: [Team B] Sherpa AI backend also uses sherpa-onnx in a Flutter APP.
sherpa-onnx in Unity. See also #1695, #1892, and #1859
Backend service for xiaozhi-esp32; helps you quickly build an ESP32 device control server.
See also
Pure Python, GUI-focused home automation/consumer grade SCADA.
It uses TTS from sherpa-onnx. See also ✨ Speak command that uses the new globally configured TTS model.
Enable custom wake words for XiaoAi speakers.
Video demo in Chinese: "XiaoAi, start up~ ˶╹ꇴ╹˶!"
It is a YouTube video showing how the author used AI to have a conversation with Paimon.
