Support whisper large/large-v1/large-v2/large-v3 and distil-large-v2 #1114

csukuangfj · 2024-07-12T13:35:16Z

See also #464

The exported ONNX models are available at:

model	URL
whisper large	https://huggingface.co/csukuangfj/sherpa-onnx-whisper-large
whisper large-v1	https://huggingface.co/csukuangfj/sherpa-onnx-whisper-large-v1
whisper large-v2	https://huggingface.co/csukuangfj/sherpa-onnx-whisper-large-v2
whisper large-v3	https://huggingface.co/csukuangfj/sherpa-onnx-whisper-large-v3
distil large-v2	https://huggingface.co/csukuangfj/sherpa-onnx-whisper-distil-large-v2

A Colab notebook at
https://github.com/k2-fsa/colab/blob/master/sherpa-onnx/sherpa_onnx_whisper_large_v3.ipynb
shows how to use the exported whisper large v3 model with sherpa-onnx.
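
For readers who want to script the same kind of run outside the notebook, here is a minimal sketch that assembles a `sherpa-onnx-offline` command line. The flags (`--whisper-encoder`, `--whisper-decoder`, `--tokens`, `--provider`, `--num-threads`) and the `large-v3-*` file names are taken from the log posted in this thread. The helper name `build_offline_command`, the `model_dir` layout, and the assumption that the binary is on `PATH` are illustrative only and not part of sherpa-onnx.

```python
import shlex


def build_offline_command(model_dir, name, wav, provider="cuda",
                          num_threads=2, binary="sherpa-onnx-offline"):
    """Return the argument list for one offline whisper decoding run.

    model_dir and name are hypothetical conveniences: the files are assumed
    to be named <name>-encoder.onnx, <name>-decoder.onnx and
    <name>-tokens.txt inside model_dir, as in the log in this thread.
    """
    return [
        binary,
        f"--whisper-encoder={model_dir}/{name}-encoder.onnx",
        f"--whisper-decoder={model_dir}/{name}-decoder.onnx",
        f"--tokens={model_dir}/{name}-tokens.txt",
        f"--provider={provider}",
        f"--num-threads={num_threads}",
        wav,
    ]


cmd = build_offline_command(".", "large-v3", "./test_wavs/0.wav")
# Print a copy-pasteable shell command; pass `cmd` to subprocess.run to execute it.
print(shlex.join(cmd))
```

Building the arguments as a list rather than a single string avoids shell-quoting problems when paths contain spaces.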

csukuangfj · 2024-07-12T15:45:13Z

Here is the RTF for running whisper large v3 on an NVIDIA GPU in Google Colab, which provides a Tesla T4.

Fri Jul 12 15:44:09 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   75C    P0              31W /  70W |    105MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+

/content/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:375 /content/sherpa-onnx/build/bin/sherpa-onnx-offline --whisper-encoder=./large-v3-encoder.onnx --whisper-decoder=./large-v3-decoder.onnx --tokens=./large-v3-tokens.txt --provider=cuda --num-threads=2 ./test_wavs/0.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="./large-v3-encoder.onnx", decoder="./large-v3-decoder.onnx", language="", task="transcribe", tail_paddings=-1), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), telespeech_ctc="", tokens="./large-v3-tokens.txt", num_threads=2, debug=False, provider="cuda", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="")
Creating recognizer ...
Started
Done!

./test_wavs/0.wav
{"text": " after early nightfall the yellow lamps would light up here and there the squalid quarter of the brothels", "timestamps": [], "tokens":[" after", " early", " night", "fall", " the", " yellow", " lamps", " would", " light", " up", " here", " and", " there", " the", " squ", "alid", " quarter", " of", " the", " broth", "els"], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 5.868 s
Real time factor (RTF): 5.868 / 6.625 = 0.886

real	0m35.402s
user	0m13.369s
sys	0m5.118s
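
The real time factor in the log is the decoding time divided by the audio duration. A minimal sketch of that arithmetic, using the two numbers reported above (the function name `real_time_factor` is only illustrative):

```python
def real_time_factor(elapsed_seconds, audio_seconds):
    """RTF = processing time / audio duration; below 1 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return elapsed_seconds / audio_seconds


# Values copied from the log: 5.868 s of decoding for 6.625 s of audio.
rtf = real_time_factor(5.868, 6.625)
print(f"RTF: {rtf:.3f}")  # prints "RTF: 0.886"
```

Note that the `real` time of 35.402 s reported by `time` covers the whole process, not just decoding, so it is much larger than the 5.868 s used for the RTF. The gap is presumably dominated by model loading and CUDA initialization, though the log does not break it down.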

csukuangfj · 2024-07-12T16:10:26Z

By the way, the RTF is less than 1 (0.886 in the run above) when whisper large v3 runs on a Tesla T4 GPU.

csukuangfj added 15 commits July 12, 2024 15:50

support export whisper large-v3

fdb0f1f

fix feat dim for large-v3

caf2b2e

fix a typo

82d6595

fix export

24856cb

fix test for large-v3 to use 128 feat_dim

ccb9929

remove torchaudio dep

44a8ed2

fix test

66ccadf

fix test

c36ba03

fix export

3222363

minor fixes

9b8150c

minor fixes

153eb87

update whisper version

87477f1

support whisper large v3

5e2ff50

minor fixes

22c99ee

release v1.10.14

e19234c

csukuangfj merged commit 117cd7b into k2-fsa:master Jul 12, 2024
146 of 187 checks passed

csukuangfj deleted the onnx-whisper-large-v3 branch July 12, 2024 15:47

csukuangfj mentioned this pull request Jul 12, 2024

Is there a WeChat or QQ group for discussion? I don't fully understand this code yet. Also, I need to convert large-v3 to ONNX; is there a way to do that? #1112

Closed
