Hoping that future versions can support the CTranslate2 inference framework :) #2
Thank you for the suggestion! This is known internally as Task 257.
Hi, thanks! We now have a Whisper-Tiny model example on Hugging Face. Please try it out and give us your feedback. Thanks for the links; we'll check them out.
@pauldog Hi, thanks for the Whisper-Tiny model example on Hugging Face.
There are problems with the decoded results in the sample. When decoding the provided sample audio, [Music] appears. I then tried Chinese transcription by changing the Chinese special token ID, and some undecodable (garbled) results appeared. I suspect these issues are related to the text encoding handling in RunWhisper.cs, line 215.
I also tried fine-tuning the whisper-tiny model, as well as small, large-v2, etc. I then used Hugging Face's optimum-cli to export them as ONNX models and imported those into Unity Sentis. I tried the tiny model, replacing AudioDecoder_Tiny.sentis and AudioEncoder_Tiny.sentis in the sample.
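To make the "Chinese special token" step concrete: Whisper's decoder is prompted with a short sequence of special tokens that selects the language and the task. The Python sketch below assumes the multilingual (pre-v3) Whisper vocabulary used by whisper-tiny, where <|startoftranscript|> is 50258, language tokens start at 50259 (<|en|>) and follow Whisper's fixed language order, <|translate|> is 50358, <|transcribe|> is 50359 and <|notimestamps|> is 50363. It is not taken from RunWhisper.cs; the helper name `decoder_prompt` is hypothetical, and IDs may differ for other vocabularies such as large-v3.

```python
from typing import List

# Assumed special-token IDs for the multilingual (pre-v3) Whisper vocabulary.
SOT = 50258             # <|startoftranscript|>
FIRST_LANGUAGE = 50259  # <|en|>; the other language tokens follow in order
TASK_TOKENS = {"translate": 50358, "transcribe": 50359}
NO_TIMESTAMPS = 50363   # <|notimestamps|>

# First entries of Whisper's language order (truncated for brevity).
LANGUAGE_ORDER = ["en", "zh", "de", "es", "ru", "ko", "fr", "ja"]


def decoder_prompt(language: str, task: str = "transcribe",
                   timestamps: bool = False) -> List[int]:
    """Build the initial decoder token IDs for a language/task pair (hypothetical helper)."""
    lang_token = FIRST_LANGUAGE + LANGUAGE_ORDER.index(language)  # ValueError if unknown
    prompt = [SOT, lang_token, TASK_TOKENS[task]]                 # KeyError if bad task
    if not timestamps:
        prompt.append(NO_TIMESTAMPS)
    return prompt


if __name__ == "__main__":
    print(decoder_prompt("zh"))
```

Under these assumptions, switching from English to Chinese changes only the second token, from 50259 to 50260.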
Hi, when I download the .cs file and open it in WordPad, I get: For your second question, you can change the input names on lines 157-162 to the input names your ONNX file was exported with. So you would change these to "encoder_hidden_states" and "input_ids". Also note that some languages may work better than others; it depends on the training data that was used.
@pauldog I found that in Chinese, a single character may be split across two or three tokens. If the accumulated Whisper output tokens are decoded together, rather than one token at a time, the garbled characters disappear. Below are my modified code and a screenshot of it running.
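A minimal Python sketch of why decoding the accumulated tokens fixes the garbling. It assumes, as with Whisper's GPT-2-style byte-level BPE, that each token maps to raw bytes, so one Chinese character (three UTF-8 bytes) can be split across tokens. The token split below is made up for illustration. Decoding token by token produces replacement characters. Decoding the accumulated bytes works, and so does Python's incremental UTF-8 decoder, which holds back incomplete byte sequences between steps.

```python
import codecs
from typing import List

# Hypothetical byte-level tokens for "你好" (UTF-8: e4 bd a0 e5 a5 bd),
# split so that character boundaries fall inside tokens.
token_bytes = [b"\xe4\xbd", b"\xa0\xe5\xa5", b"\xbd"]


def decode_per_token(tokens: List[bytes]) -> str:
    # Decoding each token on its own breaks multi-byte characters.
    return "".join(t.decode("utf-8", errors="replace") for t in tokens)


def decode_accumulated(tokens: List[bytes]) -> str:
    # Decoding the whole token history at once keeps characters intact.
    return b"".join(tokens).decode("utf-8", errors="replace")


def decode_streaming(tokens: List[bytes]) -> List[str]:
    # The incremental decoder buffers incomplete sequences, so each step
    # emits only complete characters (possibly an empty string).
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    pieces = [decoder.decode(t) for t in tokens]
    pieces.append(decoder.decode(b"", final=True))  # flush at end of stream
    return pieces


if __name__ == "__main__":
    print(ascii(decode_per_token(token_bytes)))
    print(ascii(decode_accumulated(token_bytes)))
    print(ascii(decode_streaming(token_bytes)))
```

The streaming variant is useful when text is displayed while decoding is still running, because it avoids re-decoding the whole history at every step.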
@pauldog Here is what I'm sharing. In my tests it runs on Windows but not on Android. With Optimum I can only generate the encoder and decoder models.
@pauldog I will share this demo later. Here is the direction I want to take next. Based on my current results, recognition quality still needs to improve. Besides fine-tuning the Whisper model, quantization can also help because it greatly reduces the model size. Below are the sizes of my tiny model before and after quantization. I used huggingface/optimum to quantize the model (here).
However, when I imported the quantized model, Unity Sentis reported an error. The error message is below. It seems that some operators are not supported. Does Unity Sentis currently not support these operators, or is something wrong with the way I converted and quantized the model? If this can be solved, then in our experience a model at least the size of whisper-small (244M parameters) can deliver a good offline recognition experience. Here are some links that may be helpful:
Hi @YLQY. You converted them correctly. I can tell you that the Sentis team is working on quantization support. The first step in this support will be the next version 1.4.0 coming out in the next couple of weeks. |
Hello @pauldog, I am very happy to see that Unity Sentis 1.4 is out. I can't wait to try it, and I also found the updated code on Hugging Face. However, the quantized ONNX model I converted with the method above still cannot be loaded. The error message is below; I hope it helps.
Hi, you are right: some aspects of quantization, such as QuantizeLinear, are still not supported. Apologies. If you have an un-quantized model, you can quantize its weights. There is an example among the samples in the Package Manager.
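As a rough illustration of weight-only quantization, here is a minimal Python sketch of per-tensor asymmetric uint8 quantization, where each weight is approximated as `scale * (q - zero_point)`. This is a generic textbook scheme assumed for illustration, not the Sentis implementation, and the function names are hypothetical. Storing each weight in one byte instead of a 4-byte float32 is where the roughly 4x size reduction comes from.

```python
from array import array
from typing import List, Sequence, Tuple


def quantize_uint8(weights: Sequence[float]) -> Tuple[array, float, int]:
    """Per-tensor asymmetric uint8 quantization (illustrative only)."""
    w_min = min(min(weights), 0.0)  # include 0.0 so it is exactly representable
    w_max = max(max(weights), 0.0)
    scale = (w_max - w_min) / 255 or 1.0  # avoid a zero scale for all-zero tensors
    zero_point = round(-w_min / scale)
    q = array("B", (min(255, max(0, round(w / scale) + zero_point)) for w in weights))
    return q, scale, zero_point


def dequantize_uint8(q: array, scale: float, zero_point: int) -> List[float]:
    """Recover approximate float weights from the uint8 codes."""
    return [scale * (v - zero_point) for v in q]


if __name__ == "__main__":
    weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
    q, scale, zp = quantize_uint8(weights)
    restored = dequantize_uint8(q, scale, zp)
    print("float32 bytes:", len(weights) * 4, "uint8 bytes:", len(q) * q.itemsize)
    print("max abs error:", max(abs(a - b) for a, b in zip(weights, restored)))
```

The reconstruction error per weight is on the order of `scale`, which is why accuracy usually drops only slightly while storage shrinks about fourfold.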
Hello, I am very happy to finally see the Unity Sentis demo; it was a pleasant surprise. I am a speech recognition algorithm engineer, and a problem we often run into is model inference speed, that is, real-time performance. Large models such as ChatGPT and whisper-large-v2 are very popular nowadays, and I want to use models like these in Unity Sentis. We have tested the inference speed of Whisper's ONNX model on Linux, and unfortunately it is still too slow to be usable.
We see two approaches. The first is to reduce the size of the model, for example by using the whisper-small or whisper-base models, but this still costs some accuracy. The second is a faster inference engine: we were very happy to find an inference library named CTranslate2 (https://github.com/OpenNMT/CTranslate2). We only need to export the model to the format CTranslate2 requires, and with CTranslate2's acceleration the real-time performance of whisper-large-v2 improves significantly. The real-time factor (RTF) drops to 0.03 on an A30 GPU, and CPU inference is also significantly faster. Moreover, CTranslate2 supports most of today's mainstream Transformer models, so it would be great if future versions of Unity Sentis could draw on or integrate CTranslate2. :)
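For readers unfamiliar with the metric: the real-time factor (RTF) is processing time divided by audio duration, so values below 1 mean faster than real time. Below is a minimal Python sketch of the arithmetic. The function names and the 60-second and 1-hour figures are hypothetical; only the 0.03 value comes from the measurement above.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent decoding / duration of the audio decoded."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def processing_time(audio_seconds: float, rtf: float) -> float:
    """Expected decoding time for a clip at a given RTF."""
    return audio_seconds * rtf


if __name__ == "__main__":
    # Hypothetical: 60 s of audio transcribed in 1.8 s gives an RTF of about 0.03.
    print(real_time_factor(1.8, 60.0))
    # At RTF 0.03, one hour of audio takes roughly 108 s to transcribe.
    print(processing_time(3600.0, 0.03))
```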
I hope Unity Sentis keeps getting better and better :)
Here are some links you may want to use
https://github.com/OpenNMT/CTranslate2
https://opennmt.net/CTranslate2
https://github.com/guillaumekln/faster-whisper/tree/master