This project demonstrates the use of WhisperTiny models within the Unity Inference Engine for speech-to-text conversion.
Whisper is a model trained on labelled data for automatic speech recognition (ASR) and speech translation. It was proposed in the paper *Robust Speech Recognition via Large-Scale Weak Supervision* by Alec Radford et al. from OpenAI.
- Speech Input: Performs local speech-to-text using neural inference.
- Multilingual Support: Supports English, German, and French voice input.
- Unity: 6000.1.11f1
- Inference Engine: 2.3.0
You can download the WhisperTiny models from the Unity repository on Hugging Face.
| Asset | Path in Hugging Face repository |
|---|---|
| decoder_model | models/decoder_model.onnx |
| decoder_with_past_model | models/decoder_with_past_model.onnx |
| encoder_model | models/encoder_model.onnx |
| logmel_spectrogram | models/logmel_spectrogram.onnx |
| Vocab JSON | data/vocab.json |
- Clone or download this repository.
- Download the WhisperTiny ONNX models and the vocab.json file from the Unity Hugging Face repository and place them in the `/Assets/Data` directory of your project.
- Add the model assets to the RunWhisper component of the MicrophoneManager GameObject.
- Open the `/Assets/Scenes/Runtime AI Sample Scene.unity` scene in the Unity Editor.
- Run the scene to test the speech-to-text conversion.
Try it yourself:
When the record button is pressed, the microphone activates and audio is captured into an AudioClip. Pressing the button again will stop the recording.
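Whisper models operate on 16 kHz mono audio in 30-second windows, so a recorded clip is typically downmixed, resampled, and then padded or trimmed to 480,000 samples before preprocessing. The Python sketch below illustrates that preparation step only; the project itself does this in C# inside Unity. It assumes the samples are already floats in [-1, 1], uses a naive linear-interpolation resampler for illustration, and all function names are hypothetical.

```python
from typing import List, Sequence

SAMPLE_RATE = 16_000                      # sample rate Whisper expects (Hz)
CHUNK_SECONDS = 30                        # Whisper processes 30-second windows
N_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS   # 480,000 samples


def downmix_to_mono(interleaved: Sequence[float], channels: int) -> List[float]:
    """Average interleaved multi-channel samples into a single channel."""
    frames = len(interleaved) // channels
    return [
        sum(interleaved[i * channels + c] for c in range(channels)) / channels
        for i in range(frames)
    ]


def resample_linear(samples: Sequence[float], src_rate: int,
                    dst_rate: int = SAMPLE_RATE) -> List[float]:
    """Naive linear-interpolation resampler (illustrative; no anti-aliasing filter)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    out_len = int(len(samples) * dst_rate / src_rate)
    ratio = src_rate / dst_rate
    out = []
    for i in range(out_len):
        pos = i * ratio
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


def pad_or_trim(samples: Sequence[float], length: int = N_SAMPLES) -> List[float]:
    """Zero-pad or truncate to exactly `length` samples."""
    if len(samples) >= length:
        return list(samples[:length])
    return list(samples) + [0.0] * (length - len(samples))


if __name__ == "__main__":
    # Two seconds of stereo silence recorded at 44.1 kHz (hypothetical input).
    stereo = [0.0] * (44_100 * 2 * 2)
    mono = downmix_to_mono(stereo, channels=2)
    clip = pad_or_trim(resample_linear(mono, src_rate=44_100))
    print(len(mono), len(clip))  # 88200 480000
```

Padding to a fixed length keeps the input shape of the preprocessing model constant regardless of how long the user actually spoke.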
The recorded audio clip is then used for speech recognition. The workflow (simplified) is:
Step 1: Audio preprocessing converts the audio into a time-frequency representation (a log-Mel spectrogram).
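The log-Mel step can be illustrated with the parameters used by the reference Whisper implementation: a 400-sample (25 ms) FFT window, a 160-sample (10 ms) hop, and 80 Mel bands, which turns a 30-second clip into 3,000 frames. This pure-Python sketch shows only the Mel-scale mapping (the Slaney-style scale, assumed here to match the filterbank Whisper was built with) and the final log compression and normalization; the STFT itself is omitted, and in this project the whole step is presumably handled by logmel_spectrogram.onnx. The helper names are hypothetical.

```python
import math
from typing import List

SAMPLE_RATE = 16_000
N_FFT = 400        # 25 ms analysis window
HOP_LENGTH = 160   # 10 ms hop between frames
N_MELS = 80        # Mel bands used by Whisper tiny
N_SAMPLES = 30 * SAMPLE_RATE
N_FRAMES = N_SAMPLES // HOP_LENGTH  # 3000 frames for a 30 s window

# Slaney-style Mel scale: linear below 1 kHz, logarithmic above.
_F_SP = 200.0 / 3.0
_MIN_LOG_HZ = 1000.0
_MIN_LOG_MEL = _MIN_LOG_HZ / _F_SP   # 15.0
_LOG_STEP = math.log(6.4) / 27.0


def hz_to_mel(hz: float) -> float:
    if hz < _MIN_LOG_HZ:
        return hz / _F_SP
    return _MIN_LOG_MEL + math.log(hz / _MIN_LOG_HZ) / _LOG_STEP


def mel_to_hz(mel: float) -> float:
    if mel < _MIN_LOG_MEL:
        return mel * _F_SP
    return _MIN_LOG_HZ * math.exp(_LOG_STEP * (mel - _MIN_LOG_MEL))


def mel_band_edges(n_mels: int = N_MELS, fmax: float = SAMPLE_RATE / 2) -> List[float]:
    """n_mels + 2 frequencies (Hz) evenly spaced on the Mel scale from 0 to fmax;
    each consecutive triple defines one triangular filter of the filterbank."""
    top = hz_to_mel(fmax)
    return [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]


def normalize_log_mel(mel_energies: List[List[float]]) -> List[List[float]]:
    """Log compression and scaling as in the reference Whisper code:
    log10, clamp to (global peak - 8), then (x + 4) / 4."""
    log_spec = [[math.log10(max(v, 1e-10)) for v in row] for row in mel_energies]
    floor = max(max(row) for row in log_spec) - 8.0
    return [[(max(v, floor) + 4.0) / 4.0 for v in row] for row in log_spec]


if __name__ == "__main__":
    edges = mel_band_edges()
    print(N_FRAMES, len(edges), round(edges[-1]))  # 3000 82 8000
    print(normalize_log_mel([[1.0, 1e-12], [100.0, 0.01]]))
```

Clamping to eight decades below the peak limits the dynamic range, so near-silent bins do not dominate the model input.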
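Step 2: The encoder processes the log-Mel spectrogram into a sequence of encoded audio features.
Step 3: The decoder takes the encoded features and generates the text output, predicting one token (typically a word or part of a word) at a time.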
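The token-by-token generation in Step 3 is an autoregressive loop. In its simplest (greedy) form, the decoder scores the next token given the tokens produced so far and the encoder output, the highest-scoring token is appended, and the loop stops at the end-of-text token or a length limit. The separate decoder_model and decoder_with_past_model files suggest the usual key/value-cache split: the first handles the initial prompt, and the second reuses cached attention states so each later step only processes the newest token. The Python sketch below hides both models behind a hypothetical `step` callable; the end-of-text ID 50257 and the 448-token limit are taken from the reference multilingual Whisper configuration and should be checked against your model files.

```python
from typing import Callable, List, Optional, Sequence, Tuple

END_OF_TEXT = 50257   # <|endoftext|> in the multilingual Whisper vocabulary (assumed)
MAX_TOKENS = 448      # Whisper's text context length (assumed for this model)

# Hypothetical stand-in for the two ONNX decoders: given the new input tokens and the
# cache from the previous call (None on the first call), return logits for the
# next token and an updated cache.
StepFn = Callable[[Sequence[int], Optional[object]], Tuple[Sequence[float], object]]


def argmax(values: Sequence[float]) -> int:
    return max(range(len(values)), key=values.__getitem__)


def greedy_decode(step: StepFn, prompt: Sequence[int],
                  eot: int = END_OF_TEXT, max_tokens: int = MAX_TOKENS) -> List[int]:
    """Greedy autoregressive decoding; returns the generated tokens (prompt excluded)."""
    tokens = list(prompt)
    new_input: List[int] = list(tokens)   # first call: the full prompt (cf. decoder_model)
    cache: Optional[object] = None
    generated: List[int] = []
    while len(tokens) < max_tokens:
        logits, cache = step(new_input, cache)
        next_token = argmax(logits)
        if next_token == eot:
            break
        tokens.append(next_token)
        generated.append(next_token)
        new_input = [next_token]          # later calls: newest token only (cf. decoder_with_past_model)
    return generated


if __name__ == "__main__":
    script = iter([2, 3, 1, 4])  # toy model: emits tokens 2, 3, 1, then end-of-text (4)

    def toy_step(new_input, cache):
        seen = (cache or 0) + len(new_input)  # the "cache" just counts processed tokens
        logits = [0.0] * 5
        logits[next(script)] = 1.0
        return logits, seen

    print(greedy_decode(toy_step, prompt=[0], eot=4))  # [2, 3, 1]
```

Greedy search is the simplest strategy; beam search or temperature sampling could be swapped in at the `argmax` call without changing the caching structure.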
The dropdown menu allows you to select the desired input language. Once processing is complete, the detected text will be displayed in the text field.
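The language chosen in the dropdown is presumably passed to Whisper as a language token in the decoder prompt, and the generated token IDs are turned back into text through vocab.json, which for Whisper follows GPT-2's byte-level BPE convention (a leading space is stored as `Ġ`, for example). The Python sketch below assumes the standard multilingual Whisper special-token IDs (verify them against the model's tokenizer files) and a vocab.json that maps token strings to IDs; the function names are hypothetical.

```python
import json
from typing import Dict, List, Sequence

# Special-token IDs of the multilingual Whisper vocabulary (assumed; verify for your model).
FIRST_SPECIAL_TOKEN = 50257          # <|endoftext|>; IDs from here on are special tokens
START_OF_TRANSCRIPT = 50258
LANGUAGE_TOKENS = {"en": 50259, "de": 50261, "fr": 50265}
TRANSCRIBE = 50359
NO_TIMESTAMPS = 50363


def build_prompt(language: str) -> List[int]:
    """Decoder prompt: <|startoftranscript|><|lang|><|transcribe|><|notimestamps|>."""
    return [START_OF_TRANSCRIPT, LANGUAGE_TOKENS[language], TRANSCRIBE, NO_TIMESTAMPS]


def bytes_to_unicode() -> Dict[int, str]:
    """GPT-2's reversible mapping from byte values to printable Unicode characters."""
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


def decode_tokens(token_ids: Sequence[int], vocab: Dict[str, int]) -> str:
    """Map token IDs back to text using a token->ID vocabulary such as vocab.json."""
    id_to_token = {i: t for t, i in vocab.items()}
    byte_decoder = {c: b for b, c in bytes_to_unicode().items()}
    # Skip special tokens and any IDs missing from the vocabulary.
    text = "".join(id_to_token[i] for i in token_ids
                   if i < FIRST_SPECIAL_TOKEN and i in id_to_token)
    return bytes(byte_decoder[ch] for ch in text).decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Toy vocabulary in vocab.json format; real IDs differ.
    vocab = json.loads('{"Hello": 0, "\\u0120world": 1, "!": 2}')
    print(build_prompt("de"))                      # [50258, 50261, 50359, 50363]
    print(decode_tokens([0, 1, 2, 50257], vocab))  # Hello world!
```

Decoding to bytes first and only then to UTF-8 matters for German and French: accented characters such as "é" span two bytes and may be split across tokens.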
This project depends on third-party neural networks. Please refer to the original WhisperTiny repositories for detailed license information.


