This project optimizes OpenAI Whisper with NVIDIA TensorRT and implements the Wyoming protocol for Home Assistant integration.
When executing the base.en model on an NVIDIA Jetson Orin Nano, WhisperTRT runs ~3x faster while consuming only ~60% of the memory compared with PyTorch.
By default, this uses the base (multilingual) model.
WhisperTRT roughly mimics the API of the original Whisper model, making it easy to use. The Wyoming goodies are based off wyoming-faster-whisper with minimal tweaks to use WhisperTRT instead of faster-whisper.
While WhisperTRT was originally built for and tested on the Jetson Orin Nano, this project was built in Docker on an x86 Ubuntu 24.04 VM with a 4070 Ti.
Check out the performance and usage details below!
All benchmarks are generated by calling profile_backends.py, which processes a 20-second audio clip.
Execution time in seconds to transcribe 20 seconds of speech. See profile_backends.py for details.
| Model | whisper (Jetson) | faster_whisper (Jetson) | whisper_trt (Jetson) | whisper (4070 Ti) | faster_whisper (4070 Ti) | whisper_trt (4070 Ti) |
|---|---|---|---|---|---|---|
| tiny.en | 1.74 sec | 0.85 sec | 0.64 sec | 0.40 sec | 0.35 sec | 0.07 sec |
| base.en | 2.55 sec | Unavailable | 0.86 sec | 0.71 sec | 0.34 sec | 0.10 sec |
Memory consumption to transcribe 20 seconds of speech. See profile_backends.py for details.
| Model | whisper (Jetson) | faster_whisper (Jetson) | whisper_trt (Jetson) | whisper (4070 Ti) | faster_whisper (4070 Ti) | whisper_trt (4070 Ti) |
|---|---|---|---|---|---|---|
| tiny.en | 569 MB | 404 MB | 488 MB | 672 MB | 522 MB | 544 MB |
| base.en | 666 MB | Unavailable | 439 MB | 726 MB | 514 MB | 548 MB |
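The headline claim can be checked against the base.en Jetson figures in the two tables above; a quick back-of-the-envelope calculation (numbers taken directly from the tables):

```python
# base.en on Jetson Orin Nano, from the benchmark tables above.
pytorch_time, trt_time = 2.55, 0.86  # seconds (whisper vs whisper_trt)
pytorch_mem, trt_mem = 666, 439      # MB

speedup = pytorch_time / trt_time    # roughly 3x faster
mem_ratio = trt_mem / pytorch_mem    # roughly two-thirds of the memory

print(f"{speedup:.2f}x faster, {mem_ratio:.0%} of the memory")
```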
NOTE: ARM64 dGPU and iGPU containers may take a while to start on first launch after installation or updates. I do not have ARM64 or Jetson devices, so several packages (such as torch and torch2trt) fail to install properly under QEMU/buildx because CUDA is not detected. If you know a workaround, please reach out.
- Install and configure Docker
- Install and configure the Nvidia Container Toolkit
For AMD64 with discrete GPUs:
```yaml
services:
  wyoming-whisper-trt:
    image: captnspdr/wyoming-whisper-trt:latest-amd64
    container_name: wyoming-whisper-trt
    ports:
      - 10300:10300
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```
For ARM64 with discrete GPUs:
```yaml
services:
  wyoming-whisper-trt:
    image: captnspdr/wyoming-whisper-trt:latest-arm64
    container_name: wyoming-whisper-trt
    ports:
      - 10300:10300
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```
For ARM64 with an iGPU, such as Jetson devices:
```yaml
services:
  wyoming-whisper-trt:
    image: captnspdr/wyoming-whisper-trt:latest-igpu
    container_name: wyoming-whisper-trt
    ports:
      - 10300:10300
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```
- Clone this repository
- Browse to the repository root folder
- Run the following command based on your platform:
For AMD64 with dGPU:
```bash
docker run --gpus all --name wyoming-whisper-trt -d -p 10300:10300 captnspdr/wyoming-whisper-trt:latest-amd64
```
For ARM64 with dGPU:
```bash
docker run --gpus all --name wyoming-whisper-trt -d -p 10300:10300 captnspdr/wyoming-whisper-trt:latest-arm64
```
For ARM64 with iGPU:
```bash
docker run --gpus all --name wyoming-whisper-trt -d -p 10300:10300 captnspdr/wyoming-whisper-trt:latest-igpu
```
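Once a container is running, you can sanity-check it from any machine that can reach port 10300 by sending a Wyoming `describe` event; a healthy server answers with an `info` event describing its ASR service. A hypothetical check (the host, port, and helper names are assumptions, not part of this repository):

```python
import json
import socket


def describe_request() -> bytes:
    """Build the Wyoming 'describe' handshake: a single JSON event line."""
    return json.dumps({"type": "describe", "data": {}}).encode("utf-8") + b"\n"


def check_wyoming(host: str = "localhost", port: int = 10300, timeout: float = 5.0) -> str:
    """Return the type of the server's first reply event ('info' when healthy)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(describe_request())
        reply = sock.makefile("rb").readline()  # first JSON header line
    return json.loads(reply)["type"]


# check_wyoming()  # run against a live container
```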
- Clone this repository
- Browse to the repository root folder
- Run:

```bash
docker compose -f docker-compose-github.yaml up -d
```