Jonah-May-OSS/wyoming-whisper-trt

A project that optimizes Wyoming and Whisper for low latency inference using NVIDIA TensorRT

WhisperTRT

This project optimizes OpenAI Whisper with NVIDIA TensorRT and implements the Wyoming Protocol for Home Assistant integration.

When executing the base.en model on an NVIDIA Jetson Orin Nano, WhisperTRT runs roughly 3x faster than PyTorch while consuming only about 60% of the memory.

By default, this uses the base (multilingual) model.

WhisperTRT roughly mimics the API of the original Whisper model, making it easy to use. The Wyoming integration is based on wyoming-faster-whisper, with minimal tweaks to use WhisperTRT instead of faster-whisper.
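
As a rough illustration, transcription is meant to look much like it does with openai-whisper. The function name `load_trt_model` and the result shape below are assumptions based on the upstream whisper_trt project, not a verified API:

```python
# Hypothetical sketch only: load_trt_model and the transcribe() result shape
# are assumed from the upstream whisper_trt project, not a verified API.
status = "unavailable"
try:
    from whisper_trt import load_trt_model

    model = load_trt_model("base.en")  # builds or loads a cached TensorRT engine
    result = model.transcribe("speech.wav")
    print(result["text"])
    status = "ok"
except ImportError:
    # Real inference needs whisper_trt and a CUDA-capable GPU; without them
    # this sketch simply reports that the backend is unavailable.
    print("whisper_trt is not installed")
```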

While WhisperTRT was originally built for and tested on the Jetson Orin Nano, this project is built in Docker on an x86 Ubuntu 24.04 VM with an RTX 4070 Ti.

Check out the performance and usage details below!

Performance

All benchmarks are generated by calling profile_backends.py, processing a 20-second audio clip.

Execution Time

Execution time in seconds to transcribe 20 seconds of speech on the Jetson Orin Nano and an RTX 4070 Ti. See profile_backends.py for details.

| Model | whisper (Jetson) | faster_whisper (Jetson) | whisper_trt (Jetson) | whisper (4070 Ti) | faster_whisper (4070 Ti) | whisper_trt (4070 Ti) |
|---|---|---|---|---|---|---|
| tiny.en | 1.74 sec | 0.85 sec | 0.64 sec | 0.40 sec | 0.35 sec | 0.07 sec |
| base.en | 2.55 sec | Unavailable | 0.86 sec | 0.71 sec | 0.34 sec | 0.10 sec |

Memory Consumption

Memory consumption to transcribe 20 seconds of speech on the Jetson Orin Nano and an RTX 4070 Ti. See profile_backends.py for details.

| Model | whisper (Jetson) | faster_whisper (Jetson) | whisper_trt (Jetson) | whisper (4070 Ti) | faster_whisper (4070 Ti) | whisper_trt (4070 Ti) |
|---|---|---|---|---|---|---|
| tiny.en | 569 MB | 404 MB | 488 MB | 672 MB | 522 MB | 544 MB |
| base.en | 666 MB | Unavailable | 439 MB | 726 MB | 514 MB | 548 MB |
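
As a sanity check, the headline claims can be recomputed from the base.en Jetson columns above (the measured memory ratio comes out closer to two-thirds than to 60%):

```python
# Recompute the headline claims from the base.en Jetson columns above.
audio_seconds = 20.0

# Execution time (seconds): whisper vs. whisper_trt on Jetson.
whisper_time, trt_time = 2.55, 0.86
speedup = whisper_time / trt_time   # ~3x faster
rtf = trt_time / audio_seconds      # real-time factor (lower is better)

# Memory (MB): whisper vs. whisper_trt on Jetson.
whisper_mem, trt_mem = 666, 439
mem_ratio = trt_mem / whisper_mem   # fraction of PyTorch's memory

print(f"speedup ~{speedup:.1f}x, RTF {rtf:.3f}, memory ~{mem_ratio:.0%}")
# → speedup ~3.0x, RTF 0.043, memory ~66%
```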

Usage

NOTE: ARM64 dGPU and iGPU containers may take a while to start on first launch after installation or updates. Because I do not have ARM64 or Jetson hardware, several packages (such as torch and torch2trt) fail to install properly under QEMU/buildx, where CUDA is not detected. If you know how to get around this, please reach out to me.

Pre-requisites:

  1. Install and configure Docker
  2. Install and configure the NVIDIA Container Toolkit

Docker Compose (recommended)

For AMD64 with discrete GPUs:

```yaml
services:
  wyoming-whisper-trt:
    image: captnspdr/wyoming-whisper-trt:latest-amd64
    container_name: wyoming-whisper-trt
    ports:
      - "10300:10300"
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

For ARM64 with discrete GPUs:

```yaml
services:
  wyoming-whisper-trt:
    image: captnspdr/wyoming-whisper-trt:latest-arm64
    container_name: wyoming-whisper-trt
    ports:
      - "10300:10300"
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

For ARM64 with an iGPU like Jetson devices:

```yaml
services:
  wyoming-whisper-trt:
    image: captnspdr/wyoming-whisper-trt:latest-igpu
    container_name: wyoming-whisper-trt
    ports:
      - "10300:10300"
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

Docker (Latest tag on Docker Hub)

  1. Clone this repository
  2. Browse to the repository root folder
  3. Run the following command based on your platform:

For AMD64 with dGPU:

```shell
docker run --gpus all --name wyoming-whisper-trt -d -p 10300:10300 captnspdr/wyoming-whisper-trt:latest-amd64
```

For ARM64 with dGPU:

```shell
docker run --gpus all --name wyoming-whisper-trt -d -p 10300:10300 captnspdr/wyoming-whisper-trt:latest-arm64
```

For ARM64 with iGPU:

```shell
docker run --gpus all --name wyoming-whisper-trt -d -p 10300:10300 captnspdr/wyoming-whisper-trt:latest-igpu
```

Docker (Latest GitHub commit, ARM64 and AMD64 with dGPU only)

  1. Clone this repository
  2. Browse to the repository root folder
  3. Run `docker compose -f docker-compose-github.yaml up -d`

See also:

  • torch2trt - Used to convert the PyTorch model to TensorRT and perform inference.
  • NanoLLM - Large Language Models targeting NVIDIA Jetson. Perfect for combining with ASR!
