caul

Automatic speech recognition in Python

"Here's to Harry ... the best, bar none."

Audio file transcription using NVIDIA's Parakeet family of multilingual models, with fallback to Whisper.cpp for languages outside Parakeet's scope. Built with uv for package and project management. Installation's as simple as:

uv python install 3.13
uv sync --dev

A handler object can be instantiated and run on one or more audio file paths, or directly on NumPy arrays or Torch tensors, returning a list of ASRInferenceHandlerResult objects for each input. In each result, transcription contains a list of tuples of the form (start_time, end_time, text_segment), and score is a measure of confidence in the transcription, ranging from 0 down to -250:

>>> from .handler import ASRHandler
>>> handler = ASRHandler(models="parakeet")
>>> handler.startup()
>>> results = handler.transcribe("<...path to some audio file...>")
>>> print(results)
[ASRInferenceHandlerResult(transcription=[(0.0, 1.5, "We're spending too much time here.")], score=-250.0),
ASRInferenceHandlerResult(transcription=[(1.5, 2.9, "Stay a little longer.")], score=-250.0),
ASRInferenceHandlerResult(transcription=[(2.9, 4.0, "He'd kill us if he got the chance.")], score=-250.0)]
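
The result list above can be flattened into timestamped lines for display or export. The following is a minimal sketch, not part of caul itself: `Result` is a hypothetical stand-in dataclass that only mirrors the `transcription` and `score` fields shown in the printed output, and `to_lines` is an illustrative helper name.

```python
from dataclasses import dataclass


# Hypothetical stand-in for the result objects printed above; the real class
# is provided by caul. Only the two fields shown in the output are assumed.
@dataclass
class Result:
    transcription: list[tuple[float, float, str]]
    score: float


def to_lines(results):
    """Format every (start_time, end_time, text_segment) tuple as one line."""
    lines = []
    for result in results:
        for start, end, text in result.transcription:
            lines.append(f"[{start:.1f}-{end:.1f}] {text}")
    return lines


# Sample data copied from the example output above.
results = [
    Result([(0.0, 1.5, "We're spending too much time here.")], -250.0),
    Result([(1.5, 2.9, "Stay a little longer.")], -250.0),
]
print("\n".join(to_lines(results)))
```

Because the helper only reads the `transcription` attribute, it should work unchanged on the objects returned by `handler.transcribe`, assuming they expose that field as shown.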