caul

Automatic speech recognition in Python

"Here's to Harry ... the best, bar none."

Audio file transcription using NVIDIA's Parakeet family of multilingual models, with a fallback to Whisper.cpp for languages outside Parakeet's scope. Built with uv for package and project management. Installation's as simple as:

uv python install 3.13
uv sync --dev
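The Parakeet-first, Whisper.cpp-fallback routing described above can be sketched as a simple language lookup. This is a minimal illustration, not caul's actual selection logic: the function name choose_backend and the contents of PARAKEET_LANGUAGES are hypothetical placeholders, and the real set of Parakeet-supported languages depends on the model checkpoint in use.

```python
# Hypothetical sketch: send audio to Parakeet when its language is supported,
# otherwise fall back to Whisper.cpp. The language set is illustrative only,
# NOT caul's real configuration or Parakeet's full language list.
PARAKEET_LANGUAGES = frozenset({"en", "de", "fr", "es"})


def choose_backend(language: str, supported: frozenset[str] = PARAKEET_LANGUAGES) -> str:
    """Return the name of the backend that should transcribe `language`."""
    # Normalize tags such as "en-US" or "DE" to a lowercase base code.
    code = language.lower().split("-")[0]
    return "parakeet" if code in supported else "whisper.cpp"


print(choose_backend("en-US"))  # parakeet
print(choose_backend("ja"))     # whisper.cpp
```

Keeping the supported set as a parameter makes it easy to swap in whatever language list a given Parakeet checkpoint actually covers.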

A handler object can be instantiated and run on one or more audio file paths, or directly on NumPy/Torch tensors, returning a list of ASRInferenceHandlerResult objects for each input. Each result's transcription field holds a list of tuples of the form (start_time, end_time, text_segment), and its score field is a confidence measure in the range -250 to 0:

>>> from .handler import ASRHandler
>>> handler = ASRHandler(models="parakeet")
>>> handler.startup()
>>> results = handler.transcribe("<...path to some audio file...>")
>>> print(results)
[ASRInferenceHandlerResult(transcription=[(0.0, 1.5, "We're spending too much time here.")], score=-250.0),
ASRInferenceHandlerResult(transcription=[(1.5, 2.9, "Stay a little longer.")], score=-250.0),
ASRInferenceHandlerResult(transcription=[(2.9, 4.0, "He'd kill us if he got the chance.")], score=-250.0)]
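Because each result carries its segments as (start_time, end_time, text_segment) tuples, turning the returned list into a readable, timestamped transcript is a short loop. The sketch below is self-contained, so it defines a hypothetical stand-in dataclass that only mirrors the transcription and score fields visible in the printed output above; it is not caul's actual class, and to_transcript is an illustrative helper, not part of the library.

```python
from dataclasses import dataclass


@dataclass
class ASRInferenceHandlerResult:
    """Hypothetical stand-in mirroring the fields shown in the printed output."""
    transcription: list[tuple[float, float, str]]
    score: float


def to_transcript(results: list[ASRInferenceHandlerResult]) -> str:
    """Flatten every (start, end, text) segment into one timestamped line each."""
    lines = []
    for result in results:
        for start, end, text in result.transcription:
            lines.append(f"[{start:.2f} - {end:.2f}] {text}")
    return "\n".join(lines)


# Sample values copied from the example output above.
results = [
    ASRInferenceHandlerResult([(0.0, 1.5, "We're spending too much time here.")], -250.0),
    ASRInferenceHandlerResult([(1.5, 2.9, "Stay a little longer.")], -250.0),
]
print(to_transcript(results))
```

With real output from handler.transcribe, the same loop should apply as long as the result objects expose a transcription attribute shaped like the one printed above.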
