I have been using the Whisper-large-v2 model from HuggingFace for local testing, and found that setting the `return_timestamps=True` parameter in the ASR pipeline returns timestamped transcriptions (see the code snippet below, taken from the HuggingFace model page).
I would like access to these segment-level timestamps for an application I am working on, but it seems this parameter is not exposed in the Whisper Triton deployment here. Can anyone guide me on how to set this parameter / access this output?
```python
import torch
from transformers import pipeline
from datasets import load_dataset

device = "cuda:0" if torch.cuda.is_available() else "cpu"

pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v2",
    chunk_length_s=30,
    device=device,
)

ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = ds[0]["audio"]

prediction = pipe(sample.copy(), batch_size=8)["text"]
# " Mr. Quilter is the apostle of the middle classes, and we are glad to welcome his gospel."

prediction = pipe(sample.copy(), batch_size=8, return_timestamps=True)["chunks"]
# [{'text': ' Mr. Quilter is the apostle of the middle classes and we are glad to welcome his gospel.',
#   'timestamp': (0.0, 5.44)}]
```
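For context, this is the kind of access I am after on the Triton side: per-segment start/end times alongside the text. A minimal sketch against the `chunks` output above (the `text` and `timestamp` field names are the ones the HuggingFace pipeline returns):

```python
# Minimal sketch: pull per-segment start/end times out of the
# "chunks" structure returned by the HuggingFace pipeline above.
chunks = pipe(sample.copy(), batch_size=8, return_timestamps=True)["chunks"]
for chunk in chunks:
    start, end = chunk["timestamp"]  # (start, end) in seconds
    print(f"[{start:.2f}s - {end:.2f}s] {chunk['text']}")
```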