Only translating last 30s or so of the audio file. #172

SergioEstevao · 2024-06-21T11:28:29Z

When using the WhisperKit CLI or apps with a large file (50 minutes of audio), the final report (.srt file) appears to contain only the last 30 s of transcribed content.

Is this expected, or am I missing a command-line argument?

ZachNagengast · 2024-06-24T15:43:28Z

What kind of audio is it? Also, could you provide the command you are using to call the CLI? This may be a result of log prob errors causing full windows to be treated as silent, which can happen if the audio is particularly noisy. Can you try adjusting the log prob threshold and see if the results are better?
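For readers unfamiliar with this threshold, here is a minimal sketch of the skip rule used by the reference OpenAI Whisper decoding loop, which is assumed (not confirmed in this thread) to resemble what WhisperKit does. The names `WindowResult` and `shouldSkipWindow` are hypothetical; the defaults of -1.0 and 0.6 are the reference implementation's defaults.

```swift
// Hypothetical types and names for illustration; not WhisperKit's API.
struct WindowResult {
    let avgLogProb: Double    // average token log probability for the window
    let noSpeechProb: Double  // model's probability that the window has no speech
}

/// A window is skipped as silent only when the no-speech probability is high
/// AND the average log probability does not exceed the log prob threshold.
func shouldSkipWindow(
    _ result: WindowResult,
    logProbThreshold: Double = -1.0,
    noSpeechThreshold: Double = 0.6
) -> Bool {
    var skip = result.noSpeechProb > noSpeechThreshold
    if result.avgLogProb > logProbThreshold {
        skip = false  // confident enough decoding overrides the silence guess
    }
    return skip
}

// A noisy window: low-confidence text and a high no-speech probability.
let noisyWindow = WindowResult(avgLogProb: -1.4, noSpeechProb: 0.8)
let skippedByDefault = shouldSkipWindow(noisyWindow)
let skippedWhenLowered = shouldSkipWindow(noisyWindow, logProbThreshold: -2.0)
print(skippedByDefault, skippedWhenLowered)
```

Under this rule, lowering the log prob threshold makes it harder for a noisy window to be discarded as silence, which is why adjusting it is a reasonable first experiment.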

SergioEstevao · 2024-06-24T17:27:56Z

So I was trying to transcribe an episode from the Cautionary Tales podcast. The sound is clear for the majority of the episodes.

I was using the CLI with this command:
swift run whisperkit-cli transcribe --model-path "Models/whisperkit-coreml/openai_whisper-tiny" --audio-path ../transcripts/audio.mp3 --report

You can get the audio file from here:

https://chtbl.com/track/39E17/podtrac.com/pts/redirect.mp3/pdrl.fm/18db03/traffic.omny.fm/d/clips/e73c998e-6e60-432f-8610-ae210140c5b1/c0ae8c6e-22f0-4e9b-ac1c-ae390037ac53/a4efe84f-d748-4730-98f5-b1770137cb8e/audio.mp3

SergioEstevao · 2024-06-28T12:13:32Z

@ZachNagengast After doing some more tests, I believe the bug is in the conversion process when the source file is not 1 channel at 16 kHz.

This line here

While we read new data into the input buffer in chunks, we always write to the same position (0) of the outputBuffer, so in the end the outputBuffer only has data from the last chunk read from the input file.
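The overwrite described above can be illustrated with a minimal, self-contained sketch. It uses plain arrays as a stand-in for audio buffers and is not WhisperKit's actual conversion code; `convertOverwriting` and `convertAppending` are hypothetical names.

```swift
// Hypothetical stand-in for a chunked audio converter; not WhisperKit's code.
// Each inner array represents one already-converted chunk of samples.

/// Buggy variant: every chunk is written starting at index 0,
/// so the output only ever holds the most recent chunk.
/// Assumes each chunk has at most `chunkCapacity` samples.
func convertOverwriting(chunks: [[Float]], chunkCapacity: Int) -> [Float] {
    var output = [Float](repeating: 0, count: chunkCapacity)
    var validCount = 0
    for chunk in chunks {
        for (i, sample) in chunk.enumerated() {
            output[i] = sample  // write position never advances
        }
        validCount = chunk.count
    }
    return Array(output.prefix(validCount))
}

/// Fixed variant: each converted chunk is appended after the previous one.
func convertAppending(chunks: [[Float]]) -> [Float] {
    var output: [Float] = []
    output.reserveCapacity(chunks.reduce(0) { $0 + $1.count })
    for chunk in chunks {
        output.append(contentsOf: chunk)
    }
    return output
}

let chunks: [[Float]] = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
let overwritten = convertOverwriting(chunks: chunks, chunkCapacity: 2)
let appended = convertAppending(chunks: chunks)
print(overwritten.count)  // only the last chunk's samples survive
print(appended.count)     // every chunk's samples are kept
```

With the overwriting variant, a 50-minute file would end up containing only the final chunk, which matches the symptom of a report covering just the last stretch of audio.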

ZachNagengast · 2024-06-28T20:00:12Z

Hi @SergioEstevao, I'm having trouble reproducing this. Can you share the hardware and OS you're using where this error occurs?

This is the file I get when running your command on the same audio file:
last30bug.srt.zip

ZachNagengast · 2024-06-29T01:34:08Z

I've confirmed there's something weird with the output buffer. Will have something for this shortly.

ZachNagengast added the "needs info" label · Jun 24, 2024

ZachNagengast added the "bug" label and removed the "needs info" label · Jun 25, 2024
