Improve language detection when using clip_timestamps #867

ben91lin · 2024-06-04T04:59:29Z

Use clip_timestamps to set the initial seek for language detection, so that the language is detected from the first clipped region instead of from the start of the audio file, where detection can be incorrect.
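In faster-whisper, `clip_timestamps` is documented as either a comma-separated string of `start,end,start,end,...` times in seconds or a list of floats, so the first value is the start of the first clip. The sketch below shows one way to read that first start time from either form. `first_clip_start` is a hypothetical helper written for illustration, not a library function, and falling back to 0.0 for an empty value is an assumption.

```python
from typing import List, Union


def first_clip_start(clip_timestamps: Union[str, List[float]]) -> float:
    """Return the start time (in seconds) of the first clip.

    Hypothetical helper: accepts the "start,end,start,end,..." string form
    or a list of floats, and falls back to 0.0 when no value is given.
    """
    if isinstance(clip_timestamps, str):
        # Split the comma-separated string; empty pieces are ignored.
        values = [float(ts) for ts in clip_timestamps.split(",") if ts.strip()]
    else:
        values = [float(ts) for ts in clip_timestamps]
    return values[0] if values else 0.0


print(first_clip_start("67,75"))       # 67.0
print(first_clip_start([12.5, 20.0]))  # 12.5
```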

trungkienbkhn · 2024-06-05T16:42:32Z

faster_whisper/transcribe.py

                detected_language_info = {}
+                seek = int(


This should be replaced with:

seek = int(start_timestamp * self.frames_per_second)
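The suggested line converts a start time in seconds into a feature-frame index. A minimal standalone sketch, assuming `frames_per_second` is 100 (a 16 kHz sample rate with a 160-sample hop, Whisper's usual feature settings); the function name `seek_from_timestamp` is hypothetical:

```python
def seek_from_timestamp(start_timestamp: float, frames_per_second: int = 100) -> int:
    """Convert a time in seconds to a feature-frame index (hypothetical helper).

    Assumes 100 frames per second (16000 Hz / 160-sample hop).
    int() truncates toward zero, so any fractional frame is dropped.
    """
    return int(start_timestamp * frames_per_second)


print(seek_from_timestamp(67.0))  # 6700
print(seek_from_timestamp(0.29))  # 28, because 0.29 * 100 == 28.999999999999996
```

Because `int()` truncates, a timestamp whose product is not exactly representable can land one frame early; for a 30-second detection window this is negligible.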

trungkienbkhn · 2024-06-05T16:49:22Z

@ben91lin, hello. Thanks for your improvement. However, I found that an error occurs when clip_timestamps[0] is greater than the audio duration:

    language = max(
ValueError: max() arg is an empty sequence

Could you fix this?
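The thread does not show the exact call path, but the error is consistent with a detection window that ends up empty: slicing past the end of a sequence returns an empty slice instead of raising, and `max()` later receives nothing to compare. A minimal pure-Python sketch of that chain, assuming 80 s of audio (8000 frames at 100 frames per second) followed by 3000 frames of padding; frames are modeled as a plain list, and both helper names are hypothetical:

```python
from typing import List, Sequence, Tuple


def detection_window(frames: Sequence[int], seek: int, nb_max_frames: int = 3000) -> Sequence[int]:
    """Return the frames used for detection (hypothetical helper).

    Slicing never raises; if seek is past the end, the result is empty.
    """
    return frames[seek : seek + nb_max_frames]


def pick_language(candidates: List[Tuple[str, float]]) -> Tuple[str, float]:
    """Return the (language, probability) pair with the highest probability.

    max() raises ValueError when candidates is empty.
    """
    return max(candidates, key=lambda item: item[1])


# Assumed layout: 8000 content frames followed by 3000 padding frames.
frames = list(range(8000 + 3000))
window = detection_window(frames, seek=12000)  # clip start at 120 s
print(len(window))  # 0

try:
    pick_language([])
except ValueError as exc:
    # The exact message differs between Python versions.
    print("ValueError:", exc)
```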

ben91lin · 2024-06-06T18:59:33Z

# If seek is beyond all frames, set it to the last segment.
if seek >= features.shape[-1]:
    seek = content_frames

If audio_length is 80 s and start_timestamp is 67 s, the detection window will contain only the last 13 seconds of audio.
If start_timestamp is greater than or equal to 80 s, detection is forced to use the last nb_max_frames.
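To make these numbers concrete: assuming 100 frames per second, a 3000-frame (30 s) detection window, and features whose frames at index `content_frames` and beyond are padding, the sketch below counts how many frames of a window starting at `seek` are real audio and how many are padding. `window_composition` is a hypothetical helper.

```python
from typing import Tuple


def window_composition(seek: int, content_frames: int, nb_max_frames: int = 3000) -> Tuple[int, int]:
    """Split a detection window into (content, padding) frame counts.

    Hypothetical helper. Frames at index >= content_frames are treated as
    padding. Assumes 0 <= seek <= content_frames.
    """
    end = seek + nb_max_frames
    content = max(0, min(end, content_frames) - seek)
    return content, nb_max_frames - content


# 80 s of audio at 100 frames per second -> 8000 content frames.
print(window_composition(6700, 8000))  # (1300, 1700): 13 s of audio, 17 s of padding
print(window_composition(8000, 8000))  # (0, 3000): the window is entirely padding
```

The second call corresponds to setting `seek = content_frames`: under these assumptions the window contains no audio at all, which is the concern raised in the following review comment.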

trungkienbkhn · 2024-06-07T02:46:18Z

# If seek is beyond all frames, set it to the last segment.
if seek >= features.shape[-1]:
    seek = content_frames

I think this logic is wrong. If seek = content_frames, faster-whisper will detect the language from frames content_frames to features.shape[-1].
=> These frames contain only padding (no audio content), which can lead to incorrect language detection.
We should set seek = 0 if seek >= content_frames.
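A minimal sketch of the suggested rule, assuming 100 frames per second and that `content_frames` is the number of non-padding frames; `detection_seek` is a hypothetical name, not a function in faster-whisper.

```python
def detection_seek(start_timestamp: float, content_frames: int, frames_per_second: int = 100) -> int:
    """Return the frame index where language detection should start.

    Hypothetical helper: converts the clip start to frames and falls back to
    the beginning of the audio when the start lies at or past the last
    content frame, so the window never begins inside the padding.
    """
    seek = int(start_timestamp * frames_per_second)
    if seek >= content_frames:
        seek = 0
    return seek


print(detection_seek(67.0, 8000))  # 6700
print(detection_seek(80.0, 8000))  # 0
```

Falling back to 0 means the out-of-range case behaves as if no clip start had been given.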

ben91lin · 2024-06-07T16:49:22Z

I think you are right; I overlooked the padding added by FeatureExtractor. Setting seek = 0 if seek >= content_frames is the correct approach.

ben91lin force-pushed the language-detection branch 3 times, most recently from 1becddb to 65dcdc4 (June 4, 2024 05:11)

trungkienbkhn reviewed Jun 5, 2024

ben91lin force-pushed the language-detection branch 4 times, most recently from b8cc0fc to 370902e (June 6, 2024 18:55)

ben91lin force-pushed the language-detection branch from 370902e to 73dc4b2 (June 7, 2024 16:41)

Improve language detection when using clip_timestamps (6f07c97)

ben91lin force-pushed the language-detection branch from 73dc4b2 to 6f07c97 (June 7, 2024 16:46)
