[Hallucinations] Repetition of words or chunks with own fine-tuned model #987

asr-lord · 2024-09-02T14:03:05Z

I've built a "real-time" application that processes audio in 3-second chunks using my own fine-tuned small model. It reads the complete call recording and generates 3 s chunks, but in some cases the transcription repeats the same word or group of words.
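For reference, the chunking step described above can be sketched roughly as follows. This is only an illustration, not the code from my application: `iter_chunks` and the silent `recording` buffer are hypothetical names, and it assumes mono audio already resampled to 16 kHz.

```python
from typing import Iterator, Sequence


def iter_chunks(samples: Sequence, sample_rate: int = 16000,
                chunk_seconds: float = 3.0) -> Iterator[Sequence]:
    """Yield consecutive fixed-length slices of a mono audio buffer."""
    chunk_size = int(sample_rate * chunk_seconds)
    if chunk_size <= 0:
        raise ValueError("sample_rate * chunk_seconds must be positive")
    for start in range(0, len(samples), chunk_size):
        # The final slice may be shorter than chunk_size.
        yield samples[start:start + chunk_size]


# Hypothetical 7-second silent recording at 16 kHz (stand-in for real audio).
recording = [0.0] * (7 * 16000)
chunk_lengths = [len(chunk) for chunk in iter_chunks(recording)]
print(chunk_lengths)
```

The same slicing works on a NumPy array, so each yielded chunk could be passed to the transcription call in place of `wav_array_chunk_16khz`.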

I converted the model with the following code:

!pip install "transformers[torch]>=4.23" ctranslate2
from ctranslate2.converters import TransformersConverter

model_name_or_path = "/home/whisper-small-fine-tuned"
output_dir = "/home/whisper-small-ct2"

converter = TransformersConverter(model_name_or_path)
converter.convert(output_dir, quantization="float16", force=True)

I then run the following code to get the transcription:

from faster_whisper import WhisperModel

model_size = "/home/whisper-small-ct2"

# Run on GPU with FP16
model = WhisperModel(model_size, device="cuda", compute_type="float16")

segment, info = model.transcribe(
    wav_array_chunk_16khz,
    initial_prompt="Venta telefonica",
    language="es",
    task="transcribe",
    hotwords=None,
    word_timestamps=True,
    vad_filter=True, vad_parameters=dict(min_silence_duration_ms=500),
    chunk_length=5,
    condition_on_previous_text=False,
    suppress_tokens=[],
)

Output text transcription:
['no', 'porque', 'yo', 'soy', 'emigante', 'y', 'hasta', 'hoy', 'no', 'porque', 'yo', 'soy', 'emigante', 'y', 'hasta', 'hoy', 'no', 'porque', 'yo', 'soy', 'emigante', 'y', 'hasta', 'hoy', 'no', 'porque', 'yo', 'soy', 'emigante', 'y', 'hasta', 'hoy', 'no', 'porque', 'yo', 'soy', 'emigante', 'y', 'hasta', 'hoy', 'no', 'porque', 'yo', 'soy', 'emigante', 'y', 'hasta', 'hoy', 'no', 'porque', 'yo', 'soy', 'emigante', 'y', 'hasta', 'hoy', 'no', 'porque', 'yo', 'soy', 'emigante', 'y', 'hasta', 'hoy', 'no', 'porque', 'yo', 'soy', 'emigante', 'y', 'hasta', 'hoy', 'no', 'porque', 'yo', 'soy', 'emigante', 'y', 'hasta', 'hoy', 'no', 'porque', 'yo', 'soy', 'emigante', 'y', 'hasta', 'hoy', 'no', 'porque', 'yo', 'soy', 'emigante', 'y', 'hasta', 'hoy', 'no', 'porque', 'yo', 'soy', 'emigante', 'y', 'hasta', 'hoy', 'no', 'porque', 'yo', 'soy', 'emigante', 'y', 'hasta', 'hoy', 'no', 'porque', 'yo', 'soy', 'emigante', 'y', 'hasta', 'hoy', 'no', 'porque', 'yo', 'soy', 'emigante', 'y', 'hasta', 'hoy', 'no', 'porque', 'yo', 'soy', 'emigante', 'y', 'hasta', 'hoy', 'no', 'porque', 'yo', 'soy', 'emigante', 'y', 'hasta', 'hoy', 'no', 'porque', 'yo', 'soy', 'emigante', 'y', 'hasta', 'hoy', 'no', 'porque', 'yo', 'soy', 'emigante', 'y', 'hasta', 'hoy', 'no', 'porque', 'yo', 'soy', 'emigante', 'y', 'hasta', 'hoy', 'no', 'porque', 'yo', 'soy', 'emigante', 'y', 'hasta', 'hoy', 'no', 'porque', 'yo']
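One way to flag this kind of loop automatically is sketched below. It is a rough illustration, not part of my pipeline: `shortest_repeating_unit` and `compression_ratio` are hypothetical helper names, and the sample list is a shortened copy of the output above. The ratio uses zlib in the way Whisper-style decoders commonly do, and 2.4 is, as far as I know, the default `compression_ratio_threshold` in faster-whisper, so treat that number as an assumption.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Raw UTF-8 size divided by zlib-compressed size; high values mean repetitive text."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def shortest_repeating_unit(words):
    """Return the shortest prefix that, repeated (the last copy may be cut off),
    reproduces the whole list at least twice over; None if there is no such prefix."""
    n = len(words)
    for period in range(1, n // 2 + 1):
        if all(words[i] == words[i % period] for i in range(n)):
            return words[:period]
    return None


# Shortened copy of the looping output shown above.
unit = ["no", "porque", "yo", "soy", "emigante", "y", "hasta", "hoy"]
words = unit * 10 + unit[:3]

loop = shortest_repeating_unit(words)
ratio = compression_ratio(" ".join(words))
print(loop)
print(f"compression ratio: {ratio:.2f}")

# Assumed threshold: 2.4 (believed to be the faster-whisper default).
is_suspicious = loop is not None or ratio > 2.4
print("suspected hallucination loop:", is_suspicious)
```

A chunk flagged this way could be dropped or re-decoded with different settings instead of being appended to the transcript.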

Note also that the factor for transcribing chunks is lower with the fine-tuned model than with the original OpenAI models (I'm using a 4/16 T4 GPU AWS instance):

{
    'small': {'float32': 5.158, 'float16': 7.313},
    'whisper-small-ct2': {'float32': 2.32, 'float16': 3.383},
    'medium': {'float32': 2.608, 'float16': 4.966},
}
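To put the gap in numbers, the snippet below divides the fine-tuned model's factors by those of the stock `small` model, using only the figures above. The variable names are illustrative, and it makes no assumption about the unit of the factor beyond the two being measured the same way.

```python
# Factors copied from the measurements above.
factors = {
    "small": {"float32": 5.158, "float16": 7.313},
    "whisper-small-ct2": {"float32": 2.32, "float16": 3.383},
    "medium": {"float32": 2.608, "float16": 4.966},
}

baseline = factors["small"]
fine_tuned = factors["whisper-small-ct2"]

# Ratio of the fine-tuned factor to the stock small factor, per compute type.
relative = {dtype: fine_tuned[dtype] / baseline[dtype] for dtype in baseline}

for dtype, ratio in relative.items():
    print(f"{dtype}: fine-tuned factor is {ratio:.2f}x the stock small factor")
```

In both compute types the fine-tuned model reaches a bit less than half the factor of the stock `small` model, even though it should have the same architecture.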

asr-lord changed the title ~~[Hallucinations] Repetition of words or chunks with own fine-tuned models~~ [Hallucinations] Repetition of words or chunks with own fine-tuned model Sep 3, 2024