Cyrillic letters in Polish transcription #991

Venzon · 2024-09-05T15:14:04Z

Hi,

I encountered an issue with the large-v3 model when transcribing Polish audio. Occasionally, Cyrillic letters appear in the transcription, even though the text should be entirely in Polish. For example, instead of "Dzień dobry", the output contains "Dzieнь dobry", where "ń" is replaced by the Cyrillic characters "нь".

I noticed that this problem does not occur when using the newly implemented BatchedInferencePipeline with the same options.
Tested on master and 1.0.3.
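
To spot affected output automatically, a small check like the one below may help. It is only a sketch: `find_cyrillic` and `CYRILLIC_RE` are hypothetical names introduced here, not part of faster-whisper, and the check assumes that any character in the Unicode Cyrillic blocks is unwanted in Polish text.

```python
import re

# Assumption: Polish text should contain no characters from the
# Cyrillic (U+0400-U+04FF) or Cyrillic Supplement (U+0500-U+052F) blocks.
CYRILLIC_RE = re.compile(r"[\u0400-\u052F]+")


def find_cyrillic(text: str) -> list[tuple[int, str]]:
    """Return (offset, run) pairs for each run of Cyrillic characters in text.

    Hypothetical helper for illustration; not part of any library.
    """
    return [(m.start(), m.group()) for m in CYRILLIC_RE.finditer(text)]


if __name__ == "__main__":
    # The example from this report: "ń" was replaced by Cyrillic "нь".
    print(find_cyrillic("Dzieнь dobry"))
    # Correct Polish text contains no Cyrillic runs.
    print(find_cyrillic("Dzień dobry"))
```

Running this over the text of each transcribed segment would show which segments are affected and at which character offset, which could make it easier to compare the regular and batched pipelines on the same audio.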
