Cyrillic letters in Polish transcription #991

Open
Venzon opened this issue Sep 5, 2024 · 0 comments

Venzon commented Sep 5, 2024

Hi,

I ran into an issue when transcribing Polish audio with the large-v3 model. Occasionally, Cyrillic letters appear in the transcription, even though the text should be entirely in Polish. For example, instead of "Dzień dobry", the output contains "Dzieнь dobry", where "ń" is replaced by the Cyrillic characters "нь".

I noticed that this problem does not occur when using the newly implemented BatchedInferencePipeline with the same options.
Tested on master and 1.0.3.
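
In case it helps with reproducing, here is a rough sketch of how affected output could be flagged automatically. It only uses the Python standard library; `find_cyrillic` is a hypothetical helper name of my own, not part of any transcription library, and it assumes the transcribed text is already available as a plain string.

```python
import unicodedata


def find_cyrillic(text):
    """Return (index, character) pairs for every Cyrillic character in text."""
    hits = []
    for i, ch in enumerate(text):
        # unicodedata.name() returns the default "" for unnamed code points,
        # so control characters and the like are simply skipped.
        if "CYRILLIC" in unicodedata.name(ch, ""):
            hits.append((i, ch))
    return hits


# The example from this report: "ń" came out as the Cyrillic pair "нь".
print(find_cyrillic("Dzieнь dobry"))  # [(4, 'н'), (5, 'ь')]
print(find_cyrillic("Dzień dobry"))   # []
```

Running this over each segment's text should make it easy to count how often the substitution happens in a longer recording.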
