TTS generates only noise for new languages #18

I'm fine-tuning MiraTTS on an Indian language. Training completes successfully, but inference produces only random, noisy audio instead of intelligible speech. I have verified that the prompts, codec, context, and semantic tokens are correct, so the problem seems to be that the model is not learning a proper text → semantic alignment for this language/script. What are the recommended approaches to fix this (e.g., phoneme-based training, tokenizer modifications, or text preprocessing), and what common pitfalls can cause noise-only output even though training completes without errors?
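To make the text-preprocessing part of the question concrete, here is a minimal sketch of a text-side sanity check, not anything taken from MiraTTS itself. It NFC-normalizes the training text and reports how many characters fall outside a known character set. `vocab_chars` is a hypothetical stand-in for whatever characters the tokenizer is known to handle well, and character coverage is only a rough proxy if the real tokenizer is subword-based.

```python
import unicodedata
from collections import Counter


def normalize_text(text: str) -> str:
    """NFC-normalize and collapse runs of whitespace to single spaces."""
    text = unicodedata.normalize("NFC", text)
    return " ".join(text.split())


def coverage_report(texts, vocab_chars):
    """Return (covered_fraction, missing_counts) over non-space characters.

    vocab_chars is an assumed set of characters the tokenizer handles;
    it is a placeholder, not something read from a real model.
    """
    missing = Counter()
    total = 0
    for t in texts:
        for ch in normalize_text(t):
            if ch.isspace():
                continue
            total += 1
            if ch not in vocab_chars:
                missing[ch] += 1
    if total == 0:
        return 1.0, missing
    return 1.0 - sum(missing.values()) / total, missing


if __name__ == "__main__":
    # Hypothetical example: a Latin-only character set against Devanagari text.
    covered, missing = coverage_report(["नमस्ते दुनिया"], set("abcdefghijklmnopqrstuvwxyz"))
    print(f"covered fraction: {covered:.2f}")
    print("most common missing characters:", missing.most_common(5))
```

A low covered fraction on the fine-tuning transcripts would point toward the tokenizer or preprocessing side rather than the codec side, though it would not by itself confirm the cause.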
