
VieNeu-TTS


VieNeu-TTS is an advanced on-device Vietnamese Text-to-Speech (TTS) model with instant voice cloning.

Trained on ~1000 hours of high-quality Vietnamese speech, this model represents a significant upgrade from VieNeu-TTS-140h with the following improvements:

  • Enhanced pronunciation: More accurate and stable Vietnamese pronunciation
  • Code-switching support: Seamless transitions between Vietnamese and English
  • Better voice cloning: Higher fidelity and speaker consistency
  • Real-time synthesis: 24 kHz waveform generation on CPU or GPU
  • Multiple model formats: Support for PyTorch, GGUF Q4/Q8 (CPU optimized), and ONNX codec

VieNeu-TTS-1000h delivers production-ready speech synthesis fully offline.

Author: Phạm Nguyễn Ngọc Bảo

This fork has been modified to run CPU-only on the Orange Pi 5 and tested there; all GPU configuration and GPU models have been removed.


🔬 Model Overview

  • Backbone: Qwen 0.5B LLM (chat template)
  • Audio codec: NeuCodec (torch implementation; ONNX & quantized variants supported)
  • Context window: 2,048 tokens shared by prompt text and speech tokens
  • Output watermark: Enabled by default
  • Training data:
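Because the 2,048-token window is shared, the prompt text (including the reference transcript), the reference clip's speech tokens and the generated speech tokens all compete for the same budget, which is why long inputs are split into chunks. The arithmetic below is a rough sketch: it assumes NeuCodec emits about 50 speech tokens per second of audio (check this against the codec version you use), and the `max_output_seconds` helper and token counts are illustrative, not taken from this repository.

```python
# Rough context-budget arithmetic for the shared 2,048-token window.
# ASSUMPTION: NeuCodec produces ~50 speech tokens per second of audio.
CONTEXT_TOKENS = 2048
CODEC_TOKENS_PER_SECOND = 50


def max_output_seconds(text_tokens: int, ref_audio_seconds: float,
                       context: int = CONTEXT_TOKENS,
                       rate: int = CODEC_TOKENS_PER_SECOND) -> float:
    """Estimate how many seconds of speech still fit after the prompt."""
    ref_speech_tokens = int(ref_audio_seconds * rate)  # reference clip as codec tokens
    remaining = context - text_tokens - ref_speech_tokens
    return max(remaining, 0) / rate  # never report a negative duration


if __name__ == "__main__":
    # 300 prompt text tokens plus a 5-second reference clip (250 speech tokens)
    print(f"{max_output_seconds(300, 5.0):.2f} s")  # -> 29.96 s
```

Chat-template and special tokens consume a little more of the window, so treat the result as an upper bound rather than an exact limit.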

Model Variants

| Model | Format | Device | Quality | Speed | Streaming |
|---|---|---|---|---|---|
| VieNeu-TTS-q8-gguf | GGUF Q8 | CPU/GPU | ⭐⭐⭐⭐ | Fast | Yes |
| VieNeu-TTS-q4-gguf | GGUF Q4 | CPU/GPU | ⭐⭐⭐ | Very Fast | Yes |

Recommendations:

  • CPU users: Use VieNeu-TTS-q4-gguf for fastest inference or VieNeu-TTS-q8-gguf for better quality
  • Streaming: Only GGUF models support streaming inference
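Both GGUF variants run through llama-cpp-python, the CPU backend this fork installs. The sketch below covers only the backbone-loading step and assumes the GGUF file holds the Qwen backbone; the file path and the `llama_kwargs`/`load_backbone` helpers are hypothetical, and phonemization and NeuCodec decoding still happen elsewhere in the pipeline.

```python
# Sketch: build arguments for loading a GGUF backbone with llama-cpp-python.
# ASSUMPTIONS: the .gguf path is hypothetical and the thread count is a
# starting point to tune; neither comes from this repository.
import os
from typing import Optional


def llama_kwargs(model_path: str, n_threads: Optional[int] = None) -> dict:
    """Constructor arguments for llama_cpp.Llama on a CPU-only board."""
    return {
        "model_path": model_path,
        "n_ctx": 2048,          # the model's shared context window
        "n_threads": n_threads or os.cpu_count() or 1,
        "n_gpu_layers": 0,      # keep every layer on the CPU
        "verbose": False,
    }


def load_backbone(model_path: str, n_threads: Optional[int] = None):
    """Load the model; llama_cpp is imported lazily so llama_kwargs works without it."""
    from llama_cpp import Llama
    return Llama(**llama_kwargs(model_path, n_threads))


if __name__ == "__main__":
    # Hypothetical file name: use the Q4 file for speed or the Q8 file for quality.
    print(llama_kwargs("models/VieNeu-TTS-q4.gguf", n_threads=4))
```

The Orange Pi 5's RK3588 has four Cortex-A76 and four Cortex-A55 cores, so comparing `n_threads=4` against the default of all cores is a reasonable first experiment; measure on your board rather than assume either is faster.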

🏁 Getting Started

1. Clone the repository

git clone https://github.com/thanhtantran/VieNeu-TTS.git
cd VieNeu-TTS

2. Install eSpeak NG (required by phonemizer)

Install it with apt:

sudo apt install espeak-ng

On the Orange Pi, the eSpeak NG shared library is installed at /usr/lib/aarch64-linux-gnu/libespeak-ng.so.1, so I have changed every library path in the code to point to this file.
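If the library lives somewhere else on your board, an alternative to editing hard-coded paths is to point phonemizer at it through the `PHONEMIZER_ESPEAK_LIBRARY` environment variable, which phonemizer reads when loading its eSpeak backend. This is a hedged sketch: the `resolve_espeak_library` and `export_for_phonemizer` helpers are hypothetical and not part of this repository.

```python
# Sketch: locate the eSpeak NG shared library and expose it to phonemizer.
# ASSUMPTION: an alternative to hard-coded paths; helper names are hypothetical.
import os
from pathlib import Path
from typing import Iterable, Optional

ORANGE_PI_ESPEAK = "/usr/lib/aarch64-linux-gnu/libespeak-ng.so.1"


def resolve_espeak_library(candidates: Iterable[str] = (ORANGE_PI_ESPEAK,)) -> Optional[str]:
    """Return the first candidate path that exists as a file, else None."""
    for path in candidates:
        if Path(path).is_file():
            return str(path)
    return None


def export_for_phonemizer(path: Optional[str]) -> Optional[str]:
    """Set PHONEMIZER_ESPEAK_LIBRARY so phonemizer loads this libespeak-ng."""
    if path:
        os.environ["PHONEMIZER_ESPEAK_LIBRARY"] = path
    return path


if __name__ == "__main__":
    lib = export_for_phonemizer(resolve_espeak_library())
    print(lib or "libespeak-ng not found: run `sudo apt install espeak-ng` first")
```

The variable must be set before phonemizer initializes its eSpeak backend. phonemizer also provides `EspeakWrapper.set_library()` in `phonemizer.backend.espeak.wrapper` for the same purpose.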

3. Install Python dependencies (Python ≥ 3.12)

Don't worry about your current Python version: uv will create a virtual environment with Python 3.12.

This is a lightweight, CPU-only installation with llama-cpp-python and standard PyTorch.

uv sync
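After `uv sync`, a quick check that the environment has the expected interpreter and packages can save a failed first launch. This is a minimal sketch: the module list assumes the usual import names for llama-cpp-python (`llama_cpp`), PyTorch, phonemizer and Gradio, and the helper names are hypothetical. Run it inside the environment, for example with `uv run python check_env.py` (hypothetical file name).

```python
# Sketch: sanity-check the environment created by `uv sync`.
# ASSUMPTION: these are the import names of the dependencies mentioned above.
import importlib.util
import sys

REQUIRED_PYTHON = (3, 12)
MODULES = ("llama_cpp", "torch", "phonemizer", "gradio")


def python_ok(version=sys.version_info[:2], required=REQUIRED_PYTHON) -> bool:
    """True if the (major, minor) version meets the requirement."""
    return tuple(version) >= required


def missing_modules(names=MODULES) -> list:
    """Return the top-level modules that cannot be found (without importing them)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    status = "OK" if python_ok() else "(needs >= %d.%d)" % REQUIRED_PYTHON
    print("Python", ".".join(map(str, sys.version_info[:3])), status)
    missing = missing_modules()
    print("Missing modules:", ", ".join(missing) if missing else "none")
```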

4. Run the Application

Start the Gradio interface:

uv run gradio_app.py

Then access the Web UI at http://127.0.0.1:7860.

🐋 Docker Deployment

For a quick start or production deployment without manually installing dependencies, use Docker.

Quick Start

Copy .env.example to .env

cp .env.example .env

Build (if needed) and start the container using my prebuilt image:

docker compose up -d

Access the Web UI at http://localhost:7860.
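`docker compose up -d` returns as soon as the container starts, so the page may not respond while the models are still loading. A small polling helper, sketched below under the assumption that the UI is served at the URL above, waits until it answers; the `wait_for_ui` name and the default timings are hypothetical and should be tuned for your board.

```python
# Sketch: poll the Web UI until it answers HTTP requests or a timeout expires.
# ASSUMPTION: URL and timings are illustrative defaults, not repository settings.
import http.client
import time
import urllib.request


def wait_for_ui(url: str = "http://localhost:7860", timeout: float = 300.0,
                interval: float = 2.0) -> bool:
    """Return True once `url` responds successfully, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # urlopen raises HTTPError (an OSError) for 4xx/5xx responses
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status < 400:
                    return True
        except (OSError, http.client.HTTPException):
            pass  # server not reachable yet; retry after a pause
        time.sleep(interval)
    return False


if __name__ == "__main__":
    print("UI ready" if wait_for_ui() else "UI not reachable; check `docker compose logs`")
```

If it returns False after the timeout, `docker compose logs` is the first place to look.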


📦 Project Structure

VieNeu-TTS/
├── examples/
│   ├── infer_long_text.py     # CLI for long-form synthesis (chunked)
│   └── sample_long_text.txt   # Example paragraph for testing
├── gradio_app.py              # Local Gradio web demo with LMDeploy support
├── main.py                    # Basic batch inference script
├── config.yaml                # Configuration for models, codecs, and voices
├── output_audio/              # Generated audio (created when running scripts)
├── sample/                    # Reference voices (audio + transcript + codes)
│   ├── Bình (nam miền Bắc).wav/txt/pt
│   ├── Đoan (nữ miền Nam).wav/txt/pt
│   ├── Dung (nữ miền Nam).wav/txt/pt
│   ├── Hương (nữ miền Bắc).wav/txt/pt
│   ├── Ly (nữ miền Bắc).wav/txt/pt
│   ├── Ngọc (nữ miền Bắc).wav/txt/pt
│   ├── Nguyên (nam miền Nam).wav/txt/pt
│   ├── Sơn (nam miền Nam).wav/txt/pt
│   ├── Tuyên (nam miền Bắc).wav/txt/pt
│   └── Vĩnh (nam miền Nam).wav/txt/pt
├── utils/
│   ├── __init__.py
│   ├── core_utils.py          # Text chunking utilities
│   ├── normalize_text.py      # Vietnamese text normalization pipeline
│   ├── phonemize_text.py      # Text to phoneme conversion
│   └── phoneme_dict.json      # Phoneme dictionary
├── vieneu_tts/
│   ├── __init__.py            # Exports VieNeuTTS and FastVieNeuTTS
│   └── vieneu_tts.py          # Core VieNeuTTS implementation (VieNeuTTS & FastVieNeuTTS)
├── README.md
├── requirements.txt           # Basic dependencies (legacy)
├── pyproject.toml             # Project configuration with full dependencies (UV)
└── uv.lock                    # UV lock file for dependency management
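`utils/core_utils.py` and `examples/infer_long_text.py` handle long-form input by chunking text so each synthesis call fits the context window. The function below is a hypothetical, simplified illustration of greedy sentence-aligned chunking, not the repository's actual logic; the `chunk_text` name and the 256-character default are assumptions.

```python
# Sketch: greedy sentence-aligned chunking for long-form synthesis.
# ASSUMPTION: a simplified stand-in for utils/core_utils.py; limits are illustrative.
import re
from typing import List

_SENTENCE_END = re.compile(r"(?<=[.!?…])\s+")  # split after sentence-final punctuation


def _split_words(sentence: str, max_chars: int) -> List[str]:
    """Pack words of an over-long sentence into pieces of at most max_chars."""
    parts, current = [], ""
    for word in sentence.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) <= max_chars or not current:
            current = candidate  # a single over-long word is kept whole
        else:
            parts.append(current)
            current = word
    if current:
        parts.append(current)
    return parts


def chunk_text(text: str, max_chars: int = 256) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters."""
    chunks, current = [], ""
    for sentence in _SENTENCE_END.split(text.strip()):
        pieces = [sentence] if len(sentence) <= max_chars else _split_words(sentence, max_chars)
        for piece in pieces:
            candidate = f"{current} {piece}".strip()
            if len(candidate) <= max_chars:
                current = candidate
            else:
                if current:
                    chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Breaking at sentence boundaries where possible keeps pauses at chunk joins sounding natural; only sentences longer than the limit are split on whitespace.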

📄 License

Apache License 2.0


📑 Citation

@misc{vieneutts2025,
  title        = {VieNeu-TTS: Vietnamese Text-to-Speech with Instant Voice Cloning},
  author       = {Pham Nguyen Ngoc Bao},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/pnnbao-ump/VieNeu-TTS}}
}

Please also cite the base model:

@misc{neuttsair2025,
  title        = {NeuTTS Air: On-Device Speech Language Model with Instant Voice Cloning},
  author       = {Neuphonic},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/neuphonic/neutts-air}}
}

🤝 Contributing

Contributions are welcome!

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Commit your changes: git commit -m "Add amazing feature"
  4. Push the branch: git push origin feature/amazing-feature
  5. Open a pull request

🙏 Acknowledgements

This project builds upon NeuTTS Air by Neuphonic. Huge thanks to the team for open-sourcing such a powerful base model.


**Made with ❤️ for the Vietnamese TTS community and Orange Pi users :)**

About

Vietnamese TTS with instant voice cloning • On-device • Real-time CPU inference • 24kHz audio quality • Optimized to run on Orange Pi
