VieNeu-TTS

VieNeu-TTS is an advanced on-device Vietnamese Text-to-Speech (TTS) model with instant voice cloning.

Trained on ~1000 hours of high-quality Vietnamese speech, this model represents a significant upgrade from VieNeu-TTS-140h with the following improvements:

Enhanced pronunciation: More accurate and stable Vietnamese pronunciation
Code-switching support: Seamless transitions between Vietnamese and English
Better voice cloning: Higher fidelity and speaker consistency
Real-time synthesis: 24 kHz waveform generation on CPU or GPU
Multiple model formats: Support for PyTorch, GGUF Q4/Q8 (CPU optimized), and ONNX codec

VieNeu-TTS-1000h delivers production-ready speech synthesis fully offline.

Author: Phạm Nguyễn Ngọc Bảo

This fork is modified to run on CPU only and has been tested on an Orange Pi 5. All GPU configuration and GPU-only models have been removed.

Demo video: https://youtu.be/BNHYnSm0O6s

🔬 Model Overview

Backbone: Qwen 0.5B LLM (chat template)
Audio codec: NeuCodec (torch implementation; ONNX & quantized variants supported)
Context window: 2,048 tokens shared by prompt text and speech tokens
Output watermark: Enabled by default
Training data:
- VieNeu-TTS-1000h — 443,641 curated Vietnamese samples
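
Because the context window is shared between prompt text and speech tokens, a longer prompt leaves less room for generated audio. The following is a rough budgeting sketch, not part of the repository. The function name is hypothetical, and the default rate of 50 speech tokens per second is an assumption that should be checked against the NeuCodec documentation.

```python
# Context window stated in the model overview: shared by prompt and speech tokens.
CONTEXT_WINDOW = 2048


def max_speech_seconds(prompt_tokens: int, tokens_per_second: float = 50.0) -> float:
    """Estimate how many seconds of audio fit in the window after the prompt.

    tokens_per_second is an ASSUMED codec rate, not a value documented here.
    prompt_tokens should count everything placed in the prompt.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    remaining = max(CONTEXT_WINDOW - prompt_tokens, 0)
    return remaining / tokens_per_second
```

Under these assumptions, a prompt that already uses most of the window leaves only a few seconds of audio, which is one reason to split long inputs into smaller pieces.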

Model Variants

Model	Format	Device	Quality	Speed	Streaming
VieNeu-TTS-q8-gguf	GGUF Q8	CPU/GPU	⭐⭐⭐⭐	Fast	✅
VieNeu-TTS-q4-gguf	GGUF Q4	CPU/GPU	⭐⭐⭐	Very Fast	✅

Recommendations:

CPU users: Use VieNeu-TTS-q4-gguf for fastest inference or VieNeu-TTS-q8-gguf for better quality
Streaming: Only GGUF models support streaming inference

🏁 Getting Started

1. Clone the repository

git clone https://github.com/thanhtantran/VieNeu-TTS.git
cd VieNeu-TTS

2. Install eSpeak NG (required by phonemizer)

Common commands:

sudo apt install espeak-ng

On Orange Pi, the espeak-ng library is located at /usr/lib/aarch64-linux-gnu/libespeak-ng.so.1, so all library paths in the code have been changed to point to this file.
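
If you run on a different board or distribution, the library may live elsewhere. The sketch below is an alternative to a hard-coded path and is not what this repository does: it looks for the shared library in a list of candidate locations and, if one is found, exports it through the PHONEMIZER_ESPEAK_LIBRARY environment variable, which phonemizer uses to locate eSpeak NG. The function names are hypothetical, and the second candidate path is only an illustrative guess for x86_64 systems.

```python
import os
from pathlib import Path
from typing import Iterable, Optional

# First entry is the Orange Pi (aarch64) path given in this README.
# The second entry is an illustrative guess, not taken from the README.
CANDIDATES = (
    "/usr/lib/aarch64-linux-gnu/libespeak-ng.so.1",
    "/usr/lib/x86_64-linux-gnu/libespeak-ng.so.1",
)


def find_espeak_library(candidates: Iterable[str] = CANDIDATES) -> Optional[str]:
    """Return the first candidate path that exists as a file, or None."""
    for path in candidates:
        if Path(path).is_file():
            return path
    return None


def configure_phonemizer(candidates: Iterable[str] = CANDIDATES) -> Optional[str]:
    """Point phonemizer at the first library found; return its path or None."""
    lib = find_espeak_library(candidates)
    if lib is not None:
        # Must be set before phonemizer's espeak backend is first used.
        os.environ["PHONEMIZER_ESPEAK_LIBRARY"] = lib
    return lib
```

Calling configure_phonemizer() before importing the TTS code and checking for a None result gives an early, readable failure when espeak-ng is not installed.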

3. Install Python dependencies (Python ≥ 3.12)

Don't worry about your current Python version: uv will create a virtual environment with Python 3.12.

The command below performs a lightweight installation with llama-cpp-python (CPU) and standard PyTorch (CPU).

uv sync

4. Run the Application

Start the Gradio interface:

uv run gradio_app.py

Then access the Web UI at http://127.0.0.1:7860.
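
On a slow board the app may take some time to load the model before the page responds. A small standard-library check like the one below can confirm that something is answering at that address. It is only a sketch: the function name is hypothetical, and it tests for any HTTP response rather than anything specific to Gradio.

```python
import urllib.error
import urllib.request


def ui_is_up(url: str = "http://127.0.0.1:7860", timeout: float = 3.0) -> bool:
    """Return True if an HTTP GET to url is answered with a status below 500."""
    # Bypass any configured proxy, since the target is a local address.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status.
        return err.code < 500
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or other network failure.
        return False


if __name__ == "__main__":
    print("Web UI reachable:", ui_is_up())
```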

🐋 Docker Deployment

For a quick start or production deployment without manually installing dependencies, use Docker.

Quick Start

Copy .env.example to .env

cp .env.example .env

Build the image or start the container from the prebuilt image:

docker compose up -d

Access the Web UI at http://localhost:7860.

📦 Project Structure

VieNeu-TTS/
├── examples/
│   ├── infer_long_text.py     # CLI for long-form synthesis (chunked)
│   └── sample_long_text.txt   # Example paragraph for testing
├── gradio_app.py              # Local Gradio web demo with LMDeploy support
├── main.py                    # Basic batch inference script
├── config.yaml                # Configuration for models, codecs, and voices
├── output_audio/              # Generated audio (created when running scripts)
├── sample/                    # Reference voices (audio + transcript + codes)
│   ├── Bình (nam miền Bắc).wav/txt/pt
│   ├── Đoan (nữ miền Nam).wav/txt/pt
│   ├── Dung (nữ miền Nam).wav/txt/pt
│   ├── Hương (nữ miền Bắc).wav/txt/pt
│   ├── Ly (nữ miền Bắc).wav/txt/pt
│   ├── Ngọc (nữ miền Bắc).wav/txt/pt
│   ├── Nguyên (nam miền Nam).wav/txt/pt
│   ├── Sơn (nam miền Nam).wav/txt/pt
│   ├── Tuyên (nam miền Bắc).wav/txt/pt
│   └── Vĩnh (nam miền Nam).wav/txt/pt
├── utils/
│   ├── __init__.py
│   ├── core_utils.py          # Text chunking utilities
│   ├── normalize_text.py      # Vietnamese text normalization pipeline
│   ├── phonemize_text.py      # Text to phoneme conversion
│   └── phoneme_dict.json      # Phoneme dictionary
├── vieneu_tts/
│   ├── __init__.py            # Exports VieNeuTTS and FastVieNeuTTS
│   └── vieneu_tts.py          # Core VieNeuTTS implementation (VieNeuTTS & FastVieNeuTTS)
├── README.md
├── requirements.txt           # Basic dependencies (legacy)
├── pyproject.toml             # Project configuration with full dependencies (UV)
└── uv.lock                    # UV lock file for dependency management
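
The tree above mentions chunked long-form synthesis (examples/infer_long_text.py) and text chunking utilities (utils/core_utils.py). As an illustration of the general idea only, and not the code in this repository, the sketch below greedily packs whole sentences into chunks under a character limit. The function name and the max_chars default are hypothetical.

```python
import re
from typing import List


def split_into_chunks(text: str, max_chars: int = 200) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?…])\s+", text) if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars or not current:
            # Either it still fits, or this is the first sentence of a chunk.
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "Xin chào. Đây là một ví dụ về văn bản dài. Mỗi đoạn được tổng hợp riêng!"
    for i, chunk in enumerate(split_into_chunks(sample, max_chars=40), start=1):
        print(i, chunk)
```

Splitting on sentence boundaries rather than at a fixed character offset avoids cutting words in half, so each chunk can be synthesized separately and the audio joined afterwards.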

📄 License

Apache License 2.0

📑 Citation

@misc{vieneutts2025,
  title        = {VieNeu-TTS: Vietnamese Text-to-Speech with Instant Voice Cloning},
  author       = {Pham Nguyen Ngoc Bao},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/pnnbao-ump/VieNeu-TTS}}
}

Please also cite the base model:

@misc{neuttsair2025,
  title        = {NeuTTS Air: On-Device Speech Language Model with Instant Voice Cloning},
  author       = {Neuphonic},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/neuphonic/neutts-air}}
}

🤝 Contributing

Contributions are welcome!

Fork the repository
Create a feature branch: git checkout -b feature/amazing-feature
Commit your changes: git commit -m "Add amazing feature"
Push the branch: git push origin feature/amazing-feature
Open a pull request

📞 Support

GitHub Issues: github.com/pnnbao97/VieNeu-TTS/issues
Hugging Face: huggingface.co/pnnbao-ump
Facebook: Phạm Nguyễn Ngọc Bảo

🙏 Acknowledgements

This project builds upon NeuTTS Air by Neuphonic. Huge thanks to the team for open-sourcing such a powerful base model.

**Made with ❤️ for the Vietnamese TTS community and Orange Pi users :)**
