VieNeu-TTS is an advanced on-device Vietnamese Text-to-Speech (TTS) model with instant voice cloning.
Trained on ~1000 hours of high-quality Vietnamese speech, this model represents a significant upgrade from VieNeu-TTS-140h with the following improvements:
- Enhanced pronunciation: More accurate and stable Vietnamese pronunciation
- Code-switching support: Seamless transitions between Vietnamese and English
- Better voice cloning: Higher fidelity and speaker consistency
- Real-time synthesis: 24 kHz waveform generation on CPU or GPU
- Multiple model formats: Support for PyTorch, GGUF Q4/Q8 (CPU optimized), and ONNX codec
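The model outputs 24 kHz audio, so saving results and checking the real-time claim on your own hardware mostly comes down to sample-rate bookkeeping. The sketch below uses only the Python standard library. `save_wav_24k` and `real_time_factor` are hypothetical helpers, not functions from this repository, and the sketch assumes the synthesized audio is available as a sequence of mono float samples in [-1.0, 1.0].

```python
import sys
import wave
from array import array

SAMPLE_RATE = 24_000  # VieNeu-TTS generates 24 kHz audio


def save_wav_24k(samples, path, sample_rate=SAMPLE_RATE):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    # Clip each sample to [-1, 1] and scale it to the signed 16-bit range.
    pcm = array("h", (round(max(-1.0, min(1.0, s)) * 32767) for s in samples))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV stores PCM samples little-endian
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 2 bytes = 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())


def real_time_factor(elapsed_s, num_samples, sample_rate=SAMPLE_RATE):
    """RTF = synthesis time / audio duration; below 1.0 is faster than real time."""
    return elapsed_s / (num_samples / sample_rate)
```

To measure your own RTF, time a synthesis call with `time.perf_counter()` and pass the elapsed seconds together with the number of generated samples; for example, 0.5 s spent producing 24,000 samples (1 s of audio) gives an RTF of 0.5.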
VieNeu-TTS-1000h delivers production-ready speech synthesis fully offline.
Author: Phạm Nguyễn Ngọc Bảo
This fork has been modified to run on CPU only and has been tested on the Orange Pi 5; all GPU configuration and GPU-only models have been removed.
Demo video: https://youtu.be/BNHYnSm0O6s
- Backbone: Qwen 0.5B LLM (chat template)
- Audio codec: NeuCodec (torch implementation; ONNX & quantized variants supported)
- Context window: 2,048 tokens shared by prompt text and speech tokens
- Output watermark: Enabled by default
- Training data:
  - VieNeu-TTS-1000h: 443,641 curated Vietnamese samples
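Because the context window is shared, the prompt text (which, in NeuTTS Air-style voice cloning, typically includes the reference transcript as well as the input text) and the speech tokens (reference codes plus generated codes) all compete for the same budget. This is why long inputs have to be split into chunks, as examples/infer_long_text.py does. The rough estimate below is a sketch only: the rate of 50 codec tokens per second is an assumption about NeuCodec, and chat-template overhead tokens are ignored.

```python
CONTEXT_WINDOW = 2048  # tokens shared by prompt text and speech tokens

# ASSUMPTION: NeuCodec emits 50 speech tokens per second of audio.
CODEC_TOKENS_PER_SECOND = 50


def max_output_seconds(text_tokens, reference_speech_tokens,
                       context_window=CONTEXT_WINDOW,
                       tokens_per_second=CODEC_TOKENS_PER_SECOND):
    """Estimate how many seconds of speech still fit in the context window."""
    remaining = context_window - text_tokens - reference_speech_tokens
    return max(remaining, 0) / tokens_per_second


# A 5 s reference clip (5 * 50 = 250 speech tokens) plus 150 text tokens
# leaves 2048 - 150 - 250 = 1648 tokens, i.e. about 33 s of generated speech.
ref_tokens = 5 * CODEC_TOKENS_PER_SECOND
print(max_output_seconds(150, ref_tokens))  # prints 32.96
```

Under these assumptions, a longer reference clip directly shrinks the amount of speech that can be generated in one pass, so short, clean reference recordings leave more room for the target text.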
| Model | Format | Device | Quality | Speed | Streaming |
|---|---|---|---|---|---|
| VieNeu-TTS-q8-gguf | GGUF Q8 | CPU/GPU | ⭐⭐⭐⭐ | Fast | ✅ |
| VieNeu-TTS-q4-gguf | GGUF Q4 | CPU/GPU | ⭐⭐⭐ | Very Fast | ✅ |
Recommendations:
- CPU users: use `VieNeu-TTS-q4-gguf` for the fastest inference or `VieNeu-TTS-q8-gguf` for better quality.
- Streaming: only GGUF models support streaming inference.
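If you choose the backbone from a script, the table and recommendations can be encoded as data. `Backbone` and `pick_backbone` are hypothetical helpers, not part of the repository; the star ratings and speed ordering are copied from the table, and how the selected name is wired into config.yaml is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backbone:
    name: str
    fmt: str
    quality: int      # star rating from the table
    speed_rank: int   # 0 = fastest
    streaming: bool


# Rows mirror the model table (this CPU-only fork ships GGUF variants only).
BACKBONES = [
    Backbone("VieNeu-TTS-q8-gguf", "GGUF Q8", quality=4, speed_rank=1, streaming=True),
    Backbone("VieNeu-TTS-q4-gguf", "GGUF Q4", quality=3, speed_rank=0, streaming=True),
]


def pick_backbone(prefer="speed", need_streaming=False):
    """Return the model name that best matches the stated preference."""
    candidates = [b for b in BACKBONES if b.streaming or not need_streaming]
    if not candidates:
        raise ValueError("no backbone satisfies the streaming requirement")
    if prefer == "speed":
        return min(candidates, key=lambda b: b.speed_rank).name
    if prefer == "quality":
        return max(candidates, key=lambda b: b.quality).name
    raise ValueError(f"unknown preference: {prefer!r}")
```

For example, `pick_backbone("speed")` returns `VieNeu-TTS-q4-gguf` and `pick_backbone("quality")` returns `VieNeu-TTS-q8-gguf`, matching the recommendations.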
git clone https://github.com/thanhtantran/VieNeu-TTS.git
cd VieNeu-TTS
Common commands:
sudo apt install espeak-ng
On the Orange Pi, the espeak-ng library is located at /usr/lib/aarch64-linux-gnu/libespeak-ng.so.1, so I have changed every reference to the library in the code to point to this path.
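If you run the same code on another machine where the library lives elsewhere, a more portable option is to resolve the path at startup instead of hardcoding it. The sketch below is a hypothetical helper, not code from this repository. `PHONEMIZER_ESPEAK_LIBRARY` is the environment variable read by the phonemizer package; whether this project's phonemize_text.py goes through phonemizer (as NeuTTS Air does) is an assumption.

```python
import ctypes.util
import os

# Path used by this fork on the Orange Pi 5.
ORANGE_PI_ESPEAK = "/usr/lib/aarch64-linux-gnu/libespeak-ng.so.1"


def resolve_espeak_library(candidates=(ORANGE_PI_ESPEAK,), env=None):
    """Return a usable espeak-ng library path or name, or None if none is found."""
    env = os.environ if env is None else env
    # 1. Explicit override via the variable the phonemizer package reads.
    override = env.get("PHONEMIZER_ESPEAK_LIBRARY")
    if override and os.path.isfile(override):
        return override
    # 2. Known absolute paths, such as the Orange Pi location above.
    for path in candidates:
        if os.path.isfile(path):
            return path
    # 3. Fall back to the system loader search; on Linux this returns a bare
    #    name such as "libespeak-ng.so.1" (or None) rather than a full path.
    return ctypes.util.find_library("espeak-ng")
```

Assuming phonemizer is used, the result could then be passed to `EspeakWrapper.set_library(...)` from `phonemizer.backend.espeak.wrapper` before phonemizing.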
Don't worry about your current Python version: uv will create a virtual environment with Python 3.12.
Lightweight installation with llama-cpp-python (CPU) and standard PyTorch (CPU).
uv sync
Start the Gradio interface:
uv run gradio_app.py
Then open the Web UI at http://127.0.0.1:7860.
For a quick start or production deployment without manually installing dependencies, use Docker.
Copy .env.example to .env:
cp .env.example .env
Build or start the container with my prebuilt image:
docker compose up -d
Then open the Web UI at http://localhost:7860.
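When scripting against the container (for example from a systemd unit or a test script), it helps to wait until the Web UI actually answers, since the container may take a while before Gradio starts listening. `wait_for_ui` is a hypothetical standard-library helper; the URL and timeouts are assumptions to adjust.

```python
import time
import urllib.error
import urllib.request


def wait_for_ui(url="http://localhost:7860", timeout_s=120.0, interval_s=2.0):
    """Poll url until the server responds; return True if it did, else False."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval_s):
                return True  # got an HTTP response, the server is up
        except urllib.error.HTTPError:
            return True      # an HTTP error status still means the server answered
        except OSError:
            pass             # connection refused, timeout, DNS failure, ...
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

A typical use is `docker compose up -d` followed by `wait_for_ui()` before sending any requests.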
VieNeu-TTS/
├── examples/
│ ├── infer_long_text.py # CLI for long-form synthesis (chunked)
│ └── sample_long_text.txt # Example paragraph for testing
├── gradio_app.py # Local Gradio web demo with LMDeploy support
├── main.py # Basic batch inference script
├── config.yaml # Configuration for models, codecs, and voices
├── output_audio/ # Generated audio (created when running scripts)
├── sample/ # Reference voices (audio + transcript + codes)
│ ├── Bình (nam miền Bắc).wav/txt/pt
│ ├── Đoan (nữ miền Nam).wav/txt/pt
│ ├── Dung (nữ miền Nam).wav/txt/pt
│ ├── Hương (nữ miền Bắc).wav/txt/pt
│ ├── Ly (nữ miền Bắc).wav/txt/pt
│ ├── Ngọc (nữ miền Bắc).wav/txt/pt
│ ├── Nguyên (nam miền Nam).wav/txt/pt
│ ├── Sơn (nam miền Nam).wav/txt/pt
│ ├── Tuyên (nam miền Bắc).wav/txt/pt
│ └── Vĩnh (nam miền Nam).wav/txt/pt
├── utils/
│ ├── __init__.py
│ ├── core_utils.py # Text chunking utilities
│ ├── normalize_text.py # Vietnamese text normalization pipeline
│ ├── phonemize_text.py # Text to phoneme conversion
│ └── phoneme_dict.json # Phoneme dictionary
├── vieneu_tts/
│ ├── __init__.py # Exports VieNeuTTS and FastVieNeuTTS
│ └── vieneu_tts.py # Core VieNeuTTS implementation (VieNeuTTS & FastVieNeuTTS)
├── README.md
├── requirements.txt # Basic dependencies (legacy)
├── pyproject.toml # Project configuration with full dependencies (uv)
└── uv.lock # uv lock file for dependency management
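Each reference voice in sample/ is stored as a triple of files sharing one base name (audio, transcript and codes). A minimal discovery sketch, assuming the .wav, .txt and .pt files sit side by side exactly as in the tree above; `list_reference_voices` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def list_reference_voices(sample_dir="sample"):
    """Map each voice name to its .wav/.txt/.pt paths, skipping incomplete sets."""
    voices = {}
    for wav in sorted(Path(sample_dir).glob("*.wav")):
        txt = wav.with_suffix(".txt")  # transcript of the reference audio
        pt = wav.with_suffix(".pt")    # pre-computed codes
        if txt.is_file() and pt.is_file():
            # e.g. "Bình (nam miền Bắc)" -> {"wav": ..., "txt": ..., "pt": ...}
            voices[wav.stem] = {"wav": wav, "txt": txt, "pt": pt}
    return voices
```

The transcript can be read with `voices[name]["txt"].read_text(encoding="utf-8")`. The .pt files presumably hold pre-encoded codec codes that would be loaded with `torch.load`; this sketch deliberately leaves that step out.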
- GitHub Repository
- Hugging Face Model Card
- NeuTTS Air base model
- Fine-tuning guide
- VieNeuCodec dataset
Apache License 2.0
@misc{vieneutts2025,
title = {VieNeu-TTS: Vietnamese Text-to-Speech with Instant Voice Cloning},
author = {Pham Nguyen Ngoc Bao},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/pnnbao-ump/VieNeu-TTS}}
}
Please also cite the base model:
@misc{neuttsair2025,
title = {NeuTTS Air: On-Device Speech Language Model with Instant Voice Cloning},
author = {Neuphonic},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/neuphonic/neutts-air}}
}
Contributions are welcome!
- Fork the repository
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Commit your changes: `git commit -m "Add amazing feature"`
- Push the branch: `git push origin feature/amazing-feature`
- Open a pull request
- GitHub Issues: github.com/pnnbao97/VieNeu-TTS/issues
- Hugging Face: huggingface.co/pnnbao-ump
- Facebook: Phạm Nguyễn Ngọc Bảo
This project builds upon NeuTTS Air by Neuphonic. Huge thanks to the team for open-sourcing such a powerful base model.
**Made with ❤️ for the Vietnamese TTS community and Orange Pi users :)**