Our project aims to develop a locally run, multilingual conversational AI that combines several technologies to support mental health: speech-to-text (STT), a Chinese large language model (LLM), and text-to-speech (TTS). We have also incorporated Differentiable Digital Signal Processing - Singing Voice Conversion (DDSP-SVC), which lets the AI transform voice characteristics and respond in a comforting, familiar voice. Our model, fine-tuned on Chinese psychotherapy datasets, can provide personalized counseling services.
- Speech Recording or Text Input: Users can record their speech via the web interface or input text through the message box.
- Speech-to-Text: Whisper converts the recorded speech to text.
- Generating System Response: The converted text is passed to the fine-tuned Chinese LLM, which generates a response.
- Text-to-Speech: Bark converts the LLM-generated response into human-like speech.
- Personalized Voice Training and Cloning: Users can train their desired voice with DDSP-SVC through the interface and use it to replace the system's default voice.
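The steps above can be sketched as a small pipeline object. This is only an illustration of the data flow, not the project's actual code: the `VoicePipeline` class, its field names, and the `respond` method are hypothetical, and each stage is passed in as a plain callable so that Whisper, the fine-tuned LLM, Bark, and DDSP-SVC can be wired in however the real implementation does it.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class VoicePipeline:
    """Hypothetical wrapper that chains the stages listed above."""

    transcribe: Callable[[bytes], str]   # STT stage (Whisper in this project)
    generate: Callable[[str], str]       # fine-tuned Chinese LLM
    synthesize: Callable[[str], bytes]   # TTS stage (Bark in this project)
    # Optional voice conversion stage (DDSP-SVC); skipped when not set.
    convert_voice: Optional[Callable[[bytes], bytes]] = None

    def respond(
        self, text: Optional[str] = None, audio: Optional[bytes] = None
    ) -> Tuple[str, bytes]:
        """Return the reply text and the spoken reply for one user turn."""
        if text is None:
            if audio is None:
                raise ValueError("provide either text or recorded audio")
            # Recorded speech is transcribed first; typed text skips this step.
            text = self.transcribe(audio)
        reply = self.generate(text)
        speech = self.synthesize(reply)
        if self.convert_voice is not None:
            # Replace the default TTS voice with the user-trained voice.
            speech = self.convert_voice(speech)
        return reply, speech
```

Keeping each stage behind a callable means the voice conversion step stays optional, and any stage can be replaced with a stub when testing the flow without loading the models.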
Python 3.8 is the recommended version; other versions may produce Gradio error messages.
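A quick way to confirm the interpreter before launching is a small version check. This is a minimal sketch, not part of the project: the helper name `matches_recommended` is hypothetical, and the only value taken from this README is the recommended version 3.8.

```python
import sys

# Recommended (major, minor) version, taken from the note above.
RECOMMENDED = (3, 8)


def matches_recommended(version_info=sys.version_info, recommended=RECOMMENDED):
    """Return True when the major and minor version equal the recommended pair."""
    return tuple(version_info[:2]) == recommended


if not matches_recommended():
    # Only warn; other versions may still work but are not the recommended one.
    print(
        f"Warning: running Python {sys.version_info[0]}.{sys.version_info[1]}, "
        f"but {RECOMMENDED[0]}.{RECOMMENDED[1]} is recommended.",
        file=sys.stderr,
    )
```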
We fine-tuned the model on our own mental health dataset. The fine-tuned model can be downloaded here: https://huggingface.co/huangyt/module_v7
cd taiwan_llama/module
git lfs install
git clone https://huggingface.co/huangyt/module_v7
python launch.py