AIMouto is a 3D virtual sister chatbot that combines VRM (Virtual Reality Model) avatar technology with large language models (LLMs) to create an interactive character that expresses emotions through facial expressions, animations, and speech.
- Custom VRM model integration
- Real-time facial expressions and animations
- Eye tracking and camera following system
- Lip sync with speech output
- Five distinct emotional states:
  - Neutral: Default state
  - Joy: Happy expressions
  - Angry: Upset expressions
  - Sorrow: Sad expressions
  - Fun: Playful expressions
- Dynamic emotion intensity control using [face:intensity:emotion] format
- Smooth animation transitions between emotional states
- Text-based input/output interface
- Text-to-Speech (TTS) integration
- Real-time lip synchronization
- Natural conversation flow with context awareness
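The `[face:intensity:emotion]` tag could be parsed along these lines. This is a minimal sketch, assuming the tag is embedded in the LLM's reply text; the function name and regex here are illustrative, not the actual code in `main.js`:

```javascript
// The five emotion names from the states listed above.
const EMOTIONS = new Set(['neutral', 'joy', 'angry', 'sorrow', 'fun']);

// Hypothetical parser: extract the [face:intensity:emotion] tag from a reply.
function parseFaceTag(reply) {
  const match = reply.match(/\[face:(\d*\.?\d+):(\w+)\]/);
  if (!match) return { text: reply, emotion: 'neutral', intensity: 1 };
  // Clamp the intensity to [0, 1] and fall back to neutral on unknown names.
  const intensity = Math.min(Math.max(parseFloat(match[1]), 0), 1);
  const emotion = EMOTIONS.has(match[2]) ? match[2] : 'neutral';
  // Strip the tag so only the speakable text remains for TTS.
  return { text: reply.replace(match[0], '').trim(), emotion, intensity };
}
```

For example, `parseFaceTag('[face:0.8:joy] Okay!')` yields the text `'Okay!'` with emotion `'joy'` at intensity `0.8`, which can then drive the expression weights and the TTS output separately.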
- Three.js for 3D rendering
- @pixiv/three-vrm for VRM model handling
- Web Speech API for TTS functionality
- Google's Gemini API for natural language processing
- Vercel for serverless backend deployment
Vercel provides the serverless infrastructure for AIMouto's backend API:
- Automatic deployments from Git
- Serverless API endpoints in the `/api` directory
- Zero-configuration edge network deployment
- Built-in development environment via `vercel dev`
- Seamless integration with frontend assets
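A serverless endpoint in `/api` might look like the following sketch. The file name `chat.js`, the request shape, and the placeholder echo reply are assumptions; the real handler forwards the message to the Gemini API:

```javascript
// Hypothetical /api/chat.js — Vercel invokes the exported function per request.
async function handler(req, res) {
  if (req.method !== 'POST') {
    return res.status(405).json({ error: 'Method not allowed' });
  }
  const { message } = req.body ?? {};
  if (!message) {
    return res.status(400).json({ error: 'Missing "message"' });
  }
  // Placeholder reply: the real implementation would call the Gemini API
  // here using process.env.GEMINI_API_KEY and process.env.GEMINI_MODEL_NAME.
  return res.status(200).json({ reply: `Echo: ${message}` });
}

module.exports = handler;
```

Keeping the API key in an environment variable on the serverless side means it is never shipped to the browser with the frontend assets.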
```
/aimouto
├── api/                    # Backend API handlers
├── assets/                 # Static assets
│   ├── anims/              # Animation files
│   └── models/             # VRM models
├── main.js                 # Main application logic
├── loadMixamoAnimation.js  # Animation loader
└── mixamoVRMRigMap.js      # VRM rigging mappings
```
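The rig map translates Mixamo bone names to VRM humanoid bone names so Mixamo animations can be retargeted onto the model. A few entries might look like the sketch below; the actual `mixamoVRMRigMap.js` covers the full skeleton, and the helper function is hypothetical:

```javascript
// Hypothetical excerpt: Mixamo rig bone name → VRM humanoid bone name.
const mixamoVRMRigMap = {
  mixamorigHips: 'hips',
  mixamorigSpine: 'spine',
  mixamorigHead: 'head',
  mixamorigLeftArm: 'leftUpperArm',
  mixamorigRightArm: 'rightUpperArm',
};

// Retarget a Mixamo track name like 'mixamorigHips.quaternion' to a VRM bone.
function vrmBoneFor(mixamoTrackName) {
  const [boneName] = mixamoTrackName.split('.');
  return mixamoVRMRigMap[boneName] ?? null;
}
```

`loadMixamoAnimation.js` would use a mapping like this to rename each animation track to the corresponding VRM humanoid bone before playing it through a three.js `AnimationMixer`.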
- Clone the repository
- Copy `.env.example` to `.env` and set your API keys:

  ```
  GEMINI_API_KEY=your_api_key
  GEMINI_MODEL_NAME=gemini-2.0-flash
  ```

- Install dependencies:

  ```
  cd api
  npm install
  ```

- Start the development server:

  ```
  vercel dev
  ```

  This starts the Vercel development environment for the backend API.

- In a new terminal, serve the root directory with a static web server
- VRM model creation and import
- Emotion types and weight implementation
- Neutral idle animation
- Text input/output integration
- TTS implementation
- Basic lip sync during speech
- Camera gaze following
- Voice synthesis output
- Advanced voice synthesis model integration
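The basic lip-sync item above can be sketched as mapping each audio frame's loudness to a mouth-open weight. This is a sketch only: the gain factor of 4 is a tuning assumption, and feeding the result to the `'aa'` expression preset (the mouth-open shape exposed by `@pixiv/three-vrm`) is how such a weight would typically be applied:

```javascript
// Map an audio frame (samples in [-1, 1]) to a mouth-open weight in [0, 1].
// The result would be applied each animation frame, e.g. via
// vrm.expressionManager.setValue('aa', weight).
function mouthWeightFromFrame(samples) {
  let sum = 0;
  for (const s of samples) sum += s * s;
  const rms = Math.sqrt(sum / samples.length); // root-mean-square loudness
  return Math.min(rms * 4, 1); // gain of 4 is a tuning assumption
}
```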
MIT License