An intelligent chatbot assistant for Wuthering Waves that provides character builds, team compositions, and gameplay strategies using Retrieval Augmented Generation (RAG) and Large Language Models (LLMs).
- How It Works
- Architecture Overview
- RAG System Explained
- Features
- Tech Stack
- Installation
- Usage
- Data Management
- Project Structure
Traditional chatbots hallucinate or provide outdated information because they rely solely on their training data. For a game like Wuthering Waves with frequent updates, patches, and new characters, we need real-time, accurate, source-attributed information.
This project implements a RAG pipeline that combines:
- Semantic Search - Find relevant character data based on user queries
- Context Injection - Feed retrieved data to the LLM
- Grounded Generation - LLM generates answers based on actual game data
Flow:

```
User Query → Embedding → Vector Search → Retrieve Top-K Docs →
Inject into Prompt → LLM Generation → Response with Sources
```
```
┌──────────────────┐
│  User Interface  │  (Streamlit Web App)
│  - Chat Input    │
│  - History       │
└────────┬─────────┘
         │
         ▼
┌──────────────────────────────────────────────┐
│            RAG Engine (LangChain)            │
├──────────────────────────────────────────────┤
│ 1. Query Embedding                           │
│    └── OpenAI Embeddings API                 │
│                                              │
│ 2. Vector Similarity Search                  │
│    └── ChromaDB (Local Vector Store)         │
│                                              │
│ 3. Context Retrieval                         │
│    └── Top-K Most Relevant Characters        │
│                                              │
│ 4. Prompt Engineering                        │
│    └── System Prompt + Context + Query       │
│                                              │
│ 5. LLM Generation                            │
│    └── OpenAI GPT-4 API                      │
└──────────────────────────────────────────────┘
         │
         ▼
┌──────────────────┐
│     Response     │  (Answer + Source Attribution)
└──────────────────┘
```
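The five stages above can be sketched as a minimal pipeline. Everything here is illustrative: the class and function names are hypothetical, and the embedding, search, and LLM calls are injected as plain functions so the sketch stays runnable without API keys.

```python
from typing import Callable, Dict, List


class RAGPipeline:
    """Illustrative sketch of the flow: embed -> search -> retrieve -> prompt -> generate."""

    def __init__(self, embed_fn: Callable, search_fn: Callable, llm_fn: Callable):
        self.embed_fn = embed_fn    # stage 1, e.g. an OpenAI embeddings call
        self.search_fn = search_fn  # stages 2-3, e.g. a ChromaDB similarity search
        self.llm_fn = llm_fn        # stage 5, e.g. a GPT-4 chat completion

    def answer(self, query: str, top_k: int = 3):
        vector = self.embed_fn(query)                     # 1. Query Embedding
        docs: List[Dict] = self.search_fn(vector, top_k)  # 2-3. Search + Retrieval
        context = "\n".join(d["text"] for d in docs)      # 4. Prompt Engineering
        prompt = f"Answer ONLY from this data:\n{context}\n\nQuestion: {query}"
        sources = [d["name"] for d in docs]               # for source attribution
        return self.llm_fn(prompt), sources               # 5. LLM Generation


# Wiring it up with stand-in functions in place of real API clients:
docs = [{"name": "Jiyan", "text": "Jiyan: Aero Main DPS, Sierra Gale set."}]
rag = RAGPipeline(
    embed_fn=lambda q: [0.1, 0.2],              # stand-in for a real embedding
    search_fn=lambda v, k: docs[:k],            # stand-in for ChromaDB
    llm_fn=lambda p: "Jiyan uses Sierra Gale.", # stand-in for GPT-4
)
answer, sources = rag.answer("Best build for Jiyan?")
```

Swapping the stand-ins for real OpenAI and ChromaDB calls changes nothing about the control flow, which is the point of keeping the stages decoupled.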
**Character Data (JSON):**
- Stores structured character information (40+ characters)
- Scraped from Prydwen.gg using Playwright automation
- Auto-cleaned and formatted during scraping

**Embeddings (OpenAI):**
- Converts text into high-dimensional vectors (1536 dimensions)
- Captures semantic meaning, not just keywords
- Example: "best DPS" and "highest damage dealer" → similar vectors

**Vector Store (ChromaDB):**
- Stores character embeddings for fast similarity search
- Automatically rebuilds when data changes
- Enables sub-100ms semantic search across all characters

**RAG Orchestration (LangChain):**
- Orchestrates retrieval and generation
- Implements conversational memory (chat history)
- Handles context window management

**Generation (GPT-4):**
- Generates natural language responses
- Grounded in retrieved character data
- Provides explanations and recommendations
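Under the hood, the vector search step reduces to cosine similarity over embedding vectors. A pure-Python sketch of what a vector store does, with tiny made-up 3-d vectors standing in for real 1536-dimensional embeddings:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k(query_vec, index, k=2):
    """index: list of (name, vector) pairs. Returns the k most similar names."""
    ranked = sorted(index, key=lambda item: cosine_similarity(query_vec, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]


# Toy "embeddings" — a real store holds one 1536-d vector per character.
index = [
    ("Jiyan",  [0.9, 0.1, 0.0]),
    ("Verina", [0.1, 0.9, 0.1]),
    ("Encore", [0.5, 0.5, 0.0]),
]
print(top_k([1.0, 0.0, 0.0], index, k=2))  # → ['Jiyan', 'Encore']
```

ChromaDB performs the same ranking, just with indexing structures that keep it fast as the collection grows.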
| Approach | RAG (Our Choice) | Fine-Tuning |
|---|---|---|
| Update Speed | Instant (just update JSON) | Requires retraining |
| Cost | Low (only API calls) | High (GPU, training time) |
| Accuracy | High (uses latest data) | Outdated after training |
| Source Attribution | Yes (shows which character) | No |
| Hallucination Risk | Low (grounded in data) | Higher |
- Retrieval Before Generation: LLM only sees actual game data
- Source Attribution: Every answer cites which characters were used
- Explicit Instructions: System prompt enforces "only use provided context"
Example Query Flow:

```python
# User asks: "What's the best build for Jiyan?"

# Step 1: Embed query
query_vector = embed("What's the best build for Jiyan?")

# Step 2: Vector search in ChromaDB
results = chromadb.similarity_search(query_vector, top_k=3)
# Returns: [Jiyan data, Mortefi data, Verina data]

# Step 3: Build context
context = """
Character: Jiyan
Element: Aero
Best Echo Set: Sierra Gale (5pc)
Main Stats: Crit Rate/Crit DMG, ATK%, Aero DMG
Best Weapons: Verdant Summit, Emerald of Genesis
...
"""

# Step 4: Inject into prompt
prompt = f"""
You are a Wuthering Waves expert. Answer based on this data:
{context}
User question: What's the best build for Jiyan?
"""

# Step 5: LLM generates grounded response
response = gpt4(prompt)
# Output: "For Jiyan, the optimal build uses Sierra Gale 5pc set..."
```

The system maintains chat history using LangChain's ConversationBufferMemory:
```
# First question
User: "What's the best build for Jiyan?"
AI: "Jiyan works best with Sierra Gale 5pc, Crit Rate/DMG stats..."

# Follow-up question (context aware!)
User: "What weapons should I use for him?"
AI: "For Jiyan, the best weapons are Verdant Summit or Emerald of Genesis..."
```

Memory stores:
- Last 5-10 conversation turns
- Automatically summarizes if context gets too long
- Enables natural back-and-forth dialogue
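The windowing behavior can be sketched in plain Python. This is an illustrative stand-in, not the project's actual code: LangChain provides this through its memory classes, and `max_turns` here is a hypothetical parameter.

```python
from collections import deque


class WindowedMemory:
    """Keeps only the most recent conversation turns, buffer-window style."""

    def __init__(self, max_turns: int = 5):
        self.turns = deque(maxlen=max_turns)  # oldest turns fall off automatically

    def add_turn(self, user_msg: str, ai_msg: str):
        self.turns.append((user_msg, ai_msg))

    def as_context(self) -> str:
        """Render the retained history for injection into the next prompt."""
        return "\n".join(f"User: {u}\nAI: {a}" for u, a in self.turns)


memory = WindowedMemory(max_turns=2)
memory.add_turn("What's the best build for Jiyan?", "Sierra Gale 5pc...")
memory.add_turn("What weapons should I use for him?", "Verdant Summit...")
memory.add_turn("What about teams?", "Mortefi and Verina...")  # evicts turn 1
```

A summarizing memory works the same way at the call sites; it just compresses evicted turns into a running summary instead of discarding them.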
- ✅ Character Build Recommendations - Echo sets, stats, weapons
- ✅ Team Composition Suggestions - Synergy analysis
- ✅ Conversational Interface - Natural language queries
- ✅ Source Attribution - Shows which characters data came from
- ✅ Semantic Search - Understands intent, not just keywords
- ✅ Auto-Scraping - Updates character data from Prydwen.gg
The system understands various ways to ask the same thing:
- "Best DPS" = "Highest damage" = "Top damage dealers"
- "Team comp" = "Team composition" = "Who works well together"
- "Build for Jiyan" = "How to build Jiyan" = "Jiyan build guide"
| Component | Technology | Purpose |
|---|---|---|
| Language | Python 3.11 | Core application |
| LLM | OpenAI GPT-4 | Natural language generation |
| Embeddings | OpenAI text-embedding-3-small | Text vectorization |
| Vector DB | ChromaDB | Semantic search |
| RAG Framework | LangChain | Orchestration & chains |
| Web Framework | Streamlit | User interface |
| Web Scraping | Playwright + BeautifulSoup4 | Data extraction |
OpenAI GPT-4:
- Best-in-class reasoning and instruction following
- Consistent output quality
- Good at understanding gaming terminology
ChromaDB:
- Lightweight, embedded (no separate server needed)
- Fast similarity search (<100ms)
- Automatic persistence
LangChain:
- Pre-built RAG chains
- Memory management
- Easy prompt templating
Streamlit:
- Rapid UI development
- Built-in chat interface
- Easy deployment
- Python 3.8 or higher
- OpenAI API key (Get one here)
- ~500MB disk space (for dependencies + embeddings)
1. Clone the repository

   ```bash
   git clone https://github.com/saaip7/wuwa-assistant.git
   cd wuwa-assistant
   ```

2. Create a virtual environment

   ```bash
   python -m venv venv

   # Windows
   venv\Scripts\activate

   # macOS/Linux
   source venv/bin/activate
   ```

3. Install dependencies

   ```bash
   pip install -r requirements.txt
   ```

4. Configure environment variables

   ```bash
   # Create .env file
   echo "OPENAI_API_KEY=your_api_key_here" > .env
   ```

   Or manually edit `.env`:

   ```
   OPENAI_API_KEY=xxxxx
   OPENAI_MODEL=gpt-4
   OPENAI_EMBEDDING_MODEL=text-embedding-3-small
   ```

5. Verify the installation

   ```bash
   python -c "from src.rag_engine import WuWaRAG; print('✓ Setup complete!')"
   ```

6. Run the app

   ```bash
   streamlit run app.py
   ```

   The app will open automatically at http://localhost:8501
Character Builds:
"What's the best build for Jiyan?"
"How should I build Roccia?"
"Optimal echo set for Calcharo?"
Team Compositions:
"Best team for Jiyan?"
"Who works well with Rover Havoc?"
"Build a team around Encore"
Comparisons:
"Compare Jiyan vs Calcharo"
"Who's better for Aero DPS: Jiyan or Aalto?"
"Difference between Verina and Baizhi?"
General Questions:
"Best Electro DPS characters?"
"Top 5 main DPS?"
"Which 4-star supports are good?"
- Type your question in the chat input
- Press Enter or click Send
- View the response with source attribution
- Ask follow-up questions - the system remembers context!
Pro Tips:
- Be specific: "Best build for Jiyan DPS" > "Jiyan"
- Ask follow-ups: "What about his weapons?" after asking about builds
- Compare characters: "Who's better for X role?"
- 40+ characters from Prydwen.gg
- Auto-updated via web scraping
- Includes: builds, weapons, teams, stats
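Before a record can be embedded, it is typically flattened into a single text chunk. A hypothetical helper showing the idea (field names follow the character schema described below; the project's actual formatting may differ):

```python
def character_to_document(c: dict) -> str:
    """Flatten one character record into a text chunk suitable for embedding."""
    return (
        f"{c['name']} ({c['rarity']} {c['element']} {c['role']}, uses {c['weapon']}). "
        f"Best echo set: {c['best_echo_set']}. "
        f"Main stats: {c['main_stats_priority']}. "
        f"Best weapons: {', '.join(c['best_weapons'])}. "
        f"Synergies: {', '.join(c['team_synergies'])}."
    )


# Example record, matching the required-fields schema shown below:
jiyan = {
    "name": "Jiyan", "rarity": "5-star", "element": "Aero", "role": "Main DPS",
    "weapon": "Broadblade", "best_echo_set": "Sierra Gale (5pc)",
    "main_stats_priority": "Crit Rate/Crit DMG, ATK%, Aero DMG",
    "best_weapons": ["Verdant Summit", "Emerald of Genesis"],
    "team_synergies": ["Mortefi", "Verina"],
}
doc = character_to_document(jiyan)
```

Keeping all of a character's build data in one chunk is what lets a single retrieval hit answer a full build question.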
Option 1: Re-scrape from Prydwen.gg

```bash
# Full scrape (all characters)
python src/scraper.py

# Validate data quality
python scripts/validate_scraped_data.py

# Import to database (merge with existing)
python scripts/import_characters.py --strategy merge

# Rebuild vector database
Remove-Item -Recurse -Force chroma_db   # Windows
rm -rf chroma_db/                       # macOS/Linux
```

Option 2: Manual Edit
```bash
# Edit data/characters.json directly
# Then restart the app (ChromaDB auto-rebuilds)
streamlit run app.py
```

Each character requires these fields:
```json
{
  "name": "Character Name",
  "element": "Aero|Electro|Fusion|Glacio|Havoc|Spectro",
  "weapon": "Broadblade|Sword|Pistols|Gauntlets|Rectifier",
  "role": "Main DPS|Sub DPS|Support|Healer",
  "rarity": "4-star|5-star",
  "best_echo_set": "Echo set recommendation",
  "main_stats_priority": "Stat priority string",
  "sub_stats_priority": "Sub stat priority string",
  "best_weapons": ["weapon1", "weapon2"],
  "team_synergies": ["character1", "character2"],
  "notes": "Additional notes"
}
```

Edit `src/rag_engine.py`:
```python
# Retrieval settings
TOP_K_RESULTS = 5           # More results = more context, but slower
SIMILARITY_THRESHOLD = 0.7  # Lower = more permissive retrieval

# LLM settings
TEMPERATURE = 0.7  # Lower = more focused, higher = more creative
MAX_TOKENS = 800   # Response length limit
```

Edit the system prompt in `src/rag_engine.py` to change AI behavior:
SYSTEM_PROMPT = """
You are a Wuthering Waves expert assistant.
Answer based ONLY on the provided character data.
Be concise, accurate, and cite your sources.
"""Contributions welcome! This is a learning project focused on:
- RAG implementation patterns
- LLM application architecture
- Web scraping automation
- Vector database usage
Feel free to open issues or PRs!
MIT License - free to use for learning purposes
- Data Source: Prydwen.gg
- Powered By: LangChain, OpenAI, ChromaDB
- Game: Wuthering Waves by Kuro Games
Built with ❤️