βš”οΈ WuWa Assistant

An intelligent chatbot assistant for Wuthering Waves that provides character builds, team compositions, and gameplay strategies using Retrieval Augmented Generation (RAG) and Large Language Models (LLMs).


🧠 How It Works

The Problem

Traditional chatbots hallucinate or provide outdated information because they rely solely on their training data. For a game like Wuthering Waves with frequent updates, patches, and new characters, we need real-time, accurate, source-attributed information.

The Solution: RAG (Retrieval Augmented Generation)

This project implements a RAG pipeline that combines:

  1. Semantic Search - Find relevant character data based on user queries
  2. Context Injection - Feed retrieved data to the LLM
  3. Grounded Generation - LLM generates answers based on actual game data

Flow:

User Query → Embedding → Vector Search → Retrieve Top-K Docs →
Inject into Prompt → LLM Generation → Response with Sources

πŸ—οΈ Architecture Overview

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  User Interface β”‚  (Streamlit Web App)
β”‚   - Chat Input  β”‚
β”‚   - History     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚
         β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚         RAG Engine (LangChain)              β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  1. Query Embedding                         β”‚
β”‚     └─→ OpenAI Embeddings API               β”‚
β”‚                                              β”‚
β”‚  2. Vector Similarity Search                β”‚
β”‚     └─→ ChromaDB (Local Vector Store)       β”‚
β”‚                                              β”‚
β”‚  3. Context Retrieval                       β”‚
β”‚     └─→ Top-K Most Relevant Characters      β”‚
β”‚                                              β”‚
β”‚  4. Prompt Engineering                      β”‚
β”‚     └─→ System Prompt + Context + Query     β”‚
β”‚                                              β”‚
β”‚  5. LLM Generation                          β”‚
β”‚     └─→ OpenAI GPT-4 API                    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚
         β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Response      β”‚  (Answer + Source Attribution)
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key Components:

1. Data Layer (data/characters.json)

  • Stores structured character information (40+ characters)
  • Scraped from Prydwen.gg using Playwright automation
  • Auto-cleaned and formatted during scraping

2. Embedding Layer (OpenAI Embeddings)

  • Converts text into high-dimensional vectors (1536 dimensions)
  • Captures semantic meaning, not just keywords
  • Example: "best DPS" and "highest damage dealer" → similar vectors
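
The "similar vectors" idea can be illustrated with cosine similarity. The three-dimensional vectors below are made-up stand-ins for the example; real embeddings from the API have 1536 dimensions:

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 means same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for embedding vectors (illustrative only).
best_dps = [0.9, 0.1, 0.2]
highest_damage = [0.85, 0.15, 0.25]   # paraphrase -> nearby vector
healer_query = [0.1, 0.9, 0.3]        # different meaning -> far away

print(cosine_similarity(best_dps, highest_damage))  # close to 1.0
print(cosine_similarity(best_dps, healer_query))    # much lower
```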

3. Vector Database (ChromaDB)

  • Stores character embeddings for fast similarity search
  • Automatically rebuilds when data changes
  • Enables sub-100ms semantic search across all characters
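
At its core, the retrieval step is a nearest-neighbour scan over stored vectors; ChromaDB adds indexing and persistence on top. The tiny store below is a hand-rolled sketch of that idea, not the ChromaDB API:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

class TinyVectorStore:
    """Minimal stand-in for a vector DB: add vectors, query top-k by cosine."""
    def __init__(self):
        self.items = []  # list of (doc_id, vector) pairs

    def add(self, doc_id, vector):
        self.items.append((doc_id, vector))

    def query(self, query_vector, top_k=3):
        # Score every stored vector, then return the k best ids.
        scored = [(cosine(query_vector, vec), doc_id) for doc_id, vec in self.items]
        scored.sort(reverse=True)
        return [doc_id for _, doc_id in scored[:top_k]]

# Illustrative 2-D vectors; real ones come from the embeddings API.
store = TinyVectorStore()
store.add("Jiyan",   [0.9, 0.1])
store.add("Verina",  [0.1, 0.9])
store.add("Mortefi", [0.7, 0.3])

print(store.query([0.8, 0.2], top_k=2))  # ['Jiyan', 'Mortefi']
```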

4. RAG Engine (src/rag_engine.py)

  • Orchestrates retrieval and generation
  • Implements conversational memory (chat history)
  • Handles context window management

5. LLM Layer (OpenAI GPT-4)

  • Generates natural language responses
  • Grounded in retrieved character data
  • Provides explanations and recommendations

πŸ” RAG System Explained

Why RAG Instead of Fine-Tuning?

Approach           | RAG (Our Choice)               | Fine-Tuning
Update Speed       | Instant (just update the JSON) | Requires retraining
Cost               | Low (API calls only)           | High (GPU, training time)
Accuracy           | High (uses latest data)        | Outdated after training
Source Attribution | Yes (shows which characters)   | No
Hallucination Risk | Low (grounded in data)         | Higher

How RAG Prevents Hallucinations

  1. Retrieval Before Generation: LLM only sees actual game data
  2. Source Attribution: Every answer cites which characters were used
  3. Explicit Instructions: System prompt enforces "only use provided context"

Example Query Flow (illustrative pseudocode):

# User asks: "What's the best build for Jiyan?"

# Step 1: Embed query
query_vector = embed("What's the best build for Jiyan?")

# Step 2: Vector search in ChromaDB
results = chromadb.similarity_search(query_vector, top_k=3)
# Returns: [Jiyan data, Mortefi data, Verina data]

# Step 3: Build context
context = f"""
Character: Jiyan
Element: Aero
Best Echo Set: Sierra Gale (5pc)
Main Stats: Crit Rate/Crit DMG, ATK%, Aero DMG
Best Weapons: Verdant Summit, Emerald of Genesis
...
"""

# Step 4: Inject into prompt
prompt = f"""
You are a Wuthering Waves expert. Answer based on this data:

{context}

User question: What's the best build for Jiyan?
"""

# Step 5: LLM generates grounded response
response = gpt4(prompt)
# Output: "For Jiyan, the optimal build uses Sierra Gale 5pc set..."

Conversational Memory

The system maintains chat history using LangChain's ConversationBufferMemory:

# First question
User: "What's the best build for Jiyan?"
AI: "Jiyan works best with Sierra Gale 5pc, Crit Rate/DMG stats..."

# Follow-up question (context aware!)
User: "What weapons should I use for him?"
AI: "For Jiyan, the best weapons are Verdant Summit or Emerald of Genesis..."

Memory stores:

  • Last 5-10 conversation turns
  • Automatically summarizes if context gets too long
  • Enables natural back-and-forth dialogue
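
LangChain's ConversationBufferMemory handles this internally. As a standalone illustration of the windowing idea only (not LangChain's actual class), a minimal buffer might look like:

```python
class ChatMemory:
    """Keeps the last `max_turns` (user, ai) exchanges for prompt context."""
    def __init__(self, max_turns=5):
        self.max_turns = max_turns
        self.turns = []

    def add_turn(self, user_msg, ai_msg):
        self.turns.append((user_msg, ai_msg))
        # Drop the oldest turns once the window is full.
        self.turns = self.turns[-self.max_turns:]

    def as_prompt(self):
        # Flatten the history into text that can be prepended to the prompt.
        lines = []
        for user_msg, ai_msg in self.turns:
            lines.append(f"User: {user_msg}")
            lines.append(f"AI: {ai_msg}")
        return "\n".join(lines)

memory = ChatMemory(max_turns=5)
memory.add_turn("What's the best build for Jiyan?",
                "Jiyan works best with Sierra Gale 5pc...")
memory.add_turn("What weapons should I use for him?",
                "Verdant Summit or Emerald of Genesis...")
print(memory.as_prompt())
```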

🎯 Features

Current Features:

  • ✅ Character Build Recommendations - Echo sets, stats, weapons
  • ✅ Team Composition Suggestions - Synergy analysis
  • ✅ Conversational Interface - Natural language queries
  • ✅ Source Attribution - Shows which characters the data came from
  • ✅ Semantic Search - Understands intent, not just keywords
  • ✅ Auto-Scraping - Updates character data from Prydwen.gg

Smart Query Understanding:

The system understands various ways to ask the same thing:

  • "Best DPS" = "Highest damage" = "Top damage dealers"
  • "Team comp" = "Team composition" = "Who works well together"
  • "Build for Jiyan" = "How to build Jiyan" = "Jiyan build guide"

πŸ› οΈ Tech Stack

Core Technologies:

Component     | Technology                    | Purpose
Language      | Python 3.11                   | Core application
LLM           | OpenAI GPT-4                  | Natural language generation
Embeddings    | OpenAI text-embedding-3-small | Text vectorization
Vector DB     | ChromaDB                      | Semantic search
RAG Framework | LangChain                     | Orchestration & chains
Web Framework | Streamlit                     | User interface
Web Scraping  | Playwright + BeautifulSoup4   | Data extraction

Why These Choices?

OpenAI GPT-4:

  • Best-in-class reasoning and instruction following
  • Consistent output quality
  • Good at understanding gaming terminology

ChromaDB:

  • Lightweight, embedded (no separate server needed)
  • Fast similarity search (<100ms)
  • Automatic persistence

LangChain:

  • Pre-built RAG chains
  • Memory management
  • Easy prompt templating

Streamlit:

  • Rapid UI development
  • Built-in chat interface
  • Easy deployment

📦 Installation

Prerequisites

  • Python 3.8 or higher (3.11 recommended)
  • OpenAI API key (from https://platform.openai.com)
  • ~500MB disk space (for dependencies + embeddings)

Setup Steps

  1. Clone the repository

    git clone https://github.com/saaip7/wuwa-assistant.git
    cd wuwa-assistant
  2. Create virtual environment

    python -m venv venv
    
    # Windows
    venv\Scripts\activate
    
    # macOS/Linux
    source venv/bin/activate
  3. Install dependencies

    pip install -r requirements.txt
  4. Configure environment variables

    # Create .env file
    echo "OPENAI_API_KEY=your_api_key_here" > .env
    
    # Or manually edit .env:
    OPENAI_API_KEY=xxxxx
    OPENAI_MODEL=gpt-4
    OPENAI_EMBEDDING_MODEL=text-embedding-3-small
  5. Verify installation

    python -c "from src.rag_engine import WuWaRAG; print('✅ Setup complete!')"

▶️ Usage

Starting the App

streamlit run app.py

The app will open automatically at http://localhost:8501

Example Queries

Character Builds:

"What's the best build for Jiyan?"
"How should I build Roccia?"
"Optimal echo set for Calcharo?"

Team Compositions:

"Best team for Jiyan?"
"Who works well with Rover Havoc?"
"Build a team around Encore"

Comparisons:

"Compare Jiyan vs Calcharo"
"Who's better for Aero DPS: Jiyan or Aalto?"
"Difference between Verina and Baizhi?"

General Questions:

"Best Electro DPS characters?"
"Top 5 main DPS?"
"Which 4-star supports are good?"

Using the Chat Interface

  1. Type your question in the chat input
  2. Press Enter or click Send
  3. View the response with source attribution
  4. Ask follow-up questions - the system remembers context!

Pro Tips:

  • Be specific: "Best build for Jiyan DPS" > "Jiyan"
  • Ask follow-ups: "What about his weapons?" after asking about builds
  • Compare characters: "Who's better for X role?"

📊 Data Management

Current Data

  • 40+ characters from Prydwen.gg
  • Auto-updated via web scraping
  • Includes: builds, weapons, teams, stats

Updating Character Data

Option 1: Re-scrape from Prydwen.gg

# Full scrape (all characters)
python src/scraper.py

# Validate data quality
python scripts/validate_scraped_data.py

# Import to database (merge with existing)
python scripts/import_characters.py --strategy merge

# Delete the vector DB (it rebuilds automatically on next app start)
Remove-Item -Recurse -Force chroma_db  # Windows (PowerShell)
rm -rf chroma_db/                      # macOS/Linux

Option 2: Manual Edit

# Edit data/characters.json directly
# Then restart the app (ChromaDB auto-rebuilds)
streamlit run app.py

Data Schema

Each character requires these fields:

{
  "name": "Character Name",
  "element": "Aero|Electro|Fusion|Glacio|Havoc|Spectro",
  "weapon": "Broadblade|Sword|Pistols|Gauntlets|Rectifier",
  "role": "Main DPS|Sub DPS|Support|Healer",
  "rarity": "4-star|5-star",
  "best_echo_set": "Echo set recommendation",
  "main_stats_priority": "Stat priority string",
  "sub_stats_priority": "Sub stat priority string",
  "best_weapons": ["weapon1", "weapon2"],
  "team_synergies": ["character1", "character2"],
  "notes": "Additional notes"
}
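
After a manual edit, a quick sanity check over the JSON could look like the snippet below. This is a hypothetical helper, not one of the repo's scripts, and it assumes data/characters.json holds a list of character objects:

```python
import json

REQUIRED_FIELDS = {
    "name", "element", "weapon", "role", "rarity",
    "best_echo_set", "main_stats_priority", "sub_stats_priority",
    "best_weapons", "team_synergies", "notes",
}

def validate_character(char):
    """Return a sorted list of required fields missing from one record."""
    return sorted(REQUIRED_FIELDS - set(char))

def validate_file(path):
    """Map character name -> missing fields; empty dict means the file passed."""
    with open(path, encoding="utf-8") as f:
        characters = json.load(f)
    problems = {}
    for char in characters:
        missing = validate_character(char)
        if missing:
            problems[char.get("name", "<unnamed>")] = missing
    return problems

# Example:
# print(validate_file("data/characters.json"))
```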

🔧 Advanced Configuration

Tuning RAG Performance

Edit src/rag_engine.py:

# Retrieval settings
TOP_K_RESULTS = 5  # More results = more context but slower
SIMILARITY_THRESHOLD = 0.7  # Lower = more permissive retrieval

# LLM settings
TEMPERATURE = 0.7  # Lower = more focused, Higher = more creative
MAX_TOKENS = 800  # Response length limit
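
How the two retrieval knobs interact can be shown with a small standalone filter (illustrative only; the real retrieval logic lives in src/rag_engine.py and LangChain):

```python
def select_context(scored_docs, top_k=5, similarity_threshold=0.7):
    """Keep at most top_k documents whose similarity clears the threshold.

    scored_docs: list of (similarity, doc) pairs, in any order.
    """
    # The threshold gates weak matches; top_k caps how many survivors are kept.
    passing = [(score, doc) for score, doc in scored_docs
               if score >= similarity_threshold]
    passing.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in passing[:top_k]]

# Hypothetical similarity scores for one query.
results = [(0.92, "Jiyan"), (0.81, "Mortefi"), (0.65, "Verina"), (0.74, "Jianxin")]
print(select_context(results, top_k=2, similarity_threshold=0.7))
# Verina is cut by the threshold, Jianxin by top_k.
```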

Custom System Prompt

Edit the system prompt in src/rag_engine.py to change AI behavior:

SYSTEM_PROMPT = """
You are a Wuthering Waves expert assistant.
Answer based ONLY on the provided character data.
Be concise, accurate, and cite your sources.
"""

🤝 Contributing

Contributions welcome! This is a learning project focused on:

  • RAG implementation patterns
  • LLM application architecture
  • Web scraping automation
  • Vector database usage

Feel free to open issues or PRs!


πŸ“ License

MIT License - free to use for learning purposes


πŸ™ Acknowledgments


πŸ“š Learn More

RAG Resources:

Related Projects:


Built with ❤️
