βš”οΈ WuWa Assistant

An intelligent chatbot assistant for Wuthering Waves that provides character builds, team compositions, and gameplay strategies using Retrieval Augmented Generation (RAG) and Large Language Models (LLMs).


🧠 How It Works

The Problem

Traditional chatbots hallucinate or provide outdated information because they rely solely on their training data. For a game like Wuthering Waves with frequent updates, patches, and new characters, we need real-time, accurate, source-attributed information.

The Solution: RAG (Retrieval Augmented Generation)

This project implements a RAG pipeline that combines:

  1. Semantic Search - Find relevant character data based on user queries
  2. Context Injection - Feed retrieved data to the LLM
  3. Grounded Generation - LLM generates answers based on actual game data

Flow:

User Query → Embedding → Vector Search → Retrieve Top-K Docs →
Inject into Prompt → LLM Generation → Response with Sources

πŸ—οΈ Architecture Overview

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  User Interface β”‚  (Streamlit Web App)
β”‚   - Chat Input  β”‚
β”‚   - History     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚
         β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚         RAG Engine (LangChain)              β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  1. Query Embedding                         β”‚
β”‚     └─→ OpenAI Embeddings API               β”‚
β”‚                                              β”‚
β”‚  2. Vector Similarity Search                β”‚
β”‚     └─→ ChromaDB (Local Vector Store)       β”‚
β”‚                                              β”‚
β”‚  3. Context Retrieval                       β”‚
β”‚     └─→ Top-K Most Relevant Characters      β”‚
β”‚                                              β”‚
β”‚  4. Prompt Engineering                      β”‚
β”‚     └─→ System Prompt + Context + Query     β”‚
β”‚                                              β”‚
β”‚  5. LLM Generation                          β”‚
β”‚     └─→ OpenAI GPT-4 API                    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚
         β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Response      β”‚  (Answer + Source Attribution)
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key Components:

1. Data Layer (data/characters.json)

  • Stores structured character information (40+ characters)
  • Scraped from Prydwen.gg using Playwright automation
  • Auto-cleaned and formatted during scraping

2. Embedding Layer (OpenAI Embeddings)

  • Converts text into high-dimensional vectors (1536 dimensions)
  • Captures semantic meaning, not just keywords
  • Example: "best DPS" and "highest damage dealer" → similar vectors
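
The "similar vectors" idea can be illustrated with cosine similarity. The three-dimensional vectors below are made-up stand-ins for the example; real embeddings from the API have 1536 dimensions:

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 means same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for embedding vectors (illustrative only).
best_dps = [0.9, 0.1, 0.2]
highest_damage = [0.85, 0.15, 0.25]   # paraphrase -> nearby vector
healer_query = [0.1, 0.9, 0.3]        # different meaning -> far away

print(cosine_similarity(best_dps, highest_damage))  # close to 1.0
print(cosine_similarity(best_dps, healer_query))    # much lower
```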

3. Vector Database (ChromaDB)

  • Stores character embeddings for fast similarity search
  • Automatically rebuilds when data changes
  • Enables sub-100ms semantic search across all characters
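
At its core, the retrieval step is a nearest-neighbour scan over stored vectors; ChromaDB adds indexing and persistence on top. The tiny store below is a hand-rolled sketch of that idea, not the ChromaDB API:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

class TinyVectorStore:
    """Minimal stand-in for a vector DB: add vectors, query top-k by cosine."""
    def __init__(self):
        self.items = []  # list of (doc_id, vector) pairs

    def add(self, doc_id, vector):
        self.items.append((doc_id, vector))

    def query(self, query_vector, top_k=3):
        # Score every stored vector, then return the k best ids.
        scored = [(cosine(query_vector, vec), doc_id) for doc_id, vec in self.items]
        scored.sort(reverse=True)
        return [doc_id for _, doc_id in scored[:top_k]]

# Illustrative 2-D vectors; real ones come from the embeddings API.
store = TinyVectorStore()
store.add("Jiyan",   [0.9, 0.1])
store.add("Verina",  [0.1, 0.9])
store.add("Mortefi", [0.7, 0.3])

print(store.query([0.8, 0.2], top_k=2))  # ['Jiyan', 'Mortefi']
```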

4. RAG Engine (src/rag_engine.py)

  • Orchestrates retrieval and generation
  • Implements conversational memory (chat history)
  • Handles context window management

5. LLM Layer (OpenAI GPT-4)

  • Generates natural language responses
  • Grounded in retrieved character data
  • Provides explanations and recommendations

πŸ” RAG System Explained

Why RAG Instead of Fine-Tuning?

Approach           | RAG (Our Choice)               | Fine-Tuning
Update Speed       | Instant (just update the JSON) | Requires retraining
Cost               | Low (API calls only)           | High (GPU, training time)
Accuracy           | High (uses latest data)        | Outdated after training
Source Attribution | Yes (shows which characters)   | No
Hallucination Risk | Low (grounded in data)         | Higher

How RAG Prevents Hallucinations

  1. Retrieval Before Generation: LLM only sees actual game data
  2. Source Attribution: Every answer cites which characters were used
  3. Explicit Instructions: System prompt enforces "only use provided context"

Example Query Flow (illustrative pseudocode):

# User asks: "What's the best build for Jiyan?"

# Step 1: Embed query
query_vector = embed("What's the best build for Jiyan?")

# Step 2: Vector search in ChromaDB
results = chromadb.similarity_search(query_vector, top_k=3)
# Returns: [Jiyan data, Mortefi data, Verina data]

# Step 3: Build context
context = f"""
Character: Jiyan
Element: Aero
Best Echo Set: Sierra Gale (5pc)
Main Stats: Crit Rate/Crit DMG, ATK%, Aero DMG
Best Weapons: Verdant Summit, Emerald of Genesis
...
"""

# Step 4: Inject into prompt
prompt = f"""
You are a Wuthering Waves expert. Answer based on this data:

{context}

User question: What's the best build for Jiyan?
"""

# Step 5: LLM generates grounded response
response = gpt4(prompt)
# Output: "For Jiyan, the optimal build uses Sierra Gale 5pc set..."

Conversational Memory

The system maintains chat history using LangChain's ConversationBufferMemory:

# First question
User: "What's the best build for Jiyan?"
AI: "Jiyan works best with Sierra Gale 5pc, Crit Rate/DMG stats..."

# Follow-up question (context aware!)
User: "What weapons should I use for him?"
AI: "For Jiyan, the best weapons are Verdant Summit or Emerald of Genesis..."

Memory stores:

  • Last 5-10 conversation turns
  • Automatically summarizes if context gets too long
  • Enables natural back-and-forth dialogue
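
LangChain's ConversationBufferMemory handles this internally. As a standalone illustration of the windowing idea only (not LangChain's actual class), a minimal buffer might look like:

```python
class ChatMemory:
    """Keeps the last `max_turns` (user, ai) exchanges for prompt context."""
    def __init__(self, max_turns=5):
        self.max_turns = max_turns
        self.turns = []

    def add_turn(self, user_msg, ai_msg):
        self.turns.append((user_msg, ai_msg))
        # Drop the oldest turns once the window is full.
        self.turns = self.turns[-self.max_turns:]

    def as_prompt(self):
        # Flatten the history into text that can be prepended to the prompt.
        lines = []
        for user_msg, ai_msg in self.turns:
            lines.append(f"User: {user_msg}")
            lines.append(f"AI: {ai_msg}")
        return "\n".join(lines)

memory = ChatMemory(max_turns=5)
memory.add_turn("What's the best build for Jiyan?",
                "Jiyan works best with Sierra Gale 5pc...")
memory.add_turn("What weapons should I use for him?",
                "Verdant Summit or Emerald of Genesis...")
print(memory.as_prompt())
```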

🎯 Features

Current Features:

  • ✅ Character Build Recommendations - Echo sets, stats, weapons
  • ✅ Team Composition Suggestions - Synergy analysis
  • ✅ Conversational Interface - Natural language queries
  • ✅ Source Attribution - Shows which characters the data came from
  • ✅ Semantic Search - Understands intent, not just keywords
  • ✅ Auto-Scraping - Updates character data from Prydwen.gg

Smart Query Understanding:

The system understands various ways to ask the same thing:

  • "Best DPS" = "Highest damage" = "Top damage dealers"
  • "Team comp" = "Team composition" = "Who works well together"
  • "Build for Jiyan" = "How to build Jiyan" = "Jiyan build guide"

πŸ› οΈ Tech Stack

Core Technologies:

Component     | Technology                    | Purpose
Language      | Python 3.11                   | Core application
LLM           | OpenAI GPT-4                  | Natural language generation
Embeddings    | OpenAI text-embedding-3-small | Text vectorization
Vector DB     | ChromaDB                      | Semantic search
RAG Framework | LangChain                     | Orchestration & chains
Web Framework | Streamlit                     | User interface
Web Scraping  | Playwright + BeautifulSoup4   | Data extraction

Why These Choices?

OpenAI GPT-4:

  • Best-in-class reasoning and instruction following
  • Consistent output quality
  • Good at understanding gaming terminology

ChromaDB:

  • Lightweight, embedded (no separate server needed)
  • Fast similarity search (<100ms)
  • Automatic persistence

LangChain:

  • Pre-built RAG chains
  • Memory management
  • Easy prompt templating

Streamlit:

  • Rapid UI development
  • Built-in chat interface
  • Easy deployment

📦 Installation

Prerequisites

  • Python 3.8 or higher (3.11 recommended)
  • OpenAI API key (from https://platform.openai.com)
  • ~500MB disk space (for dependencies + embeddings)

Setup Steps

  1. Clone the repository

    git clone https://github.com/saaip7/wuwa-assistant.git
    cd wuwa-assistant
  2. Create virtual environment

    python -m venv venv
    
    # Windows
    venv\Scripts\activate
    
    # macOS/Linux
    source venv/bin/activate
  3. Install dependencies

    pip install -r requirements.txt
  4. Configure environment variables

    # Create .env file
    echo "OPENAI_API_KEY=your_api_key_here" > .env
    
    # Or manually edit .env:
    OPENAI_API_KEY=xxxxx
    OPENAI_MODEL=gpt-4
    OPENAI_EMBEDDING_MODEL=text-embedding-3-small
  5. Verify installation

    python -c "from src.rag_engine import WuWaRAG; print('✅ Setup complete!')"

▶️ Usage

Starting the App

streamlit run app.py

The app will open automatically at http://localhost:8501

Example Queries

Character Builds:

"What's the best build for Jiyan?"
"How should I build Roccia?"
"Optimal echo set for Calcharo?"

Team Compositions:

"Best team for Jiyan?"
"Who works well with Rover Havoc?"
"Build a team around Encore"

Comparisons:

"Compare Jiyan vs Calcharo"
"Who's better for Aero DPS: Jiyan or Aalto?"
"Difference between Verina and Baizhi?"

General Questions:

"Best Electro DPS characters?"
"Top 5 main DPS?"
"Which 4-star supports are good?"

Using the Chat Interface

  1. Type your question in the chat input
  2. Press Enter or click Send
  3. View the response with source attribution
  4. Ask follow-up questions - the system remembers context!

Pro Tips:

  • Be specific: "Best build for Jiyan DPS" > "Jiyan"
  • Ask follow-ups: "What about his weapons?" after asking about builds
  • Compare characters: "Who's better for X role?"

📊 Data Management

Current Data

  • 40+ characters from Prydwen.gg
  • Auto-updated via web scraping
  • Includes: builds, weapons, teams, stats

Updating Character Data

Option 1: Re-scrape from Prydwen.gg

# Full scrape (all characters)
python src/scraper.py

# Validate data quality
python scripts/validate_scraped_data.py

# Import to database (merge with existing)
python scripts/import_characters.py --strategy merge

# Delete the vector DB (it rebuilds automatically on next app start)
Remove-Item -Recurse -Force chroma_db  # Windows (PowerShell)
rm -rf chroma_db/                      # macOS/Linux

Option 2: Manual Edit

# Edit data/characters.json directly
# Then restart the app (ChromaDB auto-rebuilds)
streamlit run app.py

Data Schema

Each character requires these fields:

{
  "name": "Character Name",
  "element": "Aero|Electro|Fusion|Glacio|Havoc|Spectro",
  "weapon": "Broadblade|Sword|Pistols|Gauntlets|Rectifier",
  "role": "Main DPS|Sub DPS|Support|Healer",
  "rarity": "4-star|5-star",
  "best_echo_set": "Echo set recommendation",
  "main_stats_priority": "Stat priority string",
  "sub_stats_priority": "Sub stat priority string",
  "best_weapons": ["weapon1", "weapon2"],
  "team_synergies": ["character1", "character2"],
  "notes": "Additional notes"
}
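
After a manual edit, a quick sanity check over the JSON could look like the snippet below. This is a hypothetical helper, not one of the repo's scripts, and it assumes data/characters.json holds a list of character objects:

```python
import json

REQUIRED_FIELDS = {
    "name", "element", "weapon", "role", "rarity",
    "best_echo_set", "main_stats_priority", "sub_stats_priority",
    "best_weapons", "team_synergies", "notes",
}

def validate_character(char):
    """Return a sorted list of required fields missing from one record."""
    return sorted(REQUIRED_FIELDS - set(char))

def validate_file(path):
    """Map character name -> missing fields; empty dict means the file passed."""
    with open(path, encoding="utf-8") as f:
        characters = json.load(f)
    problems = {}
    for char in characters:
        missing = validate_character(char)
        if missing:
            problems[char.get("name", "<unnamed>")] = missing
    return problems

# Example:
# print(validate_file("data/characters.json"))
```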

🔧 Advanced Configuration

Tuning RAG Performance

Edit src/rag_engine.py:

# Retrieval settings
TOP_K_RESULTS = 5  # More results = more context but slower
SIMILARITY_THRESHOLD = 0.7  # Lower = more permissive retrieval

# LLM settings
TEMPERATURE = 0.7  # Lower = more focused, Higher = more creative
MAX_TOKENS = 800  # Response length limit
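
How the two retrieval knobs interact can be shown with a small standalone filter (illustrative only; the real retrieval logic lives in src/rag_engine.py and LangChain):

```python
def select_context(scored_docs, top_k=5, similarity_threshold=0.7):
    """Keep at most top_k documents whose similarity clears the threshold.

    scored_docs: list of (similarity, doc) pairs, in any order.
    """
    # The threshold gates weak matches; top_k caps how many survivors are kept.
    passing = [(score, doc) for score, doc in scored_docs
               if score >= similarity_threshold]
    passing.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in passing[:top_k]]

# Hypothetical similarity scores for one query.
results = [(0.92, "Jiyan"), (0.81, "Mortefi"), (0.65, "Verina"), (0.74, "Jianxin")]
print(select_context(results, top_k=2, similarity_threshold=0.7))
# Verina is cut by the threshold, Jianxin by top_k.
```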

Custom System Prompt

Edit the system prompt in src/rag_engine.py to change AI behavior:

SYSTEM_PROMPT = """
You are a Wuthering Waves expert assistant.
Answer based ONLY on the provided character data.
Be concise, accurate, and cite your sources.
"""

🤝 Contributing

Contributions welcome! This is a learning project focused on:

  • RAG implementation patterns
  • LLM application architecture
  • Web scraping automation
  • Vector database usage

Feel free to open issues or PRs!


πŸ“ License

MIT License - free to use for learning purposes


πŸ™ Acknowledgments


πŸ“š Learn More

RAG Resources:

Related Projects:


Built with ❤️
