
OpusPlays/Pokemon-Opus


Pokémon Opus

An autonomous AI playthrough of Pokémon Blue, powered by Claude Opus 4.7.
No human input. No save scumming. No walkthroughs. Just the model, the screen, and the buttons.

🌐 opusplays.com  ·  𝕏 @OpusPlays  ·  Quick Start  ·  Architecture  ·  How it works



🎮 What is Pokémon Opus?

Pokémon Opus is an autonomous AI playthrough of Pokémon Blue for Game Boy. Claude Opus 4.7 reads the screen and the emulator's RAM, decides which buttons to press, and works through the entire game on its own — choosing a starter, navigating routes, training a team, fighting gym leaders, and (eventually) taking down the Elite Four.

There is no human in the loop. No walkthrough is fed in. No script is hard-coded for the routes. The model gets the same information any player would — the screen, the party, the bag, the current dialog — and has to figure out what to do.

This repository contains the agent and game server. The companion live dashboard at opusplays.com displays the current team, badges earned, Pokédex progress, and a streaming feed of the model's reasoning, in real time.

Watching an AI think out loud while it tries to find Misty is a very specific kind of fun.


✨ Highlights

  • 🧠 Claude Opus 4.7 (1M context) — the entire run history fits in a single conversation; no summarization required for the first ~150 hours of play
  • 🎮 Headless PyBoy emulator with full RAM reading for Gen 1 (party, bag, badges, location, every flag)
  • 🎯 Long-horizon objectives — the agent maintains its own scratchpad of goals, attempts, and failures across thousands of turns
  • 🧩 Mode-aware reasoning — different sub-agents handle exploration, battles, dialog, and menus, each with their own tailored prompt
  • 🔁 Streaming events — every decision is broadcast via WebSocket for the live dashboard at opusplays.com
  • 🚫 Pure model-on-game — no save scumming, no external lookups, no human nudges

🏗️ Architecture

┌─────────────────────────────────────────────────────────────┐
│  PokemonOpenClaude (game server, port 8765)                 │
│  Headless PyBoy emulator + RAM reader + REST API            │
└──────────────────────────┬──────────────────────────────────┘
                           │ HTTP (GET /state, POST /action,
                           │       GET /screenshot)
┌──────────────────────────▼──────────────────────────────────┐
│  Pokémon Opus Backend (Python, port 3000)                   │
│                                                             │
│  ┌─────────────┐  ┌────────────┐  ┌──────────────────────┐ │
│  │ Orchestrator│  │ LLM Client │  │ Streaming Server     │ │
│  │ (state      │  │ (Anthropic,│  │ (FastAPI + WebSocket)│ │
│  │  machine)   │  │  OpenAI,   │  └──────────┬───────────┘ │
│  │             │  │  local)    │             │             │
│  │ ┌─────────┐ │  └────────────┘             │             │
│  │ │ Explore │ │                             │             │
│  │ │ Battle  │ │  ┌────────────┐             │             │
│  │ │ Menu    │ │  │ Memory     │             │             │
│  │ │ Strategy│ │  │ Objectives │             │             │
│  │ └─────────┘ │  │ Map Graph  │             │             │
│  └─────────────┘  │ Context    │             │             │
│                   └────────────┘             │             │
└──────────────────────────────────────────────┼─────────────┘
                                               │ WebSocket
┌──────────────────────────────────────────────▼─────────────┐
│  React Viewer (TypeScript, port 5173)                      │
│  Live game screen, AI reasoning, team, map, objectives     │
└────────────────────────────────────────────────────────────┘
                                               │
                                               │ HTTPS
┌──────────────────────────────────────────────▼─────────────┐
│  opusplays.com — public live dashboard                     │
│  Team, badges, Pokédex, milestones, FAQ, activity feed     │
└────────────────────────────────────────────────────────────┘

The system is built on three open-source foundations:

| Project | Role |
| --- | --- |
| PokemonOpenClaude | Headless Game Boy emulator with REST API and full Gen 1 RAM reading |
| Zork-Opus | Proven AI game-playing architecture: memory, objectives, multi-model orchestration |
| Archon | Infrastructure patterns for streaming, events, and React dashboards |

🧠 Features

AI Brain (adapted from Zork-Opus)

  • Game-mode state machine with five modes: explore, battle, dialog, menu, and intro
  • Mode-specific sub-agents with tailored prompts and heuristics (pokemon_opus/agents/)
  • Battle agent with the full Gen 1 type chart (including the bugs — Ghost doesn't hit Psychic, Psychic is OP), STAB awareness, and a heuristic fast-path that skips the LLM for trivial wild encounters
  • Dual-cache memory system — persistent cross-episode memory plus an ephemeral working set
  • Strategic objective generation with gym-progression planning and Pokédex-completion targets
  • Map graph with BFS pathfinding and exploration-frontier tracking
  • Stuck detection, oscillation warnings, auto-save
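The map-graph bullet above can be illustrated with a small breadth-first search. This is a minimal sketch, not the code in pokemon_opus/map/graph.py: the adjacency-dict layout and the names `shortest_path` and `frontier` are assumptions made for the example.

```python
from collections import deque

# Hypothetical layout: location name -> list of directly connected locations.
# A location that appears only as a neighbor has been seen but not yet visited.


def shortest_path(graph, start, goal):
    """Return the shortest list of locations from start to goal, or None."""
    if start == goal:
        return [start]
    parents = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in graph.get(current, []):
            if neighbor in parents:
                continue
            parents[neighbor] = current
            if neighbor == goal:
                # Walk the parent links back to the start, then reverse.
                path = [goal]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


def frontier(graph):
    """Locations referenced as a neighbor that have no entry of their own yet."""
    referenced = {n for neighbors in graph.values() for n in neighbors}
    return sorted(referenced - set(graph))


graph = {
    "Pallet Town": ["Route 1"],
    "Route 1": ["Pallet Town", "Viridian City"],
    "Viridian City": ["Route 1", "Route 2", "Route 22"],
}
print(shortest_path(graph, "Pallet Town", "Route 2"))
print(frontier(graph))
```

Because BFS expands locations in order of hop count, the first time the goal is reached is also the shortest route in number of map transitions.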

Game Interface

  • Talks to PokemonOpenClaude via REST API
  • Full Gen 1 RAM state: party, bag, battle, dialog, map, badges, Pokédex
  • Frame-accurate button input (respects Game Boy timing — no rapid-fire glitches)
  • Screenshot capture for the viewer and for vision-model analysis
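As a rough illustration of the REST interface, the sketch below wraps the three endpoints shown in the architecture diagram (GET /state, POST /action, GET /screenshot) using only the standard library. The class name, method names, and the `{"buttons": [...]}` request body are assumptions for the example; check the game server's own documentation for the real payload shape.

```python
import json
import urllib.request


class GameClient:
    """Hypothetical thin wrapper around the game server's HTTP endpoints."""

    def __init__(self, base_url="http://localhost:8765", timeout=10.0):
        self.base_url = base_url.rstrip("/")
        self.timeout = timeout

    def build_request(self, method, path, payload=None):
        """Build (but do not send) a request; JSON-encode the payload if given."""
        data = None
        headers = {}
        if payload is not None:
            data = json.dumps(payload).encode("utf-8")
            headers["Content-Type"] = "application/json"
        return urllib.request.Request(
            self.base_url + path, data=data, headers=headers, method=method
        )

    def _send(self, request):
        with urllib.request.urlopen(request, timeout=self.timeout) as response:
            return response.read()

    def get_state(self):
        return json.loads(self._send(self.build_request("GET", "/state")))

    def press(self, buttons):
        # Assumed body shape; the real server may expect different field names.
        body = {"buttons": list(buttons)}
        return json.loads(self._send(self.build_request("POST", "/action", body)))

    def screenshot(self):
        # Returns the raw response bytes (image format depends on the server).
        return self._send(self.build_request("GET", "/screenshot"))
```

Keeping request construction separate from sending makes the client easy to unit-test without a running emulator.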

LLM Integration

  • Per-role models — exploration, battle, strategy, and memory each pick their own backend
  • Anthropic, OpenRouter, and local LLM support out of the box
  • Circuit breaker with exponential backoff retry
  • Token and cost tracking — know exactly how much each gym leader cost in API spend
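The circuit-breaker bullet can be sketched as follows. This is one common way to combine a breaker with exponential backoff, written for illustration only; the class names, thresholds, and delays are assumptions and may not match pokemon_opus/llm/client.py.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker refuses a call during its cooldown window."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, cooldown_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown_s:
                raise CircuitOpenError("circuit open; skipping call")
            # Cooldown elapsed: fall through and allow one trial call.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def retry_with_backoff(fn, breaker, max_attempts=4, base_delay_s=1.0, sleep=time.sleep):
    """Retry fn through the breaker, doubling the delay after each failure."""
    for attempt in range(max_attempts):
        try:
            return breaker.call(fn)
        except CircuitOpenError:
            raise  # do not keep hammering a provider the breaker has cut off
        except Exception:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay_s * 2 ** attempt)
```

A caller would wrap each provider request, for example `retry_with_backoff(lambda: send_prompt(prompt), breaker)`, where `send_prompt` is a hypothetical function that performs the API call.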

Live Viewer

  • React + TypeScript + Tailwind v4
  • Live game screen with pixel-perfect rendering
  • Streaming AI reasoning panel
  • Team display with Pokémon sprites and HP bars
  • Badge timeline, objectives, inventory, milestones
  • WebSocket with auto-reconnection

🚀 Quick Start

Prerequisites

  • Python 3.11+
  • Node.js 18+ (for the local viewer)
  • A Pokémon Blue ROM: Pokemon - Blue Version (USA, Europe).gb (legally dump your own copy)
  • An Anthropic API key (or any OpenAI-compatible endpoint)

1. Start the game server

# Clone and install PokemonOpenClaude
git clone https://github.com/NousResearch/pokemon-agent
cd pokemon-agent
pip install -e ".[all]"

# Start it with your ROM
pokemon-agent serve --rom "path/to/Pokemon - Blue Version (USA, Europe).gb" --port 8765

2. Configure Pokémon Opus

git clone https://github.com/OpusPlays/Pokemon-Opus
cd Pokemon-Opus

# Create .env from the example
cp .env.example .env
# Edit .env and add your ANTHROPIC_API_KEY

# Install Python dependencies
pip install -e .

3. Run the agent

python -m pokemon_opus.main

The agent starts polling the game server and making decisions. You'll see logs scrolling in your terminal — the model's current goal, what it sees, what it chose to do, and why.

4. (Optional) Start the local viewer

cd viewer
npm install
npm run dev
# Open http://localhost:5173

For the public-facing dashboard, see opusplays.com.


⚙️ Configuration

Settings live in pyproject.toml under [tool.pokemon-opus] (API keys are read from .env, covered under Environment variables):

[tool.pokemon-opus.game]
server_url = "http://localhost:8765"
max_turns_per_episode = 10000
save_interval = 50

[tool.pokemon-opus.llm]
client_base_url = "https://api.anthropic.com/v1"
agent_model      = "claude-opus-4-20250514"
battle_model     = "claude-opus-4-20250514"   # can swap for a faster local model
strategist_model = "claude-opus-4-20250514"
memory_model     = "claude-opus-4-20250514"

[tool.pokemon-opus.llm.battle_sampling]
temperature = 0.3
max_tokens  = 2048

Per-role model configuration

| Role | Purpose | Recommended |
| --- | --- | --- |
| agent | Overworld exploration decisions | Opus (needs reasoning) |
| battle | Move selection, switching, item use | Opus, or a fast local model |
| strategist | Long-term planning, objective updates | Opus |
| memory | Memory synthesis between turns | Opus or Sonnet |

Each role can have its own base_url, model, and sampling parameters — mix and match providers freely.

Environment variables

See .env.example:

ANTHROPIC_API_KEY=sk-ant-...                # default provider
OPENROUTER_API_KEY=sk-or-...                # multi-model support
LOCAL_LLM_BASE_URL=http://localhost:8082/v1 # for local fast tactical models
GAME_SERVER_URL=http://localhost:8765       # PokemonOpenClaude

🔁 How a Turn Works

Each turn of the agent follows an 11-phase loop:

  1. Read state: GET /state from the emulator's RAM
  2. Detect mode — battle? dialog? menu? intro? → fall back to explore
  3. Route to agent — pass to the mode-specific sub-agent
  4. Execute actions: POST /action with the chosen button presses
  5. Read post-state — capture what changed
  6. Compute deltas — location, badges, party, items, battle results
  7. Record history — append to the action log with the model's reasoning
  8. Track milestones — badges, catches, level-ups, evolutions
  9. Memory synthesis — create or update location/trainer/strategy memories
  10. Map update — record visits, connections, and warps
  11. Stream to viewer — broadcast new state + screenshot via WebSocket
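The first seven phases of the loop above can be sketched in a few functions. This is an illustrative outline under stated assumptions: the `in_battle` / `in_dialog` style state flags, the function names, and the `(actions, reasoning)` return shape of a sub-agent are all hypothetical, and phases 8 to 11 are omitted.

```python
# Modes are checked in this order; explore is the fallback (phase 2).
MODE_PRIORITY = ("battle", "dialog", "menu", "intro")


def detect_mode(state):
    """Pick the first active mode; the "in_<mode>" flag names are assumed."""
    for mode in MODE_PRIORITY:
        if state.get(f"in_{mode}"):
            return mode
    return "explore"


def compute_deltas(before, after, keys=("location", "badges", "party", "items")):
    """Return {key: (old, new)} for every tracked key that changed (phase 6)."""
    return {
        key: (before.get(key), after.get(key))
        for key in keys
        if before.get(key) != after.get(key)
    }


def run_turn(read_state, agents, send_actions, history):
    before = read_state()                       # 1. read state
    mode = detect_mode(before)                  # 2. detect mode
    actions, reasoning = agents[mode](before)   # 3. route to sub-agent
    send_actions(actions)                       # 4. execute actions
    after = read_state()                        # 5. read post-state
    deltas = compute_deltas(before, after)      # 6. compute deltas
    history.append(                             # 7. record history
        {"mode": mode, "actions": actions, "reasoning": reasoning, "deltas": deltas}
    )
    return deltas
```

Passing `read_state` and `send_actions` in as callables keeps the loop testable with canned states instead of a live emulator.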

Memory Categories

The memory manager classifies what the agent learns into typed buckets so older info can be retrieved by relevance, not just recency:

| Category | Persistence | Example |
| --- | --- | --- |
| ROUTE | Core | "Route 3 connects Pewter City to Mt. Moon" |
| TRAINER | Permanent | "Bug Catcher on Route 3 has Caterpie Lv9" |
| ITEM | Permanent | "Found Potion at Viridian Forest (12, 8)" |
| POKEMON | Permanent | "Pikachu spawns in Viridian Forest" |
| BATTLE | Permanent | "Brock's Onix is Lv14, Water Gun was super effective" |
| STRATEGY | Permanent | "Need Lv16 minimum before challenging Misty" |
| LANDMARK | Core | "Pokémon Center in Cerulean City at map ID 3" |
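One way to model these buckets is an enum per category plus a simple relevance ranking. The sketch below mirrors the category and persistence labels from the table, but the `Memory` dataclass and the keyword-overlap scoring in `retrieve` are assumptions chosen for brevity; the real memory manager may rank entries quite differently.

```python
from dataclasses import dataclass
from enum import Enum


class Persistence(Enum):
    CORE = "core"
    PERMANENT = "permanent"


class Category(Enum):
    ROUTE = "route"
    TRAINER = "trainer"
    ITEM = "item"
    POKEMON = "pokemon"
    BATTLE = "battle"
    STRATEGY = "strategy"
    LANDMARK = "landmark"


# Persistence labels copied from the table above.
PERSISTENCE = {
    Category.ROUTE: Persistence.CORE,
    Category.TRAINER: Persistence.PERMANENT,
    Category.ITEM: Persistence.PERMANENT,
    Category.POKEMON: Persistence.PERMANENT,
    Category.BATTLE: Persistence.PERMANENT,
    Category.STRATEGY: Persistence.PERMANENT,
    Category.LANDMARK: Persistence.CORE,
}


@dataclass(frozen=True)
class Memory:
    category: Category
    text: str
    turn: int  # turn on which the memory was recorded


def retrieve(memories, query, limit=3):
    """Rank by shared words with the query, breaking ties by recency."""
    words = set(query.lower().split())
    scored = []
    for memory in memories:
        overlap = len(words & set(memory.text.lower().split()))
        if overlap:
            scored.append((overlap, memory.turn, memory))
    scored.sort(key=lambda item: (item[0], item[1]), reverse=True)
    return [memory for _, _, memory in scored[:limit]]
```

With this ranking, an old but on-topic note about Viridian Forest outranks a recent note about an unrelated route, which is the "relevance, not just recency" behavior described above.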

Gen 1 Battle Intelligence

  • Full type effectiveness chart with Gen 1 quirks (no Dark/Steel types, Bug→Poison is super effective, Ghost moves do not affect Psychic, etc.)
  • STAB (Same Type Attack Bonus) awareness baked into move scoring
  • Move type guessing from name keywords when the bag/move metadata is incomplete
  • Heuristic fast-path that skips the LLM entirely for clearly-winning matchups against trash-tier wild Pokémon — saves a lot of tokens
  • LLM fallback for any non-trivial trainer battle, switching decisions, or status interactions
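A small sketch of type-aware move scoring with STAB is shown below. The chart is deliberately partial (only a handful of Gen 1 matchups, with unlisted pairs treated as neutral), and the scoring formula of power × STAB × effectiveness is an assumption for illustration rather than the heuristic in pokemon_opus/agents/battle.py.

```python
from dataclasses import dataclass

# Partial Gen 1 chart: (attacking type, defending type) -> multiplier.
# Pairs not listed here are treated as neutral (1.0) in this sketch.
CHART = {
    ("water", "rock"): 2.0,
    ("water", "ground"): 2.0,
    ("water", "fire"): 2.0,
    ("water", "grass"): 0.5,
    ("electric", "water"): 2.0,
    ("electric", "ground"): 0.0,
    ("bug", "poison"): 2.0,     # Gen 1 only
    ("ghost", "psychic"): 0.0,  # Gen 1 quirk: Ghost moves do not affect Psychic
    ("normal", "rock"): 0.5,
    ("normal", "ghost"): 0.0,
}

STAB = 1.5  # Same Type Attack Bonus


@dataclass(frozen=True)
class Move:
    name: str
    type: str
    power: int


def effectiveness(move_type, defender_types):
    """Multiply the chart entries across all of the defender's types."""
    multiplier = 1.0
    for defender_type in defender_types:
        multiplier *= CHART.get((move_type, defender_type), 1.0)
    return multiplier


def score_move(move, attacker_types, defender_types):
    stab = STAB if move.type in attacker_types else 1.0
    return move.power * stab * effectiveness(move.type, defender_types)


def best_move(moves, attacker_types, defender_types):
    return max(moves, key=lambda m: score_move(m, attacker_types, defender_types))


# Squirtle (Water) against Onix (Rock/Ground).
moves = [Move("Tackle", "normal", 35), Move("Water Gun", "water", 40)]
print(best_move(moves, ["water"], ["rock", "ground"]).name)
```

A fast path could then skip the LLM whenever the top-scoring move clears some threshold against a wild Pokémon, leaving closer calls to the model.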

📁 Project Structure

Pokemon-Opus/
├── pokemon_opus/
│   ├── main.py              # Entry point — spins up the orchestrator
│   ├── config.py            # Pydantic config from TOML + env
│   ├── game_client.py       # HTTP client for PokemonOpenClaude
│   ├── orchestrator.py      # Game-mode state machine + turn loop
│   ├── state.py             # GameState, Pokemon, Objective models
│   ├── agents/
│   │   ├── explore.py       # Overworld navigation
│   │   ├── battle.py        # Battle decisions (type-aware)
│   │   ├── menu.py          # Dialog / menu handling (mostly mechanical)
│   │   ├── intro.py         # Intro / starter pick / new game flow
│   │   └── strategist.py    # Long-term planning + objectives
│   ├── memory/
│   │   └── manager.py       # Dual-cache memory system
│   ├── objectives/
│   │   └── manager.py       # Objective lifecycle tracking
│   ├── map/
│   │   └── graph.py         # Room connectivity + BFS pathfinding
│   ├── context/
│   │   └── builder.py       # Per-mode prompt assembly
│   ├── llm/
│   │   └── client.py        # Multi-provider LLM client
│   ├── streaming/
│   │   └── server.py        # FastAPI + WebSocket for the viewer
│   └── data/
│       ├── type_chart.py    # Gen 1 type effectiveness (with all the quirks)
│       └── map_data.py      # Gym order, HMs, progression milestones
├── viewer/                  # React + TypeScript local frontend
│   └── src/
│       ├── components/      # GameScreen, TeamPanel, MapView, etc.
│       ├── hooks/           # useWebSocket
│       └── lib/             # Types, sprite URLs, colors
├── tests/                   # pytest suite
├── memories.md              # Long-form notes the agent has accumulated
├── pyproject.toml           # Config + dependencies
└── .env.example             # API keys template

🌐 Live Dashboard

The public-facing dashboard at opusplays.com is a Next.js site that polls the agent's streaming server and renders:

  • 🟢 Live status — current location, current goal, playtime timer
  • 🐾 Current team — sprites, levels, HP bars, statuses, known moves
  • 🏆 Gym badges — 8-badge grid that lights up as they're earned
  • 📖 Pokédex progress — 151-cell visual grid (caught / seen / unknown)
  • 📰 Activity feed — the model's actions, thoughts, battles, milestones, and deaths in real time
  • 🪜 Milestones timeline — every badge, evolution, region transition, and party wipe
  • FAQ + run rules — what counts as fair play

It polls every 5 seconds and updates without a page refresh.


🗺️ Roadmap

  • Make it past Brock
  • Survive Mt. Moon
  • Beat the Cerulean Rocket Grunt without panicking
  • HMs (Cut, Surf, Strength, Flash, Fly)
  • Defeat the Elite Four
  • Catch Mewtwo
  • Complete the Pokédex (151/151)

Follow @OpusPlays for milestone announcements.


🤝 Contributing

Issues and PRs welcome — especially:

  • Better prompts for stuck situations (puzzle gyms, Team Rocket hideouts, the Safari Zone)
  • Move metadata improvements (more accurate damage estimates)
  • Viewer features — anything you'd want to see while watching the run
  • Bug reports with the exact game state where the agent got stuck (memories.md is dumped on every save)

If you want to run your own AI Pokémon agent off this codebase, please credit the upstream projects below.


🙏 Credits


📜 License

MIT — see LICENSES/ for the full texts of bundled dependencies.

Pokémon and all related media are © Nintendo / Game Freak / Creatures. This is an unofficial fan project; no game code or copyrighted ROM data is distributed.


OpusPlays

Run by @OpusPlays · Watch live at opusplays.com
