Snake-With-Q-Learning

Russian version

An interactive Snake game with an optional evolutionary Q-learning trainer. The game renders with pixels/winit and includes a semi-transparent control panel overlay. You can play manually or watch a population of agents learn to play via Q-learning with evolutionary strategies.

Features

  • Classic Snake on a fixed grid (800x600 window, 20px cells) with wrap-around edges.
  • Smooth pixel rendering with a checkerboard grid background and snake head “eyes”.
  • On-screen control panel with current score, length, speed, evolution status, epoch charts, and quick action buttons.
  • Q-learning agent with compact, vision-based state encoding (20-bit key) and three actions: turn left, go straight, turn right.
  • Evolutionary trainer (default population 24) running multiple agents in parallel, with elitism, mutation, and adaptive restarts on stagnation. Unique leader protection prevents premature epoch resets.
  • Auto-save and auto-load of the best (champion) agent to/from snake_agent.json.
  • GPU-aware training budget: if a GPU adapter is available, a higher steps-per-tick budget is used; toggle via keyboard.

Controls

  • Movement: Arrow keys or WASD
  • Pause/Resume: P
  • Restart game: R (when dead, or via the overlay's Restart button)
  • Toggle evolution: E
  • Adjust speed:
    • Manual play: + / - change tick time
    • Evolution: + doubles and - halves steps/frame (up to 100,000)
  • Save best agent: S
  • Toggle panel visibility: H
  • Toggle acceleration budget (CPU/GPU): G (only shown if a GPU is available)
  • Ultra-fast training (skip rendering, raise budget): U
  • Show only best agent during training: B
  • If built with the gpu-nn feature: toggle the experimental NN mode with N
  • Quit: Esc or close window
  • Mouse: Click panel buttons (Pause/Resume, Speed+, Restart, Save, Hide/Show)

Build and Run

Prerequisites:

  • Rust toolchain (stable)
  • Windows (tested); other platforms supported by pixels/winit should also work.

Run (debug):

cargo run

Run (optimized):

cargo run --release

On start, the app tries to load snake_agent.json. If found, evolution auto-starts using the loaded agent as a seed.
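As a rough illustration, loading could look like the sketch below, assuming the agent is serialized with serde_json; the SavedAgent fields and the load_champion helper are illustrative, not the repo's confirmed schema.

use serde::{Deserialize, Serialize};
use std::collections::HashMap;
use std::fs;

// Illustrative shape of the saved champion; the actual fields may differ.
#[derive(Serialize, Deserialize)]
struct SavedAgent {
    q_table: HashMap<u32, [f64; 3]>, // 20-bit state key -> Q-values per action
    epsilon: f64,
    alpha: f64,
    gamma: f64,
}

// Returns None when the file is missing or malformed, so a fresh run starts clean.
fn load_champion(path: &str) -> Option<SavedAgent> {
    let text = fs::read_to_string(path).ok()?;
    serde_json::from_str(&text).ok()
}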

How the learning works

State encoding (vision + context)

The agent observes an 8-cell neighborhood around the snake head (the surrounding 3x3 area, excluding the head itself), expressed in a direction-relative frame. Each cell is encoded with 2 bits:

  • 00 = empty
  • 01 = danger (wall/body)
  • 10 = apple
  • 11 = unused

This uses 16 bits. Additionally:

  • 2 bits: relative direction to the apple (left/straight/right)
  • 2 bits: Manhattan distance category to the apple (4 buckets)

Total: 20-bit state key (~1M states).
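
Packing these fields into a single integer key could look like the following sketch; Cell, encode_state, and the exact bit order are assumptions, not the layout actually used in src/main.rs.

#[derive(Clone, Copy)]
enum Cell { Empty, Danger, Apple }

fn encode_state(vision: [Cell; 8], apple_dir: u32, dist_bucket: u32) -> u32 {
    let mut key = 0u32;
    // 8 neighborhood cells, 2 bits each -> bits 0..16
    for (i, cell) in vision.iter().enumerate() {
        let bits = match cell {
            Cell::Empty => 0b00,
            Cell::Danger => 0b01,
            Cell::Apple => 0b10,
        };
        key |= bits << (i * 2);
    }
    // 2 bits: relative apple direction (left/straight/right)
    key |= (apple_dir & 0b11) << 16;
    // 2 bits: Manhattan-distance bucket (0..=3)
    key |= (dist_bucket & 0b11) << 18;
    key // fits in 20 bits, so at most ~1M distinct states
}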

Actions

Three discrete actions relative to the current direction:

  • 0 = turn left
  • 1 = go straight
  • 2 = turn right
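
Mapping a relative action onto an absolute heading could be done as in this sketch; the Dir enum and apply_action are illustrative names rather than the repo's actual types.

#[derive(Clone, Copy)]
enum Dir { Up, Right, Down, Left }

fn apply_action(dir: Dir, action: u8) -> Dir {
    // Clockwise order, so +1 is a right turn and +3 (== -1 mod 4) a left turn.
    const CW: [Dir; 4] = [Dir::Up, Dir::Right, Dir::Down, Dir::Left];
    let i = dir as usize;
    match action {
        0 => CW[(i + 3) % 4], // turn left
        2 => CW[(i + 1) % 4], // turn right
        _ => dir,             // go straight
    }
}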

Rewards

  • +10.0 for eating an apple, plus a small length bonus (+0.1 per unit of current snake length)
  • Death penalty depends on cause: -30.0 for self-collision; -12.0 otherwise
  • Small step penalty (-0.005)
  • Shaping: +0.05 when moving closer and -0.03 when moving away; additional +0.02 when within 3 cells of the apple
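
Taken together, a single step's reward could be computed roughly as follows; the signature and the way the shaping terms combine are assumptions based on the list above.

// Illustrative reward schedule; constants mirror the README's description.
fn step_reward(ate_apple: bool, length: usize, died_self: bool, died_other: bool,
               old_dist: u32, new_dist: u32) -> f64 {
    if died_self { return -30.0; }   // self-collision
    if died_other { return -12.0; }  // any other death cause
    let mut r = -0.005;              // small per-step penalty
    if ate_apple {
        r += 10.0 + 0.1 * length as f64; // apple bonus grows with length
    } else if new_dist < old_dist {
        r += 0.05;                       // shaping: moved closer to the apple
        if new_dist <= 3 { r += 0.02; }  // extra bonus when within 3 cells
    } else if new_dist > old_dist {
        r -= 0.03;                       // shaping: moved away
    }
    r
}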

QAgent parameters

  • epsilon-greedy with decay (epsilon, min_epsilon, decay)
  • learning rate alpha, discount gamma
  • steps and episodes counters recorded per agent
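
These parameters drive a standard tabular update, Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)). A minimal sketch, assuming a HashMap-backed table; the repo's exact QAgent layout may differ.

use std::collections::HashMap;

struct QAgent {
    q: HashMap<u32, [f64; 3]>, // 20-bit state key -> Q-values for 3 actions
    alpha: f64,       // learning rate
    gamma: f64,       // discount factor
    epsilon: f64,     // exploration rate
    min_epsilon: f64, // exploration floor
    decay: f64,       // multiplicative epsilon decay
}

impl QAgent {
    fn update(&mut self, s: u32, a: usize, r: f64, s_next: u32) {
        let next_max = self.q.get(&s_next)
            .map(|qs| qs.iter().cloned().fold(f64::MIN, f64::max))
            .unwrap_or(0.0);
        let qs = self.q.entry(s).or_insert([0.0; 3]);
        qs[a] += self.alpha * (r + self.gamma * next_max - qs[a]);
        // Decay exploration toward the floor (decaying per update is an assumption).
        self.epsilon = (self.epsilon * self.decay).max(self.min_epsilon);
    }
}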

Evolutionary trainer

  • Population of agents (default 24), each playing in its own game instance in parallel
  • Per-epoch step limit with a “leader protection” exception that lets a unique best agent continue beyond the limit
  • At epoch end, reproduction with elitism + mutations; multiple staged restart strategies on long stagnation, seeding from the global champion
  • Tracks a global champion (best ever), with auto-save on improvement
  • Agents are color-coded for visualization
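
An epoch-end reproduction step with elitism and mutation might look like this sketch; the fitness ordering and the mutate heuristic are assumptions, and it builds on the QAgent sketch above with #[derive(Clone)] added.

// Illustrative mutation: nudge exploration so the child searches differently.
fn mutate(agent: &mut QAgent) {
    agent.epsilon = (agent.epsilon + 0.05).min(1.0);
}

fn next_generation(mut scored: Vec<(f64, QAgent)>, elites: usize) -> Vec<QAgent> {
    // Sort descending by fitness (e.g. apples eaten during the epoch).
    scored.sort_by(|a, b| b.0.partial_cmp(&a.0).unwrap());
    let pop_size = scored.len();
    let mut next = Vec::with_capacity(pop_size);
    // Elitism: carry the top agents over unchanged.
    for (_, agent) in scored.iter().take(elites) {
        next.push(agent.clone());
    }
    // Fill the rest with mutated copies of elite parents.
    let mut i = 0;
    while next.len() < pop_size {
        let mut child = scored[i % elites].1.clone();
        mutate(&mut child);
        next.push(child);
        i += 1;
    }
    next
}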

Code structure

  • src/main.rs — the main application with game logic, rendering, Q-learning agent, and evolutionary trainer.
  • src/gpu_nn.rs — optional experimental NN scaffolding (behind the gpu-nn feature).
  • snake_agent.json — saved champion agent (created at runtime when saving).

Tips

  • To start training from scratch, delete snake_agent.json (otherwise the saved champion is loaded as a seed), then press E to start evolution.
  • At very high training speeds, frames are skipped and drawing can be disabled to maximize throughput.
  • Grid/cell sizes are constants near the top of main.rs and can be adjusted as needed.

License

MIT License — see LICENSE file.
