An interactive Snake game with an optional evolutionary Q-learning trainer. The game renders with pixels/winit and includes a semi-transparent control panel overlay. You can play manually or watch a population of agents learn to play via Q-learning with evolutionary strategies.
- Classic Snake on a fixed grid (800x600 window, 20px cells) with wrap-around edges.
- Smooth pixel rendering with a checkerboard grid background and snake head “eyes”.
- On-screen control panel with current score, length, speed, evolution status, epoch charts, and quick action buttons.
- Q-learning agent with compact, vision-based state encoding (20-bit key) and three actions: turn left, go straight, turn right.
- Evolutionary trainer (default population 24) running multiple agents in parallel, with elitism, mutation, and adaptive restarts on stagnation. Unique leader protection prevents premature epoch resets.
- Auto-save and auto-load of the best (champion) agent to/from `snake_agent.json`.
- GPU-aware training budget: if a GPU adapter is available, a higher steps-per-tick budget is used; toggle via keyboard.
- Movement: Arrow keys or WASD
- Pause/Resume: P
- Restart game: R (when dead or from overlay button)
- Toggle evolution: E
- Adjust speed:
  - Manual play: +/- change the tick time
  - Evolution: + doubles and - halves steps/frame (up to 100,000)
- Save best agent: S
- Toggle panel visibility: H
- Toggle acceleration budget (CPU/GPU): G (only shown if a GPU is available)
- Ultra-fast training (skip rendering, raise budget): U
- Show only best agent during training: B
- If built with feature `gpu-nn`: toggle experimental NN mode with N
- Quit: Esc or close window
- Mouse: Click panel buttons (Pause/Resume, Speed+, Restart, Save, Hide/Show)
Prerequisites:
- Rust toolchain (stable)
- Windows (tested), but should work on other platforms supported by pixels/winit.
Run (debug): `cargo run`
Run (optimized): `cargo run --release`
On start, the app tries to load `snake_agent.json`. If found, evolution auto-starts using the loaded agent as a seed.
The agent observes the 8 cells surrounding the snake head (the 3x3 area around the head, excluding the head itself), expressed in a direction-relative frame so that "forward" always means the snake's current heading. Each cell is encoded with 2 bits:
- 00 = empty
- 01 = danger (wall/body)
- 10 = apple
- 11 = unused
This uses 16 bits. Additionally:
- 2 bits: relative direction to the apple (left/straight/right)
- 2 bits: Manhattan distance category to the apple (4 buckets)
Total: 20-bit state key (~1M states).
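For illustration, a minimal sketch of how such a 20-bit key could be packed; the function and parameter names are hypothetical, not the identifiers used in `src/main.rs`:

```rust
// Hypothetical packing of the 20-bit state key described above.
fn encode_state(cells: [u32; 8], apple_dir: u32, apple_dist_bucket: u32) -> u32 {
    // `cells`: 8 direction-relative cells, each 0b00 = empty, 0b01 = danger, 0b10 = apple.
    // `apple_dir`: 0 = left, 1 = straight, 2 = right (2 bits).
    // `apple_dist_bucket`: Manhattan-distance bucket 0..=3 (2 bits).
    let mut key = 0u32;
    for (i, &c) in cells.iter().enumerate() {
        key |= (c & 0b11) << (i * 2); // bits 0..16: neighborhood
    }
    key |= (apple_dir & 0b11) << 16; // bits 16..18: apple direction
    key |= (apple_dist_bucket & 0b11) << 18; // bits 18..20: distance bucket
    key // 20-bit key, at most ~1M distinct states
}
```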
Three discrete actions relative to the current direction:
- 0 = turn left
- 1 = go straight
- 2 = turn right
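As a sketch, applying a relative action to an absolute heading can be done with modular arithmetic; the direction numbering below is an assumption, not taken from the source:

```rust
// Illustrative mapping from (heading, relative action) to a new heading.
enum Dir { Up, Right, Down, Left }

fn apply_action(dir: Dir, action: u8) -> Dir {
    // Treat directions as clockwise indices so turning is index arithmetic mod 4.
    let idx = dir as i8;
    let delta = match action {
        0 => -1, // turn left (counter-clockwise)
        2 => 1,  // turn right (clockwise)
        _ => 0,  // go straight
    };
    match (idx + delta).rem_euclid(4) {
        0 => Dir::Up,
        1 => Dir::Right,
        2 => Dir::Down,
        _ => Dir::Left,
    }
}
```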
- +10.0 for eating an apple, plus a small length bonus (+0.1 per unit of current snake length)
- Death penalty depends on cause: -30.0 for self-collision; -12.0 otherwise
- Small step penalty (-0.005)
- Shaping: +0.05 when moving closer and -0.03 when moving away; additional +0.02 when within 3 cells of the apple
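A hedged sketch of how these terms could combine into a single step reward; the exact conditions (for example, when the +0.02 proximity bonus applies) are assumptions, and the signature is illustrative:

```rust
// Hypothetical reward computation matching the values listed above.
fn step_reward(
    ate_apple: bool,
    died: bool,
    self_collision: bool,
    length: usize,
    prev_dist: i32, // Manhattan distance to the apple before the move
    new_dist: i32,  // Manhattan distance after the move
) -> f32 {
    if died {
        // Death penalty depends on the cause.
        return if self_collision { -30.0 } else { -12.0 };
    }
    let mut r = -0.005; // small per-step penalty
    if ate_apple {
        r += 10.0 + 0.1 * length as f32; // apple reward grows slightly with length
    }
    // Distance shaping: reward approaching the apple, penalize retreating.
    if new_dist < prev_dist {
        r += 0.05;
    } else if new_dist > prev_dist {
        r -= 0.03;
    }
    if new_dist <= 3 {
        r += 0.02; // extra bonus near the apple (exact trigger is an assumption)
    }
    r
}
```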
- Epsilon-greedy exploration with decay (`epsilon`, `min_epsilon`, `decay`)
- Learning rate `alpha`, discount `gamma`
- `steps` and `episodes` counters recorded per agent
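A minimal tabular Q-learning sketch using the hyperparameter names above; randomness is passed in as plain arguments to keep the example dependency-free, and the real agent in `src/main.rs` may differ in detail:

```rust
use std::collections::HashMap;

// Sketch of a tabular Q-learning agent over the 20-bit state key.
struct QAgent {
    q: HashMap<u32, [f32; 3]>, // state key -> Q-values for the 3 relative actions
    epsilon: f32,
    min_epsilon: f32,
    decay: f32,
    alpha: f32,
    gamma: f32,
}

impl QAgent {
    // Epsilon-greedy action selection; `rand01` and `rand_action` stand in for an RNG.
    fn choose_action(&self, state: u32, rand01: f32, rand_action: usize) -> usize {
        if rand01 < self.epsilon {
            rand_action % 3 // explore
        } else {
            let q = self.q.get(&state).copied().unwrap_or([0.0; 3]);
            (0..3usize)
                .max_by(|&a, &b| q[a].partial_cmp(&q[b]).unwrap())
                .unwrap() // exploit: best known action
        }
    }

    // Standard Q-update, then decay epsilon toward its floor.
    fn update(&mut self, s: u32, a: usize, r: f32, s_next: u32) {
        let next_max = self
            .q
            .get(&s_next)
            .map(|q| q.iter().copied().fold(f32::MIN, f32::max))
            .unwrap_or(0.0);
        let entry = self.q.entry(s).or_insert([0.0; 3]);
        entry[a] += self.alpha * (r + self.gamma * next_max - entry[a]);
        self.epsilon = (self.epsilon * self.decay).max(self.min_epsilon);
    }
}
```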
- Population of agents (default 24), each playing in its own game instance in parallel
- Per-epoch step limit with a “leader protection” exception that lets a unique best agent continue beyond the limit
- At epoch end, reproduction with elitism + mutations; multiple staged restart strategies on long stagnation, seeding from the global champion
- Tracks a global champion (best ever), with auto-save on improvement
- Agents are color-coded for visualization
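As an illustration of the elitism-plus-mutation step, here is a generic reproduction sketch; all identifiers are hypothetical, and stagnation handling, staged restarts, and champion seeding are omitted:

```rust
// Build the next generation from a fitness-ranked population:
// keep the top `elite_count` agents unchanged, fill the rest with mutated elites.
fn next_generation<A: Clone>(
    mut ranked: Vec<(f32, A)>,
    elite_count: usize,
    mutate: impl Fn(&A) -> A,
) -> Vec<A> {
    // Sort best-first by fitness.
    ranked.sort_by(|a, b| b.0.partial_cmp(&a.0).unwrap());
    let pop_size = ranked.len();
    let elite = elite_count.min(pop_size).max(1);
    let mut next = Vec::with_capacity(pop_size);
    // Elitism: carry the top agents over unchanged.
    for (_, agent) in ranked.iter().take(elite) {
        next.push(agent.clone());
    }
    // Fill the remaining slots with mutated copies of the elites.
    let mut i = 0;
    while next.len() < pop_size {
        next.push(mutate(&ranked[i % elite].1));
        i += 1;
    }
    next
}
```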
- `src/main.rs` — the main application with game logic, rendering, Q-learning agent, and evolutionary trainer.
- `src/gpu_nn.rs` — optional experimental NN scaffolding (behind the `gpu-nn` feature).
- `snake_agent.json` — saved champion agent (created at runtime when saving).
- To start training from scratch, delete `snake_agent.json`, or press E to toggle training and let evolution run.
- At very high training speeds, frames are skipped and drawing can be disabled to maximize throughput.
- Grid/cell sizes are constants near the top of `main.rs` and can be adjusted as needed.
MIT License — see LICENSE file.