This repository contains the source code for my PAM project at ACECOM. It uses Proximal Policy Optimization (PPO) to train an agent to play Sonic The Hedgehog, with the goal of completing the first zone. The training environment and model are set up using Gymnasium (the maintained successor to OpenAI Gym) and Stable-Retro.
Ángel Aarón Flores Alberca
- Python 3.10
- OS: Windows 10/11, Linux, macOS
Follow these steps to set up the virtual environment correctly and run the model locally:
git clone https://github.com/bxcowo/sonic_genesis_PPO.git
To create venv:
python3 -m venv .venv
To activate venv:
- On Windows
.venv\Scripts\activate
- On macOS and Linux
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
Open main.py in your preferred IDE. Here, you'll find the model definition that trains Sonic to progress through the game. If you’d like to experiment, you can modify hyperparameters by scaling them up or down in multiples of the existing values.
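As a rough illustration of scaling hyperparameters by a multiple, the sketch below multiplies selected entries of a settings dictionary. The names and baseline values here are hypothetical placeholders, not the ones defined in main.py, and `scale_hyperparams` is a helper invented for this example.

```python
# Hypothetical baseline values; the real ones are defined in main.py.
BASE_HYPERPARAMS = {
    "learning_rate": 3e-4,
    "n_steps": 2048,
    "batch_size": 64,
    "gamma": 0.99,
}


def scale_hyperparams(base, factors):
    """Return a copy of `base` with the selected entries multiplied by a factor."""
    scaled = dict(base)
    for name, factor in factors.items():
        if name not in base:
            raise KeyError(f"unknown hyperparameter: {name}")
        value = base[name] * factor
        # Keep integer-valued settings (step counts, batch sizes) as integers.
        scaled[name] = int(value) if isinstance(base[name], int) else value
    return scaled


# Double the rollout length and batch size, leave everything else untouched.
doubled = scale_hyperparams(BASE_HYPERPARAMS, {"n_steps": 2, "batch_size": 2})
print(doubled)
```

Changing one or two values at a time, as in the last lines, makes it easier to tell which change affected training.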
main.py launches the training process using 5 CPU cores by default. You can change the training environment by editing metadata.json, where you can switch to any other state saved in the custom_integration directory.
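If you prefer to switch states from a script rather than by hand, a minimal sketch could look like the following. It assumes that saved states are stored as `.state` files next to metadata.json and that metadata.json selects the state through a `default_state` key, which is the usual Stable-Retro convention; check the actual file before relying on it. Both helper functions are hypothetical.

```python
import json
from pathlib import Path


def available_states(game_dir):
    """List the names of the .state files saved in the integration folder."""
    return sorted(p.stem for p in Path(game_dir).glob("*.state"))


def set_default_state(game_dir, state_name):
    """Point metadata.json at another saved state and return the new metadata."""
    game_dir = Path(game_dir)
    if state_name not in available_states(game_dir):
        raise ValueError(f"no saved state named {state_name!r}")
    meta_path = game_dir / "metadata.json"
    metadata = json.loads(meta_path.read_text()) if meta_path.exists() else {}
    metadata["default_state"] = state_name  # assumed key name
    meta_path.write_text(json.dumps(metadata, indent=2))
    return metadata
```

Calling `available_states(...)` on the integration folder first shows which names `set_default_state(...)` will accept.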
After training is complete, two new files are generated and used by testing_model.py. Running this script will test the trained agent and produce .bk2 files that record the gameplay. The agent will play very quickly, but you can use these .bk2 files to render a more detailed video playback later.
Before rendering, move the entire SonicTheHedgehog-Genesis-Custom directory from custom_integration into your virtual environment’s Retro data directory (for example: .venv/lib/python3.10/site-packages/retro/data/stable).
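The same step can be scripted. The sketch below copies the integration folder instead of moving it, so the original stays in the repository; `install_integration` is a hypothetical helper, and the Retro data path must be adjusted to match your own environment.

```python
import shutil
from pathlib import Path


def install_integration(source_dir, retro_data_dir):
    """Copy an integration folder into the Retro data directory.

    Returns the path of the installed copy. Existing files with the same
    name are overwritten, so re-running after an edit refreshes the copy.
    """
    source_dir = Path(source_dir)
    target = Path(retro_data_dir) / source_dir.name
    shutil.copytree(source_dir, target, dirs_exist_ok=True)
    return target
```

For the layout described above, the call would be `install_integration("custom_integration/SonicTheHedgehog-Genesis-Custom", ".venv/lib/python3.10/site-packages/retro/data/stable")`.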
Once done, you can render the recorded gameplay into a video using:
python3 -m retro.scripts.playback_movie SonicTheHedgehog-Genesis-Custom-[Selected zone].[Selected act]-000000.bk2
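To avoid typing the recording name by hand, the command can be assembled in Python. This sketch assumes that the six-digit suffix is a zero-padded episode counter and uses `GreenHillZone` and `Act1` purely as example values; `playback_command` is a hypothetical helper.

```python
import sys


def playback_command(zone, act, episode=0, game="SonicTheHedgehog-Genesis-Custom"):
    """Build the argument list for rendering one recorded .bk2 file."""
    movie = f"{game}-{zone}.{act}-{episode:06d}.bk2"
    # sys.executable keeps the command inside the active virtual environment.
    return [sys.executable, "-m", "retro.scripts.playback_movie", movie]


command = playback_command("GreenHillZone", "Act1")
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the directory that contains the .bk2 file.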
The custom environment is defined inside the custom_integration folder and includes:
- contest.json: Defines how rewards and the done condition are interpreted.
- data.json: Lists RAM variables, their memory addresses, and types.
- metadata.json: Specifies which state the agent will load and play.
- script.lua: A Lua script that defines the reward function and termination condition.
To adjust environment behaviors, you can modify script.lua or metadata.json. Avoid editing the other files unless you know what you’re doing.
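Before changing anything, it can help to see which RAM variables data.json exposes to script.lua. The sketch below parses a small sample in the layout Stable-Retro normally uses (an `info` object mapping each variable name to an `address` and a `type`). The two entries and their addresses are illustrative placeholders, not the contents of this project's file.

```python
import json

# Illustrative sample only; the real variables live in data.json.
SAMPLE_DATA_JSON = """
{
  "info": {
    "x":     {"address": 16756744, "type": ">u2"},
    "lives": {"address": 16776978, "type": "|u1"}
  }
}
"""


def describe_variables(data_json_text):
    """Return (name, hex address, type) for every variable, sorted by name."""
    info = json.loads(data_json_text)["info"]
    return [
        (name, hex(spec["address"]), spec["type"])
        for name, spec in sorted(info.items())
    ]


for name, address, dtype in describe_variables(SAMPLE_DATA_JSON):
    print(f"{name:<8} {address:<10} {dtype}")
```

To inspect the real file, pass the text of data.json to `describe_variables` instead of the sample string.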
This project is licensed under the MIT License - see the LICENSE file for details.