This repository implements the use of reinforcement learning for controlling traffic light systems.
While the code is abstracted so that it can be applied to different scenarios, a real-life implementation is also provided for illustration.
Toolkit-wise, stable-baselines3 is used in conjunction with the Simulation of Urban MObility (SUMO) software for learning on multiple traffic simulations in parallel.
Key highlights of this implementation include:
- PyTorch as backend.
- Vectorized environments.
- Frame-stacking.
- Curriculum learning.
- Custom Conv3D feature extractor.
- Playable setup for obtaining human baselines.
- Designed to be reproducible on other SUMO networks.
(A legacy `keras` + `tensorflow` implementation is still available in the aptly named branch.)
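To make the frame-stacking highlight concrete, here is a minimal, library-free sketch of the idea: the agent sees the last few observations instead of only the latest one, which gives it a sense of motion. The class name `FrameStack` and its methods are hypothetical and are not taken from this repository. stable-baselines3 provides a `VecFrameStack` wrapper for the same purpose, but whether the repository relies on it is not stated here.

```python
from collections import deque


class FrameStack:
    """Keep the last `n_frames` observations, oldest first (illustrative only)."""

    def __init__(self, n_frames):
        self.n_frames = n_frames
        self.frames = deque(maxlen=n_frames)

    def reset(self, first_obs):
        # At episode start there is no history yet, so pad with the first frame.
        self.frames.clear()
        for _ in range(self.n_frames):
            self.frames.append(first_obs)
        return list(self.frames)

    def step(self, obs):
        # Appending to a full deque silently drops the oldest frame.
        self.frames.append(obs)
        return list(self.frames)


stack = FrameStack(n_frames=3)
history = stack.reset([[0.0]])   # three copies of the first frame
history = stack.step([[0.5]])    # oldest copy dropped, new frame appended
```

Stacking along a new leading axis like this is what lets a 3D convolution treat time as the depth dimension.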
- Install the SUMO software from https://eclipse.dev/sumo/
- Run `conda env create -f environments/environment.dev.yml`
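After installing, it can help to confirm that SUMO is actually reachable from Python before launching anything. The sketch below only uses the standard library; the helper name `find_sumo` is hypothetical. It assumes the usual SUMO binary names (`sumo`, `sumo-gui`) and the `SUMO_HOME` environment variable that SUMO's own tools rely on.

```python
import os
import shutil


def find_sumo():
    """Report where SUMO appears to be installed, if anywhere."""
    return {
        "sumo": shutil.which("sumo"),          # headless simulator binary
        "sumo_gui": shutil.which("sumo-gui"),  # GUI variant
        "SUMO_HOME": os.environ.get("SUMO_HOME"),
    }


if __name__ == "__main__":
    for name, location in find_sumo().items():
        print(f"{name}: {location or 'not found'}")
```

A value of `None` for every entry suggests the installation step has not completed or the shell needs to be restarted.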
The traffic lights at a 4-way traffic intersection are controlled by a PPO model. The origins and destinations of the cars, which define the general simulation, are randomized every episode (though they are fixed for the final evaluation runs).
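The difference between randomized training episodes and fixed evaluation runs can be sketched as below. This is an illustration under assumptions, not the repository's route generator: the function `sample_trips` and the edge ids are hypothetical, and fixing evaluation through a seed is one possible way to do it.

```python
import random


def sample_trips(rnd_src, rnd_dst, n_trips, seed=None):
    """Draw (origin, destination) pairs for one episode."""
    # seed=None gives a fresh draw each call; an integer seed is reproducible.
    rng = random.Random(seed)
    return [(rng.choice(rnd_src), rng.choice(rnd_dst)) for _ in range(n_trips)]


# Hypothetical edge ids for a 4-way intersection.
sources = ["north_in", "south_in", "east_in", "west_in"]
destinations = ["north_out", "south_out", "east_out", "west_out"]

training_trips = sample_trips(sources, destinations, n_trips=5)      # varies per episode
eval_trips = sample_trips(sources, destinations, n_trips=5, seed=0)  # identical every run
```

Keeping the evaluation demand identical across runs means differences in the results come from the policies rather than from the traffic drawn.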
The following snapshots illustrate the parameters pertaining to the road network.
To test the model, simply run `python -m scripts.rl.test`. You can also try your best to beat it by running `python -m scripts.baseline.human`.
The final model acting on the simulation and, for reference, the best-performing fixed policy are shown below:
The results from the different policies are shown below:
If you wish to retrain or explore the training process, check out `scripts/rl/train.py`.
In terms of general model improvement decisions, these were the most prominent:
- Baseline MLP with multi-input `spaces.Dict` observations: `(1, n_actions)` for the `phase` observation vs. `(1, n_obs, n_obs)` for the `speed`, `position` and `wait` matrices.
- Dropping the `position` matrix in favor of vehicle absence encoding in the `speed` and `wait` matrices (with vehicle absence as -1, and normal values ranging [0, 1]).
- The inclusion of the `accel` matrix for a richer representation.
- Changing `phase` encoding to `(1, n_obs, n_obs)` instead of `(1, n_actions)`.
- Introduction of weighted (`w2`) unshaped long-term reward, balanced against the weighted (`w1`) shaped myopic reward.
- Transitioning from the above fixed `w1`/`w2` balance to a curriculum approach for faster convergence.
- Multi-input CNN treating each matrix separately (though with the same conv block).
- Single-input CNN with observation types as channels.
- Frame-stacking and Conv3D introduction for temporal encoding.
- Self-attention mechanism on depth and channels.
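The move from a fixed `w1`/`w2` balance to a curriculum can be pictured with the sketch below. The linear schedule, the function names, and the choice to start fully on the shaped reward are assumptions made for illustration; the actual schedule used in training is not described here.

```python
def curriculum_weights(progress, w1_start=1.0, w1_end=0.0):
    """Shift weight linearly from the shaped (w1) to the unshaped (w2) reward.

    progress: fraction of training completed, clipped to [0, 1].
    Returns (w1, w2), which sum to 1.
    """
    progress = min(max(progress, 0.0), 1.0)
    w1 = w1_start + (w1_end - w1_start) * progress
    return w1, 1.0 - w1


def blended_reward(shaped, unshaped, progress):
    """Combine the myopic shaped reward with the long-term unshaped one."""
    w1, w2 = curriculum_weights(progress)
    return w1 * shaped + w2 * unshaped


early = blended_reward(shaped=2.0, unshaped=-4.0, progress=0.0)  # shaped reward only
late = blended_reward(shaped=2.0, unshaped=-4.0, progress=1.0)   # unshaped reward only
```

The intuition is that the dense shaped signal helps early exploration, while the unshaped signal is the quantity that ultimately matters, so its weight grows as training advances.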
Designs not adopted (yet):
- Residual blocks
- (Cross)-attention mechanisms (as we've moved away from the multi-input design)
- Create a new network and replace the `intersection.net.xml` file in the `/sumo/*/` folders.
- Change the `sumo-env.cfg` values accordingly (see also the Quickstart above for some more details), specifically:
  - Find the x- and y-coordinates of your observation window's center (`obs_center`).
  - Denote your observation window's precision (`obs_length`) and its size (`obs_nrows`).
  - Identify the traffic light id to be controlled (`tls_id`).
  - List the traffic light's incoming lanes (`tls_lanes`) and non-yellow phases (`tls_phases`).
  - List the network's sources (`rnd_src`) and destinations (`rnd_dst`).
- You may also need to rename the `network` and `config` arguments in the `SumoEnv` or `SumoEnvFactory` initialization.
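To see how the observation-window settings interact, the sketch below maps vehicle positions onto a square grid. It assumes that `obs_length` is the side length of one cell and `obs_nrows` is the number of cells per side; those interpretations, the helper names, and the `max_speed` normalization are assumptions and may differ from the repository's code. Empty cells use -1 and occupied cells use a value in [0, 1], matching the absence encoding described earlier.

```python
def to_cell(x, y, obs_center, obs_length, obs_nrows):
    """Map a world position to a (row, col) cell, or None if outside the window."""
    half = obs_length * obs_nrows / 2.0
    col = int((x - (obs_center[0] - half)) // obs_length)
    row = int(((obs_center[1] + half) - y) // obs_length)  # row 0 is the top edge
    if 0 <= row < obs_nrows and 0 <= col < obs_nrows:
        return row, col
    return None


def speed_matrix(vehicles, obs_center, obs_length, obs_nrows, max_speed):
    """Build a grid with -1 for empty cells and speed / max_speed elsewhere."""
    grid = [[-1.0] * obs_nrows for _ in range(obs_nrows)]
    for x, y, speed in vehicles:
        cell = to_cell(x, y, obs_center, obs_length, obs_nrows)
        if cell is not None:
            row, col = cell
            grid[row][col] = min(speed / max_speed, 1.0)
    return grid


# Three hypothetical vehicles as (x, y, speed); the last one is outside the window.
vehicles = [(92.0, 108.0, 6.0), (109.0, 91.0, 0.0), (200.0, 100.0, 3.0)]
grid = speed_matrix(vehicles, obs_center=(100.0, 100.0),
                    obs_length=5.0, obs_nrows=4, max_speed=12.0)
```

Note that a stopped vehicle (speed 0) and an empty cell (-1) stay distinguishable, which is the reason a separate position matrix becomes unnecessary.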
- General clean-up
- Get better results
- Increase the traffic scenario variability
- Generalize to multiple traffic lights
- Add multi-(hierarchical)-agent support