Release 0.3.0
yannbouteiller committed Sep 15, 2022
1 parent 6216f01 commit 2619ad8
Showing 3 changed files with 117 additions and 49 deletions.
93 changes: 51 additions & 42 deletions README.md
@@ -50,7 +50,7 @@ It is demonstrated on the TrackMania 2020 video game.
- [TMRL details](#advanced)
- [Real-time Gym framework](#real-time-gym-framework)
- [rtgym repo](https://github.com/yannbouteiller/rtgym)
- [Distant training architecture](#distant-training-architecture)
- [Remote training architecture](#remote-training-architecture)
- [Contribute](#authors)
- [Sponsors](#sponsors)

@@ -66,17 +66,18 @@ This is done through Deep Reinforcement Learning (RL).
* **Training algorithms:**
`tmrl` lets you easily train policies in TrackMania with state-of-the-art Deep Reinforcement Learning algorithms such as [Soft Actor-Critic](https://www.youtube.com/watch?v=LN29DDlHp1U) (SAC) and [Randomized Ensembled Double Q-Learning](https://arxiv.org/abs/2101.05982) (REDQ).
These algorithms store collected samples in a large dataset, called a replay memory.
In parallel, this dataset is used to train an artificial neural network (policy) that maps observations (images, speed...) to relevant actions (gas, steering angle...).
In parallel, this dataset is used to train an artificial neural network (policy) that maps observations (images, speed...) to relevant actions (gas, brake, steering angle...).

* **Analog control:**
`tmrl` controls the game using a virtual gamepad, which enables analog input.

* **Different types of observation:**
The car can use either a LIDAR (Light Detection and Ranging) computed from snapshots or the raw unprocessed snapshots in order to perceive its environment.
The car can either use raw unprocessed snapshots, or a LIDAR (Light Detection and Ranging) computed from the snapshots, in order to perceive its environment.

* **Models:**
To process LIDAR measurements, `tmrl` uses a Multi-Layer Perceptron (MLP).
To process raw camera images (snapshots), it uses a Convolutional Neural Network (CNN).
These models learn the physics from histories of observations equally spaced in time.
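The replay-memory mechanism described above can be sketched in a few lines (a toy illustration, not `tmrl`'s actual implementation; `ReplayMemory` and its methods are hypothetical names):

```python
import random
from collections import deque

class ReplayMemory:
    """Minimal replay memory: stores transitions, samples random batches."""
    def __init__(self, capacity=100000):
        self.buffer = deque(maxlen=capacity)  # oldest samples dropped when full

    def append(self, obs, act, rew, next_obs, done):
        self.buffer.append((obs, act, rew, next_obs, done))

    def sample(self, batch_size):
        # uniform sampling breaks the temporal correlation between samples
        return random.sample(self.buffer, batch_size)

# collect dummy transitions, then sample a training batch
memory = ReplayMemory(capacity=1000)
for step in range(50):
    memory.append(obs=step, act=0.0, rew=1.0, next_obs=step + 1, done=False)
batch = memory.sample(batch_size=8)
print(len(batch))  # 8
```

Off-policy algorithms such as SAC and REDQ train on batches drawn this way, in parallel with sample collection.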

### Developer features (real-time applications):
* **Python library:**
@@ -105,7 +106,7 @@ In particular, [rtgym](https://github.com/yannbouteiller/rtgym) enables implemen

## Installation

Detailed installation instructions are provided [here](readme/Install.md).
Detailed instructions for installation are provided [here](readme/Install.md).

## Getting started

@@ -125,12 +126,12 @@ from tmrl import get_environment
from time import sleep
import numpy as np

# default observations are of shape: ((1,), (4, 19), (3,), (3,))
# default LIDAR observations are of shape: ((1,), (4, 19), (3,), (3,))
# representing: (speed, 4 last LIDARs, 2 previous actions)
# actions are [gas, brake, steer], analog between -1.0 and +1.0
def model(obs):
"""
simplistic policy
simplistic policy for LIDAR observations
"""
deviation = obs[1].mean(0)
deviation /= (deviation.sum() + 0.001)
@@ -141,7 +142,9 @@ def model(obs):
steer = min(max(steer, -1.0), 1.0)
return np.array([1.0, 0.0, steer])

env = get_environment() # retrieve the TMRL Gym environment
# Let us retrieve the TMRL Gym environment.
# The environment you get from get_environment() depends on the content of config.json
env = get_environment()

sleep(1.0) # just so we have time to focus the TM20 window after starting the script

@@ -154,22 +157,23 @@ for _ in range(200): # rtgym ensures this runs at 20Hz by default
env.wait() # rtgym-specific method to artificially 'pause' the environment when needed
```

The environment can be customized by changing the content of the `ENV` entry in `TmrlData\config\config.json`:
The environment flavor can be chosen and customized by changing the content of the `ENV` entry in `TmrlData\config\config.json`:

_(NB: do not copy-paste the examples, comments are not supported in vanilla .json files)_

### LIDAR environment:
In this version of the environment, screenshots are reduced to 19-beam LIDARs to be processed with, e.g., an MLP.
In addition, this version features the speed (that human players can see).
This works only on plain road with black borders, using the front camera.
### Full environment:
This version of the environment features full screenshots to be processed with, e.g., a CNN.
In addition, this version features the speed, gear and RPM.
This works on any track, using any (sensible) camera configuration.

```json5
{
"ENV": {
"RTGYM_INTERFACE": "TM20LIDAR", // TrackMania 2020 with LIDAR observations
"WINDOW_WIDTH": 958, // width of the game window (min: 256)
"WINDOW_HEIGHT": 488, // height of the game window (min: 128)
"RTGYM_INTERFACE": "TM20FULL", // TrackMania 2020 with full screenshots
"WINDOW_WIDTH": 256, // width of the game window and screenshots (min: 256)
"WINDOW_HEIGHT": 128, // height of the game window and screenshots (min: 128)
"SLEEP_TIME_AT_RESET": 1.5, // the environment sleeps for this amount of time after each reset
"IMG_HIST_LEN": 4, // length of the history of LIDAR measurements in observations (set to 1 for RNNs)
"IMG_HIST_LEN": 4, // length of the history of images in observations (set to 1 for RNNs)
"RTGYM_CONFIG": {
"time_step_duration": 0.05, // duration of a time step
"start_obs_capture": 0.04, // duration before an observation is captured
@@ -181,20 +185,22 @@ This works only on plain road with black borders, using the front camera.
}
}
```
Note that human players can see or hear the features provided by this environment: we provide no "cheat" that would render the approach non-transferable to the real world.
In case you do wish to cheat, though, you can easily take inspiration from our [rtgym interfaces](https://github.com/trackmania-rl/tmrl/blob/master/tmrl/custom/custom_gym_interfaces.py) to build your own custom environment for TrackMania.
Of course, custom environments will not be accepted for the competition :wink:
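Since comments are not supported in vanilla JSON, switching flavors by hand-editing is error-prone; a hedged sketch of doing it programmatically (the `set_env_flavor` helper is a hypothetical name, and only the three entries shown are touched):

```python
import json
import tempfile
from pathlib import Path

def set_env_flavor(config_path, interface, width, height):
    """Rewrite the ENV entry of a tmrl-style config.json in place."""
    config = json.loads(Path(config_path).read_text())
    config["ENV"]["RTGYM_INTERFACE"] = interface
    config["ENV"]["WINDOW_WIDTH"] = width
    config["ENV"]["WINDOW_HEIGHT"] = height
    Path(config_path).write_text(json.dumps(config, indent=2))

# demo on a throwaway copy; point config_path at your real
# TmrlData\config\config.json to edit the actual file
demo = Path(tempfile.mkdtemp()) / "config.json"
demo.write_text(json.dumps({"ENV": {"RTGYM_INTERFACE": "TM20FULL",
                                    "WINDOW_WIDTH": 256,
                                    "WINDOW_HEIGHT": 128}}))
set_env_flavor(demo, "TM20LIDAR", 958, 488)
print(json.loads(demo.read_text())["ENV"]["RTGYM_INTERFACE"])  # TM20LIDAR
```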

### Full environment:
This version of the environment features full screenshots to be processed with, e.g., a CNN.
In addition, this version features the speed, gear and RPM.
This works on any track, using any (sensible) camera configuration.

### LIDAR environment:
In this version of the environment, screenshots are reduced to 19-beam LIDARs to be processed with, e.g., an MLP.
In addition, this version features the speed (that human players can see).
This works only on plain road with black borders, using the front camera with car hidden.
```json5
{
"ENV": {
"RTGYM_INTERFACE": "TM20FULL", // TrackMania 2020 with full screenshots
"WINDOW_WIDTH": 256, // width of the game window and screenshots (min: 256)
"WINDOW_HEIGHT": 128, // height of the game window and screenshots (min: 128)
"RTGYM_INTERFACE": "TM20LIDAR", // TrackMania 2020 with LIDAR observations
"WINDOW_WIDTH": 958, // width of the game window (min: 256)
"WINDOW_HEIGHT": 488, // height of the game window (min: 128)
"SLEEP_TIME_AT_RESET": 1.5, // the environment sleeps for this amount of time after each reset
"IMG_HIST_LEN": 4, // length of the history of images in observations (set to 1 for RNNs)
"IMG_HIST_LEN": 4, // length of the history of LIDAR measurements in observations (set to 1 for RNNs)
"RTGYM_CONFIG": {
"time_step_duration": 0.05, // duration of a time step
"start_obs_capture": 0.04, // duration before an observation is captured
@@ -207,14 +213,10 @@ This works on any track, using any (sensible) camera configuration.
}
```

Note that human players can see or hear the features provided by this environment: we provide no "cheat" that would render the approach non-transferable to the real world.
In case you do wish to cheat, though, you can easily take inspiration from our [rtgym interfaces](https://github.com/trackmania-rl/tmrl/blob/master/tmrl/custom/custom_gym_interfaces.py) to build your own custom environment for TrackMania.
Of course, custom environments will not be accepted for the competition :wink:

### LIDAR with track progress

If you have watched the [2022-06-08 episode](https://www.youtube.com/watch?v=c1xq7iJ3f9E) of the Underscore_ talk show (french), note that the policy you have seen has been trained in a slightly augmented version of the LIDAR environment: on top of LIDAR and speed value, we have added a value representing the percentage of completion of the track, so that the AI can know the turns in advance similarly to humans practicing a given track.
It is not yet clear whether we want to use this environment in the competition, as it is de-facto less generalizable.
If you have watched the [2022-06-08 episode](https://www.youtube.com/watch?v=c1xq7iJ3f9E) of the Underscore_ talk show (in French), note that the policy you saw there was trained in a slightly augmented version of the LIDAR environment: on top of the LIDAR and speed values, we added a value representing the percentage of completion of the track, so that the model can know the turns in advance, similarly to humans practicing a given track.
This environment will not be accepted in the competition, as it is de facto less generalizable.
However, if you wish to use this environment, e.g., to beat our results, you can use the following `config.json`:

```json5
@@ -239,8 +241,8 @@

## TrackMania training details

In `tmrl`, an AI that knows absolutely nothing about driving or even about what a road is, is set at the starting point of a track.
Its goal is to learn how to complete the track by exploring its own capacities and environment.
In `tmrl`, a model (AI) that knows absolutely nothing about driving, or even about what a road is, is set at the starting point of a track.
Its goal is to learn how to complete the track as fast as possible by exploring its own capacities and environment.

The car feeds observations such as images to an artificial neural network, which must output the best possible controls from these observations.
This implies that the AI must understand its environment in some way.
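The observation-to-controls mapping can be pictured as a small neural-network forward pass; a toy numpy sketch with random weights (purely illustrative, not `tmrl`'s actual models):

```python
import numpy as np

rng = np.random.default_rng(0)

# toy 2-layer MLP: 83 flattened observation features -> 3 analog controls
# (83 = 1 speed + 4*19 LIDAR beams + 2 previous actions of 3 values each,
#  matching the default LIDAR observation shapes; the weights are random)
W1, b1 = 0.1 * rng.normal(size=(83, 64)), np.zeros(64)
W2, b2 = 0.1 * rng.normal(size=(64, 3)), np.zeros(3)

def policy(features):
    h = np.tanh(features @ W1 + b1)   # hidden layer
    return np.tanh(h @ W2 + b2)       # squash to [-1.0, +1.0]: gas, brake, steer

obs = rng.normal(size=83)             # stand-in for a flattened observation
action = policy(obs)
print(action.shape)  # (3,)
```

Training consists of adjusting `W1`, `b1`, `W2`, `b2` so that the outputs maximize the reward collected along the track.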
@@ -359,19 +361,18 @@ In `tmrl`, the car can be controlled in two different ways:

Different observation spaces are available in `tmrl`:

- A LIDAR measurement is computed from real-time screenshots in tracks with black borders.
- A history of several such LIDAR measurements (typically the last 4 time-steps).
- A history of raw screenshots (typically 4).
- A history of LIDAR measurements computed from raw screenshots in tracks with black borders.

In addition, we provide the norm of the velocity as part of the observation space in all our experiments.
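The LIDAR reduction mentioned above can be approximated by casting beams from the bottom-center of a screenshot until a dark border pixel is hit; a simplified sketch (the function name, beam geometry and darkness threshold are assumptions, and `tmrl`'s actual implementation may differ):

```python
import numpy as np

def lidar_from_screenshot(img, n_beams=19, threshold=55):
    """Reduce a grayscale screenshot to beam distances.
    Beams fan out from the bottom-center of the frame; each one stops at the
    first dark pixel (the black track border) or at the image boundary."""
    h, w = img.shape
    origin = np.array([h - 1, w // 2])
    angles = np.linspace(0, np.pi, n_beams)   # fan from left to right
    distances = np.zeros(n_beams)
    for i, a in enumerate(angles):
        step = np.array([-np.sin(a), -np.cos(a)])   # left ... up ... right
        pos = origin.astype(float)
        while True:
            pos += step
            r, c = int(pos[0]), int(pos[1])
            if not (0 <= r < h and 0 <= c < w) or img[r, c] < threshold:
                break
            distances[i] += 1.0
    return distances

img = np.full((488, 958), 255, dtype=np.uint8)  # all-road dummy frame
img[:, :100] = 0                                # dark border on the left
print(lidar_from_screenshot(img).shape)  # (19,)
```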

Example of `tmrl` environment in TrackMania Nations Forever with a single LIDAR measurement:

![reward](readme/img/lidar.png)

In TrackMania Nations Forever, the raw speed is computed from screen captures thanks to the 1-NN algorithm.
In TrackMania Nations Forever, we used to compute the raw speed from screenshots thanks to the 1-NN algorithm.

In TrackMania 2020, the [OpenPlanet](https://openplanet.nl) API is used to retrieve the raw speed directly.
In TrackMania 2020, we now use the [OpenPlanet](https://openplanet.nl) API to retrieve the raw speed directly.

### Results

@@ -400,10 +401,10 @@ Time-steps are being elastically constrained to their nominal duration. When thi

Custom `rtgym` interfaces for Trackmania used by `tmrl` are implemented in [custom_gym_interfaces.py](https://github.com/yannbouteiller/tmrl/blob/master/tmrl/custom/custom_gym_interfaces.py).
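The elastic time-step constraint mentioned above can be sketched as follows (an illustrative loop, not `rtgym`'s actual code; `run_elastic_steps` is a hypothetical name):

```python
import time

def run_elastic_steps(step_fn, n_steps, nominal_duration=0.05):
    """Run step_fn at a fixed rate: sleep off any leftover time each step;
    if a step overruns its deadline, resynchronize instead of drifting."""
    next_deadline = time.monotonic() + nominal_duration
    for _ in range(n_steps):
        step_fn()
        remaining = next_deadline - time.monotonic()
        if remaining > 0:
            time.sleep(remaining)             # elastic: absorb the slack
        else:
            next_deadline = time.monotonic()  # timeout: resynchronize
        next_deadline += nominal_duration

start = time.monotonic()
run_elastic_steps(lambda: None, n_steps=10, nominal_duration=0.01)
elapsed = time.monotonic() - start
print(round(elapsed, 2))  # close to 0.1 on an unloaded machine
```

With the default `time_step_duration` of 0.05, such a loop yields the 20 Hz control rate used in TrackMania.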

### Distant training architecture:
### Remote training architecture:

`tmrl` is based on a client-server framework on the model of [Ray RLlib](https://docs.ray.io/en/latest/rllib.html).
Our client-server architecture is not secured and it is not meant to compete with Ray, but it is much simpler to modify in order to implement ad-hoc pipelines and works on both Windows and Linux.
Our client-server architecture is not secured yet and it is not meant to compete with Ray, but it is much simpler to modify in order to implement ad-hoc pipelines and works on both Windows and Linux.

We collect training samples from several rollout workers, typically several computers and/or robots.
Each rollout worker stores its collected samples in a local buffer, and periodically sends this replay buffer to the central server.
@@ -424,17 +425,25 @@ These mechanics can be summarized as follows:
![Networking architecture](readme/img/network_interface.png "Networking Architecture")
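The exchange summarized above (workers push sample buffers, the trainer pushes updated weights, the central server relays both) can be mimicked as an in-process toy (no networking and no security here; all names are hypothetical):

```python
class CentralServer:
    """Toy relay: aggregates worker buffers, broadcasts trainer weights."""
    def __init__(self):
        self.incoming_samples = []
        self.current_weights = {"version": 0}

    def receive_buffer(self, buffer):
        # called periodically by each rollout worker
        self.incoming_samples.extend(buffer)

    def update_weights(self, weights):
        # called by the trainer after each training round
        self.current_weights = weights

# two rollout workers each send a local buffer of transitions
server = CentralServer()
server.receive_buffer([("obs_a", "act_a", 1.0)])
server.receive_buffer([("obs_b", "act_b", 0.5)])

# the trainer pulls everything collected so far, trains, pushes new weights;
# workers would then fetch current_weights to refresh their local policies
batch, server.incoming_samples = server.incoming_samples, []
server.update_weights({"version": 1})

print(len(batch), server.current_weights["version"])  # 2 1
```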


## Development roadmap:
You are welcome to contribute to the `tmrl` project.
Please consider the following:
- Further profiling and code optimization.
- Secure and improve network communications.
- Find the cleanest way to support sequences in `MemoryDataloading` for RNN training.


## Authors:

Contributions to this project are welcome, please submit a PR with your name in the contributors list.
When contributing, please submit a PR with your name and a short caption in the contributors list.

### Maintainers:
- Yann Bouteiller
- Edouard Geze

### Contributors:
- Simon Ramstedt
- AndrejGobeX
- Simon Ramstedt - initial code base
- AndrejGobeX - optimization of screen capture

## License

70 changes: 64 additions & 6 deletions readme/get_started.md
@@ -4,25 +4,80 @@ Before reading these instructions, make sure you have installed TMRL and OpenPla

## Pre-trained AI in Trackmania 2020

You can test our pre-trained AI directly in TrackMania by following these steps (we recommend doing this once, so you understand how `tmrl` controls the video game):
You can test our pre-trained AIs directly in TrackMania by following these steps (we recommend doing this once, so you understand how `tmrl` controls the video game):

### Load the tmrl-test track into your TrackMania game:
- Navigate to your home folder (`C:\Users\username\`), and open `TmrlData\resources`
- Copy the `tmrl-test.Map.Gbx` file into `...\Documents\Trackmania\Maps\My Maps` (or equivalent on your system).

### Test the pre-trained AI:
### Test pre-trained AIs:

#### Game preparation

- Launch TrackMania 2020
- In case the OpenPlanet menu is showing in the top part of the screen, hide it using the `f3` key
- Launch the `tmrl-test` track. This can be done by selecting `create > map editor > edit a map > tmrl-test > select map` and hitting the green flag.
- Set the game in windowed mode. To do this, bring the cursor to the top of the screen and a drop-down menu will show. Hit the windowed icon.
- Bring the TrackMania window to the top-left corner of the screen. On Windows 10, it should automatically fit to a quarter of the screen _(NB: the window will automatically snap to the top-left corner and get sized properly when you start the AI)_.
- Enter the cockpit view by hitting the `3` key (the car must be hidden, press several times if the cockpit is visible).
- Hide the ghost by pressing the `g` key.

#### If you want to test the pre-trained AI for LIDARs:
- Replace/ensure the following entries in `TmrlData\config\config.json`:
```json
"RUN_NAME": "SAC_4_LIDAR_pretrained"
```
```json
"ENV": {
"RTGYM_INTERFACE": "TM20LIDAR",
"WINDOW_WIDTH": 958,
"WINDOW_HEIGHT": 488,
"SLEEP_TIME_AT_RESET": 1.5,
"IMG_HIST_LEN": 4,
"RTGYM_CONFIG": {
"time_step_duration": 0.05,
"start_obs_capture": 0.04,
"time_step_timeout_factor": 1.0,
"act_buf_len": 2,
"benchmark": false,
"wait_on_done": true
}
}
```
- Enter the cockpit view by hitting the `3` key (the car must be hidden, press several times if the cockpit is visible).

The trackmania window should now look like this:

![screenshot1](img/screenshot1.PNG)

#### If you want to test the pre-trained AI for raw screenshots:
- Replace/ensure the following entries in `TmrlData\config\config.json`:
```json
"RUN_NAME": "SAC_4_imgs_pretrained"
```
```json
"ENV": {
"RTGYM_INTERFACE": "TM20IMAGES",
"WINDOW_WIDTH": 256,
"WINDOW_HEIGHT": 128,
"IMG_WIDTH": 64,
"IMG_HEIGHT": 64,
"IMG_GRAYSCALE": true,
"SLEEP_TIME_AT_RESET": 1.5,
"IMG_HIST_LEN": 4,
"RTGYM_CONFIG": {
"time_step_duration": 0.05,
"start_obs_capture": 0.04,
"time_step_timeout_factor": 1.0,
"act_buf_len": 2,
"benchmark": false,
"wait_on_done": true
}
}
```
- Use the default camera by hitting the `1` key (the car must be visible).
- For best performance, use the `Canadian flag` skin, because this is what we trained with.

#### Then:
- Open a terminal and place it where it does not overlap with the TrackMania window, for instance in the bottom-left corner of the screen.
- Run the following command, and directly click somewhere in the TrackMania window so that `tmrl` can control the car.
@@ -39,11 +94,11 @@ If you get an error saying that communication was refused, try reloading the `TM
In case you get a DLL error from the `win32gui/win32ui/win32con` library, install `pywin32` without using `pip` (e.g., use `conda install pywin32`).

#### Profiling / optimization:
If you see many warnings complaining about time-steps timing out, this means that your computer struggles at running the AI and trackmania in parallel.
If you see many warnings complaining about time-steps timing out, this means your computer struggles to run the AI and TrackMania in parallel.
Try reducing the TrackMania graphics settings to the minimum (in particular, try setting the maximum fps to 30, but not much lower than this, because screenshots are captured at 20 fps).
_(NB: seeing these warnings once at each environment reset is normal, this is because we purposefully sleep when the car is waiting for green light)._

In the `Graphics` tab of the TM20 settings, ensure that the resolution is 958 (width) * 488 (height) pixels.
In the `Graphics` tab of the TM20 settings, ensure that the resolution is 958 * 488 pixels for the LIDAR environment and 256 * 128 pixels for the raw screenshot environment.

The `Input` setting for gamepads must be the default.

@@ -113,7 +168,10 @@ _(Note: you may want to run these commands on separate computers instead, for in
During training, make sure you don't see too many 'timestep timeouts' in the worker terminal.
If you do, this means that your GPU is not powerful enough, and you should use remote training instead of localhost training (see `TmrlData\config\config.json`).

With an RTX3080 on a distant machine as trainer and one local machine as worker/server, it takes approximatively 5 hours for the car to understand how to take a turn correctly.
Don't forget to tune training hyperparameters in `config.json` (the default should work for the LIDAR environment).

With carefully chosen hyperparameters, an RTX3080 on a remote machine as trainer, and one local machine as worker/server, it takes approximately 5 hours for the car to understand how to take a turn correctly in the LIDAR environment.
It takes more like 5 days in the raw screenshots environment! :wink:

_(Note: you can exit these processes by pressing `CTRL + C` in each terminal)_

3 changes: 2 additions & 1 deletion setup.py
@@ -13,7 +13,7 @@
sys.exit('Sorry, Python < 3.7 is not supported. We use dataclasses that have been introduced in 3.7.')


RESOURCES_URL = "https://github.com/trackmania-rl/tmrl/releases/download/v0.2.0/resources.zip"
RESOURCES_URL = "https://github.com/trackmania-rl/tmrl/releases/download/v0.3.0/resources.zip"


def url_retrieve(url: str, outfile: Path, overwrite: bool = False):
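The body of `url_retrieve` is collapsed in this diff; a plausible minimal sketch of such a helper (an assumption, not necessarily the repository's exact implementation):

```python
import tempfile
import urllib.request
from pathlib import Path

def url_retrieve(url: str, outfile: Path, overwrite: bool = False):
    """Download url to outfile, skipping the download if the file exists."""
    if outfile.exists() and not overwrite:
        return
    outfile.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response:
        outfile.write_bytes(response.read())

# demo with a local file:// URL so no network access is needed
src = Path(tempfile.mkdtemp()) / "resources.zip"
src.write_bytes(b"dummy")
dst = Path(tempfile.mkdtemp()) / "out" / "resources.zip"
url_retrieve(src.as_uri(), dst)
print(dst.read_bytes())  # b'dummy'
```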
@@ -64,6 +64,7 @@ def url_retrieve(url: str, outfile: Path, overwrite: bool = False):
copy2(RESOURCES_FOLDER / "config.json", CONFIG_FOLDER)
copy2(RESOURCES_FOLDER / "reward.pkl", REWARD_FOLDER)
copy2(RESOURCES_FOLDER / "SAC_4_LIDAR_pretrained.pth", WEIGHTS_FOLDER)
copy2(RESOURCES_FOLDER / "SAC_4_imgs_pretrained.pth", WEIGHTS_FOLDER)

# on Windows, look for OpenPlanet:
if platform.system() == "Windows":
