Release 0.3.0
yannbouteiller committed Sep 15, 2022
1 parent 6216f01 commit 2619ad8
Showing 3 changed files with 117 additions and 49 deletions.
93 changes: 51 additions & 42 deletions README.md
@@ -50,7 +50,7 @@ It is demonstrated on the TrackMania 2020 video game.
- [TMRL details](#advanced)
- [Real-time Gym framework](#real-time-gym-framework)
- [rtgym repo](https://github.com/yannbouteiller/rtgym)
- [Distant training architecture](#distant-training-architecture)
- [Remote training architecture](#remote-training-architecture)
- [Contribute](#authors)
- [Sponsors](#sponsors)

@@ -66,17 +66,18 @@ This is done through Deep Reinforcement Learning (RL).
* **Training algorithms:**
`tmrl` lets you easily train policies in TrackMania with state-of-the-art Deep Reinforcement Learning algorithms such as [Soft Actor-Critic](https://www.youtube.com/watch?v=LN29DDlHp1U) (SAC) and [Randomized Ensembled Double Q-Learning](https://arxiv.org/abs/2101.05982) (REDQ).
These algorithms store collected samples in a large dataset, called a replay memory.
In parallel, this dataset is used to train an artificial neural network (policy) that maps observations (images, speed...) to relevant actions (gas, steering angle...).
In parallel, this dataset is used to train an artificial neural network (policy) that maps observations (images, speed...) to relevant actions (gas, brake, steering angle...).

* **Analog control:**
`tmrl` controls the game using a virtual gamepad, which enables analog input.

* **Different types of observation:**
The car can use either a LIDAR (Light Detection and Ranging) computed from snapshots or the raw unprocessed snapshots in order to perceive its environment.
The car can either use raw unprocessed snapshots, or a LIDAR (Light Detection and Ranging) computed from the snapshots, in order to perceive its environment.

* **Models:**
To process LIDAR measurements, `tmrl` uses a Multi-Layer Perceptron (MLP).
To process raw camera images (snapshots), it uses a Convolutional Neural Network (CNN).
These models learn the physics from histories of observations equally spaced in time.
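The replay-memory mechanism described above can be sketched in a few lines (a toy illustration, not `tmrl`'s actual implementation; `ReplayMemory` and its methods are hypothetical names):

```python
import random
from collections import deque

class ReplayMemory:
    """Minimal replay memory: stores transitions, samples random batches."""
    def __init__(self, capacity=100000):
        self.buffer = deque(maxlen=capacity)  # oldest samples dropped when full

    def append(self, obs, act, rew, next_obs, done):
        self.buffer.append((obs, act, rew, next_obs, done))

    def sample(self, batch_size):
        # uniform sampling breaks the temporal correlation between samples
        return random.sample(self.buffer, batch_size)

# collect dummy transitions, then sample a training batch
memory = ReplayMemory(capacity=1000)
for step in range(50):
    memory.append(obs=step, act=0.0, rew=1.0, next_obs=step + 1, done=False)
batch = memory.sample(batch_size=8)
print(len(batch))  # 8
```

Off-policy algorithms such as SAC and REDQ train on batches drawn this way, in parallel with sample collection.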

### Developer features (real-time applications):
* **Python library:**
@@ -105,7 +106,7 @@ In particular, [rtgym](https://github.com/yannbouteiller/rtgym) enables implemen

## Installation

Detailed installation instructions are provided [here](readme/Install.md).
Detailed instructions for installation are provided [here](readme/Install.md).

## Getting started

@@ -125,12 +126,12 @@ from tmrl import get_environment
from time import sleep
import numpy as np

# default observations are of shape: ((1,), (4, 19), (3,), (3,))
# default LIDAR observations are of shape: ((1,), (4, 19), (3,), (3,))
# representing: (speed, 4 last LIDARs, 2 previous actions)
# actions are [gas, brake, steer], analog between -1.0 and +1.0
def model(obs):
"""
simplistic policy
simplistic policy for LIDAR observations
"""
deviation = obs[1].mean(0)
deviation /= (deviation.sum() + 0.001)
@@ -141,7 +142,9 @@ def model(obs):
steer = min(max(steer, -1.0), 1.0)
return np.array([1.0, 0.0, steer])

env = get_environment() # retrieve the TMRL Gym environment
# Let us retrieve the TMRL Gym environment.
# The environment you get from get_environment() depends on the content of config.json
env = get_environment()

sleep(1.0) # just so we have time to focus the TM20 window after starting the script

@@ -154,22 +157,23 @@ for _ in range(200): # rtgym ensures this runs at 20Hz by default
env.wait() # rtgym-specific method to artificially 'pause' the environment when needed
```

The environment can be customized by changing the content of the `ENV` entry in `TmrlData\config\config.json`:
The environment flavor can be chosen and customized by changing the content of the `ENV` entry in `TmrlData\config\config.json`:

_(NB: do not copy-paste the examples, comments are not supported in vanilla .json files)_

### LIDAR environment:
In this version of the environment, screenshots are reduced to 19-beam LIDARs to be processed with, e.g., an MLP.
In addition, this version features the speed (that human players can see).
This works only on plain road with black borders, using the front camera.
### Full environment:
This version of the environment features full screenshots to be processed with, e.g., a CNN.
In addition, this version features the speed, gear and RPM.
This works on any track, using any (sensible) camera configuration.

```json5
{
"ENV": {
"RTGYM_INTERFACE": "TM20LIDAR", // TrackMania 2020 with LIDAR observations
"WINDOW_WIDTH": 958, // width of the game window (min: 256)
"WINDOW_HEIGHT": 488, // height of the game window (min: 128)
"RTGYM_INTERFACE": "TM20FULL", // TrackMania 2020 with full screenshots
"WINDOW_WIDTH": 256, // width of the game window and screenshots (min: 256)
"WINDOW_HEIGHT": 128, // height of the game window and screenshots (min: 128)
"SLEEP_TIME_AT_RESET": 1.5, // the environment sleeps for this amount of time after each reset
"IMG_HIST_LEN": 4, // length of the history of LIDAR measurements in observations (set to 1 for RNNs)
"IMG_HIST_LEN": 4, // length of the history of images in observations (set to 1 for RNNs)
"RTGYM_CONFIG": {
"time_step_duration": 0.05, // duration of a time step
"start_obs_capture": 0.04, // duration before an observation is captured
@@ -181,20 +185,22 @@ This works only on plain road with black borders, using the front camera.
}
}
```
Note that human players can see or hear the features provided by this environment: we provide no "cheat" that would render the approach non-transferable to the real world.
In case you do wish to cheat, though, you can easily take inspiration from our [rtgym interfaces](https://github.com/trackmania-rl/tmrl/blob/master/tmrl/custom/custom_gym_interfaces.py) to build your own custom environment for TrackMania.
Of course, custom environments will not be accepted for the competition :wink:
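Since comments are not supported in vanilla JSON, switching flavors by hand-editing is error-prone; a hedged sketch of doing it programmatically (the `set_env_flavor` helper is a hypothetical name, and only the three entries shown are touched):

```python
import json
import tempfile
from pathlib import Path

def set_env_flavor(config_path, interface, width, height):
    """Rewrite the ENV entry of a tmrl-style config.json in place."""
    config = json.loads(Path(config_path).read_text())
    config["ENV"]["RTGYM_INTERFACE"] = interface
    config["ENV"]["WINDOW_WIDTH"] = width
    config["ENV"]["WINDOW_HEIGHT"] = height
    Path(config_path).write_text(json.dumps(config, indent=2))

# demo on a throwaway copy; point config_path at your real
# TmrlData\config\config.json to edit the actual file
demo = Path(tempfile.mkdtemp()) / "config.json"
demo.write_text(json.dumps({"ENV": {"RTGYM_INTERFACE": "TM20FULL",
                                    "WINDOW_WIDTH": 256,
                                    "WINDOW_HEIGHT": 128}}))
set_env_flavor(demo, "TM20LIDAR", 958, 488)
print(json.loads(demo.read_text())["ENV"]["RTGYM_INTERFACE"])  # TM20LIDAR
```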

### Full environment:
This version of the environment features full screenshots to be processed with, e.g., a CNN.
In addition, this version features the speed, gear and RPM.
This works on any track, using any (sensible) camera configuration.

### LIDAR environment:
In this version of the environment, screenshots are reduced to 19-beam LIDARs to be processed with, e.g., an MLP.
In addition, this version features the speed (that human players can see).
This works only on plain road with black borders, using the front camera with car hidden.
```json5
{
"ENV": {
"RTGYM_INTERFACE": "TM20FULL", // TrackMania 2020 with full screenshots
"WINDOW_WIDTH": 256, // width of the game window and screenshots (min: 256)
"WINDOW_HEIGHT": 128, // height of the game window and screenshots (min: 128)
"RTGYM_INTERFACE": "TM20LIDAR", // TrackMania 2020 with LIDAR observations
"WINDOW_WIDTH": 958, // width of the game window (min: 256)
"WINDOW_HEIGHT": 488, // height of the game window (min: 128)
"SLEEP_TIME_AT_RESET": 1.5, // the environment sleeps for this amount of time after each reset
"IMG_HIST_LEN": 4, // length of the history of images in observations (set to 1 for RNNs)
"IMG_HIST_LEN": 4, // length of the history of LIDAR measurements in observations (set to 1 for RNNs)
"RTGYM_CONFIG": {
"time_step_duration": 0.05, // duration of a time step
"start_obs_capture": 0.04, // duration before an observation is captured
@@ -207,14 +213,10 @@ This works on any track, using any (sensible) camera configuration.
}
```

Note that human players can see or hear the features provided by this environment: we provide no "cheat" that would render the approach non-transferable to the real world.
In case you do wish to cheat, though, you can easily take inspiration from our [rtgym interfaces](https://github.com/trackmania-rl/tmrl/blob/master/tmrl/custom/custom_gym_interfaces.py) to build your own custom environment for TrackMania.
Of course, custom environments will not be accepted for the competition :wink:

### LIDAR with track progress

If you have watched the [2022-06-08 episode](https://www.youtube.com/watch?v=c1xq7iJ3f9E) of the Underscore_ talk show (french), note that the policy you have seen has been trained in a slightly augmented version of the LIDAR environment: on top of LIDAR and speed value, we have added a value representing the percentage of completion of the track, so that the AI can know the turns in advance similarly to humans practicing a given track.
It is not yet clear whether we want to use this environment in the competition, as it is de-facto less generalizable.
If you have watched the [2022-06-08 episode](https://www.youtube.com/watch?v=c1xq7iJ3f9E) of the Underscore_ talk show (in French), note that the policy you saw there was trained in a slightly augmented version of the LIDAR environment: on top of the LIDAR and speed values, we added a value representing the percentage of completion of the track, so that the model can know the turns in advance, similarly to humans practicing a given track.
This environment will not be accepted in the competition, as it is de facto less generalizable.
However, if you wish to use this environment, e.g., to beat our results, you can use the following `config.json`:

```json5
@@ -239,8 +241,8 @@

## TrackMania training details

In `tmrl`, an AI that knows absolutely nothing about driving or even about what a road is, is set at the starting point of a track.
Its goal is to learn how to complete the track by exploring its own capacities and environment.
In `tmrl`, a model (AI) that knows absolutely nothing about driving, or even about what a road is, is set at the starting point of a track.
Its goal is to learn how to complete the track as fast as possible by exploring its own capacities and environment.

The car feeds observations such as images to an artificial neural network, which must output the best possible controls from these observations.
This implies that the AI must understand its environment in some way.
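The observation-to-controls mapping can be pictured as a small neural-network forward pass; a toy numpy sketch with random weights (purely illustrative, not `tmrl`'s actual models):

```python
import numpy as np

rng = np.random.default_rng(0)

# toy 2-layer MLP: 83 flattened observation features -> 3 analog controls
# (83 = 1 speed + 4*19 LIDAR beams + 2 previous actions of 3 values each,
#  matching the default LIDAR observation shapes; the weights are random)
W1, b1 = 0.1 * rng.normal(size=(83, 64)), np.zeros(64)
W2, b2 = 0.1 * rng.normal(size=(64, 3)), np.zeros(3)

def policy(features):
    h = np.tanh(features @ W1 + b1)   # hidden layer
    return np.tanh(h @ W2 + b2)       # squash to [-1.0, +1.0]: gas, brake, steer

obs = rng.normal(size=83)             # stand-in for a flattened observation
action = policy(obs)
print(action.shape)  # (3,)
```

Training consists of adjusting `W1`, `b1`, `W2`, `b2` so that the outputs maximize the reward collected along the track.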
@@ -359,19 +361,18 @@ In `tmrl`, the car can be controlled in two different ways:

Different observation spaces are available in `tmrl`:

- A LIDAR measurement is computed from real-time screenshots in tracks with black borders.
- A history of several such LIDAR measurements (typically the last 4 time-steps).
- A history of raw screenshots (typically 4).
- A history of LIDAR measurements computed from raw screenshots in tracks with black borders.

In addition, we provide the norm of the velocity as part of the observation space in all our experiments.
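The LIDAR reduction mentioned above can be approximated by casting beams from the bottom-center of a screenshot until a dark border pixel is hit; a simplified sketch (the function name, beam geometry and darkness threshold are assumptions, and `tmrl`'s actual implementation may differ):

```python
import numpy as np

def lidar_from_screenshot(img, n_beams=19, threshold=55):
    """Reduce a grayscale screenshot to beam distances.
    Beams fan out from the bottom-center of the frame; each one stops at the
    first dark pixel (the black track border) or at the image boundary."""
    h, w = img.shape
    origin = np.array([h - 1, w // 2])
    angles = np.linspace(0, np.pi, n_beams)   # fan from left to right
    distances = np.zeros(n_beams)
    for i, a in enumerate(angles):
        step = np.array([-np.sin(a), -np.cos(a)])   # left ... up ... right
        pos = origin.astype(float)
        while True:
            pos += step
            r, c = int(pos[0]), int(pos[1])
            if not (0 <= r < h and 0 <= c < w) or img[r, c] < threshold:
                break
            distances[i] += 1.0
    return distances

img = np.full((488, 958), 255, dtype=np.uint8)  # all-road dummy frame
img[:, :100] = 0                                # dark border on the left
print(lidar_from_screenshot(img).shape)  # (19,)
```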

Example of `tmrl` environment in TrackMania Nations Forever with a single LIDAR measurement:

![reward](readme/img/lidar.png)

In TrackMania Nations Forever, the raw speed is computed from screen captures thanks to the 1-NN algorithm.
In TrackMania Nations Forever, we used to compute the raw speed from screenshots thanks to the 1-NN algorithm.

In TrackMania 2020, the [OpenPlanet](https://openplanet.nl) API is used to retrieve the raw speed directly.
In TrackMania 2020, we now use the [OpenPlanet](https://openplanet.nl) API to retrieve the raw speed directly.

### Results

@@ -400,10 +401,10 @@ Time-steps are being elastically constrained to their nominal duration. When thi

Custom `rtgym` interfaces for Trackmania used by `tmrl` are implemented in [custom_gym_interfaces.py](https://github.com/yannbouteiller/tmrl/blob/master/tmrl/custom/custom_gym_interfaces.py).
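The elastic time-step constraint mentioned above can be sketched as follows (an illustrative loop, not `rtgym`'s actual code; `run_elastic_steps` is a hypothetical name):

```python
import time

def run_elastic_steps(step_fn, n_steps, nominal_duration=0.05):
    """Run step_fn at a fixed rate: sleep off any leftover time each step;
    if a step overruns its deadline, resynchronize instead of drifting."""
    next_deadline = time.monotonic() + nominal_duration
    for _ in range(n_steps):
        step_fn()
        remaining = next_deadline - time.monotonic()
        if remaining > 0:
            time.sleep(remaining)             # elastic: absorb the slack
        else:
            next_deadline = time.monotonic()  # timeout: resynchronize
        next_deadline += nominal_duration

start = time.monotonic()
run_elastic_steps(lambda: None, n_steps=10, nominal_duration=0.01)
elapsed = time.monotonic() - start
print(round(elapsed, 2))  # close to 0.1 on an unloaded machine
```

With the default `time_step_duration` of 0.05, such a loop yields the 20 Hz control rate used in TrackMania.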

### Distant training architecture:
### Remote training architecture:

`tmrl` is based on a client-server framework on the model of [Ray RLlib](https://docs.ray.io/en/latest/rllib.html).
Our client-server architecture is not secured and it is not meant to compete with Ray, but it is much simpler to modify in order to implement ad-hoc pipelines and works on both Windows and Linux.
Our client-server architecture is not secured yet and it is not meant to compete with Ray, but it is much simpler to modify in order to implement ad-hoc pipelines and works on both Windows and Linux.

We collect training samples from several rollout workers, typically several computers and/or robots.
Each rollout worker stores its collected samples in a local buffer, and periodically sends this replay buffer to the central server.
@@ -424,17 +425,25 @@ These mechanics can be summarized as follows:
![Networking architecture](readme/img/network_interface.png "Networking Architecture")
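The exchange summarized above (workers push sample buffers, the trainer pushes updated weights, the central server relays both) can be mimicked as an in-process toy (no networking and no security here; all names are hypothetical):

```python
class CentralServer:
    """Toy relay: aggregates worker buffers, broadcasts trainer weights."""
    def __init__(self):
        self.incoming_samples = []
        self.current_weights = {"version": 0}

    def receive_buffer(self, buffer):
        # called periodically by each rollout worker
        self.incoming_samples.extend(buffer)

    def update_weights(self, weights):
        # called by the trainer after each training round
        self.current_weights = weights

# two rollout workers each send a local buffer of transitions
server = CentralServer()
server.receive_buffer([("obs_a", "act_a", 1.0)])
server.receive_buffer([("obs_b", "act_b", 0.5)])

# the trainer pulls everything collected so far, trains, pushes new weights;
# workers would then fetch current_weights to refresh their local policies
batch, server.incoming_samples = server.incoming_samples, []
server.update_weights({"version": 1})

print(len(batch), server.current_weights["version"])  # 2 1
```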


## Development roadmap:
You are welcome to contribute to the `tmrl` project.
Please consider the following:
- Further profiling and code optimization.
- Secure and improve network communications.
- Find the cleanest way to support sequences in `MemoryDataloading` for RNN training.


## Authors:

Contributions to this project are welcome, please submit a PR with your name in the contributors list.
When contributing, please submit a PR with your name and a short caption in the contributors list.

### Maintainers:
- Yann Bouteiller
- Edouard Geze

### Contributors:
- Simon Ramstedt
- AndrejGobeX
- Simon Ramstedt - initial code base
- AndrejGobeX - optimization of screen capture

## License

70 changes: 64 additions & 6 deletions readme/get_started.md
@@ -4,25 +4,80 @@ Before reading these instructions, make sure you have installed TMRL and OpenPla

## Pre-trained AI in Trackmania 2020

You can test our pre-trained AI directly in TrackMania by following these steps (we recommend doing this once, so you understand how `tmrl` controls the video game):
You can test our pre-trained AIs directly in TrackMania by following these steps (we recommend doing this once, so you understand how `tmrl` controls the video game):

### Load the tmrl-test track into your TrackMania game:
- Navigate to your home folder (`C:\Users\username\`), and open `TmrlData\resources`
- Copy the `tmrl-test.Map.Gbx` file into `...\Documents\Trackmania\Maps\My Maps` (or equivalent on your system).

### Test the pre-trained AI:
### Test pre-trained AIs:

#### Game preparation

- Launch TrackMania 2020
- In case the OpenPlanet menu is showing in the top part of the screen, hide it using the `f3` key
- Launch the `tmrl-test` track. This can be done by selecting `create > map editor > edit a map > tmrl-test > select map` and hitting the green flag.
- Set the game in windowed mode. To do this, bring the cursor to the top of the screen and a drop-down menu will show. Hit the windowed icon.
- Bring the TrackMania window to the top-left corner of the screen. On Windows 10, it should automatically fit to a quarter of the screen _(NB: the window will automatically snap to the top-left corner and get sized properly when you start the AI)_.
- Enter the cockpit view by hitting the `3` key (the car must be hidden, press several times if the cockpit is visible).
- Hide the ghost by pressing the `g` key.

#### If you want to test the pre-trained AI for LIDARs:
- Replace/ensure the following entries in `TmrlData\config\config.json`:
```json
"RUN_NAME": "SAC_4_LIDAR_pretrained"
```
```json
"ENV": {
"RTGYM_INTERFACE": "TM20LIDAR",
"WINDOW_WIDTH": 958,
"WINDOW_HEIGHT": 488,
"SLEEP_TIME_AT_RESET": 1.5,
"IMG_HIST_LEN": 4,
"RTGYM_CONFIG": {
"time_step_duration": 0.05,
"start_obs_capture": 0.04,
"time_step_timeout_factor": 1.0,
"act_buf_len": 2,
"benchmark": false,
"wait_on_done": true
}
}
```
- Enter the cockpit view by hitting the `3` key (the car must be hidden, press several times if the cockpit is visible).

The trackmania window should now look like this:

![screenshot1](img/screenshot1.PNG)

#### If you want to test the pre-trained AI for raw screenshots:
- Replace/ensure the following entries in `TmrlData\config\config.json`:
```json
"RUN_NAME": "SAC_4_imgs_pretrained"
```
```json
"ENV": {
"RTGYM_INTERFACE": "TM20IMAGES",
"WINDOW_WIDTH": 256,
"WINDOW_HEIGHT": 128,
"IMG_WIDTH": 64,
"IMG_HEIGHT": 64,
"IMG_GRAYSCALE": true,
"SLEEP_TIME_AT_RESET": 1.5,
"IMG_HIST_LEN": 4,
"RTGYM_CONFIG": {
"time_step_duration": 0.05,
"start_obs_capture": 0.04,
"time_step_timeout_factor": 1.0,
"act_buf_len": 2,
"benchmark": false,
"wait_on_done": true
}
}
```
- Use the default camera by hitting the `1` key (the car must be visible).
- For best performance, use the `Canadian flag` skin, because this is what we trained with.

#### Then:
- Open a terminal and place it where it does not overlap with the TrackMania window, for instance in the bottom-left corner of the screen.
- Run the following command, and directly click somewhere in the TrackMania window so that `tmrl` can control the car.
@@ -39,11 +94,11 @@ If you get an error saying that communication was refused, try reloading the `TM
In case you get a DLL error from the `win32gui/win32ui/win32con` library, install `pywin32` without using `pip` (e.g., use `conda install pywin32`).

#### Profiling / optimization:
If you see many warnings complaining about time-steps timing out, this means that your computer struggles at running the AI and trackmania in parallel.
If you see many warnings complaining about time-steps timing out, this means your computer struggles to run the AI and TrackMania in parallel.
Try reducing the TrackMania graphics settings to the minimum (in particular, try setting the maximum fps to 30, but not much lower than this, because screenshots are captured at 20 fps).
_(NB: seeing these warnings once at each environment reset is normal, this is because we purposefully sleep when the car is waiting for green light)._

In the `Graphics` tab of the TM20 settings, ensure that the resolution is 958 (width) * 488 (height) pixels.
In the `Graphics` tab of the TM20 settings, ensure that the resolution is 958 * 488 pixels for the LIDAR environment and 256 * 128 pixels for the raw screenshot environment.

The `Input` setting for gamepads must be the default.

@@ -113,7 +168,10 @@ _(Note: you may want to run these commands on separate computers instead, for in
During training, make sure you don't see too many 'timestep timeouts' in the worker terminal.
If you do, this means that your GPU is not powerful enough, and you should use remote training instead of localhost training (see `TmrlData\config\config.json`).

With an RTX3080 on a distant machine as trainer and one local machine as worker/server, it takes approximatively 5 hours for the car to understand how to take a turn correctly.
Don't forget to tune training hyperparameters in `config.json` (the default should work for the LIDAR environment).

With carefully chosen hyperparameters, an RTX3080 on a remote machine as trainer, and one local machine as worker/server, it takes approximately 5 hours for the car to understand how to take a turn correctly in the LIDAR environment.
It takes more like 5 days in the raw screenshots environment! :wink:

_(Note: you can exit these processes by pressing `CTRL + C` in each terminal)_

3 changes: 2 additions & 1 deletion setup.py
@@ -13,7 +13,7 @@
sys.exit('Sorry, Python < 3.7 is not supported. We use dataclasses that have been introduced in 3.7.')


RESOURCES_URL = "https://github.com/trackmania-rl/tmrl/releases/download/v0.2.0/resources.zip"
RESOURCES_URL = "https://github.com/trackmania-rl/tmrl/releases/download/v0.3.0/resources.zip"


def url_retrieve(url: str, outfile: Path, overwrite: bool = False):
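The body of `url_retrieve` is collapsed in this diff; a plausible minimal sketch of such a helper (an assumption, not necessarily the repository's exact implementation):

```python
import tempfile
import urllib.request
from pathlib import Path

def url_retrieve(url: str, outfile: Path, overwrite: bool = False):
    """Download url to outfile, skipping the download if the file exists."""
    if outfile.exists() and not overwrite:
        return
    outfile.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response:
        outfile.write_bytes(response.read())

# demo with a local file:// URL so no network access is needed
src = Path(tempfile.mkdtemp()) / "resources.zip"
src.write_bytes(b"dummy")
dst = Path(tempfile.mkdtemp()) / "out" / "resources.zip"
url_retrieve(src.as_uri(), dst)
print(dst.read_bytes())  # b'dummy'
```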
@@ -64,6 +64,7 @@ def url_retrieve(url: str, outfile: Path, overwrite: bool = False):
copy2(RESOURCES_FOLDER / "config.json", CONFIG_FOLDER)
copy2(RESOURCES_FOLDER / "reward.pkl", REWARD_FOLDER)
copy2(RESOURCES_FOLDER / "SAC_4_LIDAR_pretrained.pth", WEIGHTS_FOLDER)
copy2(RESOURCES_FOLDER / "SAC_4_imgs_pretrained.pth", WEIGHTS_FOLDER)

# on Windows, look for OpenPlanet:
if platform.system() == "Windows":
