
Commit

Upgraded to rtgym 0.8 (gymnasium)
yannbouteiller committed Mar 19, 2023
1 parent 05ec924 commit 43620a4
Showing 21 changed files with 63 additions and 67 deletions.
10 changes: 5 additions & 5 deletions README.md
@@ -25,7 +25,7 @@
`tmrl` is a python library designed to facilitate the implementation of deep RL applications in real-time settings such as robots and video games. Full tutorial [here](readme/tuto_library.md) and documentation [here](https://tmrl.readthedocs.io/en/latest/).

- :ok_hand: **ML developers who are TM enthusiasts with no interest in learning this huge thing:**\
-`tmrl` provides a Gym environment for TrackMania that is easy to use. Fast-track for you guys [here](#trackmania-gym-environment).
+`tmrl` provides a Gymnasium environment for TrackMania that is easy to use. Fast-track for you guys [here](#trackmania-gymnasium-environment).

- :earth_americas: **Everyone:**\
`tmrl` hosts the [TrackMania Roborace League](readme/competition.md), a vision-based AI competition where participants design real-time self-racing AIs in the TrackMania video game.
@@ -44,7 +44,7 @@
- [Security (important)](#security)
- [TrackMania applications](#autonomous-driving-in-trackmania)
- [TrackMania Roborace League](readme/competition.md)
-- [TrackMania Gym environment](#trackmania-gym-environment)
+- [TrackMania Gymnasium environment](#trackmania-gymnasium-environment)
- [LIDAR environment](#lidar-environment)
- [Full environment](#full-environment)
- [TrackMania training details](#trackmania-training-details)
@@ -93,8 +93,8 @@ These models learn the physics from histories or observations equally spaced in
`tmrl` is a complete framework designed to help you successfully implement deep RL in your [real-time applications](#real-time-gym-framework) (e.g., robots...).
A complete tutorial toward doing this is provided [here](readme/tuto_library.md).

-* **TrackMania Gym environment:**
-`tmrl` comes with a real-time Gym environment for the TrackMania2020 video game, based on [rtgym](https://pypi.org/project/rtgym/). Once `tmrl` is installed, it is easy to use this environment in your own training framework. More information [here](#trackmania-gym-environment).
+* **TrackMania Gymnasium environment:**
+`tmrl` comes with a real-time Gymnasium environment for the TrackMania2020 video game, based on [rtgym](https://pypi.org/project/rtgym/). Once `tmrl` is installed, it is easy to use this environment in your own training framework. More information [here](#trackmania-gymnasium-environment).

* **Distributed training:**
`tmrl` is based on a single-server / multiple-clients architecture.
@@ -157,7 +157,7 @@ Follow the link for information about the competition, including the current leaderboard

Regardless of whether they want to compete or not, ML developers will find the [competition tutorial script](https://github.com/trackmania-rl/tmrl/blob/master/tmrl/tuto/competition/custom_actor_module.py) useful for creating advanced training pipelines in TrackMania.

-## TrackMania Gym environment
+## TrackMania Gymnasium environment
In case you only wish to use the `tmrl` Real-Time Gym environment for TrackMania in your own training framework, this is made possible by the `get_environment()` method:

_(NB: the game needs to be set up as described in the [getting started](readme/get_started.md) instructions)_
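A minimal sketch of what this can look like (assuming the game is running and configured as described above; the action space is the default TrackMania 2020 one, 3 floats in [-1.0, 1.0], and the gas/brake/steer ordering shown here is an assumption for illustration):

```python
import numpy as np
from tmrl import get_environment

# default TMRL Gymnasium environment for TrackMania 2020 (rtgym-based, steps in real time)
env = get_environment()

obs, info = env.reset()  # Gymnasium API: reset() returns (observation, info)
for _ in range(200):
    act = np.array([1.0, 0.0, 0.0], dtype=np.float32)  # e.g. full gas, no brake, no steering
    obs, rew, terminated, truncated, info = env.step(act)
    if terminated or truncated:
        obs, info = env.reset()
```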
2 changes: 1 addition & 1 deletion readme/Install.md
@@ -17,7 +17,7 @@ _(Note for ML developers: in case you are not interested in using support for TrackMania

The following instructions are for installing `tmrl` with support for the TrackMania 2020 video game.

-You will first need to install [TrackMania 2020](https://www.trackmania.com/) (obviously), and also a small community-supported utility called [Openplanet for TrackMania](https://openplanet.nl/) (the Gym environment needs this utility to compute the reward).
+You will first need to install [TrackMania 2020](https://www.trackmania.com/) (obviously), and also a small community-supported utility called [Openplanet for TrackMania](https://openplanet.nl/) (the Gymnasium environment needs this utility to compute the reward).


### Install TrackMania 2020:
2 changes: 1 addition & 1 deletion readme/competition.md
@@ -69,7 +69,7 @@ We choose whether to accept your entry based on reproducibility and novelty.
### Current iteration (Beta)
The `tmrl` competition is an open research initiative, currently in its first iteration :hatching_chick:

-In this iteration, competitors race on the `tmrl-test` track (plain road) by solving the `Full` version of the [TrackMania 2020 Gym environment](https://github.com/trackmania-rl/tmrl#gym-environment) (the `LIDAR` version is also accepted).
+In this iteration, competitors race on the `tmrl-test` track (plain road) by solving the `Full` version of the [TrackMania 2020 Gym environment](https://github.com/trackmania-rl/tmrl#trackmania-gymnasium-environment) (the `LIDAR` version is also accepted).

- The `action space` is the default TrackMania 2020 continuous action space (3 floats between -1.0 and 1.0).
- The `observation space` is a history of 4 raw snapshots along with the speed, gear, rpm and 2 previous actions. The choice of camera is up to you as long as you use one of the default. You are allowed to use colors if you wish (set the `"IMG_GRAYSCALE"` entry to `false` in `config.json`). You may also customize the actual image dimensions (`"IMG_WIDTH"` and `"IMG_HEIGHT"`), and the game window dimensions (`"WINDOW_WIDTH"` and `"WINDOW_HEIGHT"`) if you need to. However, the window dimensions must remain between `(256, 128)` and `(958, 488)` (dimensions greater than `(958, 488)` are **not** allowed).
26 changes: 13 additions & 13 deletions readme/tuto_library.md
@@ -162,7 +162,7 @@ As soon as the server is instantiated, it listens for incoming connections from
In RL, a task is often called an "environment".
`tmrl` is meant for asynchronous remote training of real-time applications such as robots.
Thus, we use [Real-Time Gym](https://github.com/yannbouteiller/rtgym) (`rtgym`) to wrap our robots and video games into a Gym environment.
-You can also probably use other environments as long as they are registered as Gym environments and have a relevant substitute for the `default_action` attribute.
+You can also probably use other environments as long as they are registered as Gymnasium environments and have a relevant substitute for the `default_action` attribute.

To build your own environment (e.g., an environment for your own robot or video game), follow the [rtgym tutorial](https://github.com/yannbouteiller/rtgym#tutorial).
If you need inspiration, you can find our `rtgym` interfaces for TrackMania in [custom_gym_interfaces.py](https://github.com/trackmania-rl/tmrl/blob/master/tmrl/custom/custom_gym_interfaces.py).
@@ -173,7 +173,7 @@ _(NB: you need `opencv-python` installed)_

```python
from rtgym import RealTimeGymInterface, DEFAULT_CONFIG_DICT, DummyRCDrone
-import gym.spaces as spaces
+import gymnasium.spaces as spaces
import numpy as np
import cv2
from threading import Thread
```

@@ -276,7 +276,7 @@ my_config["benchmark_polyak"] = 0.2

## Rollout workers

-Now that we have our robot encapsulated in a Gym environment, we will create an RL actor.
+Now that we have our robot encapsulated in a Gymnasium environment, we will create an RL actor.
In `tmrl`, this is done within a `RolloutWorker` object.

One to several `RolloutWorkers` can coexist in `tmrl`, each one typically encapsulating a robot, or, in the case of a video game, an instance of the game
@@ -290,7 +290,7 @@ import tmrl.config.config_constants as cfg # constants from the config.json file
class RolloutWorker:
def __init__(
self,
-env_cls=None, # class of the Gym environment
+env_cls=None, # class of the Gymnasium environment
actor_module_cls=None, # class of a module containing the policy
sample_compressor: callable = None, # compressor for sending samples over the Internet
server_ip=None, # ip of the central server
@@ -315,16 +315,16 @@ In this tutorial, we will implement a similar `RolloutWorker` for our dummy drone

The first argument of our `RolloutWorker` is `env_cls`.

-This expects a Gym environment class, which can be partially instantiated with `partial()`.
-Furthermore, this Gym environment needs to be wrapped in the `GenericGymEnv` wrapper (which by default just changes float64 to float32 in observations).
+This expects a Gymnasium environment class, which can be partially instantiated with `partial()`.
+Furthermore, this Gymnasium environment needs to be wrapped in the `GenericGymEnv` wrapper (which by default just changes float64 to float32 in observations).

With our dummy drone environment, this translates to:

```python
from tmrl.util import partial
from tmrl.envs import GenericGymEnv

-env_cls=partial(GenericGymEnv, id="real-time-gym-v0", gym_kwargs={"config": my_config})
+env_cls=partial(GenericGymEnv, id="real-time-gym-v1", gym_kwargs={"config": my_config})
```

We can create a dummy environment to retrieve the action and observation spaces:
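A minimal sketch of this step, assuming `env_cls` is the partially instantiated class defined just above (variable names are illustrative):

```python
# instantiate a throwaway environment only to read its spaces
dummy_env = env_cls()
act_space = dummy_env.action_space
obs_space = dummy_env.observation_space
print(f"action space: {act_space}")
print(f"observation space: {obs_space}")
```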
@@ -505,7 +505,7 @@ This is done by setting the `Server` IP as the localhost IP, i.e., `"127.0.0.1"`
_(NB: We have set the values for `server_ip` and `server_port` earlier in this tutorial.)_

In the current iteration of `tmrl`, samples are gathered locally in a buffer by the `RolloutWorker` and are sent to the `Server` only at the end of an episode.
-In case your Gym environment is never `terminated` (or only after too long), `tmrl` enables forcing reset after a time-steps threshold.
+In case your Gymnasium environment is never `terminated` (or only after too long), `tmrl` enables forcing reset after a time-steps threshold.
For instance, let us say we don't want an episode to last more than 1000 time-steps:

_(Note 1: This is for the sake of illustration, in fact, this cannot happen in our RC drone environment)_
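A sketch of how such a threshold can be passed when building the worker (assuming the constructor accepts a `max_samples_per_episode` argument; check the actual `RolloutWorker` signature, and note that `my_actor_module_cls` and `my_sample_compressor` are placeholder names):

```python
from tmrl.networking import RolloutWorker

rw = RolloutWorker(env_cls=env_cls,                         # partially instantiated environment class from above
                   actor_module_cls=my_actor_module_cls,    # placeholder: your ActorModule subclass
                   sample_compressor=my_sample_compressor,  # placeholder: or None to send raw samples
                   server_ip="127.0.0.1",                   # localhost, as in this example
                   max_samples_per_episode=1000)            # assumed keyword: force reset after 1000 time-steps
```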
@@ -694,13 +694,13 @@ class TorchTrainingOffline:
`TorchTrainingOffline` requires other (possibly partially instantiated) classes as arguments: a dummy environment, a `TorchMemory`, and a `TrainingAgent`

#### Dummy environment:
-`env_cls`: Most of the time, the dummy environment class that you need to pass here is the same class as for the `RolloutWorker` Gym environment:
+`env_cls`: Most of the time, the dummy environment class that you need to pass here is the same class as for the `RolloutWorker` Gymnasium environment:

```python
from tmrl.util import partial
from tmrl.envs import GenericGymEnv

-env_cls = partial(GenericGymEnv, id="real-time-gym-v0", gym_kwargs={"config": my_config})
+env_cls = partial(GenericGymEnv, id="real-time-gym-v1", gym_kwargs={"config": my_config})
```
This dummy environment will only be used by the `Trainer` to retrieve the observation and action spaces (`reset()` will not be called).
Alternatively, you can pass this information as a Tuple:
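For instance, a sketch of that alternative (the exact expected format is an assumption; check the `TorchTrainingOffline` docstring):

```python
# pass the spaces directly instead of a dummy environment class,
# using the spaces retrieved from the dummy environment earlier
env_cls = (obs_space, act_space)
```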
@@ -750,7 +750,7 @@ class TorchMemory(ABC):
"""
Outputs a decompressed RL transition.
-This transition is the same as the output by the Gym environment (after observation preprocessing).
+This transition is the same as the output by the Gymnasium environment (after observation preprocessing).
Args:
item: int: indices of the transition that the Trainer wants to sample
@@ -826,7 +826,7 @@ In this tutorial, we will privilege memory usage and thus we will implement our
The `append_buffer()` method will simply store the compressed sample components in `self.data`.

`append_buffer()` is passed a [buffer](https://github.com/trackmania-rl/tmrl/blob/c1f740740a7d57382a451607fdc66d92ba62ea0c/tmrl/networking.py#L198) object that contains a list of compressed `(act, new_obs, rew, terminated, truncated, info)` samples in its `memory` attribute.
-`act` is the action that was sent to the `step()` method of the Gym environment to yield `new_obs`, `rew`, `terminated`, `truncated`, and `info`.
+`act` is the action that was sent to the `step()` method of the Gymnasium environment to yield `new_obs`, `rew`, `terminated`, `truncated`, and `info`.
Here, we decompose our samples in their relevant components, append these components to the `self.data` list, and clip `self.data` when `self.memory_size` is exceeded:

```python
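# Hedged sketch of an append_buffer() implementation matching the description
# above; storing self.data as six parallel component lists is an assumption
# made for illustration, not the actual tmrl code.
def append_buffer(self, buffer):
    # decompose the compressed samples into their components and append them
    for i in range(6):  # (act, new_obs, rew, terminated, truncated, info)
        self.data[i] += [sample[i] for sample in buffer.memory]
    # clip the oldest entries when self.memory_size is exceeded
    to_trim = len(self.data[0]) - self.memory_size
    if to_trim > 0:
        self.data = [component[to_trim:] for component in self.data]
```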
@@ -904,7 +904,7 @@ Finally, if we have enough samples, we need to remove the length of the action buffer
Furthermore, the `get_transition()` method outputs a full RL transition, which includes the previous observation. Thus, we must subtract 1 to get the number of full transitions that we can actually output.
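Under these assumptions, the resulting length computation can be sketched as follows (`self.act_buf_len`, the length of the action buffer, is a placeholder name):

```python
def __len__(self):
    if len(self.data) == 0 or len(self.data[0]) == 0:
        return 0  # no samples yet
    # remove the action buffer length, then subtract 1 for the previous observation
    return max(0, len(self.data[0]) - self.act_buf_len - 1)
```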

Alright, let us finally implement `get_transition()`, where we have chosen sample decompression would happen.
-This method outputs full transitions as if they were output by the Gym environment
+This method outputs full transitions as if they were output by the Gymnasium environment
(after observation preprocessing if used):

```python
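# Hedged, simplified sketch of get_transition(); it reuses the parallel-list
# assumption from the append_buffer() sketch above and ignores the action-buffer
# bookkeeping discussed earlier, so the real implementation is more involved.
def get_transition(self, item):
    acts, obss, rews, terms, truncs, infos = self.data
    last_obs = obss[item]      # previous observation
    new_act = acts[item + 1]   # action applied after last_obs
    rew = rews[item + 1]
    new_obs = obss[item + 1]
    terminated = terms[item + 1]
    truncated = truncs[item + 1]
    info = infos[item + 1]
    return last_obs, new_act, rew, new_obs, terminated, truncated, info
```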
12 changes: 4 additions & 8 deletions setup.py
@@ -101,21 +101,17 @@ def url_retrieve(url: str, outfile: Path, overwrite: bool = False):
install_req = [
'numpy',
'torch',
-'imageio',
-'imageio-ffmpeg',
-'pandas',
-'gym>=0.26.0',
-'rtgym>=0.7',
+'gymnasium',
+'rtgym>=0.8',
'pyyaml',
'wandb',
'requests',
'opencv-python',
'scikit-image',
'keyboard',
'pyautogui',
'pyinstrument',
-'tlspyo>=0.2.5',
-'matplotlib'
+'tlspyo>=0.2.5'
]

if platform.system() == "Windows":
@@ -131,7 +127,7 @@

setup(
name='tmrl',
-version='0.4.2',
+version='0.5.0',
description='Network-based framework for real-time robot learning',
long_description=README,
long_description_content_type='text/markdown',
6 changes: 3 additions & 3 deletions tmrl/__init__.py
@@ -33,9 +33,9 @@

def get_environment():
"""
-Default TMRL Gym environment for TrackMania 2020.
+Default TMRL Gymnasium environment for TrackMania 2020.
Returns:
-Gym.Env: An instance of the default TMRL Gym environment
+gymnasium.Env: An instance of the default TMRL Gym environment
"""
-return GenericGymEnv(id="real-time-gym-v0", gym_kwargs={"config": CONFIG_DICT})
+return GenericGymEnv(id="real-time-gym-v1", gym_kwargs={"config": CONFIG_DICT})
2 changes: 1 addition & 1 deletion tmrl/__main__.py
@@ -23,7 +23,7 @@ def main(args):
config_modifiers = args.config
for k, v in config_modifiers.items():
config[k] = v
-rw = RolloutWorker(env_cls=partial(GenericGymEnv, id="real-time-gym-v0", gym_kwargs={"config": config}),
+rw = RolloutWorker(env_cls=partial(GenericGymEnv, id="real-time-gym-v1", gym_kwargs={"config": config}),
actor_module_cls=cfg_obj.POLICY,
sample_compressor=cfg_obj.SAMPLE_COMPRESSOR,
device='cuda' if cfg.CUDA_INFERENCE else 'cpu',
8 changes: 4 additions & 4 deletions tmrl/actor.py
@@ -20,8 +20,8 @@ class ActorModule(ABC):
def __init__(self, observation_space, action_space):
"""
Args:
-observation_space (Gym.spaces.Space): observation space (here for your convenience)
-action_space (Gym.spaces.Space): action space (here for your convenience)
+observation_space (gymnasium.spaces.Space): observation space (here for your convenience)
+action_space (gymnasium.spaces.Space): action space (here for your convenience)
"""
self.observation_space = observation_space
self.action_space = action_space
@@ -121,8 +121,8 @@ class TorchActorModule(ActorModule, torch.nn.Module, ABC):
def __init__(self, observation_space, action_space, device="cpu"):
"""
Args:
-observation_space (Gym.spaces.Space): observation space (here for your convenience)
-action_space (Gym.spaces.Space): action space (here for your convenience)
+observation_space (gymnasium.spaces.Space): observation space (here for your convenience)
+action_space (gymnasium.spaces.Space): action space (here for your convenience)
device: device where your model should live and where observations for `act` will be collated
"""
super().__init__(observation_space, action_space) # ActorModule
4 changes: 2 additions & 2 deletions tmrl/config/config_objects.py
@@ -66,7 +66,7 @@
else:
SAMPLE_COMPRESSOR = get_local_buffer_sample_tm20_imgs

-# to preprocess observations that come out of the gym environment:
+# to preprocess observations that come out of the gymnasium environment:
if cfg.PRAGMA_LIDAR:
if cfg.PRAGMA_PROGRESS:
OBS_PREPROCESSOR = obs_preprocessor_tm_lidar_progress_act_in_obs
@@ -144,7 +144,7 @@ def sac_v2_entropy_scheduler(agent, epoch):
agent.entopy_target = start_ent + (end_ent - start_ent) * epoch / end_epoch


-ENV_CLS = partial(GenericGymEnv, id="real-time-gym-v0", gym_kwargs={"config": CONFIG_DICT})
+ENV_CLS = partial(GenericGymEnv, id="real-time-gym-v1", gym_kwargs={"config": CONFIG_DICT})

if cfg.PRAGMA_LIDAR: # lidar
TRAINER = partial(
2 changes: 1 addition & 1 deletion tmrl/custom/custom_gym_interfaces.py
@@ -8,7 +8,7 @@

# third-party imports
import cv2
-import gym.spaces as spaces
+import gymnasium.spaces as spaces
import numpy as np


12 changes: 6 additions & 6 deletions tmrl/envs.py
@@ -2,7 +2,7 @@
from dataclasses import InitVar, dataclass

# third-party imports
-import gym
+import gymnasium

# local imports
from tmrl.wrappers import (AffineObservationWrapper, Float64ToFloat32)
@@ -11,21 +11,21 @@
__docformat__ = "google"


-class GenericGymEnv(gym.Wrapper):
+class GenericGymEnv(gymnasium.Wrapper):
def __init__(self, id: str = "Pendulum-v0", obs_scale: float = 0., gym_kwargs={}):
"""
Use this wrapper when using the framework with arbitrary environments.
Args:
-id (str): gym id
+id (str): gymnasium id
obs_scale (float): change this if wanting to rescale actions by a scalar
-gym_kwargs (dict): keyword arguments of the gym environment (i.e. between -1.0 and 1.0 when the actual action space is something else)
+gym_kwargs (dict): keyword arguments of the gymnasium environment (i.e. between -1.0 and 1.0 when the actual action space is something else)
"""
-env = gym.make(id, **gym_kwargs, disable_env_checker=True)
+env = gymnasium.make(id, **gym_kwargs, disable_env_checker=True)
if obs_scale:
env = AffineObservationWrapper(env, 0, obs_scale)
env = Float64ToFloat32(env)
-assert isinstance(env.action_space, gym.spaces.Box)
+assert isinstance(env.action_space, gymnasium.spaces.Box)
# env = NormalizeActionWrapper(env)
super().__init__(env)

2 changes: 1 addition & 1 deletion tmrl/networking.py
@@ -461,7 +461,7 @@ def __init__(
):
"""
Args:
-env_cls (type): class of the Gym environment (subclass of tmrl.envs.GenericGymEnv)
+env_cls (type): class of the Gymnasium environment (subclass of tmrl.envs.GenericGymEnv)
actor_module_cls (type): class of the module containing the policy (subclass of tmrl.actor.ActorModule)
sample_compressor (callable): compressor for sending samples over the Internet; \
when not `None`, `sample_compressor` must be a function that takes the following arguments: \
6 changes: 3 additions & 3 deletions tmrl/tools/benchmark_environment.py
@@ -3,8 +3,8 @@
import time

# third-party imports
-import gym
-from gym import spaces
+import gymnasium
+from gymnasium import spaces
from rtgym.envs.real_time_env import DEFAULT_CONFIG_DICT

# local imports
@@ -25,7 +25,7 @@ def benchmark():
env_config["running_average_factor"] = 0.05
env_config["wait_on_done"] = True
env_config["interface_kwargs"] = {"img_hist_len": 1, "gamepad": False, "min_nb_steps_before_failure": int(20 * 60)}
env = gym.make("real-time-gym-v0", config=env_config)
env = gymnasium.make("real-time-gym-v1", config=env_config)

t_d = time.time()
o, i = env.reset()
