Release v0.60

@takuseno released this on 27 Jan 15:10

logo

New logo images have been made for d3rlpy 🎉

(logo images: standard and inverted variants, plus a narrow cover image)

ActionScaler

ActionScaler provides action scaling pre/post-processing for continuous control algorithms. Previously, actions had to lie within [-1.0, 1.0]. From this version, you no longer need to worry about the range of actions.

from d3rlpy.algos import CQL

cql = CQL(action_scaler='min_max')  # just pass the action_scaler argument
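
For instance, a dataset whose actions fall outside [-1.0, 1.0] can now be passed to a continuous control algorithm directly. Below is a minimal sketch with a synthetic dataset; the fit() call and its n_epochs argument follow the usual offline training API, and the array shapes are only illustrative.

import numpy as np

from d3rlpy.algos import CQL
from d3rlpy.dataset import MDPDataset

# synthetic dataset whose actions lie in [-5, 5] instead of [-1, 1]
observations = np.random.random((100, 4)).astype(np.float32)
actions = np.random.uniform(-5.0, 5.0, size=(100, 2)).astype(np.float32)
rewards = np.random.random(100).astype(np.float32)
terminals = np.zeros(100)
terminals[49] = terminals[99] = 1.0

dataset = MDPDataset(observations, actions, rewards, terminals)

# 'min_max' rescales actions into [-1, 1] based on the dataset statistics
cql = CQL(action_scaler='min_max')
cql.fit(dataset.episodes, n_epochs=1)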

handling timeout episodes

Episodes terminated by timeouts should not be clipped at bootstrapping. From this version, you can specify episode boundaries separately from the terminal flags.

from d3rlpy.dataset import MDPDataset

observations = ...
actions = ...
rewards = ...
terminals = ...          # indicates environmental termination
episode_terminals = ...  # indicates episode boundaries (e.g. timeouts)

dataset = MDPDataset(observations, actions, rewards, terminals, episode_terminals)

# if episode_terminals is omitted, terminals will be used to specify episode boundaries
# dataset = MDPDataset(observations, actions, rewards, terminals)
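
For example, in a toy episode that ends because of a timeout rather than an environmental terminal state, the terminal flag stays zero while the episode boundary is marked. A minimal sketch with synthetic arrays (shapes and values are only illustrative):

import numpy as np

from d3rlpy.dataset import MDPDataset

# 5 transitions: the episode ends at the last step because of a timeout,
# not because the environment reached a terminal state
observations = np.random.random((5, 3)).astype(np.float32)
actions = np.random.random((5, 1)).astype(np.float32)
rewards = np.random.random(5).astype(np.float32)

terminals = np.array([0.0, 0.0, 0.0, 0.0, 0.0])          # no environmental termination
episode_terminals = np.array([0.0, 0.0, 0.0, 0.0, 1.0])  # episode boundary at the timeout

dataset = MDPDataset(observations, actions, rewards, terminals, episode_terminals)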

In online training, you can control this behavior via the timelimit_aware flag.

import gym

from d3rlpy.algos import SAC

env = gym.make('Hopper-v2')  # make sure the environment is wrapped by gym.wrappers.TimeLimit

sac = SAC()
sac.fit_online(env, timelimit_aware=True)  # this flag is True by default

reference: https://arxiv.org/abs/1712.00378

batch online training

When training with computationally expensive environments such as robotics simulators or rich 3D games, training can take a long time to finish due to slow environment steps.
To address this, d3rlpy supports batch online training.

import gym

from d3rlpy.algos import SAC
from d3rlpy.envs import AsyncBatchEnv

if __name__ == '__main__':  # this guard is necessary if you use AsyncBatchEnv
    # distribute 10 environments across different processes
    env = AsyncBatchEnv([lambda: gym.make('Hopper-v2') for _ in range(10)])

    sac = SAC(use_gpu=True)
    sac.fit_batch_online(env)  # train with 10 environments concurrently

docker image

A pre-built d3rlpy Docker image is available on Docker Hub.

$ docker run -it --gpus all --name d3rlpy takuseno/d3rlpy:latest bash

enhancements

  • the BEAR algorithm is updated based on the official implementation
    • a new mmd_kernel option is available
  • a to_mdp_dataset method is added to ReplayBuffer
  • a ConstantEpsilonGreedy explorer is added
  • a d3rlpy.envs.ChannelFirst wrapper is added (thanks for reporting, @feyza-droid)
  • a new dataset utility function d3rlpy.datasets.get_d4rl is added (see the sketch after this list)
    • timeouts are handled inside the function
  • offline RL paper reproduction code is added
  • smoothed moving average plots are added to the d3rlpy plot CLI (thanks, @pstansell)
  • user-friendly messages for assertion errors
  • reduced memory consumption
  • a save_interval argument is added to fit_online
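
As an illustration of the new dataset utility, here is a minimal sketch (assuming the d4rl package is installed; the dataset name is just an example):

from d3rlpy.datasets import get_d4rl

# downloads the D4RL dataset, converts it into an MDPDataset and
# handles timeout episodes internally
dataset, env = get_d4rl('hopper-medium-v0')

print(len(dataset.episodes))        # episodes with timeout-aware boundaries
print(env.observation_space.shape)  # the corresponding gym environment is returned as well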

bugfix

  • core dumps in the Google Colaboratory tutorials are fixed
  • typos in some documentation are fixed (thanks for reporting, @pstansell)