Release v0.60

@takuseno released this on 27 Jan 15:10

logo

New logo images have been made for d3rlpy 🎉

(logo images: standard and inverted variants, plus a narrow cover image)

ActionScaler

ActionScaler provides action scaling pre/post-processing for continuous control algorithms. Previously, actions had to lie within [-1.0, 1.0]. From this version, you no longer need to worry about the range of actions.

from d3rlpy.algos import CQL

cql = CQL(action_scaler='min_max')  # just pass the action_scaler argument
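
For instance, a dataset whose actions fall outside [-1.0, 1.0] can now be passed to a continuous control algorithm directly. Below is a minimal sketch with a synthetic dataset; the fit() call and its n_epochs argument follow the usual offline training API, and the array shapes are only illustrative.

import numpy as np

from d3rlpy.algos import CQL
from d3rlpy.dataset import MDPDataset

# synthetic dataset whose actions lie in [-5, 5] instead of [-1, 1]
observations = np.random.random((100, 4)).astype(np.float32)
actions = np.random.uniform(-5.0, 5.0, size=(100, 2)).astype(np.float32)
rewards = np.random.random(100).astype(np.float32)
terminals = np.zeros(100)
terminals[49] = terminals[99] = 1.0

dataset = MDPDataset(observations, actions, rewards, terminals)

# 'min_max' rescales actions into [-1, 1] based on the dataset statistics
cql = CQL(action_scaler='min_max')
cql.fit(dataset.episodes, n_epochs=1)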

handling timeout episodes

Episodes terminated by timeouts should not be clipped at bootstrapping. From this version, you can specify episode boundaries separately from the terminal flags.

from d3rlpy.dataset import MDPDataset

observations = ...
actions = ...
rewards = ...
terminals = ...          # indicates environmental termination
episode_terminals = ...  # indicates episode boundaries (e.g. timeouts)

dataset = MDPDataset(observations, actions, rewards, terminals, episode_terminals)

# if episode_terminals is omitted, terminals will be used to specify episode boundaries
# dataset = MDPDataset(observations, actions, rewards, terminals)
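
For example, in a toy episode that ends because of a timeout rather than an environmental terminal state, the terminal flag stays zero while the episode boundary is marked. A minimal sketch with synthetic arrays (shapes and values are only illustrative):

import numpy as np

from d3rlpy.dataset import MDPDataset

# 5 transitions: the episode ends at the last step because of a timeout,
# not because the environment reached a terminal state
observations = np.random.random((5, 3)).astype(np.float32)
actions = np.random.random((5, 1)).astype(np.float32)
rewards = np.random.random(5).astype(np.float32)

terminals = np.array([0.0, 0.0, 0.0, 0.0, 0.0])          # no environmental termination
episode_terminals = np.array([0.0, 0.0, 0.0, 0.0, 1.0])  # episode boundary at the timeout

dataset = MDPDataset(observations, actions, rewards, terminals, episode_terminals)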

In online training, you can control this behavior via the timelimit_aware flag.

import gym

from d3rlpy.algos import SAC

env = gym.make('Hopper-v2')  # make sure the environment is wrapped by gym.wrappers.TimeLimit

sac = SAC()
sac.fit_online(env, timelimit_aware=True)  # this flag is True by default

reference: https://arxiv.org/abs/1712.00378

batch online training

When training with computationally expensive environments such as robotics simulators or rich 3D games, training can take a long time to finish due to slow environment steps.
To address this, d3rlpy supports batch online training.

import gym

from d3rlpy.algos import SAC
from d3rlpy.envs import AsyncBatchEnv

if __name__ == '__main__':  # this guard is necessary if you use AsyncBatchEnv
    # distribute 10 environments across different processes
    env = AsyncBatchEnv([lambda: gym.make('Hopper-v2') for _ in range(10)])

    sac = SAC(use_gpu=True)
    sac.fit_batch_online(env)  # train with 10 environments concurrently

docker image

A pre-built d3rlpy Docker image is available on Docker Hub.

$ docker run -it --gpus all --name d3rlpy takuseno/d3rlpy:latest bash

enhancements

  • the BEAR algorithm is updated based on the official implementation
    • a new mmd_kernel option is available
  • a to_mdp_dataset method is added to ReplayBuffer
  • a ConstantEpsilonGreedy explorer is added
  • a d3rlpy.envs.ChannelFirst wrapper is added (thanks for reporting, @feyza-droid)
  • a new dataset utility function d3rlpy.datasets.get_d4rl is added (see the sketch after this list)
    • timeouts are handled inside the function
  • offline RL paper reproduction code is added
  • smoothed moving average plots are added to the d3rlpy plot CLI (thanks, @pstansell)
  • user-friendly messages for assertion errors
  • reduced memory consumption
  • a save_interval argument is added to fit_online
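
As an illustration of the new dataset utility, here is a minimal sketch (assuming the d4rl package is installed; the dataset name is just an example):

from d3rlpy.datasets import get_d4rl

# downloads the D4RL dataset, converts it into an MDPDataset and
# handles timeout episodes internally
dataset, env = get_d4rl('hopper-medium-v0')

print(len(dataset.episodes))        # episodes with timeout-aware boundaries
print(env.observation_space.shape)  # the corresponding gym environment is returned as well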

bugfix

  • core dumps in the Google Colaboratory tutorials are fixed
  • typos in some documentation are fixed (thanks for reporting, @pstansell)