Release v0.60
## logo

New logo images have been created for d3rlpy, in standard and inverted variants.
## ActionScaler

`ActionScaler` provides action scaling pre/post-processing for continuous control algorithms. Previously, actions had to be within the range [-1.0, 1.0]. From now on, you don't need to care about the range of actions.
```python
from d3rlpy.algos import CQL

cql = CQL(action_scaler='min_max')  # just pass the action_scaler argument
```
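If you want to control the action range explicitly rather than passing the string shorthand, you can construct the scaler object yourself. The following is a minimal sketch; the class name `MinMaxActionScaler` and its `minimum`/`maximum` arguments are assumptions about the `d3rlpy.preprocessing` module, not confirmed by this release note.

```python
import numpy as np

from d3rlpy.algos import CQL
# NOTE: class and argument names below are assumptions, not confirmed by this note
from d3rlpy.preprocessing import MinMaxActionScaler

# explicitly define per-dimension action bounds instead of inferring them
# from the dataset
action_scaler = MinMaxActionScaler(
    minimum=np.array([-2.0, -2.0]),
    maximum=np.array([2.0, 2.0]),
)

cql = CQL(action_scaler=action_scaler)
```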
## handling timeout episodes

Episodes terminated by timeouts are not true environment terminations, so their bootstrap targets should not be clipped. From this version, you can specify episode boundaries separately from the terminal flags.
```python
from d3rlpy.dataset import MDPDataset

observations = ...
actions = ...
rewards = ...
terminals = ...  # this indicates environmental termination
episode_terminals = ...  # this indicates episode boundaries

dataset = MDPDataset(observations, actions, rewards, terminals, episode_terminals)

# if episode_terminals is omitted, terminals will be used to specify episode boundaries
# dataset = MDPDataset(observations, actions, rewards, terminals)
```
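As a concrete toy illustration (the arrays below are made up for this sketch), an episode cut off by a timeout keeps `terminals` at 0 on its last step while `episode_terminals` marks the boundary, so the value bootstrapping is not clipped but the episode split is still respected:

```python
import numpy as np

from d3rlpy.dataset import MDPDataset

# two episodes of length 3: the first ends with a real termination,
# the second is cut off by a timeout
observations = np.random.random((6, 4)).astype(np.float32)
actions = np.random.random((6, 2)).astype(np.float32)
rewards = np.random.random(6).astype(np.float32)

terminals = np.array([0, 0, 1, 0, 0, 0], dtype=np.float32)
episode_terminals = np.array([0, 0, 1, 0, 0, 1], dtype=np.float32)

dataset = MDPDataset(observations, actions, rewards, terminals, episode_terminals)
print(len(dataset.episodes))  # -> 2
```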
In online training, you can specify this option via the `timelimit_aware` flag.
```python
import gym

from d3rlpy.algos import SAC

env = gym.make('Hopper-v2')  # make sure the environment is wrapped by gym.wrappers.TimeLimit

sac = SAC()
sac.fit_online(env, timelimit_aware=True)  # this flag is True by default
```
reference: https://arxiv.org/abs/1712.00378
## batch online training

When training with computationally expensive environments such as robotics simulators or rich 3D games, experiments take a long time to finish because environment steps are slow. To address this, d3rlpy now supports batch online training, which collects data from multiple environments in parallel.
```python
import gym

from d3rlpy.algos import SAC
from d3rlpy.envs import AsyncBatchEnv

if __name__ == '__main__':  # this guard is necessary if you use AsyncBatchEnv
    # distribute 10 environments across different processes
    env = AsyncBatchEnv([lambda: gym.make('Hopper-v2') for _ in range(10)])

    sac = SAC(use_gpu=True)

    sac.fit_batch_online(env)  # train with 10 environments concurrently
```
## docker image

A pre-built d3rlpy Docker image is available on Docker Hub.

```
$ docker run -it --gpus all --name d3rlpy takuseno/d3rlpy:latest bash
```
## enhancements

- `BEAR` algorithm is updated based on the official implementation
  - new `mmd_kernel` option is available
- new `to_mdp_dataset` method is added to `ReplayBuffer`
- `ConstantEpsilonGreedy` explorer is added
- `d3rlpy.envs.ChannelFirst` wrapper is added (thanks for reporting, @feyza-droid)
- new dataset utility function `d3rlpy.datasets.get_d4rl` is added (see the sketch after this list)
  - this handles timeouts inside the function
- offline RL paper reproduction codes are added
- smoothed moving average plot in the `d3rlpy plot` CLI function (thanks, @pstansell)
- user-friendly messages for assertion errors
- better memory consumption
- `save_interval` argument is added to `fit_online`
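A minimal usage sketch of the new dataset utility mentioned above; the dataset id `'hopper-medium-v0'` and the `(dataset, env)` return value are assumptions based on the other `d3rlpy.datasets` helpers, not confirmed by this release note.

```python
from d3rlpy.datasets import get_d4rl

# NOTE: the dataset id and the return value are assumptions for illustration
dataset, env = get_d4rl('hopper-medium-v0')  # timeouts are handled internally

print(len(dataset.episodes))
```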
## bugfix

- core dumps in the Google Colaboratory tutorials are fixed
- typos in some documentation are fixed (thanks for reporting, @pstansell)