Release v0.41
Algorithm
- Policy in Latent Action Space (PLAS)
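PLAS trains the policy in the latent space of a conditional VAE fitted to the dataset's actions, so every selected action decodes to something close to the data distribution. Below is a rough numpy sketch of the decoding step only; the weights, dimensions, and function names are illustrative and not part of d3rlpy's API:

```python
import numpy as np

rng = np.random.default_rng(0)

obs_dim, latent_dim, action_dim = 4, 2, 3

# Pretend these were learned: a VAE decoder and a latent policy.
W_dec = rng.normal(size=(obs_dim + latent_dim, action_dim))
W_pi = rng.normal(size=(obs_dim, latent_dim))

def latent_policy(obs):
    # The policy outputs a bounded latent action, not a raw action.
    return np.tanh(obs @ W_pi)

def decode(obs, z):
    # The decoder maps (observation, latent) back into the action
    # space seen in the dataset, keeping actions in-distribution.
    return np.tanh(np.concatenate([obs, z]) @ W_dec)

obs = rng.normal(size=obs_dim)
z = latent_policy(obs)
action = decode(obs, z)
assert action.shape == (action_dim,)
```

In d3rlpy the VAE and latent policy are trained for you; this sketch only shows why actions stay in-distribution: the policy can never emit an action the decoder was not trained to produce.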
Off-Policy Evaluation
Off-policy evaluation (OPE) is a method for estimating a policy's performance using only the offline dataset, without interacting with the environment.
```python
# train policy
from d3rlpy.algos import CQL
from d3rlpy.datasets import get_pybullet

dataset, env = get_pybullet('hopper-bullet-mixed-v0')

cql = CQL()
cql.fit(dataset.episodes)

# off-policy evaluation with FQE
from d3rlpy.ope import FQE
from d3rlpy.metrics.scorer import soft_opc_scorer
from d3rlpy.metrics.scorer import initial_state_value_estimation_scorer

fqe = FQE(algo=cql)
fqe.fit(dataset.episodes,
        eval_episodes=dataset.episodes,
        scorers={
            'soft_opc': soft_opc_scorer(1000),
            'init_value': initial_state_value_estimation_scorer
        })
```
- Fitted Q-Evaluation
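Fitted Q-Evaluation estimates the value of a fixed policy by repeatedly regressing Q(s, a) toward r + γ·Q(s', π(s')), bootstrapping with the *evaluated* policy's action rather than the behavior policy's. A toy tabular sketch (the two-state MDP here is made up for illustration; d3rlpy's FQE uses a neural network instead of a table):

```python
import numpy as np

gamma = 0.9
# Dataset transitions (s, a, r, s') gathered by some behavior policy.
transitions = [(0, 0, 1.0, 1), (1, 0, 0.0, 1)]

def policy(s):
    return 0  # the evaluated policy always picks action 0

Q = np.zeros((2, 1))  # 2 states, 1 action
for _ in range(100):
    for s, a, r, s2 in transitions:
        # FQE update: bootstrap with the evaluated policy's action.
        Q[s, a] = r + gamma * Q[s2, policy(s2)]

print(Q[0, 0])  # 1.0: state 1 is absorbing with zero reward
```

Averaging the resulting Q values over the dataset's initial states is exactly what `initial_state_value_estimation_scorer` reports.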
Q Function Factory
d3rlpy provides flexible control over Q functions through the Q function factory. Following this change, the previous `q_func_type` argument has been renamed to `q_func_factory`.
```python
from d3rlpy.algos import DQN
from d3rlpy.q_functions import QRQFunctionFactory

# initialize Q function factory
q_func_factory = QRQFunctionFactory(n_quantiles=32)

# give it to the algorithm object
dqn = DQN(q_func_factory=q_func_factory)
```
You can also pass the Q function name as a string.

```python
dqn = DQN(q_func_factory='qr')
```
You can also make your own Q function factory. Currently, the following Q function factories are supported.
EncoderFactory
- DenseNet architecture (only for vector observations)
```python
from d3rlpy.algos import DQN

dqn = DQN(encoder_factory='dense')
```
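In a DenseNet-style encoder, each layer receives the concatenation of the input and all previous layers' outputs. A rough numpy sketch of that connectivity pattern (layer sizes are arbitrary; this is not d3rlpy's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def dense_forward(x, layer_sizes):
    """Each layer sees the concatenation of everything before it."""
    features = [x]
    for out_dim in layer_sizes:
        inp = np.concatenate(features)
        W = rng.normal(size=(inp.shape[0], out_dim))
        features.append(np.maximum(0.0, inp @ W))  # ReLU
    return np.concatenate(features)

x = rng.normal(size=4)
out = dense_forward(x, [8, 8, 8])
# Output keeps the input plus every layer: 4 + 8 + 8 + 8 features.
assert out.shape == (28,)
```

The dense skip connections are why this encoder only applies to vector observations: concatenating feature maps this way does not directly carry over to image inputs.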
N-step TD calculation
d3rlpy now supports N-step TD calculation for ALL algorithms. You can pass the `n_steps` argument to configure this parameter.
```python
from d3rlpy.algos import DQN

dqn = DQN(n_steps=5)  # n_steps=1 by default
```
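With `n_steps=n`, the TD target becomes the n-step return: the discounted sum of the next n rewards plus the bootstrapped value at the state n steps ahead. A small sketch with made-up numbers:

```python
gamma = 0.99
rewards = [1.0, 0.5, 0.25, 2.0, 0.0]  # r_t, r_{t+1}, ...

def n_step_target(rewards, bootstrap_value, gamma, n):
    # G = sum_{k=0}^{n-1} gamma^k * r_{t+k} + gamma^n * V(s_{t+n})
    g = sum(gamma ** k * r for k, r in enumerate(rewards[:n]))
    return g + gamma ** n * bootstrap_value

one_step = n_step_target(rewards, 10.0, gamma, n=1)
five_step = n_step_target(rewards, 10.0, gamma, n=5)
print(one_step)  # 1.0 + 0.99 * 10.0 = 10.9
```

Larger `n` propagates real rewards further before bootstrapping, trading off bias against variance.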
Paper reproduction scripts
d3rlpy supports many algorithms across both online and offline paradigms. Originally, d3rlpy was designed for industrial practitioners, but academic research is still important to push deep reinforcement learning forward. Currently, reproduction scripts for online DQN variants are available.
The evaluation results will also be available soon.
Enhancements
- `build_with_dataset` and `build_with_env` methods are added to algorithm objects
- `shuffle` flag is added to the `fit` method (thanks, @jamartinh)