Skip to content

Latest commit

 

History

History

Model_Based_Reinforcement_Learning

When_to_Trust_Your_Model_Model_based_policy_optimization

  • Use short rollout from the model to train the policy

  • Argue that the short rollout allows us to take more policy gradient step per environment step compared to model-free RL

Exploring_Model_based_Planning_with_Policy_Networks

  • Argues that performing CEM in the parameter space of a policy network is easier than in the action space.

  • Visualizes the reward function surface and uses this visualization to justify their claims.

  • The visualization techniques look interesting and can be re-used in other studies.

Using_Inaccurate_Models_in_Reinforcement_Learning

  • Differentiate through the learnt model to obtain the policy gradient.

  • Evaluate this policy gradient at a true trajectory to remove one source of gradient approximation error.