Model_Free_Reinforcement_Learning

Proximal Policy Optimization

  • The PDF is not a summary of the paper; it is a detailed description of how PPO is implemented.

Deep Reinforcement Learning with Double Q Learning

  • In DQN, the target for the Q-function is constructed by using the same Q-function for both action selection and action evaluation.

  • This leads to action value over-estimation.

  • The authors show that decoupling action selection from action evaluation reduces the degree of over-estimation and improves performance.
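
The difference between the two targets can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's code: the function names, the `gamma` default, and the convention that `q_online_next` and `q_target_next` hold the online and target networks' Q-values for the next states (shape `[batch, n_actions]`) are all assumptions made for the example.

```python
import numpy as np

def dqn_target(reward, done, q_target_next, gamma=0.99):
    # Standard DQN (assumed form): one set of Q-values both selects
    # the next action (via max) and evaluates it.
    return reward + gamma * (1.0 - done) * q_target_next.max(axis=1)

def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    # Double DQN: the online Q-values select the action...
    best_action = q_online_next.argmax(axis=1)
    # ...and the target Q-values evaluate that selected action.
    rows = np.arange(best_action.shape[0])
    evaluated = q_target_next[rows, best_action]
    return reward + gamma * (1.0 - done) * evaluated
```

Because the evaluated value can never exceed the maximum over the same row, the Double DQN target in this sketch is always less than or equal to the DQN target computed from the same target Q-values, which is one way to see why the over-estimation shrinks.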

Prioritized_Experience_Replay

Off_Policy_Deep_Reinforcement_Learning_without_Exploration

  • Studies the setting where only a replay buffer is available and the agent is not allowed to interact with the environment.

  • Proposes that the learned agent should induce a state-action visitation frequency similar to that of the replay buffer.
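
One way to get a feel for this constraint is a tabular sketch in which the agent only bootstraps from, and only acts on, state-action pairs that actually appear in the buffer. This is a simplified illustration under stated assumptions, not the paper's deep-RL implementation: the function names, the `(s, a, r, s_next, done)` tuple layout, and the hyperparameters are all hypothetical.

```python
import numpy as np

def batch_constrained_q_learning(batch, n_states, n_actions,
                                 gamma=0.99, lr=0.1, sweeps=200):
    # seen[s, a] is True when the pair (s, a) appears in the fixed batch.
    seen = np.zeros((n_states, n_actions), dtype=bool)
    for s, a, r, s_next, done in batch:
        seen[s, a] = True

    q = np.zeros((n_states, n_actions))
    for _ in range(sweeps):
        for s, a, r, s_next, done in batch:
            if done or not seen[s_next].any():
                # Terminal, or no supported action in the next state.
                bootstrap = 0.0
            else:
                # Max only over actions observed in s_next.
                bootstrap = q[s_next][seen[s_next]].max()
            q[s, a] += lr * (r + gamma * bootstrap - q[s, a])
    return q, seen

def greedy_in_batch(q, seen, s):
    # Pick the best action among those the batch supports in state s.
    # If no action was seen in s, argmax falls back to action 0.
    masked = np.where(seen[s], q[s], -np.inf)
    return int(masked.argmax())
```

With a buffer such as `[(0, 1, -1.0, 0, True)]`, an unconstrained argmax over `q[0]` would prefer the never-tried action 0 (its value is still the initial 0), whereas `greedy_in_batch` stays on action 1, the only one the data supports.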

Diagnosing_Bottlenecks_in_Deep_Q_learning_Algorithms

  • Identifies that the choice of sampling distribution and the choice of network architecture might each play a large role in performance.

  • Identifies that convergence might not be an issue in deep Q-learning.