paper_summaries/Model_Free_Reinforcement_Learning at master · quanvuong/paper_summaries · GitHub


Proximal Policy Optimization

The PDF is not a summary; it is a detailed description of how PPO is implemented.

Deep Reinforcement Learning with Double Q Learning

In DQN, the target for the Q-function is constructed by using the same Q-function for both action selection and action evaluation.
This leads to overestimation of action values.
The authors show that decoupling action selection from action evaluation reduces the degree of overestimation and improves performance.
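
The difference between the two targets can be sketched in a few lines of NumPy. This is a minimal illustration rather than code from the paper: the function names, the `done` mask convention, and the toy Q-value arrays are all assumptions made for the example. It assumes an online network and a separate target network whose next-state Q-values are already computed.

```python
import numpy as np

def dqn_target(reward, done, q_target_next, gamma=0.99):
    """DQN-style target: one Q-function both selects and evaluates the next action."""
    max_q = q_target_next.max(axis=1)
    return reward + gamma * (1.0 - done) * max_q

def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Double DQN-style target: the online net selects, the target net evaluates."""
    best_action = q_online_next.argmax(axis=1)
    evaluated_q = q_target_next[np.arange(len(best_action)), best_action]
    return reward + gamma * (1.0 - done) * evaluated_q

# Toy batch of two transitions with two actions each (made-up numbers).
reward = np.array([1.0, 0.0])
done = np.array([0.0, 1.0])  # 1.0 marks a terminal transition
q_online_next = np.array([[1.0, 2.0], [0.5, 0.1]])
q_target_next = np.array([[3.0, 1.5], [0.2, 0.4]])

y_dqn = dqn_target(reward, done, q_target_next, gamma=0.9)
y_ddqn = double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.9)
print(y_dqn)   # first entry uses max over the target net: 1 + 0.9 * 3.0
print(y_ddqn)  # first entry uses the online net's choice: 1 + 0.9 * 1.5
```

Because the decoupled target evaluates an action that is not necessarily the maximizer of the evaluating Q-function, it can never exceed the coupled target for the same inputs, which is one way to see why it reduces overestimation.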

Prioritized Experience Replay

Off-Policy Deep Reinforcement Learning without Exploration

Studies the setting where only a fixed replay buffer is available and the agent is not allowed to interact with the environment.
Proposes that the learned agent should induce a state-action visitation frequency similar to that of the replay buffer.
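
One way to picture this constraint is a tabular toy in which the greedy policy may only pick actions that actually appear in the buffer for the given state. This is a simplified illustration of the idea, not the algorithm from the paper; the function name, the `min_count` threshold, and the count table are hypothetical.

```python
import numpy as np

def constrained_greedy_actions(q_values, buffer_counts, min_count=1):
    """Pick the greedy action per state among actions seen in the buffer.

    q_values:      (num_states, num_actions) Q-value estimates.
    buffer_counts: (num_states, num_actions) number of times each
                   state-action pair occurs in the replay buffer.
    If a state has no supported action, argmax falls back to action 0.
    """
    supported = buffer_counts >= min_count
    masked_q = np.where(supported, q_values, -np.inf)
    return masked_q.argmax(axis=1)

# Made-up example: 2 states, 3 actions.
q_values = np.array([[1.0, 5.0, 2.0],
                     [0.3, 0.2, 0.9]])
buffer_counts = np.array([[4, 0, 3],
                          [2, 1, 0]])

unconstrained = q_values.argmax(axis=1)
constrained = constrained_greedy_actions(q_values, buffer_counts)
print(unconstrained)  # ignores buffer support
print(constrained)    # avoids state-action pairs absent from the buffer
```

In state 0, the action with the highest Q-value never occurs in the buffer, so its value estimate is unsupported by data; the constrained policy instead chooses the best action among those the buffer covers.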

Diagnosing Bottlenecks in Deep Q-learning Algorithms

Identifies that the choice of sampling distribution and the choice of architecture might both play a large role in performance.
Finds that convergence might not be an issue in deep Q-learning.