Skip to content

Latest commit

 

History

History

Goal_Conditioned_RL

Skew_Fit

  • A policy should accomplish diverse and unknown user-specified goal at test time.

  • If the goal specified at test time is unknown, then the policy needs to perform well on all possible goals.

  • Then, the goal proposal distribution, from which we sample goals for the policy to practice, should be the uniform distribution over the set of possible goals.

  • However, we do not know this set a prior and can only observe sample from the set.

  • State and goal are assumed to be equivalent in this paper.

  • At each iteration, we can sample state by performing goal-directed exploration. How do we construct the goal proposal distribution at each iteration such that it converges to the uniform dist. over the set of possible goals over time?

Visual_Reinforcement_Learning_with_Imagined_Goals

  • Learning from images with a VAE

  • Self-supervised goal proposal and practice