
Current progress & Near-term plan

Itomigna2 edited this page Mar 18, 2024 · 5 revisions

Current progress

  • This version uses a CNN-based representation network plus an LSTM-based dynamics network for the RGB-input environment (LunarLander-v2, with RGB states provided by `PixelObservationWrapper(self.env)`).
  • It works on randomly initialized environments (random terrain and acceleration), but does not yet converge reliably to scores above 200.
  • Training with a non-fixed environment seed takes several hours with the deep CNN, requiring experiment lengths longer than 5000.
  • Training with a fixed environment seed takes up to 1 hour with the deep CNN, using experiment lengths of 200~2000. The agent can learn perfectly in this case.
  • Further improvement is needed.
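The bullets above describe feeding RGB frames (obtained via `PixelObservationWrapper`) into a CNN representation network. A minimal sketch of the kind of frame preprocessing such a pipeline typically needs is shown below; the downsampling stride and the 400x600 frame size (LunarLander's default render resolution) are assumptions for illustration, not values taken from this repository:

```python
import numpy as np

def preprocess_frame(frame: np.ndarray, stride: int = 4) -> np.ndarray:
    """Downsample an RGB frame by strided slicing and scale to [0, 1].

    frame: H x W x 3 uint8 array, e.g. the "pixels" entry returned by
    gym's PixelObservationWrapper around LunarLander-v2.
    """
    small = frame[::stride, ::stride, :]      # naive spatial downsample
    return small.astype(np.float32) / 255.0   # normalize for the CNN input

# Example with a dummy 400x600 frame (LunarLander's default render size)
dummy = np.zeros((400, 600, 3), dtype=np.uint8)
print(preprocess_frame(dummy).shape)  # (100, 150, 3)
```

In practice the preprocessed frames would be stacked or fed sequentially into the representation network before the LSTM dynamics network unrolls the latent state.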

Near-term plan

  • Write the docs on the wiki.
  • Check the nni config setting and try to fix issue #5.
  • Optimize the code and the nni experiment settings for efficiency.
  • Try off-policy correction methods such as V-trace and Retrace.
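As a reference for the last point, a hedged numpy sketch of the V-trace target computation (Espeholt et al., IMPALA) is given below. The function name and array layout are assumptions and not part of this repository's code:

```python
import numpy as np

def vtrace_targets(values, next_values, rewards, rhos,
                   gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Compute V-trace value targets for one trajectory of length T.

    values:      V(x_t) under the current value function, shape (T,)
    next_values: V(x_{t+1}), shape (T,)
    rewards:     r_t, shape (T,)
    rhos:        importance ratios pi(a_t|x_t) / mu(a_t|x_t), shape (T,)
    """
    clipped_rhos = np.minimum(rho_bar, rhos)   # truncated IS weights
    cs = np.minimum(c_bar, rhos)               # "trace cutting" coefficients
    deltas = clipped_rhos * (rewards + gamma * next_values - values)
    # Backward recursion:
    # vs_t - V(x_t) = delta_t + gamma * c_t * (vs_{t+1} - V(x_{t+1}))
    acc = 0.0
    corrections = np.zeros_like(values, dtype=np.float64)
    for t in reversed(range(len(values))):
        acc = deltas[t] + gamma * cs[t] * acc
        corrections[t] = acc
    return values + corrections
```

With all importance ratios equal to 1 (on-policy) and gamma = 1, the targets reduce to the ordinary returns-to-go, which makes the correction easy to sanity-check before wiring it into the training loop.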