
Current progress & Near‐term plan

Itomigna2 edited this page Mar 5, 2024 · 5 revisions

Current progress

  • This version uses a CNN-based representation network and an LSTM-based dynamics network for RGB-input environments (LunarLander-v2, with RGB states obtained by wrapping the environment in PixelObservationWrapper(self.env)).
  • It works for randomly initialized environments (random terrain and acceleration), but it does not yet reliably converge to a score above 200. It needs further improvement.
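
As a rough illustration of the architecture described above, the following is a minimal PyTorch sketch of a CNN representation network feeding an LSTM-based dynamics network. It is not the code from this repository: the class names, layer sizes, hidden dimension, and the reward head are all assumptions chosen for brevity. It also assumes the frames have already been taken from the wrapper's pixel observation and converted to float tensors of shape (batch, 3, H, W).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Representation(nn.Module):
    """CNN encoder: RGB frame -> initial hidden state (hypothetical sizes)."""

    def __init__(self, hidden_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),  # fixed 4x4 feature map for any frame size
        )
        self.fc = nn.Linear(32 * 4 * 4, hidden_dim)

    def forward(self, frames):
        # frames: (batch, 3, H, W), floats scaled to [0, 1]
        x = self.conv(frames).flatten(start_dim=1)
        return torch.tanh(self.fc(x))


class Dynamics(nn.Module):
    """LSTM cell: (state, action) -> (next state, predicted reward)."""

    def __init__(self, hidden_dim=128, num_actions=4):
        super().__init__()
        self.num_actions = num_actions  # LunarLander-v2 has 4 discrete actions
        self.cell = nn.LSTMCell(num_actions, hidden_dim)
        self.reward_head = nn.Linear(hidden_dim, 1)  # assumed scalar reward head

    def forward(self, state, action):
        # state: tuple (h, c), each (batch, hidden_dim); action: (batch,) int64
        one_hot = F.one_hot(action, self.num_actions).float()
        h, c = self.cell(one_hot, state)
        return (h, c), self.reward_head(h).squeeze(-1)


# Usage: encode two random frames, then unroll the dynamics one step.
representation, dynamics = Representation(), Dynamics()
frames = torch.rand(2, 3, 96, 96)
h0 = representation(frames)
state = (h0, torch.zeros_like(h0))  # assumed: cell state starts at zero
next_state, reward = dynamics(state, torch.tensor([0, 3]))
```

Using the LSTM's (h, c) pair as the latent state lets the dynamics network be unrolled over several imagined actions by feeding `next_state` back in as `state`.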

Near-term plan

  • Test a stronger encoder architecture.
  • Optimize the efficiency of the code and the NNI experiment settings.
  • Try an off-policy correction method such as V-trace or Retrace.
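
To make the off-policy correction item more concrete, here is a minimal pure-Python sketch of V-trace value targets for a single trajectory segment, following the standard recursion with clipped importance ratios. It is only an illustration and says nothing about how the correction would be integrated into this project. The function name, argument names, and default thresholds are assumptions, and the sketch assumes the segment contains no episode termination.

```python
import math


def vtrace_targets(behaviour_logp, target_logp, rewards, values,
                   bootstrap_value, gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Return V-trace targets v_t for one segment of length T.

    behaviour_logp[t], target_logp[t]: log-probability of the taken action
    under the behaviour and target policies. values[t] is V(x_t) and
    bootstrap_value is V(x_T) for the state after the last step.
    """
    T = len(rewards)
    next_values = list(values[1:]) + [bootstrap_value]
    targets = [0.0] * T
    acc = 0.0  # running value of v_{t+1} - V(x_{t+1})
    for t in reversed(range(T)):
        ratio = math.exp(target_logp[t] - behaviour_logp[t])
        rho = min(rho_bar, ratio)  # clipped weight on the TD error
        c = min(c_bar, ratio)      # clipped trace coefficient
        delta = rho * (rewards[t] + gamma * next_values[t] - values[t])
        acc = delta + gamma * c * acc
        targets[t] = values[t] + acc
    return targets


# Usage: when both policies agree (ratio 1), the targets reduce to
# ordinary n-step bootstrapped returns.
on_policy = vtrace_targets(
    behaviour_logp=[0.0, 0.0, 0.0],
    target_logp=[0.0, 0.0, 0.0],
    rewards=[1.0, 1.0, 1.0],
    values=[0.5, 0.2, -0.3],
    bootstrap_value=2.0,
    gamma=0.9,
)
```

When the target policy assigns a much lower probability to the taken actions than the behaviour policy did, the ratios shrink toward zero and the targets fall back to the current value estimates, which is what limits the influence of stale trajectories.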