Current progress & Near-term plan

Itomigna2 edited this page Mar 5, 2024 · 5 revisions

Current progress

This version uses a CNN-based representation network and an LSTM-based dynamics network for an RGB-input environment (LunarLander-v2, with RGB observations obtained by wrapping the environment with PixelObservationWrapper(self.env)).
It works with randomly initialized environments (random terrain and acceleration), but it does not yet converge reliably to a score above 200. Further improvement is needed.

Near-term plan

Test a stronger encoder architecture.
Optimize code efficiency and the NNI experiment settings.
Try off-policy correction methods such as V-trace and Retrace.
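For reference, the V-trace value targets from IMPALA (Espeholt et al., 2018) can be computed with a single backward pass over a trajectory segment. The sketch below is a minimal, dependency-free illustration, not this project's implementation. Assumptions: the function name vtrace_targets and its parameters are hypothetical, log_rhos holds log(pi(a|x) / mu(a|x)) for the target policy pi and the behaviour policy mu, and the segment contains no episode terminations (a real version would zero the discount at terminal steps).

```python
import math
from typing import List, Sequence


def vtrace_targets(
    rewards: Sequence[float],
    values: Sequence[float],
    bootstrap_value: float,
    log_rhos: Sequence[float],
    gamma: float = 0.99,
    rho_bar: float = 1.0,
    c_bar: float = 1.0,
) -> List[float]:
    """Hypothetical helper: V-trace targets v_s for one trajectory segment.

    Assumes no terminal states inside the segment; the value after the
    last step is given by bootstrap_value.
    """
    T = len(rewards)
    if len(values) != T or len(log_rhos) != T:
        raise ValueError("rewards, values and log_rhos must have equal length")

    rhos = [math.exp(lr) for lr in log_rhos]      # importance ratios pi/mu
    clipped_rhos = [min(rho_bar, r) for r in rhos]  # rho_t = min(rho_bar, pi/mu)
    cs = [min(c_bar, r) for r in rhos]              # c_t = min(c_bar, pi/mu)
    next_values = list(values[1:]) + [bootstrap_value]

    vs = [0.0] * T
    acc = 0.0  # holds v_{t+1} - V(x_{t+1}); zero past the end of the segment
    for t in reversed(range(T)):
        # Clipped temporal-difference term: rho_t * (r_t + gamma * V(x_{t+1}) - V(x_t))
        delta = clipped_rhos[t] * (rewards[t] + gamma * next_values[t] - values[t])
        # Recursion: v_t - V(x_t) = delta_t + gamma * c_t * (v_{t+1} - V(x_{t+1}))
        acc = delta + gamma * cs[t] * acc
        vs[t] = values[t] + acc
    return vs


if __name__ == "__main__":
    # On-policy data (log_rhos = 0) reduces V-trace to the n-step bootstrapped return.
    print(vtrace_targets([1.0, 1.0, 1.0], [0.5, 0.2, -0.3], 2.0, [0.0, 0.0, 0.0], gamma=0.9))
```

With on-policy data every ratio is 1, so the targets equal ordinary n-step returns; as the behaviour policy diverges from the target policy, the clipping by rho_bar and c_bar truncates how far each correction propagates. Retrace applies a similar truncated-ratio idea to action-value targets.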
