Current progress & Near‐term plan
Itomigna2 edited this page Mar 5, 2024 · 5 revisions
- This version uses a CNN-based representation network and an LSTM-based dynamics network for an RGB-input environment (LunarLander-v2, with RGB states obtained by wrapping the environment in PixelObservationWrapper(self.env)).
- It works on randomly initialized environments (random terrain and acceleration), but it does not yet converge consistently to a score above 200. It needs further improvement.
- (Near-term plan) Test a stronger encoder architecture.
- (Near-term plan) Optimize the efficiency of the code and the NNI experiment settings.
- (Near-term plan) Try off-policy correction methods such as V-trace or Retrace.
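
As a rough illustration of the off-policy correction item above, here is a minimal sketch of V-trace value targets in plain Python. It is not code from this repository: the function name `vtrace_targets`, its argument names, and the default clipping thresholds `rho_bar` and `c_bar` are assumptions made for this example, and it treats a single trajectory without episode terminations.

```python
import math


def vtrace_targets(behavior_logp, target_logp, rewards, values,
                   bootstrap_value, gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Compute V-trace targets v_s for one trajectory (hypothetical helper).

    behavior_logp[s] / target_logp[s]: log-probability of the taken action
    under the behavior policy and the current (target) policy.
    values[s]: value estimate V(x_s); bootstrap_value: V(x_n) after the
    last step. Assumes no terminal state inside the trajectory.
    """
    n = len(rewards)
    next_values = list(values[1:]) + [bootstrap_value]
    targets = [0.0] * n
    acc = 0.0  # running value of v_{s+1} - V(x_{s+1})
    for s in reversed(range(n)):
        ratio = math.exp(target_logp[s] - behavior_logp[s])
        rho = min(rho_bar, ratio)  # clipped weight on the TD error
        c = min(c_bar, ratio)      # clipped weight on the trace
        delta = rho * (rewards[s] + gamma * next_values[s] - values[s])
        acc = delta + gamma * c * acc
        targets[s] = values[s] + acc
    return targets


if __name__ == "__main__":
    # On-policy data (equal log-probabilities) with both thresholds at 1
    # reduces to ordinary n-step returns.
    logp = [-1.0, -1.0, -1.0]
    print(vtrace_targets(logp, logp, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0],
                         bootstrap_value=0.0, gamma=0.5))
```

When the behavior and target policies agree, the importance ratios are all 1 and the targets are plain n-step bootstrapped returns. When they differ, the ratios are clipped at `rho_bar` and `c_bar`, which bounds the variance of the correction.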