Current progress & Near‐term plan
Itomigna2 edited this page Mar 5, 2024 · 5 revisions
- This version uses a CNN-based representation network and an LSTM-based dynamics network for an RGB-input environment (LunarLander-v2, with RGB states obtained by wrapping the env as PixelObservationWrapper(self.env)).
- It works for a randomly initialized env (random terrain and acceleration), but it still does not reliably converge to a score above 200.
- Training takes more than 10 hours, with an experiment length longer than 5000, a deep CNN, and an env with a non-fixed random seed.
- It needs to be improved further.
- Write the docs on the wiki.
- Check the NNI config settings and try to fix issue #5.
- Optimize the efficiency of the code and the NNI experiment settings.
- Try an off-policy correction method such as V-trace or Retrace.
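
As a rough illustration of the off-policy correction mentioned in the last item, the following is a minimal sketch of V-trace value targets in plain Python. It is not code from this repository: the function name `vtrace_targets`, its arguments, and the default clipping thresholds are assumptions made for the example, and it assumes a single trajectory with no episode termination inside it. `log_rhos[t]` is taken to be the log ratio between the target policy and the behaviour policy for the action at step `t`.

```python
import math


def vtrace_targets(rewards, values, bootstrap_value, log_rhos,
                   gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Compute V-trace targets v_s for one trajectory (hypothetical helper).

    rewards[t]      : reward received after step t
    values[t]       : current value estimate V(x_t)
    bootstrap_value : value estimate for the state after the last step
    log_rhos[t]     : log(pi(a_t | x_t) / mu(a_t | x_t))
    Assumes no terminal state occurs inside the trajectory.
    """
    n = len(rewards)
    next_values = list(values[1:]) + [bootstrap_value]
    targets = [0.0] * n
    acc = 0.0  # holds v_{s+1} - V(x_{s+1}) while iterating backwards
    for s in reversed(range(n)):
        ratio = math.exp(log_rhos[s])
        rho = min(rho_bar, ratio)  # clipped weight on the TD error
        c = min(c_bar, ratio)      # clipped weight on the trace
        delta = rho * (rewards[s] + gamma * next_values[s] - values[s])
        acc = delta + gamma * c * acc
        targets[s] = values[s] + acc
    return targets


if __name__ == "__main__":
    # On-policy data (all log ratios are 0) reduces to n-step returns.
    print(vtrace_targets([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], 0.0,
                         [0.0, 0.0, 0.0], gamma=0.5))
```

When the behaviour and target policies coincide, the clipped weights are all 1 and the targets reduce to ordinary n-step bootstrapped returns. When the target policy assigns almost no probability to the logged actions, the targets fall back to the current value estimates. How such targets would be wired into this project's training loop is left open here.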