Hello, thanks for all the cool implementations.

I was specifically interested in MoG-DQN. However, running your implementation, it does not manage to learn even the simple CartPole problem after thousands of episodes, whereas the standard DQN consistently reaches the maximum score after about 450 episodes.

Is there something wrong with the implementation? I have been trying to implement MoG-DQN myself for about a month now, and all my attempts were unsuccessful. That's why I wanted to try your implementation to see what I was doing wrong, but it appears that it does not work either. This is very surprising to me, because in the paper MoG-DQN learns even Atari games, whereas in my experience it does not work even on CartPole. How come?
Hello, I'm glad you are interested in my code. My implementation does work occasionally, but it is extremely unstable: the return increases during training and then drops again. I am quite curious about this myself. I originally wanted to use the properties of MoG-DQN in related research, but the results did not meet expectations.

The authors say they used a mixture density network, and I computed the loss according to the formulas they provide. Do you think there is anything wrong with my implementation?
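For reference, here is a stripped-down sketch of the structure I have in mind (simplified, not my actual training code): a mixture-density head that outputs weights, means, and standard deviations per action, plus a closed-form squared L2 distance between two Gaussian mixtures as the loss. The class and parameter names, and the particular choice of distance, are illustrative assumptions rather than the exact formulas from the paper.

```python
# Illustrative PyTorch sketch of a mixture-of-Gaussians Q-head and a
# closed-form L2 loss between Gaussian mixtures. Names (MoGQNetwork,
# num_components) and the loss choice are assumptions, not the paper's code.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoGQNetwork(nn.Module):
    """Predicts a Gaussian-mixture return distribution per action."""

    def __init__(self, obs_dim, num_actions, num_components=5, hidden=128):
        super().__init__()
        self.num_actions = num_actions
        self.num_components = num_components
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                  nn.Linear(hidden, hidden), nn.ReLU())
        # Per action: mixture-weight logits, means, log standard deviations.
        self.head = nn.Linear(hidden, num_actions * num_components * 3)

    def forward(self, obs):
        out = self.head(self.body(obs))
        out = out.view(-1, self.num_actions, self.num_components, 3)
        logits, mu, log_sigma = out.unbind(dim=-1)
        pi = F.softmax(logits, dim=-1)            # mixture weights
        sigma = log_sigma.clamp(-5.0, 2.0).exp()  # keep scales in a sane range
        return pi, mu, sigma

    def q_values(self, obs):
        pi, mu, _ = self.forward(obs)
        return (pi * mu).sum(dim=-1)              # E[Z(s, a)] = sum_k pi_k mu_k


def _gauss_cross(pi1, mu1, var1, pi2, mu2, var2):
    """sum_ij pi1_i pi2_j N(mu1_i; mu2_j, var1_i + var2_j), batched."""
    diff = mu1.unsqueeze(-1) - mu2.unsqueeze(-2)         # (..., K, K)
    var = var1.unsqueeze(-1) + var2.unsqueeze(-2)
    dens = torch.exp(-0.5 * diff.pow(2) / var) / torch.sqrt(2 * math.pi * var)
    w = pi1.unsqueeze(-1) * pi2.unsqueeze(-2)
    return (w * dens).sum(dim=(-1, -2))


def mog_l2_loss(pi_p, mu_p, sigma_p, pi_q, mu_q, sigma_q):
    """Closed-form integral of (p - q)^2 between two Gaussian mixtures."""
    var_p, var_q = sigma_p.pow(2), sigma_q.pow(2)
    return (_gauss_cross(pi_p, mu_p, var_p, pi_p, mu_p, var_p)
            - 2 * _gauss_cross(pi_p, mu_p, var_p, pi_q, mu_q, var_q)
            + _gauss_cross(pi_q, mu_q, var_q, pi_q, mu_q, var_q)).mean()
```

In an actual update one would select the predicted mixture for the action taken, build the target mixture from the target network's best next action (same weights, means shifted to r + γμ, standard deviations scaled to γσ, all detached), and minimize the distance between the two.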
Hello, and sorry for the late reply. I don't think there is anything wrong with your implementation; in fact, I had independently implemented the algorithm in essentially the same way. What surprised me was that I could not reproduce the results reported in the MoG-DQN paper, especially on the Atari games. I thought I had gotten something wrong and wanted to try your implementation, but it does not seem to work on the Atari games either. At this point I'm curious how the authors obtained the results they report, and I'm thinking of contacting the corresponding author for clarification. Thanks!