Thank you for your work. Would you be willing to share the training logs, such as how the reward loss and the image loss change as the number of training steps increases, and to what extent they eventually converge? When I ran the algorithm on my own dataset, I noticed that the world model's reward loss looks larger than expected, and I'm not sure whether that makes sense. The reward loss converges around 0.5, but the reward range is -1 to 1, so this seems like a relatively large prediction error.
Thank you for your interest in this work and for sharing your observations regarding the reward loss in your experiments.
I understand your concern about the world model's reward loss appearing larger than expected, especially considering the reward range of -1 to 1. A reward loss converging around 0.5 can indeed be indicative of a significant prediction error.
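As a quick sanity check, you can compare the observed loss against a trivial predict-the-mean baseline. Here is a minimal sketch, assuming the reward loss is a plain mean-squared error and that rewards are roughly uniform on [-1, 1] (both assumptions; your loss function and reward distribution may differ):

```python
import numpy as np

# Stand-in for the dataset's rewards; replace with your actual reward array.
rng = np.random.default_rng(0)
rewards = rng.uniform(-1.0, 1.0, size=10_000)

# MSE of a constant predictor that always outputs the mean reward.
# This equals the variance of the rewards (1/3 for Uniform[-1, 1]).
baseline_mse = np.mean((rewards - rewards.mean()) ** 2)
print(f"predict-the-mean baseline MSE: {baseline_mse:.3f}")
```

If your model's reward loss (~0.5) sits above this baseline (~0.33 for uniform rewards), the reward head is doing worse than a constant predictor, which would indeed point to a problem worth investigating.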
To address this, I'd like to inform you that I have recently updated the network weight initialization to align it with that of the original repository. This adjustment could influence the training dynamics and loss values you're observing. The details of this update can be found in the recent commit: Network Weight Initialization Adjustment.
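For context, "aligning the initialization with the original repository" typically means something like Xavier/Glorot-uniform weights with zero biases. A hypothetical sketch (this is an illustration of the general scheme, not the contents of the commit above):

```python
import numpy as np

def xavier_uniform(fan_in: int, fan_out: int,
                   rng: np.random.Generator) -> np.ndarray:
    """Glorot-uniform init: weights drawn from U(-limit, limit),
    where limit = sqrt(6 / (fan_in + fan_out))."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_out, fan_in))

rng = np.random.default_rng(0)
# e.g. a reward head's final layer: 32 hidden units -> 1 scalar reward
w = xavier_uniform(32, 1, rng)
b = np.zeros(1)  # biases start at zero
```

Different initialization scales for the reward head can noticeably change early loss values, so it is worth rerunning after the update before drawing conclusions.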
I recommend rerunning your experiments with this latest update. It's possible that this adjustment may lead to improvements in the reward loss metrics for your dataset.
If you continue to observe unusual reward loss values, or if you're unable to identify the cause, I would be more than willing to assist further. You can share your logs with me for a closer examination. Alternatively, if you're interested in comparing your results with mine, I can provide my training logs from after the modifications mentioned above. In either case, please contact me via e-mail.