
memory leak for multi-human policies #22

Open

huiwenzhang opened this issue Jun 23, 2023 · 3 comments

Comments

huiwenzhang commented Jun 23, 2023

Hi, when running the multi-human policies, such as sarl and lstm-rl, I noticed a drastic memory increase as training goes on: the memory used grew from about 4 GB to 20 GB after 100 training episodes. I have been debugging for a long time but still have no clue about what's going wrong. @ChanganVR please have a look.
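One way to quantify the growth is to log the process's resident memory once per episode. A minimal sketch, assuming psutil is installed; log_rss and its placement in the training loop are hypothetical, not part of this repo:

```python
import os
import psutil

def log_rss(tag: str) -> None:
    """Print this process's resident set size in GB."""
    rss_gb = psutil.Process(os.getpid()).memory_info().rss / 1e9
    print(f"{tag}: RSS = {rss_gb:.2f} GB")

# Hypothetical placement: call once per training episode, e.g.
# log_rss(f"episode {episode}"), and watch whether RSS climbs steadily.
log_rss("startup")
```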

@ChanganVR (Owner)

@huiwenzhang No such issue has been reported before. Maybe you could check whether your PyTorch and CUDA versions are compatible; sometimes that can affect memory consumption.
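For reference, the versions in question can be printed directly from Python using standard PyTorch attributes:

```python
import torch

print(torch.__version__)          # e.g. 2.0.1+cu118
print(torch.version.cuda)         # CUDA version PyTorch was built against
print(torch.cuda.is_available())  # False when training on CPU only
```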

@huiwenzhang (Author)

> @huiwenzhang No such issue has been reported before. Maybe you could check whether your PyTorch and CUDA versions are compatible; sometimes that can affect memory consumption.

I'm using PyTorch 2.0.1 with CUDA 11.8; the locally installed CUDA version is 12.1. According to the official PyTorch docs, a newer local CUDA version is also supported. Besides, I didn't use the GPU as you suggested, but the problem still exists. Training with the cadrl and rgl policies is fine. Do you have any other guesses about the memory leak?
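Since the leak also occurs on CPU, one common PyTorch cause worth ruling out is storing tensors that still reference the autograd graph, for example network outputs kept in a replay memory without detaching. A minimal sketch of the pattern; net, memory, state, and value are hypothetical names, not this repo's variables:

```python
import torch

net = torch.nn.Linear(8, 1)   # toy stand-in for a policy/value network
memory = []                   # hypothetical replay-memory list

state = torch.randn(1, 8)
value = net(state)

# Leaky pattern: `value` still references the autograd graph, so every
# stored entry keeps the whole forward-pass graph alive in RAM.
memory.append((state, value))

# Safer pattern: detach before storing so the graph can be freed.
memory.append((state, value.detach()))
```

If the multi-human policies store such tensors and cadrl/rgl do not, that could explain why only the former leak.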

@ChanganVR (Owner)

@huiwenzhang I see. I don't have a clue what could be causing the issue. You could debug by removing all the code and adding it back piece by piece until the issue reappears.
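A more targeted alternative to bisecting the code is diffing heap snapshots with the standard-library tracemalloc module; note it only tracks Python-level allocations, so it may miss tensor storage allocated in C++:

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# ... run a few training episodes here; a toy allocation stands in ...
leaked = [bytearray(1024) for _ in range(1000)]

after = tracemalloc.take_snapshot()
for stat in after.compare_to(before, "lineno")[:10]:
    print(stat)   # biggest allocation growth, attributed to source lines
```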
