
memory leak for multi-human policies #22

Open

huiwenzhang opened this issue Jun 23, 2023 · 3 comments

Comments

huiwenzhang commented Jun 23, 2023

Hi, when running the multi-human policies, such as sarl and lstm-rl, I noticed a drastic memory increase as training goes on: the memory used grew from about 4 GB to 20 GB after 100 training episodes. I have been debugging for a long time but still have no clue about what's going wrong. @ChanganVR please have a look.
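One way to quantify the growth is to log the process's resident memory once per episode. A minimal sketch, assuming psutil is installed; log_rss and its placement in the training loop are hypothetical, not part of this repo:

```python
import os
import psutil

def log_rss(tag: str) -> None:
    """Print this process's resident set size in GB."""
    rss_gb = psutil.Process(os.getpid()).memory_info().rss / 1e9
    print(f"{tag}: RSS = {rss_gb:.2f} GB")

# Hypothetical placement: call once per training episode, e.g.
# log_rss(f"episode {episode}"), and watch whether RSS climbs steadily.
log_rss("startup")
```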

@ChanganVR (Owner)

@huiwenzhang No such issue has been reported before. Maybe you could check whether your PyTorch and CUDA versions are compatible; sometimes that can affect memory consumption.
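For reference, the versions in question can be printed directly from Python using standard PyTorch attributes:

```python
import torch

print(torch.__version__)          # e.g. 2.0.1+cu118
print(torch.version.cuda)         # CUDA version PyTorch was built against
print(torch.cuda.is_available())  # False when training on CPU only
```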

@huiwenzhang (Author)

> @huiwenzhang No such issue has been reported before. Maybe you could check whether your PyTorch and CUDA versions are compatible; sometimes that can affect memory consumption.

I'm using PyTorch 2.0.1 with CUDA 11.8; the locally installed CUDA version is 12.1. According to the official PyTorch docs, a newer local CUDA version is also supported. Besides, I didn't use the GPU as you suggested, but the problem still exists. Training with the cadrl and rgl policies is fine. Do you have any other guesses about the memory leak?
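Since the leak also occurs on CPU, one common PyTorch cause worth ruling out is storing tensors that still reference the autograd graph, for example network outputs kept in a replay memory without detaching. A minimal sketch of the pattern; net, memory, state, and value are hypothetical names, not this repo's variables:

```python
import torch

net = torch.nn.Linear(8, 1)   # toy stand-in for a policy/value network
memory = []                   # hypothetical replay-memory list

state = torch.randn(1, 8)
value = net(state)

# Leaky pattern: `value` still references the autograd graph, so every
# stored entry keeps the whole forward-pass graph alive in RAM.
memory.append((state, value))

# Safer pattern: detach before storing so the graph can be freed.
memory.append((state, value.detach()))
```

If the multi-human policies store such tensors and cadrl/rgl do not, that could explain why only the former leak.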

@ChanganVR (Owner)

@huiwenzhang I see. I don't have a clue what could be causing the issue. You could debug by removing all the code and adding it back piece by piece until the issue reappears.
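A more targeted alternative to bisecting the code is diffing heap snapshots with the standard-library tracemalloc module; note it only tracks Python-level allocations, so it may miss tensor storage allocated in C++:

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# ... run a few training episodes here; a toy allocation stands in ...
leaked = [bytearray(1024) for _ in range(1000)]

after = tracemalloc.take_snapshot()
for stat in after.compare_to(before, "lineno")[:10]:
    print(stat)   # biggest allocation growth, attributed to source lines
```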
