
About Training Parallelism #10

Open
macromogic opened this issue May 31, 2022 · 2 comments

@macromogic

Hi. I wanted to train the model on my own dataset, but I found that CUDA memory runs out when computing the occupancy_256 prediction. I tried nn.DataParallel to run the model on multiple GPUs, but it raises the following error:

AttributeError: 'MinkowskiConvolution' object has no attribute 'dimension'

I searched for this error and found that it is an unresolved issue in MinkowskiEngine (link here). I wonder how you trained the model on your machine. Could you kindly suggest other possible solutions to make it work? Thank you!
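For reference, since nn.DataParallel replicates modules in a way that trips up MinkowskiEngine, a commonly suggested alternative is torch.nn.parallel.DistributedDataParallel with one process per GPU. Below is a minimal single-process, CPU-only sketch (a plain Linear layer stands in for the actual sparse network; the launch details are assumptions, not this repository's code):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def run_single_process_ddp():
    # Single-process setup for illustration only; with real GPUs you
    # would launch one process per device (e.g. via torchrun) and pass
    # device_ids=[local_rank] to DDP.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29501")
    dist.init_process_group(backend="gloo", rank=0, world_size=1)

    # Stand-in for the sparse reconstruction network.
    model = torch.nn.Linear(8, 2)
    ddp_model = DDP(model)

    out = ddp_model(torch.randn(4, 8))
    out.sum().backward()  # gradients are synchronized across ranks

    dist.destroy_process_group()
    return tuple(out.shape)

shape = run_single_process_ddp()
print(shape)  # prints (4, 2)
```

Unlike nn.DataParallel, DDP does not re-replicate the module on every forward pass, which avoids the attribute-copying path that fails for MinkowskiConvolution.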

@xheon (Owner) commented May 31, 2022

The model was trained on an RTX 2080 Ti with 11 GB of memory.

A few things you can check:

  • Increase the number of iterations for which the lower resolutions are trained (LEVEL_ITERATIONS_64, LEVEL_ITERATIONS_128) before training the entire model. If the lower-resolution predictions are not good enough, many voxels may be created at the final resolution, which requires a lot of memory.
  • Increase the masking threshold (SPARSE_THRESHOLD_128, SPARSE_THRESHOLD_256). This controls the level of "confidence" an occupied voxel needs in order to be considered at the next resolution.
  • Generally, the 2D features (80 channels) did not contribute much to the final performance, but they do require some memory. You can remove that part from the 3D model.
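To illustrate the second point: the sparse threshold acts as a cut-off on predicted occupancy probabilities, so raising it directly shrinks the set of voxels promoted (and allocated) at the next level. A minimal sketch with illustrative names and shapes, not the repository's actual code:

```python
import numpy as np

def voxels_passed_to_next_level(occupancy_probs, threshold):
    """Return coordinates of voxels whose predicted occupancy exceeds
    the masking threshold; only these are densified at the next level."""
    mask = occupancy_probs > threshold
    return np.argwhere(mask)

rng = np.random.default_rng(0)
probs = rng.random((16, 16, 16))  # stand-in for level-128 predictions

low = voxels_passed_to_next_level(probs, 0.5)
high = voxels_passed_to_next_level(probs, 0.9)

# A higher threshold keeps fewer voxels, so less memory is needed
# when those voxels are expanded at level 256.
print(len(low), len(high))
```

Each surviving voxel spawns a block of higher-resolution voxels, so even a modest increase in the threshold can cut peak memory substantially.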

@macromogic (Author)

Thanks for the reply! However, I still could not proceed with the level-256 training. I am using the latest version of BlenderProc, so I suspect the format of my generated data differs from yours (which may also affect the level-64 and level-128 performance). I am still figuring out why.

I inspected the 3D-FRONT dataset and read the code of your forked BlenderProc. There is something I am still wondering about:

  1. In the SegMapRenderer class, you seem to map each instance to another integer ID (while my data is of float64 type). Does this affect the model performance?
  2. I noticed the dataset contains both raw_model and normalized_model files. How is the normalization performed? Does it have anything to do with geometry data generation?
