
About Training Parallelism #10

Open
macromogic opened this issue May 31, 2022 · 2 comments

@macromogic

Hi. I wanted to train the model on my own dataset, but I found that CUDA memory runs out when computing the occupancy_256 prediction. I tried nn.DataParallel to run the model on multiple GPUs, but it raises the following error:

AttributeError: 'MinkowskiConvolution' object has no attribute 'dimension'

I searched for this error and found that it is an unresolved issue in MinkowskiEngine (link here). I wonder how you trained the model on your machine. Could you kindly suggest other possible solutions to make it work? Thank you!
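For reference, since nn.DataParallel replicates modules in a way that trips up MinkowskiEngine, a commonly suggested alternative is torch.nn.parallel.DistributedDataParallel with one process per GPU. Below is a minimal single-process, CPU-only sketch (a plain Linear layer stands in for the actual sparse network; the launch details are assumptions, not this repository's code):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def run_single_process_ddp():
    # Single-process setup for illustration only; with real GPUs you
    # would launch one process per device (e.g. via torchrun) and pass
    # device_ids=[local_rank] to DDP.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29501")
    dist.init_process_group(backend="gloo", rank=0, world_size=1)

    # Stand-in for the sparse reconstruction network.
    model = torch.nn.Linear(8, 2)
    ddp_model = DDP(model)

    out = ddp_model(torch.randn(4, 8))
    out.sum().backward()  # gradients are synchronized across ranks

    dist.destroy_process_group()
    return tuple(out.shape)

shape = run_single_process_ddp()
print(shape)  # prints (4, 2)
```

Unlike nn.DataParallel, DDP does not re-replicate the module on every forward pass, which avoids the attribute-copying path that fails for MinkowskiConvolution.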

@xheon (Owner) commented May 31, 2022

The model was trained on an RTX 2080 Ti with 11 GB of memory.

A few things you can check:

  • Increase the number of iterations for which the lower resolutions are trained (LEVEL_ITERATIONS_64, LEVEL_ITERATIONS_128) before training the entire model. If the lower-resolution predictions are not good enough, many voxels may be created at the final resolution, which requires a lot of memory.
  • Increase the masking threshold (SPARSE_THRESHOLD_128, SPARSE_THRESHOLD_256). This controls the level of "confidence" an occupied voxel needs in order to be considered at the next resolution.
  • Generally, the 2D features (80 channels) did not contribute much to the final performance, but they do require some memory. You can remove that part from the 3D model.
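To illustrate the second point: the sparse threshold acts as a cut-off on predicted occupancy probabilities, so raising it directly shrinks the set of voxels promoted (and allocated) at the next level. A minimal sketch with illustrative names and shapes, not the repository's actual code:

```python
import numpy as np

def voxels_passed_to_next_level(occupancy_probs, threshold):
    """Return coordinates of voxels whose predicted occupancy exceeds
    the masking threshold; only these are densified at the next level."""
    mask = occupancy_probs > threshold
    return np.argwhere(mask)

rng = np.random.default_rng(0)
probs = rng.random((16, 16, 16))  # stand-in for level-128 predictions

low = voxels_passed_to_next_level(probs, 0.5)
high = voxels_passed_to_next_level(probs, 0.9)

# A higher threshold keeps fewer voxels, so less memory is needed
# when those voxels are expanded at level 256.
print(len(low), len(high))
```

Each surviving voxel spawns a block of higher-resolution voxels, so even a modest increase in the threshold can cut peak memory substantially.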

@macromogic (Author)

Thanks for the reply! However, I still could not proceed with the level-256 training. I am using the latest version of BlenderProc, so I suspect the format of my generated data differs from yours (which may also affect the level-64 and level-128 performance). I am still figuring out why.

I inspected the 3D-FRONT dataset and read the code of your forked BlenderProc. There is something I am still wondering about:

  1. In the SegMapRenderer class, you seem to map each instance to another integer ID (while my data is of float64 type). Does this affect the model performance?
  2. I noticed the dataset contains both raw_model and normalized_model files. How is the normalization performed? Does it have anything to do with geometry data generation?
