Our work was presented and published at the IEEE TENCON 2021 conference held in Auckland, New Zealand. The paper can be viewed and referenced using the IEEE Xplore link provided below -
Resource-Conscious High-Performance Models for 2D-to-3D Single View Reconstruction
We, Dhruv Srikanth, Suraj Bidnur, and Rishab Kumar, would like to thank Dr. Sanjeev G for his guidance throughout our research and capstone project in our final year of undergraduate engineering. We would also like to thank the IEEE Society for publishing our paper titled "Resource-Conscious High-Performance Models for 2D-to-3D Single View Reconstruction" by Suraj Bidnur, Dhruv Srikanth and Sanjeev G.
We aim to reconstruct 3D voxel models from 2D images using deep learning. Our approach differs from existing techniques and models in that it reduces resource utilization, increases computational efficiency, and shortens training time, all while improving performance and accuracy.
The Pix2Vox and 3D-R2N2 architectures provided our inspiration. We originally based our approach on a similar model and then altered it for single-view image reconstruction without any data augmentation. The papers for the Pix2Vox and 3D-R2N2 architectures can be found below -
- Pix2Vox: Context-aware 3D Reconstruction from Single and Multi-view Images
- 3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction
- Lack of 3D content despite increasing demand from industries such as gaming, medicine, and cinema.
- Increasing popularity and proven success of deep learning techniques such as CNNs and GANs over recent years.
- High resource requirements and computation costs in existing approaches.
We trained our models on the 3D-ShapeNet dataset. The links to the 2D rendering files and the 3D binvox files are provided below.
The dataset contains 13 different object classes with over 700,000 images.
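As an illustration of how the ground-truth volumes can be loaded, below is a minimal, self-contained reader for the run-length-encoded `.binvox` format. This is a sketch for reference only; the repository's own data pipeline may load the files differently, and the file path shown is hypothetical.

```python
import numpy as np

def read_binvox(path):
    """Minimal sketch of a reader for a run-length-encoded .binvox file (not the project's loader)."""
    with open(path, "rb") as f:
        assert f.readline().startswith(b"#binvox"), "not a binvox file"
        dims = None
        line = f.readline().strip()
        # Parse the ASCII header until the binary payload starts.
        while line != b"data":
            if line.startswith(b"dim"):
                dims = [int(v) for v in line.split()[1:]]
            line = f.readline().strip()
        raw = np.frombuffer(f.read(), dtype=np.uint8)
    # The payload is (value, count) byte pairs; expand the runs into a flat occupancy array.
    values, counts = raw[::2], raw[1::2]
    voxels = np.repeat(values, counts).astype(np.float32)
    # binvox stores voxels in x-z-y order; transpose to x-y-z.
    return voxels.reshape(dims).transpose(0, 2, 1)

# Example (hypothetical path):
# grid = read_binvox("model.binvox")  # -> (32, 32, 32) array of 0/1 occupancies
```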
Here we propose two models for use in different scenarios.
- AE-Dense: This model gives the best results (highest IoU), but it comes at the cost of much higher GPU memory utilization (close to 9GB). It is the better choice when GPU memory is not a limiting factor.
- 3D-SkipNet: This model performs slightly worse than AE-Dense but uses around 2GB less GPU memory (close to 7GB). It is the better choice when GPU memory availability is critical. A schematic sketch of this style of encoder-decoder is shown after this list.
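To make the general idea concrete, here is a schematic TensorFlow/Keras sketch of a single-view encoder-decoder with one skip connection, mapping a (224, 224, 3) image to a (32, 32, 32) occupancy grid. The layer counts and filter sizes are illustrative assumptions, not the actual AE-Dense or 3D-SkipNet definitions; see the repository code for those.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_single_view_model(input_shape=(224, 224, 3)):
    """Hypothetical single-view encoder-decoder with one skip connection (illustrative only)."""
    image = layers.Input(shape=input_shape)

    # 2D encoder: progressively downsample the image into a compact feature vector.
    x = image
    for filters in (32, 64, 128, 256, 256):
        x = layers.Conv2D(filters, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Flatten()(x)
    x = layers.Dense(4 * 4 * 4 * 64, activation="relu")(x)

    # Reshape into a coarse 4x4x4 volume and decode up to the 32^3 voxel grid.
    v4 = layers.Reshape((4, 4, 4, 64))(x)
    v8 = layers.Conv3DTranspose(64, 4, strides=2, padding="same", activation="relu")(v4)    # 8^3
    v16 = layers.Conv3DTranspose(32, 4, strides=2, padding="same", activation="relu")(v8)   # 16^3

    # Skip connection: carry the 8^3 features forward and fuse them at the 16^3 stage.
    skip = layers.UpSampling3D(size=2)(v8)
    v16 = layers.Concatenate()([v16, skip])

    v32 = layers.Conv3DTranspose(16, 4, strides=2, padding="same", activation="relu")(v16)  # 32^3
    occupancy = layers.Conv3D(1, 3, padding="same", activation="sigmoid")(v32)
    occupancy = layers.Reshape((32, 32, 32))(occupancy)

    return Model(image, occupancy, name="skip_connection_sketch")

# Example usage:
# model = build_single_view_model()
# model.summary()
```

The dense-connection variant (AE-Dense) follows the same overall encoder-decoder pattern but with denser connectivity, trading extra memory for a higher IoU, consistent with the trade-off noted in the conclusions.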
All the details needed to set up and run the project are given in the "setup_instructions.txt" file.
- Performance Metric: Intersection over Union (IoU) (see the sketch after this list)
- Loss: Binary Cross-Entropy (BCE)
- Epochs: 150
- Learning Rate: 0.001
- Input shape: (224, 224, 3)
- Batch size: 32
- Output shape: (32, 32, 32) voxel grid
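For reference, below is a sketch of how the IoU metric and the training configuration above could be wired together in TensorFlow/Keras. The voxelization threshold, the Adam optimizer, and the dataset variable names are assumptions; the repository's training script is the authoritative version.

```python
import tensorflow as tf

def mean_voxel_iou(y_true, y_pred, threshold=0.5):
    """Mean Intersection-over-Union between predicted and ground-truth voxel grids."""
    pred = tf.cast(y_pred > threshold, tf.float32)
    true = tf.cast(y_true > 0.5, tf.float32)
    intersection = tf.reduce_sum(pred * true, axis=[1, 2, 3])
    union = tf.reduce_sum(tf.clip_by_value(pred + true, 0.0, 1.0), axis=[1, 2, 3])
    return tf.reduce_mean(intersection / (union + 1e-8))

# `model` maps a (224, 224, 3) image batch to a (32, 32, 32) occupancy grid (see the sketch above).
# The batch size of 32 is applied when building `train_dataset` with the tf.data pipeline.
# model.compile(
#     optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # optimizer choice is an assumption
#     loss=tf.keras.losses.BinaryCrossentropy(),
#     metrics=[mean_voxel_iou],
# )
# model.fit(train_dataset, validation_data=val_dataset, epochs=150)
```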
- GPU: Nvidia Tesla T4 with 16GB VRAM
- CPU and RAM: 4 vCPUs and 28GB RAM
- OS: Ubuntu 18.04 running in a Microsoft Azure VM
- TensorFlow: 2.4.0 (a quick environment check is sketched after this list)
- CUDA: 11.0
- cuDNN: 8.0
- Python: 3.6-3.8
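A quick way to confirm that the environment matches the versions above (a minimal check, assuming a standard TensorFlow GPU install):

```python
import tensorflow as tf

# Print the installed TensorFlow version and the GPUs it can see.
print("TensorFlow:", tf.__version__)                      # expected: 2.4.0
print("Built with CUDA:", tf.test.is_built_with_cuda())   # expected: True
print("Visible GPUs:", tf.config.list_physical_devices("GPU"))
```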
Given below are the mean IoUs for each of the models we trained:
- AE-Res: 0.6787
- AE-Dense: 0.7258
- 3D-SkipNet: 0.6871
- 3D-SkipNet with kernel splitting: 0.6626
Given below are the mean IoUs for the state-of-the-art baseline models used for comparison:
- Pix2Vox: 0.6340
- 3D-R2N2: 0.5600
Our research on this topic resulted in a paper that was presented and published at the IEEE TENCON 2021 conference held in Auckland, New Zealand. The paper can be viewed and referenced using the IEEE Xplore link provided below -
Resource-Conscious High-Performance Models for 2D-to-3D Single View Reconstruction
- There is a trade-off between performance and resource utilization when choosing between skip connections and dense connections.
- We propose using dense connections in non-resource-constrained environments.
- We hope that our models demonstrate the potential of 3D object reconstruction with minimal resource usage, supporting environmental sustainability and accessibility on the edge.