vol_vs_mvcnn.md

Mar 2019

tl;dr: Improvements to volumetric CNNs (3D ShapeNets) close the gap with multi-view CNNs (MVCNN).

Overall impression

The paper starts from one hypothesis: the performance gap between volumetric and multi-view CNNs is due to the difference in input resolution. However, experiments show that this explains only part of the gap. The paper then pursues two directions: improving the volumetric CNN architecture, and exploiting input resolution in MVCNN. It already shows the concise, straightforward style that Charles Qi later brought to PointNet.

Key ideas

  • The gap between 2D and 3D can be attributed to 2 factors: input resolution and network architecture.
  • The experiment to examine the effect of spatial resolution in 2D and 3D CNNs is done by sphere rendering. Spheres are used for discretization as they are view invariant.
  • Two improved volumetric CNNs are proposed:
    • Subvolume supervision (SubvolSup): the 3D network overfits severely. To address this, harder but highly relevant auxiliary tasks are added: classifying the object from a partial volume (specifically, from the feature maps within each octant).
    • Anisotropic probing (AniProbing): use an anisotropic kernel to mimic 2D projection of a 3D input.
  • Use of orientation-pooling with multi-orientation (MO) input augmentation to boost the performance of SubvolSup.
    • As expected, AniProbing benefits more from the augmentation: it is inspired by MVCNN and is meant to be used with multi-orientation augmentation.
  • Another way to boost the performance of volumetric CNNs is to use a spatial transformer network (STN). The STN tends to align all 3D volumes to a canonical viewpoint.
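
The octant split behind SubvolSup and the pooling over orientations can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's implementation: the names `split_octants` and `orientation_pool` are hypothetical, the feature-map shape is made up, and max is assumed as the pooling operator (in the style of MVCNN view pooling). In a real network each octant would feed its own auxiliary classifier.

```python
import numpy as np

def split_octants(feat):
    """Split a (C, D, H, W) feature map into its 8 spatial octants.

    Assumes D, H and W are even. Each octant would be the input to one
    auxiliary classification head (not shown here).
    """
    _, d, h, w = feat.shape
    hd, hh, hw = d // 2, h // 2, w // 2
    octants = []
    for i in (0, 1):
        for j in (0, 1):
            for k in (0, 1):
                octants.append(
                    feat[:,
                         i * hd:(i + 1) * hd,
                         j * hh:(j + 1) * hh,
                         k * hw:(k + 1) * hw]
                )
    return octants

def orientation_pool(per_orientation_features):
    """Element-wise max over orientations: (N_orient, F) -> (F,).

    Max is an assumption made for this sketch.
    """
    return np.max(per_orientation_features, axis=0)

# Toy feature map: 2 channels on a 4x4x4 grid.
feat = np.arange(2 * 4 * 4 * 4, dtype=np.float32).reshape(2, 4, 4, 4)
octs = split_octants(feat)

# Toy features for the same object seen under 2 orientations.
pooled = orientation_pool(np.array([[1.0, 5.0], [3.0, 2.0]]))
```

The octants partition the feature map without overlap, so every activation contributes to exactly one auxiliary task.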

Technical details

  • AniProbing differs from rendering a 3D object to 2D with computer graphics in two ways: it "sees through" the 3D object, providing an X-ray-like scanning capability, and it saves computation time.
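
A toy comparison may make the "see-through" property concrete. The sketch below assumes a single elongated kernel of shape D x 1 x 1 applied along the depth axis, which is a simplification of the network's probing layers; `anisotropic_probe` and `first_hit_depth` are hypothetical helper names, and the first-hit depth map stands in for a conventional surface render.

```python
import numpy as np

def anisotropic_probe(volume, weights):
    """Collapse a (D, H, W) grid to (H, W) with a weighted sum along depth.

    Equivalent to one D x 1 x 1 kernel; the weights would be learned
    in a real network.
    """
    return np.tensordot(weights, volume, axes=([0], [0]))

def first_hit_depth(volume):
    """Depth index of the first occupied voxel along each ray, -1 if empty.

    A stand-in for surface rendering: nothing behind the first hit matters.
    """
    occupied = volume > 0
    hit = occupied.any(axis=0)
    return np.where(hit, occupied.argmax(axis=0), -1)

# Two 4x3x3 volumes with the same front surface but different interiors.
solid = np.zeros((4, 3, 3), dtype=np.float32)
solid[:, 1, 1] = 1.0             # fully filled column

hollow = np.zeros((4, 3, 3), dtype=np.float32)
hollow[0, 1, 1] = 1.0            # front voxel only
hollow[3, 1, 1] = 1.0            # back voxel only

weights = np.ones(4, dtype=np.float32)  # placeholder for learned weights

same_surface = np.array_equal(first_hit_depth(solid), first_hit_depth(hollow))
probe_solid = anisotropic_probe(solid, weights)
probe_hollow = anisotropic_probe(hollow, weights)
```

The surface view cannot tell the two volumes apart, while the probe responds to what lies behind the front voxel.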

Notes

  • Video presentation at CVPR 2016.
  • How about using deterministic average or max pooling, instead of learning an anisotropic kernel in AniProbing?
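
One way to start exploring this question is to compare the pooling variants on a toy occupancy grid. The sketch below again assumes a single D x 1 x 1 kernel along depth. Under that assumption, average pooling is the special case of uniform kernel weights, while max pooling over a binary grid reduces to the silhouette and discards the interior. It says nothing about which option would score better in practice.

```python
import numpy as np

rng = np.random.default_rng(0)
# Random binary occupancy grid with shape (D, H, W); purely illustrative.
volume = (rng.random((8, 4, 4)) > 0.7).astype(np.float32)

depth = volume.shape[0]
uniform = np.full(depth, 1.0 / depth, dtype=np.float32)

# A linear probe along depth with fixed uniform weights ...
probe_uniform = np.tensordot(uniform, volume, axes=([0], [0]))
# ... matches deterministic average pooling along the same axis.
mean_pooled = volume.mean(axis=0)

# Max pooling on binary occupancy only records whether a ray hits anything.
max_pooled = volume.max(axis=0)
silhouette = (volume.sum(axis=0) > 0).astype(np.float32)
```

A learned kernel can therefore represent average pooling but can also weight depths unequally, whereas max pooling is not expressible as a linear kernel.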