VirtualCam: Single-Stage Monocular 3D Object Detection with Virtual Cameras

January 2020

tl;dr: Generate synthetic views of the image to reduce the complexity of neural networks for monocular 3D object detection (3D MOD).

Overall impression

The paper builds on the work of MonoDIS. The main idea is that a standard network has to build distinct representations devoted to recognizing objects at specific depths, and these representations generalize poorly across depths. As a result, the network's capacity has to scale with the depth range to be covered, and so does the training data.

This is a classical tradeoff of model/data complexity vs inference complexity. If the image has an inherent structure (in autonomous driving camera images, closer objects appear at the bottom of the image and farther objects appear higher up), it can be exploited using row-aware or depth-aware convolution (cf M3D-RPN). This paper instead builds a row-wise image pyramid from the original image.

The paper also has a good introduction to monocular 3D object detection.

Key ideas

Training and inference discrepancy
Training: train a NN to make correct predictions within a limited depth range.
- Generate n_v = 8 virtual images per original image.
- Ground-truth-guided sampling procedure (cf PointRend). The object should be completely visible (not cropped). The virtual cam is randomly shifted by [-Z_res/2, 0].
- GT falling outside the preset depth range [0, Z_res] is set to ignore/dont_care.
- The depth is shifted by Z_v to ensure depth invariance.
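The training-side labelling above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names, the value of Z_res, and the convention that the random shift is applied relative to the depth of the sampled ground-truth object are all assumptions made for this note.

```python
import random


def sample_virtual_cam_depth(gt_depth, z_res, rng):
    """Pick a virtual camera depth Z_v for one sampled GT object.

    Assumption: the shift in [-Z_res/2, 0] is applied to the object's own
    depth, so the object ends up within [0, Z_res/2] of the virtual camera.
    """
    shift = rng.uniform(-z_res / 2.0, 0.0)
    return gt_depth + shift


def relabel_for_virtual_cam(gt_depths, z_v, z_res):
    """Return (relative_depth, ignore) for every GT object in the image.

    Depths are expressed relative to Z_v; objects whose relative depth
    falls outside [0, Z_res] are flagged as ignore/dont_care.
    """
    labels = []
    for z in gt_depths:
        rel = z - z_v
        ignore = not (0.0 <= rel <= z_res)
        labels.append((rel, ignore))
    return labels
```

With Z_v = 18 and Z_res = 5, an object at 20 m becomes a target at relative depth 2 m, while objects at 10 m and 40 m are ignored. The network only ever regresses depths inside [0, Z_res], whatever the absolute distance.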
Inference:
- Sample a virtual cam every Z_res/2 (cut out horizontal strips).
- Rescale all strips to the same height.
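The inference procedure can be sketched in the same spirit. The sketch assumes a pinhole camera with the Y axis pointing down and a viewport that spans a fixed metric height range at each sampled depth. The helper names and all numeric values are hypothetical and not taken from the paper.

```python
def inference_cam_depths(z_min, z_max, z_res):
    """Virtual camera depths sampled every Z_res/2 between z_min and z_max."""
    step = z_res / 2.0
    depths = []
    k = 0
    while z_min + k * step <= z_max:
        depths.append(z_min + k * step)
        k += 1
    return depths


def strip_rows(z_v, fy, cy, y_top, y_bottom):
    """Image rows (top, bottom) of a viewport placed at depth z_v.

    y_top and y_bottom are metric heights in the camera frame (Y down),
    projected with the pinhole model v = fy * Y / Z + cy.
    """
    return (fy * y_top / z_v + cy, fy * y_bottom / z_v + cy)


def scale_to_height(rows, target_h):
    """Resize factor that brings a strip to the common height target_h."""
    top, bottom = rows
    return target_h / (bottom - top)
```

For example, with fy = 700, cy = 180 and a viewport from Y = -0.5 m to Y = 1.5 m, the strip at 10 m covers rows 145 to 285 and the strip at 20 m covers rows 162.5 to 232.5. Farther strips are thinner and get upscaled more, which is what makes this a row-wise image pyramid.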

Technical details

The paper gives a detailed analysis of the virtual camera's intrinsic parameters, but does not use them for training or inference. In practice the method is basically crop and rescale.
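For reference, the standard pinhole bookkeeping for a cropped and rescaled image looks like the sketch below. It is generic camera geometry, not the paper's derivation: the function name is hypothetical and half-pixel conventions are ignored.

```python
def crop_and_rescale_intrinsics(fx, fy, cx, cy, left, top, scale):
    """Intrinsics of an image cropped at (left, top) and then resized by `scale`.

    Cropping moves the principal point; resizing scales both the focal
    lengths and the principal point by the same factor.
    """
    return (fx * scale, fy * scale, (cx - left) * scale, (cy - top) * scale)
```

Cropping a strip that starts at row 145 and shrinking it by 0.5 turns (fx, fy, cx, cy) = (700, 700, 600, 180) into (350, 350, 300, 17.5).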

Notes

Questions and notes on how to improve/revise the current work
