September 2020
tl;dr: Project 3D object detections into a BEV map to train a better driving agent.
The paper performs monocular 3D object detection in a way similar to Deep3DBox, then renders the detection results into a BEV (plan view) map; a rasterization sketch follows the notes below. Having access to this plan view reduces collisions by half.
- Plan view is essential for planning.
- In the perspective view, free space and overall scene structure are implicit rather than explicit.
- Hallucinating a top-down view of the road makes it easier to learn to drive, as free and occupied space is represented explicitly and at a constant resolution throughout the image.
- The perception stack should generate this plan view for the planning stack.
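
Below is a minimal sketch (not the paper's code) of how 3D detections could be rasterized into such a plan view: each detected box footprint is drawn into a fixed-resolution top-down occupancy grid centered on the ego vehicle. The box field order `(x, y, yaw, length, width)`, the grid size, and the 0.5 m/cell resolution are illustrative assumptions.

```python
import numpy as np
import cv2


def render_plan_view(boxes, grid_size=200, resolution=0.5):
    """Rasterize 3D detections into a top-down occupancy grid (plan view).

    boxes: iterable of (x, y, yaw, length, width) in the ego frame
           (meters / radians, x forward, y left).
    Each cell covers `resolution` x `resolution` meters; the ego vehicle
    sits at the center of the grid, facing "up" in the image.
    """
    bev = np.zeros((grid_size, grid_size), dtype=np.uint8)
    center = grid_size / 2.0

    for x, y, yaw, length, width in boxes:
        # Footprint corners in the box's own frame.
        dx, dy = length / 2.0, width / 2.0
        corners = np.array([[dx, dy], [dx, -dy], [-dx, -dy], [-dx, dy]])

        # Rotate by yaw and translate into the ego frame.
        rot = np.array([[np.cos(yaw), -np.sin(yaw)],
                        [np.sin(yaw),  np.cos(yaw)]])
        corners = corners @ rot.T + np.array([x, y])

        # Ego-frame meters -> pixel coordinates (x forward -> up, y left -> left).
        cols = (center - corners[:, 1] / resolution).astype(np.int32)
        rows = (center - corners[:, 0] / resolution).astype(np.int32)
        poly = np.stack([cols, rows], axis=1).reshape(-1, 1, 2)

        # Mark the occupied footprint explicitly in the plan view.
        cv2.fillPoly(bev, [poly], color=255)

    return bev


# Example: one car 10 m ahead and 1.5 m to the left, facing forward.
bev_map = render_plan_view([(10.0, 1.5, 0.0, 4.5, 1.8)])
```

This explicit occupancy rendering is what makes free vs. occupied space directly readable at constant resolution, in contrast to the perspective image where the same information is only implicit.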