Deep Multi-modal Object Detection and Semantic Segmentation for Autonomous Driving: Datasets, Methods, and Challenges
July 2019
tl;dr: Very good review of sensor fusion in 3D object detection.
The summary of data fusion schemes is highly informative. After all, the pioneering works of MV3D and AVOD still seem highly relevant.
Deep learning on radar data has not been widely explored yet and has the potential to yield better object detection algorithms.
- For 3D object detection, PointPillars is the best balanced lidar-only algorithm, and AVOD is the best balanced lidar/camera fusion algorithm.
- Three ways to process a lidar point cloud (see the BEV sketch after this list):
- voxelization
- PointNet
- projection to BEV, CPM (camera plane map), or spherical map (azimuth and zenith angles)
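To make the third option concrete, below is a minimal sketch (not from the paper) of rasterizing a point cloud into a single-channel BEV max-height map. The ranges, resolution, and function name are placeholder assumptions roughly in the spirit of MV3D/PIXOR-style grids, which in practice add height slices, density, and intensity channels.

```python
import numpy as np

def lidar_to_bev_height(points, x_range=(0.0, 70.0), y_range=(-40.0, 40.0), res=0.1):
    """Rasterize an (N, >=3) lidar point cloud [x, y, z, ...] into a BEV max-height map."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    # Keep only points inside the BEV region of interest.
    keep = (x >= x_range[0]) & (x < x_range[1]) & (y >= y_range[0]) & (y < y_range[1])
    x, y, z = x[keep], y[keep], z[keep]
    # Discretize metric coordinates into integer cell indices.
    xi = ((x - x_range[0]) / res).astype(np.int32)
    yi = ((y - y_range[0]) / res).astype(np.int32)
    h = int(round((x_range[1] - x_range[0]) / res))
    w = int(round((y_range[1] - y_range[0]) / res))
    bev = np.full((h, w), -np.inf, dtype=np.float32)
    np.maximum.at(bev, (xi, yi), z)   # max height per occupied cell
    bev[np.isinf(bev)] = 0.0          # empty cells get a neutral value
    return bev
```

Hand-crafted BEV channels of this kind are what MV3D/AVOD consume, whereas PointPillars learns the per-cell encoding from raw points instead.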
- Projecting camera images to BEV space can help with occlusion and scale variation; see the IPM sketch below.
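One simple way to do this is inverse perspective mapping (IPM) under a flat-ground assumption. The sketch below is a hypothetical illustration, not the paper's method; the correspondence coordinates are placeholders.

```python
import cv2
import numpy as np

# Four ground-plane correspondences between front-camera pixels and a BEV canvas.
# All coordinates here are placeholders; real values come from extrinsic calibration.
img_pts = np.float32([[520, 450], [760, 450], [1100, 700], [180, 700]])
bev_pts = np.float32([[180, 0], [220, 0], [220, 380], [180, 380]])
H = cv2.getPerspectiveTransform(img_pts, bev_pts)

def camera_to_bev(image, H, bev_size=(400, 400)):
    """Inverse perspective mapping: warp the image onto a flat-ground BEV canvas.
    Only ground-plane pixels map correctly; anything above the ground gets smeared."""
    return cv2.warpPerspective(image, H, bev_size)
```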
- FMCW lidars can provide speed info. Lidars are less affected by fog and rain than visual cameras, but are still not as robust as radar.
- Classifying objects in radar data is highly challenging due to its low resolution.
- The MVLD dataset has intention and interaction labeled.
- For sensor fusion, well-calibrated sensors with accurate spatial and temporal alignment are the prerequisite for accurate multi-modal perception; see the projection sketch below.
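The spatial-alignment part typically boils down to transforming points from one sensor frame into the other and projecting with the camera intrinsics. A minimal sketch, assuming a 4x4 lidar-to-camera extrinsic `T_cam_lidar` and a 3x3 intrinsic `K` from offline calibration (hypothetical names; temporal alignment via timestamps and ego-motion compensation is not shown):

```python
import numpy as np

def project_lidar_to_image(points, T_cam_lidar, K):
    """Project (N, 3) lidar points into pixel coordinates.
    T_cam_lidar: 4x4 lidar-to-camera extrinsic, K: 3x3 camera intrinsic,
    both obtained from offline calibration (names are hypothetical)."""
    pts_h = np.hstack([points, np.ones((points.shape[0], 1))])  # homogeneous coords
    pts_cam = (T_cam_lidar @ pts_h.T)[:3]                       # into the camera frame
    in_front = pts_cam[2] > 0                                   # drop points behind the camera
    uvw = K @ pts_cam[:, in_front]
    uv = (uvw[:2] / uvw[2]).T                                   # perspective division -> pixels
    return uv, in_front
```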
- New ideas:
- Learn multi-modal fusion without accurate calibration
- Monocular depth estimation with BEV projection (see the pseudo-lidar sketch after this list)
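A minimal sketch of how that idea could be wired up, assuming a depth map from some monocular depth network and known intrinsics `K` (both hypothetical inputs); the back-projected pseudo-lidar points could then be rasterized with a BEV function like the one sketched earlier.

```python
import numpy as np

def depth_to_pseudo_lidar(depth, K):
    """Back-project an (H, W) per-pixel depth map into an (H*W, 3) point cloud
    in the camera frame, using the 3x3 intrinsic matrix K."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```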