July 2020
tl;dr: Relative velocity estimation from a sequence of monocular images taken with a moving camera.
This is the winning entry to the monocular velocity estimation challenge. Lightweight trajectory-based features (a list of bbox locations over time) are good enough, and outperform the full solution built on depth and optical-flow features.
The SOTA error is around 1.12 m/s, compared to the error of the ground truth itself, about 0.71 m/s.
- TuSimple dataset:
- 20 fps, sequences of 40 frames (2 s each)
- target distance ranging from 5 m to 90 m
- bbox annotated only on the last frame
- Input: two stacked images (see the stacking sketch at the end of these notes)
- Off-the-shelf tools:
- Feature extraction:
- Spatial: shrink the bbox by 10%, then take the mean of the dense feature (e.g., depth or flow) inside it (sketch below)
- Temporal: Gaussian smoothing with width = 5
- Location and velocity prediction:
- MLP: 4-layer
- Split into 3 distance bins based on bbox size (separators at 20 m and 45 m); see the per-bin MLP sketch below
- Three models, one per distance bin, perform better than a single combined model
- Both depth and optical flow degrade beyond the near range (> 20 m); they perform best in the near range (< 20 m)
- Joint training of velocity with location prediction leads to slightly better performance, likely a regularization effect given the small dataset size
- Ensemble of 5 models for each of the 3 distance bins (from 5-fold cross-validation); averaging sketch below
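
A minimal sketch of the "two stacked images" input, assuming it means two RGB frames concatenated along the channel axis before being fed to a dense network such as an optical-flow estimator; which two frames are paired is an assumption:

```python
import numpy as np

def stack_frames(frame_a: np.ndarray, frame_b: np.ndarray) -> np.ndarray:
    """Concatenate two HxWx3 frames into one HxWx6 network input.

    The choice of frame pair (e.g., consecutive frames, or first/last
    of the 40-frame clip) is an assumption here.
    """
    assert frame_a.shape == frame_b.shape
    return np.concatenate([frame_a, frame_b], axis=-1)
```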
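
A minimal sketch of the feature-extraction bullets, assuming the bbox is shrunk 10% around its center before averaging a dense map (depth or flow) inside it, and interpreting "width = 5" as the Gaussian sigma (it could instead mean a 5-frame window):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def spatial_mean(dense_map: np.ndarray, bbox: tuple) -> float:
    """Mean of a dense map (depth/flow) inside a bbox shrunk by 10%."""
    x1, y1, x2, y2 = bbox
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    w, h = (x2 - x1) * 0.9, (y2 - y1) * 0.9  # shrink by 10% around the center
    x1, x2 = int(cx - w / 2), int(cx + w / 2)
    y1, y2 = int(cy - h / 2), int(cy + h / 2)
    return float(dense_map[y1:y2, x1:x2].mean())

def temporal_smooth(per_frame_features: np.ndarray, width: float = 5.0) -> np.ndarray:
    """Gaussian smoothing of a (T, D) feature sequence along the time axis."""
    return gaussian_filter1d(per_frame_features, sigma=width, axis=0)
```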
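
A minimal PyTorch sketch of the predictor, with hypothetical layer widths (the notes only say "4-layer MLP"), a joint velocity + location output head, and hypothetical bbox-height thresholds standing in for the 20 m / 45 m distance separators:

```python
import torch
import torch.nn as nn

def make_mlp(in_dim: int, hidden: int = 128) -> nn.Sequential:
    """4-layer MLP that jointly predicts 2-D velocity and 2-D location."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, 4),  # [vx, vz, x, z] -- output layout assumed
    )

# One model per distance bin; input dim assumes 40 frames x 4 bbox coords.
models = {name: make_mlp(in_dim=40 * 4) for name in ("near", "mid", "far")}

def pick_bin(bbox_height_px: float) -> str:
    """Route by bbox size; the pixel thresholds are hypothetical proxies
    for the 20 m and 45 m distance separators."""
    if bbox_height_px > 60:
        return "near"   # roughly < 20 m
    if bbox_height_px > 25:
        return "mid"    # roughly 20-45 m
    return "far"        # roughly > 45 m
```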
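
And a minimal sketch of the final ensembling: each distance bin keeps its 5 fold-models from 5-fold cross-validation, and their predictions are combined at test time (a simple mean is an assumption):

```python
import torch

def ensemble_predict(fold_models: list, features: torch.Tensor) -> torch.Tensor:
    """Average the predictions of the 5 fold-models of one distance bin."""
    with torch.no_grad():
        preds = torch.stack([m(features) for m in fold_models])
    return preds.mean(dim=0)
```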