May 2020
tl;dr: Egocentric/first person vehicle prediction.
The paper introduces the HEVI (Honda Egocentric View-Intersection) dataset.
First-person (egocentric) video is easier to collect and also captures rich information about the objects' performance.
However, the front camera has a narrow FOV, so tracklets are usually short. The paper selects tracklets that are 2 seconds long, using 1 second of history to predict 1 second into the future.
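A minimal sketch of the 1 s history / 1 s future split, to make the sample construction concrete. The 10 Hz frame rate, the sliding-window stride of one frame, the `(x, y, w, h)` box layout, and the function name `split_tracklet` are all assumptions for illustration, not details confirmed by these notes.

```python
import numpy as np

FPS = 10         # assumed frame rate; not stated in these notes
HIST_SEC = 1.0   # observed history length (from the notes)
FUT_SEC = 1.0    # prediction horizon (from the notes)


def split_tracklet(boxes, fps=FPS, hist_sec=HIST_SEC, fut_sec=FUT_SEC):
    """Cut one tracklet into (history, future) training pairs.

    boxes: array-like of shape (T, 4), one bounding box per frame
           (hypothetical layout: x, y, w, h).
    Returns a list of (history, future) tuples; tracklets shorter than
    hist_sec + fut_sec produce an empty list.
    """
    boxes = np.asarray(boxes, dtype=float)
    n_hist = int(round(hist_sec * fps))
    n_fut = int(round(fut_sec * fps))
    window = n_hist + n_fut

    samples = []
    # Slide a 2-second window over the tracklet, one frame at a time.
    for start in range(len(boxes) - window + 1):
        history = boxes[start:start + n_hist]
        future = boxes[start + n_hist:start + window]
        samples.append((history, future))
    return samples
```

Under these assumptions a 25-frame tracklet yields 6 overlapping samples, while anything shorter than 20 frames is dropped, which is how the short tracklets from a narrow-FOV camera limit the usable data.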
Including dense optical flow improves results substantially, and incorporating future ego motion is also important for reducing prediction error. Note that the future ego motion is fed in as GT. At inference time, the system assumes the future ego motion comes from motion planning.
- Summaries of the key ideas
- Motion planning is represented in BEV with 3 DoF: 2 DoF of translation and 1 DoF of rotation (yaw).
- HEVI classifies tracklets as easy or hard. Easy tracklets are those that a constant-acceleration model can predict with lower-than-average error.
- This is quite similar to Nvidia's demo (see also blog here).
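The easy/hard split mentioned in the list above can be sketched as follows. This is only an illustration: the discrete constant-acceleration extrapolation from the last three observations, the mean-displacement error metric, and the names `constant_acceleration_predict`, `mean_displacement_error`, and `easy_mask` are assumptions, not the paper's exact procedure.

```python
import numpy as np


def constant_acceleration_predict(history, n_future):
    """Extrapolate a track assuming constant per-frame acceleration.

    history: array-like of shape (T, D) with T >= 3 (e.g. box centers).
    Returns an array of shape (n_future, D).
    """
    history = np.asarray(history, dtype=float)
    if len(history) < 3:
        raise ValueError("need at least 3 observations")
    last = history[-1]
    velocity = history[-1] - history[-2]              # last per-frame step
    accel = velocity - (history[-2] - history[-3])    # change in step size
    k = np.arange(1, n_future + 1)[:, None]           # future step index
    # Sum of steps (velocity + j * accel) for j = 1..k.
    return last + velocity * k + 0.5 * accel * k * (k + 1)


def mean_displacement_error(pred, target):
    """Average Euclidean distance over the prediction horizon."""
    diff = np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)
    return float(np.linalg.norm(diff, axis=1).mean())


def easy_mask(samples):
    """Mark each (history, future) pair as easy (True) or hard (False).

    A sample counts as easy when the baseline error is below the
    average baseline error over all samples (assumed criterion).
    """
    errors = np.array([
        mean_displacement_error(
            constant_acceleration_predict(hist, len(fut)), fut)
        for hist, fut in samples
    ])
    return errors < errors.mean()
```

A smoothly accelerating track is reproduced almost exactly by this baseline and lands in the easy set, while tracks with turns or strong ego-motion effects accumulate larger error and land in the hard set.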