April 2020

tl;dr: Multi-modal behavior prediction via anchor trajectory with 6-second horizon.

Overall impression

Behavior prediction is inherently stochastic, since it is impossible to know what an agent will do next. Most previous methods, including Fast and Furious, IntentNet and ChauffeurNet, predict only a single MAP (maximum a posteriori) trajectory. Rules of the Road predicts multiple future trajectories, but only as a set of unweighted samples. Sample-based generative methods have drawbacks: they are non-deterministic, their errors are hard to estimate, and they offer no way to perform probabilistic inference (e.g. to compute the probability of collision in a space-time region). Sample-based approaches also require repeated inference to obtain a multi-modal prediction.

Anchor trajectories are obtained by grouping the logged trajectories in the collected data into modes, and they provide templates for the coarse-granularity behavior of an agent. This idea neatly solves the exchangeability issue in multiple-future prediction, as detailed in Rules of the Road.
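As a rough illustration of how anchors could be extracted from logs, here is a minimal sketch that groups fixed-length trajectories with plain k-means. The choice of k-means, the mean-squared waypoint distance, the farthest-point initialization and all function names are assumptions made for this example; the notes above only say that logged trajectories are grouped into modes.

```python
def traj_dist(a, b):
    """Mean squared distance between matching waypoints of two trajectories."""
    return sum((ax - bx) ** 2 + (ay - by) ** 2
               for (ax, ay), (bx, by) in zip(a, b)) / len(a)


def mean_traj(trajs):
    """Waypoint-wise mean of a non-empty list of equal-length trajectories."""
    n, T = len(trajs), len(trajs[0])
    return [(sum(t[i][0] for t in trajs) / n, sum(t[i][1] for t in trajs) / n)
            for i in range(T)]


def kmeans_anchors(trajs, k, iters=20):
    """Group trajectories into k modes; each cluster mean acts as one anchor."""
    # Deterministic farthest-point initialization (an illustrative choice).
    anchors = [trajs[0]]
    while len(anchors) < k:
        anchors.append(max(trajs, key=lambda t: min(traj_dist(t, a) for a in anchors)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for t in trajs:
            nearest = min(range(k), key=lambda i: traj_dist(t, anchors[i]))
            groups[nearest].append(t)
        # Keep the old anchor if a cluster ends up empty.
        anchors = [mean_traj(g) if g else anchors[i] for i, g in enumerate(groups)]
    return anchors


# Toy agent-centric logs: going straight, turning left, turning right.
straight = [[(float(t), 0.1 * s) for t in range(1, 4)] for s in (-1, 0, 1)]
left = [[(float(t), t * t + 0.1 * s) for t in range(1, 4)] for s in (-1, 0, 1)]
right = [[(float(t), -t * t + 0.1 * s) for t in range(1, 4)] for s in (-1, 0, 1)]
anchors = kmeans_anchors(straight + left + right, k=3)
```

On this toy data the three anchors recover the straight, left-turn and right-turn templates. Real logs would need the trajectories expressed in a common agent-centric frame first.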

MultiPath also uses the semantic map representation from previous methods such as IntentNet, ChauffeurNet and Rules of the Road.

IntentNet also predicts intention, but it mainly focuses on a single MAP trajectory. It predicts only one set of trajectories, which makes it unsuitable for multiple-future prediction. It could be changed to predict multiple paths, one per intent; at inference time we could then take the K most likely trajectories, each associated with one of the top intents. The discrete intent prediction roughly corresponds to the discrete anchors in MultiPath, but the anchor design is more data-driven and flexible.

The paper was later extended to MultiPath++, which achieved SOTA on the Waymo Open Motion Dataset (WOMD) in late 2021.

Key ideas

  • The overall architecture is Faster R-CNN-like
    • Agent-centric network that can be applied to each agent uniformly, with a RoIAlign layer to extract agent-specific features.
    • Anchor classification (intent prediction) and waypoint offset prediction
    • The loss only cares about waypoints, not so much about the size and heading of the agent.
  • Mixture Density Networks (MDN) with log likelihood loss, similar to Gaussian YOLOv3. This formulation helps with multiple future prediction.
  • K x T x 5 predictions: K anchors, T waypoints (time steps) per trajectory, and 5 predictions per waypoint (x, y, std x, std y, p). --> Maybe K x (T x 5 + 1), as there is only one softmax logit per anchor?
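The log-likelihood loss over anchors and waypoint offsets can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: the ground truth is hard-assigned to its closest anchor, each waypoint is an axis-aligned 2D Gaussian parameterized by (x, y, std x, std y), and there is one softmax logit per anchor. The function names and the K x T x 2 nested-list layout are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of K anchor logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(l - m) for l in logits))
    return [l - lse for l in logits]


def gaussian_log_pdf(x, y, mu_x, mu_y, std_x, std_y):
    """Log density of an axis-aligned 2D Gaussian (assumed parameterization)."""
    return (-math.log(2 * math.pi * std_x * std_y)
            - 0.5 * ((x - mu_x) / std_x) ** 2
            - 0.5 * ((y - mu_y) / std_y) ** 2)


def anchor_nll(logits, anchors, offsets, stds, gt):
    """Negative log likelihood of one ground-truth trajectory.

    logits: K values; anchors, offsets, stds: K x T x 2; gt: T x 2.
    """
    # Hard assignment (assumption): pick the anchor closest to the ground truth.
    k_hat = min(range(len(anchors)),
                key=lambda k: sum((ax - gx) ** 2 + (ay - gy) ** 2
                                  for (ax, ay), (gx, gy) in zip(anchors[k], gt)))
    # Anchor classification term.
    ll = log_softmax(logits)[k_hat]
    # Waypoint regression term: mean = anchor waypoint + predicted offset.
    for (ax, ay), (dx, dy), (sx, sy), (gx, gy) in zip(
            anchors[k_hat], offsets[k_hat], stds[k_hat], gt):
        ll += gaussian_log_pdf(gx, gy, ax + dx, ay + dy, sx, sy)
    return -ll


# K = 2 anchors, T = 2 waypoints; the ground truth lies exactly on anchor 0.
anchors = [[(1.0, 0.0), (2.0, 0.0)], [(1.0, 1.0), (2.0, 2.0)]]
offsets = [[(0.0, 0.0), (0.0, 0.0)], [(0.0, 0.0), (0.0, 0.0)]]
stds = [[(1.0, 1.0), (1.0, 1.0)], [(1.0, 1.0), (1.0, 1.0)]]
gt = [(1.0, 0.0), (2.0, 0.0)]
loss = anchor_nll([0.0, 0.0], anchors, offsets, stds, gt)
```

With uniform logits and unit standard deviations, the loss here reduces to ln 2 (classification) plus 2 ln(2π) (two perfectly centered waypoints), in nats.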

Technical details

  • Evaluation:
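    • Log likelihood (LL) of the path given the image. The likelihood of a path is the product of the likelihoods of all its waypoints, so the LL is the sum of the per-waypoint log likelihoods.
    • Distance-based metrics (most of them use the L2 norm)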
      • ADE: average displacement error
      • FDE: final displacement error
      • minADE: ADE of the closest trajectory to GT out of M trajectories, so that reasonable predictions that do not happen to be the GT do not get penalized
      • minSDE: squared version of minADE
  • nat (or nit): the counterpart of the bit with e instead of 2 as the base of the logarithm. It is a unit of entropy; 1 nat = 1/ln 2 ≈ 1.44 bits.
  • Stratified sampling to tackle imbalanced data
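The distance-based metrics and the nat-to-bit conversion above can be written down in a few lines. This is a small sketch assuming trajectories are equal-length lists of (x, y) waypoints and that displacements use the L2 norm; the function names are illustrative.

```python
import math


def displacements(pred, gt):
    """Per-waypoint L2 distance between a prediction and the ground truth."""
    return [math.hypot(px - gx, py - gy) for (px, py), (gx, gy) in zip(pred, gt)]


def ade(pred, gt):
    """Average displacement error over all waypoints."""
    d = displacements(pred, gt)
    return sum(d) / len(d)


def fde(pred, gt):
    """Final displacement error: distance at the last waypoint."""
    return displacements(pred, gt)[-1]


def min_ade(preds, gt):
    """ADE of the predicted trajectory that is closest to the ground truth."""
    return min(ade(p, gt) for p in preds)


def nats_to_bits(nats):
    """Convert a log likelihood or entropy from nats (base e) to bits (base 2)."""
    return nats / math.log(2)


gt = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
preds = [
    [(1.0, 0.0), (2.0, 1.0), (3.0, 2.0)],  # drifts away from the ground truth
    [(1.0, 0.0), (2.0, 0.0), (3.0, 1.0)],  # only the last waypoint is off
]
ade_first = ade(preds[0], gt)   # (0 + 1 + 2) / 3
fde_first = fde(preds[0], gt)   # 2
best_ade = min_ade(preds, gt)   # (0 + 0 + 1) / 3, from the second prediction
```

Because minADE only scores the best of the M predictions, the drifting first trajectory does not hurt the result as long as another mode matches the ground truth.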

Notes