March 2024
tl;dr: A fully data-driven way to SOLVE autonomous driving.
Primary hurdles to overcome for fully autonomous driving are:
- Technical scalability
- Safety-critical engineering effort
- Unit economics (BOM cost and utilization/pricing)
- Regulation
Of the four, technical scalability is the key: the ability of the decision-making software to generalize to new situations quickly, with sufficient performance for deployment.
Previous efforts (AV1.0) focused on building a specialized form of general intelligence by combining components of even narrower intelligence.
- AV1.0: sense-plan-act
- Sensing:
- not a limiting factor, especially for urban driving, the crown jewel of AD.
- Scene representation
- Localization and mapping: simplifies online decision making by shifting some information offline into a map built from clean data. Downsides: maintenance cost, and nothing is truly stationary.
- Perception: the limitation is the hand-crafted representation itself. Does it carry all the information needed for the downstream decision?
- Behavior prediction: sensitive to upstream errors, and has a dependency on planning (e.g., whether another driver yields depends on what the ego vehicle does next). This means an isolated prediction system will ALWAYS have some representation error.
- Planning
- Behavior planning: hard to separate from behavior prediction, and the highly engineered expert systems used here are notoriously brittle (see the sketch after this list). --> Analogous to the leap from hand-crafted Go expert systems to AlphaGo.
- Motion planning: works OK with limited challenges.
- Control: works OK with limited challenges.
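A minimal sketch of what such an AV1.0-style hand-crafted behavior planner looks like. All rules, thresholds, and field names below are hypothetical, for illustration only; the point is that every unmodeled scenario needs another explicit branch.

```python
# Hypothetical rule-based behavior planner, illustrating why expert systems are brittle.
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Maneuver(Enum):
    KEEP_LANE = auto()
    FOLLOW = auto()
    STOP = auto()


@dataclass
class Scene:
    lead_gap_m: Optional[float]   # distance to lead vehicle, None if none detected
    lead_speed_mps: float
    red_light: bool
    ego_speed_mps: float


def plan_behavior(scene: Scene) -> Maneuver:
    """Hand-tuned decision rules: easy to read, impossible to make exhaustive."""
    if scene.red_light:
        return Maneuver.STOP
    if scene.lead_gap_m is not None and scene.lead_gap_m < 30.0:
        if scene.lead_speed_mps < scene.ego_speed_mps:
            return Maneuver.FOLLOW
    # Every unmodeled case (cut-ins, jaywalkers, occlusions, ...) needs yet
    # another explicit branch, which is where the brittleness shows up.
    return Maneuver.KEEP_LANE


print(plan_behavior(Scene(lead_gap_m=20.0, lead_speed_mps=8.0, red_light=False, ego_speed_mps=12.0)))
# -> Maneuver.FOLLOW
```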
- From AV1.0 to AV2.0
- The bottleneck of AV1.0 is in prediction and planning; solving behavior prediction and planning, as defined by these module boundaries, is what would enable self-driving.
- Holistic learned driver: the driving policy can be thought of as learning to estimate the motion the vehicle should execute, given some conditioning goals (sketched after this block).
- It is difficult to apply increasing amounts of learning to AV1.0, where the hand-crafted interfaces limit the effectiveness of data.
- AV2.0 architecture: joint sensing and planning, end-to-end.
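A minimal sketch of such a goal-conditioned, end-to-end driving policy. This is an assumed toy architecture (not the actual model): raw sensing goes in, a goal-conditioned motion plan comes out, with no hand-crafted perception/prediction interface in between.

```python
import torch
import torch.nn as nn


class EndToEndDriver(nn.Module):
    def __init__(self, n_waypoints: int = 8):
        super().__init__()
        # Tiny image encoder standing in for a real sensing backbone.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Goal conditioning: e.g. a 2-D route vector toward the next waypoint.
        self.head = nn.Sequential(
            nn.Linear(32 + 2, 128), nn.ReLU(),
            nn.Linear(128, n_waypoints * 2),  # (x, y) offsets of future motion
        )
        self.n_waypoints = n_waypoints

    def forward(self, image: torch.Tensor, goal: torch.Tensor) -> torch.Tensor:
        z = self.encoder(image)                    # learned scene representation
        out = self.head(torch.cat([z, goal], dim=-1))
        return out.view(-1, self.n_waypoints, 2)   # planned trajectory


# Training is ordinary imitation learning on logged driving data:
model = EndToEndDriver()
image = torch.randn(4, 3, 128, 256)   # batch of camera frames
goal = torch.randn(4, 2)              # conditioning goal (route direction)
expert = torch.randn(4, 8, 2)         # future waypoints the human actually drove
loss = nn.functional.mse_loss(model(image, goal), expert)
loss.backward()
```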
- Framing the problem as one that can be solved by data.
- Curate a data source, at sufficient scale and diversity.
- Build a data engine. Train and iterate effectively.
- Build testing infra to validate the system, virtually and in reality.
- Iterate, through data and algorithms (loop sketched after this list).
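A minimal sketch of this data-engine loop. Every name here (mine_failure_cases, the log fields, the threshold) is a hypothetical placeholder, not a real API; the shape of the loop is the point.

```python
def mine_failure_cases(fleet_logs):
    """Curate hard examples: drives with interventions or large policy error."""
    return [log for log in fleet_logs if log["intervention"] or log["policy_error"] > 1.0]


def retrain(policy, curated_data):
    """Stand-in for the actual training job on the curated dataset."""
    return {"version": policy["version"] + 1, "trained_on": len(curated_data)}


def validate(policy, sim_scenarios, road_tests):
    """Virtual validation first, then limited real-world validation."""
    return all(s(policy) for s in sim_scenarios) and all(t(policy) for t in road_tests)


def data_engine(policy, fleet_logs, sim_scenarios, road_tests, iterations=3):
    """Iterate a fixed number of times, stopping early if validation fails."""
    for _ in range(iterations):
        policy = retrain(policy, mine_failure_cases(fleet_logs))
        if not validate(policy, sim_scenarios, road_tests):
            break  # regression: inspect before deploying
    return policy


if __name__ == "__main__":
    logs = [{"intervention": True, "policy_error": 0.2},
            {"intervention": False, "policy_error": 2.5}]
    print(data_engine({"version": 0}, logs, sim_scenarios=[lambda p: True], road_tests=[lambda p: True]))
```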
- HMI and interpretability
- The representation can be decoded for human interpretation. --> This may be the future of perception, merely as a friendly and reassuring HMI.
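A sketch of one assumed way this could look: auxiliary decoder heads read the policy's latent state purely for display, and driving does not depend on their outputs.

```python
import torch
import torch.nn as nn

latent_dim = 32
semantic_decoder = nn.Sequential(              # e.g. render a coarse BEV semantic map
    nn.Linear(latent_dim, 64), nn.ReLU(),
    nn.Linear(64, 16 * 16 * 3),
)

z = torch.randn(1, latent_dim)                 # latent state from the driving model
bev = semantic_decoder(z).view(1, 3, 16, 16)   # shown to the rider, not fed to planning
```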
- Algo of AV2.0
- Learn a multimodal foundation model or world model
- Model prediction and planning jointly, conditioned on ego action (see the sketch after this list).
- Finetune a model-based policy
- Need effective off-policy learning and eval (?) to compare driving decisions.
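A minimal sketch of the world-model idea in a generic, Dreamer-like style; this is an assumed toy setup, not the paper's actual architecture. An action-conditioned dynamics model predicts the next latent state, so prediction and planning are handled jointly, and a policy can then be finetuned on imagined rollouts.

```python
import torch
import torch.nn as nn

latent_dim, action_dim = 64, 2  # action: e.g. (steering, acceleration)


class WorldModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(128, latent_dim), nn.Tanh())  # obs features -> latent
        self.dynamics = nn.Sequential(                                      # (z_t, a_t) -> z_{t+1}
            nn.Linear(latent_dim + action_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )

    def step(self, z: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.dynamics(torch.cat([z, action], dim=-1))


class Policy(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, action_dim))

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.net(z))


world, policy = WorldModel(), Policy()
z = world.encode(torch.randn(8, 128))      # batch of encoded observations
# Imagined rollout: the world model plays the role of prediction, conditioned on action.
for _ in range(5):
    a = policy(z)
    z = world.step(z, a)
# A reward/cost model over z would supply the signal for finetuning the policy
# and for comparing driving decisions off-policy.
```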
- Summary of technical details, such as important training details, or bugs of previous benchmarks.
- Questions and notes on how to improve/revise the current work