Add MPDM
patrick-llgc committed Jun 19, 2024
1 parent 5ad3329 commit 6f016cf
Showing 4 changed files with 44 additions and 4 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -34,13 +34,13 @@ I regularly update [my blog in Toward Data Science](https://medium.com/@patrickl
- [Multimodal Regression](https://towardsdatascience.com/anchors-and-multi-bin-loss-for-multi-modal-target-regression-647ea1974617)
- [Paper Reading in 2019](https://towardsdatascience.com/the-200-deep-learning-papers-i-read-in-2019-7fb7034f05f7?source=friends_link&sk=7628c5be39f876b2c05e43c13d0b48a3)

-## 2024-06 (0)
+## 2024-06 (7)
- [LINGO-1: Exploring Natural Language for Autonomous Driving](https://wayve.ai/thinking/lingo-natural-language-autonomous-driving/) [[Notes](paper_notes/lingo1.md)] [Wayve, open-loop world model]
- [LINGO-2: Driving with Natural Language](https://wayve.ai/thinking/lingo-2-driving-with-language/) [[Notes](paper_notes/lingo2.md)] [Wayve, closed-loop world model]
- [OpenVLA: An Open-Source Vision-Language-Action Model](https://arxiv.org/abs/2406.09246) [open source RT-2]
- [Parting with Misconceptions about Learning-based Vehicle Motion Planning](https://arxiv.org/abs/2306.07962) <kbd>CoRL 2023</kbd> [Simple non-learning based baseline]
- [QuAD: Query-based Interpretable Neural Motion Planning for Autonomous Driving](https://arxiv.org/abs/2404.01486) [Waabi]
-- [MPDM: Multipolicy decision-making in dynamic, uncertain environments for autonomous driving](https://ieeexplore.ieee.org/document/7139412) <kbd>ICRA 2015</kbd> [Behavior planning]
+- [MPDM: Multipolicy decision-making in dynamic, uncertain environments for autonomous driving](https://ieeexplore.ieee.org/document/7139412) <kbd>ICRA 2015</kbd> [Behavior planning, UMich, May Mobility]
- [MPDM2: Multipolicy Decision-Making for Autonomous Driving via Changepoint-based Behavior Prediction](https://www.roboticsproceedings.org/rss11/p43.pdf) <kbd>RSS 2015</kbd> [Behavior planning]
- [MPDM3: Multipolicy decision-making for autonomous driving via changepoint-based behavior prediction: Theory and experiment](https://link.springer.com/article/10.1007/s10514-017-9619-z) <kbd>AURO 2017</kbd> [Behavior planning]
- [EUDM: Efficient Uncertainty-aware Decision-making for Automated Driving Using Guided Branching](https://arxiv.org/abs/2003.02746) [[Notes](paper_notes/eudm.md)] <kbd>ICRA 2020</kbd> [Wenchao Ding, Shaojie Shen, Behavior planning]
6 changes: 4 additions & 2 deletions paper_notes/eudm.md
@@ -11,16 +11,18 @@ In order to make POMDP more tractable it is essential to incorporate domain know

In EUDM, the ego behavior is allowed to change within the planning horizon, allowing more flexible decision making than MPDM. For example, EUDM can make a lane-change decision even before passing the blocking vehicle (accelerate, then change lanes).

EUDM performs guided branching in both the action space (of the ego) and the intention space (of other agents).

EUDM couples the prediction and planning modules.

It is further improved by [MARC](marc.md), which adds risk-aware contingency planning.

#### Key ideas
-- DCP-Tree (domain specific closed-loop policy tree)
+- DCP-Tree (domain specific closed-loop policy tree), ego-centric
  - Guided branching in the action space.
  - Each trace only contains ONE change of action (more flexible than MPDM but still manageable).
  - Each semantic action lasts 2 s and the tree is 4 levels deep, giving a planning horizon of 8 s (see the sketch after this list).
-- CFB (conditional focused branching)
+- CFB (conditional focused branching), for other agents
  - Conditioned on the ego intention.
  - Pick out potentially risky scenarios using an **open-loop** safety assessment. (Open-loop assessment ignores interaction among agents, and checks how serious the situation would be if the surrounding agents were completely uncooperative and did not react to other agents.)
  - Select key vehicles first, so only a subset of all vehicles is considered. --> Similar to Tesla's AI Day 2022.
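
A minimal sketch of the two branching mechanisms, using a toy action/intention set; the names (`dcp_tree_traces`, `cfb_scenarios`, `is_risky`) and the action set are illustrative assumptions, not the paper's code:

```python
from itertools import product

ACTIONS = ("keep_lane", "lane_change_left", "lane_change_right")
DEPTH = 4  # 4 semantic actions x 2 s each = 8 s planning horizon

def dcp_tree_traces(ongoing: str) -> list[tuple[str, ...]]:
    """Enumerate ego action sequences with at most ONE change of action:
    each trace keeps the ongoing action, then may switch once and hold."""
    traces = [(ongoing,) * DEPTH]  # the no-change trace
    for switch_at in range(1, DEPTH):  # step at which the single change happens
        for new in ACTIONS:
            if new != ongoing:
                traces.append((ongoing,) * switch_at + (new,) * (DEPTH - switch_at))
    return traces

def cfb_scenarios(vehicles: dict[str, list[str]], is_risky) -> list[dict[str, str]]:
    """Conditional focused branching: branch only over the intentions of
    vehicles flagged by an open-loop safety check; every other vehicle
    keeps its most likely intention (listed first)."""
    key = [v for v in vehicles if is_risky(v)]
    fixed = {v: intents[0] for v, intents in vehicles.items() if v not in key}
    return [dict(fixed, **dict(zip(key, combo)))
            for combo in product(*(vehicles[v] for v in key))]

print(len(dcp_tree_traces("keep_lane")))  # 1 + 3 switch points x 2 alternatives = 7, vs 3^4 = 81 unconstrained
print(len(cfb_scenarios({"car_A": ["yield", "cut_in"], "car_B": ["keep_speed"]},
                        is_risky=lambda v: v == "car_A")))  # 2: branch over car_A only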
1 change: 1 addition & 0 deletions paper_notes/marc.md
@@ -27,6 +27,7 @@ POMDP provides a theoretically sound framework to handle dynamic interaction, b
- Trajectory tree generation with RCP
  - RCP (risk-aware contingency planning) considers the tradeoff between conservativeness and efficiency.
  - RCP generates trajectories that are optimal in multiple future scenarios under user-defined risk-averse levels. --> This can mimic human preference.
+  - The user's risk tolerance level is controlled by a hyperparameter alpha (see the sketch after this list).
- Evaluation
  - Selection based on both the policy tree and the trajectory tree (new!), ensuring consistency of policies.
  - MARC is more robust under uncertain interactions and shows fewer unexpected policy switches.
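
As a rough illustration of how a single hyperparameter can encode risk tolerance, here is a CVaR-style aggregation of scenario costs; the exact risk measure and notation in the paper may differ, and all names here are illustrative:

```python
def cvar_cost(costs: list[float], probs: list[float], alpha: float) -> float:
    """Average cost over the worst alpha-probability tail of scenarios.

    alpha = 1.0 recovers the expected cost (risk-neutral);
    alpha -> 0 approaches the worst-case cost (most risk-averse).
    """
    ranked = sorted(zip(costs, probs), key=lambda cp: -cp[0])  # worst scenarios first
    tail_cost, taken = 0.0, 0.0
    for cost, prob in ranked:
        take = min(prob, alpha - taken)  # probability mass drawn from this scenario
        tail_cost += take * cost
        taken += take
        if taken >= alpha:
            break
    return tail_cost / alpha

scenario_costs, scenario_probs = [1.0, 5.0, 20.0], [0.6, 0.3, 0.1]
print(cvar_cost(scenario_costs, scenario_probs, alpha=1.0))  # ~4.1, the expected cost
print(cvar_cost(scenario_costs, scenario_probs, alpha=0.1))  # ~20.0, worst case only
```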
37 changes: 37 additions & 0 deletions paper_notes/mpdm.md
@@ -0,0 +1,37 @@
# [MPDM: Multipolicy decision-making in dynamic, uncertain environments for autonomous driving](https://ieeexplore.ieee.org/document/7139412)

_June 2024_

tl;dr: A principled decision making framework to account for interaction with other agents.

#### Overall impression
MPDM starts with a rigorous formulation and uses domain knowledge in autonomous driving to simplify the problem, making it tractable for online deployment.

MPDM forward-simulates multiple scenarios, each initiated by a different ego policy (a high-level intention or behavior pattern), together with how other vehicles would react. This brings two advantages.

* MPDM enables a **personalized driving experience**: we can evaluate the multiple outcomes using a user-defined cost function to accommodate different driving preferences. --> This is extended to include risk tolerance in [MARC](marc.md).

* MPDM enables intelligent, human-like **active cut-in** into dense traffic flow even when no sufficient gap is present (squeezing out a path, as on the single trail up Mount Hua). This is NOT possible with a predict-then-plan scheme that does not consider interaction explicitly.

Despite its simple design, MPDM is a pioneering work in decision making and has been improved upon by subsequent works. MPDM assumes that the ego intention does not change within the planning horizon (10 s, at 0.25 s timesteps). This is improved by [EUDM](eudm.md), which allows one change of ego policy within the planning horizon, and by [MARC](marc.md), which introduces risk-aware contingency planning.

#### Key ideas
- Assumptions
  - Much of the decision making by human drivers is over discrete actions. --> This is largely true, but the discreteness may get blurry in dense urban areas.
  - Other vehicles will make reasonable, safe decisions.
- MPDM models vehicle behavior as closed-loop policies for the ego AND nearby vehicles.
- Approximations
  - Ideally we want to sample high-likelihood scenarios on which to make decisions, focusing the sampling on more likely outcomes.
  - Choose policies from a finite, fixed set for the ego and other agents.
  - Approximate interaction with deterministic closed-loop simulation: given a sampled policy and the driver model, the behaviors of other agents are deterministic.
  - Decouple vehicle behaviors: the instantaneous behavior of each vehicle is assumed to be independent of the others.
- The formulation is highly inspiring and is the foundation of [EPSILON](epsilon.md) and all follow-up works.
- The horizon is 10 s with 0.25 s timesteps, so the tree is 40 layers deep.
- How important is closed-loop realism? The paper seems to argue that inaccuracy in the closed-loop simulation does not affect final algorithm performance much; whether the simulation is closed-loop at all is the key (see the sketch below).
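
A minimal runnable sketch of the MPDM loop under these approximations, with a toy 1-D world and a proportional gap-keeping controller standing in for the paper's driver model; the policy set, cost terms, and all names are illustrative assumptions, not the paper's implementation:

```python
from dataclasses import dataclass

HORIZON, DT = 10.0, 0.25  # 10 s at 0.25 s timesteps -> a 40-step rollout
EGO_POLICIES = ("maintain", "accelerate", "decelerate")
ACCEL = {"maintain": 0.0, "accelerate": 1.0, "decelerate": -1.0}

@dataclass
class Car:
    pos: float  # longitudinal position (m)
    vel: float  # speed (m/s)

def follower_step(agent: Car, leader: Car) -> Car:
    """Toy deterministic driver model: a proportional controller that
    tracks a 2 s time gap behind the leader (an IDM-like stand-in)."""
    gap = leader.pos - agent.pos
    accel = 0.5 * (gap - 2.0 * agent.vel)
    vel = max(0.0, agent.vel + accel * DT)
    return Car(agent.pos + vel * DT, vel)

def rollout_cost(policy: str, ego: Car, other: Car) -> float:
    """Deterministic closed-loop forward simulation of one ego policy;
    the other agent REACTS to the ego at every step. Returns a
    user-defined cost (here: lost progress plus a proximity penalty)."""
    cost = 0.0
    for _ in range(int(HORIZON / DT)):
        vel = max(0.0, ego.vel + ACCEL[policy] * DT)
        ego = Car(ego.pos + vel * DT, vel)
        other = follower_step(other, leader=ego)  # closed loop: reaction to ego
        cost += (15.0 - ego.vel) * DT             # progress: prefer higher speed
        if ego.pos - other.pos < 5.0:             # follower dangerously close
            cost += 10.0
    return cost

ego, other = Car(pos=0.0, vel=10.0), Car(pos=-20.0, vel=12.0)
best = min(EGO_POLICIES, key=lambda p: rollout_cost(p, ego, other))
print(best)  # the elected policy; the election is re-run every planning cycle
```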

#### Technical details
- Summary of technical details, such as important training details, or bugs of previous benchmarks.

#### Notes
- Questions and notes on how to improve/revise the current work
