Commit b4a8897 ("Update", parent 6b922b3) by Xu Cao, Nov 20, 2023
# MAPLM: A Real-World Large-Scale Vision-Language Dataset for Map and Traffic Scene Understanding

Tencent, University of Illinois at Urbana-Champaign, Purdue University, University of Virginia

[Affiliated LLVM-AD Workshop & Challenges Website](https://llvm-ad.github.io/) | [QA Dataset Download](https://drive.google.com/drive/folders/1cqFjBH8MLeP6nKFM0l7oV-Srfke-Mx1R?usp=sharing)

### Official open-source datasets of 1st Workshop on Large Language Vision Models for Autonomous Driving (LLVM-AD) in WACV 2024

## MAPLM Dataset

Tencent Maps HD Map T Lab, in collaboration with the University of Illinois at Urbana-Champaign, Purdue University, and the University of Virginia, has launched MAPLM, the industry's first multimodal language+vision traffic scenario understanding dataset. MAPLM combines point cloud BEV (Bird's Eye View) and panoramic images to provide a rich collection of road scenario images. This dataset also includes multi-level scene description data, which helps models navigate through complex and diverse traffic environments.

### Scene of MAPLM:

MAPLM offers a variety of traffic scenarios, including highways, expressways, city roads, and rural roads, along with detailed intersection scenes. Each frame of data includes two components:
- Point Cloud BEV: A projection image of 3D point cloud viewed from the BEV perspective with clear visuals and high resolution.

- Panoramic Images: High-resolution photographs captured from front, left-rear, and right-rear angles by a wide-angle camera.

### Annotations:

- Feature-level: Lane lines, ground signs, stop lines, intersection areas, etc.
- Lane-level: Lane types, directions of traffic, turn categories, etc.
- Road-level: Scene types, road data quality, intersection structures, etc.

### Data Display:

Bird's-Eye View image from LiDAR 3D point clouds + multiple panoramic photos + HD Map annotations. Note: panoramic images are 4096×3000 portrait shots. The image below is only a cropped sample.

![Poster](./figures/example1.png)
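The point-cloud-to-BEV projection described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the dataset's actual rendering pipeline: the 100 m × 100 m range, 0.1 m resolution, and max-height cell encoding are all assumed values.

```python
import numpy as np


def lidar_to_bev(points, x_range=(-50.0, 50.0), y_range=(-50.0, 50.0), res=0.1):
    """Rasterize an (N, 3) LiDAR point cloud into a top-down (BEV) grid.

    Each cell stores the maximum point height -- one common, simple encoding.
    Ranges and the 0.1 m resolution are illustrative, not MAPLM's real values.
    Empty cells stay at 0, so heights below 0 m are clipped by this choice.
    """
    h = int(round((x_range[1] - x_range[0]) / res))
    w = int(round((y_range[1] - y_range[0]) / res))
    bev = np.zeros((h, w), dtype=np.float32)

    # Keep only points inside the chosen rectangle.
    mask = (
        (points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1])
        & (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1])
    )
    pts = points[mask]

    # Map metric coordinates to integer pixel indices.
    ix = ((pts[:, 0] - x_range[0]) / res).astype(int)
    iy = ((pts[:, 1] - y_range[0]) / res).astype(int)

    # np.maximum.at correctly handles several points landing in the same cell.
    np.maximum.at(bev, (ix, iy), pts[:, 2])
    return bev
```

A higher-fidelity BEV image would typically also encode point intensity and density per cell; the single max-height channel here is just the smallest useful variant.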

### Label Display:

The image below illustrates one frame's HD map annotation information, encompassing three parts: road-level information (in red font), lane-level information (yellow geometric lines + orange font), and intersection data (blue polygons + blue font).

<!-- ![Poster](./figures/example2.png) -->

## MAPLM-QA Challenge

Leveraging the rich road traffic scene information in the dataset above, we have designed a Q&A task that combines natural language and images.

### Data

We offer the following data in the first MAPLM-QA Challenge in WACV 2024:
- Bird's-Eye View Image: LiDAR 3D point cloud projection in BEV perspective.
- Panoramic Images: Wide-angle camera shots covering front, left-rear, and right-rear angles.
- Projection Conversion Parameters: Perspective projection conversion parameters for each frame's photo and LiDAR 3D point clouds.
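Given the per-frame projection parameters, mapping LiDAR points into one of the camera images is a standard pinhole projection. The sketch below assumes the parameters take the form of a 3×3 intrinsic matrix `K` and a 4×4 LiDAR-to-camera extrinsic transform; the dataset's actual parameter file format is not shown here and may differ.

```python
import numpy as np


def project_to_image(points, K, T_cam_from_lidar):
    """Project (N, 3) LiDAR points to pixel coordinates with a pinhole model.

    K is a 3x3 camera intrinsic matrix; T_cam_from_lidar is a 4x4 extrinsic
    transform. Both stand in for the per-frame projection conversion
    parameters shipped with each frame (assumed representation).
    """
    # Homogeneous coordinates, then transform into the camera frame.
    pts_h = np.hstack([points, np.ones((points.shape[0], 1))])
    cam = (T_cam_from_lidar @ pts_h.T).T[:, :3]

    # Keep only points in front of the camera (positive depth).
    front = cam[:, 2] > 0
    cam = cam[front]

    # Apply intrinsics, then the perspective divide by depth.
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]
    return uv, front
```

Points projecting outside the image bounds would still need to be clipped against the 4096×3000 resolution before use.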

Questions will target various tag dimensions, such as scene type, number and attributes of lanes, presence of intersections, etc. Sample questions are as follows:

Change the random guess to your algorithm's prediction to get the evaluation results.

**Please submit your results by filling out this [form](https://forms.office.com/r/mapGsGWQNf). This will allow us to update your results on the leaderboard.**
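For a local sanity check before submitting, overall accuracy can be sketched as below. The mapping to the table's columns is an assumption on our part (QNS read as per-question accuracy, FRM as the fraction of frames with every question answered correctly); this is not the official scorer.

```python
def score_predictions(pred, gold):
    """Compute frame-level and question-level accuracy in percent.

    pred and gold both map frame_id -> {question_id: answer}. The FRM/QNS
    interpretation here is assumed, not taken from the official evaluation:
    FRM counts a frame only if all of its questions are answered correctly;
    QNS is plain per-question accuracy over all frames.
    """
    total_q = correct_q = correct_frames = 0
    for frame_id, answers in gold.items():
        frame_ok = True
        for qid, gold_ans in answers.items():
            total_q += 1
            if pred.get(frame_id, {}).get(qid) == gold_ans:
                correct_q += 1
            else:
                frame_ok = False
        if frame_ok:
            correct_frames += 1
    frm = 100.0 * correct_frames / len(gold)
    qns = 100.0 * correct_q / total_q
    return frm, qns
```

Note that FRM is strictly harder than QNS under this reading: one wrong answer in a frame zeroes out that frame's contribution, which is consistent with FRM being the lowest column for every method in the table.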

### Baseline

| Method | FRM | QNS | LAN | INT | QLT | SCN |
|:-----------------:|:----:|:-----:|:-----:|:-----:|:-----:|:-----:|
| **Random Chance** | 0.00 | 19.55 | 21.00 | 16.73 | 25.20 | 15.27 |
| **LLaVA** | 30.00 | 73.12 | 57.46 | 62.46 | 81.60 | 90.94 |
| **Baseline** | 49.07 | 81.65 | 72.33 | 78.67 | 82.07 | 93.53 |


### Data Release Timeline
`01/2024` HD Map data and image captions, including 2M 3D point clouds, extracted point cloud BEV images + multiple panoramic images + HD Map annotations.


## Purdue University UCU Dataset

[See this page](https://github.com/LLVM-AD/ucu-dataset)

## Citation

If the code, datasets, and research behind this workshop inspire you, please cite our work:

```
@misc{tencent2023maplm,
title={MAPLM: A Real-World Large-Scale Vision-Language Dataset for Map and Traffic Scene Understanding},
author={Cao, Xu and Zhou, Tong and Ma, Yunsheng and Ye, Wenqian and Cui, Can and Tang, Kun and Cao, Zhipeng and Liang, Kaizhao and Wang, Ziran and Rehg, James and Zheng, Chao},
howpublished={\url{https://github.com/LLVM-AD/MAPLM}},
year={2023},
}
```

```
@inproceedings{tang2023thma,
  title={THMA: Tencent HD Map AI System for Creating HD Map Annotations},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2023},
}
```
