
Commit 15f93f2: Update
Xu Cao committed Oct 22, 2023
1 parent d7bfd21
Showing 9 changed files with 75,948 additions and 132 deletions.
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
MIT License

Copyright (c) 2023 LLVM-AD
Copyright (c) 2023

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
88 changes: 0 additions & 88 deletions README-zh.md

This file was deleted.

83 changes: 40 additions & 43 deletions README.md
@@ -4,113 +4,110 @@ Tencent, University of Illinois at Urbana-Champaign, Purdue University, Universi

[Workshop Website](https://llvm-ad.github.io/) | [Dataset Download](https://drive.google.com/drive/folders/1cqFjBH8MLeP6nKFM0l7oV-Srfke-Mx1R?usp=sharing)

### Open-source datasets of the 1st Workshop on Large Language Vision Models for Autonomous Driving (LLVM-AD) at WACV 2024

## LLVM-AD Workshop Introduction

The Winter Conference on Applications of Computer Vision (WACV) is a renowned annual conference on computer vision applications. WACV 2024 will host the Workshop on Large Language and Vision Models for Autonomous Driving (LLVM-AD), organized in collaboration with Tencent Maps, PediaMed AI Lab, the University of Illinois at Urbana-Champaign, Purdue University, and the University of Virginia. The workshop covers topics including computer vision, pattern recognition, autonomous driving, and high-definition maps, and features paper submissions, competitions, and award-sharing sessions during WACV 2024. Its goal is to bring together professionals from academia and industry to explore the applications of large language and vision models in autonomous driving and high-definition mapping.

As part of the workshop, we are releasing two open-source datasets to encourage research on understanding real-world traffic language. Using these datasets is encouraged but not required for workshop paper submissions.

## MAPLM Dataset

Tencent Maps HD Map T.Lab, in collaboration with the University of Illinois at Urbana-Champaign, Purdue University, and the University of Virginia, has launched MAPLM, the industry's first multimodal language+vision traffic scenario understanding dataset. MAPLM combines point cloud BEV (Bird's Eye View) and panoramic images to provide a rich collection of road scenario images. This dataset also includes multi-level scene description data, which helps models navigate through complex and diverse traffic environments.

### Scenes of MAPLM:

MAPLM offers a variety of traffic scenarios, including highways, expressways, city roads, and rural roads, along with detailed intersection scenes. Each frame of data includes two components:
✧ Point Cloud BEV: A projection image of 3D point cloud viewed from the BEV perspective with clear visuals and high resolution.

✧ Panoramic Images: High-resolution photographs captured from front, left-rear, and right-rear angles by a wide-angle camera.

### Annotations:

✧ Feature-level: Lane lines, ground signs, stop lines, intersection areas, etc.
✧ Lane-level: Lane types, directions of traffic, turn categories, etc.
✧ Road-level: Scene types, road data quality, intersection structures, etc.

### Data Display:

3D Point Cloud, Extracted Point Cloud BEV image + multiple panoramic photos + HD Map annotations. Note: Panoramic images are 4096*3000 portrait shots. The image below is only a cropped sample.

![Poster](./figures/example1.png)

### Label Display:

The image below illustrates one frame's annotation information, encompassing three parts: road-level information (in red font), lane-level information (yellow geometric lines + orange font), and intersection data (blue polygons + blue font).

<!-- ![Poster](./figures/example2.png) -->

## Workshop Challenge

Leveraging the rich road traffic scene information in the dataset above, we have designed a combined natural language and image Q&A task.

### Data

We offer the following data:

✓ Point Cloud BEV Image: 3D point cloud projection in BEV perspective.

✓ Panoramic Images: Wide-angle camera shots covering front, left-rear, and right-rear angles.

✓ Projection Conversion Parameters: Perspective projection conversion parameters for each frame's photo and 3D point cloud.
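The projection conversion parameters relate each frame's 3D point cloud to its panoramic photos. As a minimal sketch of such a perspective projection, assuming a standard pinhole model with hypothetical names (`K` for the 3x3 camera intrinsics, `R` and `t` for LiDAR-to-camera extrinsics; the dataset's actual parameter format may differ):

```python
import numpy as np

def project_points(points_lidar, K, R, t):
    """Project Nx3 LiDAR points into pixel coordinates with a pinhole model.

    K: 3x3 camera intrinsics; R (3x3) and t (3,) map LiDAR coordinates
    into the camera frame. Points behind the camera are dropped.
    """
    cam = points_lidar @ R.T + t       # LiDAR frame -> camera frame
    cam = cam[cam[:, 2] > 0]           # keep only points in front of the camera
    uv = cam @ K.T                     # apply intrinsics
    return uv[:, :2] / uv[:, 2:3]      # perspective divide -> (u, v) pixels
```

With the released per-frame parameters loaded into this form, the returned `(u, v)` coordinates can be overlaid on the corresponding panoramic image.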

Questions will target various tag dimensions, such as scene type, number and attributes of lanes, presence of intersections, etc. Sample questions are as follows:

![Poster](./figures/qa1.png)

![Poster](./figures/qa2.png)

### Evaluation

We will evaluate the performance of models on the test set using the following accuracy metrics:

- Frame-overall-accuracy `(FRM)`: A frame is considered correct if all closed-choice questions about it are answered correctly.
- Question-overall-accuracy `(QNS)`: A question is considered correct if its answer is correct.
- Individual-question-accuracy: The accuracy of each specific closed-choice question, including:
  - How many lanes in current road? `(LAN)`
  - Is there any road cross, intersection or lane change zone in the main road? `(INT)`
  - What is the point cloud data quality in current road area of this image? `(QLT)`
  - What kind of road scene is it in the images? `(SCN)`
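As a rough illustration of how these metrics relate, here is a minimal Python sketch (the dict-based layout mapping frame IDs to per-question answers, and the `evaluate` helper itself, are assumptions for illustration rather than the challenge's actual evaluation code):

```python
def evaluate(gt, pred):
    """Return FRM, QNS, and per-question accuracies, all in percent.

    gt and pred map frame_id -> {question_tag: answer}; the tags follow
    the four closed-choice questions listed above (assumed layout).
    """
    tags = ["LAN", "INT", "QLT", "SCN"]
    frame_hits = 0
    per_tag = {tag: [0, 0] for tag in tags}  # tag -> [correct, total]

    for frame_id, answers in gt.items():
        frame_correct = True
        for tag in tags:
            per_tag[tag][1] += 1
            if pred.get(frame_id, {}).get(tag) == answers[tag]:
                per_tag[tag][0] += 1
            else:
                frame_correct = False
        frame_hits += frame_correct  # FRM only counts fully correct frames

    frm = 100.0 * frame_hits / len(gt)
    qns = 100.0 * sum(c for c, _ in per_tag.values()) / sum(n for _, n in per_tag.values())
    acc = {tag: 100.0 * c / n for tag, (c, n) in per_tag.items()}
    return frm, qns, acc
```

Note that FRM is stricter than QNS: a single wrong answer in a frame zeroes that frame's contribution to FRM while only slightly lowering QNS.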

We can get the accuracy metrics of each question and the overall accuracy with `random guessing` by running:

```bash
cd tools
python random_chance.py
```

Replace the random guess with your algorithm's predictions to get your algorithm's evaluation results.

**Please submit your results by filling out this [form](https://forms.office.com/r/mapGsGWQNf). This will allow us to update your results on the leaderboard.**

### Leaderboard

| Method | FRM | QNS | LAN | INT | QLT | SCN |
|:-----------------:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|
| **Random Chance** | 0.00 | 19.55 | 21.00 | 16.73 | 25.20 | 15.27 |
| **LLaVA** | 30.00 | 73.12 | 57.46 | 62.46 | 81.60 | 90.94 |


### Data Release Timeline

`09/2023` First part of QA data, including extracted Point Cloud BEV image + 3 panoramic images: [Link](https://drive.google.com/drive/folders/1cqFjBH8MLeP6nKFM0l7oV-Srfke-Mx1R?usp=sharing)
Data Download: Put `maplm_v0.1.z01`, `maplm_v0.1.z02`, `maplm_v0.1.z03`, and `maplm_v0.1.zip` into one directory, then run the following commands to unzip the dataset:
```
zip -s 0 maplm_v0.1.zip --out combine.zip
unzip combine.zip
```

`01/2024` HD Map data and image caption, including 2M of 3D Point Cloud, Extracted Point Cloud BEV image + multiple panoramic images + HD Map annotations.


## UCU Dataset

[See this page](https://github.com/LLVM-AD/ucu-dataset)

## Citation

If the code, datasets, and research behind this workshop inspire you, please cite our work:

```
@inproceedings{tang2023thma,
```
