diff --git a/README.md b/README.md
index 671ff95..9e1f29d 100644
--- a/README.md
+++ b/README.md
@@ -2,60 +2,116 @@
 ### [中文简介](./README-zh.md)
 
-#### - WACV 2024 Workshop on Large Language Vision Models for Autonomous Driving (LLVM-AD)
+### WACV 2024 Workshop on Large Language Vision Models for Autonomous Driving (LLVM-AD)
 
 ![Poster](./figures/poster.png)
 
 ## Workshop Introduction
 
-The full name of WACV is the Winter Conference on Applications of Computer Vision. It is one of the renowned conferences in the domain of computer vision applications, held annually. The initiated “Workshop on Large Language and Vision Models for Autonomous Driving (LLVM-AD)” is co-hosted by Tencent Maps, PediaMed AI Lab, University of Illinois at Urbana-Champaign, Purdue University, and University of Virginia. The workshop will discuss various topics from computer vision, pattern recognition, autonomous driving, to high-definition maps. It will contain paper submissions, competitions, and award-sharing sessions during WACV 2024. The aim is to gather professionals from both academia and the industry to explore applications of large language and vision models in autonomous driving and high-definition mapping. As part of the workshop plan, we will release two open-source datasets to promote research on understanding real-world traffic language. While using the datasets is recommended, it is not mandatory for paper submission related to these two datasets.
-## MAPLM Dataset
-Tencent Maps HD Map T.Lab, in collaboration with the University of Illinois at Urbana-Champaign, Purdue University, and the University of Virginia, have launched the industry's first multimodal language+vision (point cloud BEV+panoramic images) traffic scenario understanding dataset: MAPLM. MAPLM provides abundant road scenario images complemented with multi-level scene description data, aiding models in navigating complex and varied traffic environments.
+The Winter Conference on Applications of Computer Vision (WACV) is a renowned annual conference on computer vision
+applications. At WACV 2024 it hosts the "Workshop on Large Language and Vision Models for Autonomous Driving
+(LLVM-AD)", co-organized by Tencent Maps, PediaMed AI Lab, the University of Illinois at Urbana-Champaign, Purdue
+University, and the University of Virginia. The workshop covers topics spanning computer vision, pattern recognition,
+autonomous driving, and high-definition maps, and includes paper submissions, competitions, and award-sharing sessions.
+The goal is to bring together professionals from academia and industry to explore applications of large language and
+vision models in autonomous driving and high-definition mapping.
 
-### Scene of MAPLM:
-MAPLM offers a variety of traffic scenarios, including highways, expressways, city roads, and rural roads, along with detailed intersection scenes. Each frame of data includes two components:
-✧ Point Cloud BEV: A projection image of 3D point cloud viewed from the BEV perspective with clear visuals and high resolution.
-✧ Panoramic Images: High-resolution photographs captured from front, left-rear, and right-rear angles by a wide-angle camera.
+As part of the workshop, we are releasing two open-source datasets to encourage research on understanding real-world
+traffic language. While using these datasets is recommended, it is not mandatory for paper submissions related to these
+two datasets.
+
+## MAPLM Dataset
+
+Tencent Maps HD Map T.Lab, in collaboration with the University of Illinois at Urbana-Champaign, Purdue University, and
+the University of Virginia, has launched MAPLM, the industry's first multimodal language+vision traffic scenario
+understanding dataset. MAPLM combines point cloud BEV (Bird's Eye View) projections and panoramic images to provide a
+rich collection of road scenario images. The dataset also includes multi-level scene description data, which helps
+models navigate complex and diverse traffic environments.
+
+### Scenes in MAPLM:
+
+MAPLM offers a variety of traffic scenarios, including highways, expressways, city roads, and rural roads, along with
+detailed intersection scenes. Each frame of data includes two components:
+✧ Point Cloud BEV: a projection image of the 3D point cloud viewed from the BEV perspective, with clear visuals and
+high resolution.
+✧ Panoramic Images: high-resolution photographs captured from front, left-rear, and right-rear angles by a wide-angle
+camera.
+
+### Annotations:
+
-### Annotations:
 ✧ Feature-level: Lane lines, ground signs, stop lines, intersection areas, etc.
 ✧ Lane-level: Lane types, directions of traffic, turn categories, etc.
-✧ Road-level: Scene types, road data quality, intersection structures, etc.
+✧ Road-level: Scene types, road data quality, intersection structures, etc. 
 
-### Data Display:
-Point Cloud BEV image + 3 panoramic photos. Note: Panoramic images are 4096*3000 portrait shots. The image below is only a cropped sample.
+### Data Display:
+
+Point Cloud BEV image + 3 panoramic photos. Note: panoramic images are 4096×3000 portrait shots; the image below is
+only a cropped sample.
 
 ![Poster](./figures/example1.png)
 
-### Label Display:
-The image below illustrates one frame's annotation information, encompassing three parts: road-level information (in red font), lane-level information (yellow geometric lines + orange font), and intersection data (blue polygons + blue font).
+### Label Display:
+
+The image below illustrates one frame's annotation information, encompassing three parts: road-level information (red
+font), lane-level information (yellow geometric lines + orange font), and intersection data (blue polygons + blue font).
 
 ![Poster](./figures/example2.png)
 
-## Workshop Tasks and Benefits
+## Workshop Challenge
 
-Leveraging the rich road traffic scene information from the above dataset, we have designed a natural language and image combined Q&A task based on ScienceQA.
+Leveraging the rich road traffic scene information in this dataset, we have designed a Q&A task, modeled on ScienceQA,
+that combines natural language and images.
 
-### Task Introduction:
-We offer the following data or prior inputs:
+### Data
+
+We offer the following data:
 ✓ Point Cloud BEV Image: 3D point cloud projection in BEV perspective.
 ✓ Panoramic Images: Wide-angle camera shots covering front, left-rear, and right-rear angles.
-✓ Projection Conversion Parameters: Perspective projection conversion parameters for each frame's photo and point cloud image.
-Questions will randomly target various tag dimensions, such as scene type, number and attributes of lanes, presence of intersections, etc. Sample questions are as follows:
+✓ Projection Conversion Parameters: Perspective projection parameters linking each frame's photos and point cloud
+image.
+Questions will target various tag dimensions, such as scene type, number and attributes of lanes, and presence of
+intersections.
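+
+To make the task format concrete, here is a minimal, hypothetical sketch of what one frame's closed-choice record
+might look like. The field names (`frame_id`, `images`, `qa`, `question`, `choices`, `answer`) are illustrative
+assumptions in the spirit of `tools/random_chance.py`, not the authoritative schema of the released files:
+
+```python
+# Hypothetical closed-choice record for a single frame (illustrative only;
+# consult the released data files for the authoritative schema).
+frame_record = {
+    "frame_id": "000001",                # assumed frame identifier
+    "images": [
+        "bev.jpg",                       # point cloud BEV projection
+        "front.jpg",                     # panoramic: front
+        "left_rear.jpg",                 # panoramic: left-rear
+        "right_rear.jpg",                # panoramic: right-rear
+    ],
+    "qa": [
+        {
+            "question": "How many lanes in current road?",
+            "choices": ["1", "2", "3", "4"],
+            "answer": 2,                 # index into `choices`
+        },
+    ],
+}
+```
+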
+Sample questions are as follows:
+
+![Poster](./figures/qa1.png)
+
+![Poster](./figures/qa2.png)
+
+### Evaluation
+
+We will evaluate model performance on the test set using the following accuracy metrics:
+
+- Frame-overall-accuracy `(FRM)`: a frame is considered correct only if all closed-choice questions about it are
+  answered correctly.
+- Question-overall-accuracy `(QNS)`: the overall accuracy across all closed-choice questions; a question counts as
+  correct if the selected choice matches the ground truth.
+- Individual-question-accuracy: the accuracy of each specific closed-choice question, including:
+  - How many lanes in current road? `(LAN)`
+  - Is there any road cross, intersection or lane change zone in the main road? `(INT)`
+  - What is the point cloud data quality in current road area of this image? `(QLT)`
+  - What kind of road scene is it in the images? `(SCN)`
+
+You can reproduce the per-question accuracy metrics and the overall accuracy of random guessing by running:
+
+```bash
+cd tools
+python random_chance.py
+```
+
+To evaluate your own algorithm, replace the random guesses with your model's predictions; a minimal sketch appears
+after the leaderboard below.
+
+**Please submit your results by filling out this [form](https://forms.office.com/r/mapGsGWQNf). This will allow us to
+update the leaderboard with your results.**
 
-![Poster](./figures/qa1.png)
+### Leaderboard
 
-![Poster](./figures/qa2.png)
+| Method            | FRM  |  QNS  |  LAN  |  INT  |  QLT  |  SCN  |
+|:-----------------:|:----:|:-----:|:-----:|:-----:|:-----:|:-----:|
+| **Random Chance** | 0.00 | 19.55 | 21.00 | 16.73 | 25.20 | 15.27 |
 
-### How to Participate:
-✓ Vision, multimodal integration, autonomous driving, high-definition mapping, and other relevant domain researchers or practitioners are welcome to join the challenge.
-✓ Apart from the workshop competition content, we'll invite guests for topic-related presentations. Interested parties are urged to keep an eye out and join this workshop for paper submissions.
-✓ Outstanding students in the workshop paper submissions and competitions may receive priority for internship opportunities at Tencent Maps Perception Machine Learning Engineer / Research Scientist positions.
-✓ Competition details are still being prepared. Please keep checking our workshop homepage for updates.
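+
+For reference, here is a minimal sketch of an evaluation loop over your own predictions. It reuses `acc_counter` and
+`compute_acc` from `tools/utils.py` and mirrors the structure of `tools/random_chance.py`, but the counter-update
+details, the `frames` variable, `my_model`, and the record fields are assumptions for illustration:
+
+```python
+# Illustrative evaluation loop; run from within tools/ so `utils` imports.
+from utils import acc_counter, compute_acc
+
+results = {
+    "question_overall": acc_counter(),
+    "frame_overall": acc_counter(),
+}
+
+for frame in frames:  # `frames`: your loaded test set (assumed)
+    corrects = []
+    for qa in frame["qa"]:
+        # Your model returns the index of its chosen answer (assumed API).
+        pred = my_model.predict(frame, qa["question"], qa["choices"])
+        correct = bool(pred == qa["answer"])
+        corrects.append(correct)
+
+        if qa["question"] not in results:
+            results[qa["question"]] = acc_counter()
+        for key in (qa["question"], "question_overall"):
+            results[key]["total"] += 1       # counters hold 'total'/'correct'
+            results[key]["correct"] += int(correct)
+
+    # FRM: a frame counts as correct only if every question in it is correct.
+    results["frame_overall"]["total"] += 1
+    results["frame_overall"]["correct"] += int(all(corrects))
+
+print(compute_acc(results))  # accuracy per key, as in random_chance.py
+```
+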
+## Citation +If the code, datasets, and research behind this workshop inspire you, please cite our work: -## Citation -If the code, datasets, and research behind this workshop inspire you, please cite our work: ``` @inproceedings{tang2023thma, title={THMA: tencent HD Map AI system for creating HD map annotations}, diff --git a/models/__pycache__/utils.cpython-38.pyc b/models/__pycache__/utils.cpython-38.pyc deleted file mode 100644 index 1bba959..0000000 Binary files a/models/__pycache__/utils.cpython-38.pyc and /dev/null differ diff --git a/models/random_chance.py b/tools/random_chance.py similarity index 91% rename from models/random_chance.py rename to tools/random_chance.py index cb3c406..7905e3c 100644 --- a/models/random_chance.py +++ b/tools/random_chance.py @@ -5,7 +5,7 @@ import random from typing import List -from utils import load_data, get_result_file, new_acc, compute_acc +from utils import load_data, get_result_file, acc_counter, compute_acc parser = argparse.ArgumentParser() parser.add_argument('--data_root', type=str, default='data/maplm_v0.1') @@ -19,14 +19,13 @@ args = parser.parse_args() results = dict( - question_overall=new_acc(), - frame_overall=new_acc(), + question_overall=acc_counter(), + frame_overall=acc_counter(), ) if __name__ == "__main__": print('===== Input Arguments =====') print(json.dumps(vars(args), indent=4, sort_keys=True)) - load_data(args) random.seed(args.random_seed) @@ -49,7 +48,7 @@ random_guess: int = random.randint(0, len(choices) - 1) if question not in results: - results[question] = new_acc() + results[question] = acc_counter() correct = bool(random_guess == true_answer) corrects.append(correct) @@ -66,5 +65,3 @@ acc_dict = compute_acc(results) print(json.dumps(acc_dict, indent=4, sort_keys=True)) print(json.dumps(results, indent=4, sort_keys=True)) - - diff --git a/models/utils.py b/tools/utils.py similarity index 97% rename from models/utils.py rename to tools/utils.py index 8490c3c..8adf435 100644 --- a/models/utils.py +++ b/tools/utils.py @@ -19,7 +19,7 @@ def get_result_file(args): return result_file -def new_acc(): +def acc_counter(): return { 'total': 0, 'correct': 0