Commit

update readme and leaderboard.

maysonma committed Sep 28, 2023
1 parent 29192af commit 29f2c85
Showing 4 changed files with 90 additions and 37 deletions.
114 changes: 85 additions & 29 deletions README.md
@@ -2,60 +2,116 @@

### [中文简介 (Chinese README)](./README-zh.md)

### WACV 2024 Workshop on Large Language Vision Models for Autonomous Driving (LLVM-AD)

![Poster](./figures/poster.png)

## Workshop Introduction

The Winter Conference on Applications of Computer Vision (WACV) is a renowned annual conference in the field of
computer vision applications. WACV 2024 hosts the "Workshop on Large Language and Vision Models for Autonomous
Driving (LLVM-AD)", co-organized by Tencent Maps, PediaMed AI Lab, the University of Illinois at Urbana-Champaign,
Purdue University, and the University of Virginia. The workshop covers topics ranging from computer vision and
pattern recognition to autonomous driving and high-definition maps, and includes paper submissions, competitions,
and award-sharing sessions during WACV 2024. Its goal is to bring together professionals from academia and industry
to explore applications of large language and vision models in autonomous driving and high-definition mapping.

As part of the workshop, we are releasing two open-source datasets to encourage research on understanding real-world
traffic language. Using these datasets is recommended but not mandatory for related paper submissions.

## MAPLM Dataset

Tencent Maps HD Map T.Lab, in collaboration with the University of Illinois at Urbana-Champaign, Purdue University, and
the University of Virginia, has launched MAPLM, the industry's first multimodal language+vision traffic scenario
understanding dataset. MAPLM combines point cloud BEV (Bird's Eye View) and panoramic images to provide a rich
collection of road scenario images. This dataset also includes multi-level scene description data, which helps models
navigate through complex and diverse traffic environments.

### Scenes in MAPLM:

MAPLM offers a variety of traffic scenarios, including highways, expressways, city roads, and rural roads, along with
detailed intersection scenes. Each frame of data includes two components:
✧ Point Cloud BEV: A projection image of 3D point cloud viewed from the BEV perspective with clear visuals and high
resolution.
✧ Panoramic Images: High-resolution photographs captured from front, left-rear, and right-rear angles by a wide-angle
camera.
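
For orientation, here is a minimal sketch of reading one frame's inputs. The directory layout and file names are
illustrative assumptions, not the dataset's actual structure:

```python
# A minimal sketch of loading one frame, assuming a per-frame directory with
# one BEV projection and three panoramic shots. File names are placeholders.
from pathlib import Path

from PIL import Image  # pip install pillow


def load_frame(frame_dir: str):
    """Return the point cloud BEV image and the three panoramic views."""
    frame = Path(frame_dir)
    bev = Image.open(frame / "point_cloud_bev.png")  # assumed file name
    panoramas = {
        view: Image.open(frame / f"{view}.jpg")  # assumed file names
        for view in ("front", "left_rear", "right_rear")
    }
    return bev, panoramas
```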

### Annotations:

✧ Feature-level: Lane lines, ground signs, stop lines, intersection areas, etc.
✧ Lane-level: Lane types, directions of traffic, turn categories, etc.
✧ Road-level: Scene types, road data quality, intersection structures, etc.
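
To make the three levels concrete, here is a hypothetical annotation record; the real MAPLM schema and field names
may differ:

```python
# A hypothetical annotation record for one frame, illustrating the three
# annotation levels above. Field names and values are invented.
example_annotation = {
    "road": {  # road-level
        "scene_type": "city road",
        "data_quality": "good",
        "intersection_structure": "crossroad",
    },
    "lanes": [  # lane-level, one entry per lane
        {"type": "solid white", "direction": "forward", "turn": "left turn"},
        {"type": "dashed white", "direction": "forward", "turn": "straight"},
    ],
    "features": [  # feature-level geometry
        {"kind": "stop_line", "polyline": [[12.1, 3.4], [12.1, 7.8]]},
    ],
}
```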

### Data Display:

Point Cloud BEV image + 3 panoramic photos. Note: panoramic images are 4096×3000 portrait shots. The image below is
only a cropped sample.

![Data example](./figures/example1.png)

### Label Display:

The image below illustrates one frame's annotation information, encompassing three parts: road-level information (red
font), lane-level information (yellow geometric lines + orange font), and intersection data (blue polygons + blue
font).

![Label example](./figures/example2.png)

## Workshop Challenge

Leveraging the rich road traffic scene information from the above dataset, we have designed a combined natural
language and image Q&A task based on ScienceQA.

### Data

We offer the following data:
✓ Point Cloud BEV Image: 3D point cloud projection in BEV perspective.
✓ Panoramic Images: Wide-angle camera shots covering front, left-rear, and right-rear angles.
✓ Projection Conversion Parameters: Perspective projection parameters between each frame's photos and its point cloud
image (see the sketch after this list).
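
How these parameters are used depends on their released format; the sketch below assumes a standard pinhole model
(intrinsics `K`, rotation `R`, translation `t`) as a stand-in:

```python
# A sketch of projecting a 3D point-cloud point into image pixels with a
# standard pinhole camera model. K, R, and t stand in for whatever form the
# released projection conversion parameters actually take.
import numpy as np


def project_point(point_xyz, K, R, t):
    """Map one 3D point (point-cloud frame) to (u, v) pixel coordinates."""
    p_cam = R @ np.asarray(point_xyz, dtype=float) + t  # point cloud -> camera
    if p_cam[2] <= 0:  # behind the image plane: not visible in this camera
        return None
    uvw = K @ p_cam  # camera frame -> homogeneous pixel coordinates
    return uvw[:2] / uvw[2]  # perspective divide -> (u, v)
```
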
Questions will target various tag dimensions, such as scene type, number and attributes of lanes, presence of
intersections, etc. Sample questions are as follows:

![Sample question 1](./figures/qa1.png)

![Sample question 2](./figures/qa2.png)

### Evaluation

We will evaluate the performance of models on the test set using the following accuracy metrics:

- Frame-overall-accuracy `(FRM)`: A frame is considered correct if all closed-choice questions about it are answered
correctly.
- Question-overall-accuracy `(QNS)`: The fraction of all closed-choice questions answered correctly.
- Individual-question-accuracy: The accuracy of each specific closed-choice question, including:
- How many lanes in current road? `(LAN)`
- Is there any road cross, intersection or lane change zone in the main road? `(INT)`
- What is the point cloud data quality in current road area of this image? `(QLT)`
- What kind of road scene is it in the images? `(SCN)`
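
As a minimal sketch, assuming each frame comes with a list of booleans marking which of its closed-choice questions
were answered correctly, the two overall metrics can be computed as follows (the official counters live in
`tools/utils.py`):

```python
# A minimal sketch of FRM and QNS, assuming per-frame lists of booleans that
# mark each closed-choice question as correct (True) or not (False).
from typing import Dict, List


def overall_accuracy(frames: List[List[bool]]) -> Dict[str, float]:
    total_questions = sum(len(frame) for frame in frames)
    correct_questions = sum(sum(frame) for frame in frames)
    correct_frames = sum(all(frame) for frame in frames)  # every answer right
    return {
        "FRM": 100.0 * correct_frames / len(frames),
        "QNS": 100.0 * correct_questions / total_questions,
    }


# Example: two frames, one fully correct -> FRM 50.0, QNS 80.0
print(overall_accuracy([[True, True, True], [True, False]]))
```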

You can compute the per-question accuracies and the overall accuracy for a random-guessing baseline by running:

```bash
cd tools
python random_chance.py
```

Replace the random guess with your algorithm's predictions to obtain your own evaluation results, as in the sketch
below.
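
In the sketch, `score_choice` is a placeholder for your own model's inference call, stubbed here so the snippet runs
as-is; the only required change in `tools/random_chance.py` is producing a choice index from your model instead of
`random.randint`:

```python
# A sketch of swapping the random guess for a model prediction. score_choice
# is a hypothetical stand-in for your model's inference, stubbed to run as-is.
from typing import Any, List


def score_choice(question: str, choice: str, frame_inputs: Any) -> float:
    """Placeholder scorer -- replace with your model's inference."""
    return float(len(choice))  # stub logic, not a real model


def predict_choice(question: str, choices: List[str], frame_inputs: Any) -> int:
    """Return the index of the highest-scoring choice."""
    scores = [score_choice(question, c, frame_inputs) for c in choices]
    return max(range(len(choices)), key=scores.__getitem__)


# In tools/random_chance.py, replace
#     random_guess: int = random.randint(0, len(choices) - 1)
# with something like
#     prediction: int = predict_choice(question, choices, frame_inputs)
```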

**Please submit your results by filling out this [form](https://forms.office.com/r/mapGsGWQNf). This will allow us to
update your results on the leaderboard.**

### Leaderboard

| Method            | FRM  | QNS   | LAN   | INT   | QLT   | SCN   |
|:-----------------:|:----:|:-----:|:-----:|:-----:|:-----:|:-----:|
| **Random Chance** | 0.00 | 19.55 | 21.00 | 16.73 | 25.20 | 15.27 |

### How to Participate:

✓ Researchers and practitioners in vision, multimodal learning, autonomous driving, high-definition mapping, and
other relevant domains are welcome to join the challenge.
✓ In addition to the competition, we will invite guest speakers for topic-related presentations. Interested parties
are encouraged to follow the workshop and submit papers.
✓ Outstanding students in the workshop paper submissions and competitions may receive priority for Machine Learning
Engineer / Research Scientist internship opportunities at Tencent Maps.
✓ Competition details are still being prepared. Please keep checking our workshop homepage for updates.
## Citation

If the code, datasets, and research behind this workshop inspire you, please cite our work:
```
@inproceedings{tang2023thma,
title={THMA: Tencent HD Map AI system for creating HD map annotations},
  ...
}
```
Binary file removed models/__pycache__/utils.cpython-38.pyc
11 changes: 4 additions & 7 deletions models/random_chance.py → tools/random_chance.py
@@ -5,7 +5,7 @@
import random
from typing import List

-from utils import load_data, get_result_file, new_acc, compute_acc
+from utils import load_data, get_result_file, acc_counter, compute_acc

parser = argparse.ArgumentParser()
parser.add_argument('--data_root', type=str, default='data/maplm_v0.1')
@@ -19,14 +19,13 @@
args = parser.parse_args()

results = dict(
-    question_overall=new_acc(),
-    frame_overall=new_acc(),
+    question_overall=acc_counter(),
+    frame_overall=acc_counter(),
)

if __name__ == "__main__":
    print('===== Input Arguments =====')
    print(json.dumps(vars(args), indent=4, sort_keys=True))
    load_data(args)

    random.seed(args.random_seed)

@@ -49,7 +48,7 @@
    random_guess: int = random.randint(0, len(choices) - 1)

    if question not in results:
-        results[question] = new_acc()
+        results[question] = acc_counter()

    correct = bool(random_guess == true_answer)
    corrects.append(correct)
@@ -66,5 +65,3 @@
    acc_dict = compute_acc(results)
    print(json.dumps(acc_dict, indent=4, sort_keys=True))
    print(json.dumps(results, indent=4, sort_keys=True))


2 changes: 1 addition & 1 deletion models/utils.py → tools/utils.py
@@ -19,7 +19,7 @@ def get_result_file(args):
    return result_file


-def new_acc():
+def acc_counter():
    return {
        'total': 0,
        'correct': 0
