Merge pull request #14 from linyanAI/main
support challenge. Everything under ./challenge/
ChonghaoSima authored Feb 29, 2024
2 parents bf5f429 + c766609 commit 7a02153
Showing 36 changed files with 5,613 additions and 0 deletions.
171 changes: 171 additions & 0 deletions challenge/README.md
@@ -0,0 +1,171 @@
### TL;DR
* The purpose of this folder is to facilitate our CVPR 2024 challenge. Initially, we will use a small subset of the training data (**demo train** in the following text) as an illustrative example, demonstrating how to obtain the **test data format**, train the baseline, run inference with the baseline, and go through the evaluation pipeline.

* As for the new **test data format**, our primary intention is to create a specific test data format that prevents possible cheating.

<!-- > * Subsequently, we will demonstrate the process of conducting evaluations, encompassing the baseline methodology. -->

* For better illustration, we provide [Google Slides](https://docs.google.com/presentation/d/1bicxoR_L3t05p5xw-qZM0Dj5KdJhjynqLM0Rck0qdcI/edit?usp=sharing) for your reference.

* **Official announcements about the DriveLM challenge are maintained in this folder**. Please raise an issue in the repo if you find anything unclear.

## How to Prepare Data

### DriveLM
Download the full DriveLM data [v1_0_train_nus.json](https://drive.google.com/file/d/1LK7pYHytv64neN1626u6eTQBy1Uf4IQH/view?usp=sharing) and the demo train DriveLM data [train_sample.json](https://drive.google.com/file/d/1pDikp6xoZGdyUS75qCqCM-Bh5-DWLyj-/view?usp=drive_link).
The code runs on both the full and the sampled data; below we walk through the entire process on the demo train data:

### Extract Data

Prepare the test data by extracting fundamental question-and-answer (QA) pairs from the training dataset, as follows.

**Note that** the number and the content of the fundamental QA pairs might change on the test server, but we ensure that **all question types are limited to our provided test data format**. Specifically, the question types are: 1) multiple-choice questions; 2) conversation questions; 3) yes/no questions. Illustrative examples of each type are sketched below.
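
To make the three types concrete, here is a minimal sketch of what such QA pairs can look like. The object tag and wording below are hypothetical, but the multiple-choice options follow the fixed option set used in `convert_data.py`:

```python
# Hypothetical QA pairs illustrating the three question types
# (the object tag <c1,CAM_FRONT,1043.2,82.2> is a made-up example).
multiple_choice = {
    "Q": "What is the moving status of object <c1,CAM_FRONT,1043.2,82.2>?"
         " Please select the correct answer from the following options:"
         " A. Going ahead. B. Turn right. C. Turn left. D. Stopped.",
    "A": "B",
}
conversation = {
    "Q": "What actions could the ego vehicle take based on <c1,CAM_FRONT,1043.2,82.2>?",
    "A": "The action is to keep going at the same speed.",
}
yes_no = {
    "Q": "Is <c1,CAM_FRONT,1043.2,82.2> an object worth considering for the ego vehicle?",
    "A": "Yes.",
}
```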

```bash
# The following commands assume that you downloaded train_sample.json to ./challenge
# make sure you are under ./challenge
mkdir data
mv train_sample.json data/train_sample.json
python extract_data.py
```
Then we will get test.json in the challenge folder. An example of test.json can be found in [test.json](test.json).
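
If you want to sanity-check the extracted file, a minimal sketch like the following walks the nested structure that `convert_data.py` later consumes (scene → key frames → QA categories) and counts the QA pairs per category:

```python
# Walk the nested test.json structure: scene -> key_frames -> frame -> QA.
import json

with open("test.json", "r") as f:
    test_file = json.load(f)

for scene_id, scene in test_file.items():
    for frame_id, frame in scene["key_frames"].items():
        qa = frame["QA"]
        counts = {cat: len(qa[cat]) for cat in ("perception", "prediction", "planning", "behavior")}
        print(scene_id, frame_id, counts)
```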

### Convert Data

Transform the obtained test.json data into the required test format.

```bash
# The following command assumes that test.json is under ./challenge
# make sure you are under ./challenge
python convert_data.py
```
Then we will get test_v1.json in the challenge folder. An example of test_v1.json can be found in [test_v1.json](test_v1.json).

We use llama-adapter v2 as our baseline. To convert the data into the llama-adapter format:
```bash
# The following command assumes that test_v1.json is under ./challenge
# make sure you are under ./challenge
python convert2llama.py
```
Then we will get test_v2.json in the challenge folder. An example of test_v2.json can be found in [test_v2.json](test_v2.json).

[test_v1.json](test_v1.json) is used for evaluation. [test_v2.json](test_v2.json) is used for training and inference of the baseline.
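
For reference, a single test_v2.json entry produced by `convert2llama.py` has the following shape (the id and path below are placeholders, not real values):

```python
# One llama-adapter-format entry as emitted by convert2llama.py.
entry = {
    "id": "<scene_token>_<frame_token>_0",  # scene id + frame id + QA index
    "image": ["data/nuscenes/samples/CAM_FRONT/<filename>.jpg"],  # all camera views
    "conversations": [
        {"from": "human", "value": "<image>\nWhat is the moving status of object <c1,CAM_FRONT,1043.2,82.2>?"},
        {"from": "gpt", "value": "A"},
    ],
}
```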

## How to Run the Baseline

As mentioned above, we use [llama-adapter v2](https://github.com/OpenGVLab/LLaMA-Adapter/tree/main/llama_adapter_v2_multimodal7b) as our baseline.

### Setup
We provide a simple setup script below; you can also refer to the [docs](llama_adapter_v2_multimodal7b/README.md#L9) for more detailed installation instructions.
* Set up a new conda env and install the necessary packages.
```bash
# make sure you are under ./challenge/llama_adapter_v2_multimodal7b
conda create -n llama_adapter_v2 python=3.8 -y
conda activate llama_adapter_v2
pip install -r requirements.txt
```

* Obtain the LLaMA pretrained weights using this [form](https://docs.google.com/forms/d/e/1FAIpQLSfqNECQnMkycAp2jP4Z9TFX0cGR4uf7b_fBxjY_OjhJILlKGA/viewform?usp=send_form). Please note that checkpoints from unofficial sources (e.g., BitTorrent) may contain malicious code and should be used with care. Organize the downloaded files in the following structure:
```bash
/path/to/llama_model_weights
├── 7B
│   ├── checklist.chk
│   ├── consolidated.00.pth
│   └── params.json
└── tokenizer.model
```

### Train baseline
You should modify [finetune_data_config.yaml](llama_adapter_v2_multimodal7b/finetune_data_config.yaml#L2) to specify the datasets for fine-tuning.
The dataset format follows [test_v2.json](test_v2.json).

The pre-trained checkpoint can be downloaded from [ckpts](https://github.com/OpenGVLab/LLaMA-Adapter/releases/tag/v.2.0.0).

First, prepare the [nuscenes](https://www.nuscenes.org/) dataset; you can follow the data preparation guide in [BEVFormer](https://github.com/fundamentalvision/BEVFormer/blob/master/docs/prepare_dataset.md).
```bash
data/nuscenes
├── samples
│ ├── n015-2018-11-21-19-58-31+0800__CAM_FRONT_LEFT__1542801707504844.jpg
│ ├── n015-2018-11-21-19-58-31+0800__CAM_FRONT_LEFT__1542801708004844.jpg
```

Then link the nuScenes dataset under the folder llama_adapter_v2_multimodal7b/data/:
```bash
# The following commands assume that the nuscenes folder is under ./challenge
# use an absolute link target so the symlink does not resolve relative to data/
mkdir -p llama_adapter_v2_multimodal7b/data
ln -s "$(pwd)/nuscenes" llama_adapter_v2_multimodal7b/data/nuscenes
```

Then we can train the baseline as follows:
```bash
# replace /path/to/llama_model_weights, /path/to/pre-trained/checkpoint.pth, and /output/path with your own paths
# make sure you are under ./challenge/llama_adapter_v2_multimodal7b
./exps/finetune.sh \
/path/to/llama_model_weights /path/to/pre-trained/checkpoint.pth \
finetune_data_config.yaml /output/path
```

### Inference baseline

```bash
# replace /path/to/llama_model_weights and /path/to/pre-trained/checkpoint.pth with your own paths
# make sure you are under ./challenge/llama_adapter_v2_multimodal7b
python demo.py --llama_dir /path/to/llama_model_weights --checkpoint /path/to/pre-trained/checkpoint.pth --data ../test_v2.json --output ../llama-adapter-DriveLM.json
```
Then we will get [llama-adapter-DriveLM.json](llama-adapter-DriveLM.json), which contains the predicted answers used for evaluation.


## How to Eval

We implement diverse evaluation metrics tailored to the different question types mentioned [above](https://github.com/OpenDriveLab/DriveLM-private/blob/test/challenge/README.md?plain=1#L19).

### Setup
Install the language-evaluation package following [https://github.com/bckim92/language-evaluation](https://github.com/bckim92/language-evaluation) (skip the first step if the related libraries are already installed).

```bash
# FIRST STEP
# Oracle Java
sudo add-apt-repository ppa:webupd8team/java
sudo apt update
sudo apt-get install oracle-java8-installer

# libxml-parser-perl
sudo apt install libxml-parser-perl
```
Then run:
```bash
# SECOND STEP
pip install git+https://github.com/bckim92/language-evaluation.git
python -c "import language_evaluation; language_evaluation.download('coco')"
```
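
Once installed, a quick smoke test along the lines of the library's README should work (the score keys match the `language score` entries shown in the Results section below):

```python
# Smoke test for the language-evaluation package (COCO caption metrics).
import language_evaluation

evaluator = language_evaluation.CocoEvaluator()
predicts = ["Keep going at the same speed."]
answers = ["Keep going at the same speed, decelerate gradually without braking."]
print(evaluator.run_evaluation(predicts, answers))  # Bleu_1..4, ROUGE_L, CIDEr
```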

### Evaluation
**The number and the content of the questions are subject to change in later versions, but the question types are limited and provided.**

We have implemented four types of evaluation metrics: Accuracy, ChatGPT Score, Language Evaluation, and Match Score. The [final score](evaluation.py#L157) is the weighted average of the four metrics.
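
As a rough sketch of how such a weighted average can be computed (the actual weights and normalization live in [evaluation.py](evaluation.py#L157); the values below are placeholders, not the official ones):

```python
# Illustrative final-score computation; the weights are placeholders.
def final_score(accuracy, chatgpt, match, language, weights=(0.25, 0.25, 0.25, 0.25)):
    # Bring every metric to a [0, 1] scale before averaging; chatgpt and
    # match are assumed to be reported on a 0-100 scale here.
    metrics = [accuracy, chatgpt / 100.0, match / 100.0, language]
    return sum(w * m for w, m in zip(weights, metrics))
```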

The inputs required for evaluation are [llama-adapter-DriveLM.json](llama-adapter-DriveLM.json) and [test_v1.json](test_v1.json).

1. Replace [root_path1](evaluation.py#L97) with the path to your model's output. An example of the model output can be found in [output](llama-adapter-DriveLM.json).
2. Replace [root_path2](evaluation.py#L101) with the path to test_v1.json. An example of test_v1.json can be found in [test_v1.json](test_v1.json).
3. Replace [API-KEY](chatgpt.py#L17) with your own ChatGPT API key.

```bash
# The following command assumes that llama-adapter-DriveLM.json and test_v1.json are under ./challenge
# make sure you are under ./challenge
python evaluation.py --root_path1 ./llama-adapter-DriveLM.json --root_path2 ./test_v1.json
```

### Results
The zero-shot results of the baseline on the sampled data are as follows:
```
accuracy: 0.0
chatgpt: 78.5
match: 23.75
language score: {'val/Bleu_1': 0.029183757177883535, 'val/Bleu_2': 0.00017003737042789148, 'val/Bleu_3': 3.066026234534233e-05, 'val/Bleu_4': 1.3024512157157705e-05, 'val/ROUGE_L': 0.05928706665796174, 'val/CIDEr': 0.05818698178494484}
final score: 0.36633034231114403
```


1 change: 1 addition & 0 deletions challenge/__init__.py
@@ -0,0 +1 @@
from .chatgpt import ChatGPT
56 changes: 56 additions & 0 deletions challenge/chatgpt.py
@@ -0,0 +1,56 @@
import openai


class ChatGPT:
    def __init__(self):
        # Replace with your own OpenAI API key (see the README's evaluation section).
        openai.api_key = "YOUR_API_KEY"

    def call_chatgpt(self, chatgpt_messages, max_tokens=40, model="gpt-3.5-turbo"):
        # Send the chat messages to the OpenAI API and return the reply text
        # together with the total token usage.
        response = openai.ChatCompletion.create(
            model=model, messages=chatgpt_messages, temperature=0.6, max_tokens=max_tokens
        )
        reply = response["choices"][0]["message"]["content"]
        total_tokens = response["usage"]["total_tokens"]
        return reply, total_tokens

    def prepare_chatgpt_message(self, prompt):
        # Wrap the grading prompt in the chat format expected by the API.
        system_message = "an evaluator who rates my answer based on the correct answer"
        messages = [{"role": "system", "content": system_message}]
        messages.append({"role": "user", "content": "{}".format(prompt)})
        return messages

    def forward(self, answer, GT):
        # Ask the model to grade `answer` against the ground truth `GT` on a
        # 0-100 scale and return the raw reply (a number as a string).
        prompts = "Rate my answer based on the correct answer out of 100, with higher scores indicating that the answer is closer to the correct answer. You should be accurate to single digits like 62, 78, 41, etc. Output the number only. "
        prompts = prompts + "This is the correct answer: " + GT + ". This is my answer: " + answer

        messages = self.prepare_chatgpt_message(prompts)
        reply, total_tokens = self.call_chatgpt(messages, max_tokens=3000)
        return reply


if __name__ == "__main__":
    prediction = "Keep going at the same speed."
    GT = "Keep going at the same speed, decelerate gradually without braking."

    evaluator = ChatGPT()
    scores = evaluator.forward(prediction, GT)
    print(scores)
47 changes: 47 additions & 0 deletions challenge/convert2llama.py
@@ -0,0 +1,47 @@
import json


def convert2llama(root, dst):
    # Flatten the nested scene -> key_frame -> QA structure of test_v1.json
    # into the flat conversation format expected by llama-adapter.
    with open(root, 'r') as f:
        test_file = json.load(f)

    output = []
    for scene_id in test_file.keys():
        scene_data = test_file[scene_id]['key_frames']

        for frame_id in scene_data.keys():
            # Re-root the image paths so they resolve from the data/ folder.
            image_paths = scene_data[frame_id]['image_paths']
            image_paths = [image_paths[key].replace("..", "data") for key in image_paths.keys()]

            frame_data_qa = scene_data[frame_id]['QA']
            QA_pairs = frame_data_qa["perception"] + frame_data_qa["prediction"] + frame_data_qa["planning"] + frame_data_qa["behavior"]

            for idx, qa in enumerate(QA_pairs):
                question = qa['Q']
                answer = qa['A']
                output.append(
                    {
                        "id": scene_id + "_" + frame_id + "_" + str(idx),
                        "image": image_paths,
                        "conversations": [
                            {
                                "from": "human",
                                "value": "<image>\n" + question
                            },
                            {
                                "from": "gpt",
                                "value": answer
                            },
                        ]
                    }
                )

    with open(dst, 'w') as f:
        json.dump(output, f, indent=4)


if __name__ == '__main__':
    root = "test_v1.json"
    dst = "test_v2.json"
    convert2llama(root, dst)
71 changes: 71 additions & 0 deletions challenge/convert_data.py
@@ -0,0 +1,71 @@
import json
import random


def rule_based1(question, answer):
    # Convert an open-ended "moving status" question into a multiple-choice
    # one with a fixed set of four options.
    rule = ["Going ahead.", "Turn right.", "Turn left.", "Stopped."]
    question += f" Please select the correct answer from the following options: A. {rule[0]} B. {rule[1]} C. {rule[2]} D. {rule[3]}"
    idx = rule.index(answer)
    mapping = {0: "A", 1: "B", 2: "C", 3: "D"}
    return {"Q": question, "A": mapping[idx]}


def rule_based2(question, answer):
    # Convert a behavior question into a multiple-choice one by sampling three
    # distractors from the fixed set of ego-vehicle behavior descriptions.
    rule = ['The ego vehicle is slightly steering to the left. The ego vehicle is driving very fast.', 'The ego vehicle is steering to the left. The ego vehicle is driving with normal speed.', 'The ego vehicle is steering to the left. The ego vehicle is driving fast.', 'The ego vehicle is slightly steering to the right. The ego vehicle is driving fast.', 'The ego vehicle is going straight. The ego vehicle is driving slowly.', 'The ego vehicle is going straight. The ego vehicle is driving with normal speed.', 'The ego vehicle is slightly steering to the left. The ego vehicle is driving with normal speed.', 'The ego vehicle is slightly steering to the left. The ego vehicle is driving slowly.', 'The ego vehicle is slightly steering to the right. The ego vehicle is driving slowly.', 'The ego vehicle is slightly steering to the right. The ego vehicle is driving very fast.', 'The ego vehicle is steering to the right. The ego vehicle is driving fast.', 'The ego vehicle is steering to the right. The ego vehicle is driving very fast.', 'The ego vehicle is slightly steering to the left. The ego vehicle is driving fast.', 'The ego vehicle is steering to the left. The ego vehicle is driving very fast.', 'The ego vehicle is going straight. The ego vehicle is not moving.', 'The ego vehicle is slightly steering to the right. The ego vehicle is driving with normal speed.', 'The ego vehicle is steering to the right. The ego vehicle is driving slowly.', 'The ego vehicle is steering to the right. The ego vehicle is driving with normal speed.', 'The ego vehicle is going straight. The ego vehicle is driving very fast.', 'The ego vehicle is going straight. The ego vehicle is driving fast.', 'The ego vehicle is steering to the left. The ego vehicle is driving slowly.']
    rule.remove(answer)
    choices = random.sample(rule, 3)
    choices.append(answer)
    random.shuffle(choices)
    idx = choices.index(answer)
    question += f" Please select the correct answer from the following options: A. {choices[0]} B. {choices[1]} C. {choices[2]} D. {choices[3]}"
    mapping = {0: "A", 1: "B", 2: "C", 3: "D"}
    return {"Q": question, "A": mapping[idx]}


def loop_test(root, dst):
    # Rebuild each key frame's QA block: prediction and planning QAs are kept
    # unchanged, while "moving status" perception QAs and all behavior QAs are
    # converted to multiple choice.
    with open(root, 'r') as f:
        test_file = json.load(f)

    for scene_id in test_file.keys():
        scene_data = test_file[scene_id]['key_frames']

        for frame_id in scene_data.keys():
            # frame_data_infos = scene_data[frame_id]['key_object_infos']
            frame_data_qa = scene_data[frame_id]['QA']
            image_paths = scene_data[frame_id]['image_paths']

            test_file[scene_id]['key_frames'][frame_id] = dict()
            # test_file[scene_id]['key_frames'][frame_id]['key_object_infos'] = frame_data_infos
            test_file[scene_id]['key_frames'][frame_id]['QA'] = dict()
            test_file[scene_id]['key_frames'][frame_id]['QA']['perception'] = []
            # keep all prediction and planning QAs as they are
            test_file[scene_id]['key_frames'][frame_id]['QA']['prediction'] = frame_data_qa["prediction"]
            test_file[scene_id]['key_frames'][frame_id]['QA']['planning'] = frame_data_qa["planning"]

            test_file[scene_id]['key_frames'][frame_id]['QA']['behavior'] = []
            test_file[scene_id]['key_frames'][frame_id]['image_paths'] = image_paths

            for qa in frame_data_qa["perception"]:
                question = qa['Q']
                answer = qa['A']
                if "what is the moving status of object" in question.lower():
                    qa.update(rule_based1(question, answer))
                test_file[scene_id]['key_frames'][frame_id]['QA']['perception'].append(qa)

            for qa in frame_data_qa["behavior"]:
                question = qa['Q']
                answer = qa['A']
                qa.update(rule_based2(question, answer))
                test_file[scene_id]['key_frames'][frame_id]['QA']['behavior'].append(qa)

    with open(dst, 'w') as f:
        json.dump(test_file, f, indent=4)


if __name__ == '__main__':
    root = "test.json"
    dst = "test_v1.json"
    loop_test(root, dst)