Merge pull request #14 from linyanAI/main
support challenge. Everything under ./challenge/
Showing 36 changed files with 5,613 additions and 0 deletions.
@@ -0,0 +1,171 @@
### TL;DR
* The purpose of this folder is to facilitate our CVPR 2024 challenge. Initially, we use a small subset of the training data (**demo train** in the following text) as an illustrative example, demonstrating how to obtain the **test data format**, train the baseline, run inference with the baseline, and go through the evaluation pipeline.

* Regarding the new **test data format**: our primary intention is to create a specific test data format that prevents possible cheating.

<!-- * Subsequently, we will demonstrate the process of conducting evaluations, encompassing the baseline methodology. -->

* For better illustration, we provide [google slides](https://docs.google.com/presentation/d/1bicxoR_L3t05p5xw-qZM0Dj5KdJhjynqLM0Rck0qdcI/edit?usp=sharing) for your reference.

* **Official announcements about the DriveLM challenge are maintained in this folder**. Please raise an issue in the repo if you find anything unclear.

## How to Prepare Data

### DriveLM
Download the full DriveLM data [v1_0_train_nus.json](https://drive.google.com/file/d/1LK7pYHytv64neN1626u6eTQBy1Uf4IQH/view?usp=sharing) and the demo train DriveLM data [train_sample.json](https://drive.google.com/file/d/1pDikp6xoZGdyUS75qCqCM-Bh5-DWLyj-/view?usp=drive_link).
The code can run on both the full and the sampled data; below we walk through the entire process on the demo train data.
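
If you prefer the command line, the files can be fetched with `gdown` (an assumed third-party helper, `pip install gdown`, not part of this repo); the file IDs below come from the Drive links above:
```python
import gdown

# File IDs taken from the Google Drive links above.
gdown.download(id="1LK7pYHytv64neN1626u6eTQBy1Uf4IQH", output="v1_0_train_nus.json")  # full data
gdown.download(id="1pDikp6xoZGdyUS75qCqCM-Bh5-DWLyj-", output="train_sample.json")    # demo train
```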

### Extract Data

Prepare the test data by extracting fundamental question-and-answer (QA) pairs from the training dataset, as shown below.

**Note that** the number and the content of the fundamental QA pairs might change on the test server, but we ensure that **all question types stay within our provided test data format**. That is, the question types are limited to: 1) multiple-choice questions; 2) conversation questions; 3) yes/no questions.
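
For illustration, the three question types look roughly as follows. These QA pairs are hypothetical; the multiple-choice option set follows the conversion script included in this commit:
```python
# Illustrative (hypothetical) QA pairs, one per question type.
multi_choice = {
    "Q": "What is the moving status of object <c1,CAM_FRONT,1043.2,82.2>? "
         "Please select the correct answer from the following options: "
         "A. Going ahead. B. Turn right. C. Turn left. D. Stopped.",
    "A": "A",  # the answer is the option letter
}
conversation = {
    "Q": "What are the important objects in the current scene?",
    "A": "There is a white sedan to the front of the ego vehicle.",
}
yes_no = {
    "Q": "Is there any traffic element in the front view?",
    "A": "Yes.",
}
```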

```bash
# The following script assumes that you downloaded the train data json under ./challenge/data
# make sure you are under ./challenge
mkdir data
mv train_sample.json data/train_sample.json
python extract_data.py
```
This produces test.json in the challenge folder; see [test.json](test.json) for an example.
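
The nested structure (scene → key frame → QA categories) can be inspected with a few lines of Python; the key names below follow the conversion scripts included in this commit:
```python
import json

with open("test.json", "r") as f:
    test_file = json.load(f)

# Each scene holds key frames; each key frame holds camera image paths and
# QA pairs grouped into perception / prediction / planning / behavior.
for scene_id, scene in test_file.items():
    for frame_id, frame in scene["key_frames"].items():
        print(scene_id, frame_id)
        print({category: len(qas) for category, qas in frame["QA"].items()})
        print(list(frame["image_paths"].keys()))  # one path per camera view
```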

### Convert Data

Transform the obtained test.json into the required test format:

```bash
# The following script assumes that you downloaded the train data json under ./challenge/data
# make sure you are under ./challenge
python convert_data.py
```
This produces test_v1.json in the challenge folder; see [test_v1.json](test_v1.json) for an example.
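
As a sanity check after conversion, you can count how many questions were rewritten into multiple-choice form; the phrase matched below is the one appended by the conversion script in this commit:
```python
import json

with open("test_v1.json", "r") as f:
    test_v1 = json.load(f)

num_mc, total = 0, 0
for scene in test_v1.values():
    for frame in scene["key_frames"].values():
        for category in ("perception", "prediction", "planning", "behavior"):
            for qa in frame["QA"][category]:
                total += 1
                num_mc += "Please select the correct answer" in qa["Q"]
print(f"{num_mc} of {total} questions are multiple-choice")
```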

We use llama-adapter v2 as our baseline. To convert the data into llama-adapter format:
```bash
# The following script assumes that you prepared test_v1.json under ./challenge
# make sure you are under ./challenge
python convert2llama.py
```
This produces test_v2.json in the challenge folder; see [test_v2.json](test_v2.json) for an example.

[test_v1.json](test_v1.json) is used for evaluation. [test_v2.json](test_v2.json) is used for training and inference of the baseline.
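
For reference, each entry of test_v2.json is one llama-adapter conversation record produced by convert2llama.py (included in this commit); schematically, with placeholder id and path:
```python
# Schematic record; real ids and paths come from the nuScenes scene/frame tokens.
record = {
    "id": "<scene_token>_<frame_token>_<qa_index>",
    "image": ["data/nuscenes/samples/CAM_FRONT/....jpg"],  # all camera views of the frame
    "conversations": [
        {"from": "human", "value": "<image>\nQUESTION"},
        {"from": "gpt", "value": "ANSWER"},
    ],
}
```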

## How to Run the Baseline

As mentioned above, we use [llama-adapter v2](https://github.com/OpenGVLab/LLaMA-Adapter/tree/main/llama_adapter_v2_multimodal7b) as our baseline.

### Setup
We provide a simple setup script below; you can also refer to the [docs](llama_adapter_v2_multimodal7b/README.md#L9) for more detailed installation instructions.
* Set up a new conda env and install the necessary packages:
```bash
# make sure you are under ./challenge/llama_adapter_v2_multimodal7b
conda create -n llama_adapter_v2 python=3.8 -y
conda activate llama_adapter_v2
pip install -r requirements.txt
```

* Obtain the LLaMA pretrained weights using this [form](https://docs.google.com/forms/d/e/1FAIpQLSfqNECQnMkycAp2jP4Z9TFX0cGR4uf7b_fBxjY_OjhJILlKGA/viewform?usp=send_form). Please note that checkpoints from unofficial sources (e.g., BitTorrent) may contain malicious code and should be used with care. Organize the downloaded files in the following structure:
```bash
/path/to/llama_model_weights
├── 7B
│   ├── checklist.chk
│   ├── consolidated.00.pth
│   └── params.json
└── tokenizer.model
```
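
Optionally, verify the downloaded weights against checklist.chk. A minimal sketch, assuming the file uses the standard `md5sum` layout (hash, whitespace, file name):
```python
import hashlib
from pathlib import Path

weights_dir = Path("/path/to/llama_model_weights/7B")

def md5sum(path, chunk_size=1 << 20):
    # Stream the file so the multi-GB checkpoint is not loaded into memory at once.
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        while block := f.read(chunk_size):
            md5.update(block)
    return md5.hexdigest()

for line in (weights_dir / "checklist.chk").read_text().splitlines():
    expected, name = line.split()
    print(name, "OK" if md5sum(weights_dir / name) == expected else "MISMATCH")
```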

### Train Baseline
Modify [finetune_data_config.yaml](llama_adapter_v2_multimodal7b/finetune_data_config.yaml#L2) to specify the datasets for fine-tuning.
For the dataset format, refer to [test_v2.json](test_v2.json).

The pre-trained checkpoint can be downloaded from [ckpts](https://github.com/OpenGVLab/LLaMA-Adapter/releases/tag/v.2.0.0).

First, prepare the [nuScenes](https://www.nuscenes.org/) dataset; you can follow [BEVFormer](https://github.com/fundamentalvision/BEVFormer/blob/master/docs/prepare_dataset.md) for instructions:
```bash
data/nuscenes
├── samples
│   ├── n015-2018-11-21-19-58-31+0800__CAM_FRONT_LEFT__1542801707504844.jpg
│   ├── n015-2018-11-21-19-58-31+0800__CAM_FRONT_LEFT__1542801708004844.jpg
```

Then link the nuScenes dataset under llama_adapter_v2_multimodal7b/data/:
```bash
# The following script assumes that you prepared nuscenes under ./challenge/data
# make sure you are under ./challenge
mkdir -p llama_adapter_v2_multimodal7b/data
ln -s $(pwd)/data/nuscenes llama_adapter_v2_multimodal7b/data/nuscenes
```
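
A quick sanity check that the images referenced by test_v2.json resolve through the symlink (run from ./challenge/llama_adapter_v2_multimodal7b, following the steps above):
```python
import json
import os

# The relative "data/nuscenes/..." paths in test_v2.json resolve from the
# llama_adapter_v2_multimodal7b working directory via the symlink created above.
with open("../test_v2.json", "r") as f:
    records = json.load(f)

paths = {p for record in records for p in record["image"]}
missing = [p for p in paths if not os.path.exists(p)]
print(f"{len(missing)} of {len(paths)} referenced images are missing")
```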

Now train the baseline as follows:
```bash
# /path/to/llama_model_weights, /path/to/pre-trained/checkpoint.pth and /output/path need to be replaced with your own paths
# make sure you are under ./challenge/llama_adapter_v2_multimodal7b
./exps/finetune.sh \
/path/to/llama_model_weights /path/to/pre-trained/checkpoint.pth \
finetune_data_config.yaml /output/path
```

### Inference Baseline

```bash
# /path/to/llama_model_weights and /path/to/pre-trained/checkpoint.pth need to be replaced with your own paths
# make sure you are under ./challenge/llama_adapter_v2_multimodal7b
python demo.py --llama_dir /path/to/llama_model_weights --checkpoint /path/to/pre-trained/checkpoint.pth --data ../test_v2.json --output ../llama-adapter-DriveLM.json
```
This produces [llama-adapter-DriveLM.json](llama-adapter-DriveLM.json), which contains the predicted answers used for evaluation.

## How to Eval

We implement diverse evaluation metrics tailored to the different question types mentioned [above](https://github.com/OpenDriveLab/DriveLM-private/blob/test/challenge/README.md?plain=1#L19).

### Setup
Install the language-evaluation package following [https://github.com/bckim92/language-evaluation](https://github.com/bckim92/language-evaluation) (skip the first step if the related libraries are already installed):

```bash
# FIRST STEP
# Oracle Java
sudo add-apt-repository ppa:webupd8team/java
sudo apt update
sudo apt-get install oracle-java8-installer

# libxml-parser-perl
sudo apt install libxml-parser-perl
```
Then run:
```bash
# SECOND STEP
pip install git+https://github.com/bckim92/language-evaluation.git
python -c "import language_evaluation; language_evaluation.download('coco')"
```
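
To confirm the installation works, you can run the small example from the language-evaluation README:
```python
import language_evaluation

# Example adapted from the language-evaluation README.
evaluator = language_evaluation.CocoEvaluator()
predicts = ["i am a boy", "she is a girl"]
answers = ["am i a boy ?", "is she a girl ?"]
print(evaluator.run_evaluation(predicts, answers))  # Bleu_1..4, ROUGE_L, CIDEr, ...
```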

### Evaluation
**The number and the content of the questions are subject to change in later versions, but the question types are fixed to those provided.**

We have implemented four types of evaluation methods: Accuracy, ChatGPT Score, Language Evaluation, and Match Score. The [final score](evaluation.py#L157) is a weighted average of the four metrics.
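
Schematically, the computation looks like the sketch below. The weights and normalization here are placeholders for illustration only, not the official values in [evaluation.py](evaluation.py#L157):
```python
# Hypothetical weights for illustration only; the official ones live in evaluation.py.
WEIGHTS = {"accuracy": 0.25, "chatgpt": 0.25, "language": 0.25, "match": 0.25}

def final_score(accuracy, chatgpt, language, match):
    # Bring every metric onto a [0, 1] scale before averaging (scales assumed).
    normalized = {
        "accuracy": accuracy,       # fraction of correct multiple-choice answers
        "chatgpt": chatgpt / 100,   # ChatGPT rates answers out of 100
        "language": language,       # combined language-evaluation score
        "match": match / 100,
    }
    return sum(WEIGHTS[k] * v for k, v in normalized.items())
```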

The inputs required for evaluation are [llama-adapter-DriveLM.json](llama-adapter-DriveLM.json) and [test_v1.json](test_v1.json).

1. Replace [root_path1](evaluation.py#L97) with the path to your model's output. An example of the model output can be found in [llama-adapter-DriveLM.json](llama-adapter-DriveLM.json).
2. Replace [root_path2](evaluation.py#L101) with the path to test_v1.json. An example can be found in [test_v1.json](test_v1.json).
3. Replace [API-KEY](chatgpt.py#L17) with your own ChatGPT API key.

```bash
# The following script assumes that you prepared llama-adapter-DriveLM.json and test_v1.json under ./challenge
# make sure you are under ./challenge
python evaluation.py --root_path1 ./llama-adapter-DriveLM.json --root_path2 ./test_v1.json
```

### Results
The zero-shot results of the baseline on the sampled data are as follows:
```
accuracy: 0.0
chatgpt: 78.5
match: 23.75
language score: {'val/Bleu_1': 0.029183757177883535, 'val/Bleu_2': 0.00017003737042789148, 'val/Bleu_3': 3.066026234534233e-05, 'val/Bleu_4': 1.3024512157157705e-05, 'val/ROUGE_L': 0.05928706665796174, 'val/CIDEr': 0.05818698178494484}
final score: 0.36633034231114403
```

@@ -0,0 +1 @@
from .chatgpt import ChatGPT

@@ -0,0 +1,56 @@
from tenacity import (
    retry,
    stop_after_attempt,
    wait_incrementing,
)
import openai


class ChatGPT:
    def __init__(self):
        # Replace with your own ChatGPT API key (see the README's evaluation
        # instructions). The original commit hard-coded a key here.
        openai.api_key = "YOUR_API_KEY"

    # Retry transient API failures: up to 3 attempts with increasing waits.
    @retry(stop=stop_after_attempt(3), wait=wait_incrementing(start=1, increment=2))
    def call_chatgpt(self, chatgpt_messages, max_tokens=40, model="gpt-3.5-turbo"):
        response = openai.ChatCompletion.create(
            model=model, messages=chatgpt_messages, temperature=0.6, max_tokens=max_tokens
        )
        reply = response["choices"][0]["message"]["content"]
        total_tokens = response["usage"]["total_tokens"]
        return reply, total_tokens

    def prepare_chatgpt_message(self, prompt):
        system_message = "an evaluator who rates my answer based on the correct answer"
        messages = [{"role": "system", "content": system_message}]
        messages.append({"role": "user", "content": "{}".format(prompt)})

        return messages

    def forward(self, answer, GT):
        # Ask the model for a single numeric score out of 100.
        prompts = (
            "Rate my answer based on the correct answer out of 100, with higher scores "
            "indicating that the answer is closer to the correct answer, and you should be "
            "accurate to single digits like 62, 78, 41, etc. Output the number only. "
        )
        prompts = prompts + "This is the correct answer: " + GT + ". This is my answer: " + answer

        messages = self.prepare_chatgpt_message(prompts)
        reply, total_tokens = self.call_chatgpt(messages, max_tokens=3000)

        return reply


if __name__ == "__main__":
    prediction = "Keep going at the same speed."
    GT = "Keep going at the same speed, decelerate gradually without braking."

    evaluator = ChatGPT()
    scores = evaluator.forward(prediction, GT)
    print(scores)

@@ -0,0 +1,47 @@
import json


def convert2llama(root, dst):
    with open(root, 'r') as f:
        test_file = json.load(f)

    output = []
    for scene_id in test_file.keys():
        scene_data = test_file[scene_id]['key_frames']

        for frame_id in scene_data.keys():
            # Rewrite the relative image paths so they resolve from the
            # llama_adapter_v2_multimodal7b working directory.
            image_paths = scene_data[frame_id]['image_paths']
            image_paths = [image_paths[key].replace("..", "data") for key in image_paths.keys()]

            # Flatten all QA categories into one list of question-answer pairs.
            frame_data_qa = scene_data[frame_id]['QA']
            QA_pairs = frame_data_qa["perception"] + frame_data_qa["prediction"] + frame_data_qa["planning"] + frame_data_qa["behavior"]

            # Emit one llama-adapter conversation record per QA pair.
            for idx, qa in enumerate(QA_pairs):
                question = qa['Q']
                answer = qa['A']
                output.append(
                    {
                        "id": scene_id + "_" + frame_id + "_" + str(idx),
                        "image": image_paths,
                        "conversations": [
                            {
                                "from": "human",
                                "value": "<image>\n" + question
                            },
                            {
                                "from": "gpt",
                                "value": answer
                            },
                        ]
                    }
                )

    with open(dst, 'w') as f:
        json.dump(output, f, indent=4)


if __name__ == '__main__':
    root = "test_v1.json"
    dst = "test_v2.json"
    convert2llama(root, dst)

@@ -0,0 +1,71 @@
import json
import random


def rule_based1(question, answer):
    # Convert a moving-status question into a fixed four-option multiple-choice
    # question; the answer becomes the option letter.
    rule = ["Going ahead.", "Turn right.", "Turn left.", "Stopped."]
    question += f" Please select the correct answer from the following options: A. {rule[0]} B. {rule[1]} C. {rule[2]} D. {rule[3]}"
    idx = rule.index(answer)
    mapping = {0: "A", 1: "B", 2: "C", 3: "D"}
    return {"Q": question, "A": mapping[idx]}


def rule_based2(question, answer):
    # Convert a behavior question into a multiple-choice question: sample three
    # distractors from the pool of possible behaviors, shuffle them together
    # with the ground-truth answer, and record the resulting option letter.
    rule = [
        'The ego vehicle is slightly steering to the left. The ego vehicle is driving very fast.',
        'The ego vehicle is steering to the left. The ego vehicle is driving with normal speed.',
        'The ego vehicle is steering to the left. The ego vehicle is driving fast.',
        'The ego vehicle is slightly steering to the right. The ego vehicle is driving fast.',
        'The ego vehicle is going straight. The ego vehicle is driving slowly.',
        'The ego vehicle is going straight. The ego vehicle is driving with normal speed.',
        'The ego vehicle is slightly steering to the left. The ego vehicle is driving with normal speed.',
        'The ego vehicle is slightly steering to the left. The ego vehicle is driving slowly.',
        'The ego vehicle is slightly steering to the right. The ego vehicle is driving slowly.',
        'The ego vehicle is slightly steering to the right. The ego vehicle is driving very fast.',
        'The ego vehicle is steering to the right. The ego vehicle is driving fast.',
        'The ego vehicle is steering to the right. The ego vehicle is driving very fast.',
        'The ego vehicle is slightly steering to the left. The ego vehicle is driving fast.',
        'The ego vehicle is steering to the left. The ego vehicle is driving very fast.',
        'The ego vehicle is going straight. The ego vehicle is not moving.',
        'The ego vehicle is slightly steering to the right. The ego vehicle is driving with normal speed.',
        'The ego vehicle is steering to the right. The ego vehicle is driving slowly.',
        'The ego vehicle is steering to the right. The ego vehicle is driving with normal speed.',
        'The ego vehicle is going straight. The ego vehicle is driving very fast.',
        'The ego vehicle is going straight. The ego vehicle is driving fast.',
        'The ego vehicle is steering to the left. The ego vehicle is driving slowly.',
    ]
    rule.remove(answer)
    choices = random.sample(rule, 3)
    choices.append(answer)
    random.shuffle(choices)
    idx = choices.index(answer)
    question += f" Please select the correct answer from the following options: A. {choices[0]} B. {choices[1]} C. {choices[2]} D. {choices[3]}"
    mapping = {0: "A", 1: "B", 2: "C", 3: "D"}
    return {"Q": question, "A": mapping[idx]}


def loop_test(root, dst):
    with open(root, 'r') as f:
        test_file = json.load(f)

    for scene_id in test_file.keys():
        scene_data = test_file[scene_id]['key_frames']

        for frame_id in scene_data.keys():
            # frame_data_infos = scene_data[frame_id]['key_object_infos']
            frame_data_qa = scene_data[frame_id]['QA']
            image_paths = scene_data[frame_id]['image_paths']

            # Rebuild the frame entry, keeping prediction and planning QAs as-is.
            test_file[scene_id]['key_frames'][frame_id] = dict()
            # test_file[scene_id]['key_frames'][frame_id]['key_object_infos'] = frame_data_infos
            test_file[scene_id]['key_frames'][frame_id]['QA'] = dict()
            test_file[scene_id]['key_frames'][frame_id]['QA']['perception'] = []
            test_file[scene_id]['key_frames'][frame_id]['QA']['prediction'] = frame_data_qa["prediction"]
            test_file[scene_id]['key_frames'][frame_id]['QA']['planning'] = frame_data_qa["planning"]
            test_file[scene_id]['key_frames'][frame_id]['QA']['behavior'] = []
            test_file[scene_id]['key_frames'][frame_id]['image_paths'] = image_paths

            # Perception: only moving-status questions become multiple-choice.
            for qa in frame_data_qa["perception"]:
                question = qa['Q']
                answer = qa['A']
                if "What is the moving status of object".lower() in question.lower():
                    qa.update(rule_based1(question, answer))
                test_file[scene_id]['key_frames'][frame_id]['QA']['perception'].append(qa)

            # Behavior: every question becomes multiple-choice.
            for qa in frame_data_qa["behavior"]:
                question = qa['Q']
                answer = qa['A']
                qa.update(rule_based2(question, answer))
                test_file[scene_id]['key_frames'][frame_id]['QA']['behavior'].append(qa)

    with open(dst, 'w') as f:
        json.dump(test_file, f, indent=4)


if __name__ == '__main__':
    root = "test.json"
    dst = "test_v1.json"
    loop_test(root, dst)