Commit

Merge pull request #28 from DevLinyan/main
Fix unclear part
ChonghaoSima committed Mar 5, 2024
2 parents a6f6b2d + e74eca8 commit f3a760b
Showing 13 changed files with 3,549 additions and 660 deletions.
34 changes: 17 additions & 17 deletions challenge/README.md
@@ -73,17 +73,17 @@ Transform the obtained test.json data into the required test format.
# make sure you are under ./challenge
python convert_data.py
```
-Then we will get the test_v1.json in challenge folder. The example of test_v1.json can be found in [test_v1.json](test_v1.json)
+Then we will get test_eval.json in the challenge folder. An example of test_eval.json can be found in [test_eval.json](test_eval.json).

We use llama-adapter v2 as our baseline. If you want to convert data into llama-adapter format:
```bash
-# The following script assumes that you prepare the test_v1.json under ./challenge
+# The following script assumes that you prepare the test_eval.json under ./challenge
# make sure you are under ./challenge
python convert2llama.py
```
-Then we will get the test_v2.json in challenge folder. The example of test_v2.json can be found in [test_v2.json](test_v2.json)
+Then we will get test_llama.json in the challenge folder. An example of test_llama.json can be found in [test_llama.json](test_llama.json).

-[test_v1.json](test_v1.json) is used for evaluation. [test_v2.json](test_v2.json) is used for training and inference of the baseline.
+[test_eval.json](test_eval.json) is used for evaluation. [test_llama.json](test_llama.json) is used for training and inference of the baseline.
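
As a quick sanity check after both conversion steps, a small script along these lines can confirm that the two files were written and parse as valid JSON (a minimal sketch, run from ./challenge; it makes no assumptions about the structure of the entries inside either file):

```python
# minimal sanity check for the two converted files; run from ./challenge
import json

for name in ("test_eval.json", "test_llama.json"):
    with open(name) as f:
        data = json.load(f)
    print(f"{name}: top-level {type(data).__name__} with {len(data)} entries")
```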

## How to run baseline

@@ -111,9 +111,9 @@ pip install -r requirements.txt

### Train baseline
You should modify the [finetune_data_config.yaml](llama_adapter_v2_multimodal7b/finetune_data_config.yaml#L2) to specify the datasets for fine-tuning.
-The format of datasets refers to [test_v2.json](test_v2.json).
+The format of the datasets follows [test_llama.json](test_llama.json).

-The pre-trained checkpoint can be downloaded in [ckpts](https://github.com/OpenGVLab/LLaMA-Adapter/releases/tag/v.2.0.0).
+The pre-trained checkpoints can be downloaded from [ckpts](https://github.com/OpenGVLab/LLaMA-Adapter/releases/tag/v.2.0.0). You can choose any one of them.
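
To double-check which datasets the config currently points at before launching training, a rough sketch like the one below can be used (it assumes PyYAML is available and simply prints the parsed mapping; the exact keys depend on the config file itself):

```python
# rough sketch: print the parsed fine-tuning data config
# assumes PyYAML is installed and the script is run from ./challenge
import yaml

with open("llama_adapter_v2_multimodal7b/finetune_data_config.yaml") as f:
    print(yaml.safe_load(f))
```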

Then we can train the baseline as follows.
```bash
@@ -129,9 +129,9 @@ finetune_data_config.yaml /output/path
```bash
# /path/to/llama_model_weights and /path/to/pre-trained/checkpoint.pth need to be modified by your path
# make sure you are under ./challenge/llama_adapter_v2_multimodal7b
-python demo.py --llama_dir /path/to/llama_model_weights --checkpoint /path/to/pre-trained/checkpoint.pth --data ../test_v2.json --output ../llama-adapter-DriveLM.json
+python demo.py --llama_dir /path/to/llama_model_weights --checkpoint /path/to/pre-trained/checkpoint.pth --data ../test_llama.json --output ../output.json
```
-Then we will get the [llama-adapter-DriveLM.json](llama-adapter-DriveLM.json), which are the predicted answers used for evaluation purposes.
+Then we will get [output.json](output.json), which contains the predicted answers used for evaluation.
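
Before moving on to evaluation, it can be worth peeking at the predictions (a tiny sketch, run from ./challenge; it only assumes output.json is valid JSON, not any particular entry format):

```python
# tiny sketch: report how many predictions were written and show the first one
import json

with open("output.json") as f:
    preds = json.load(f)
first = preds[0] if isinstance(preds, list) else next(iter(preds.items()))
print(f"{len(preds)} predictions; first entry: {first}")
```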


## How to Eval
@@ -165,26 +165,26 @@ python -c "import language_evaluation; language_evaluation.download('coco')"

We have implemented four evaluation methods: Accuracy, ChatGPT Score, Language Evaluation, and Match Score. The [final score](evaluation.py#L157) is a weighted average of the four metrics.
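
Conceptually, the combination looks something like the sketch below; the weights and normalization here are hypothetical placeholders, the real ones are defined in [evaluation.py](evaluation.py#L157):

```python
# illustrative sketch only: hypothetical weights, the real ones live in evaluation.py
def final_score(accuracy, chatgpt, match, language, weights=(0.25, 0.25, 0.25, 0.25)):
    # bring every metric onto a 0-1 scale before averaging
    parts = (accuracy, chatgpt / 100.0, match / 100.0, language)
    return sum(w * p for w, p in zip(weights, parts))

# e.g. with the baseline numbers below (language collapsed to a single value)
print(final_score(accuracy=0.0, chatgpt=65.11, match=28.25, language=0.1))
```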

-The inputs required for evaluation are [llama-adapter-DriveLM.json](llama-adapter-DriveLM.json) and [test_v1.json](test_v1.json).
+The inputs required for evaluation are [output.json](output.json) and [test_eval.json](test_eval.json).

-1. Replace [root_path1](evaluation.py#L97) with the path of your models' output. The example of models' output can be found in [output](llama-adapter-DriveLM.json).
-2. Replace [root_path2](evaluation.py#L101) with the path of test_v1.json. The example of test_v1.json can be found in [test_v1.json](test_v1.json)
+1. Replace [root_path1](evaluation.py#L97) with the path of your model's output. An example of the model's output can be found in [output](output.json).
+2. Replace [root_path2](evaluation.py#L101) with the path of test_eval.json. An example of test_eval.json can be found in [test_eval.json](test_eval.json).
3. Replace [API-KEY](chatgpt.py#L17) with your own ChatGPT API key.

```bash
-# The following script assumes that you prepare the llama-adapter-DriveLM.json and test_v1.json under ./challenge
+# The following script assumes that you prepare the output.json and test_eval.json under ./challenge
# make sure you are under ./challenge
-python evaluation.py --root_path1 ./llama-adapter-DriveLM.json --root_path2 ./test_v1.json
+python evaluation.py --root_path1 ./output.json --root_path2 ./test_eval.json
```

### Results
The zero-shot results of the baseline on the sampled data are as follows:
```
accuracy: 0.0
-chatgpt: 78.5
-match: 23.75
-language score: {'val/Bleu_1': 0.029183757177883535, 'val/Bleu_2': 0.00017003737042789148, 'val/Bleu_3': 3.066026234534233e-05, 'val/Bleu_4': 1.3024512157157705e-05, 'val/ROUGE_L': 0.05928706665796174, 'val/CIDEr': 0.05818698178494484}
-final score: 0.36633034231114403
+chatgpt: 65.11111111111111
+match: 28.25
+language score: {'val/Bleu_1': 0.0495223110147729, 'val/Bleu_2': 0.00021977465683011536, 'val/Bleu_3': 3.6312541763196866e-05, 'val/Bleu_4': 1.4776149283286042e-05, 'val/ROUGE_L': 0.08383567940883102, 'val/CIDEr': 0.09901486412073952}
+final score: 0.3240234750718823
```


4 changes: 2 additions & 2 deletions challenge/convert2llama.py
@@ -42,6 +42,6 @@ def convert2llama(root, dst):


if __name__ == '__main__':
root = "test_v1.json"
dst = "test_v2.json"
root = "test_eval.json"
dst = "test_llama.json"
convert2llama(root, dst)
2 changes: 1 addition & 1 deletion challenge/convert_data.py
@@ -71,5 +71,5 @@ def loop_test(root, dst):

if __name__ == '__main__':
root = "test.json"
dst = "test_v1.json"
dst = "test_eval.json"
loop_test(root, dst)
4 changes: 2 additions & 2 deletions challenge/evaluation.py
@@ -97,8 +97,8 @@ def forward(self, tag, answer, GT):
if __name__ == '__main__':
    # get args
    parser = argparse.ArgumentParser(description='Evaluation')
-    parser.add_argument('--root_path1', type=str, default="./llama-adapter-DriveLM.json", help='path to prediction file')
-    parser.add_argument('--root_path2', type=str, default="./test_v1.json", help='path to test file')
+    parser.add_argument('--root_path1', type=str, default="./output.json", help='path to prediction file')
+    parser.add_argument('--root_path2', type=str, default="./test_eval.json", help='path to test file')
    args = parser.parse_args()

    with open(args.root_path1, 'r') as f :#, \
