Commit

Merge pull request #28 from DevLinyan/main
Fix unclear part
ChonghaoSima committed Mar 5, 2024
2 parents a6f6b2d + e74eca8 commit f3a760b
Showing 13 changed files with 3,549 additions and 660 deletions.
34 changes: 17 additions & 17 deletions challenge/README.md
@@ -73,17 +73,17 @@ Transform the obtained test.json data into the required test format.
# make sure you are under ./challenge
python convert_data.py
```
-Then we will get the test_v1.json in challenge folder. The example of test_v1.json can be found in [test_v1.json](test_v1.json)
+Then we will get test_eval.json in the challenge folder. An example of test_eval.json can be found in [test_eval.json](test_eval.json).

We use llama-adapter v2 as our baseline. If you want to convert data into llama-adapter format:
```bash
-# The following script assumes that you prepare the test_v1.json under ./challenge
+# The following script assumes that you prepare the test_eval.json under ./challenge
# make sure you are under ./challenge
python convert2llama.py
```
-Then we will get the test_v2.json in challenge folder. The example of test_v2.json can be found in [test_v2.json](test_v2.json)
+Then we will get test_llama.json in the challenge folder. An example of test_llama.json can be found in [test_llama.json](test_llama.json).

-[test_v1.json](test_v1.json) is used for evaluation. [test_v2.json](test_v2.json) is used for training and inference of the baseline.
+[test_eval.json](test_eval.json) is used for evaluation. [test_llama.json](test_llama.json) is used for training and inference of the baseline.
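
As a quick sanity check after both conversion steps, a small script along these lines can confirm that the two files were written and parse as valid JSON (a minimal sketch, run from ./challenge; it makes no assumptions about the structure of the entries inside either file):

```python
# minimal sanity check for the two converted files; run from ./challenge
import json

for name in ("test_eval.json", "test_llama.json"):
    with open(name) as f:
        data = json.load(f)
    print(f"{name}: top-level {type(data).__name__} with {len(data)} entries")
```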

## How to run baseline

@@ -111,9 +111,9 @@ pip install -r requirements.txt

### Train baseline
You should modify the [finetune_data_config.yaml](llama_adapter_v2_multimodal7b/finetune_data_config.yaml#L2) to specify the datasets for fine-tuning.
-The format of datasets refers to [test_v2.json](test_v2.json).
+The format of the datasets follows [test_llama.json](test_llama.json).

-The pre-trained checkpoint can be downloaded in [ckpts](https://github.com/OpenGVLab/LLaMA-Adapter/releases/tag/v.2.0.0).
+The pre-trained checkpoints can be downloaded from [ckpts](https://github.com/OpenGVLab/LLaMA-Adapter/releases/tag/v.2.0.0). You can choose any one of them.
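
To double-check which datasets the config currently points at before launching training, a rough sketch like the one below can be used (it assumes PyYAML is available and simply prints the parsed mapping; the exact keys depend on the config file itself):

```python
# rough sketch: print the parsed fine-tuning data config
# assumes PyYAML is installed and the script is run from ./challenge
import yaml

with open("llama_adapter_v2_multimodal7b/finetune_data_config.yaml") as f:
    print(yaml.safe_load(f))
```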

Then we can train the baseline as follows.
```bash
@@ -129,9 +129,9 @@ finetune_data_config.yaml /output/path
```bash
# /path/to/llama_model_weights and /path/to/pre-trained/checkpoint.pth need to be modified by your path
# make sure you are under ./challenge/llama_adapter_v2_multimodal7b
-python demo.py --llama_dir /path/to/llama_model_weights --checkpoint /path/to/pre-trained/checkpoint.pth --data ../test_v2.json --output ../llama-adapter-DriveLM.json
+python demo.py --llama_dir /path/to/llama_model_weights --checkpoint /path/to/pre-trained/checkpoint.pth --data ../test_llama.json --output ../output.json
```
-Then we will get the [llama-adapter-DriveLM.json](llama-adapter-DriveLM.json), which are the predicted answers used for evaluation purposes.
+Then we will get [output.json](output.json), which contains the predicted answers used for evaluation.
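
Before moving on to evaluation, it can be worth peeking at the predictions (a tiny sketch, run from ./challenge; it only assumes output.json is valid JSON, not any particular entry format):

```python
# tiny sketch: report how many predictions were written and show the first one
import json

with open("output.json") as f:
    preds = json.load(f)
first = preds[0] if isinstance(preds, list) else next(iter(preds.items()))
print(f"{len(preds)} predictions; first entry: {first}")
```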


## How to Eval
@@ -165,26 +165,26 @@ python -c "import language_evaluation; language_evaluation.download('coco')"

We have implemented four evaluation methods: Accuracy, ChatGPT Score, Language Evaluation, and Match Score. The [final score](evaluation.py#L157) is a weighted average of the four metrics.
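
Conceptually, the combination looks something like the sketch below; the weights and normalization here are hypothetical placeholders, the real ones are defined in [evaluation.py](evaluation.py#L157):

```python
# illustrative sketch only: hypothetical weights, the real ones live in evaluation.py
def final_score(accuracy, chatgpt, match, language, weights=(0.25, 0.25, 0.25, 0.25)):
    # bring every metric onto a 0-1 scale before averaging
    parts = (accuracy, chatgpt / 100.0, match / 100.0, language)
    return sum(w * p for w, p in zip(weights, parts))

# e.g. with the baseline numbers below (language collapsed to a single value)
print(final_score(accuracy=0.0, chatgpt=65.11, match=28.25, language=0.1))
```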

-The inputs required for evaluation are [llama-adapter-DriveLM.json](llama-adapter-DriveLM.json) and [test_v1.json](test_v1.json).
+The inputs required for evaluation are [output.json](output.json) and [test_eval.json](test_eval.json).

-1. Replace [root_path1](evaluation.py#L97) with the path of your models' output. The example of models' output can be found in [output](llama-adapter-DriveLM.json).
-2. Replace [root_path2](evaluation.py#L101) with the path of test_v1.json. The example of test_v1.json can be found in [test_v1.json](test_v1.json)
+1. Replace [root_path1](evaluation.py#L97) with the path of your model's output. An example of the model's output can be found in [output](output.json).
+2. Replace [root_path2](evaluation.py#L101) with the path of test_eval.json. An example of test_eval.json can be found in [test_eval.json](test_eval.json).
3. Replace [API-KEY](chatgpt.py#L17) with your own ChatGPT API key.

```bash
-# The following script assumes that you prepare the llama-adapter-DriveLM.json and test_v1.json under ./challenge
+# The following script assumes that you prepare the output.json and test_eval.json under ./challenge
# make sure you are under ./challenge
-python evaluation.py --root_path1 ./llama-adapter-DriveLM.json --root_path2 ./test_v1.json
+python evaluation.py --root_path1 ./output.json --root_path2 ./test_eval.json
```

### Results
The zero-shot results of the baseline on the sampled data are as follows:
```
accuracy: 0.0
-chatgpt: 78.5
-match: 23.75
-language score: {'val/Bleu_1': 0.029183757177883535, 'val/Bleu_2': 0.00017003737042789148, 'val/Bleu_3': 3.066026234534233e-05, 'val/Bleu_4': 1.3024512157157705e-05, 'val/ROUGE_L': 0.05928706665796174, 'val/CIDEr': 0.05818698178494484}
-final score: 0.36633034231114403
+chatgpt: 65.11111111111111
+match: 28.25
+language score: {'val/Bleu_1': 0.0495223110147729, 'val/Bleu_2': 0.00021977465683011536, 'val/Bleu_3': 3.6312541763196866e-05, 'val/Bleu_4': 1.4776149283286042e-05, 'val/ROUGE_L': 0.08383567940883102, 'val/CIDEr': 0.09901486412073952}
+final score: 0.3240234750718823
```


4 changes: 2 additions & 2 deletions challenge/convert2llama.py
@@ -42,6 +42,6 @@ def convert2llama(root, dst):


if __name__ == '__main__':
root = "test_v1.json"
dst = "test_v2.json"
root = "test_eval.json"
dst = "test_llama.json"
convert2llama(root, dst)
2 changes: 1 addition & 1 deletion challenge/convert_data.py
@@ -71,5 +71,5 @@ def loop_test(root, dst):

if __name__ == '__main__':
root = "test.json"
dst = "test_v1.json"
dst = "test_eval.json"
loop_test(root, dst)
4 changes: 2 additions & 2 deletions challenge/evaluation.py
@@ -97,8 +97,8 @@ def forward(self, tag, answer, GT):
if __name__ == '__main__':
    # get args
    parser = argparse.ArgumentParser(description='Evaluation')
-    parser.add_argument('--root_path1', type=str, default="./llama-adapter-DriveLM.json", help='path to prediction file')
-    parser.add_argument('--root_path2', type=str, default="./test_v1.json", help='path to test file')
+    parser.add_argument('--root_path1', type=str, default="./output.json", help='path to prediction file')
+    parser.add_argument('--root_path2', type=str, default="./test_eval.json", help='path to test file')
    args = parser.parse_args()

    with open(args.root_path1, 'r') as f :#, \
