 <a href="https://rank.opencompass.org.cn/leaderboard-multimodal">🏆 Leaderboard </a> •
 <a href="#-datasets-models-and-evaluation-results">📊Datasets & Models </a> •
 <a href="#%EF%B8%8F-quickstart">🏗️Quickstart </a> •
-<a href="#%EF%B8%8F-custom-benchmark-or-vlm">🛠️Support New </a> •
+<a href="#%EF%B8%8F-development-guide">🛠️Development </a> •
 <a href="#-the-goal-of-vlmevalkit">🎯Goal </a> •
 <a href="#%EF%B8%8F-citation">🖊️Citation </a>
 </div>

@@ -100,69 +100,11 @@ print(ret) # There are two apples in the provided images.
 
 ## 🏗️ QuickStart
 
-Before running the evaluation script, you need to **configure** the VLMs and set the model_paths properly.
+See [QuickStart](/QuickStart.md) for a quick start guide.
 
-After that, you can use the single script `run.py` to run inference and evaluation for multiple VLMs and benchmarks at the same time.
+## 🛠️ Development Guide
 
-### Step0. Installation
-
-```bash
-git clone https://github.com/open-compass/VLMEvalKit.git
-cd VLMEvalKit
-pip install -e .
-```
-
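After `pip install -e .` completes, a quick sanity check (an illustrative aside, not part of the installation instructions above) is to confirm that Python resolves `vlmeval` from your local clone:

```python
# Optional sanity check for the editable install: the package should resolve
# to the cloned VLMEvalKit repository rather than a copy in site-packages.
import vlmeval

print(vlmeval.__file__)  # expected to point inside your VLMEvalKit checkout
```
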
-### Step1. Configuration
-
-**VLM Configuration**: All VLMs are configured in `vlmeval/config.py`. For some VLMs, you need to configure the code root (MiniGPT-4, PandaGPT, etc.) or the model_weight root (LLaVA-v1-7B, etc.) before conducting the evaluation. During evaluation, you should use the model name specified in `supported_VLM` in `vlmeval/config.py` to select the VLM. For MiniGPT-4 and InstructBLIP, you also need to modify the config files in `vlmeval/vlm/misc` to configure the LLM path and ckpt path.
-
-The following VLMs require this configuration step:
-
-**Code Preparation & Installation**: InstructBLIP ([LAVIS](https://github.com/salesforce/LAVIS)), LLaVA ([LLaVA](https://github.com/haotian-liu/LLaVA)), MiniGPT-4 ([MiniGPT-4](https://github.com/Vision-CAIR/MiniGPT-4)), mPLUG-Owl2 ([mPLUG-Owl2](https://github.com/X-PLUG/mPLUG-Owl/tree/main/mPLUG-Owl2)), OpenFlamingo-v2 ([OpenFlamingo](https://github.com/mlfoundations/open_flamingo)), PandaGPT-13B ([PandaGPT](https://github.com/yxuansu/PandaGPT)), TransCore-M ([TransCore-M](https://github.com/PCIResearch/TransCore-M)).
-
-**Manual Weight Preparation & Configuration**: InstructBLIP, LLaVA-v1-7B, MiniGPT-4, PandaGPT-13B
-
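To illustrate how the `supported_VLM` registry described above is used, the sketch below instantiates a model by its registered name and runs a single query. It is a minimal example under assumptions: `qwen_chat` and `apple.jpg` are placeholders, and the `generate` call follows the interactive example shown earlier in this README.

```python
from vlmeval.config import supported_VLM

# Pick a VLM by the name registered in `supported_VLM`
# (the same name that is passed to `--model` on the command line).
model = supported_VLM['qwen_chat']()

# Single-image query, mirroring the interactive example earlier in the README.
# 'apple.jpg' is a placeholder path to a local image file.
ret = model.generate('apple.jpg', 'What is in this image?')
print(ret)
```
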
-### Step2. Evaluation
-
-We use `run.py` for evaluation. You can call `$VLMEvalKit/run.py` directly or create a soft link to the script so that it can be used from anywhere:
-
-**Arguments**
-
-- `--data (list[str])`: Set the dataset names that are supported in VLMEvalKit (defined in `vlmeval/utils/dataset_config.py`).
-- `--model (list[str])`: Set the VLM names that are supported in VLMEvalKit (defined in `supported_VLM` in `vlmeval/config.py`).
-- `--mode (str, default to 'all', choices are ['all', 'infer'])`: When `mode` is set to "all", both inference and evaluation are performed; when set to "infer", only inference is performed.
-- `--nproc (int, default to 4)`: The number of threads for OpenAI API calling.
-
-**Command**
-
-You can run the script with `python` or `torchrun`:
-
-```bash
-# When running with `python`, only one VLM instance is instantiated, and it might use multiple GPUs (depending on its default behavior).
-# That is recommended for evaluating very large VLMs (like IDEFICS-80B-Instruct).
-
-# IDEFICS-80B-Instruct on MMBench_DEV_EN, MME, and SEEDBench_IMG, Inference and Evaluation
-python run.py --data MMBench_DEV_EN MME SEEDBench_IMG --model idefics_80b_instruct --verbose
-# IDEFICS-80B-Instruct on MMBench_DEV_EN, MME, and SEEDBench_IMG, Inference only
-python run.py --data MMBench_DEV_EN MME SEEDBench_IMG --model idefics_80b_instruct --verbose --mode infer
-
-# When running with `torchrun`, one VLM instance is instantiated on each GPU. It can speed up the inference.
-# However, that is only suitable for VLMs that consume small amounts of GPU memory.
-
-# IDEFICS-9B-Instruct, Qwen-VL-Chat, mPLUG-Owl2 on MMBench_DEV_EN, MME, and SEEDBench_IMG. On a node with 8 GPUs. Inference and Evaluation.
-torchrun --nproc-per-node=8 run.py --data MMBench_DEV_EN MME SEEDBench_IMG --model idefics_9b_instruct qwen_chat mPLUG-Owl2 --verbose
-# Qwen-VL-Chat on MME. On a node with 2 GPUs. Inference and Evaluation.
-torchrun --nproc-per-node=2 run.py --data MME --model qwen_chat --verbose
-```
-
-The evaluation results will be printed as logs. Besides, **result files** will also be generated in the directory `$YOUR_WORKING_DIRECTORY/{model_name}`. Files ending with `.csv` contain the evaluated metrics.
-
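Since the `.csv` result files mentioned above are ordinary CSV tables, they can be inspected with any CSV reader. Below is a hypothetical example using pandas; the file path is a placeholder, not the actual naming scheme used by VLMEvalKit.

```python
import pandas as pd

# Hypothetical path: result files are written to $YOUR_WORKING_DIRECTORY/{model_name}/,
# and the files ending with `.csv` contain the evaluated metrics.
result_csv = 'qwen_chat/example_results.csv'  # placeholder file name

df = pd.read_csv(result_csv)
print(df.head())  # preview the evaluated metrics
```
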
-## 🛠️ Custom Benchmark or VLM
-
-To implement a custom benchmark or VLM in **VLMEvalKit**, please refer to [Custom_Benchmark_and_Model](/Custom_Benchmark_and_Model.md). Example PRs to follow:
-
-- [**New Model**] Support Monkey ([#45](https://github.com/open-compass/VLMEvalKit/pull/45/files))
-- [**New Benchmark**] Support AI2D ([#51](https://github.com/open-compass/VLMEvalKit/pull/51/files))
+To develop custom benchmarks, VLMs, or simply contribute other code to **VLMEvalKit**, please refer to [Development_Guide](/Development.md).
 
 ## 🎯 The Goal of VLMEvalKit
 