[Refactor] hide the video dataset related args (#675)
* [Refactor] merge the video dataset related args into the config json and each dataset definition
* fix the concat dataset problem
* update build_model_from_config to support an empty dict
* add a supported_video_datasets function for quick start
* fix the result_file_name problem
* fix lint
* update the ConfigSystem and Quickstart docs
docs/en/ConfigSystem.md (+15 −5)
````diff
@@ -1,6 +1,6 @@
 # Config System
 
-By default, VLMEvalKit launches the evaluation by setting the model name(s) (defined in `/vlmeval/config.py`) and dataset name(s) (defined in `vlmeval/dataset/__init__.py`) in the `run.py` script with the `--model` and `--data` arguments. Such approach is simple and efficient in most scenarios, however, it may not be flexible enough when the user wants to evaluate multiple models / datasets with different settings.
+By default, VLMEvalKit launches the evaluation by setting the model name(s) (defined in `/vlmeval/config.py`) and dataset name(s) (defined in `vlmeval/dataset/__init__.py` or `vlmeval/dataset/video_dataset_config.py`) in the `run.py` script with the `--model` and `--data` arguments. Such approach is simple and efficient in most scenarios, however, it may not be flexible enough when the user wants to evaluate multiple models / datasets with different settings.
 
 To address this, VLMEvalKit provides a more flexible config system. The user can specify the model and dataset settings in a json file, and pass the path to the config file to the `run.py` script with the `--config` argument. Here is a sample config json:
 
@@ -18,7 +18,8 @@ To address this, VLMEvalKit provides a more flexible config system. The user can
             "model": "gpt-4o-2024-08-06",
             "temperature": 1.0,
             "img_detail": "low"
-        }
+        },
+        "GPT4o_20241120": {}
     },
     "data": {
         "MME-RealWorld-Lite": {
@@ -28,7 +29,14 @@ To address this, VLMEvalKit provides a more flexible config system. The user can
         "MMBench_DEV_EN_V11": {
             "class": "ImageMCQDataset",
             "dataset": "MMBench_DEV_EN_V11"
-        }
+        },
+        "MMBench_Video_8frame_nopack": {},
+        "Video-MME_16frame_subs": {
+            "class": "VideoMME",
+            "dataset": "Video-MME",
+            "nframe": 16,
+            "use_subtitle": true
+        }
     }
 }
 ```
````
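Since the sample config above is plain JSON, it can be inspected programmatically before launching an evaluation. The following is a minimal sketch (not part of the PR) that reads such a file and reports which entries are spelled out explicitly and which rely on the shortcut form (an empty dict); the file name is an assumption.

```python
# Minimal sketch (illustrative, not VLMEvalKit code): inspect a config json
# of the shape shown above and report which entries use the shortcut form.
import json

with open("config.json") as f:          # assumed file name
    cfg = json.load(f)

for section in ("model", "data"):
    for name, kwargs in cfg.get(section, {}).items():
        if kwargs:
            print(f"{section}/{name}: explicit settings {kwargs}")
        else:
            # An empty dict means the key itself must be a predefined shortcut
            # (supported_VLM for models, supported_video_datasets for video datasets).
            print(f"{section}/{name}: resolved from the predefined shortcuts")
```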
````diff
@@ -39,10 +47,11 @@ Explanation of the config json:
 2. For items in `model`, the value is a dictionary containing the following keys:
   - `class`: The class name of the model, which should be a class name defined in `vlmeval/vlm/__init__.py` (open-source models) or `vlmeval/api/__init__.py` (API models).
   - Other kwargs: Other kwargs are model-specific parameters, please refer to the definition of the model class for detailed usage. For example, `model`, `temperature`, `img_detail` are arguments of the `GPT4V` class. It's noteworthy that the `model` argument is required by most model classes.
+  - Tip: The defined model in the `supported_VLM` of `vlmeval/config.py` can be used as a shortcut, for example, `GPT4o_20241120: {}` is equivalent to `GPT4o_20241120: {'class': 'GPT4V', 'model': 'gpt-4o-2024-11-20', 'temperature': 0, 'img_size': -1, 'img_detail': 'high', 'retry': 10, 'verbose': False}`
 3. For the dictionary `data`, we suggest users to use the official dataset name as the key (or part of the key), since we frequently determine the post-processing / judging settings based on the dataset name. For items in `data`, the value is a dictionary containing the following keys:
   - `class`: The class name of the dataset, which should be a class name defined in `vlmeval/dataset/__init__.py`.
-  - Other kwargs: Other kwargs are dataset-specific parameters, please refer to the definition of the dataset class for detailed usage. Typically, the `dataset` argument is required by most dataset classes.
-
+  - Other kwargs: Other kwargs are dataset-specific parameters, please refer to the definition of the dataset class for detailed usage. Typically, the `dataset` argument is required by most dataset classes. It's noteworthy that the `nframe` argument or `fps` argument is required by most video dataset classes.
+  - Tip: The defined dataset in the `supported_video_datasets` of `vlmeval/dataset/video_dataset_config.py` can be used as a shortcut, for example, `MMBench_Video_8frame_nopack: {}` is equivalent to `MMBench_Video_8frame_nopack: {'class': 'MMBenchVideo', 'dataset': 'MMBench-Video', 'nframe': 8, 'pack': False}`.
 
 Saving the example config json to `config.json`, you can launch the evaluation by:
 
 ```bash
@@ -55,3 +64,4 @@ That will generate the following output files under the working directory `$WORK
````
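The two "Tip" bullets above describe the same mechanism: an empty dict falls back to a predefined entry, `supported_VLM` for models and `supported_video_datasets` for video datasets. Below is a rough sketch of how such a lookup could work; the registry contents come from the equivalences quoted in the doc, while the function and structure are illustrative rather than the actual VLMEvalKit implementation.

```python
# Illustrative sketch of the shortcut lookup described above; the real
# registries live in vlmeval/config.py and vlmeval/dataset/video_dataset_config.py
# and may be structured differently.
SUPPORTED_VIDEO_DATASETS = {
    "MMBench_Video_8frame_nopack": {
        "class": "MMBenchVideo", "dataset": "MMBench-Video",
        "nframe": 8, "pack": False,
    },
    "Video-MME_16frame_subs": {
        "class": "VideoMME", "dataset": "Video-MME",
        "nframe": 16, "use_subtitle": True,
    },
}

def resolve_dataset_entry(name: str, kwargs: dict) -> dict:
    """Expand an empty config entry from the shortcut registry."""
    if kwargs:                      # explicit settings win
        return kwargs
    if name in SUPPORTED_VIDEO_DATASETS:
        return dict(SUPPORTED_VIDEO_DATASETS[name])
    raise KeyError(f"{name} has no settings and is not a known shortcut")

print(resolve_dataset_entry("MMBench_Video_8frame_nopack", {}))
# {'class': 'MMBenchVideo', 'dataset': 'MMBench-Video', 'nframe': 8, 'pack': False}
```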
docs/en/Quickstart.md (+4 −6)
````diff
@@ -68,8 +68,6 @@ We use `run.py` for evaluation. To use the script, you can use `$VLMEvalKit/run.
 - `--mode (str, default to 'all', choices are ['all', 'infer'])`: When `mode` set to "all", will perform both inference and evaluation; when set to "infer", will only perform the inference.
 - `--nproc (int, default to 4)`: The number of threads for OpenAI API calling.
 - `--work-dir (str, default to '.')`: The directory to save evaluation results.
-- `--nframe (int, default to 8)`: The number of frames to sample from a video, only applicable to the evaluation of video benchmarks.
-- `--pack (bool, store_true)`: A video may associate with multiple questions, if `pack==True`, will ask all questions for a video in a single query.
````
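To make the change to the CLI surface concrete, here is a rough argparse sketch of the arguments that remain after this refactor; it is illustrative only (flag names and defaults follow the doc above, the rest of `run.py` is omitted), and the removed `--nframe` / `--pack` flags no longer appear.

```python
# Illustrative sketch of run.py's remaining CLI arguments after the refactor;
# not the actual parser, just the flags documented above.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--data', nargs='+', help='dataset names, including video dataset shortcuts')
parser.add_argument('--model', nargs='+', help='model names defined in supported_VLM')
parser.add_argument('--config', help='path to a config json, used instead of --data/--model')
parser.add_argument('--mode', default='all', choices=['all', 'infer'])
parser.add_argument('--nproc', type=int, default=4)
parser.add_argument('--work-dir', default='.')
# --nframe and --pack are gone: frame sampling and packing are now set per
# dataset (e.g. "nframe": 16 in the config json, or via a shortcut name).
args = parser.parse_args()
```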
One of the updated Quickstart examples runs IDEFICS2-8B on MMBench-Video, with 8 frames as inputs and vanilla evaluation, on a node with 8 GPUs; `MMBench_Video_8frame_nopack` is a defined dataset setting in `vlmeval/dataset/video_dataset_config.py`.
The evaluation results will be printed as logs. Besides, **Result Files** will also be generated in the directory `$YOUR_WORKING_DIRECTORY/{model_name}`. Files ending with `.csv` contain the evaluated metrics.
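As a sketch of what that example amounts to under the refactor, the snippet below builds an equivalent config file in Python instead of passing the removed `--nframe` / `--pack` flags; the model key and output file name are assumptions for illustration.

```python
# Sketch only: write a config json equivalent to the old
# `--data MMBench-Video --nframe 8` style invocation. The model key
# ("idefics2_8b") and the output file name are assumptions.
import json

cfg = {
    "model": {"idefics2_8b": {}},                 # shortcut from supported_VLM (assumed key)
    "data": {"MMBench_Video_8frame_nopack": {}},  # shortcut from supported_video_datasets
}
with open("video_eval_config.json", "w") as f:
    json.dump(cfg, f, indent=4)

# Launch afterwards with something like:
#   torchrun --nproc-per-node=8 run.py --config video_eval_config.json
```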