@@ -31,70 +31,9 @@ PS: 日本語の README には最新のアップデートがすべて含まれ
[**OpenVLM Leaderboard**](https://huggingface.co/spaces/opencompass/open_vlm_leaderboard): [Download all detailed results](http://opencompass.openxlab.space/assets/OpenVLM.json).
- **Supported Image Understanding Datasets**
-
- - By default, all evaluation results are displayed on the [**OpenVLM Leaderboard**](https://huggingface.co/spaces/opencompass/open_vlm_leaderboard).
-
- | Dataset | Dataset Names (for run.py) | Task | Dataset | Dataset Names (for run.py) | Task |
- | --- | --- | --- | --- | --- | --- |
- | [**MMBench Series**](https://github.com/open-compass/mmbench/): <br>MMBench, MMBench-CN, CCBench | MMBench\_DEV\_[EN/CN]<br>MMBench\_TEST\_[EN/CN]<br>MMBench\_DEV\_[EN/CN]\_V11<br>MMBench\_TEST\_[EN/CN]\_V11<br>CCBench | Multiple-Choice Questions (MCQ) | [**MMStar**](https://github.com/MMStar-Benchmark/MMStar) | MMStar | MCQ |
- | [**MME**](https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models/tree/Evaluation) | MME | Yes/No (Y/N) | [**SEEDBench Series**](https://github.com/AILab-CVC/SEED-Bench) | SEEDBench_IMG<br>SEEDBench2<br>SEEDBench2_Plus | MCQ |
- | [**MM-Vet**](https://github.com/yuweihao/MM-Vet) | MMVet | VQA | [**MMMU**](https://mmmu-benchmark.github.io) | MMMU_[DEV_VAL/TEST] | MCQ |
- | [**MathVista**](https://mathvista.github.io) | MathVista_MINI | VQA | [**ScienceQA_IMG**](https://scienceqa.github.io) | ScienceQA_[VAL/TEST] | MCQ |
- | [**COCO Caption**](https://cocodataset.org) | COCO_VAL | Caption | [**HallusionBench**](https://github.com/tianyi-lab/HallusionBench) | HallusionBench | Y/N |
- | [**OCRVQA**](https://ocr-vqa.github.io)* | OCRVQA_[TESTCORE/TEST] | VQA | [**TextVQA**](https://textvqa.org)* | TextVQA_VAL | VQA |
- | [**ChartQA**](https://github.com/vis-nlp/ChartQA)* | ChartQA_TEST | VQA | [**AI2D**](https://allenai.org/data/diagrams) | AI2D_[TEST/TEST_NO_MASK] | MCQ |
- | [**LLaVABench**](https://huggingface.co/datasets/liuhaotian/llava-bench-in-the-wild) | LLaVABench | VQA | [**DocVQA**](https://www.docvqa.org)+ | DocVQA_[VAL/TEST] | VQA |
- | [**InfoVQA**](https://www.docvqa.org/datasets/infographicvqa)+ | InfoVQA_[VAL/TEST] | VQA | [**OCRBench**](https://github.com/Yuliang-Liu/MultimodalOCR) | OCRBench | VQA |
- | [**RealWorldQA**](https://x.ai/blog/grok-1.5v) | RealWorldQA | MCQ | [**POPE**](https://github.com/AoiDragon/POPE) | POPE | Y/N |
- | [**Core-MM**](https://github.com/core-mm/core-mm)- | CORE_MM | VQA | [**MMT-Bench**](https://mmt-bench.github.io) | MMT-Bench_[VAL/VAL_MI/ALL/ALL_MI] | MCQ |
- | [**MLLMGuard**](https://github.com/Carol-gutianle/MLLMGuard)- | MLLMGuard_DS | VQA | [**AesBench**](https://github.com/yipoh/AesBench) | AesBench_[VAL/TEST] | MCQ |
- | [**VCR-wiki**](https://huggingface.co/vcr-org/)+ | VCR\_[EN/ZH]\_[EASY/HARD]\_[ALL/500/100] | VQA | [**MMLongBench-Doc**](https://mayubo2333.github.io/MMLongBench-Doc/)+ | MMLongBench_DOC | VQA |
- | [**BLINK**](https://zeyofu.github.io/blink/)+ | BLINK | MCQ | [**MathVision**](https://mathvision-cuhk.github.io)+ | MathVision<br>MathVision_MINI | VQA |
- | [**MT-VQA**](https://github.com/bytedance/MTVQA)+ | MTVQA_TEST | VQA | [**MMDU**](https://liuziyu77.github.io/MMDU/)+ | MMDU | VQA (multi-turn) |
- | [**Q-Bench1**](https://github.com/Q-Future/Q-Bench)+ | Q-Bench1_[VAL/TEST] | MCQ | [**A-Bench**](https://github.com/Q-Future/A-Bench)+ | A-Bench_[VAL/TEST] | MCQ |
- | [**TaskMeAnything ImageQA Random**](https://huggingface.co/datasets/weikaih/TaskMeAnything-v1-imageqa-random)+ | TaskMeAnything_v1_imageqa_random | MCQ | | | |
-
- **\*** We only provide a subset of the evaluation results for VLMs that cannot yield reasonable results under the zero-shot setting
-
- **\+** The evaluation results are not yet available
-
- **\-** Only inference is supported in VLMEvalKit
-
- When a key is set, VLMEvalKit uses a **judge LLM** to extract answers from the model output; otherwise it falls back to **exact matching** mode (searching the output string for "Yes", "No", "A", "B", "C", ...). **Exact matching only applies to Yes/No tasks and multiple-choice questions.**
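To make the exact-matching fallback concrete, here is a minimal Python sketch of the kind of string search described in the note above. It is an illustration only, not VLMEvalKit's actual extraction code; the helper name `exact_match` and its signature are hypothetical.

```python
import re

# Hypothetical helper illustrating "exact matching": scan the model's raw
# output for a standalone option letter (MCQ) or a yes/no token (Y/N tasks).
def exact_match(prediction: str, choices: list[str] | None = None) -> str | None:
    text = prediction.strip()
    if choices:  # MCQ: look for a standalone choice label such as "B"
        for label in choices:
            if re.search(rf"\b{re.escape(label)}\b", text):
                return label
        return None
    lowered = text.lower()  # Y/N task: accept only an unambiguous yes or no
    if "yes" in lowered and "no" not in lowered:
        return "Yes"
    if "no" in lowered and "yes" not in lowered:
        return "No"
    return None  # nothing matched; a judge LLM would be needed here

print(exact_match("The answer is B.", ["A", "B", "C", "D"]))  # -> B
print(exact_match("Yes, the image shows a cat."))             # -> Yes
```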
-
- **Supported Video Understanding Datasets**
-
- | Dataset | Dataset Names (for run.py) | Task | Dataset | Dataset Names (for run.py) | Task |
- | --- | --- | --- | --- | --- | --- |
- | [**MMBench-Video**](https://mmbench-video.github.io) | MMBench-Video | VQA | [**Video-MME**](https://video-mme.github.io/) | Video-MME | MCQ |
-
- **Supported API Models**
-
- | [**GPT-4v (20231106, 20240409)**](https://platform.openai.com/docs/guides/vision) 🎞️🚅 | [**GPT-4o**](https://openai.com/index/hello-gpt-4o/) 🎞️🚅 | [**Gemini-1.0-Pro**](https://platform.openai.com/docs/guides/vision) 🎞️🚅 | [**Gemini-1.5-Pro**](https://platform.openai.com/docs/guides/vision) 🎞️🚅 | [**Step-1V**](https://www.stepfun.com/#step1v) 🎞️🚅 |
- | --- | --- | --- | --- | --- |
- | [**Reka-[Edge / Flash / Core]**](https://www.reka.ai) 🚅 | [**Qwen-VL-[Plus / Max]**](https://huggingface.co/spaces/Qwen/Qwen-VL-Max) 🎞️🚅 | [**Claude-3v-[Haiku / Sonnet / Opus]**](https://www.anthropic.com/news/claude-3-family) 🎞️🚅 | [**GLM-4v**](https://open.bigmodel.cn/dev/howuse/glm4v) 🚅 | [**CongRong**](https://mllm.cloudwalk.com/web) 🎞️🚅 |
- | [**Claude3.5-Sonnet**](https://www.anthropic.com/news/claude-3-5-sonnet) 🎞️🚅 | [**GPT-4o-Mini**](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/) 🎞️🚅 | [**Yi-Vision**](https://platform.lingyiwanwu.com) 🎞️🚅 | [**Hunyuan-Vision**](https://cloud.tencent.com/document/product/1729) 🎞️🚅 | [**BlueLM-V**](https://developers.vivo.com/) 🎞️🚅 |
-
- **Supported PyTorch / HF Models**
-
- | [**IDEFICS-[9B/80B/v2-8B]-Instruct**](https://huggingface.co/HuggingFaceM4/idefics-9b-instruct) 🎞️🚅 | [**InstructBLIP-[7B/13B]**](https://github.com/salesforce/LAVIS/blob/main/projects/instructblip/README.md) | [**LLaVA-[v1-7B/v1.5-7B/v1.5-13B]**](https://github.com/haotian-liu/LLaVA) | [**MiniGPT-4-[v1-7B/v1-13B/v2-7B]**](https://github.com/Vision-CAIR/MiniGPT-4) |
- | --- | --- | --- | --- |
- | [**mPLUG-Owl2**](https://github.com/X-PLUG/mPLUG-Owl/tree/main/mPLUG-Owl2) 🎞️ | [**OpenFlamingo-v2**](https://github.com/mlfoundations/open_flamingo) 🎞️ | [**PandaGPT-13B**](https://github.com/yxuansu/PandaGPT) | [**Qwen-VL**](https://huggingface.co/Qwen/Qwen-VL) 🎞️🚅, [**Qwen-VL-Chat**](https://huggingface.co/Qwen/Qwen-VL-Chat) 🎞️🚅 |
- | [**VisualGLM-6B**](https://huggingface.co/THUDM/visualglm-6b) 🚅 | [**InternLM-XComposer-[1/2]**](https://huggingface.co/internlm/internlm-xcomposer-7b) 🚅 | [**ShareGPT4V-[7B/13B]**](https://sharegpt4v.github.io) 🚅 | [**TransCore-M**](https://github.com/PCIResearch/TransCore-M) |
- | [**LLaVA (XTuner)**](https://huggingface.co/xtuner/llava-internlm-7b) 🚅 | [**CogVLM-[Chat/Llama3]**](https://huggingface.co/THUDM/cogvlm-chat-hf) 🚅 | [**ShareCaptioner**](https://huggingface.co/spaces/Lin-Chen/Share-Captioner) 🚅 | [**CogVLM-Grounding-Generalist**](https://huggingface.co/THUDM/cogvlm-grounding-generalist-hf) 🚅 |
- | [**Monkey**](https://github.com/Yuliang-Liu/Monkey) 🚅, [**Monkey-Chat**](https://github.com/Yuliang-Liu/Monkey) 🚅 | [**EMU2-Chat**](https://github.com/baaivision/Emu) 🚅🎞️ | [**Yi-VL-[6B/34B]**](https://huggingface.co/01-ai/Yi-VL-6B) | [**MMAlaya**](https://huggingface.co/DataCanvas/MMAlaya) 🚅 |
- | [**InternLM-XComposer-2.5**](https://github.com/InternLM/InternLM-XComposer) 🚅🎞️ | [**MiniCPM-[V1/V2/V2.5/V2.6]**](https://github.com/OpenBMB/MiniCPM-V) 🚅🎞️ | [**OmniLMM-12B**](https://huggingface.co/openbmb/OmniLMM-12B) | [**InternVL-Chat-[V1-1/V1-2/V1-5/V2]**](https://github.com/OpenGVLab/InternVL) 🚅🎞️, <br>[**Mini-InternVL-Chat-[2B/4B]-V1-5**](https://github.com/OpenGVLab/InternVL) 🚅🎞️ |
- | [**DeepSeek-VL**](https://github.com/deepseek-ai/DeepSeek-VL/tree/main) 🎞️ | [**LLaVA-NeXT**](https://llava-vl.github.io/blog/2024-01-30-llava-next/) 🚅🎞️ | [**Bunny-Llama3**](https://huggingface.co/BAAI/Bunny-v1_1-Llama-3-8B-V) 🚅 | [**XVERSE-V-13B**](https://github.com/xverse-ai/XVERSE-V-13B/blob/main/vxverse/models/vxverse.py) |
- | [**PaliGemma-3B**](https://huggingface.co/google/paligemma-3b-pt-448) 🚅 | [**360VL-70B**](https://huggingface.co/qihoo360/360VL-70B) 🚅 | [**Phi-3-Vision**](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct) 🚅 | [**WeMM**](https://github.com/scenarios/WeMM) 🚅 |
- | [**GLM-4v-9B**](https://huggingface.co/THUDM/glm-4v-9b) 🚅 | [**Cambrian-[8B/13B/34B]**](https://cambrian-mllm.github.io/) | [**LLaVA-Next-[Qwen-32B]**](https://huggingface.co/lmms-lab/llava-next-qwen-32b) 🎞️ | [**Chameleon-[7B/30B]**](https://huggingface.co/facebook/chameleon-7b) 🚅🎞️ |
- | [**Video-LLaVA-7B-[HF]**](https://github.com/PKU-YuanGroup/Video-LLaVA) 🎬 | [**VILA1.5-[8B/13B/40B]**](https://github.com/NVlabs/VILA/) 🎞️ | [**Ovis1.5-Llama3-8B**](https://github.com/AIDC-AI/Ovis) 🚅🎞️ | [**Mantis-8B-[siglip-llama3/clip-llama3/Idefics2/Fuyu]**](https://huggingface.co/TIGER-Lab/Mantis-8B-Idefics2) 🎞️ |
-
- 🎞️: Supports multiple images as input.
-
- 🚅: Can be used without any additional configuration or operation.
-
- 🎬: Supports video as input.
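The "Dataset Names (for run.py)" columns in the tables above give the identifiers passed on the command line. As a rough usage sketch (the `--data`/`--model`/`--verbose` flags and the `qwen_chat` model name follow the project's quickstart; substitute any supported model and dataset):

```bash
# Evaluate Qwen-VL-Chat on the MMBench dev (EN) split; the dataset name
# comes from the "Dataset Names (for run.py)" columns above.
python run.py --data MMBench_DEV_EN --model qwen_chat --verbose
```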
+ See **Supported Benchmarks** in [**VLMEvalKit Features**](https://aicarrier.feishu.cn/wiki/Qp7wwSzQ9iK1Y6kNUJVcr6zTnPe?table=tblsdEpLieDoCxtb) to view all supported benchmarks (70+).
+
+ See **Supported LMMs** in [**VLMEvalKit Features**](https://aicarrier.feishu.cn/wiki/Qp7wwSzQ9iK1Y6kNUJVcr6zTnPe?table=tblsdEpLieDoCxtb) to view all supported LMMs (200+).
**Transformers Version Recommendation:**