diff --git a/README.md b/README.md
index f13bd21d6..d92aa1742 100644
--- a/README.md
+++ b/README.md
@@ -118,25 +118,26 @@ VLMEvalKit will use a **judge LLM** to extract answer from the output if you set
**Supported PyTorch / HF Models**
-| [**IDEFICS-[9B/80B/v2-8B/v3-8B]-Instruct**](https://huggingface.co/HuggingFaceM4/idefics-9b-instruct)🚅🎞️ | [**InstructBLIP-[7B/13B]**](https://github.com/salesforce/LAVIS/blob/main/projects/instructblip/README.md) | [**LLaVA-[v1-7B/v1.5-7B/v1.5-13B]**](https://github.com/haotian-liu/LLaVA) | [**MiniGPT-4-[v1-7B/v1-13B/v2-7B]**](https://github.com/Vision-CAIR/MiniGPT-4) |
-| ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
-| [**mPLUG-Owl[2/3]**](https://github.com/X-PLUG/mPLUG-Owl/tree/main/mPLUG-Owl2)🎞️ | [**OpenFlamingo-v2**](https://github.com/mlfoundations/open_flamingo)🎞️ | [**PandaGPT-13B**](https://github.com/yxuansu/PandaGPT) | [**Qwen-VL**](https://huggingface.co/Qwen/Qwen-VL)🚅🎞️ [**Qwen-VL-Chat**](https://huggingface.co/Qwen/Qwen-VL-Chat)🚅🎞️ |
-| [**VisualGLM-6B**](https://huggingface.co/THUDM/visualglm-6b)🚅 | [**InternLM-XComposer-[1/2]**](https://huggingface.co/internlm/internlm-xcomposer-7b)🚅 | [**ShareGPT4V-[7B/13B]**](https://sharegpt4v.github.io)🚅 | [**TransCore-M**](https://github.com/PCIResearch/TransCore-M) |
-| [**LLaVA (XTuner)**](https://huggingface.co/xtuner/llava-internlm-7b)🚅 | [**CogVLM-[Chat/Llama3]**](https://huggingface.co/THUDM/cogvlm-chat-hf)🚅 | [**ShareCaptioner**](https://huggingface.co/spaces/Lin-Chen/Share-Captioner)🚅 | [**CogVLM-Grounding-Generalist**](https://huggingface.co/THUDM/cogvlm-grounding-generalist-hf)🚅 |
-| [**Monkey**](https://github.com/Yuliang-Liu/Monkey)🚅 [**Monkey-Chat**](https://github.com/Yuliang-Liu/Monkey)🚅 | [**EMU2-Chat**](https://github.com/baaivision/Emu)🚅🎞️ | [**Yi-VL-[6B/34B]**](https://huggingface.co/01-ai/Yi-VL-6B) | [**MMAlaya**](https://huggingface.co/DataCanvas/MMAlaya)🚅 |
-| [**InternLM-XComposer-2.5**](https://github.com/InternLM/InternLM-XComposer)🚅🎞️ | [**MiniCPM-[V1/V2/V2.5/V2.6]**](https://github.com/OpenBMB/MiniCPM-V)🚅🎞️ | [**OmniLMM-12B**](https://huggingface.co/openbmb/OmniLMM-12B) | [**InternVL-Chat-[V1-1/V1-2/V1-5/V2]**](https://github.com/OpenGVLab/InternVL)🚅🎞️ |
-| [**DeepSeek-VL**](https://github.com/deepseek-ai/DeepSeek-VL/tree/main)🎞️ | [**LLaVA-NeXT**](https://llava-vl.github.io/blog/2024-01-30-llava-next/)🚅🎞️ | [**Bunny-Llama3**](https://huggingface.co/BAAI/Bunny-v1_1-Llama-3-8B-V)🚅 | [**XVERSE-V-13B**](https://github.com/xverse-ai/XVERSE-V-13B/blob/main/vxverse/models/vxverse.py) |
-| [**PaliGemma-3B**](https://huggingface.co/google/paligemma-3b-pt-448) 🚅 | [**360VL-70B**](https://huggingface.co/qihoo360/360VL-70B) 🚅 | [**Phi-3-Vision**](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct)🚅🎞️ [**Phi-3.5-Vision**](https://huggingface.co/microsoft/Phi-3.5-vision-instruct)🚅🎞️ | [**WeMM**](https://github.com/scenarios/WeMM)🚅 |
-| [**GLM-4v-9B**](https://huggingface.co/THUDM/glm-4v-9b) 🚅 | [**Cambrian-[8B/13B/34B]**](https://cambrian-mllm.github.io/) | [**LLaVA-Next-[Qwen-32B]**](https://huggingface.co/lmms-lab/llava-next-qwen-32b) 🎞️ | [**Chameleon-[7B/30B]**](https://huggingface.co/facebook/chameleon-7b)🚅🎞️ |
-| [**Video-LLaVA-7B-[HF]**](https://github.com/PKU-YuanGroup/Video-LLaVA) 🎬 | [**VILA1.5-[3B/8B/13B/40B]**](https://github.com/NVlabs/VILA/)🎞️ | [**Ovis[1.5-Llama3-8B/1.5-Gemma2-9B/1.6-Gemma2-9B/1.6-Llama3.2-3B/1.6-Gemma2-27B]**](https://github.com/AIDC-AI/Ovis) 🚅🎞️ | [**Mantis-8B-[siglip-llama3/clip-llama3/Idefics2/Fuyu]**](https://huggingface.co/TIGER-Lab/Mantis-8B-Idefics2) 🎞️ |
-| [**Llama-3-MixSenseV1_1**](https://huggingface.co/Zero-Vision/Llama-3-MixSenseV1_1)🚅 | [**Parrot-7B**](https://github.com/AIDC-AI/Parrot) 🚅 | [**OmChat-v2.0-13B-sinlge-beta**](https://huggingface.co/omlab/omchat-v2.0-13B-single-beta_hf) 🚅 | [**Video-ChatGPT**](https://github.com/mbzuai-oryx/Video-ChatGPT) 🎬 |
-| [**Chat-UniVi-7B[-v1.5]**](https://github.com/PKU-YuanGroup/Chat-UniVi) 🎬 | [**LLaMA-VID-7B**](https://github.com/dvlab-research/LLaMA-VID) 🎬 | [**VideoChat2-HD**](https://huggingface.co/OpenGVLab/VideoChat2_HD_stage4_Mistral_7B) 🎬 | [**PLLaVA-[7B/13B/34B]**](https://huggingface.co/ermu2001/pllava-7b) 🎬 |
-| [**RBDash_72b**](https://github.com/RBDash-Team/RBDash) 🚅🎞️ | [**xgen-mm-phi3-[interleave/dpo]-r-v1.5**](https://huggingface.co/Salesforce/xgen-mm-phi3-mini-instruct-interleave-r-v1.5) 🚅🎞️ | [**Qwen2-VL-[2B/7B/72B]**](https://github.com/QwenLM/Qwen2-VL)🚅🎞️ | [**slime_[7b/8b/13b]**](https://github.com/yfzhang114/SliME)🎞️ |
+| [**IDEFICS-[9B/80B/v2-8B/v3-8B]-Instruct**](https://huggingface.co/HuggingFaceM4/idefics-9b-instruct)🚅🎞️ | [**InstructBLIP-[7B/13B]**](https://github.com/salesforce/LAVIS/blob/main/projects/instructblip/README.md) | [**LLaVA-[v1-7B/v1.5-7B/v1.5-13B]**](https://github.com/haotian-liu/LLaVA) | [**MiniGPT-4-[v1-7B/v1-13B/v2-7B]**](https://github.com/Vision-CAIR/MiniGPT-4) |
+|--------------------------------------------------------------------------------------------------------------------------------------| ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
+| [**mPLUG-Owl[2/3]**](https://github.com/X-PLUG/mPLUG-Owl/tree/main/mPLUG-Owl2)🎞️ | [**OpenFlamingo-v2**](https://github.com/mlfoundations/open_flamingo)🎞️ | [**PandaGPT-13B**](https://github.com/yxuansu/PandaGPT) | [**Qwen-VL**](https://huggingface.co/Qwen/Qwen-VL)🚅🎞️ [**Qwen-VL-Chat**](https://huggingface.co/Qwen/Qwen-VL-Chat)🚅🎞️ |
+| [**VisualGLM-6B**](https://huggingface.co/THUDM/visualglm-6b)🚅 | [**InternLM-XComposer-[1/2]**](https://huggingface.co/internlm/internlm-xcomposer-7b)🚅 | [**ShareGPT4V-[7B/13B]**](https://sharegpt4v.github.io)🚅 | [**TransCore-M**](https://github.com/PCIResearch/TransCore-M) |
+| [**LLaVA (XTuner)**](https://huggingface.co/xtuner/llava-internlm-7b)🚅 | [**CogVLM-[Chat/Llama3]**](https://huggingface.co/THUDM/cogvlm-chat-hf)🚅 | [**ShareCaptioner**](https://huggingface.co/spaces/Lin-Chen/Share-Captioner)🚅 | [**CogVLM-Grounding-Generalist**](https://huggingface.co/THUDM/cogvlm-grounding-generalist-hf)🚅 |
+| [**Monkey**](https://github.com/Yuliang-Liu/Monkey)🚅 [**Monkey-Chat**](https://github.com/Yuliang-Liu/Monkey)🚅 | [**EMU2-Chat**](https://github.com/baaivision/Emu)🚅🎞️ | [**Yi-VL-[6B/34B]**](https://huggingface.co/01-ai/Yi-VL-6B) | [**MMAlaya**](https://huggingface.co/DataCanvas/MMAlaya)🚅 |
+| [**InternLM-XComposer-2.5**](https://github.com/InternLM/InternLM-XComposer)🚅🎞️ | [**MiniCPM-[V1/V2/V2.5/V2.6]**](https://github.com/OpenBMB/MiniCPM-V)🚅🎞️ | [**OmniLMM-12B**](https://huggingface.co/openbmb/OmniLMM-12B) | [**InternVL-Chat-[V1-1/V1-2/V1-5/V2]**](https://github.com/OpenGVLab/InternVL)🚅🎞️ |
+| [**DeepSeek-VL**](https://github.com/deepseek-ai/DeepSeek-VL/tree/main)🎞️ | [**LLaVA-NeXT**](https://llava-vl.github.io/blog/2024-01-30-llava-next/)🚅🎞️ | [**Bunny-Llama3**](https://huggingface.co/BAAI/Bunny-v1_1-Llama-3-8B-V)🚅 | [**XVERSE-V-13B**](https://github.com/xverse-ai/XVERSE-V-13B/blob/main/vxverse/models/vxverse.py) |
+| [**PaliGemma-3B**](https://huggingface.co/google/paligemma-3b-pt-448) 🚅 | [**360VL-70B**](https://huggingface.co/qihoo360/360VL-70B) 🚅 | [**Phi-3-Vision**](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct)🚅🎞️ [**Phi-3.5-Vision**](https://huggingface.co/microsoft/Phi-3.5-vision-instruct)🚅🎞️ | [**WeMM**](https://github.com/scenarios/WeMM)🚅 |
+| [**GLM-4v-9B**](https://huggingface.co/THUDM/glm-4v-9b) 🚅 | [**Cambrian-[8B/13B/34B]**](https://cambrian-mllm.github.io/) | [**LLaVA-Next-[Qwen-32B]**](https://huggingface.co/lmms-lab/llava-next-qwen-32b) 🎞️ | [**Chameleon-[7B/30B]**](https://huggingface.co/facebook/chameleon-7b)🚅🎞️ |
+| [**Video-LLaVA-7B-[HF]**](https://github.com/PKU-YuanGroup/Video-LLaVA) 🎬 | [**VILA1.5-[3B/8B/13B/40B]**](https://github.com/NVlabs/VILA/)🎞️ | [**Ovis[1.5-Llama3-8B/1.5-Gemma2-9B/1.6-Gemma2-9B/1.6-Llama3.2-3B/1.6-Gemma2-27B]**](https://github.com/AIDC-AI/Ovis) 🚅🎞️ | [**Mantis-8B-[siglip-llama3/clip-llama3/Idefics2/Fuyu]**](https://huggingface.co/TIGER-Lab/Mantis-8B-Idefics2) 🎞️ |
+| [**Llama-3-MixSenseV1_1**](https://huggingface.co/Zero-Vision/Llama-3-MixSenseV1_1)🚅 | [**Parrot-7B**](https://github.com/AIDC-AI/Parrot) 🚅 | [**OmChat-v2.0-13B-sinlge-beta**](https://huggingface.co/omlab/omchat-v2.0-13B-single-beta_hf) 🚅 | [**Video-ChatGPT**](https://github.com/mbzuai-oryx/Video-ChatGPT) 🎬 |
+| [**Chat-UniVi-7B[-v1.5]**](https://github.com/PKU-YuanGroup/Chat-UniVi) 🎬 | [**LLaMA-VID-7B**](https://github.com/dvlab-research/LLaMA-VID) 🎬 | [**VideoChat2-HD**](https://huggingface.co/OpenGVLab/VideoChat2_HD_stage4_Mistral_7B) 🎬 | [**PLLaVA-[7B/13B/34B]**](https://huggingface.co/ermu2001/pllava-7b) 🎬 |
+| [**RBDash_72b**](https://github.com/RBDash-Team/RBDash) 🚅🎞️ | [**xgen-mm-phi3-[interleave/dpo]-r-v1.5**](https://huggingface.co/Salesforce/xgen-mm-phi3-mini-instruct-interleave-r-v1.5) 🚅🎞️ | [**Qwen2-VL-[2B/7B/72B]**](https://github.com/QwenLM/Qwen2-VL)🚅🎞️ | [**slime_[7b/8b/13b]**](https://github.com/yfzhang114/SliME)🎞️ |
| [**Eagle-X4-[8B/13B]**](https://github.com/NVlabs/EAGLE)🚅🎞️, [**Eagle-X5-[7B/13B/34B]**](https://github.com/NVlabs/EAGLE)🚅🎞️ | [**Moondream1**](https://github.com/vikhyat/moondream)🚅, [**Moondream2**](https://github.com/vikhyat/moondream)🚅 | [**XinYuan-VL-2B-Instruct**](https://huggingface.co/Cylingo/Xinyuan-VL-2B)🚅🎞️ | [**Llama-3.2-[11B/90B]-Vision-Instruct**](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct)🚅 |
-| [**Kosmos2**](https://huggingface.co/microsoft/kosmos-2-patch14-224)🚅 | [**H2OVL-Mississippi-[0.8B/2B]**](https://huggingface.co/h2oai/h2ovl-mississippi-2b)🚅🎞️ | [**Pixtral-12B**](https://huggingface.co/mistralai/Pixtral-12B-2409)🎞️ | [**Falcon2-VLM-11B**](https://huggingface.co/tiiuae/falcon-11B-vlm)🚅 |
-| [**MiniMonkey**](https://huggingface.co/mx262/MiniMonkey)🚅🎞️ | [**LLaVA-OneVision**](https://huggingface.co/lmms-lab/llava-onevision-qwen2-72b-ov-sft)🚅🎞️ | [**LLaVA-Video**](https://huggingface.co/collections/lmms-lab/llava-video-661e86f5e8dabc3ff793c944)🚅🎞️ | [**Aquila-VL-2B**](https://huggingface.co/BAAI/Aquila-VL-2B-llava-qwen)🚅🎞️ |
-| [**Mini-InternVL-Chat-[2B/4B]-V1-5**](https://github.com/OpenGVLab/InternVL)🚅🎞️ | [**InternVL2 Series**](https://huggingface.co/OpenGVLab/InternVL2-8B) 🚅🎞️ | [**Janus-1.3B**](https://huggingface.co/deepseek-ai/Janus-1.3B)🚅🎞️ | [**molmoE-1B/molmo-7B/molmo-72B**](https://huggingface.co/allenai/Molmo-7B-D-0924)🚅 |
-| [**Points-[Yi-1.5-9B/Qwen-2.5-7B]**](https://huggingface.co/WePOINTS/POINTS-Yi-1-5-9B-Chat)🚅 | [**NVLM**](https://huggingface.co/nvidia/NVLM-D-72B)🚅 | [**VIntern**](https://huggingface.co/5CD-AI/Vintern-3B-beta)🚅🎞️ | [**Aria**](https://huggingface.co/rhymes-ai/Aria)🚅🎞️ |
+| [**Kosmos2**](https://huggingface.co/microsoft/kosmos-2-patch14-224)🚅 | [**H2OVL-Mississippi-[0.8B/2B]**](https://huggingface.co/h2oai/h2ovl-mississippi-2b)🚅🎞️ | [**Pixtral-12B**](https://huggingface.co/mistralai/Pixtral-12B-2409)🎞️ | [**Falcon2-VLM-11B**](https://huggingface.co/tiiuae/falcon-11B-vlm)🚅 |
+| [**MiniMonkey**](https://huggingface.co/mx262/MiniMonkey)🚅🎞️ | [**LLaVA-OneVision**](https://huggingface.co/lmms-lab/llava-onevision-qwen2-72b-ov-sft)🚅🎞️ | [**LLaVA-Video**](https://huggingface.co/collections/lmms-lab/llava-video-661e86f5e8dabc3ff793c944)🚅🎞️ | [**Aquila-VL-2B**](https://huggingface.co/BAAI/Aquila-VL-2B-llava-qwen)🚅🎞️ |
+| [**Mini-InternVL-Chat-[2B/4B]-V1-5**](https://github.com/OpenGVLab/InternVL)🚅🎞️ | [**InternVL2 Series**](https://huggingface.co/OpenGVLab/InternVL2-8B) 🚅🎞️ | [**Janus-1.3B**](https://huggingface.co/deepseek-ai/Janus-1.3B)🚅🎞️ | [**molmoE-1B/molmo-7B/molmo-72B**](https://huggingface.co/allenai/Molmo-7B-D-0924)🚅 |
+| [**Points-[Yi-1.5-9B/Qwen-2.5-7B]**](https://huggingface.co/WePOINTS/POINTS-Yi-1-5-9B-Chat)🚅 | [**NVLM**](https://huggingface.co/nvidia/NVLM-D-72B)🚅 | [**VIntern**](https://huggingface.co/5CD-AI/Vintern-3B-beta)🚅🎞️ | [**Aria**](https://huggingface.co/rhymes-ai/Aria)🚅🎞️ |
+| [**VARCO-VISION-14B**](https://huggingface.co/NCSOFT/VARCO-VISION-14B-HF)🚅 | | | |
🎞️: Support multiple images as inputs.
diff --git a/docs/zh-CN/README_zh-CN.md b/docs/zh-CN/README_zh-CN.md
index 8e8ae1e95..71d693262 100644
--- a/docs/zh-CN/README_zh-CN.md
+++ b/docs/zh-CN/README_zh-CN.md
@@ -128,6 +128,8 @@ $$^1$$ VLMEvalKit 在评测集的官方代码库中被使用
| **[MiniMonkey](https://huggingface.co/mx262/MiniMonkey)**🚅🎞️ | **[LLaVA-OneVision](https://huggingface.co/lmms-lab/llava-onevision-qwen2-72b-ov-sft)**🚅🎞️ | **[LLaVA-Video](https://huggingface.co/collections/lmms-lab/llava-video-661e86f5e8dabc3ff793c944)**🚅🎞️ | **[Aquila-VL-2B](https://huggingface.co/BAAI/Aquila-VL-2B-llava-qwen)**🚅🎞️ |
| [**Mini-InternVL-Chat-[2B/4B]-V1-5**](https://github.com/OpenGVLab/InternVL)🚅🎞️ | **[InternVL2 Series](https://huggingface.co/OpenGVLab/InternVL2-8B)** 🚅🎞️ | **[Janus-1.3B](https://huggingface.co/deepseek-ai/Janus-1.3B)**🚅🎞️ | **[molmoE-1B/molmo-7B/molmo-72B](https://huggingface.co/allenai/Molmo-7B-D-0924)**🚅 |
| **[Points-[Yi-1.5-9B/Qwen-2.5-7B]](https://huggingface.co/WePOINTS/POINTS-Yi-1-5-9B-Chat)**🚅 | **[NVLM](https://huggingface.co/nvidia/NVLM-D-72B)**🚅 | **[VIntern](https://huggingface.co/5CD-AI/Vintern-3B-beta)**🚅🎞️ | **[Aria](https://huggingface.co/rhymes-ai/Aria)**🚅🎞️ |
+| [**VARCO-VISION-14B**](https://huggingface.co/NCSOFT/VARCO-VISION-14B-HF)🚅 | | | |
+
🎞️ 表示支持多图片输入。
diff --git a/vlmeval/config.py b/vlmeval/config.py
index 36eb53948..6c346b57b 100644
--- a/vlmeval/config.py
+++ b/vlmeval/config.py
@@ -165,6 +165,7 @@
'Aquila-VL-2B': partial(LLaVA_OneVision, model_path='BAAI/Aquila-VL-2B-llava-qwen'),
'llava_video_qwen2_7b':partial(LLaVA_OneVision, model_path='lmms-lab/LLaVA-Video-7B-Qwen2'),
'llava_video_qwen2_72b':partial(LLaVA_OneVision, model_path='lmms-lab/LLaVA-Video-72B-Qwen2'),
+ 'varco-vision-hf':partial(LLaVA_OneVision_HF, model_path='NCSOFT/VARCO-VISION-14B-HF'),
}
internvl_series = {