Merge branch 'EvolvingLMMs-Lab:main' into main
99Franklin authored Mar 3, 2025
2 parents a628d51 + 9310d89 commit 5cb2d6c
Showing 77 changed files with 2,962 additions and 151 deletions.
3 changes: 3 additions & 0 deletions .github/workflows/lint.yml
@@ -7,6 +7,9 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          submodules: true
          fetch-depth: 0
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
3 changes: 2 additions & 1 deletion .gitignore
@@ -42,4 +42,5 @@ VATEX/
lmms_eval/tasks/vatex/__pycache__/utils.cpython-310.pyc
lmms_eval/tasks/mlvu/__pycache__/utils.cpython-310.pyc

scripts/
scripts/
.env
15 changes: 7 additions & 8 deletions README.md
@@ -20,18 +20,14 @@

## Annoucement

- [2025-02] 🚀🚀 We have integrated [`vllm`](https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/544) into our models, enabling accelerated evaluation for both multimodal and language models. Additionally, we have incorporated [`openai_compatible`](https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/546) to support the evaluation of any API-based model that follows the OpenAI API format. Check the usages [here](https://github.com/EvolvingLMMs-Lab/lmms-eval/tree/main/miscs/model_dryruns).
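
  As a rough sketch of how the new `openai_compatible` entry point can be exercised, a command along the following lines should work. The `model_args` key, the credential variable, and the task name are assumptions for illustration rather than details taken from this commit; the maintained dry-run examples live under `miscs/model_dryruns`.

  ```bash
  # Illustrative sketch only: evaluate an API-served model through the
  # `openai_compatible` backend. `model_version`, the task choice, and the
  # credential variable are assumptions; see miscs/model_dryruns in the
  # repository for the maintained examples.
  export OPENAI_API_KEY="sk-..."   # assumed credential variable

  python3 -m lmms_eval \
      --model openai_compatible \
      --model_args model_version=gpt-4o-mini \
      --tasks mme \
      --batch_size 1 \
      --log_samples \
      --output_path ./logs/
  ```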

- [2025-01] 🎓🎓 We have released our new benchmark: [Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos](https://arxiv.org/abs/2501.13826). Please refer to the [project page](https://videommmu.github.io/) for more details.

- [2024-12] 🎉🎉 We have presented [MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs](https://arxiv.org/pdf/2411.15296), jointly with [MME Team](https://github.com/BradyFU/Video-MME) and [OpenCompass Team](https://github.com/open-compass).

- [2024-11] 🔈🔊 The `lmms-eval/v0.3.0` has been upgraded to support audio evaluations for audio models like Qwen2-Audio and Gemini-Audio across tasks such as AIR-Bench, Clotho-AQA, LibriSpeech, and more. Please refer to the [blog](https://github.com/EvolvingLMMs-Lab/lmms-eval/blob/main/docs/lmms-eval-0.3.md) for more details!

- [2024-07] 🎉🎉 We have released the [technical report](https://arxiv.org/abs/2407.12772) and [LiveBench](https://huggingface.co/spaces/lmms-lab/LiveBench)!

- [2024-06] 🎬🎬 The `lmms-eval/v0.2.0` has been upgraded to support video evaluations for video models like LLaVA-NeXT Video and Gemini 1.5 Pro across tasks such as EgoSchema, PerceptionTest, VideoMME, and more. Please refer to the [blog](https://lmms-lab.github.io/posts/lmms-eval-0.2/) for more details!

- [2024-03] 📝📝 We have released the first version of `lmms-eval`, please refer to the [blog](https://lmms-lab.github.io/posts/lmms-eval-0.1/) for more details!

<details>
<summary>We warmly welcome contributions from the open-source community! Below is a chronological list of recent tasks, models, and features added by our amazing contributors. </summary>

@@ -42,6 +38,9 @@
- [2024-09] ⚙️️⚙️️️️ We upgrade `lmms-eval` to `0.2.3` with more tasks and features. We support a compact set of language tasks evaluations (code credit to [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness)), and we remove the registration logic at start (for all models and tasks) to reduce the overhead. Now `lmms-eval` only launches necessary tasks/models. Please check the [release notes](https://github.com/EvolvingLMMs-Lab/lmms-eval/releases/tag/v0.2.3) for more details.
- [2024-08] 🎉🎉 We welcome the new model [LLaVA-OneVision](https://huggingface.co/papers/2408.03326), [Mantis](https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/162), new tasks [MVBench](https://huggingface.co/datasets/OpenGVLab/MVBench), [LongVideoBench](https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/117), [MMStar](https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/158). We provide new feature of SGlang Runtime API for llava-onevision model, please refer the [doc](https://github.com/EvolvingLMMs-Lab/lmms-eval/blob/main/docs/commands.md) for inference acceleration
- [2024-07] 👨‍💻👨‍💻 The `lmms-eval/v0.2.1` has been upgraded to support more models, including [LongVA](https://github.com/EvolvingLMMs-Lab/LongVA), [InternVL-2](https://github.com/OpenGVLab/InternVL), [VILA](https://github.com/NVlabs/VILA), and many more evaluation tasks, e.g. [Details Captions](https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/136), [MLVU](https://arxiv.org/abs/2406.04264), [WildVision-Bench](https://huggingface.co/datasets/WildVision/wildvision-arena-data), [VITATECS](https://github.com/lscpku/VITATECS) and [LLaVA-Interleave-Bench](https://llava-vl.github.io/blog/2024-06-16-llava-next-interleave/).
- [2024-07] 🎉🎉 We have released the [technical report](https://arxiv.org/abs/2407.12772) and [LiveBench](https://huggingface.co/spaces/lmms-lab/LiveBench)!
- [2024-06] 🎬🎬 The `lmms-eval/v0.2.0` has been upgraded to support video evaluations for video models like LLaVA-NeXT Video and Gemini 1.5 Pro across tasks such as EgoSchema, PerceptionTest, VideoMME, and more. Please refer to the [blog](https://lmms-lab.github.io/posts/lmms-eval-0.2/) for more details!
- [2024-03] 📝📝 We have released the first version of `lmms-eval`, please refer to the [blog](https://lmms-lab.github.io/posts/lmms-eval-0.1/) for more details!

</details>

@@ -194,8 +193,8 @@ python3 -m accelerate.commands.launch --num_processes=8 -m lmms_eval --config ./
**Evaluation of video model (llava-next-video-32B)**
```bash
accelerate launch --num_processes 8 --main_process_port 12345 -m lmms_eval \
--model llavavid \
--model_args pretrained=lmms-lab/LLaVA-NeXT-Video-32B-Qwen,conv_template=qwen_1_5,video_decode_backend=decord,max_frames_num=32mm_spatial_pool_mode=average,mm_newline_position=grid,mm_resampler_location=after \
--model llava_vid \
--model_args pretrained=lmms-lab/LLaVA-NeXT-Video-32B-Qwen,conv_template=qwen_1_5,video_decode_backend=decord,max_frames_num=32,mm_spatial_pool_mode=average,mm_newline_position=grid,mm_resampler_location=after \
--tasks videomme \
--batch_size 1 \
--log_samples \
24 changes: 14 additions & 10 deletions lmms_eval/models/__init__.py
@@ -11,6 +11,7 @@
logger.add(sys.stdout, level="WARNING")

AVAILABLE_MODELS = {
"aria": "Aria",
"auroracap": "AuroraCap",
"batch_gpt4": "BatchGPT4",
"claude": "Claude",
@@ -21,44 +21,47 @@
"gpt4v": "GPT4V",
"idefics2": "Idefics2",
"instructblip": "InstructBLIP",
"internvideo2": "InternVideo2",
"internvl": "InternVLChat",
"internvl2": "InternVL2",
"llama_vid": "LLaMAVid",
"llama_vision": "LlamaVision",
"llava": "Llava",
"llava_hf": "LlavaHf",
"llava_onevision": "Llava_OneVision",
"llava_onevision_moviechat": "Llava_OneVision_MovieChat",
"llava_sglang": "LlavaSglang",
"llava_vid": "LlavaVid",
"slime": "Slime",
"longva": "LongVA",
"mantis": "Mantis",
"minicpm_v": "MiniCPM_V",
"minimonkey": "MiniMonkey",
"moviechat": "MovieChat",
"mplug_owl_video": "mplug_Owl",
"ola": "Ola",
"openai_compatible": "OpenAICompatible",
"oryx": "Oryx",
"phi3v": "Phi3v",
"qwen_vl": "Qwen_VL",
"qwen2_vl": "Qwen2_VL",
"qwen2_5_vl": "Qwen2_5_VL",
"qwen2_5_vl_interleave": "Qwen2_5_VL_Interleave",
"qwen2_audio": "Qwen2_Audio",
"qwen2_vl": "Qwen2_VL",
"qwen_vl": "Qwen_VL",
"qwen_vl_api": "Qwen_VL_API",
"reka": "Reka",
"ross": "Ross",
"slime": "Slime",
"srt_api": "SRT_API",
"tinyllava": "TinyLlava",
"videoChatGPT": "VideoChatGPT",
"videochat2": "VideoChat2",
"video_llava": "VideoLLaVA",
"vila": "VILA",
"vita": "VITA",
"vllm": "VLLM",
"xcomposer2_4KHD": "XComposer2_4KHD",
"internvideo2": "InternVideo2",
"xcomposer2d5": "XComposer2D5",
"oryx": "Oryx",
"videochat2": "VideoChat2",
"llama_vision": "LlamaVision",
"aria": "Aria",
"ross": "Ross",
"vita": "VITA",
"egogpt": "EgoGPT",
}
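
For orientation, the keys of `AVAILABLE_MODELS` are the identifiers accepted by the CLI's `--model` flag, and each value names the implementing class. A hypothetical invocation of one of the entries listed above might look like the sketch below; the `pretrained` checkpoint, the `model_args` spelling, and the task are illustrative assumptions, not part of this commit.

```bash
# Hypothetical sketch: select a backend by its AVAILABLE_MODELS key.
# The checkpoint, model_args key, and task below are assumptions for illustration.
accelerate launch --num_processes 8 -m lmms_eval \
    --model qwen2_5_vl \
    --model_args pretrained=Qwen/Qwen2.5-VL-7B-Instruct \
    --tasks mme \
    --batch_size 1 \
    --log_samples \
    --output_path ./logs/
```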


6 changes: 3 additions & 3 deletions lmms_eval/models/aria.py
@@ -106,12 +106,12 @@ def __init__(
elif accelerator.num_processes == 1 and device_map == "auto":
eval_logger.info(f"Using {accelerator.num_processes} devices with pipeline parallelism")
self._rank = 0
self._word_size = 1
self._world_size = 1
else:
eval_logger.info(f"Using single device: {self._device}")
self.model.to(self._device)
self._rank = 0
self._word_size = 1
self._world_size = 1
self.accelerator = accelerator

@property
@@ -303,7 +303,7 @@ def _collate(x):
"""
keywords = [
"Answer:",
"answer is:", "choice is:", "option is:",
"answer is:", "choice is:", "option is:",
"Answer is:", "Choice is:", "Option is:",
"answer is", "choice is", "option is",
"Answer is", "Choice is", "Option is"
2 changes: 1 addition & 1 deletion lmms_eval/models/auroracap.py
@@ -159,7 +159,7 @@ def __init__(
else:
self.model.to(self._device)
self._rank = 0
self._word_size = 1
self._world_size = 1

# For Video Caption
self.video_decode_backend = video_decode_backend
2 changes: 1 addition & 1 deletion lmms_eval/models/cogvlm2.py
@@ -71,7 +71,7 @@ def __init__(
self._world_size = self.accelerator.num_processes
else:
self._rank = 0
self._word_size = 1
self._world_size = 1

@property
def config(self):