Commit 31f149e

author
Guo Chen
committed
update
1 parent bbdb0e7 commit 31f149e

1 file changed: +8 -1 lines changed

index.html

Lines changed: 8 additions & 1 deletion
@@ -912,6 +912,13 @@ <h1 data-aos="fade-down" data-aos-duration="1000">
 
 <section class="description_noborder">
 <div class="description-content" data-aos="fade-up" data-aos-duration="1000">
+<!-- We introduce <strong><span style="color: #76b900;" class="highlight-box">Eagle 2.5</span></strong>, a frontier vision-language model (VLM) for long-context multimodal learning.
+Our work addresses the challenges in long video comprehension and high-resolution image understanding, introducing a generalist framework for both tasks.
+The proposed training framework incorporates Automatic Degrade Sampling and Image Area Preservation, two techniques that preserve contextual integrity and visual details.
+The framework also includes numerous efficiency optimizations in the pipeline for long-context data training.
+Finally, we propose <strong><span style="color: #76b900;" class="highlight-box">Eagle-Video-110K</span></strong>, a novel dataset that integrates both story-level and clip-level annotations, facilitating long-video understanding.
+<strong><span style="color: #76b900;" class="highlight-box">Eagle 2.5</span></strong> demonstrates substantial improvements on long-context multimodal benchmarks, providing a robust solution to the limitations of existing VLMs.
+Notably, our best model <strong><span style="color: #76b900;" class="highlight-box">Eagle 2.5-8B</span></strong> achieves 72.4% on Video-MME with 512 input frames, matching the results of top-tier commercial models such as GPT-4o and large-scale open-source models like Qwen2.5-VL-72B and InternVL2.5-78B. -->
 <p style="margin-bottom: 20px; display: flex; align-items: flex-start;">
 <img src="asset/ps3_ikon.svg" width="50" height="50" style="flex-shrink: 0; margin-right: 25px; margin-top: 4px; filter: invert(55%) sepia(96%) saturate(1747%) hue-rotate(42deg) brightness(97%) contrast(106%);" class="pulse-animation">
 <span><strong><span style="color: #76b900;" class="highlight-box">Eagle 2.5</span></strong> is a versatile multimodal model designed to efficiently process <strong>extensive contextual information</strong> with consistent performance scaling as input length increases.</span>
@@ -925,7 +932,7 @@ <h1 data-aos="fade-down" data-aos-duration="1000">
 <span><strong><span style="color: #76b900;">Progressive training</span></strong> incrementally expands context length during training, enhancing the model's ability to process <strong>inputs of varying sizes</strong>.</span>
 </p>
 <p style="margin-bottom: 20px; margin-left: 5px; display: flex; align-items: flex-start;">
-<img src="asset/4kpro_ikon.svg" width="40" height="40" style="flex-shrink: 0; margin-right: 30px; margin-top: 4px; filter: invert(55%) sepia(96%) saturate(1747%) hue-rotate(42deg) brightness(97%) contrast(106%);">
+<i class="fas fa-database" style="flex-shrink: 0; margin-right: 30px; margin-top: 4px; font-size: 40px; color: #76b900;"></i>
 <span><strong><span style="color: #76b900;">Eagle-Video-110K</span></strong> is a diverse video dataset with <strong>dual annotation approaches</strong> for comprehensive long-form understanding.</span>
 </p>
 </div>
