update

Guo Chen · Guo Chen · commit 31f149ea08f4 · 2025-04-19T01:58:51.000+08:00
diff --git a/index.html b/index.html
@@ -912,6 +912,13 @@ <h1 data-aos="fade-down" data-aos-duration="1000">
 
         <section class="description_noborder">
             <div class="description-content" data-aos="fade-up" data-aos-duration="1000">
+                <!-- We introduce <strong><span style="color: #76b900;" class="highlight-box">Eagle 2.5</span></strong>, a family of frontier vision-language models (VLMs) for long-context multimodal learning. 
+                Our work addresses the challenges of long video comprehension and high-resolution image understanding, introducing a generalist framework for both tasks. 
+                The proposed training framework incorporates Automatic Degrade Sampling and Image Area Preservation, two techniques that preserve contextual integrity and visual details. 
+                The framework also includes numerous efficiency optimizations in the training pipeline for long-context data. 
+                Finally, we propose <strong><span style="color: #76b900;" class="highlight-box">Eagle-Video-110K</span></strong>, a novel dataset that integrates both story-level and clip-level annotations, facilitating long-video understanding. 
+                <strong><span style="color: #76b900;" class="highlight-box">Eagle 2.5</span></strong> demonstrates substantial improvements on long-context multimodal benchmarks, addressing key limitations of existing VLMs. 
+                Notably, our best model, <strong><span style="color: #76b900;" class="highlight-box">Eagle 2.5-8B</span></strong>, achieves 72.4% on Video-MME with 512 input frames, matching the results of top-tier commercial models such as GPT-4o and large-scale open-source models such as Qwen2.5-VL-72B and InternVL2.5-78B. -->
                 <p style="margin-bottom: 20px; display: flex; align-items: flex-start;">
                     <img src="asset/ps3_ikon.svg" width="50" height="50" style="flex-shrink: 0; margin-right: 25px; margin-top: 4px; filter: invert(55%) sepia(96%) saturate(1747%) hue-rotate(42deg) brightness(97%) contrast(106%);" class="pulse-animation">
                     <span><strong><span style="color: #76b900;" class="highlight-box">Eagle 2.5</span></strong> is a versatile multimodal model designed to efficiently process <strong>extensive contextual information</strong> with consistent performance scaling as input length increases.</span>
@@ -925,7 +932,7 @@ <h1 data-aos="fade-down" data-aos-duration="1000">
                     <span><strong><span style="color: #76b900;">Progressive training</span></strong> incrementally expands context length during training, enhancing the model's ability to process <strong>inputs of varying sizes</strong>.</span>
                 </p>
                 <p style="margin-bottom: 20px; margin-left: 5px; display: flex; align-items: flex-start;">
-                    <img src="asset/4kpro_ikon.svg" width="40" height="40" style="flex-shrink: 0; margin-right: 30px; margin-top: 4px; filter: invert(55%) sepia(96%) saturate(1747%) hue-rotate(42deg) brightness(97%) contrast(106%);">
+                    <i class="fas fa-database" style="flex-shrink: 0; margin-right: 30px; margin-top: 4px; font-size: 40px; color: #76b900;"></i>
                     <span><strong><span style="color: #76b900;">Eagle-Video-110K</span></strong> is a diverse video dataset with <strong>dual annotation approaches</strong> for comprehensive long-form understanding.</span>
                 </p>
             </div>