
Conversation

@SiqiLi-Fighting
Collaborator

No description provided.

@gemini-code-assist

Summary of Changes

Hello @SiqiLi-Fighting, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a major new feature: Multi-Token Prediction (MTP) within the EAGLE speculative decoding framework for SGLang. The primary objective is to significantly boost inference throughput for large language models by enabling them to predict and verify multiple tokens concurrently, moving beyond the limitations of sequential single-token generation. This involves a comprehensive set of changes, including extending the attention mechanism to support custom, non-causal masks essential for parallel verification, and integrating a dedicated EAGLE speculative decoding worker. The modifications touch core components such as the scheduler, model executor, and attention backends, ensuring a robust and configurable implementation of this advanced decoding strategy.

Highlights

  • Multi-Token Prediction (MTP) RFC: A comprehensive Request for Comments (RFC) document has been added, detailing the proposal for implementing Multi-Token Prediction (MTP) as an enhancement to the existing EAGLE speculative decoding algorithm. This RFC outlines the motivation, goals, design, and implementation plan for MTP, aiming to significantly improve inference throughput.
  • EAGLE Speculative Decoding Core Logic: New files and extensive modifications introduce the core logic for EAGLE speculative decoding. This includes the definition of EagleDraftInput and EagleVerifyInput dataclasses, functions for managing cache locations, building the speculative tree structure, and implementing token verification algorithms like verify_tree_greedy and tree_speculative_sampling_target_only.
  • Custom Attention Mask Support: The FlashAttention kernel and backend have been significantly extended to support custom attention masks and a causal parameter. This is a crucial change that enables non-causal attention patterns, which are necessary for parallel verification of multiple speculative tokens in the EAGLE framework.
  • Modular Speculative Algorithm Framework: A new SpeculativeAlgorithm enum has been introduced, providing a structured way to define and manage different speculative decoding strategies (e.g., EAGLE, EAGLE3, STANDALONE). This allows for flexible selection and integration of various speculative decoding approaches.
  • Configurable Speculative Decoding Parameters: Numerous new command-line arguments have been added to server_args.py, allowing users to configure various aspects of speculative decoding, including the chosen algorithm, draft model path, number of speculative steps, EAGLE top-k value, number of draft tokens, and acceptance thresholds.
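The greedy verification pass described in the highlights can be sketched as a walk down the draft tree, accepting a child node only when it matches the target model's prediction at its parent. The dataclass below is a hypothetical, heavily simplified stand-in for the `EagleVerifyInput` introduced in this PR; the field names and the flat parent-pointer tree layout are illustrative assumptions, not the actual implementation:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class EagleVerifyInput:
    # Illustrative fields only -- the real dataclass in the PR differs.
    draft_tokens: List[int]        # candidate tokens, one per tree node
    parents: List[int]             # parents[i] = parent index of node i, -1 for root
    target_predictions: List[int]  # target model's greedy token at each node

def verify_tree_greedy(inp: EagleVerifyInput) -> List[int]:
    """Walk the draft tree from the root, accepting a child only when its
    draft token matches the target model's greedy prediction at the parent.
    When no child matches, commit the target's own token and stop."""
    accepted = []
    node = 0  # root corresponds to the last committed token
    while True:
        children = [i for i, p in enumerate(inp.parents) if p == node]
        match = next(
            (c for c in children
             if inp.draft_tokens[c] == inp.target_predictions[node]),
            None,
        )
        if match is None:
            # No draft branch survives: the target model's token is still
            # a valid next token, so one token is always committed per step.
            accepted.append(inp.target_predictions[node])
            break
        accepted.append(inp.draft_tokens[match])
        node = match
    return accepted
```

This illustrates why verification never regresses below normal decoding: even a fully rejected tree still yields the target model's one greedy token.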
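The custom-mask requirement in the highlights comes from the tree layout: each draft token must attend to the committed prefix plus its own ancestors, but not to sibling branches, so a plain causal mask cannot express it. A minimal sketch of such a mask builder follows; the function name, the parent-pointer encoding, and the `[prefix, draft]` key layout are assumptions for illustration, not the PR's actual kernel interface:

```python
from typing import List

def build_tree_attention_mask(parents: List[int], prefix_len: int) -> List[List[bool]]:
    """Boolean mask for parallel tree verification.

    parents[i] is the parent index of draft node i (-1 for the root).
    Returns mask[q][k] == True where draft query q may attend to key k,
    with keys laid out as [prefix tokens, then draft tokens]."""
    n = len(parents)
    mask = [[False] * (prefix_len + n) for _ in range(n)]
    for i in range(n):
        # Every draft token sees the full committed prefix...
        for k in range(prefix_len):
            mask[i][k] = True
        # ...plus itself and its chain of ancestors, but no sibling branch.
        node = i
        while node != -1:
            mask[i][prefix_len + node] = True
            node = parents[node]
    return mask
```

Passing a mask like this (in the backend's expected tensor format) is what the new `custom_mask` / `causal=False` path in the FlashAttention backend would consume.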
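The `SpeculativeAlgorithm` enum mentioned above might look roughly like the sketch below. The variant names `EAGLE`, `EAGLE3`, and `STANDALONE` come from the PR description; the `NONE` sentinel and the helper methods are illustrative guesses at the kind of API such an enum typically carries:

```python
from enum import Enum, auto

class SpeculativeAlgorithm(Enum):
    # NONE and the helpers below are hypothetical; only EAGLE, EAGLE3,
    # and STANDALONE are named in the PR description.
    NONE = auto()
    EAGLE = auto()
    EAGLE3 = auto()
    STANDALONE = auto()

    @staticmethod
    def from_string(name):
        """Map a CLI string (case-insensitive) to an enum member."""
        if not name:
            return SpeculativeAlgorithm.NONE
        return SpeculativeAlgorithm[name.upper()]

    def is_eagle(self):
        return self in (SpeculativeAlgorithm.EAGLE, SpeculativeAlgorithm.EAGLE3)
```

Centralizing the strategy in an enum lets the scheduler and model executor branch on `algorithm.is_eagle()` rather than scattering string comparisons.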
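A server launch using the new arguments might look like the fragment below. The flag spellings follow the common SGLang naming convention, but since the PR does not list exact names, treat every flag and value here as illustrative:

```shell
# Hypothetical invocation -- flag names and values are illustrative only.
python -m sglang.launch_server \
  --model-path meta-llama/Llama-3.1-8B-Instruct \
  --speculative-algorithm EAGLE \
  --speculative-draft-model-path <draft-model-path> \
  --speculative-num-steps 3 \
  --speculative-eagle-topk 4 \
  --speculative-num-draft-tokens 8
```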
Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/release-pypi.yml

@SiqiLi-Fighting force-pushed the feat/eagle-support-rebase branch from 91a2644 to dde5c16 on October 23, 2025 at 04:24
@SiqiLi-Fighting force-pushed the feat/eagle-support-rebase branch from dde5c16 to e4474e2 on October 23, 2025 at 04:25
@SiqiLi-Fighting force-pushed the feat/eagle-support-rebase branch from adb2610 to fac0b3e on October 23, 2025 at 18:32
Iamleos and others added 4 commits on October 25, 2025 at 16:01
* add llama eagle3 model file

* fix padding bug

* fix some padding problem

* rm some debug log
* qwen eagle3

* rm log