add vllm benchmark for spec decode #125
Merged
This pull request introduces a new vLLM backend for benchmarking speculative decoding in the AngelSlim project, alongside several related improvements. The main changes include implementing a modular benchmarking engine for vLLM, updating the engine interface to support multiple backends, and extending the CLI to allow backend selection and configuration. These updates make the benchmarking framework more flexible and extensible for different model serving backends.
vLLM Benchmark Integration:
- Added a new `vllm` benchmarking module with `BenchmarkEngine`, `BenchmarkConfig`, and `BenchmarkMode` classes, supporting Eagle speculative decoding and baseline evaluation, metrics calculation, and output management (see the sketch after this list).
- Added an `__init__.py` for the vLLM benchmark module to expose the new benchmarking classes, with appropriate copyright/license headers.
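A minimal sketch of how such a module might be organized. The class names (`BenchmarkMode`, `BenchmarkConfig`, `BenchmarkEngine`) come from the PR description; all fields, method names, defaults, and the exact vLLM speculative-decoding keyword arguments below are illustrative assumptions, not the actual AngelSlim code.

```python
# Illustrative sketch only: class names come from the PR; fields, methods,
# and the speculative-decoding kwargs are assumptions and depend on the
# installed vLLM version.
import time
from dataclasses import dataclass
from enum import Enum
from typing import Optional

from vllm import LLM, SamplingParams


class BenchmarkMode(Enum):
    BASELINE = "baseline"   # plain autoregressive decoding
    EAGLE = "eagle"         # Eagle speculative decoding


@dataclass
class BenchmarkConfig:
    model_path: str
    draft_model_path: Optional[str] = None   # Eagle draft model (hypothetical field)
    mode: BenchmarkMode = BenchmarkMode.BASELINE
    batch_size: int = 1
    max_new_tokens: int = 256
    temperature: float = 0.0
    top_p: float = 1.0
    num_speculative_tokens: int = 5


class BenchmarkEngine:
    def __init__(self, config: BenchmarkConfig):
        self.config = config
        kwargs = {}
        if config.mode is BenchmarkMode.EAGLE:
            # Exact speculative-decoding arguments vary across vLLM releases;
            # recent versions accept a `speculative_config` dict like this.
            kwargs["speculative_config"] = {
                "method": "eagle",
                "model": config.draft_model_path,
                "num_speculative_tokens": config.num_speculative_tokens,
            }
        self.llm = LLM(model=config.model_path, **kwargs)

    def run(self, prompts: list) -> dict:
        params = SamplingParams(
            temperature=self.config.temperature,
            top_p=self.config.top_p,
            max_tokens=self.config.max_new_tokens,
        )
        start = time.perf_counter()
        outputs = self.llm.generate(prompts, params)
        elapsed = time.perf_counter() - start
        # Aggregate simple throughput metrics from the generated tokens.
        generated = sum(len(o.outputs[0].token_ids) for o in outputs)
        return {
            "tokens": generated,
            "seconds": elapsed,
            "tokens_per_second": generated / elapsed,
        }
```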
Engine Abstraction and Backend Selection:
- Updated the `SpecEngine` class in `engine.py` to support backend selection via the `deploy_backend` parameter, dynamically importing and using either the PyTorch or vLLM benchmarking classes (a sketch of this pattern follows the list). [1] [2] [3]
- Updated `SpecEngine` methods to use the selected backend's `BenchmarkConfig`, `BenchmarkEngine`, and `BenchmarkMode` throughout the workflow. [1] [2] [3]
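A rough sketch of how `deploy_backend`-based selection could be wired with a lazy import. The module paths in `_BACKEND_MODULES` and the `SpecEngine` internals shown here are assumptions for illustration; the real layout lives in AngelSlim's `engine.py`.

```python
# Illustrative sketch: module paths and method bodies are assumptions,
# not the actual AngelSlim implementation.
import importlib


class SpecEngine:
    # Hypothetical mapping from backend name to benchmark module path.
    _BACKEND_MODULES = {
        "torch": "benchmark.torch",
        "vllm": "benchmark.vllm",
    }

    def __init__(self, deploy_backend: str = "torch"):
        if deploy_backend not in self._BACKEND_MODULES:
            raise ValueError(f"Unsupported deploy_backend: {deploy_backend!r}")
        # Import the backend lazily so that, e.g., vLLM only needs to be
        # installed when it is actually requested.
        module = importlib.import_module(self._BACKEND_MODULES[deploy_backend])
        self.BenchmarkConfig = module.BenchmarkConfig
        self.BenchmarkEngine = module.BenchmarkEngine
        self.BenchmarkMode = module.BenchmarkMode

    def benchmark(self, prompts, **config_kwargs):
        # All downstream steps use the selected backend's classes.
        config = self.BenchmarkConfig(**config_kwargs)
        engine = self.BenchmarkEngine(config)
        return engine.run(prompts)
```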
CLI and Argument Improvements:
- Updated the CLI (`tools/spec_benchmark.py`) to accept a `--deploy-backend` argument for choosing between PyTorch and vLLM, and added vLLM-specific and general benchmarking parameters (e.g., `--batch-size`, `--top-p`, `--speculative-draft-tensor-parallel-size`); a sketch of the parser wiring follows. [1] [2] [3] [4]
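The flag names below are taken from the PR description; the parser wiring, defaults, and help text are an illustrative sketch rather than the actual `tools/spec_benchmark.py`.

```python
# Illustrative sketch of the CLI additions; flag names come from the PR,
# defaults and help strings are assumptions.
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Speculative decoding benchmark")
    parser.add_argument("--deploy-backend", choices=["torch", "vllm"],
                        default="torch",
                        help="Serving backend to benchmark against.")
    # General benchmarking parameters.
    parser.add_argument("--batch-size", type=int, default=1)
    parser.add_argument("--top-p", type=float, default=1.0)
    # vLLM-specific speculative-decoding parameter.
    parser.add_argument("--speculative-draft-tensor-parallel-size",
                        type=int, default=1)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

With a layout like this, a vLLM run might be launched with something like `python tools/spec_benchmark.py --deploy-backend vllm --batch-size 8 --top-p 0.95` (hypothetical values).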
Path Handling and Output Consistency:

These changes collectively enhance the benchmarking framework's flexibility, maintainability, and usability for both PyTorch and vLLM backends.