@RuBing-Yang (Collaborator)

This pull request introduces a new vLLM backend for benchmarking speculative decoding in the AngelSlim project, alongside several related improvements. The main changes include implementing a modular benchmarking engine for vLLM, updating the engine interface to support multiple backends, and extending the CLI to allow backend selection and configuration. These updates make the benchmarking framework more flexible and extensible for different model serving backends.

vLLM Benchmark Integration:

  • Added a new vllm benchmarking module with BenchmarkEngine, BenchmarkConfig, and BenchmarkMode classes, supporting Eagle speculative decoding and baseline evaluation, metrics calculation, and output management.
  • Created an __init__.py for the vLLM benchmark module to expose the new benchmarking classes and added appropriate copyright/license headers.
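The PR itself defines these classes inside the new vllm module; as a rough sketch of how such a trio could fit together (all names besides the three classes, and every default value, are illustrative guesses rather than details taken from the PR):

```python
from dataclasses import dataclass
from enum import Enum


class BenchmarkMode(Enum):
    """The two evaluation modes described in the PR."""
    EAGLE = "eagle"        # Eagle speculative decoding
    BASELINE = "baseline"  # plain autoregressive decoding


@dataclass
class BenchmarkConfig:
    """Bundles the knobs one benchmark run needs."""
    model_path: str
    mode: BenchmarkMode = BenchmarkMode.BASELINE
    batch_size: int = 1
    top_p: float = 0.95
    output_dir: str = "outputs"


# Example: configure an Eagle speculative-decoding run.
config = BenchmarkConfig(model_path="my-model", mode=BenchmarkMode.EAGLE)
```

A `BenchmarkEngine` would then take a `BenchmarkConfig`, run generation in the selected mode, and write metrics under `output_dir`.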

Engine Abstraction and Backend Selection:

  • Refactored the main SpecEngine class in engine.py to support backend selection via the deploy_backend parameter, dynamically importing and using either the PyTorch or vLLM benchmarking classes.
  • Updated SpecEngine methods to use the selected backend's BenchmarkConfig, BenchmarkEngine, and BenchmarkMode throughout the workflow.
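The dynamic-import pattern described above can be sketched like this (the module paths in the lookup table are assumptions for illustration, not the repository's real package layout):

```python
import importlib


def load_benchmark_backend(deploy_backend: str):
    """Return (BenchmarkConfig, BenchmarkEngine, BenchmarkMode) for a backend.

    Hypothetical module paths; the real AngelSlim layout may differ.
    """
    backends = {
        "pytorch": "angelslim.benchmark.torch_backend",
        "vllm": "angelslim.benchmark.vllm_backend",
    }
    if deploy_backend not in backends:
        raise ValueError(f"unknown deploy_backend: {deploy_backend!r}")
    # Import the backend module only once it is actually selected, so an
    # uninstalled vLLM does not break PyTorch-only runs.
    mod = importlib.import_module(backends[deploy_backend])
    return mod.BenchmarkConfig, mod.BenchmarkEngine, mod.BenchmarkMode
```

Deferring the import to call time is the usual reason for this pattern: the CLI can always start, and only the backend the user asks for must be importable.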

CLI and Argument Improvements:

  • Extended the CLI (tools/spec_benchmark.py) to accept a --deploy-backend argument for choosing between PyTorch and vLLM, and added vLLM-specific and general benchmarking parameters (e.g., --batch-size, --top-p, --speculative-draft-tensor-parallel-size).
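A minimal argparse sketch of this CLI surface, using only the flags named above (the choices, defaults, and help strings are assumptions, not taken from tools/spec_benchmark.py):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of the benchmark CLI; defaults are illustrative guesses."""
    p = argparse.ArgumentParser(description="Speculative decoding benchmark")
    p.add_argument("--deploy-backend", choices=["pytorch", "vllm"],
                   default="pytorch",
                   help="which serving backend to benchmark")
    p.add_argument("--batch-size", type=int, default=1)
    p.add_argument("--top-p", type=float, default=0.95)
    p.add_argument("--speculative-draft-tensor-parallel-size", type=int,
                   default=1,
                   help="vLLM-specific: tensor-parallel degree for the draft model")
    return p


# Example invocation: benchmark the vLLM backend with batch size 8.
args = build_parser().parse_args(["--deploy-backend", "vllm", "--batch-size", "8"])
```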

Path Handling and Output Consistency:

  • Improved file path handling in both the PyTorch and vLLM benchmarking code to ensure output and dataset paths are robust and consistent, using project-root-relative paths.
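One common way to get project-root-relative paths of this kind is to anchor them at the source file rather than the working directory; a sketch under the assumption (not stated in the PR) that the module sits one directory below the repository root:

```python
from pathlib import Path

# Resolve the project root relative to this file, not the CWD, so output
# and dataset paths stay consistent no matter where the CLI is launched
# from. The one-level-up layout is an assumption about the repo structure.
PROJECT_ROOT = Path(__file__).resolve().parents[1]


def output_path(*parts: str) -> Path:
    """Build an output path anchored at the project root, creating parents."""
    path = PROJECT_ROOT.joinpath(*parts)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path
```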

These changes collectively enhance the benchmarking framework's flexibility, maintainability, and usability for both PyTorch and vLLM backends.

@RuBing-Yang RuBing-Yang merged commit 02eaa65 into Tencent:main Nov 4, 2025
5 checks passed