[Speculative decoding] Alignment Speculative decoding vs Continuous batching results with many requests #1171

Open
wants to merge 9 commits into
base: master

iefode
Contributor

@iefode iefode commented Nov 7, 2024

Details:

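  • Generate the first token with the main model so that speculative decoding output aligns with the continuous batching results (cb_results)
  • Set the benchmark sampling parameters for speculative decoding (SD) to average per prompt
  • Debug improvements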
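
The first item can be illustrated with a toy sketch. This is not the code from this PR or the GenAI API: the `Model` type, `main_model`, `draft_model`, and both decode functions are hypothetical stand-ins, and greedy decoding is assumed. It only shows why letting the main model produce the first token, and every accepted token after it, keeps speculative decoding output identical to what the main model would generate on its own.

```python
from typing import Callable, List

# Hypothetical stand-in: a "model" maps a token sequence to the next token
# under greedy decoding. Real models return logits; this is a toy.
Model = Callable[[List[int]], int]


def main_model(tokens: List[int]) -> int:
    """Toy main model: a deterministic function of the context."""
    return (sum(tokens) * 31 + len(tokens)) % 50


def draft_model(tokens: List[int]) -> int:
    """Toy draft model: agrees with the main model except on some contexts."""
    guess = main_model(tokens)
    return (guess + 1) % 50 if len(tokens) % 4 == 0 else guess


def greedy_decode(main: Model, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Reference path: the main model generates every token by itself."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(main(tokens))
    return tokens[len(prompt):]


def speculative_decode(main: Model, draft: Model, prompt: List[int],
                       max_new_tokens: int, num_candidates: int = 3) -> List[int]:
    """Greedy speculative decoding sketch with a main-model first token."""
    if max_new_tokens <= 0:
        return []
    tokens = list(prompt)
    # The first token comes from the main model, not from the draft.
    tokens.append(main(tokens))
    generated = 1
    while generated < max_new_tokens:
        budget = min(num_candidates, max_new_tokens - generated)
        # The draft model proposes up to `budget` candidate tokens.
        proposal: List[int] = []
        for _ in range(budget):
            proposal.append(draft(tokens + proposal))
        # The main model validates the candidates. A real implementation
        # scores all candidates in one forward pass; here they are checked
        # one by one for clarity. The main model's token is always the one
        # kept, and the first mismatch discards the remaining candidates.
        for candidate in proposal:
            expected = main(tokens)
            tokens.append(expected)
            generated += 1
            if expected != candidate:
                break
    return tokens[len(prompt):]
```

Under these assumptions, `speculative_decode(main_model, draft_model, prompt, n)` and `greedy_decode(main_model, prompt, n)` return the same tokens for any prompt, because every emitted token is the main model's own choice. The draft model only affects how many tokens could be validated per step.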

Tickets:

  • CVS-157577
  • CVS-156391

@github-actions github-actions bot added category: continuous batching Continuous batching category: sampling Sampling / Decoding algorithms category: speculative decoding Speculative decoding labels Nov 7, 2024
@github-actions github-actions bot removed the category: continuous batching Continuous batching label Nov 7, 2024
@iefode iefode removed the category: sampling Sampling / Decoding algorithms label Nov 7, 2024
@ilya-lavrenov ilya-lavrenov added this to the 2025.0 milestone Nov 11, 2024
@ilya-lavrenov ilya-lavrenov self-assigned this Nov 11, 2024
@github-actions github-actions bot added category: continuous batching Continuous batching category: sampling Sampling / Decoding algorithms category: GenAI C++ API Changes in GenAI C++ public headers labels Nov 19, 2024
@iefode
Contributor Author

iefode commented Nov 19, 2024

@ilya-lavrenov Please take a look

Labels
category: continuous batching Continuous batching category: GenAI C++ API Changes in GenAI C++ public headers category: sampling Sampling / Decoding algorithms category: speculative decoding Speculative decoding
2 participants