
[Feature]: Add support for attention score output #11365

Open · 1 task done
WoutDeRijck opened this issue Dec 20, 2024 · 5 comments

@WoutDeRijck

WoutDeRijck commented Dec 20, 2024

🚀 The feature, motivation and pitch

Problem

vLLM currently doesn't provide access to attention scores during inference, which are essential for model analysis and interpretability research. #11862

Feature Request

Add the ability to retrieve attention scores during model inference, similar to HuggingFace's output_attentions=True parameter.
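For reference, this is roughly how the transformers API exposes this today (the checkpoint name and prompt below are just examples):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example checkpoint; any HF causal LM should behave the same.
model_id = "meta-llama/Llama-3.1-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# attn_implementation="eager" is needed because the SDPA/flash-attention
# backends do not materialize the attention weights.
model = AutoModelForCausalLM.from_pretrained(model_id, attn_implementation="eager")

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model(**inputs, output_attentions=True)

# outputs.attentions is a tuple with one tensor per layer, each of shape
# (batch_size, num_heads, seq_len, seq_len)
print(len(outputs.attentions), outputs.attentions[0].shape)
```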

Motivation

  • Need to analyze token-level relationships in model outputs
  • Required for building visualization tools and debugging model behavior
  • Critical for research into attention mechanisms

Alternatives

No response

Additional context

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
@Dineshkumar-Anandan-ZS0367

Are you asking about output_attentions=True or return_cross_attentions=True to get coordinates?

Those are only provided by vision encoder-decoder models or cross-encoder models.

Which model are you using?

@WoutDeRijck
Author

WoutDeRijck commented Jan 9, 2025

I don't mean coordinates. I am using Llama-3.1-8B. Say I want to extract data from the input context; then I need the attention scores to visualize where the model is attending (purely text-based, no vision).

These are of course also present in decoder-only models.
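In the meantime, a rough workaround is to run the same checkpoint through transformers and plot the weights; a minimal sketch (last-layer, head-averaged attention is just one possible reduction):

```python
import matplotlib.pyplot as plt
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Eager attention so that the weights are actually materialized.
model = AutoModelForCausalLM.from_pretrained(model_id, attn_implementation="eager")

inputs = tokenizer("Paris is the capital of France.", return_tensors="pt")
outputs = model(**inputs, output_attentions=True)

# Last layer, first batch element, averaged over heads -> (seq_len, seq_len)
attn = outputs.attentions[-1][0].mean(dim=0)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
plt.imshow(attn.detach().float().numpy(), cmap="viridis")
plt.xticks(range(len(tokens)), tokens, rotation=90)
plt.yticks(range(len(tokens)), tokens)
plt.colorbar(label="attention weight")
plt.tight_layout()
plt.show()
```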

@Dineshkumar-Anandan-ZS0367

Apologies for the mistake. I have already integrated scores using the logits tensor. Thanks!

@WoutDeRijck
Author

I do not need the logits either; I need the attention scores.

@HuiSiqi

HuiSiqi commented Jan 15, 2025

Any update on this? I also need to visualize the attention scores of decoder-only models.
