@merrymercy Could you do me a favor and take a look? Thank you for your time! 🙏
I'm trying to adapt my own LLM inference code: I have replaced the self-attention with RadixAttention, along with some other necessary components. But I found that the output is weird.
For example, when I gave "who are you" and "Which city is the capital of China?" as prompts, the model produced garbled output for both.
My model uses the official weights from meta-llama2-7b-chat, and my inference code is based on pytorch-llama.
This really confuses me. I'm not sure whether it's a problem with the attention calculation. Can anyone help me figure out what's going on here? 🙏
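In case it helps narrow things down, here is the kind of sanity check I'm running. It is plain PyTorch only, not SGLang's actual RadixAttention API; the helper name and tensor shapes below are just illustrative. The idea is to compare a hand-rolled causal attention against `F.scaled_dot_product_attention`, since garbled output after swapping the attention kernel is often caused by the causal mask, the head layout, or KV-cache positions rather than the weights themselves:

```python
# Minimal sanity check: compare a hand-rolled causal attention against
# PyTorch's reference implementation. Names and shapes are illustrative;
# plug in the tensors your attention replacement actually sees.
import torch
import torch.nn.functional as F

def naive_causal_attention(q, k, v):
    # q, k, v: [batch, n_heads, seq_len, head_dim]
    scale = q.shape[-1] ** -0.5
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale
    seq_len = q.shape[-2]
    # Upper-triangular -inf mask so each token attends only to itself and earlier tokens.
    causal_mask = torch.triu(
        torch.full((seq_len, seq_len), float("-inf"), device=q.device), diagonal=1
    )
    scores = scores + causal_mask
    probs = torch.softmax(scores, dim=-1)
    return torch.matmul(probs, v)

if __name__ == "__main__":
    torch.manual_seed(0)
    q = torch.randn(1, 8, 16, 64)
    k = torch.randn(1, 8, 16, 64)
    v = torch.randn(1, 8, 16, 64)

    reference = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    custom = naive_causal_attention(q, k, v)

    # If the replacement attention diverges on a check like this, suspect the
    # mask, head layout, or cache indexing rather than the model weights.
    print(torch.allclose(reference, custom, atol=1e-5))
```

If this kind of comparison passes for the prefill step but decoding still goes wrong, my guess would be the per-token positions or KV-cache slot mapping during incremental decoding, but I haven't confirmed that yet.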