@merrymercy Could you do me a favor and take a look? Thank you for your time! 🙏
I'm trying to adapt my own LLM inference code: I have replaced the self-attention with RadixAttention, along with some other necessary components. But I found that the output is weird.
For example, when I gave "who are you" and "Which city is the capital of China?" as prompts, the model produced garbled output for both.
My model uses the official weights from meta-llama2-7b-chat, and my inference code is based on pytorch-llama.
This really confuses me. I'm not sure whether it's a problem with the attention calculation. Can anyone help me figure out what's going on here? 🙏
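In case it helps narrow things down, here is the kind of sanity check I'm running. It is plain PyTorch only, not SGLang's actual RadixAttention API; the helper name and tensor shapes below are just illustrative. The idea is to compare a hand-rolled causal attention against `F.scaled_dot_product_attention`, since garbled output after swapping the attention kernel is often caused by the causal mask, the head layout, or KV-cache positions rather than the weights themselves:

```python
# Minimal sanity check: compare a hand-rolled causal attention against
# PyTorch's reference implementation. Names and shapes are illustrative;
# plug in the tensors your attention replacement actually sees.
import torch
import torch.nn.functional as F

def naive_causal_attention(q, k, v):
    # q, k, v: [batch, n_heads, seq_len, head_dim]
    scale = q.shape[-1] ** -0.5
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale
    seq_len = q.shape[-2]
    # Upper-triangular -inf mask so each token attends only to itself and earlier tokens.
    causal_mask = torch.triu(
        torch.full((seq_len, seq_len), float("-inf"), device=q.device), diagonal=1
    )
    scores = scores + causal_mask
    probs = torch.softmax(scores, dim=-1)
    return torch.matmul(probs, v)

if __name__ == "__main__":
    torch.manual_seed(0)
    q = torch.randn(1, 8, 16, 64)
    k = torch.randn(1, 8, 16, 64)
    v = torch.randn(1, 8, 16, 64)

    reference = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    custom = naive_causal_attention(q, k, v)

    # If the replacement attention diverges on a check like this, suspect the
    # mask, head layout, or cache indexing rather than the model weights.
    print(torch.allclose(reference, custom, atol=1e-5))
```

If this kind of comparison passes for the prefill step but decoding still goes wrong, my guess would be the per-token positions or KV-cache slot mapping during incremental decoding, but I haven't confirmed that yet.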