Describe the issue

Hi,

I followed the instructions for running MInference on HF and ran an example where I give the model the full text of Dante's Inferno (in Italian) as well as a book from the Harry Potter series, and then ask it a few questions. I'm testing this on the Llama 3.1 8B Instruct model, but with the config modified so the sequence length is 262k. The full script is here, and it follows roughly the flow sketched below:

https://gist.github.com/YLGH/2b70d6ed10a6b5ea97404cb2668e24f3
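For reference, a minimal sketch of the patching flow being described, assuming the `MInference` HF patch API shown in the repo README; the exact prompt construction and the 262144 value for the context override are placeholders standing in for what the gist above actually does:

```python
# Minimal sketch of the MInference HF patching flow (not the exact gist script).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from minference import MInference

model_name = "meta-llama/Meta-Llama-3.1-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    # Config override so the model accepts ~262k-token prompts (assumed value).
    max_position_embeddings=262144,
)

# Patch the model's attention with MInference sparse attention.
minference_patch = MInference("minference", model_name)
model = minference_patch(model)

# Long prompt = Inferno (Italian) + a Harry Potter book + a few questions.
prompt = "..."  # placeholder; the real prompt is built in the gist
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```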
The output when using MInference attention seems to be completely off. It doesn't acknowledge the existence of Dante's Inferno at all, and says that I gave it books 1 and 2 of Harry Potter.
I also ran the same prompt through full dense attention, and it's able to distinguish the two.
Am I using the HF example correctly?
Thanks!
Also curious whether this is related to the observation that some attention heads are fully dense (as in DuoAttention)? Perhaps this is something that the benchmarks don't measure well?
Hi @YLGH, sorry, I haven't had a chance to check the previous issues yet, but I can provide a quick answer to your question.
For methods like DuoAttention and RazorAttention, I think they're quite reasonable. First, a head-level hybrid sparsity approach makes a lot of sense: some heads can handle their tasks with only an A-shape pattern (initial "sink" tokens plus a local window). A similar approach is also used in pretrained LLMs, such as those from Character.AI and Yi-Lightning, which shows its effectiveness.
However, from my perspective, this approach is not fully optimized. The main reason is that attention heads are inherently very sparse, regardless of the specific head.
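To make the A-shape pattern concrete, here is a toy sketch of such a mask versus full causal attention. This is only a conceptual illustration, not MInference's or DuoAttention's actual implementation, and the sink/window sizes are arbitrary placeholders:

```python
# Toy illustration of an "A-shape" sparse attention mask:
# always attend to a few initial "sink" tokens plus a recent local window.
import torch

def a_shape_mask(seq_len: int, n_sink: int = 4, window: int = 64) -> torch.Tensor:
    """Boolean [seq_len, seq_len] mask: True where attention is allowed."""
    q = torch.arange(seq_len).unsqueeze(1)   # query positions
    k = torch.arange(seq_len).unsqueeze(0)   # key positions
    causal = k <= q                          # standard causal constraint
    sink = k < n_sink                        # always attend to the first tokens
    local = (q - k) < window                 # and to a recent local window
    return causal & (sink | local)

mask = a_shape_mask(seq_len=1024)
dense_causal = torch.tril(torch.ones(1024, 1024, dtype=torch.bool))
ratio = (mask.sum() / dense_causal.sum()).item()
print(f"A-shape keeps {ratio:.1%} of the causal attention entries")
```

In DuoAttention's terms these would be the "streaming" heads; the point above is that even the remaining "retrieval" heads are still highly sparse, just with dynamic rather than fixed patterns.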