Dear all,
I am the person who added Llama-3_1-Nemotron-51B support to llama.cpp.
ggml-org/llama.cpp#10669
I tried to add this support to exllamav2 and came up with a hack that
can convert and run inference on Llama-3_1-Nemotron-51B.
While the hack works, I am not sure it is the best way to implement this,
as it changes quite a lot of code in exllamav2.
This is because the current exllamav2 codebase is not designed for the case
where different layers of an LLM can have a different number of key_value_heads and
different structures, as with DeciLMForCausalLM (this 51B model) and Apple's
OpenELMForCausalLM.
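To make the per-layer variation concrete, here is a minimal sketch of how a loader might plan heterogeneous layers. The config keys and the `build_layer_plan` helper are hypothetical, for illustration only; they are not the actual exllamav2 or Hugging Face config names.

```python
# Hypothetical sketch of per-layer heterogeneous configs, as in
# DeciLMForCausalLM / OpenELMForCausalLM. Key names are illustrative.

def build_layer_plan(block_configs):
    """For each layer, decide which attention variant to build and
    how many kv heads it uses (0 for a linear_attn layer, which has
    no Q/K/V projections and no KV cache)."""
    plan = []
    for i, cfg in enumerate(block_configs):
        if cfg.get("linear_attn"):
            plan.append((i, "linear_attn", 0))
        else:
            plan.append((i, "attention", cfg["num_key_value_heads"]))
    return plan

blocks = [
    {"num_key_value_heads": 8},   # regular attention, 8 kv heads
    {"linear_attn": True},        # attention replaced by a linear layer
    {"num_key_value_heads": 2},   # regular attention, fewer kv heads
]
print(build_layer_plan(blocks))
# → [(0, 'attention', 8), (1, 'linear_attn', 0), (2, 'attention', 2)]
```

A codebase that assumes one global `num_key_value_heads` cannot represent a plan like this, which is why the changes ended up touching so many files.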
For this 51B model, there are three types of layers; one of them is
a linear_attn, in which the input is simply multiplied by a weight matrix.
Also, for this model, the number of kv_heads and the intermediate_size can differ
from layer to layer.
As a result, there are quite a lot of changes to the code in my fork. I also added
a file called linear_attn.py, which defines ExLlamaV2LinearAttention to handle the
linear attention layers.
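Stripped of exllamav2's module machinery, the core operation a linear attention layer performs can be sketched like this. The class below is illustrative only (plain Python, no torch) and is not the actual ExLlamaV2LinearAttention implementation; it just shows that the layer reduces to a single matmul with no Q/K/V projections and no KV cache.

```python
# Illustrative-only sketch of a linear_attn layer: the hidden state is
# multiplied by one weight matrix, replacing the whole attention block.

class LinearAttentionSketch:
    def __init__(self, weight):
        self.weight = weight              # weight: [dim_in][dim_out]

    def forward(self, hidden):            # hidden: [seq][dim_in]
        out = []
        for row in hidden:
            # one output value per column of the weight matrix
            out.append([
                sum(x * w for x, w in zip(row, col))
                for col in zip(*self.weight)
            ])
        return out

attn = LinearAttentionSketch([[1.0, 0.0], [0.0, 2.0]])
print(attn.forward([[3.0, 4.0]]))
# → [[3.0, 8.0]]
```

Because there is no KV cache involved, such a layer also sidesteps the per-layer kv_heads bookkeeping entirely, which is part of why it needed its own module.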
While it runs without errors in my testing so far, I am not sure it
covers all situations. It may be better to wait for a rewrite that accommodates
models with per-layer variation like DeciLMForCausalLM and OpenELMForCausalLM.
It would be great if this hack could serve as a starting point for such a rewrite and
allow me to add the support later as a cleaner contribution.
Thank you very much for your time.