
Support for Llama-3_1-Nemotron-51B #10669

Merged: 5 commits merged into ggerganov:master on Dec 23, 2024

Conversation

@ymcki (Contributor) commented on Dec 5, 2024:

Make sure to read the contributing guidelines before submitting a PR

More details are here:
#10648

It seems my changes in vocab.py don't break the CI tests.

It might be a better idea not to modify vocab.py and instead ask the user to fix tokenizer_config.json. In that case, you can ignore the changes I made in vocab.py.

@github-actions bot added the python (python script changes) label on Dec 5, 2024
@bartowski1182 (Contributor) commented:

I wonder if it's also a better idea not to group this with the normal llama archs, since it requires so many changes; it may be better to make it its own model type?

@ymcki (Contributor, Author) commented on Dec 6, 2024:

> I wonder if it's also a better idea not to group this with the normal llama archs, since it requires so many changes; it may be better to make it its own model type?

I think src/llama.cpp doesn't change that much, but convert_hf_to_gguf.py has more changes. Anyway, I can create another fork to make it a separate model type and submit another pull request.

What do you think about the vocab.py problem? Should I just leave the original vocab.py as is and ask people to fix tokenizer_config.json instead?

@ymcki (Contributor, Author) commented on Dec 6, 2024:

Created a separate Deci model. This version doesn't change vocab.py and relies on people manually fixing the 51B model's tokenizer_config.json.
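For context on what splitting this out as its own model type usually involves, here is a minimal, self-contained sketch of the common llama.cpp-style pattern: a new architecture enum value plus a name mapping that the converter would write into the model metadata and the loader would dispatch on. This is only an illustration under that assumption; the identifiers below are placeholders, not the symbols actually added by this PR.

    // Illustrative sketch only, not the actual patch: registering a separate
    // model type as an architecture enum value plus a name mapping.
    #include <cstdio>
    #include <map>
    #include <string>

    enum llm_arch_example {            // placeholder enum, not llama.cpp's real llm_arch
        ARCH_LLAMA_EXAMPLE,
        ARCH_DECI_EXAMPLE,             // hypothetical entry for the separate Deci model type
    };

    static const std::map<llm_arch_example, std::string> ARCH_NAME_EXAMPLE = {
        { ARCH_LLAMA_EXAMPLE, "llama" },
        { ARCH_DECI_EXAMPLE,  "deci"  },
    };

    int main() {
        // The converter would write this name into the model file's metadata, and the
        // loader would select the matching graph-building code for the architecture.
        std::printf("architecture: %s\n", ARCH_NAME_EXAMPLE.at(ARCH_DECI_EXAMPLE).c_str());
        return 0;
    }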

@ymcki (Contributor, Author) commented on Dec 11, 2024:

Any updates?

src/llama.cpp (Outdated), review comment by the repository Owner on lines 11063 to 11065:

    if (n_head == 0) // attention-free layer of Llama-3_1-Nemotron-51B
        cur = inpL;
    else {

Suggested change:

    if (n_head == 0) { // attention-free layer of Llama-3_1-Nemotron-51B
        cur = inpL;
    } else {

src/llama.cpp (Outdated), review comment by the repository Owner on lines 11076 to 11078:

    } else if (n_head > 0)
    // self-attention
    {

Suggested change:

    } else if (n_head > 0) {
        // self-attention

@ngxson (Collaborator) commented on Dec 12, 2024:

To fix the editorconfig / flake8 checks, you need to modify your source code to remove trailing spaces and add missing final newlines.

And to fix the server CI, you need to merge the latest commits from the master branch.

@ymcki (Contributor, Author) commented on Dec 15, 2024:

Can someone approve the workflows?

@ymcki reopened this on Dec 19, 2024
@ymcki (Contributor, Author) commented on Dec 19, 2024:

Yay! Finally passed all checks! :)

src/llama.cpp (Outdated), review comment by a Collaborator:

        cur = llm_build_lora_mm(lctx, ctx0, model.layers[il].wo, cur);
        cb(cur, "wo", il);
    } else if (n_head > 0) {
        // self-attention

Suggested change: re-indent the // self-attention comment (the old and new lines differ only in leading whitespace).

src/llama.cpp (Outdated), review comment by a Collaborator:

    const int64_t n_head_kv = hparams.n_head_kv(il);
    const int64_t n_head = hparams.n_head(il);

    if (n_head == 0) { // attention-free layer of Llama-3_1-Nemotron-51B

Suggested change:

    if (n_head == 0) {
        // attention-free layer of Llama-3_1-Nemotron-51B

src/llama.cpp (Outdated), review comment by a Collaborator:

        cb(cur, "attn_norm", il);
    }

    if (n_head > 0 && n_head_kv == 0) { // "linear attention" of Llama-3_1-Nemotron-51B

Suggested change:

    if (n_head > 0 && n_head_kv == 0) {
        // "linear attention" of Llama-3_1-Nemotron-51B

@slaren merged commit 6f0c9e0 into ggerganov:master on Dec 23, 2024 (50 checks passed).