Llama #35

Open
wants to merge 34 commits into
base: main

laggui (Member) commented May 22, 2024

Bringing the first official Llama implementation to Burn! With pre-trained weights in mpk format (hosted on HF hub).

  • Llama decoder-only transformer architecture
  • Llama 3
    • Tiktoken tokenizer
    • Pre-trained weights
  • TinyLlama
    • Sentencepiece tokenizer
    • Pre-trained weights

Currently the top-p sampling is done on CPU before decoding since Burn is missing categorical distribution sampling. We could improve that once everything else is done.
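
For readers unfamiliar with the technique, here is a minimal sketch of top-p (nucleus) sampling on the CPU. It is not the code from this PR's `sampling.rs`: the function name `sample_top_p` and its signature are hypothetical, the input is assumed to be a softmax output copied from the tensor into a plain slice, and the uniform draw `u` is passed in by the caller so the sketch needs no RNG crate.

```rust
/// Top-p (nucleus) sampling over a probability vector, done on the CPU.
/// Hypothetical helper for illustration; not taken from the PR.
/// `probs` is assumed to be a softmax output (non-negative, sums to ~1).
/// `u` is a uniform draw in [0, 1) supplied by the caller.
/// Returns the sampled token id, or `None` if `probs` is empty.
fn sample_top_p(probs: &[f32], top_p: f32, u: f32) -> Option<usize> {
    if probs.is_empty() {
        return None;
    }

    // Sort token ids by descending probability.
    let mut order: Vec<usize> = (0..probs.len()).collect();
    order.sort_by(|&a, &b| probs[b].total_cmp(&probs[a]));

    // Keep the smallest prefix whose cumulative mass reaches `top_p`.
    let mut kept = 0;
    let mut mass = 0.0f32;
    for &id in &order {
        mass += probs[id];
        kept += 1;
        if mass >= top_p {
            break;
        }
    }
    let nucleus = &order[..kept];

    // Draw from the renormalised nucleus by inverting its CDF:
    // scaling `u` by `mass` is equivalent to dividing each kept
    // probability by `mass`.
    let target = u * mass;
    let mut cumulative = 0.0f32;
    for &id in nucleus {
        cumulative += probs[id];
        if target < cumulative {
            return Some(id);
        }
    }

    // Floating-point rounding fallback: return the last kept token.
    nucleus.last().copied()
}
```

With `probs = [0.1, 0.6, 0.3]` and `top_p = 0.8`, the nucleus is tokens 1 and 2, so token 0 can never be drawn. A native categorical sampler in Burn would let the sort, cutoff, and draw stay on the device instead of copying the probabilities to the host.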

Closes #20

laggui marked this pull request as ready for review May 24, 2024 18:47

laggui (Member, Author) commented May 24, 2024

Currently downloading the Llama 3 8B Instruct weights so that a chat mode is available for Llama 3 as well.

Also need to update the README to provide a bit more info.

Otherwise everything is ready to go 💪

Edit: a small note: even TinyLlama's record takes ~50 s to load on my machine, so we could try to improve that, but that is on Burn's side.

laggui (Member, Author) commented May 29, 2024

Tested with wgpu and tch (gpu). I think this is ready for review!

TinyLlama results on my dev machine:

Wgpu

Loading record...
Loaded in 20s
Processing prompt: How many helicopters can a human eat in one sitting?
> It's impossible to know for certain how many helicopters a human can eat in one sitting. However, it's generally accepted that humans have a limited appetite and can only eat a small amount of food at a time.

50 tokens generated (3.5432 tokens/s)

Generation completed in 0m14s

LibTorch<f16>

Loading record...
Loaded in 18s
Processing prompt: How many helicopters can a human eat in one sitting?
> It's impossible to know for certain how many helicopters a human can eat in one sitting. However, it's generally accepted that humans have a limited appetite and can only eat a small amount of food at a time.

50 tokens generated (21.6305 tokens/s)

Generation completed in 0m2s

Pretty big difference 😅
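
To put a number on the gap, the reported throughputs can be compared directly. The sketch below only redoes the arithmetic from the two logs above; the helper names `generation_seconds` and `speedup` are made up for illustration.

```rust
/// Time in seconds implied by a token count and a reported throughput.
fn generation_seconds(tokens: u32, tokens_per_second: f64) -> f64 {
    tokens as f64 / tokens_per_second
}

/// How many times faster the `fast` throughput is than the `slow` one.
fn speedup(fast: f64, slow: f64) -> f64 {
    fast / slow
}
```

Plugging in the logged values, 50 tokens at 3.5432 tokens/s is about 14.1 s and 50 tokens at 21.6305 tokens/s is about 2.3 s, consistent with the rounded `0m14s` and `0m2s` lines. The ratio of the two throughputs is roughly 6.1x in favour of LibTorch<f16>.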

nathanielsimard (Member) left a comment

LGTM, I have only one small comment, but otherwise very good job! 👏

llama-burn/src/sampling.rs (outdated, resolved)

laggui (Member, Author) commented Aug 7, 2024

Weights have been updated to use the named mpk format (much faster now that data is treated as bytes with serde). In follow-up PRs we will add quantization and support for Llama 3.1.

Development

Successfully merging this pull request may close these issues.

Implement Llama 3 model variants
2 participants