Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add llama C++ example #926

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

Conversation

natke
Copy link
Contributor

@natke natke commented Sep 25, 2024

No description provided.

// Show usage of GetOutput
std::unique_ptr<OgaTensor> output_logits = generator->GetOutput("logits");

// Assuming output_logits.Type() is float as it's logits
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just fyi, the raw model output can be float16 if the model runs on cuda. Our internal "processed logits" are always float32

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So is this correct, or not?

generator->GenerateNextToken();

// Show usage of GetOutput
std::unique_ptr<OgaTensor> output_logits = generator->GetOutput("logits");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the std::unique_ptr<OgaTensor> for clarity vs auto?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So auto would be simpler? I'll update it

std::cout << "Run Llama" << std::endl;
std::cout << "-------------" << std::endl;

#ifdef USE_CXX
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the USE_CXX here for, given the whole file is C++?

@baijumeswani
Copy link
Collaborator

What changes from one decoder-only model to another? Shall we instead create a standardized example that can work with other decoder-only models?

@kunal-vaishnavi
Copy link
Contributor

What changes from one decoder-only model to another? Shall we instead create a standardized example that can work with other decoder-only models?

I agree that one standardized example should be sufficient. From a user's perspective, the main change between decoder-only models is the chat template.

@@ -6,6 +6,7 @@ set(CMAKE_CXX_STANDARD 20)
option(USE_CUDA "Build with CUDA support" OFF)
option(USE_CXX "Invoke the C++ example" ON)
option(PHI3 "Build the Phi example" OFF)
option(LLAMA "Build the Llama example" OFF)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we have something like the following?

Suggested change
option(LLAMA "Build the Llama example" OFF)
option(LLM "Build the large-language model example" OFF)
option(VLM "Build the vision-language model example" OFF)
option(ALM "Build the audio-language model example" OFF)

The abbreviations and names can be different, but the idea would be to create examples grouped by input and output modality.

  • The LLM could be for decoder-only models (e.g. LLaMA, Phi)
    • Inputs: text
    • Outputs: text
  • The VLM could be for vision-text models (e.g. Phi-3/Phi-3.5 vision)
    • Inputs: images, text
    • Outputs: text
  • The ALM could be for audio-text models (e.g. Whisper)
    • Inputs: audios, text
    • Outputs: text

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a great proposal for a new PR!

@natke
Copy link
Contributor Author

natke commented Oct 1, 2024

What changes from one decoder-only model to another? Shall we instead create a standardized example that can work with other decoder-only models?

I agree that we should have a base decoder class, maybe? And these scripts can inherit from that class. But I'd like to get this sample in soonish so that folks can run Llama

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants