-
Notifications
You must be signed in to change notification settings - Fork 218
Adding a llama.cpp LLM Component #1052
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Ed Lee <[email protected]>
|
@edlee123 Please wait for several days. And integrate it into new interface. Sorry for this. |
Signed-off-by: Ed Lee <[email protected]>
Signed-off-by: Ed Lee <[email protected]>
…lled image has specific tag. Signed-off-by: Ed Lee <[email protected]>
|
Hi @xiguiw
No problem, which refactor branches should I wait for? I can wait for the refactoring to merge, and then I see how I can use the same approach. |
…oint.sh Signed-off-by: Ed Lee <[email protected]>
for more information, see https://pre-commit.ci
The LLM refactor code is merged. https://github.com/opea-project/GenAIComps/tree/main/comps/llms Foy your reference:
|
|
Thank you @xiguiw - would good next steps for this PR be:
Thank you for your guidance |
Signed-off-by: Ed Lee <[email protected]>
Signed-off-by: Ed Lee <[email protected]>
Signed-off-by: Ed Lee <[email protected]>
Signed-off-by: Ed Lee <[email protected]>
…-party service first. Signed-off-by: Ed Lee <[email protected]>
…ous tests Signed-off-by: Ed Lee <[email protected]>
… all containers. Signed-off-by: Ed Lee <[email protected]>
Signed-off-by: Ed Lee <[email protected]>
Signed-off-by: Ed Lee <[email protected]>
Signed-off-by: Ed Lee <[email protected]>
Signed-off-by: Ed Lee <[email protected]>
Signed-off-by: Ed Lee <[email protected]>
Signed-off-by: Ed Lee <[email protected]>
for more information, see https://pre-commit.ci
Signed-off-by: Ed Lee <[email protected]>
@edlee123
|
|
Hi @xiguiw It looks like there's some debate in llama.cpp about semver or "stable" tagging to balance their rapid development and stability: ggml-org/llama.cpp#9276 But I don't think their process looks resolved yet: "The llama-server example is on path to become another production ready artifact". Would you have some suggestion how I should proceed for now - maybe provide a disclaimer to check changelog carefully? For the test I can do something like vLLM, e.g., pull the Docker file that goes with a particular release, and build with a |
|
Changed to WIP pending bug #1323 Orphan container. |
jrevillard
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello,
A general comment is that the llama.cpp repository moved so you should adapt the docker images URL (and all the links): ggml-org/llama.cpp#11801
- ghcr.io/ggerganov/llama.cpp -> ghcr.io/ggml-org/llama.cpp
- https://github.com/ggerganov/llama.cpp -> https://github.com/ggml-org/llama.cpp
Best,
Jérôme
@edlee123 If we cannot get a stable version, we have the risk of broken llm->llama sometimes (This happened in vLLM and Gaudi. So we get the latest stable version in OPEA now). If we fixed a verified commit ID, that's acceptable. But we need to update the commit id from time to time to get latest change from llama.cpp. There are some maintaining work. Who will take this work, will you take this work? |
|
Hi @xiguiw, with the maintenance I'm thinking we revisit at a future time when llama.cpp is more stable, and we can close for now. Seems there would be many changes in llama.cpp to stay on top. Also with #1395 can use openai style remote endpoints, and there are many good free options in OpenRouter.ai. I think this would give many options for devs on low compute machines. |
|
remind: Thanks for exploration extending the OPEA llm backend. |
|
@xiguiw sounds good to me, closing it now. Thank you for all for the review |
Description
I added a llama.cpp LLM OPEA component. Llama.cpp is a popular LLM inference library/server "with minimal setup and state-of-the-art performance on a wide range of hardware - locally and in the cloud" written in pure C/C++.
The component code is written in llm.py, and is most similar to the existing code in
llms/text-generation/ray_serve. I also referred to ollama, and tgi to try keep with conventions.Please see the README.md provides instructions how to use it.
Issues
List the issue or RFC link this PR is working on. If there is no such link, please mark it as
n/a.Type of change
List the type of change like below. Please delete options that are not relevant.
Dependencies
The dependencies are similar to other llm components.
Tests
This was tested on CPU (laptop) with Phi3.5 mini 4k instruct. The Llama Cpp can use GPU as needed but didn't test it.