Adding a llama.cpp LLM Component #1052

edlee123 · 2024-12-20T03:45:43Z

Description

I added a llama.cpp LLM OPEA component. Llama.cpp is a popular LLM inference library/server "with minimal setup and state-of-the-art performance on a wide range of hardware - locally and in the cloud" written in pure C/C++.

The component code is written in llm.py, and is most similar to the existing code in llms/text-generation/ray_serve. I also referred to ollama, and tgi to try keep with conventions.

Please see the README.md provides instructions how to use it.

Issues

List the issue or RFC link this PR is working on. If there is no such link, please mark it as n/a.

Type of change

List the type of change like below. Please delete options that are not relevant.

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds new functionality)
Breaking change (fix or feature that would break existing design and interface)
Others (enhancement, documentation, validation, etc.)

Dependencies

The dependencies are similar to other llm components.

Tests

This was tested on CPU (laptop) with Phi3.5 mini 4k instruct. The Llama Cpp can use GPU as needed but didn't test it.

Signed-off-by: Ed Lee <[email protected]>

xiguiw · 2025-01-03T07:10:14Z

@edlee123
GenAIComps is under refactor - There are too much duplicated code.

Please wait for several days. And integrate it into new interface. Sorry for this.
But code refactor is helpful the new interface is simple. You just focus on LLＭ, no micro-service code needed.

comps/llms/text-generation/llamacpp/docker_compose_llm.yaml

comps/llms/text-generation/llamacpp/README.md

Signed-off-by: Ed Lee <[email protected]>

…lled image has specific tag. Signed-off-by: Ed Lee <[email protected]>

edlee123 · 2025-01-06T22:14:13Z

Hi @xiguiw

GenAIComps is under refactor - There are too much duplicated code.

No problem, which refactor branches should I wait for? I can wait for the refactoring to merge, and then I see how I can use the same approach.

Signed-off-by: Ed Lee <[email protected]>

comps/llms/text-generation/llamacpp/Dockerfile

comps/llms/text-generation/llamacpp/entrypoint.sh

…oint.sh Signed-off-by: Ed Lee <[email protected]>

for more information, see https://pre-commit.ci

xiguiw · 2025-01-13T03:39:24Z

Hi @xiguiw

GenAIComps is under refactor - There are too much duplicated code.

No problem, which refactor branches should I wait for? I can wait for the refactoring to merge, and then I see how I can use the same approach.

@edlee123

The LLM refactor code is merged.
Here is the code structure for your reference:

https://github.com/opea-project/GenAIComps/tree/main/comps/llms

.
├── deployment
│   ├── docker_compose
│   └── kubernetes
├── src
│   └── text-generation
           └── integrations

Foy your reference:

The docker-compose yaml file is put in deployment folder.
There is only one micro-service code for llm, opea_llm_microservice.py. For integration of each LLM services engine, it is located in integrations. Please refer to opea_llm_microservice.py and opea.py.
For multiple services/engines integrations, please can refer to https://github.com/opea-project/GenAIComps/tree/main/comps/embeddings/src/integrations

edlee123 · 2025-01-13T21:01:15Z

Thank you @xiguiw - would good next steps for this PR be:

Move the comps/llms/text-generation/llamacpp/docker_compose_llm.yaml file to comps/llms/deployment/docker_compose/text-generation_llamacpp.yaml (renaming it).
Test the new refactored opea llm microservice works with the above docker compose file.
Update the README.md in comps/llms/text-generation/llamacpp to use the above workflow?
Delete files from comps/llms/text-generation/llamacpp that are no longer needed.
Fix the last two github checks Check Online Document Building and Compose file and dockerfile path checking

Thank you for your guidance

Signed-off-by: Ed Lee <[email protected]>

…-party service first. Signed-off-by: Ed Lee <[email protected]>

…ous tests Signed-off-by: Ed Lee <[email protected]>

… all containers. Signed-off-by: Ed Lee <[email protected]>

Signed-off-by: Ed Lee <[email protected]>

for more information, see https://pre-commit.ci

Signed-off-by: Ed Lee <[email protected]>

xiguiw · 2025-02-25T03:35:45Z

Thank you @xiguiw. One question for above 1. create Dockefile to build docker image. vllm has src/Dockerfile.intel_gpu to build for Intel GPU. For llama.cpp CPU, it wouldn't need a special build, and could use the image: ghcr.io/ggerganov/llama.cpp:server-b4419. Am I understanding we should include the Dockerfile for this ghcr.io/ggerganov/llama.cpp:server-b4419 in the repo?

@edlee123
I agree we use the image and include the Docker file for the image ghcr.io/ggerganov/llama.cpp:server-b4419.
We need a stable version. Is this image a stable release version?

Perhaps I can write the test it'll be easier to see, and we can look at the test's function build_docker_images() ?
Yes, it's great that you write the test.

edlee123 · 2025-02-25T21:59:52Z

Hi @xiguiw

It looks like there's some debate in llama.cpp about semver or "stable" tagging to balance their rapid development and stability: ggml-org/llama.cpp#9276

But I don't think their process looks resolved yet: "The llama-server example is on path to become another production ready artifact".

Would you have some suggestion how I should proceed for now - maybe provide a disclaimer to check changelog carefully?

For the test I can do something like vLLM, e.g., pull the Docker file that goes with a particular release, and build with a build_docker_images function.

edlee123 · 2025-03-11T20:41:48Z

Changed to WIP pending bug #1323 Orphan container.

jrevillard

Hello,

A general comment is that the llama.cpp repository moved so you should adapt the docker images URL (and all the links): ggml-org/llama.cpp#11801

ghcr.io/ggerganov/llama.cpp -> ghcr.io/ggml-org/llama.cpp
https://github.com/ggerganov/llama.cpp -> https://github.com/ggml-org/llama.cpp

Best,
Jérôme

xiguiw · 2025-03-19T09:14:00Z

Hi @xiguiw

It looks like there's some debate in llama.cpp about semver or "stable" tagging to balance their rapid development and stability: ggml-org/llama.cpp#9276

But I don't think their process looks resolved yet: "The llama-server example is on path to become another production ready artifact".

Would you have some suggestion how I should proceed for now - maybe provide a disclaimer to check changelog carefully?

For the test I can do something like vLLM, e.g., pull the Docker file that goes with a particular release, and build with a build_docker_images function.

@edlee123
Thanks for sharing the information.
Yes, there is braking changes in llama.cpp.

If we cannot get a stable version, we have the risk of broken llm->llama sometimes (This happened in vLLM and Gaudi. So we get the latest stable version in OPEA now).

If we fixed a verified commit ID, that's acceptable. But we need to update the commit id from time to time to get latest change from llama.cpp. There are some maintaining work.

Who will take this work, will you take this work?
If there is maintainer, that's not a problem. Otherwise, it's a potential issue.

edlee123 · 2025-03-19T16:43:02Z

Hi @xiguiw, with the maintenance I'm thinking we revisit at a future time when llama.cpp is more stable, and we can close for now. Seems there would be many changes in llama.cpp to stay on top.

Also with #1395 can use openai style remote endpoints, and there are many good free options in OpenRouter.ai. I think this would give many options for devs on low compute machines.

xiguiw · 2025-03-26T02:05:17Z

@edlee123

remind:

Thanks for exploration extending the OPEA llm backend.
How about this PR, shall we close this an open a new one in the future?

edlee123 · 2025-03-26T13:08:16Z

@xiguiw sounds good to me, closing it now. Thank you for all for the review

First commit of llamacpp Opea component

397f7b8

Signed-off-by: Ed Lee <[email protected]>

edlee123 requested a review from lvliang-intel as a code owner December 20, 2024 03:45

edlee123 added 3 commits December 19, 2024 21:50

Removed unneeded requirements file

cb4f5e5

Signed-off-by: Ed Lee <[email protected]>

Merge branch 'main' into llamacpp

df3d943

Merge branch 'main' into llamacpp

8893f38

xiguiw reviewed Jan 3, 2025

View reviewed changes

comps/llms/text-generation/llamacpp/docker_compose_llm.yaml Outdated Show resolved Hide resolved

comps/llms/text-generation/llamacpp/README.md Outdated Show resolved Hide resolved

xiguiw requested a review from letonghan January 6, 2025 09:17

edlee123 added 5 commits January 6, 2025 15:38

Pin the llama.cpp server version, and fix small typo

2a48bae

Signed-off-by: Ed Lee <[email protected]>

Merge branch 'llamacpp' of github.com:edlee123/GenAIComps into llamacpp

644ecce

Update README.md to describe hardware support, and provide reference.

4e82152

Signed-off-by: Ed Lee <[email protected]>

Updated docker_compose_llm.yaml so that the llamacpp-server so the pu…

baf381d

…lled image has specific tag. Signed-off-by: Ed Lee <[email protected]>

Merge branch 'main' into llamacpp

7bab970

edlee123 added 3 commits January 7, 2025 08:48

Merge branch 'main' into llamacpp

e4f4b70

Small adjustments to README.md

9d7539d

Signed-off-by: Ed Lee <[email protected]>

Merge branch 'main' into llamacpp

2cf25e5

eero-t reviewed Jan 8, 2025

View reviewed changes

comps/llms/text-generation/llamacpp/Dockerfile Show resolved Hide resolved

comps/llms/text-generation/llamacpp/entrypoint.sh Outdated Show resolved Hide resolved

edlee123 and others added 4 commits January 10, 2025 13:13

This removes unneeded dependencies in the Dockerfile, unneeded entryp…

fd15ee7

…oint.sh Signed-off-by: Ed Lee <[email protected]>

Merge branch 'llamacpp' of github.com:edlee123/GenAIComps into llamacpp

666196c

Merge branch 'main' into llamacpp

104527a

[pre-commit.ci] auto fixes from pre-commit.com hooks

c931902

for more information, see https://pre-commit.ci

edlee123 added 6 commits January 24, 2025 09:47

Merge branch 'main' into llamacpp

6b98403

Merge branch 'main' into llamacpp

240d3d1

Merge branch 'main' into llamacpp

91e0fd4

Refactored llama cpp and text-generation README_llamacpp.md

a75d28d

Signed-off-by: Ed Lee <[email protected]>

Delete unrefactored files

830da58

Signed-off-by: Ed Lee <[email protected]>

Adding llama.cpp backend include in the compose_text-genearation.yaml

8d058bb

Signed-off-by: Ed Lee <[email protected]>

edlee123 added 10 commits February 21, 2025 16:34

Fixed typos on http environment variables, and volumes

c474a64

Signed-off-by: Ed Lee <[email protected]>

Splitting the llama.cpp test to use compose up on the llama.cpp third…

712f575

…-party service first. Signed-off-by: Ed Lee <[email protected]>

add alternate command to stop and remove docker containers from previ…

68cc00f

…ous tests Signed-off-by: Ed Lee <[email protected]>

Modifying tear down of stop_docker in llamacpp tests to try to remove…

2dd2064

… all containers. Signed-off-by: Ed Lee <[email protected]>

Adding some logs output to debug llamacpp test

dbff6fc

Signed-off-by: Ed Lee <[email protected]>

Found model path bug and fixed it to run llama.cpp test

f184897

Signed-off-by: Ed Lee <[email protected]>

Adjusted LLM_ENDPOINT env variable

ea4ea38

Signed-off-by: Ed Lee <[email protected]>

Cleaned up test file

01fca03

Signed-off-by: Ed Lee <[email protected]>

Adjust host_ip env variable in scope of start_service

dfd5057

Signed-off-by: Ed Lee <[email protected]>

Merge branch 'main' into llamacpp

a741320

eero-t mentioned this pull request Feb 24, 2025

Provide a clear working example of how to orchestrate multiple services #1272

Closed

edlee123 and others added 5 commits February 24, 2025 10:56

Docker ps to debug orphaned containers.

4a965da

Signed-off-by: Ed Lee <[email protected]>

Merge branch 'llamacpp' of github.com:edlee123/GenAIComps into llamacpp

25240da

[pre-commit.ci] auto fixes from pre-commit.com hooks

32b06e9

for more information, see https://pre-commit.ci

Adding output to debug orphaned docker containers

3363504

Signed-off-by: Ed Lee <[email protected]>

Merge branch 'llamacpp' of github.com:edlee123/GenAIComps into llamacpp

421b1ab

Merge branch 'main' into llamacpp

d5d3c1e

jrevillard suggested changes Mar 12, 2025

View reviewed changes

yinghu5 added the WIP label Mar 19, 2025

Merge branch 'main' into llamacpp

d85c60e

edlee123 closed this Mar 26, 2025

yinghu5 added Backlog features in backlog A3 Maintain labels Mar 27, 2025

Adding a llama.cpp LLM Component #1052

Adding a llama.cpp LLM Component #1052

Uh oh!

Conversation

edlee123 commented Dec 20, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Issues

Type of change

Dependencies

Tests

Uh oh!

xiguiw commented Jan 3, 2025

Uh oh!

Uh oh!

Uh oh!

edlee123 commented Jan 6, 2025

Uh oh!

Uh oh!

Uh oh!

xiguiw commented Jan 13, 2025

Uh oh!

edlee123 commented Jan 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

xiguiw commented Feb 25, 2025

Uh oh!

edlee123 commented Feb 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

edlee123 commented Mar 11, 2025

Uh oh!

jrevillard left a comment

Choose a reason for hiding this comment

Uh oh!

xiguiw commented Mar 19, 2025

Uh oh!

edlee123 commented Mar 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

xiguiw commented Mar 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

edlee123 commented Mar 26, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

edlee123 commented Dec 20, 2024 •

edited

Loading

edlee123 commented Jan 13, 2025 •

edited

Loading

edlee123 commented Feb 25, 2025 •

edited

Loading

edlee123 commented Mar 19, 2025 •

edited

Loading

xiguiw commented Mar 26, 2025 •

edited

Loading