51 commits
397f7b8
First commit of llamacpp Opea component
edlee123 Dec 20, 2024
cb4f5e5
Removed unneeded requirements file
edlee123 Dec 20, 2024
df3d943
Merge branch 'main' into llamacpp
edlee123 Dec 20, 2024
8893f38
Merge branch 'main' into llamacpp
edlee123 Dec 28, 2024
2a48bae
Pin the llama.cpp server version, and fix small typo
edlee123 Jan 6, 2025
644ecce
Merge branch 'llamacpp' of github.com:edlee123/GenAIComps into llamacpp
edlee123 Jan 6, 2025
4e82152
Update README.md to describe hardware support, and provide reference.
edlee123 Jan 6, 2025
baf381d
Updated docker_compose_llm.yaml so that the llamacpp-server so the pu…
edlee123 Jan 6, 2025
7bab970
Merge branch 'main' into llamacpp
edlee123 Jan 6, 2025
e4f4b70
Merge branch 'main' into llamacpp
edlee123 Jan 7, 2025
9d7539d
Small adjustments to README.md
edlee123 Jan 7, 2025
2cf25e5
Merge branch 'main' into llamacpp
edlee123 Jan 8, 2025
fd15ee7
This removes unneeded dependencies in the Dockerfile, unneeded entryp…
edlee123 Jan 10, 2025
666196c
Merge branch 'llamacpp' of github.com:edlee123/GenAIComps into llamacpp
edlee123 Jan 10, 2025
104527a
Merge branch 'main' into llamacpp
edlee123 Jan 10, 2025
c931902
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 10, 2025
6b98403
Merge branch 'main' into llamacpp
edlee123 Jan 24, 2025
240d3d1
Merge branch 'main' into llamacpp
edlee123 Feb 3, 2025
91e0fd4
Merge branch 'main' into llamacpp
edlee123 Feb 14, 2025
a75d28d
Refactored llama cpp and text-generation README_llamacpp.md
edlee123 Feb 14, 2025
830da58
Delete unrefactored files
edlee123 Feb 14, 2025
8d058bb
Adding llama.cpp backend include in the compose_text-genearation.yaml
edlee123 Feb 14, 2025
a0294a5
Merge branch 'llamacpp' of github.com:edlee123/GenAIComps into llamacpp
edlee123 Feb 14, 2025
a6740b6
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 14, 2025
d0e27bf
Fix service name
edlee123 Feb 21, 2025
91324af
Revise llamacpp, using smaller Qwen model and remove unnecessary curl…
edlee123 Feb 21, 2025
f295e29
Update llamacpp thirdparty readme to use smaller model
edlee123 Feb 21, 2025
480cb69
Fix healthcheck in llamacpp deployment compose.yaml
edlee123 Feb 21, 2025
2c9f877
Wrote a test and tested for llamacpp text gen service
edlee123 Feb 21, 2025
f3147f1
Merge branch 'llamacpp' of github.com:edlee123/GenAIComps into llamacpp
edlee123 Feb 21, 2025
7310d6a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 21, 2025
80ed9b0
Merge branch 'main' into llamacpp
edlee123 Feb 21, 2025
efde309
Increase the llamacpp-server wait time
edlee123 Feb 21, 2025
1a7db52
Merge branch 'llamacpp' of github.com:edlee123/GenAIComps into llamacpp
edlee123 Feb 21, 2025
c474a64
Fixed typos on http environment variables, and volumes
edlee123 Feb 21, 2025
712f575
Splitting the llama.cpp test to use compose up on the llama.cpp third…
edlee123 Feb 21, 2025
68cc00f
add alternate command to stop and remove docker containers from previ…
edlee123 Feb 22, 2025
2dd2064
Modifying tear down of stop_docker in llamacpp tests to try to remove…
edlee123 Feb 22, 2025
dbff6fc
Adding some logs output to debug llamacpp test
edlee123 Feb 22, 2025
f184897
Found model path bug and fixed it to run llama.cpp test
edlee123 Feb 22, 2025
ea4ea38
Adjusted LLM_ENDPOINT env variable
edlee123 Feb 22, 2025
01fca03
Cleaned up test file
edlee123 Feb 22, 2025
dfd5057
Adjust host_ip env variable in scope of start_service
edlee123 Feb 22, 2025
a741320
Merge branch 'main' into llamacpp
edlee123 Feb 24, 2025
4a965da
Docker ps to debug orphaned containers.
edlee123 Feb 24, 2025
25240da
Merge branch 'llamacpp' of github.com:edlee123/GenAIComps into llamacpp
edlee123 Feb 24, 2025
32b06e9
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 24, 2025
3363504
Adding output to debug orphaned docker containers
edlee123 Feb 24, 2025
421b1ab
Merge branch 'llamacpp' of github.com:edlee123/GenAIComps into llamacpp
edlee123 Feb 24, 2025
d5d3c1e
Merge branch 'main' into llamacpp
edlee123 Mar 11, 2025
d85c60e
Merge branch 'main' into llamacpp
xiguiw Mar 19, 2025
comps/llms/deployment/docker_compose/compose_text-generation.yaml
@@ -5,6 +5,8 @@ include:
  - ../../../third_parties/tgi/deployment/docker_compose/compose.yaml
  - ../../../third_parties/vllm/deployment/docker_compose/compose.yaml
  - ../../../third_parties/ollama/deployment/docker_compose/compose.yaml
  - ../../../third_parties/llamacpp/deployment/docker_compose/compose.yaml


services:
  textgen:
@@ -100,6 +102,16 @@ services:
    environment:
      LLM_COMPONENT_NAME: ${LLM_COMPONENT_NAME:-OpeaTextGenNative}

  textgen-service-llamacpp:
    extends: textgen
    container_name: textgen-service-llamacpp
    environment:
      LLM_ENDPOINT: http://llamacpp-server
      LLM_COMPONENT_NAME: ${LLM_COMPONENT_NAME:-OpeaTextGenService}
    depends_on:
      llamacpp-server:
        condition: service_healthy

networks:
  default:
    driver: bridge
83 changes: 83 additions & 0 deletions comps/llms/src/text-generation/README_llamacpp.md
@@ -0,0 +1,83 @@
# llama.cpp Introduction

[llama.cpp](https://github.com/ggerganov/llama.cpp) provides inference in pure C/C++, and enables "LLM inference with minimal setup and state-of-the-art performance on a wide range of hardware - locally and in the cloud".

This OPEA component wraps the llama.cpp server so that it can interface with other OPEA components or be used to build OPEA Megaservices.

llama.cpp supports a wide range of [hardware](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#supported-backends); this OPEA component has so far only been tested on CPU.

To use a CUDA server, please refer to [this llama.cpp reference](https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md#docker) and modify the llama.cpp deployment compose file (comps/third_parties/llamacpp/deployment/docker_compose/compose.yaml) accordingly.

## Get Started

### 1. Download a gguf model to serve

To download an example .gguf model to a model path:

```bash
export MODEL_PATH=~/models
mkdir -p $MODEL_PATH  # -p creates the directory only if it does not already exist
cd $MODEL_PATH
wget --no-clobber https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct-GGUF/resolve/main/qwen2.5-1.5b-instruct-q4_k_m.gguf
```
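
Optionally, confirm the download completed (a quick sanity check; the file for this quantization is on the order of 1 GB):

```bash
ls -lh $MODEL_PATH/qwen2.5-1.5b-instruct-q4_k_m.gguf
```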

### 2. Set Environment Variables

```bash
export MODEL_PATH=~/models
export host_ip=$(hostname -I | awk '{print $1}')
export TEXTGEN_PORT=9000
export LLM_ENDPOINT_PORT=8008
export LLM_ENDPOINT="http://${host_ip}:80"
export LLM_MODEL_ID="models/qwen2.5-1.5b-instruct-q4_k_m.gguf"
export LLAMA_ARG_CTX_SIZE=4096
```

### 3. Run the llama.cpp OPEA Microservice

```bash
export service_name="textgen-service-llamacpp"
cd comps/llms/deployment/docker_compose/
docker compose -f compose_text-generation.yaml up ${service_name} -d
```

The server output can be observed in a terminal with `docker logs <container>`.
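
Before sending requests, it can help to confirm that the backend is healthy. A minimal sketch, assuming the default `LLM_ENDPOINT_PORT=8008` mapping from the deployment compose file (whose healthcheck polls the llama.cpp `/health` endpoint):

```bash
# Check that both the backend and the textgen wrapper are running
docker ps --filter "name=llamacpp-server" --filter "name=textgen-service-llamacpp"

# The llama.cpp server reports readiness on /health
curl http://${host_ip}:8008/health
```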

## Consume the Service

Verify the llama.cpp backend server:

```bash
curl http://0.0.0.0:8008/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer no-key" \
  -d '{
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is deep learning?"
      }
    ]
  }'
```

Then consume the OPEA microservice itself, which follows the OpenAI API convention:

```bash
curl -X POST http://localhost:9000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Write a limerick about python exceptions"}],
    "max_tokens": 100,
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 50,
    "stream": false
  }'
```
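
Since the component follows the OpenAI convention, a streaming request can be made by setting `"stream": true`. A hedged sketch (streaming output depends on the deployed service; `-N` disables curl buffering so tokens appear as they arrive):

```bash
curl -N -X POST http://localhost:9000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Write a limerick about python exceptions"}],
    "max_tokens": 100,
    "stream": true
  }'
```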
55 changes: 55 additions & 0 deletions comps/third_parties/llamacpp/README.md
@@ -0,0 +1,55 @@
# Introduction

[llama.cpp](https://github.com/ggerganov/llama.cpp) provides inference in pure C/C++, and enables "LLM inference with minimal setup and state-of-the-art performance on a wide range of hardware - locally and in the cloud".

This OPEA component wraps the llama.cpp server so that it can interface with other OPEA components or be used to build OPEA Megaservices.

llama.cpp supports a wide range of [hardware](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#supported-backends); this OPEA component has so far only been tested on CPU.

To use a CUDA server, please refer to [this llama.cpp reference](https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md#docker) and modify the deployment `compose.yaml` accordingly.
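
As an illustration only, here is a minimal sketch of running the upstream CUDA server image directly with Docker. The `server-cuda` image tag, the `--gpus` flag, and the `--n-gpu-layers` value are taken from the upstream documentation and are assumptions that may need adjusting for your setup:

```bash
docker run --rm --gpus all \
  -v ${MODEL_PATH:-~/models}:/models \
  -p 8008:80 \
  ghcr.io/ggerganov/llama.cpp:server-cuda \
  -m /models/qwen2.5-1.5b-instruct-q4_k_m.gguf \
  --host 0.0.0.0 --port 80 \
  --n-gpu-layers 99  # offload all layers to the GPU
```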

## Get Started

### 1. Download a gguf Model

To download an example .gguf model to a model path:

```bash
export MODEL_PATH=~/models
mkdir -p $MODEL_PATH  # -p creates the directory only if it does not already exist
cd $MODEL_PATH

wget --no-clobber https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct-GGUF/resolve/main/qwen2.5-1.5b-instruct-q4_k_m.gguf
```

### 2. Set Environment Variables

```bash
export MODEL_PATH=~/models
export host_ip=$(hostname -I | awk '{print $1}')
export LLM_ENDPOINT_PORT=8008
export LLM_MODEL_ID="models/qwen2.5-1.5b-instruct-q4_k_m.gguf"
export LLAMA_ARG_CTX_SIZE=4096
```

### 3. Run the llama.cpp Backend Microservice

```bash
cd deployment/docker_compose
docker compose -f compose.yaml up llamacpp-server -d
```

To use this backend with an OPEA text generation component, please see [llama.cpp text-generation](../../llms/src/text-generation/README_llamacpp.md).

Note: you can use `docker logs <container>` to observe the server output.

## Consume the Service

The llama.cpp server exposes an OpenAI-compatible API:

```bash
curl http://${host_ip}:8008/v1/chat/completions \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "What is Deep Learning?"}]}'
```
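
Two additional endpoints can be useful when debugging (a sketch; `/health` is the endpoint polled by the compose healthcheck, and `/metrics` is assumed to be available because the deployment sets `LLAMA_ARG_ENDPOINT_METRICS: 1`):

```bash
# Readiness probe (same endpoint the compose healthcheck uses)
curl http://${host_ip}:8008/health

# Prometheus-style metrics
curl http://${host_ip}:8008/metrics
```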
comps/third_parties/llamacpp/deployment/docker_compose/compose.yaml
@@ -0,0 +1,37 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

services:
  llamacpp-server:
    image: ghcr.io/ggerganov/llama.cpp:server-b4419
    container_name: llamacpp-server
    ports:
      - ${LLM_ENDPOINT_PORT:-8008}:80
    volumes:
      # Download the .gguf models to this path.
      - ${MODEL_PATH:-~/models}:/models
    environment:
      LOGFLAG: False
      no_proxy: ${no_proxy}
      https_proxy: ${https_proxy}
      http_proxy: ${http_proxy}
      LLM_MODEL_ID: ${LLM_MODEL_ID}
      LLM_ENDPOINT_PORT: ${LLM_ENDPOINT_PORT}
      host_ip: ${host_ip}
      # llama.cpp environment variables. Please refer to:
      # https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md
      LLAMA_ARG_PORT: 80
      LLAMA_ARG_MODEL: /$LLM_MODEL_ID
      LLAMA_ARG_CTX_SIZE: ${LLAMA_ARG_CTX_SIZE:-4096}
      LLAMA_ARG_N_PARALLEL: 2
      LLAMA_ARG_ENDPOINT_METRICS: 1
    ipc: host
    healthcheck:
      test: ["CMD-SHELL", "curl -f http://${host_ip}:${LLM_ENDPOINT_PORT}/health || exit 1"]
      interval: 10s
      timeout: 10s
      retries: 100

networks:
  default:
    driver: bridge
8 changes: 5 additions & 3 deletions tests/llms/test_llms_doc-summarization_tgi.sh
@@ -140,10 +140,12 @@ function stop_docker() {
cd $WORKPATH/comps/llms/deployment/docker_compose
docker compose -f compose_doc-summarization.yaml down ${service_name} --remove-orphans
}

function main() {

echo "Docker containers before stop_docker"
docker ps -a
stop_docker
echo "Docker containers after stop_docker"
docker ps -a


build_docker_images
start_service
8 changes: 5 additions & 3 deletions tests/llms/test_llms_doc-summarization_tgi_on_intel_hpu.sh
@@ -141,10 +141,12 @@ function stop_docker() {
cd $WORKPATH/comps/llms/deployment/docker_compose
docker compose -f compose_doc-summarization.yaml down ${service_name} --remove-orphans
}

function main() {

echo "Docker containers before stop_docker"
docker ps -a
stop_docker
echo "Docker containers after stop_docker"
docker ps -a


build_docker_images
start_service
8 changes: 5 additions & 3 deletions tests/llms/test_llms_doc-summarization_vllm.sh
@@ -155,10 +155,12 @@ function stop_docker() {
cd $WORKPATH/comps/llms/deployment/docker_compose
docker compose -f compose_doc-summarization.yaml down ${service_name} --remove-orphans
}

function main() {

echo "Docker containers before stop_docker"
docker ps -a
stop_docker
echo "Docker containers after stop_docker"
docker ps -a


build_docker_images
start_service
8 changes: 5 additions & 3 deletions tests/llms/test_llms_doc-summarization_vllm_on_intel_hpu.sh
@@ -158,10 +158,12 @@ function stop_docker() {
cd $WORKPATH/comps/llms/deployment/docker_compose
docker compose -f compose_doc-summarization.yaml down ${service_name} --remove-orphans
}

function main() {

echo "Docker containers before stop_docker"
docker ps -a
stop_docker
echo "Docker containers after stop_docker"
docker ps -a


build_docker_images
start_service
8 changes: 5 additions & 3 deletions tests/llms/test_llms_faq-generation_tgi.sh
@@ -102,10 +102,12 @@ function stop_docker() {
cd $WORKPATH/comps/llms/deployment/docker_compose
docker compose -f compose_faq-generation.yaml down ${service_name} --remove-orphans
}

function main() {

echo "Docker containers before stop_docker"
docker ps -a
stop_docker
echo "Docker containers after stop_docker"
docker ps -a


build_docker_images
start_service
8 changes: 5 additions & 3 deletions tests/llms/test_llms_faq-generation_tgi_on_intel_hpu.sh
@@ -111,10 +111,12 @@ function stop_docker() {
cd $WORKPATH/comps/llms/deployment/docker_compose
docker compose -f compose_faq-generation.yaml down ${service_name} --remove-orphans
}

function main() {

echo "Docker containers before stop_docker"
docker ps -a
stop_docker
echo "Docker containers after stop_docker"
docker ps -a


build_docker_images
start_service
8 changes: 5 additions & 3 deletions tests/llms/test_llms_faq-generation_vllm.sh
@@ -118,10 +118,12 @@ function stop_docker() {
cd $WORKPATH/comps/llms/deployment/docker_compose
docker compose -f compose_faq-generation.yaml down ${service_name} --remove-orphans
}

function main() {

echo "Docker containers before stop_docker"
docker ps -a
stop_docker
echo "Docker containers after stop_docker"
docker ps -a


build_docker_images
start_service
5 changes: 4 additions & 1 deletion tests/llms/test_llms_faq-generation_vllm_on_intel_hpu.sh
@@ -121,8 +121,11 @@
}

function main() {

echo "Docker containers before stop_docker"
docker ps -a
stop_docker
echo "Docker containers after stop_docker"
docker ps -a

build_docker_images
start_service
8 changes: 5 additions & 3 deletions tests/llms/test_llms_text-generation_native_on_intel_hpu.sh
@@ -87,10 +87,12 @@ function stop_docker() {
cd $WORKPATH/comps/llms/deployment/docker_compose
docker compose -f compose_text-generation.yaml down ${service_name} --remove-orphans
}

function main() {

echo "Docker containers before stop_docker"
docker ps -a
stop_docker
echo "Docker containers after stop_docker"
docker ps -a

build_docker_images
start_service
validate_microservice