
Commit 14c22d5

Merge branch 'opea-project:main' into main
2 parents: 2b040c0 + 50f6b3e

25 files changed: +722 additions, -693 deletions

ChatQnA/README.md

Lines changed: 11 additions & 0 deletions
@@ -120,3 +120,14 @@ For ChatQnA specific tracing and metrics monitoring, follow [OpenTelemetry on Ch

  ## FAQ Generation Application

  The FAQ Generation Application leverages the power of large language models (LLMs) to revolutionize the way you interact with and comprehend complex textual data. By harnessing cutting-edge natural language processing techniques, the application can automatically generate comprehensive and natural-sounding frequently asked questions (FAQs) from your documents, legal texts, customer queries, and other sources. We merged FaqGen into the ChatQnA example, which utilizes LangChain to implement FAQ generation and facilitates LLM inference using Text Generation Inference on Intel Xeon and Gaudi2 processors.
+
+ ## Validated Configurations
+
+ | **Deploy Method** | **LLM Engine** | **LLM Model** | **Embedding** | **Vector Database** | **Reranking** | **Guardrails** | **Hardware** |
+ | ----------------- | -------------- | ----------------------------------- | ------------- | ---------------------------------------- | ------------- | -------------- | ------------ |
+ | Docker Compose | vLLM, TGI | meta-llama/Meta-Llama-3-8B-Instruct | TEI | Redis | w/, w/o | w/, w/o | Intel Gaudi |
+ | Docker Compose | vLLM, TGI | meta-llama/Meta-Llama-3-8B-Instruct | TEI | Redis, MariaDB, Milvus, Pinecone, Qdrant | w/, w/o | w/o | Intel Xeon |
+ | Docker Compose | Ollama | llama3.2 | TEI | Redis | w/ | w/o | Intel AIPC |
+ | Docker Compose | vLLM, TGI | meta-llama/Meta-Llama-3-8B-Instruct | TEI | Redis | w/ | w/o | AMD ROCm |
+ | Helm Charts | vLLM, TGI | meta-llama/Meta-Llama-3-8B-Instruct | TEI | Redis | w/, w/o | w/, w/o | Intel Gaudi |
+ | Helm Charts | vLLM, TGI | meta-llama/Meta-Llama-3-8B-Instruct | TEI | Redis, Milvus, Qdrant | w/, w/o | w/o | Intel Xeon |
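
To try the merged FAQ generation interactively, one option is to post a prompt to the deployed ChatQnA gateway. This is a minimal sketch, assuming the example's usual gateway route (`/v1/chatqna` on port `8888`) and the simple `messages` payload used across OPEA examples; the exact FaqGen interface after the merge may differ.

```bash
# Assumed endpoint and payload; adjust host, port, and route to your deployment.
curl -s http://${host_ip}:8888/v1/chatqna \
  -H "Content-Type: application/json" \
  -d '{"messages": "Generate five FAQs, with answers, from the documents I have ingested."}'
```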

CodeGen/docker_compose/intel/cpu/xeon/README.md

Lines changed: 61 additions & 61 deletions
@@ -33,65 +33,67 @@ This guide focuses on running the pre-configured CodeGen service using Docker Co

  ## Quick Start Deployment

- This uses the default vLLM-based deployment profile (`codegen-xeon-vllm`).
+ This uses the default vLLM-based deployment defined in `compose.yaml`.

  1. **Configure Environment:**
     Set required environment variables in your shell:

     ```bash
     # Replace with your host's external IP address (do not use localhost or 127.0.0.1)
     export HOST_IP="your_external_ip_address"
     # Replace with your Hugging Face Hub API token
     export HF_TOKEN="your_huggingface_token"

     # Optional: Configure proxy if needed
     # export http_proxy="your_http_proxy"
     # export https_proxy="your_https_proxy"
     # export no_proxy="localhost,127.0.0.1,${HOST_IP}" # Add other hosts if necessary
     source intel/set_env.sh
     cd /intel/cpu/xeon
     ```

     _Note: The compose file might read additional variables from set_env.sh. Ensure all required variables like ports (`LLM_SERVICE_PORT`, `MEGA_SERVICE_PORT`, etc.) are set if not using defaults from the compose file._

     For instance, edit set_env.sh to change the LLM model:

-    ```
-    export LLM_MODEL_ID="Qwen/Qwen2.5-Coder-7B-Instruct"
-    ```
-    can be changed to other model if needed
-    ```
-    export LLM_MODEL_ID="Qwen/Qwen2.5-Coder-32B-Instruct"
-    ```
-
- 2. **Start Services (vLLM Profile):**
-
-    ```bash
-    docker compose --profile codegen-xeon-vllm up -d
+    ```bash
+    export LLM_MODEL_ID="Qwen/Qwen2.5-Coder-7B-Instruct"
+    ```
+
+    which can be changed to another model if needed:
+
+    ```bash
+    export LLM_MODEL_ID="Qwen/Qwen2.5-Coder-32B-Instruct"
+    ```
+
+ 2. **Start Services (vLLM):**
+
+    ```bash
+    docker compose up -d
     ```

  3. **Validate:**
     Wait several minutes for models to download (especially the first time) and services to initialize. Check container logs (`docker compose logs -f <service_name>`) or proceed to the validation steps below.

  ### Available Deployment Options

- The `compose.yaml` file uses Docker Compose profiles to select the LLM serving backend.
+ Different Docker Compose files are available to select the LLM serving backend.

- #### Default: vLLM-based Deployment (`--profile codegen-xeon-vllm`)
+ #### Default: vLLM-based Deployment (`compose.yaml`)

- - **Profile:** `codegen-xeon-vllm`
- - **Description:** Uses vLLM optimized for Intel CPUs as the LLM serving engine. This is the default profile used in the Quick Start.
+ - **Compose File:** `compose.yaml`
+ - **Description:** Uses vLLM optimized for Intel CPUs as the LLM serving engine. This is the default deployment option used in the Quick Start.
  - **Services Deployed:** `codegen-vllm-server`, `codegen-llm-server`, `codegen-tei-embedding-server`, `codegen-retriever-server`, `redis-vector-db`, `codegen-dataprep-server`, `codegen-backend-server`, `codegen-gradio-ui-server`.

- #### TGI-based Deployment (`--profile codegen-xeon-tgi`)
+ #### TGI-based Deployment (`compose_tgi.yaml`)

- - **Profile:** `codegen-xeon-tgi`
+ - **Compose File:** `compose_tgi.yaml`
  - **Description:** Uses Hugging Face Text Generation Inference (TGI) optimized for Intel CPUs as the LLM serving engine.
  - **Services Deployed:** `codegen-tgi-server`, `codegen-llm-server`, `codegen-tei-embedding-server`, `codegen-retriever-server`, `redis-vector-db`, `codegen-dataprep-server`, `codegen-backend-server`, `codegen-gradio-ui-server`.
  - **To Run:**
    ```bash
    # Ensure environment variables (HOST_IP, HF_TOKEN) are set
-   docker compose --profile codegen-xeon-tgi up -d
+   docker compose -f compose_tgi.yaml up -d
    ```

  ### Configuration Parameters
@@ -100,28 +102,28 @@ The `compose.yaml` file uses Docker Compose profiles to select the LLM serving b

  Key parameters are configured via environment variables set before running `docker compose up`.

- | Environment Variable | Description | Default (Set Externally) |
- | :------------------- | :----------------------------------------------------------------------------------------------------------------- | :--------------------------------------------- | --------------------------------------- |
- | `HOST_IP` | External IP address of the host machine. **Required.** | `your_external_ip_address` |
- | `HF_TOKEN` | Your Hugging Face Hub token for model access. **Required.** | `your_huggingface_token` |
- | `LLM_MODEL_ID` | Hugging Face model ID for the CodeGen LLM (used by TGI/vLLM service). Configured within `compose.yaml` environment. | `Qwen/Qwen2.5-Coder-7B-Instruct` |
- | `EMBEDDING_MODEL_ID` | Hugging Face model ID for the embedding model (used by TEI service). Configured within `compose.yaml` environment. | `BAAI/bge-base-en-v1.5` |
- | `LLM_ENDPOINT` | Internal URL for the LLM serving endpoint (used by `codegen-llm-server`). Configured in `compose.yaml`. | `http://codegen-vllm | tgi-server:9000/v1/chat/completions` |
- | `TEI_EMBEDDING_ENDPOINT` | Internal URL for the Embedding service. Configured in `compose.yaml`. | `http://codegen-tei-embedding-server:80/embed` |
- | `DATAPREP_ENDPOINT` | Internal URL for the Data Preparation service. Configured in `compose.yaml`. | `http://codegen-dataprep-server:80/dataprep` |
- | `BACKEND_SERVICE_ENDPOINT` | External URL for the CodeGen Gateway (MegaService). Derived from `HOST_IP` and port `7778`. | `http://${HOST_IP}:7778/v1/codegen` |
- | `*_PORT` (Internal) | Internal container ports (e.g., `80`, `6379`). Defined in `compose.yaml`. | N/A |
- | `http_proxy` / `https_proxy` / `no_proxy` | Network proxy settings (if required). | `""` |
+ | Environment Variable | Description | Default (Set Externally) |
+ | :------------------- | :------------------------------------------------------------------------------------------------------ | :----------------------------------------------------- |
+ | `HOST_IP` | External IP address of the host machine. **Required.** | `your_external_ip_address` |
+ | `HF_TOKEN` | Your Hugging Face Hub token for model access. **Required.** | `your_huggingface_token` |
+ | `LLM_MODEL_ID` | Hugging Face model ID for the CodeGen LLM (used by TGI/vLLM service). Configured within compose files. | `Qwen/Qwen2.5-Coder-7B-Instruct` |
+ | `EMBEDDING_MODEL_ID` | Hugging Face model ID for the embedding model (used by TEI service). Configured within compose files. | `BAAI/bge-base-en-v1.5` |
+ | `LLM_ENDPOINT` | Internal URL for the LLM serving endpoint (used by `codegen-llm-server`). Configured in compose files. | `http://codegen-vllm-server:9000/v1/chat/completions` |
+ | `TEI_EMBEDDING_ENDPOINT` | Internal URL for the Embedding service. Configured in compose files. | `http://codegen-tei-embedding-server:80/embed` |
+ | `DATAPREP_ENDPOINT` | Internal URL for the Data Preparation service. Configured in compose files. | `http://codegen-dataprep-server:80/dataprep` |
+ | `BACKEND_SERVICE_ENDPOINT` | External URL for the CodeGen Gateway (MegaService). Derived from `HOST_IP` and port `7778`. | `http://${HOST_IP}:7778/v1/codegen` |
+ | `*_PORT` (Internal) | Internal container ports (e.g., `80`, `6379`). Defined in compose files. | N/A |
+ | `http_proxy` / `https_proxy` / `no_proxy` | Network proxy settings (if required). | `""` |

  Most of these parameters are in `set_env.sh`; you can either modify this file or overwrite the env variables by setting them.

  ```shell
  source CodeGen/docker_compose/set_env.sh
  ```
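
As an illustration of the second option (overriding variables rather than editing the file), a minimal sketch that assumes the default vLLM deployment and the variable names from the table above:

```bash
# Sketch: override selected variables after sourcing set_env.sh, then start the stack.
export HOST_IP="your_external_ip_address"
export HF_TOKEN="your_huggingface_token"
source CodeGen/docker_compose/set_env.sh
export LLM_MODEL_ID="Qwen/Qwen2.5-Coder-32B-Instruct"  # example override
docker compose up -d
```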

- #### Compose Profiles
+ #### Compose Files

- Docker Compose profiles (`codegen-xeon-vllm`, `codegen-xeon-tgi`) control which LLM serving backend (vLLM or TGI) and its associated dependencies are started. Only one profile should typically be active.
+ Different Docker Compose files (`compose.yaml`, `compose_tgi.yaml`) control which LLM serving backend (vLLM or TGI) and its associated dependencies are started. Choose the appropriate compose file based on your requirements.

  ## Building Custom Images (Optional)

@@ -130,19 +132,20 @@ If you need to modify the microservices:
  1. Clone the [OPEA GenAIComps](https://github.com/opea-project/GenAIComps) repository.
  2. Follow build instructions in the respective component directories (e.g., `comps/llms/text-generation`, `comps/codegen`, `comps/ui/gradio`, etc.). Use the provided Dockerfiles (e.g., `CodeGen/Dockerfile`, `CodeGen/ui/docker/Dockerfile.gradio`).
  3. Tag your custom images appropriately (e.g., `my-custom-codegen:latest`).
- 4. Update the `image:` fields in the `compose.yaml` file to use your custom image tags.
+ 4. Update the `image:` fields in the compose files (`compose.yaml` or `compose_tgi.yaml`) to use your custom image tags.
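
To make these steps concrete, here is a hypothetical sketch of building and tagging a custom image; the checkout location and build context are assumptions, and only `CodeGen/Dockerfile` and the `my-custom-codegen:latest` tag come from the list above.

```bash
# Hypothetical example: build a custom CodeGen image and tag it so the compose files can use it.
cd <your-checkout-containing-CodeGen>   # placeholder path
docker build -t my-custom-codegen:latest -f CodeGen/Dockerfile .
# Then set image: my-custom-codegen:latest for the corresponding service
# in compose.yaml or compose_tgi.yaml.
```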

  _Refer to the main [CodeGen README](../../../../README.md) for links to relevant GenAIComps components._

  ## Validate Services

  ### Check Container Status

- Ensure all containers associated with the chosen profile are running:
+ Ensure all containers associated with the chosen compose file are running:

  ```bash
- docker compose --profile <profile_name> ps
- # Example: docker compose --profile codegen-xeon-vllm ps
+ docker compose -f <compose-file> ps
+ # Example: docker compose ps                      # for vLLM (compose.yaml)
+ # Example: docker compose -f compose_tgi.yaml ps  # for TGI
  ```

  Check logs for specific services: `docker compose logs <service_name>`
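
Once containers are healthy, the gateway itself can be spot-checked with `curl`. A minimal sketch, assuming the default `BACKEND_SERVICE_ENDPOINT` from the configuration table (`http://${HOST_IP}:7778/v1/codegen`) and the simple `messages` payload used by the OPEA examples:

```bash
# Assumes HOST_IP is exported and the MegaService is listening on its default port 7778.
curl -s http://${HOST_IP}:7778/v1/codegen \
  -H "Content-Type: application/json" \
  -d '{"messages": "Write a Python function that checks whether a number is prime."}'
```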
@@ -173,7 +176,7 @@ Use `curl` commands to test the main service endpoints. Ensure `HOST_IP` is corr

  ## Accessing the User Interface (UI)

- Multiple UI options can be configured via the `compose.yaml`.
+ Multiple UI options can be configured via the compose files.

  ### Gradio UI (Default)

@@ -186,16 +189,16 @@ _(Port `5173` is the default host mapping for `codegen-gradio-ui-server`)_

  ### Svelte UI (Optional)

- 1. Modify `compose.yaml`: Comment out the `codegen-gradio-ui-server` service and uncomment/add the `codegen-xeon-ui-server` (Svelte) service definition, ensuring the port mapping is correct (e.g., `"- 5173:5173"`).
- 2. Restart Docker Compose: `docker compose --profile <profile_name> up -d`
+ 1. Modify the compose file (either `compose.yaml` or `compose_tgi.yaml`): Comment out the `codegen-gradio-ui-server` service and uncomment/add the `codegen-xeon-ui-server` (Svelte) service definition, ensuring the port mapping is correct (e.g., `"- 5173:5173"`).
+ 2. Restart Docker Compose: `docker compose up -d` or `docker compose -f compose_tgi.yaml up -d`
  3. Access: `http://{HOST_IP}:5173` (or the host port you mapped).

  ![Svelte UI Init](../../../../assets/img/codeGen_ui_init.jpg)

  ### React UI (Optional)

- 1. Modify `compose.yaml`: Comment out the default UI service and uncomment/add the `codegen-xeon-react-ui-server` definition, ensuring correct port mapping (e.g., `"- 5174:80"`).
- 2. Restart Docker Compose: `docker compose --profile <profile_name> up -d`
+ 1. Modify the compose file (either `compose.yaml` or `compose_tgi.yaml`): Comment out the default UI service and uncomment/add the `codegen-xeon-react-ui-server` definition, ensuring correct port mapping (e.g., `"- 5174:80"`).
+ 2. Restart Docker Compose: `docker compose up -d` or `docker compose -f compose_tgi.yaml up -d`
  3. Access: `http://{HOST_IP}:5174` (or the host port you mapped).

  ![React UI](../../../../assets/img/codegen_react.png)
@@ -218,21 +221,18 @@ Users can interact with the backend service using the `Neural Copilot` VS Code e

  - **Model Download Issues:** Check `HF_TOKEN`. Ensure internet connectivity or correct proxy settings. Check logs of `tgi-service`/`vllm-service` and `tei-embedding-server`. Gated models need prior Hugging Face access.
  - **Connection Errors:** Verify `HOST_IP` is correct and accessible. Check `docker ps` for port mappings. Ensure `no_proxy` includes `HOST_IP` if using a proxy. Check logs of the service failing to connect (e.g., `codegen-backend-server` logs if it can't reach `codegen-llm-server`).
- - **"Container name is in use"**: Stop existing containers (`docker compose down`) or change `container_name` in `compose.yaml`.
+ - **"Container name is in use"**: Stop existing containers (`docker compose down`) or change `container_name` in the compose file.
  - **Resource Issues:** CodeGen models can be memory-intensive. Monitor host RAM usage. Increase Docker resources if needed.

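For the download and connection problems above, a quick-check sketch; the container name and host port come from `compose.yaml`, while the `/health` route is an assumption about the serving backend:

```bash
# Quick checks (service name and port 8028 taken from compose.yaml; /health path assumed).
docker compose ps                       # are all expected containers running?
docker compose logs -f vllm-server      # watch model download and startup progress
curl -f http://${HOST_IP}:8028/health   # is the LLM serving endpoint responding?
```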
  ## Stopping the Application

  ```bash
- docker compose --profile <profile_name> down
- # Example: docker compose --profile codegen-xeon-vllm down
+ docker compose down                      # for vLLM (compose.yaml)
+ # or
+ docker compose -f compose_tgi.yaml down  # for TGI
  ```

  ## Next Steps

  - Consult the [OPEA GenAIComps](https://github.com/opea-project/GenAIComps) repository for details on individual microservices.
  - Refer to the main [CodeGen README](../../../../README.md) for links to benchmarking and Kubernetes deployment options.
-
- ```
-
- ```

CodeGen/docker_compose/intel/cpu/xeon/compose.yaml

Lines changed: 0 additions & 37 deletions
@@ -3,33 +3,9 @@

  services:

-   tgi-service:
-     image: ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu
-     container_name: tgi-server
-     profiles:
-       - codegen-xeon-tgi
-     ports:
-       - "8028:80"
-     volumes:
-       - "${MODEL_CACHE:-./data}:/data"
-     shm_size: 1g
-     environment:
-       no_proxy: ${no_proxy}
-       http_proxy: ${http_proxy}
-       https_proxy: ${https_proxy}
-       HF_TOKEN: ${HF_TOKEN}
-       host_ip: ${host_ip}
-     healthcheck:
-       test: ["CMD-SHELL", "curl -f http://localhost:80/health || exit 1"]
-       interval: 10s
-       timeout: 10s
-       retries: 100
-     command: --model-id ${LLM_MODEL_ID} --cuda-graphs 0
    vllm-service:
      image: ${REGISTRY:-opea}/vllm:${TAG:-latest}
      container_name: vllm-server
-     profiles:
-       - codegen-xeon-vllm
      ports:
        - "8028:80"
      volumes:
@@ -58,22 +34,9 @@ services:
        LLM_MODEL_ID: ${LLM_MODEL_ID}
        HUGGINGFACEHUB_API_TOKEN: ${HF_TOKEN}
      restart: unless-stopped
-   llm-tgi-service:
-     extends: llm-base
-     container_name: llm-codegen-tgi-server
-     profiles:
-       - codegen-xeon-tgi
-     ports:
-       - "9000:9000"
-     ipc: host
-     depends_on:
-       tgi-service:
-         condition: service_healthy
    llm-vllm-service:
      extends: llm-base
      container_name: llm-codegen-vllm-server
-     profiles:
-       - codegen-xeon-vllm
      ports:
        - "9000:9000"
      ipc: host
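
With the profile-based selection removed, each backend now lives in its own compose file. A quick way to confirm which services each file defines is the standard Compose `config` command; a sketch, assuming it is run from the `xeon` directory:

```bash
# List the services declared by each compose file.
docker compose -f compose.yaml config --services       # vLLM-based stack
docker compose -f compose_tgi.yaml config --services   # TGI-based stack
```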
