Commit 0d52c2f

pallavijaini0525pre-commit-ci[bot]AI WorkloadsPallavi Jainiroot
authored
Pinecone update to Readme and docker compose for ChatQnA (opea-project#540)
Signed-off-by: pallavi jaini <[email protected]> Signed-off-by: AI Workloads <[email protected]> Signed-off-by: Pallavi Jaini <pallavi,[email protected]> Signed-off-by: Pallavi Jaini <[email protected]> Signed-off-by: root <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: AI Workloads <[email protected]> Co-authored-by: Pallavi Jaini <pallavi,[email protected]> Co-authored-by: root <[email protected]> Co-authored-by: chen, suyue <[email protected]>
1 parent 1ff85f6 commit 0d52c2f

4 files changed

+778
-0
lines changed

Lines changed: 382 additions & 0 deletions
@@ -0,0 +1,382 @@
1+
# Build Mega Service of ChatQnA (with Pinecone) on Xeon
2+
3+
This document outlines the deployment process for a ChatQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on an Intel Xeon server. The steps include Docker image creation, container deployment via Docker Compose, and service execution to integrate microservices such as `embedding`, `retriever`, `rerank`, and `llm`. We will publish the Docker images to Docker Hub soon, which will simplify the deployment process for this service.
4+
5+
## 🚀 Apply Xeon Server on AWS
6+
7+
To set up a Xeon server on AWS, start by creating an AWS account if you don't have one already. Then, head to the [EC2 Console](https://console.aws.amazon.com/ec2/v2/home) to begin the process. Within the EC2 service, select the Amazon EC2 M7i or M7i-flex instance type to use 4th Generation Intel Xeon Scalable processors. These instances are optimized for high-performance computing and demanding workloads.
8+
9+
For detailed information about these instance types, you can refer to this [link](https://aws.amazon.com/ec2/instance-types/m7i/). Once you've chosen the appropriate instance type, proceed with configuring your instance settings, including network configurations, security groups, and storage options.
10+
11+
After launching your instance, you can connect to it using SSH (for Linux instances) or Remote Desktop Protocol (RDP) (for Windows instances). From there, you'll have full access to your Xeon server, allowing you to install, configure, and manage your applications as needed.
12+
13+
**Certain ports in the EC2 instance need to be opened in the security group for the microservices to work with the curl commands**
14+
15+
> See one example below. Open these ports in the EC2 security group, restricting the source to the IP addresses you want to allow; the example uses `0.0.0.0/0`, which allows any address.
16+
17+
```
18+
19+
data_prep_service
20+
=====================
21+
Port 6007 - Open to 0.0.0.0/0
22+
Port 6008 - Open to 0.0.0.0/0
23+
24+
tei_embedding_service
25+
=====================
26+
Port 6006 - Open to 0.0.0.0/0
27+
28+
embedding
29+
=========
30+
Port 6000 - Open to 0.0.0.0/0
31+
32+
retriever
33+
=========
34+
Port 7000 - Open to 0.0.0.0/0
35+
36+
tei_xeon_service
37+
================
38+
Port 8808 - Open to 0.0.0.0/0
39+
40+
reranking
41+
=========
42+
Port 8000 - Open to 0.0.0.0/0
43+
44+
tgi-service
45+
===========
46+
Port 9009 - Open to 0.0.0.0/0
47+
48+
llm
49+
===
50+
Port 9000 - Open to 0.0.0.0/0
51+
52+
chaqna-xeon-backend-server
53+
==========================
54+
Port 8888 - Open to 0.0.0.0/0
55+
56+
chaqna-xeon-ui-server
57+
=====================
58+
Port 5173 - Open to 0.0.0.0/0
59+
```
60+
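If you manage the security group from the command line rather than the EC2 console, the port list above can be turned into AWS CLI calls. The sketch below only prints the commands. The group ID `sg-0123456789abcdef0` is a made-up placeholder, `ingress_commands` is an illustrative helper, and the `0.0.0.0/0` source range simply mirrors the example above; you would normally narrow it to the addresses you want to allow.

```python
import shlex

# Ports from the list above, keyed by the service that uses them.
SERVICE_PORTS = {
    "data_prep_service": [6007, 6008],
    "tei_embedding_service": [6006],
    "embedding": [6000],
    "retriever": [7000],
    "tei_xeon_service": [8808],
    "reranking": [8000],
    "tgi-service": [9009],
    "llm": [9000],
    "chaqna-xeon-backend-server": [8888],
    "chaqna-xeon-ui-server": [5173],
}


def ingress_commands(group_id, cidr="0.0.0.0/0"):
    """Return one `aws ec2 authorize-security-group-ingress` command per port."""
    commands = []
    for ports in SERVICE_PORTS.values():
        for port in ports:
            args = [
                "aws", "ec2", "authorize-security-group-ingress",
                "--group-id", group_id,
                "--protocol", "tcp",
                "--port", str(port),
                "--cidr", cidr,
            ]
            commands.append(shlex.join(args))
    return commands


# Hypothetical security group ID; replace it with your own.
for command in ingress_commands("sg-0123456789abcdef0"):
    print(command)
```

Printing rather than executing keeps the sketch safe to run anywhere; the output can be reviewed and pasted into a shell that has AWS credentials configured.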
61+
## 🚀 Build Docker Images
62+
63+
First, build the Docker images locally. Start by cloning the GenAIComps repository, which contains the Dockerfiles used in steps 1 to 5:
64+
65+
```bash
66+
git clone https://github.com/opea-project/GenAIComps.git
67+
cd GenAIComps
68+
```
69+
70+
### 1. Build Embedding Image
71+
72+
```bash
73+
docker build --no-cache -t opea/embedding-tei:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/embeddings/tei/langchain/Dockerfile .
74+
```
75+
76+
### 2. Build Retriever Image
77+
78+
```bash
79+
docker build --no-cache -t opea/retriever-pinecone:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/retrievers/pinecone/langchain/Dockerfile .
80+
```
81+
82+
### 3. Build Rerank Image
83+
84+
```bash
85+
docker build --no-cache -t opea/reranking-tei:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/reranks/tei/Dockerfile .
86+
```
87+
88+
### 4. Build LLM Image
89+
90+
```bash
91+
docker build --no-cache -t opea/llm-tgi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/tgi/Dockerfile .
92+
```
93+
94+
### 5. Build Dataprep Image
95+
96+
```bash
97+
docker build --no-cache -t opea/dataprep-pinecone:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/pinecone/langchain/Dockerfile .
98+
cd ..
99+
```
100+
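Steps 1 to 5 run the same `docker build` invocation with a different tag and Dockerfile each time. As a sketch, the helper below assembles those five commands from a table so they can be reviewed or run in a loop. It only prints the commands; `build_command` and `COMPONENT_IMAGES` are illustrative names, and the proxy values are read from the environment, falling back to empty strings when unset.

```python
import os
import shlex

# (image tag, Dockerfile path relative to the GenAIComps checkout), as in steps 1-5.
COMPONENT_IMAGES = [
    ("opea/embedding-tei:latest", "comps/embeddings/tei/langchain/Dockerfile"),
    ("opea/retriever-pinecone:latest", "comps/retrievers/pinecone/langchain/Dockerfile"),
    ("opea/reranking-tei:latest", "comps/reranks/tei/Dockerfile"),
    ("opea/llm-tgi:latest", "comps/llms/text-generation/tgi/Dockerfile"),
    ("opea/dataprep-pinecone:latest", "comps/dataprep/pinecone/langchain/Dockerfile"),
]


def build_command(tag, dockerfile, env=os.environ):
    """Return the argument list for one `docker build`, mirroring the steps above."""
    return [
        "docker", "build", "--no-cache",
        "-t", tag,
        "--build-arg", f"https_proxy={env.get('https_proxy', '')}",
        "--build-arg", f"http_proxy={env.get('http_proxy', '')}",
        "-f", dockerfile,
        ".",
    ]


for tag, dockerfile in COMPONENT_IMAGES:
    print(shlex.join(build_command(tag, dockerfile)))
```

To execute instead of print, each argument list could be passed to `subprocess.run(args, check=True)` from inside the `GenAIComps` directory.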
101+
### 6. Build MegaService Docker Image
102+
103+
To construct the Mega Service, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `chatqna.py` Python script. Build the MegaService Docker image with the command below:
104+
105+
```bash
106+
git clone https://github.com/opea-project/GenAIExamples.git
107+
cd GenAIExamples/ChatQnA/docker
108+
docker build --no-cache -t opea/chatqna:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
109+
cd ../../..
110+
```
111+
112+
### 7. Build UI Docker Image
113+
114+
Build the frontend Docker image with the command below:
115+
116+
```bash
117+
cd GenAIExamples/ChatQnA/docker/ui/
118+
docker build --no-cache -t opea/chatqna-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile .
119+
cd ../../../..
120+
```
121+
122+
### 8. Build Conversational React UI Docker Image (Optional)
123+
124+
Build the frontend Docker image that enables a conversational experience with the ChatQnA megaservice with the command below:
125+
126+
**Export the value of the public IP address of your Xeon server to the `host_ip` environment variable**
127+
128+
```bash
129+
cd GenAIExamples/ChatQnA/docker/ui/
130+
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/chatqna"
131+
export DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep"
132+
export DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6008/v1/dataprep/get_file"
133+
docker build --no-cache -t opea/chatqna-conversation-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy --build-arg BACKEND_SERVICE_ENDPOINT=$BACKEND_SERVICE_ENDPOINT --build-arg DATAPREP_SERVICE_ENDPOINT=$DATAPREP_SERVICE_ENDPOINT --build-arg DATAPREP_GET_FILE_ENDPOINT=$DATAPREP_GET_FILE_ENDPOINT -f ./docker/Dockerfile.react .
134+
cd ../../../..
135+
```
136+
137+
Then run `docker images`. You should see the following 7 Docker images (plus `opea/chatqna-conversation-ui:latest` if you built the optional image in step 8):
138+
139+
1. `opea/dataprep-pinecone:latest`
140+
2. `opea/embedding-tei:latest`
141+
3. `opea/retriever-pinecone:latest`
142+
4. `opea/reranking-tei:latest`
143+
5. `opea/llm-tgi:latest`
144+
6. `opea/chatqna:latest`
145+
7. `opea/chatqna-ui:latest`
146+
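A quick way to confirm the builds succeeded is to compare the local image list against the seven tags above. The sketch below assumes the listing comes from `docker images --format '{{.Repository}}:{{.Tag}}'`, which prints one `repository:tag` per line. `missing_images` is an illustrative helper, and the sample listing is made up for demonstration.

```python
EXPECTED_IMAGES = [
    "opea/dataprep-pinecone:latest",
    "opea/embedding-tei:latest",
    "opea/retriever-pinecone:latest",
    "opea/reranking-tei:latest",
    "opea/llm-tgi:latest",
    "opea/chatqna:latest",
    "opea/chatqna-ui:latest",
]


def missing_images(listing):
    """Return the expected tags that do not appear in a `repository:tag` listing."""
    present = {line.strip() for line in listing.splitlines() if line.strip()}
    return [image for image in EXPECTED_IMAGES if image not in present]


# Made-up listing in which the last two builds have not been run yet.
sample_listing = """\
opea/dataprep-pinecone:latest
opea/embedding-tei:latest
opea/retriever-pinecone:latest
opea/reranking-tei:latest
opea/llm-tgi:latest
"""
print(missing_images(sample_listing))
# ['opea/chatqna:latest', 'opea/chatqna-ui:latest']
```

On a real host, the listing could be captured with `subprocess.run(["docker", "images", "--format", "{{.Repository}}:{{.Tag}}"], capture_output=True, text=True).stdout`.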
147+
## 🚀 Start Microservices
148+
149+
### Setup Environment Variables
150+
151+
Since `compose_pinecone.yaml` consumes several environment variables, you need to set them up in advance as shown below.
152+
153+
**Export the value of the public IP address of your Xeon server to the `host_ip` environment variable**
154+
155+
> Replace `External_Public_IP` below with the actual IPv4 address
156+
157+
```bash
158+
export host_ip="External_Public_IP"
159+
```
160+
161+
**Export the value of your Huggingface API token to the `your_hf_api_token` environment variable**
162+
163+
> Replace `Your_Huggingface_API_Token` below with your actual Huggingface API token value
164+
165+
```bash
166+
export your_hf_api_token="Your_Huggingface_API_Token"
167+
```
168+
169+
**Append the value of the public IP address to the no_proxy list**
170+
171+
```bash
172+
export your_no_proxy=${your_no_proxy},"External_Public_IP"
173+
```
174+
175+
**Get the PINECONE_API_KEY and the INDEX_NAME**
176+
177+
```bash
178+
export pinecone_api_key=${api_key}
179+
export pinecone_index_name=${pinecone_index}
180+
```
181+
182+
```bash
183+
export no_proxy=${your_no_proxy}
184+
export http_proxy=${your_http_proxy}
185+
export https_proxy=${your_http_proxy}
186+
export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
187+
export RERANK_MODEL_ID="BAAI/bge-reranker-base"
188+
export LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
189+
export TEI_EMBEDDING_ENDPOINT="http://${host_ip}:6006"
190+
export TEI_RERANKING_ENDPOINT="http://${host_ip}:8808"
191+
export TGI_LLM_ENDPOINT="http://${host_ip}:9009"
192+
export PINECONE_API_KEY=${pinecone_api_key}
193+
export PINECONE_INDEX_NAME=${pinecone_index_name}
194+
export INDEX_NAME=${pinecone_index_name}
195+
export HUGGINGFACEHUB_API_TOKEN=${your_hf_api_token}
196+
export MEGA_SERVICE_HOST_IP=${host_ip}
197+
export EMBEDDING_SERVICE_HOST_IP=${host_ip}
198+
export RETRIEVER_SERVICE_HOST_IP=${host_ip}
199+
export RERANK_SERVICE_HOST_IP=${host_ip}
200+
export LLM_SERVICE_HOST_IP=${host_ip}
201+
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/chatqna"
202+
export DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep"
203+
export DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6008/v1/dataprep/get_file"
204+
export DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:6009/v1/dataprep/delete_file"
205+
```
206+
207+
Note: Please replace `host_ip` with your external IP address; do not use localhost.
208+
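Because a missing variable or a `localhost` address tends to surface only later as a failing container, it can help to check the environment before starting anything. This is a minimal sketch: the required names are taken from the export block above, while `check_environment` and its messages are illustrative assumptions rather than part of the project.

```python
import os

# Variables exported above that should be non-empty before starting the services.
REQUIRED_VARIABLES = [
    "host_ip",
    "EMBEDDING_MODEL_ID",
    "RERANK_MODEL_ID",
    "LLM_MODEL_ID",
    "TEI_EMBEDDING_ENDPOINT",
    "TEI_RERANKING_ENDPOINT",
    "TGI_LLM_ENDPOINT",
    "PINECONE_API_KEY",
    "PINECONE_INDEX_NAME",
    "HUGGINGFACEHUB_API_TOKEN",
    "MEGA_SERVICE_HOST_IP",
]


def check_environment(env):
    """Return a list of human-readable problems found in the given mapping."""
    problems = [f"{name} is not set" for name in REQUIRED_VARIABLES if not env.get(name)]
    host_ip = env.get("host_ip", "")
    if host_ip in ("localhost", "127.0.0.1"):
        problems.append("host_ip must be the external IP address, not localhost")
    return problems


# Report problems in the current environment, if any.
for problem in check_environment(os.environ):
    print(problem)
```

Run it from the same shell session in which the variables were exported, since `export` only affects that session and its child processes.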
209+
### Start All the Service Docker Containers
210+
211+
> Before running the docker compose command, you need to be in the folder that has the docker compose yaml file
212+
213+
```bash
214+
cd GenAIExamples/ChatQnA/docker/xeon/
215+
docker compose -f compose_pinecone.yaml up -d
216+
```
217+
218+
### Validate Microservices
219+
220+
1. TEI Embedding Service
221+
222+
```bash
223+
curl ${host_ip}:6006/embed \
224+
-X POST \
225+
-d '{"inputs":"What is Deep Learning?"}' \
226+
-H 'Content-Type: application/json'
227+
```
228+
229+
2. Embedding Microservice
230+
231+
```bash
232+
curl http://${host_ip}:6000/v1/embeddings \
233+
-X POST \
234+
-d '{"text":"hello"}' \
235+
-H 'Content-Type: application/json'
236+
```
237+
238+
3. Retriever Microservice
239+
To validate the retriever microservice, generate a mock embedding vector of length 768 with a Python script:
240+
241+
```Python
242+
import random
243+
embedding = [random.uniform(-1, 1) for _ in range(768)]
244+
print(embedding)
245+
```
246+
247+
Then substitute your mock embedding vector for the `${your_embedding}` in the following cURL command:
248+
249+
```bash
250+
curl http://${host_ip}:7000/v1/retrieval \
251+
-X POST \
252+
-d '{"text":"What is the revenue of Nike in 2023?","embedding":"'"${your_embedding}"'"}' \
253+
-H 'Content-Type: application/json'
254+
```
255+
256+
4. TEI Reranking Service
257+
258+
```bash
259+
curl http://${host_ip}:8808/rerank \
260+
-X POST \
261+
-d '{"query":"What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
262+
-H 'Content-Type: application/json'
263+
```
264+
265+
5. Reranking Microservice
266+
267+
```bash
268+
curl http://${host_ip}:8000/v1/reranking \
269+
-X POST \
270+
-d '{"initial_query":"What is Deep Learning?", "retrieved_docs": [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \
271+
-H 'Content-Type: application/json'
272+
```
273+
274+
6. TGI Service
275+
276+
```bash
277+
curl http://${host_ip}:9009/generate \
278+
-X POST \
279+
-d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
280+
-H 'Content-Type: application/json'
281+
```
282+
283+
7. LLM Microservice
284+
285+
```bash
286+
curl http://${host_ip}:9000/v1/chat/completions \
287+
-X POST \
288+
-d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
289+
-H 'Content-Type: application/json'
290+
```
291+
292+
8. MegaService
293+
294+
```bash
295+
curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
296+
"messages": "What is the revenue of Nike in 2023?"
297+
}'
298+
```
299+
300+
9. Dataprep Microservice (Optional)
301+
302+
If you want to update the default knowledge base, you can use the following commands:
303+
304+
Update Knowledge Base via Local File Upload:
305+
306+
```bash
307+
curl -X POST "http://${host_ip}:6007/v1/dataprep" \
308+
-H "Content-Type: multipart/form-data" \
309+
-F "files=@./nke-10k-2023.pdf"
310+
```
311+
312+
This command updates a knowledge base by uploading a local file for processing. Update the file path according to your environment.
313+
314+
Add Knowledge Base via HTTP Links:
315+
316+
```bash
317+
curl -X POST "http://${host_ip}:6007/v1/dataprep" \
318+
-H "Content-Type: multipart/form-data" \
319+
-F 'link_list=["https://opea.dev"]'
320+
```
321+
322+
This command updates a knowledge base by submitting a list of HTTP links for processing.
323+
324+
You can also retrieve the list of files you have uploaded:
325+
326+
```bash
327+
curl -X POST "http://${host_ip}:6008/v1/dataprep/get_file" \
328+
-H "Content-Type: application/json"
329+
```
330+
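For the retriever check in step 3, pasting 768 numbers into a shell variable by hand is error-prone. As an alternative sketch, the mock vector and request body can be built in Python and serialized with `json`. This assumes the retriever accepts `embedding` as a JSON array of floats, which this guide does not confirm. `retrieval_request` is an illustrative helper, and `203.0.113.10` is a documentation-only address standing in for your `host_ip`.

```python
import json
import random
import urllib.request


def retrieval_request(host_ip, text, dimension=768):
    """Build (without sending) a POST request for the retriever microservice."""
    # Mock embedding: `dimension` random floats in [-1, 1], as in step 3.
    embedding = [random.uniform(-1, 1) for _ in range(dimension)]
    body = json.dumps({"text": text, "embedding": embedding}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host_ip}:7000/v1/retrieval",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder address; use the value of your host_ip variable instead.
request = retrieval_request("203.0.113.10", "What is the revenue of Nike in 2023?")
print(request.full_url, len(json.loads(request.data)["embedding"]))
```

Once the services are running, the request could be sent with `urllib.request.urlopen(request)`; the sketch stops short of that so it can be run without a live deployment.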
331+
## Enable LangSmith for Monitoring Application (Optional)
332+
333+
LangSmith offers tools to debug, evaluate, and monitor language models and intelligent agents. It can be used to assess benchmark data for each microservice. Before launching your services with `docker compose -f compose_pinecone.yaml up -d`, you need to enable LangSmith tracing by setting the `LANGCHAIN_TRACING_V2` environment variable to true and configuring your LangChain API key.
334+
335+
Here's how you can do it:
336+
337+
1. Install the latest version of LangSmith:
338+
339+
```bash
340+
pip install -U langsmith
341+
```
342+
343+
2. Set the necessary environment variables:
344+
345+
```bash
346+
export LANGCHAIN_TRACING_V2=true
347+
export LANGCHAIN_API_KEY=ls_...
348+
```
349+
350+
## 🚀 Launch the UI
351+
352+
To access the frontend, open the following URL in your browser: http://{host_ip}:5173. By default, the UI runs on port 5173 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose_pinecone.yaml` file as shown below:
353+
354+
```yaml
355+
chaqna-xeon-ui-server:
356+
image: opea/chatqna-ui:latest
357+
...
358+
ports:
359+
- "80:5173"
360+
```
361+
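The short form of a Compose port mapping reads `HOST:CONTAINER`, so `"80:5173"` publishes the UI's internal port 5173 on host port 80. The sketch below splits such a mapping and derives the URL to open. It handles only this two-part form; `parse_port_mapping` and `ui_url` are illustrative helpers rather than part of Docker Compose, the `5173:5173` default is inferred from the URL given above, and `203.0.113.10` is a placeholder address.

```python
def parse_port_mapping(mapping):
    """Split a two-part "HOST:CONTAINER" mapping into integer ports."""
    host_port, container_port = mapping.split(":")
    return int(host_port), int(container_port)


def ui_url(host_ip, mapping):
    """Return the browser URL for a UI published with the given mapping."""
    host_port, _ = parse_port_mapping(mapping)
    # Port 80 is the HTTP default, so it can be left out of the URL.
    suffix = "" if host_port == 80 else f":{host_port}"
    return f"http://{host_ip}{suffix}"


print(ui_url("203.0.113.10", "5173:5173"))  # http://203.0.113.10:5173
print(ui_url("203.0.113.10", "80:5173"))    # http://203.0.113.10
```

Only the host side of the mapping changes the URL; the container side must keep matching the port the UI listens on internally.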
362+
## 🚀 Launch the Conversational UI (React)
363+
364+
To access the Conversational UI frontend, open the following URL in your browser: http://{host_ip}:5174. By default, the UI runs on port 80 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below:
365+
366+
```yaml
367+
chaqna-xeon-conversation-ui-server:
368+
image: opea/chatqna-conversation-ui:latest
369+
...
370+
ports:
371+
- "80:80"
372+
```
373+
374+
![project-screenshot](../../../../assets/img/chat_ui_init.png)
375+
376+
Here is an example of running ChatQnA:
377+
378+
![project-screenshot](../../../../assets/img/chat_ui_response.png)
379+
380+
Here is an example of running ChatQnA with Conversational UI (React):
381+
382+
![project-screenshot](../../../../assets/img/conversation_ui_response.png)
