diff --git a/ChatQnA/README.md b/ChatQnA/README.md index 2cccfa62e5..30519a1a4e 100644 --- a/ChatQnA/README.md +++ b/ChatQnA/README.md @@ -1,149 +1,22 @@ # ChatQnA Application -Chatbots are the most widely adopted use case for leveraging the powerful chat and reasoning capabilities of large language models (LLMs). The retrieval augmented generation (RAG) architecture is quickly becoming the industry standard for chatbots development. It combines the benefits of a knowledge base (via a vector store) and generative models to reduce hallucinations, maintain up-to-date information, and leverage domain-specific knowledge. +Chatbots are the most widely adopted use case for leveraging the powerful chat and reasoning capabilities of large language models (LLMs). The retrieval augmented generation (RAG) architecture is quickly becoming the industry standard for chatbot development. It combines the benefits of a knowledge base (via a vector store) and generative models to reduce hallucinations, maintain up-to-date information, and leverage domain-specific knowledge. -RAG bridges the knowledge gap by dynamically fetching relevant information from external sources, ensuring that responses generated remain factual and current. The core of this architecture are vector databases, which are instrumental in enabling efficient and semantic retrieval of information. These databases store data as vectors, allowing RAG to swiftly access the most pertinent documents or data points based on semantic similarity. +RAG bridges the knowledge gap by dynamically fetching relevant information from external sources, ensuring that the response generated remains factual and current. Vector databases are at the core of this architecture, enabling efficient retrieval of semantically relevant information. These databases store data as vectors, allowing RAG to swiftly access the most pertinent documents or data points based on semantic similarity. # Table of contents -1. [Automated Terraform Deployment](#automated-deployment-to-ubuntu-based-systemif-not-using-terraform-using-intel-optimized-cloud-modules-for-ansible) -2. [Automated Deployment to Ubuntu based system](#automated-deployment-to-ubuntu-based-systemif-not-using-terraform-using-intel-optimized-cloud-modules-for-ansible) -3. [Manually Deployment](#manually-deploy-chatqna-service) -4. [Architecture and Deploy Details](#architecture-and-deploy-details) -5. [Consume Service](#consume-chatqna-service-with-rag) -6. [Monitoring and Tracing](#monitoring-opea-service-with-prometheus-and-grafana-dashboard) +1. [Architecture](#architecture) +2. [Deployment Options](#deployment-options) +3. 
[Monitoring and Tracing](./README_miscellaneous.md#Monitoring-OPEA-Service-with-Prometheus-and-Grafana-dashboard) -## šŸ¤– Automated Terraform Deployment using IntelĀ® Optimized Cloud Modules for **Terraform** +## Architecture -| Cloud Provider | Intel Architecture | Intel Optimized Cloud Module for Terraform | Comments | -| -------------------- | ------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------- | -| AWS | 4th Gen Intel Xeon with Intel AMX | [AWS Deployment](https://github.com/intel/terraform-intel-aws-vm/tree/main/examples/gen-ai-xeon-opea-chatqna) | Uses meta-llama/Meta-Llama-3-8B-Instruct by default | -| AWS Falcon2-11B | 4th Gen Intel Xeon with Intel AMX | [AWS Deployment with Falcon11B](https://github.com/intel/terraform-intel-aws-vm/tree/main/examples/gen-ai-xeon-opea-chatqna-falcon11B) | Uses TII Falcon2-11B LLM Model | -| AWS Falcon3 | 4th Gen Intel Xeon with Intel AMX | [AWS Deployment with Falcon3](https://github.com/intel/terraform-intel-aws-vm/tree/main/examples/gen-ai-xeon-opea-chatqna-falcon3) | Uses TII Falcon3 LLM Model | -| GCP | 4th/5th Gen Intel Xeon with Intel AMX & Intel TDX | [GCP Deployment](https://github.com/intel/terraform-intel-gcp-vm/tree/main/examples/gen-ai-xeon-opea-chatqna) | Supports Confidential AI by using IntelĀ® TDX with 4th Gen Xeon | -| Azure | 4th/5th Gen Intel Xeon with Intel AMX & Intel TDX | [Azure Deployment](https://github.com/intel/terraform-intel-azure-linux-vm/tree/main/examples/azure-gen-ai-xeon-opea-chatqna-tdx) | Supports Confidential AI by using IntelĀ® TDX with 4th Gen Xeon | -| Intel Tiber AI Cloud | 5th Gen Intel Xeon with Intel AMX | Work-in-progress | Work-in-progress | +The ChatQnA application is a customizable end-to-end workflow that leverages the capabilities of LLMs and RAG efficiently. ChatQnA architecture is shown below: -## Automated Deployment to Ubuntu based system (if not using Terraform) using IntelĀ® Optimized Cloud Modules for **Ansible** - -To deploy to existing Xeon Ubuntu based system, use our Intel Optimized Cloud Modules for Ansible. This is the same Ansible playbook used by Terraform. -Use this if you are not using Terraform and have provisioned your system with another tool or manually including bare metal. - -| Operating System | Intel Optimized Cloud Module for Ansible | -| ---------------- | ----------------------------------------------------------------------------------------------------------------- | -| Ubuntu 20.04 | [ChatQnA Ansible Module](https://github.com/intel/optimized-cloud-recipes/tree/main/recipes/ai-opea-chatqna-xeon) | -| Ubuntu 22.04 | Work-in-progress | - -## Manually Deploy ChatQnA Service - -The ChatQnA service can be effortlessly deployed on Intel Gaudi2, Intel Xeon Scalable Processors,Nvidia GPU and AMD GPU. - -Two types of ChatQnA pipeline are supported now: `ChatQnA with/without Rerank`. And the `ChatQnA without Rerank` pipeline (including Embedding, Retrieval, and LLM) is offered for Xeon customers who can not run rerank service on HPU yet require high performance and accuracy. - -Quick Start Deployment Steps: - -1. Set up the environment variables. -2. Run Docker Compose. -3. Consume the ChatQnA Service. - -Note: - -1. If you do not have docker installed you can run this script to install docker : `bash docker_compose/install_docker.sh`. - -2. 
The default LLM is `meta-llama/Meta-Llama-3-8B-Instruct`. Before deploying the application, please make sure either you've requested and been granted the access to it on [Huggingface](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) `or` you've downloaded the model locally from [ModelScope](https://www.modelscope.cn/models). - -### Quick Start: 1.Setup Environment Variable - -To set up environment variables for deploying ChatQnA services, follow these steps: - -1. Set the required environment variables: - - ```bash - # Example: host_ip="192.168.1.1" - export host_ip="External_Public_IP" - # Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1" - export no_proxy="Your_No_Proxy" - export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" - ``` - -2. If you are in a proxy environment, also set the proxy-related environment variables: - - ```bash - export http_proxy="Your_HTTP_Proxy" - export https_proxy="Your_HTTPs_Proxy" - ``` - -3. Set up other environment variables: - - > Notice that you can only choose **one** hardware option below to set up envs according to your hardware. Make sure port numbers are set correctly as well. - - ```bash - # on Gaudi - cd GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi/ - source ./set_env.sh - export no_proxy="Your_No_Proxy",chatqna-gaudi-ui-server,chatqna-gaudi-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service,vllm-service,guardrails,jaeger,prometheus,grafana,gaudi-node-exporter-1 - # on Xeon - cd GenAIExamples/ChatQnA/docker_compose/intel/cpu/xeon/ - source ./set_env.sh - export no_proxy="Your_No_Proxy",chatqna-xeon-ui-server,chatqna-xeon-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service,vllm-service,jaeger,prometheus,grafana,xeon-node-exporter-1 - # on Nvidia GPU - cd GenAIExamples/ChatQnA/docker_compose/nvidia/gpu - source ./set_env.sh - export no_proxy="Your_No_Proxy",chatqna-ui-server,chatqna-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service - ``` - -### Quick Start: 2.Run Docker Compose - -Select the compose.yaml file that matches your hardware. - -CPU example: - -```bash -cd GenAIExamples/ChatQnA/docker_compose/intel/cpu/xeon/ -# cd GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi/ -# cd GenAIExamples/ChatQnA/docker_compose/nvidia/gpu/ -docker compose up -d -``` - -To enable Open Telemetry Tracing, compose.telemetry.yaml file need to be merged along with default compose.yaml file. -CPU example with Open Telemetry feature: - -```bash -cd GenAIExamples/ChatQnA/docker_compose/intel/cpu/xeon/ -docker compose -f compose.yaml -f compose.telemetry.yaml up -d -``` - -It will automatically download the docker image on `docker hub`: - -```bash -docker pull opea/chatqna:latest -docker pull opea/chatqna-ui:latest -``` - -In following cases, you could build docker image from source by yourself. - -- Failed to download the docker image. - -- If you want to use a specific version of Docker image. - -Please refer to the 'Build Docker Images' in [Guide](docker_compose/intel/cpu/xeon/README.md). - -### QuickStart: 3.Consume the ChatQnA Service - -```bash -curl http://${host_ip}:8888/v1/chatqna \ - -H "Content-Type: application/json" \ - -d '{ - "messages": "What is the revenue of Nike in 2023?" 
- }' -``` - -## Architecture and Deploy details - -ChatQnA architecture shows below: ![architecture](./assets/img/chatqna_architecture.png) -The ChatQnA example is implemented using the component-level microservices defined in [GenAIComps](https://github.com/opea-project/GenAIComps). The flow chart below shows the information flow between different microservices for this example. +This application is modular as it leverages each component as a microservice(as defined in [GenAIComps](https://github.com/opea-project/GenAIComps)) that can scale independently. It comprises data preparation, embedding, retrieval, reranker(optional) and LLM microservices. All these microservices are stiched together by the Chatqna megaservice that orchestrates the data through these microservices. The flow chart below shows the information flow between different microservices for this example. ```mermaid --- @@ -219,192 +92,21 @@ flowchart LR ``` -This ChatQnA use case performs RAG using LangChain, Redis VectorDB and Text Generation Inference on [Intel Gaudi2](https://www.intel.com/content/www/us/en/products/details/processors/ai-accelerators/gaudi-overview.html) or [Intel Xeon Scalable Processors](https://www.intel.com/content/www/us/en/products/details/processors/xeon.html). -In the below, we provide a table that describes for each microservice component in the ChatQnA architecture, the default configuration of the open source project, hardware, port, and endpoint. - -Gaudi default compose.yaml - -| MicroService | Open Source Project | HW | Port | Endpoint | -| ------------ | ------------------- | ----- | ---- | -------------------- | -| Embedding | Langchain | Xeon | 6000 | /v1/embeddings | -| Retriever | Langchain, Redis | Xeon | 7000 | /v1/retrieval | -| Reranking | Langchain, TEI | Gaudi | 8000 | /v1/reranking | -| LLM | Langchain, TGI | Gaudi | 9000 | /v1/chat/completions | -| Dataprep | Redis, Langchain | Xeon | 6007 | /v1/dataprep/ingest | - -### Required Models - -By default, the embedding, reranking and LLM models are set to a default value as listed below: - -| Service | Model | -| --------- | ----------------------------------- | -| Embedding | BAAI/bge-base-en-v1.5 | -| Reranking | BAAI/bge-reranker-base | -| LLM | meta-llama/Meta-Llama-3-8B-Instruct | - -Change the `xxx_MODEL_ID` in `docker_compose/xxx/set_env.sh` for your needs. - -For customers with proxy issues, the models from [ModelScope](https://www.modelscope.cn/models) are also supported in ChatQnA. Refer to [this readme](docker_compose/intel/cpu/xeon/README.md) for details. - -### Deploy ChatQnA on Gaudi - -Find the corresponding [compose.yaml](./docker_compose/intel/hpu/gaudi/compose.yaml). - -```bash -cd GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi/ -docker compose up -d -``` - -To enable Open Telemetry Tracing, compose.telemetry.yaml file need to be merged along with default compose.yaml file. - -```bash -cd GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi/ -docker compose -f compose.yaml -f compose.telemetry.yaml up -d -``` - -Refer to the [Gaudi Guide](./docker_compose/intel/hpu/gaudi/README.md) to build docker images from source. - -### Deploy ChatQnA on Xeon - -Find the corresponding [compose.yaml](./docker_compose/intel/cpu/xeon/compose.yaml). - -```bash -cd GenAIExamples/ChatQnA/docker_compose/intel/cpu/xeon/ -docker compose up -d -``` - -To enable Open Telemetry Tracing, compose.telemetry.yaml file need to be merged along with default compose.yaml file. 
- -```bash -cd GenAIExamples/ChatQnA/docker_compose/intel/cpu/xeon/ -docker compose -f compose.yaml -f compose.telemetry.yaml up -d -``` - -Refer to the [Xeon Guide](./docker_compose/intel/cpu/xeon/README.md) for more instructions on building docker images from source. - -### Deploy ChatQnA on NVIDIA GPU - -```bash -cd GenAIExamples/ChatQnA/docker_compose/nvidia/gpu/ -docker compose up -d -``` - -Refer to the [NVIDIA GPU Guide](./docker_compose/nvidia/gpu/README.md) for more instructions on building docker images from source. - -### Deploy ChatQnA on Kubernetes using Helm Chart - -Refer to the [ChatQnA helm chart](./kubernetes/helm/README.md) for instructions on deploying ChatQnA on Kubernetes. - -### Deploy ChatQnA on AI PC - -Refer to the [AI PC Guide](./docker_compose/intel/cpu/aipc/README.md) for instructions on deploying ChatQnA on AI PC. - -### Deploy ChatQnA on Red Hat OpenShift Container Platform (RHOCP) - -Refer to the [Intel Technology enabling for Openshift readme](https://github.com/intel/intel-technology-enabling-for-openshift/blob/main/workloads/opea/chatqna/README.md) for instructions to deploy ChatQnA prototype on RHOCP with [Red Hat OpenShift AI (RHOAI)](https://www.redhat.com/en/technologies/cloud-computing/openshift/openshift-ai). - -## Consume ChatQnA Service with RAG - -### Check Service Status - -Before consuming ChatQnA Service, make sure the vLLM/TGI service is ready, which takes some time. - -```bash -# vLLM example -docker logs vllm-gaudi-server 2>&1 | grep complete -# TGI example -docker logs tgi-gaudi-server | grep Connected -``` - -Consume ChatQnA service until you get the response like below. - -```log -# vLLM -INFO: Application startup complete. -# TGI -2024-09-03T02:47:53.402023Z INFO text_generation_router::server: router/src/server.rs:2311: Connected -``` - -### Upload RAG Files (Optional) - -To chat with retrieved information, you need to upload a file using `Dataprep` service. - -Here is an example of `Nike 2023` pdf. - -```bash -# download pdf file -wget https://raw.githubusercontent.com/opea-project/GenAIComps/v1.1/comps/retrievers/redis/data/nke-10k-2023.pdf -# upload pdf file with dataprep -curl -X POST "http://${host_ip}:6007/v1/dataprep/ingest" \ - -H "Content-Type: multipart/form-data" \ - -F "files=@./nke-10k-2023.pdf" -``` - -### Consume Chat Service - -Two ways of consuming ChatQnA Service: - -1. Use cURL command on terminal - - ```bash - curl http://${host_ip}:8888/v1/chatqna \ - -H "Content-Type: application/json" \ - -d '{ - "messages": "What is the revenue of Nike in 2023?" - }' - ``` - -2. Access via frontend - - To access the frontend, open the following URL in your browser: `http://{host_ip}:5173` - - By default, the UI runs on port 5173 internally. - - If you choose conversational UI, use this URL: `http://{host_ip}:5174` - -## Troubleshooting - -1. If you get errors like "Access Denied", [validate micro service](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/docker_compose/intel/cpu/xeon/README.md#validate-microservices) first. A simple example: - - ```bash - http_proxy="" curl ${host_ip}:6006/embed -X POST -d '{"inputs":"What is Deep Learning?"}' -H 'Content-Type: application/json' - ``` - -2. (Docker only) If all microservices work well, check the port ${host_ip}:8888, the port may be allocated by other users, you can modify the `compose.yaml`. - -3. (Docker only) If you get errors like "The container name is in use", change container name in `compose.yaml`. 
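+
+Regardless of which deployment option below is chosen, the deployed pipeline is consumed through the single ChatQnA megaservice gateway. As a minimal smoke test (a sketch assuming the default backend port 8888 used by the provided compose files, with `host_ip` set to the node's IP address), the gateway can be queried directly:
+
+```bash
+# Query the ChatQnA megaservice gateway (default port 8888 assumed)
+curl http://${host_ip}:8888/v1/chatqna \
+  -H "Content-Type: application/json" \
+  -d '{
+    "messages": "What is the revenue of Nike in 2023?"
+  }'
+```
+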
- -## Monitoring OPEA Service with Prometheus and Grafana dashboard - -OPEA microservice deployment can easily be monitored through Grafana dashboards in conjunction with Prometheus data collection. Follow the [README](https://github.com/opea-project/GenAIEval/blob/main/evals/benchmark/grafana/README.md) to setup Prometheus and Grafana servers and import dashboards to monitor the OPEA service. - -![chatqna dashboards](./assets/img/chatqna_dashboards.png) -![tgi dashboard](./assets/img/tgi_dashboard.png) - -## Tracing Services with OpenTelemetry Tracing and Jaeger - -> NOTE: This feature is disabled by default. Please check the Deploy ChatQnA sessions for how to enable this feature with compose.telemetry.yaml file. - -OPEA microservice and TGI/TEI serving can easily be traced through Jaeger dashboards in conjunction with OpenTelemetry Tracing feature. Follow the [README](https://github.com/opea-project/GenAIComps/tree/main/comps/cores/telemetry#tracing) to trace additional functions if needed. - -Tracing data is exported to http://{EXTERNAL_IP}:4318/v1/traces via Jaeger. -Users could also get the external IP via below command. - -```bash -ip route get 8.8.8.8 | grep -oP 'src \K[^ ]+' -``` - -Access the Jaeger dashboard UI at http://{EXTERNAL_IP}:16686 - -For TGI serving on Gaudi, users could see different services like opea, TEI and TGI. -![Screenshot from 2024-12-27 11-58-18](https://github.com/user-attachments/assets/6126fa70-e830-4780-bd3f-83cb6eff064e) - -Here is a screenshot for one tracing of TGI serving request. -![Screenshot from 2024-12-27 11-26-25](https://github.com/user-attachments/assets/3a7c51c6-f422-41eb-8e82-c3df52cd48b8) - -There are also OPEA related tracings. Users could understand the time breakdown of each service request by looking into each opea:schedule operation. -![image](https://github.com/user-attachments/assets/6137068b-b374-4ff8-b345-993343c0c25f) - -There could be async function such as `llm/MicroService_asyn_generate` and user needs to check the trace of the async function in another operation like -opea:llm_generate_stream. -![image](https://github.com/user-attachments/assets/a973d283-198f-4ce2-a7eb-58515b77503e) +## Deployment Options + +The table below lists currently available deployment options. They outline in detail the implementation of this example on selected hardware. 
+
+| Category                | Deployment Option            | Description |
+| ----------------------- | ---------------------------- | ----------- |
+| On-premise Deployments  | Docker Compose               | [ChatQnA deployment on Xeon](./docker_compose/intel/cpu/xeon) |
+|                         |                              | [ChatQnA deployment on AI PC](./docker_compose/intel/cpu/aipc) |
+|                         |                              | [ChatQnA deployment on Gaudi](./docker_compose/intel/hpu/gaudi) |
+|                         |                              | [ChatQnA deployment on Nvidia GPU](./docker_compose/nvidia/gpu) |
+|                         |                              | [ChatQnA deployment on AMD ROCm](./docker_compose/amd/gpu/rocm) |
+|                         | Kubernetes                   | [Helm Charts](./kubernetes/helm) |
+| Cloud Service Providers | AWS                          | [Terraform deployment on 4th Gen Intel Xeon with Intel AMX using meta-llama/Meta-Llama-3-8B-Instruct](https://github.com/intel/terraform-intel-aws-vm/tree/main/examples/gen-ai-xeon-opea-chatqna) |
+|                         |                              | [Terraform deployment on 4th Gen Intel Xeon with Intel AMX using TII Falcon2-11B](https://github.com/intel/terraform-intel-aws-vm/tree/main/examples/gen-ai-xeon-opea-chatqna-falcon11B) |
+|                         | GCP                          | [Terraform deployment on 5th Gen Intel Xeon with Intel AMX (supports Confidential AI via IntelĀ® TDX)](https://github.com/intel/terraform-intel-gcp-vm/tree/main/examples/gen-ai-xeon-opea-chatqna) |
+|                         | Azure                        | Work-in-progress |
+|                         | Intel Tiber AI Cloud         | Work-in-progress |
+|                         | Any Xeon-based Ubuntu system | [ChatQnA Ansible Module for Ubuntu 20.04](https://github.com/intel/optimized-cloud-recipes/tree/main/recipes/ai-opea-chatqna-xeon). Use this if you are not using Terraform and have provisioned your system manually or with another tool, including directly on bare metal. |
diff --git a/ChatQnA/README_miscellaneous.md b/ChatQnA/README_miscellaneous.md
new file mode 100644
index 0000000000..579cdef67c
--- /dev/null
+++ b/ChatQnA/README_miscellaneous.md
@@ -0,0 +1,86 @@
+# Table of contents
+
+1. [Build MegaService Docker Image](#Build-MegaService-Docker-Image)
+2. [Build Basic UI Docker Image](#Build-Basic-UI-Docker-Image)
+3. [Build Conversational React UI Docker Image](#Build-Conversational-React-UI-Docker-Image)
+4. [Troubleshooting](#Troubleshooting)
+5. [Monitoring OPEA Services with Prometheus and Grafana Dashboard](#Monitoring-OPEA-Services-with-Prometheus-and-Grafana-Dashboard)
+6. [Tracing with OpenTelemetry and Jaeger](#Tracing-with-OpenTelemetry-and-Jaeger)
+
+## Build MegaService Docker Image
+
+To construct the MegaService with Rerank, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `chatqna.py` Python script. Build the MegaService Docker image via the command below:
+
+```bash
+git clone https://github.com/opea-project/GenAIExamples.git
+git fetch && git checkout tags/v1.2
+cd GenAIExamples/ChatQnA
+docker build --no-cache -t opea/chatqna:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
+```
+
+## Build Basic UI Docker Image
+
+Build the frontend Docker image via the command below:
+
+```bash
+cd GenAIExamples/ChatQnA/ui
+docker build --no-cache -t opea/chatqna-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile .
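+# Optional sanity check (not part of the build itself): confirm the freshly built
+# ChatQnA images are now available locally.
+docker images | grep 'opea/chatqna'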
+```
+
+## Build Conversational React UI Docker Image (Optional)
+
+Build a frontend Docker image for an interactive, conversational UI experience with the ChatQnA MegaService.
+
+**Export the public IP address of the host machine to the `host_ip` environment variable.**
+
+```bash
+cd GenAIExamples/ChatQnA/ui
+docker build --no-cache -t opea/chatqna-conversation-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile.react .
+```
+
+## Troubleshooting
+
+1. If you get errors like "Access Denied", [validate the microservices](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/docker_compose/intel/cpu/xeon/README.md#validate-microservices) first. A simple example:
+
+   ```bash
+   http_proxy="" curl ${host_ip}:6006/embed -X POST -d '{"inputs":"What is Deep Learning?"}' -H 'Content-Type: application/json'
+   ```
+
+2. (Docker only) If all microservices work well, check port ${host_ip}:8888. The port may already be in use by another process; if so, modify the port mapping in `compose.yaml`.
+
+3. (Docker only) If you get errors like "The container name is in use", change the container name in `compose.yaml`.
+
+## Monitoring OPEA Services with Prometheus and Grafana Dashboard
+
+OPEA microservice deployments can easily be monitored through Grafana dashboards using data collected via Prometheus. Follow the [README](https://github.com/opea-project/GenAIEval/blob/main/evals/benchmark/grafana/README.md) to set up the Prometheus and Grafana servers and import dashboards to monitor the OPEA services.
+
+![chatqna dashboards](./assets/img/chatqna_dashboards.png)
+![tgi dashboard](./assets/img/tgi_dashboard.png)
+
+## Tracing with OpenTelemetry and Jaeger
+
+> NOTE: This feature is disabled by default. Please use the compose.telemetry.yaml file to enable it.
+
+OPEA microservices and [TGI](https://huggingface.co/docs/text-generation-inference/en/index)/[TEI](https://huggingface.co/docs/text-embeddings-inference/en/index) serving can easily be traced through [Jaeger](https://www.jaegertracing.io/) dashboards in conjunction with the [OpenTelemetry](https://opentelemetry.io/) tracing feature. Follow the [README](https://github.com/opea-project/GenAIComps/tree/main/comps/cores/telemetry#tracing) to trace additional functions if needed.
+
+Tracing data is exported to Jaeger at http://{EXTERNAL_IP}:4318/v1/traces.
+The external IP can be obtained with the command below.
+
+```bash
+ip route get 8.8.8.8 | grep -oP 'src \K[^ ]+'
+```
+
+Access the Jaeger dashboard UI at http://{EXTERNAL_IP}:16686
+
+For TGI serving on Gaudi, the dashboard shows different services such as opea, TEI, and TGI.
+![Screenshot from 2024-12-27 11-58-18](https://github.com/user-attachments/assets/6126fa70-e830-4780-bd3f-83cb6eff064e)
+
+Here is a screenshot of a trace for one TGI serving request.
+![Screenshot from 2024-12-27 11-26-25](https://github.com/user-attachments/assets/3a7c51c6-f422-41eb-8e82-c3df52cd48b8)
+
+There are also OPEA-related traces. The time breakdown of each service request can be understood by looking into each opea:schedule operation.
+![image](https://github.com/user-attachments/assets/6137068b-b374-4ff8-b345-993343c0c25f)
+
+There can be async functions such as `llm/MicroService_asyn_generate`; check the trace of the async function in another operation like
+opea:llm_generate_stream.
+![image](https://github.com/user-attachments/assets/a973d283-198f-4ce2-a7eb-58515b77503e) diff --git a/ChatQnA/docker_compose/intel/cpu/xeon/README.md b/ChatQnA/docker_compose/intel/cpu/xeon/README.md index 4b61c091df..0d1491117e 100644 --- a/ChatQnA/docker_compose/intel/cpu/xeon/README.md +++ b/ChatQnA/docker_compose/intel/cpu/xeon/README.md @@ -1,56 +1,63 @@ -# Build Mega Service of ChatQnA on Xeon +# Deploying ChatQnA on IntelĀ® XeonĀ® Processors -This document outlines the deployment process for a ChatQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on Intel Xeon server. The steps include Docker image creation, container deployment via Docker Compose, and service execution to integrate microservices such as `embedding`, `retriever`, `rerank`,`llm` and `faqgen`. +This document outlines the single node deployment process for a ChatQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservices on Intel Xeon server. The steps include pulling Docker images, container deployment via Docker Compose, and service execution to integrate microservices such as `embedding`, `retriever`, `rerank` and `llm`. -The default pipeline deploys with vLLM as the LLM serving component and leverages rerank component. It also provides options of not using rerank in the pipeline and using TGI backend for LLM microservice, please refer to [start-all-the-services-docker-containers](#start-all-the-services-docker-containers) section in this page. Besides, refer to [Build with Pinecone VectorDB](./README_pinecone.md) and [Build with Qdrant VectorDB](./README_qdrant.md) for other deployment variants. +# Table of contents -Quick Start: +1. [ChatQnA Quick Start Deployment](#chatqna-quick-start-Deployment) +2. [ChatQnA Docker Compose file Options](#chatqna-docker-compose-files) +3. [ChatQnA with Conversational UI](#chatqna-with-conversational-ui-optional) -1. Set up the environment variables. -2. Run Docker Compose. -3. Consume the ChatQnA Service. +## ChatQnA Quick Start Deployment -Note: The default LLM is `meta-llama/Meta-Llama-3-8B-Instruct`. Before deploying the application, please make sure either you've requested and been granted the access to it on [Huggingface](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) or you've downloaded the model locally from [ModelScope](https://www.modelscope.cn/models). +This section describes how to quickly deploy and test the ChatQnA service manually on an IntelĀ® XeonĀ® processor. The basic steps are: -## Quick Start: 1.Setup Environment Variable +1. [Access the Code](#access-the-code) +2. [Generate a HuggingFace Access Token](#generate-a-huggingface-access-token) +3. [Configure the Deployment Environment](#configure-the-deployment-environment) +4. [Deploy the Services Using Docker Compose](#deploy-the-services-using-docker-compose) +5. [Check the Deployment Status](#check-the-deployment-status) +6. [Test the Pipeline](#test-the-pipeline) +7. [Cleanup the Deployment](#cleanup-the-deployment) -To set up environment variables for deploying ChatQnA services, follow these steps: +### Access the Code -1. 
Set the required environment variables: +Clone the GenAIExample repository and access the ChatQnA IntelĀ® GaudiĀ® platform Docker Compose files and supporting scripts: - ```bash - # Example: host_ip="192.168.1.1" - export host_ip="External_Public_IP" - export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" - ``` +``` +git clone https://github.com/opea-project/GenAIExamples.git +cd GenAIExamples/ChatQnA/docker_compose/intel/cpu/xeon/ +``` -2. If you are in a proxy environment, also set the proxy-related environment variables: +Checkout a released version, such as v1.2: - ```bash - export http_proxy="Your_HTTP_Proxy" - export https_proxy="Your_HTTPs_Proxy" - # Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1" - export no_proxy="Your_No_Proxy",chatqna-xeon-ui-server,chatqna-xeon-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service,vllm-service,llm-faqgen - ``` +``` +git checkout v1.2 +``` -3. Set up other environment variables: +### Generate a HuggingFace Access Token - ```bash - source ./set_env.sh - ``` +Some HuggingFace resources, such as some models, are only accessible if the developer have an access token. In the absence of a HuggingFace access token, the developer can create one by first creating an account by following the steps provided at [HuggingFace](https://huggingface.co/) and then generating a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). -4. Change Model for LLM serving +### Configure the Deployment Environment - By default, Meta-Llama-3-8B-Instruct is used for LLM serving, the default model can be changed to other validated LLM models. - Please pick a [validated llm models](https://github.com/opea-project/GenAIComps/tree/main/comps/llms/src/text-generation#validated-llm-models) from the table. - To change the default model defined in set_env.sh, overwrite it by exporting LLM_MODEL_ID to the new model or by modifying set_env.sh, and then repeat step 3. - For example, change to Llama-2-7b-chat-hf using the following command. +To set up environment variables for deploying ChatQnA services, set up some parameters specific to the deployment environment and source the _setup_env.sh_ script in this directory: - ```bash - export LLM_MODEL_ID="meta-llama/Llama-2-7b-chat-hf" - ``` +``` +export host_ip="External_Public_IP" #ip address of the node +export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" +export http_proxy="Your_HTTP_Proxy" #http proxy if any +export https_proxy="Your_HTTPs_Proxy" #https proxy if any +export no_proxy=localhost,127.0.0.1,$host_ip #additional no proxies if needed +export no_proxy=$no_proxy,chatqna-xeon-ui-server,chatqna-xeon-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service,vllm-service,llm-faqgen +source ./set_env.sh +``` + +Consult the section on [ChatQnA Service configuration](#chatqna-configuration) for information on how service specific configuration parameters affect deployments. -## Quick Start: 2.Run Docker Compose +### Deploy the Services Using Docker Compose + +To deploy the ChatQnA services, execute the `docker compose up` command with the appropriate arguments. For a default deployment, execute the command below. It uses the 'compose.yaml' file. 
```bash docker compose up -d @@ -66,242 +73,124 @@ CPU example with Open Telemetry feature: docker compose -f compose.yaml -f compose.telemetry.yaml up -d ``` -It will automatically download the docker image on `docker hub`: +**Note**: developers should build docker image from source when: -```bash -docker pull opea/chatqna:latest -docker pull opea/chatqna-ui:latest -``` +- Developing off the git main branch (as the container's ports in the repo may be different from the published docker image). +- Unable to download the docker image. +- Use a specific version of Docker image. -NB: You should build docker image from source by yourself if: +Please refer to the table below to build different microservices from source: -- You are developing off the git main branch (as the container's ports in the repo may be different from the published docker image). -- You can't download the docker image. -- You want to use a specific version of Docker image. +| Microservice | Deployment Guide | +| ------------ | --------------------------------------------------------------------------------------------- | +| Dataprep | https://github.com/opea-project/GenAIComps/tree/main/comps/dataprep | +| Embedding | https://github.com/opea-project/GenAIComps/tree/main/comps/embeddings | +| Retriever | https://github.com/opea-project/GenAIComps/tree/main/comps/retrievers | +| Reranker | https://github.com/opea-project/GenAIComps/tree/main/comps/rerankings | +| LLM | https://github.com/opea-project/GenAIComps/tree/main/comps/llms | +| Megaservice | [Megaservice build guide](../../../../README_miscellaneous.md#build-megaservice-docker-image) | +| UI | [Basic UI build guide](../../../../README_miscellaneous.md#build-ui-docker-image) | -Please refer to ['Build Docker Images'](#šŸš€-build-docker-images) in below. +### Check the Deployment Status -## QuickStart: 3.Consume the ChatQnA Service +After running docker compose, check if all the containers launched via docker compose have started: -```bash -curl http://${host_ip}:8888/v1/chatqna \ - -H "Content-Type: application/json" \ - -d '{ - "messages": "What is the revenue of Nike in 2023?" - }' ``` - -## šŸš€ Apply Xeon Server on AWS - -To apply a Xeon server on AWS, start by creating an AWS account if you don't have one already. Then, head to the [EC2 Console](https://console.aws.amazon.com/ec2/v2/home) to begin the process. Within the EC2 service, select the Amazon EC2 M7i or M7i-flex instance type to leverage 4th Generation Intel Xeon Scalable processors that are optimized for demanding workloads. - -For detailed information about these instance types, you can refer to this [link](https://aws.amazon.com/ec2/instance-types/m7i/). Once you've chosen the appropriate instance type, proceed with configuring your instance settings, including network configurations, security groups, and storage options. - -After launching your instance, you can connect to it using SSH (for Linux instances) or Remote Desktop Protocol (RDP) (for Windows instances). From there, you'll have full access to your Xeon server, allowing you to install, configure, and manage your applications as needed. - -### Network Port & Security - -- Access the ChatQnA UI by web browser - - It supports to access by `80` port. Please confirm the `80` port is opened in the firewall of EC2 instance. - -- Access the microservice by tool or API - - 1. Login to the EC2 instance and access by **local IP address** and port. - - It's recommended and do nothing of the network port setting. - - 2. 
Login to a remote client and access by **public IP address** and port. - - You need to open the port of the microservice in the security group setting of firewall of EC2 instance setting. - - For detailed guide, please refer to [Validate Microservices](#validate-microservices). - - Note, it will increase the risk of security, so please confirm before do it. - -## šŸš€ Build Docker Images - -First of all, you need to build Docker Images locally and install the python package of it. - -```bash -git clone https://github.com/opea-project/GenAIComps.git -cd GenAIComps +docker ps -a ``` -### 1. Build Retriever Image +For the default deployment, the following 10 containers should have started: -```bash -docker build --no-cache -t opea/retriever:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/retrievers/src/Dockerfile . ``` - -### 2. Build Dataprep Image - -```bash -docker build --no-cache -t opea/dataprep:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/src/Dockerfile . -cd .. +CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES +3b5fa9a722da opea/chatqna-ui:${RELEASE_VERSION} "docker-entrypoint.s…" 32 hours ago Up 2 hours 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp chatqna-xeon-ui-server +d3b37f3d1faa opea/chatqna:${RELEASE_VERSION} "python chatqna.py" 32 hours ago Up 2 hours 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp chatqna-xeon-backend-server +b3e1388fa2ca opea/reranking-tei:${RELEASE_VERSION} "python reranking_te…" 32 hours ago Up 2 hours 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp reranking-tei-xeon-server +24a240f8ad1c opea/retriever-redis:${RELEASE_VERSION} "python retriever_re…" 32 hours ago Up 2 hours 0.0.0.0:7000->7000/tcp, :::7000->7000/tcp retriever-redis-server +9c0d2a2553e8 opea/embedding-tei:${RELEASE_VERSION} "python embedding_te…" 32 hours ago Up 2 hours 0.0.0.0:6000->6000/tcp, :::6000->6000/tcp embedding-tei-server +24cae0db1a70 opea/llm-vllm:${RELEASE_VERSION} "bash entrypoint.sh" 32 hours ago Up 2 hours 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp llm-vllm-server +ea3986c3cf82 opea/dataprep-redis:${RELEASE_VERSION} "python prepare_doc_…" 32 hours ago Up 2 hours 0.0.0.0:6007->6007/tcp, :::6007->6007/tcp dataprep-redis-server +e10dd14497a8 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 32 hours ago Up 2 hours 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp redis-vector-db +b98fa07a4f5c opea/vllm:${RELEASE_VERSION} "python3 -m vllm.ent…" 32 hours ago Up 2 hours 0.0.0.0:9009->80/tcp, :::9009->80/tcp vllm-service +79276cf45a47 ghcr.io/huggingface/text-embeddings-inference:cpu-1.2 "text-embeddings-rou…" 32 hours ago Up 2 hours 0.0.0.0:6006->80/tcp, :::6006->80/tcp tei-embedding-server +4943e5f6cd80 ghcr.io/huggingface/text-embeddings-inference:cpu-1.2 "text-embeddings-rou…" 32 hours ago Up 2 hours 0.0.0.0:8808->80/tcp, :::8808->80/tcp ``` -### 3. Build FaqGen LLM Image (Optional) +If any issues are encountered during deployment, refer to the [troubleshooting](../../../../README_miscellaneous.md##troubleshooting) section. -If you want to enable FAQ generation LLM in the pipeline, please use the below command: +### Test the Pipeline -```bash -git clone https://github.com/opea-project/GenAIComps.git -cd GenAIComps -docker build -t opea/llm-faqgen:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/src/faq-generation/Dockerfile . -``` - -### 4. 
Build MegaService Docker Image - -To construct the Mega Service with Rerank, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `chatqna.py` Python script. Build MegaService Docker image via below command: +Once the ChatQnA services are running, test the pipeline using the following command: ```bash -git clone https://github.com/opea-project/GenAIExamples.git -cd GenAIExamples/ChatQnA -docker build --no-cache -t opea/chatqna:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile . -``` - -### 5. Build UI Docker Image - -Build frontend Docker image via below command: - -```bash -cd GenAIExamples/ChatQnA/ui -docker build --no-cache -t opea/chatqna-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile . +curl http://${host_ip}:8888/v1/chatqna \ + -H "Content-Type: application/json" \ + -d '{ + "messages": "What is the revenue of Nike in 2023?" + }' ``` -### 6. Build Conversational React UI Docker Image (Optional) +**Note** : Access the ChatQnA UI by web browser through this URL: `http://${host_ip}:80`. Please confirm the `80` port is opened in the firewall. To validate each microservie used in the pipeline refer to the [Validate microservicess](#validate-microservices) section. -Build frontend Docker image that enables Conversational experience with ChatQnA megaservice via below command: +### Cleanup the Deployment -**Export the value of the public IP address of your Xeon server to the `host_ip` environment variable** +To stop the containers associated with the deployment, execute the following command: -```bash -cd GenAIExamples/ChatQnA/ui -docker build --no-cache -t opea/chatqna-conversation-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile.react . ``` - -### 7. Build Nginx Docker Image - -```bash -cd GenAIComps -docker build -t opea/nginx:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/third_parties/nginx/src/Dockerfile . +docker compose -f compose.yaml down ``` -Then run the command `docker images`, you will have the following 5 Docker Images: - -1. `opea/dataprep:latest` -2. `opea/retriever:latest` -3. `opea/chatqna:latest` -4. `opea/chatqna-ui:latest` -5. `opea/nginx:latest` - -If FaqGen related docker image is built, you will find one more image: - -- `opea/llm-faqgen:latest` - -## šŸš€ Start Microservices - -### Required Models - -By default, the embedding, reranking and LLM models are set to a default value as listed below: - -| Service | Model | -| --------- | ----------------------------------- | -| Embedding | BAAI/bge-base-en-v1.5 | -| Reranking | BAAI/bge-reranker-base | -| LLM | meta-llama/Meta-Llama-3-8B-Instruct | - -Change the `xxx_MODEL_ID` below for your needs. - -For users in China who are unable to download models directly from Huggingface, you can use [ModelScope](https://www.modelscope.cn/models) or a Huggingface mirror to download models. The vLLM/TGI can load the models either online or offline as described below: - -1. 
Online - - ```bash - export HF_TOKEN=${your_hf_token} - export HF_ENDPOINT="https://hf-mirror.com" - model_name="meta-llama/Meta-Llama-3-8B-Instruct" - # Start vLLM LLM Service - docker run -p 8008:80 -v ./data:/root/.cache/huggingface/hub --name vllm-service -e HF_ENDPOINT=$HF_ENDPOINT -e http_proxy=$http_proxy -e https_proxy=$https_proxy --shm-size 128g opea/vllm:latest --model $model_name --host 0.0.0.0 --port 80 - # Start TGI LLM Service - docker run -p 8008:80 -v ./data:/data --name tgi-service -e HF_ENDPOINT=$HF_ENDPOINT -e http_proxy=$http_proxy -e https_proxy=$https_proxy --shm-size 1g ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu --model-id $model_name - ``` - -2. Offline - - - Search your model name in ModelScope. For example, check [this page](https://modelscope.cn/models/LLM-Research/Meta-Llama-3-8B-Instruct/files) for model `Meta-Llama-3-8B-Instruct`. - - - Click on `Download this model` button, and choose one way to download the model to your local path `/path/to/model`. - - - Run the following command to start the LLM service. - - ```bash - export HF_TOKEN=${your_hf_token} - export model_path="/path/to/model" - # Start vLLM LLM Service - docker run -p 8008:80 -v $model_path:/root/.cache/huggingface/hub --name vllm-service --shm-size 128g opea/vllm:latest --model /root/.cache/huggingface/hub --host 0.0.0.0 --port 80 - # Start TGI LLM Service - docker run -p 8008:80 -v $model_path:/data --name tgi-service --shm-size 1g ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu --model-id /data - ``` +## ChatQnA Docker Compose Files -### Setup Environment Variables +In the context of deploying a ChatQnA pipeline on an IntelĀ® XeonĀ® platform, we can pick and choose different vector databases, large language model serving frameworks, and remove pieces of the pipeline such as the reranker. The table below outlines the various configurations that are available as part of the application. These configurations can be used as templates and can be extended to different components available in [GenAIComps](https://github.com/opea-project/GenAIComps.git). -1. Set the required environment variables: +| File | Description | +| ------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------ | +| [compose.yaml](./compose.yaml) | Default compose file using vllm as serving framework and redis as vector database | +| [compose_milvus.yaml](./compose_milvus.yaml) | The vector database utilized is Milvus. All other configurations remain the same as the default | +| [compose_pinecone.yaml](./compose_pinecone.yaml) | The vector database utilized is Pinecone. All other configurations remain the same as the default | +| [compose_qdrant.yaml](./compose_qdrant.yaml) | The vector database utilized is Qdrant. All other configurations remain the same as the default | +| [compose_tgi.yaml](./compose_tgi.yaml) | The LLM serving framework is TGI. All other configurations remain the same as the default | +| [compose_without_rerank.yaml](./compose_without_rerank.yaml) | Default configuration without the reranker | +| [compose.telemetry.yaml](./compose.telemetry.yaml) | Helper file for telemetry features for vllm. Can be used along with any compose files that serves vllm | +| [compose_tgi.telemetry.yaml](./compose_tgi.telemetry.yaml) | Helper file for telemetry features for tgi. 
Can be used along with any compose files that serves tgi | - ```bash - # Example: host_ip="192.168.1.1" - export host_ip="External_Public_IP" - export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" - # Example: NGINX_PORT=80 - export NGINX_PORT=${your_nginx_port} - ``` - -2. If you are in a proxy environment, also set the proxy-related environment variables: - - ```bash - export http_proxy="Your_HTTP_Proxy" - export https_proxy="Your_HTTPs_Proxy" - # Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1" - export no_proxy="Your_No_Proxy",chatqna-xeon-ui-server,chatqna-xeon-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service,vllm-service - ``` +## ChatQnA with Conversational UI (Optional) -3. Set up other environment variables: +To access the Conversational UI (react based) frontend, modify the UI service in the `compose` file used to deploy. Replace `chaqna-xeon-ui-server` service with the `chatqna-xeon-conversation-ui-server` service as per the config below: - ```bash - source ./set_env.sh - ``` - -### Start all the services Docker Containers +```yaml +chaqna-xeon-conversation-ui-server: + image: opea/chatqna-conversation-ui:latest + container_name: chatqna-xeon-conversation-ui-server + environment: + - APP_BACKEND_SERVICE_ENDPOINT=${BACKEND_SERVICE_ENDPOINT} + - APP_DATA_PREP_SERVICE_URL=${DATAPREP_SERVICE_ENDPOINT} + ports: + - "5174:80" + depends_on: + - chaqna-xeon-backend-server + ipc: host + restart: always +``` -> Before running the docker compose command, you need to be in the folder that has the docker compose yaml file +Once the services are up, open the following URL in the browser: http://{host_ip}:5174. By default, the UI runs on port 80 internally. If the developer prefers to use a different host port to access the frontend, it can be modiied by port mapping in the `compose.yaml` file as shown below: -```bash -cd GenAIExamples/ChatQnA/docker_compose/intel/cpu/xeon/ +```yaml + chaqna-gaudi-conversation-ui-server: + image: opea/chatqna-conversation-ui:latest + ... + ports: + - "80:80" ``` -If use vLLM as the LLM serving backend. +Here is an example of running ChatQnA (default UI): -```bash -# Start ChatQnA with Rerank Pipeline -docker compose -f compose.yaml up -d -# Start ChatQnA without Rerank Pipeline -docker compose -f compose_without_rerank.yaml up -d -# Start ChatQnA with Rerank Pipeline and Open Telemetry Tracing -docker compose -f compose.yaml -f compose.telemetry.yaml up -d -# Start ChatQnA with FaqGen Pipeline -docker compose -f compose_faqgen.yaml up -d -``` +![project-screenshot](../../../../assets/img/chat_ui_response.png) -If use TGI as the LLM serving backend. +Here is an example of running ChatQnA with Conversational UI (React): -```bash -docker compose -f compose_tgi.yaml up -d -# Start ChatQnA with Open Telemetry Tracing -docker compose -f compose_tgi.yaml -f compose_tgi.telemetry.yaml up -d -# Start ChatQnA with FaqGen Pipeline -docker compose -f compose_faqgen_tgi.yaml up -d -``` +![project-screenshot](../../../../assets/img/conversation_ui_response.png) ### Validate Microservices @@ -375,16 +264,7 @@ For details on how to verify the correctness of the response, refer to [how-to-v -H 'Content-Type: application/json' ``` -5. FaqGen LLM Microservice (if enabled) - -```bash -curl http://${host_ip}:${LLM_SERVICE_PORT}/v1/faqgen \ - -X POST \ - -d '{"query":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. 
TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."}' \ - -H 'Content-Type: application/json' -``` - -6. MegaService +5. MegaService ```bash curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{ @@ -392,7 +272,7 @@ curl http://${host_ip}:${LLM_SERVICE_PORT}/v1/faqgen \ }' ``` -7. Nginx Service +6. Nginx Service ```bash curl http://${host_ip}:${NGINX_PORT}/v1/chatqna \ @@ -400,7 +280,7 @@ curl http://${host_ip}:${LLM_SERVICE_PORT}/v1/faqgen \ -d '{"messages": "What is the revenue of Nike in 2023?"}' ``` -8. Dataprep Microservice(Optional) +7. Dataprep Microservice(Optional) If you want to update the default knowledge base, you can use the following commands: @@ -539,59 +419,6 @@ Open a web browser and type "chrome://tracing" or "ui.perfetto.dev", and then lo to see the vLLM profiling result as below diagram. ![image](https://github.com/user-attachments/assets/55c7097e-5574-41dc-97a7-5e87c31bc286) -## šŸš€ Launch the UI +## Conclusion -### Launch with origin port - -To access the frontend, open the following URL in your browser: http://{host_ip}:5173. By default, the UI runs on port 5173 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below: - -```yaml - chaqna-gaudi-ui-server: - image: opea/chatqna-ui:latest - ... - ports: - - "80:5173" -``` - -### Launch with Nginx - -If you want to launch the UI using Nginx, open this URL: `http://${host_ip}:${NGINX_PORT}` in your browser to access the frontend. - -## šŸš€ Launch the Conversational UI (Optional) - -To access the Conversational UI (react based) frontend, modify the UI service in the `compose.yaml` file. Replace `chaqna-xeon-ui-server` service with the `chatqna-xeon-conversation-ui-server` service as per the config below: - -```yaml -chaqna-xeon-conversation-ui-server: - image: opea/chatqna-conversation-ui:latest - container_name: chatqna-xeon-conversation-ui-server - environment: - - APP_BACKEND_SERVICE_ENDPOINT=${BACKEND_SERVICE_ENDPOINT} - - APP_DATA_PREP_SERVICE_URL=${DATAPREP_SERVICE_ENDPOINT} - ports: - - "5174:80" - depends_on: - - chaqna-xeon-backend-server - ipc: host - restart: always -``` - -Once the services are up, open the following URL in your browser: http://{host_ip}:5174. By default, the UI runs on port 80 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below: - -```yaml - chaqna-gaudi-conversation-ui-server: - image: opea/chatqna-conversation-ui:latest - ... - ports: - - "80:80" -``` - -![project-screenshot](../../../../assets/img/chat_ui_init.png) - -Here is an example of running ChatQnA: - -![project-screenshot](../../../../assets/img/chat_ui_response.png) - -Here is an example of running ChatQnA with Conversational UI (React): - -![project-screenshot](../../../../assets/img/conversation_ui_response.png) +This guide should enable developer to deploy the default configuration or any of the other compose yaml files for different configurations. It also highlights the configurable parameters that can be set before deployment.
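+
+For example, to deploy with a different LLM than the default, override the model before sourcing `set_env.sh` (a sketch assuming `LLM_MODEL_ID` is still honored by `set_env.sh`, as in earlier releases; pick any model from the validated LLM list):
+
+```bash
+# Hypothetical pre-deployment override of the default LLM
+export LLM_MODEL_ID="meta-llama/Llama-2-7b-chat-hf"
+source ./set_env.sh
+docker compose -f compose.yaml up -d
+```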