diff --git a/tutorial/CodeTrans/CodeTrans_Guide.rst b/tutorial/CodeTrans/CodeTrans_Guide.rst index aaef2945..b57260a8 100644 --- a/tutorial/CodeTrans/CodeTrans_Guide.rst +++ b/tutorial/CodeTrans/CodeTrans_Guide.rst @@ -1,16 +1,13 @@ .. _CodeTrans_Guide: -Code Translations +Code Translation ############################## -.. note:: This guide is in its early development and is a work-in-progress with - placeholder content. - Overview ******** This example showcases a code translation system that converts code from one programming language to another while preserving the original logic and functionality. The primary component is the CodeTrans MegaService, which encompasses an LLM microservice that performs the actual translation. -A lightweight Gateway service and a User Interface allow users to submit their source code in a given language and receive the translated output in another language. +A lightweight gateway service and a user interface allow users to submit their source code in a given language and receive the translated output in another language. Purpose ******* @@ -28,18 +25,16 @@ How It Works 1. A user specifies the source language, the target language, and the snippet of code to be translated. This request is handled by the front-end UI or via a direct API call. +2. The user’s request is sent to the CodeTrans gateway, which orchestrates the call to the LLM microservice. The gateway handles details like constructing prompts and managing responses. -2. The user’s request is sent to the CodeTrans Gateway, which orchestrates the call to the LLM MicroService. The gateway handles details like constructing prompts and managing responses. - - -3. The large language model processes the user’s code snippet, analyzing syntax and semantics before generating an equivalent snippet in the target language. +3. The large language model processes the user’s code snippet by analyzing syntax and semantics before generating an equivalent snippet in the target language. -4. The gateway formats the model’s output and returns the translated code to the user, either via an API response or rendered within the UI. +4. The gateway formats the model’s output and returns the translated code to the user, via an API response or rendered within the UI. Deployment ********** -Here are some deployment options, depending on your hardware and environment: +Here are some deployment options, depending on the hardware and environment: Single Node +++++++++++++++ diff --git a/tutorial/CodeTrans/deploy/gaudi.md b/tutorial/CodeTrans/deploy/gaudi.md index 473bcd48..7e198aca 100644 --- a/tutorial/CodeTrans/deploy/gaudi.md +++ b/tutorial/CodeTrans/deploy/gaudi.md @@ -1,269 +1,201 @@ -# # Single node on-prem deployment with TGI on Gaudi AI Accelerator +# Single node on-prem deployment on Gaudi AI Accelerator -This deployment section covers the single-node on-prem deployment of the CodeTrans example with OPEA comps using the Text Generation service based on TGI. The solution demonstrates building a code translation service using `mistralai/Mistral-7B-Instruct-v0.3` model deployed on the Intel® Gaudi® AI Accelerator. To quickly learn about OPEA in just 5 minutes and set up the required hardware and software, please follow the instructions in the [Getting Started](../../../getting-started/README.md) section. +This section covers single-node on-prem deployment of the CodeTrans example. 
It will show how to deploy an end-to-end code translation service with the `mistralai/Mistral-7B-Instruct-v0.3` model running on Intel® Gaudi® AI Accelerators. To quickly learn about OPEA and set up the required hardware and software, follow the instructions in the [Getting Started](../../../getting-started/README.md) section. ## Overview -In this tutorial, we will walk through how to enable the following microservices from OPEA GenAIComps to deploy a single node Text Generation megaservice solution for code translation: +The CodeTrans use case uses a single LLM microservice for code translation with model serving done on vLLM or TGI. -1. LLM with TGI -2. Nginx Service - -The solution demonstrates using the Mistral-7B-Instruct-v0.3 model on the Intel® Gaudi® AI Accelerator for translating code between different programming languages. We will go through how to set up docker containers to start the microservices and megaservice. Users can input code in one programming language and get it translated into another language. The solution is deployed with a basic UI accessible through both direct port and Nginx. +This solution is designed to demonstrate the use of the `Mistral-7B-Instruct-v0.3` model on the Intel® Gaudi® AI Accelerators to translate code between different programming languages. The steps will involve setting up Docker containers, taking code in one programming language as input, and generating code in another programming language. The solution is deployed with a basic UI accessible through both a direct port and Nginx. ## Prerequisites -The first step is to clone the GenAIExamples and GenAIComps projects. GenAIComps are -fundamental necessary components used to build the examples you find in -GenAIExamples and deploy them as microservices. Set an environment -variable for the desired release version with the **number only** -(i.e. 1.0, 1.1, etc) and checkout using the tag with that version. +To run the UI on a web browser external to the host machine such as a laptop, the following port(s) need to be port forwarded when using SSH to log in to the host machine: +- 7777: CodeTrans megaservice port +This port is used for `BACKEND_SERVICE_ENDPOINT` defined in the `set_env.sh` for this example inside the `docker compose` folder. Specifically, for CodeTrans, append the following to the ssh command: ```bash -# Set workspace -export WORKSPACE= -cd $WORKSPACE - -# Set desired release version - number only -export RELEASE_VERSION= +-L 7777:localhost:7777 +``` -# GenAIComps -git clone https://github.com/opea-project/GenAIComps.git -cd GenAIComps -git checkout tags/v${RELEASE_VERSION} -cd .. +Set up a workspace and clone the [GenAIExamples](https://github.com/opea-project/GenAIExamples) GitHub repo. +```bash +export WORKSPACE= +cd $WORKSPACE +git clone https://github.com/opea-project/GenAIExamples.git # GenAIExamples +``` -# GenAIExamples -git clone https://github.com/opea-project/GenAIExamples.git +**Optional** It is recommended to use a stable release version by setting `RELEASE_VERSION` to a **number only** (i.e. 1.0, 1.1, etc) and checkout that version using the tag. Otherwise, by default, the main branch with the latest updates will be used. +```bash +export RELEASE_VERSION= # Set desired release version - number only cd GenAIExamples git checkout tags/v${RELEASE_VERSION} cd .. ``` -The examples utilize model weights from HuggingFace. -Set up your [HuggingFace](https://huggingface.co/) account and -apply for model access to `Mistral-7B-Instruct-v0.3` which is a gated model. 
To obtain access for using the model, visit the [model site](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) and click on `Agree and access repository`. +The example utilizes model weights from HuggingFace. Set up a [HuggingFace](https://huggingface.co/) account and apply for model access to `Mistral-7B-Instruct-v0.3` which is a gated model. To obtain access for using the model, visit the [model site](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) and click on `Agree and access repository`. -Next, generate [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). - -Setup the HuggingFace token +Next, generate a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). +Set the `HUGGINGFACEHUB_API_TOKEN` environment variable to the value of the Hugging Face token by executing the following command: ```bash export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" ``` -The example requires you to set the `host_ip` to deploy the microservices on the endpoint enabled with ports. Set the host_ip env variable. - +Set the `host_ip` environment variable to deploy the microservices on the endpoints enabled with ports: ```bash export host_ip=$(hostname -I | awk '{print $1}') ``` -Make sure to set Proxies if you are behind a firewall. +Set up a desired port for Nginx: +```bash +# Example: NGINX_PORT=80 +export NGINX_PORT=${your_nginx_port} +``` +For machines behind a firewall, set up the proxy environment variables: ```bash export no_proxy=${your_no_proxy},$host_ip export http_proxy=${your_http_proxy} export https_proxy=${your_http_proxy} ``` -## Prepare (Building / Pulling) Docker images - -This step involves either building or pulling four required Docker images. Each image serves a specific purpose in the CodeTrans architecture. - -::::::{tab-set} - -:::::{tab-item} Pull -:sync: Pull - -If you decide to pull the docker containers and not build them locally, you can proceed to the next step where all the necessary containers will be pulled in from Docker Hub. -::::: -:::::{tab-item} Build -:sync: Build +## Use Case Setup -Follow the steps below to build the docker images from within the `GenAIComps` folder. -**Note:** For RELEASE_VERSIONS older than 1.0, you will need to add a 'v' in front -of ${RELEASE_VERSION} to reference the correct image on Docker Hub. +CodeTrans will utilize the following GenAIComps services and associated tools. The tools and models listed in the table can be configured via environment variables in either the `set_env.sh` script or the `compose.yaml` file. -```bash -cd $WORKSPACE/GenAIComps -``` +| Use Case Components | Tools         | Model                                | Service Type         | +|---------------------|---------------|--------------------------------------|----------------------| +| LLM                 | vLLM or TGI   | mistralai/Mistral-7B-Instruct-v0.3   | OPEA Microservice    | +| UI                  |               | NA                                   | Gateway Service      | +| Ingress             | Nginx         | NA                                   | Gateway Service      | -### Build LLM Image +Set the necessary environment variables to set up the use case. To swap out models, modify `set_env.sh` before running it. For example, the environment variable `LLM_MODEL_ID` can be changed to another model by specifying the HuggingFace model card ID. 
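+
+For illustration, a hypothetical swap (the exact assignment line in `set_env.sh` may differ; any HuggingFace model card ID supported by the chosen serving backend can be substituted):
+
+```bash
+# Inside set_env.sh: replace the default model ID with the desired one (hypothetical example)
+export LLM_MODEL_ID="mistralai/Mistral-7B-Instruct-v0.2"
+```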
-First, build the Text Generation LLM service image: +To run the UI on a web browser on a laptop, modify `BACKEND_SERVICE_ENDPOINT` to use `localhost` or `127.0.0.1` instead of `host_ip` inside `set_env.sh` for the backend to properly receive data from the UI. +Run the `set_env.sh` script. ```bash -docker  build  -t  opea/llm-textgen:${RELEASE_VERSION}  --build-arg  https_proxy=$https_proxy  \ ---build-arg http_proxy=$http_proxy -f comps/llms/src/text-generation/Dockerfile . +cd $WORKSPACE/GenAIExamples/CodeTrans/docker_compose +source ./set_env.sh ``` ->**Note**: `llm-textgen` uses Text Generation Inference (TGI) which is pulled automatically via the docker compose file in the next steps. - -### Build Nginx Image - -Build the Nginx service image that will handle routing: +## Deploy the Use Case +Navigate to the `docker compose` directory for this hardware platform. ```bash -docker  build  -t  opea/nginx:${RELEASE_VERSION}  --build-arg  https_proxy=$https_proxy  \ ---build-arg http_proxy=$http_proxy -f comps/third_parties/nginx/src/Dockerfile . - +cd $WORKSPACE/GenAIExamples/CodeTrans/docker_compose/intel/hpu/gaudi ``` -### Build MegaService Image - -The Megaservice is a pipeline that channels data through different microservices, each performing varied tasks. We define the different microservices and the flow of data between them in the  `code_translation.py` file, in this example, CodeTrans MegaService formats the input code and language parameters into a prompt template, sends it to the LLM microservice, and returns the translated code.. You can also add newer or remove some microservices and customize the megaservice to suit the needs. - -Build the megaservice image for this use case. +Run `docker compose` with the provided YAML file to start all the services mentioned above as containers. The vLLM or TGI service can be used for CodeTrans. -```bash -cd $WORKSPACE/GenAIExamples/CodeTrans -``` +::::{tab-set} +:::{tab-item} vllm +:sync: vllm ```bash -docker  build  -t  opea/codetrans:${RELEASE_VERSION}  --build-arg  https_proxy=$https_proxy  \ ---build-arg http_proxy=$http_proxy -f Dockerfile . +docker compose -f compose.yaml up -d ``` - -### Build UI Image - -Build the UI service image: +::: +:::{tab-item} TGI +:sync: TGI ```bash -cd $WORKSPACE/GenAIExamples/CodeTrans/ui -docker  build  -t  opea/codetrans-ui:${RELEASE_VERSION}  --build-arg  https_proxy=$https_proxy  \ ---build-arg http_proxy=$http_proxy -f ./docker/Dockerfile . +docker compose -f compose_tgi.yaml up -d ``` +::: +:::: -### Sanity Check - -Before proceeding, verify that you have all required Docker images by running `docker images`. You should see the following images: +### Check Env Variables -* opea/llm-textgen:${RELEASE_VERSION} -* opea/codetrans:${RELEASE_VERSION} -* opea/codetrans-ui:${RELEASE_VERSION} -* opea/nginx:${RELEASE_VERSION} +After running `docker compose`, check for warning messages for environment variables that are **NOT** set. Address them if needed. -::::: -:::::: - -## Use Case Setup + ubuntu@gaudi-vm:~/GenAIExamples/CodeTrans/docker_compose/intel/hpu/gaudi$ docker compose -f ./compose.yaml up -d -The use case will use the following combination of the GenAIComps with the tools. + WARN[0000] The "no_proxy" variable is not set. Defaulting to a blank string. + WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. 
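+
+One optional way to double-check how the variables were resolved (assuming Docker Compose v2) is to render the merged configuration and filter for the values of interest:
+
+```bash
+# Print the fully resolved compose configuration and spot-check a few values (hypothetical filter terms)
+docker compose -f compose.yaml config | grep -iE 'proxy|model_id|hugging'
+```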
-| Use Case Components | Tools         | Model                                | Service Type         | -|---------------------|---------------|--------------------------------------|----------------------| -| LLM                 | TGI           | mistralai/Mistral-7B-Instruct-v0.3   | OPEA Microservice    | -| UI                  |               | NA                                   | Gateway Service      | -| Ingress             | Nginx         | NA                                   | Gateway Service      | - -Tools and models mentioned in the table are configurable either through the environment variable or `compose.yaml` - -Set the necessary environment variables to setup the use case by running the `set_env.sh` script. -Here is where the environment variable `LLM_MODEL_ID` is set, and you can change it to another model -by specifying the HuggingFace model card ID. - -**Note:** If you wish to run the UI on a web browser on your laptop, you will need to modify `BACKEND_SERVICE_IP` to use `localhost` or `127.0.0.1` instead of `host_ip` inside `set_env.sh` for the backend to properly receive data from the UI. Additionally, you will need to port-forward the port used for `BACKEND_SERVICE_IP`. Specifically, for CodeTrans, append the following to your ssh command: - -```bash --L 7777:localhost:7777 -``` +Check if all the containers launched via `docker compose` are running i.e. each container's `STATUS` is `Up` and in some cases `Healthy`. -Run the `set_env.sh` script. +Run this command to see this info: ```bash -cd $WORKSPACE/GenAIExamples/CodeTrans/docker_compose -source ./set_env.sh +docker ps -a ``` -Set up a desired port for Nginx: +Sample output: ```bash -# Example: NGINX_PORT=80 -export  NGINX_PORT=${your_nginx_port} +CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES +ca0cfb3edce5 opea/nginx:latest "/docker-entrypoint.…" 8 minutes ago Up 6 minutes 0.0.0.0:80->80/tcp, [::]:80->80/tcp codetrans-gaudi-nginx-server +d7ef9da3f7db opea/codetrans-ui:latest "docker-entrypoint.s…" 8 minutes ago Up 6 minutes 0.0.0.0:5173->5173/tcp, [::]:5173->5173/tcp codetrans-gaudi-ui-server +2cfc12e1c8f1 opea/codetrans:latest "python code_transla…" 8 minutes ago Up 6 minutes 0.0.0.0:7777->7777/tcp, [::]:7777->7777/tcp codetrans-gaudi-backend-server +c1db5a49003d opea/llm-textgen:latest "bash entrypoint.sh" 8 minutes ago Up 6 minutes 0.0.0.0:9000->9000/tcp, [::]:9000->9000/tcp codetrans-gaudi-llm-server +450f74cb65a4 opea/vllm:latest "python3 -m vllm.ent…" 8 minutes ago Up 8 minutes (healthy) 0.0.0.0:8008->80/tcp, [::]:8008->80/tcp codetrans-gaudi-vllm-service ``` -## Deploy the use case - -In this tutorial, we will be deploying via docker compose with the provided YAML file. The docker compose instructions should start all the above-mentioned services as containers. +Each docker container's log can also be checked using: ```bash -cd $WORKSPACE/GenAIExamples/CodeTrans/docker_compose/intel/hpu/gaudi -docker compose up -d +docker logs ``` -### Validate microservice - -#### Check Env Variables - -Check the startup log by `docker compose -f ./compose.yaml logs`. -The warning messages print out the variables if they are **NOT** set. +## Validate Microservices -ubuntu@xeon-vm:~/GenAIExamples/CodeTrans/docker_compose/intel/cpu/xeon$ docker compose -f ./compose.yaml up -d +This section will guide through the various methods for interacting with the deployed microservices. -WARN[0000] The "no_proxy" variable is not set. Defaulting to a blank string. -WARN[0000] The "http_proxy" variable is not set. 
Defaulting to a blank string. +### vLLM or TGI Service -#### Check the container status +During the initial startup, this service will take a few minutes to download the model files and complete the warm-up process. Once this is finished, the service will be ready for use. -Check if all the containers launched via docker compose has started -For example, the CodeTrans example starts 5 docker (services), check these docker containers are all running, i.e., all the containers `STATUS` are `Up`. +Try the command below to check whether the LLM serving is ready. -To do a quick sanity check, try `docker ps -a` to see if all the containers are running. +::::{tab-set} +:::{tab-item} vllm +:sync: vllm ```bash -CONTAINER ID   IMAGE                                 COMMAND                  CREATED         STATUS                   PORTS                                       NAMES -a6d83e9fb44f   opea/nginx:${RELEASE_VERSION}                     "/docker-entrypoint.…"   8 minutes ago   Up 26 seconds            0.0.0.0:80->80/tcp, :::80->80/tcp           codetrans-gaudi-nginx-server -42af29c8a8b6   opea/codetrans-ui:${RELEASE_VERSION}              "docker-entrypoint.s…"   8 minutes ago   Up 27 seconds            0.0.0.0:5173->5173/tcp, :::5173->5173/tcp   codetrans-gaudi-ui-server -d995d76e7b52   opea/codetrans:${RELEASE_VERSION}                 "python code_transla…"   8 minutes ago   Up 27 seconds            0.0.0.0:7777->7777/tcp, :::7777->7777/tcp   codetrans-gaudi-backend-server -f40e954b107e   opea/llm-textgen:${RELEASE_VERSION}               "bash entrypoint.sh"     8 minutes ago   Up 27 seconds            0.0.0.0:9000->9000/tcp, :::9000->9000/tcp   llm-textgen-gaudi-server -0eade4fe0637   ghcr.io/huggingface/tgi-gaudi:2.0.6   "text-generation-lau…"   8 minutes ago   Up 8 minutes (healthy)   0.0.0.0:8008->80/tcp, :::8008->80/tcp       codetrans-tgi-service - +# vLLM service +docker logs codetrans-gaudi-vllm-service 2>&1 | grep complete +# If the service is ready, you will get the response like below. +INFO: Application startup complete. ``` - - -## Interacting with CodeTrans deployment - -In this section, you will walk through the different ways to interact with the deployed microservices. - -### TGI Service - -In the first startup, this service will take more time to download the model files. After it's finished, the service will be ready. - -Try the command below to check whether the LLM serving is ready. -``` -docker logs ${CONTAINER_ID} | grep Connected -``` -If the service is ready, you will get a response like below. +::: +:::{tab-item} TGI +:sync: TGI ```bash -2024-09-03T02:47:53.402023Z INFO text_generation_router::server: router/src/server.rs:2311: Connected +# TGI service +docker logs codetrans-gaudi-tgi-service | grep Connected +# If the service is ready, you will get the response like below. +2024-09-03T02:47:53.402023Z INFO text_generation_router::server: router/src/server.rs:2311: Connected ``` + +::: +:::: + +Then try the `cURL` command to verify the vLLM or TGI service: ```bash -curl  http://${host_ip}:8008/generate  \ --X POST \ --d  '{"inputs":" ### System: Please translate the following Golang codes into Python codes. 
### Original codes: '\'''\'''\''Golang \npackage main\n\nimport \"fmt\"\nfunc main() {\n fmt.Println(\"Hello, World!\");\n '\'''\'''\'' ### Translated codes:","parameters":{"max_new_tokens":17, "do_sample": true}}'  \ --H 'Content-Type: application/json' +curl http://${host_ip}:8008/generate \ + -X POST \ + -d '{"inputs":" ### System: Please translate the following Golang codes into Python codes. ### Original codes: '\'''\'''\''Golang \npackage main\n\nimport \"fmt\"\nfunc main() {\n fmt.Println(\"Hello, World!\");\n '\'''\'''\'' ### Translated codes:","parameters":{"max_new_tokens":17, "do_sample": true}}' \ + -H 'Content-Type: application/json' ``` -TGI service generates text for the input prompt. Here is the expected result from TGI: -  +The vLLM or TGI service generates text for the input prompt. Here is the expected result: ```bash {"generated_text":"'''Python\nprint(\"Hello, World!\")"} ``` -**NOTE**: After launching TGI, it takes a few minutes for the TGI server to load the LLM model and warm up. -### Text Generation Microservice - -This service handles the core language model operations. You can validate it's working by sending a direct request to translate a simple "Hello World" program from Go to Python: +### LLM Microservice +This service handles the core language model operations. Send a direct request to translate a simple "Hello World" program from Go to Python: ```bash -curl http://${host_ip}:9000/v1/chat/completions \ - -X POST \ -  -d '{ - "query": "### System: Please translate the following Golang codes into Python codes. ### Original codes: ```Golang\npackage main\n\nimport \"fmt\"\nfunc main() {\n fmt.Println(\"Hello, World!\");\n}\n``` ### Translated codes:", - "max_tokens": 17 - }' \ - -H 'Content-Type: application/json' +curl http://${host_ip}:9000/v1/chat/completions\ + -X POST \ + -d '{"query":" ### System: Please translate the following Golang codes into Python codes. ### Original codes: '\'''\'''\''Golang \npackage main\n\nimport \"fmt\"\nfunc main() {\n fmt.Println(\"Hello, World!\");\n '\'''\'''\'' ### Translated codes:"}' \ + -H 'Content-Type: application/json' ``` -The expected output is as shown below: + +Sample output: ```bash data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"\n"}],"created":1737123223,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"\n"}],"created":1737123223,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} @@ -285,16 +217,16 @@ data: {"id":"","choices":[{"finish_reason":"length","index":0,"logprobs":null,"t data: [DONE] ``` -### MegaService +### CodeTrans MegaService The CodeTrans megaservice orchestrates the entire translation process. 
Test it with a simple code translation request: - ```bash -curl  http://${host_ip}:7777/v1/codetrans  \ --H "Content-Type: application/json" \ --d  '{"language_from": "Golang","language_to": "Python","source_code": "package main\n\nimport \"fmt\"\nfunc main() {\n fmt.Println(\"Hello, World!\");\n}"}' +curl http://${host_ip}:7777/v1/codetrans \ + -H "Content-Type: application/json" \ + -d '{"language_from": "Golang","language_to": "Python","source_code": "package main\n\nimport \"fmt\"\nfunc main() {\n fmt.Println(\"Hello, World!\");\n}"}' ``` -When you send this request, you’ll receive a streaming response from the MegaService. It will appear line by line like so: + +Sample output: ```bash data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"\n"}],"created":1737121307,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"\n"}],"created":1737121307,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} @@ -316,112 +248,66 @@ data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":" data: {"id":"","choices":[{"finish_reason":"eos_token","index":0,"logprobs":null,"text":""}],"created":1737121309,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":{"completion_tokens":18,"prompt_tokens":74,"total_tokens":92,"completion_tokens_details":null,"prompt_tokens_details":null}} data: [DONE] ``` -Within this output, each line contains JSON that includes a `text` field. Once you combine the `text` values in order, you’ll reconstruct the translated code. In this example, the final code is simply: + +The megaservice streams each segment of the response. Each line contains JSON that includes a `text` field. Combining the `text` values in order will reconstruct the translated code. In this example, the final code is simply: ```bash print("Hello, World!") ``` -This demonstrates how the MegaService streams each segment of the response, which you can then piece together to get the complete translation. ### Nginx Service -The Nginx service acts as a reverse proxy and load balancer for the application. You can verify it's properly routing requests by sending the same translation request through Nginx: - +The Nginx service acts as a reverse proxy and load balancer for the application. To verify it is properly routing requests, send the same translation request through Nginx: ```bash -curl  http://${host_ip}:${NGINX_PORT}/v1/codetrans  \ --H "Content-Type: application/json" \ --d  '{"language_from": "Golang","language_to": "Python","source_code": "package main\n\nimport \"fmt\"\nfunc main() {\n fmt.Println(\"Hello, World!\");\n}"}' +curl http://${host_ip}:${NGINX_PORT}/v1/codetrans \ + -H "Content-Type: application/json" \ + -d '{"language_from": "Golang","language_to": "Python","source_code": "package main\n\nimport \"fmt\"\nfunc main() {\n fmt.Println(\"Hello, World!\");\n}"}' ``` -The expected output is the same as the MegaService output. +The expected output is the same as the megaservice output. Each of these endpoints should return a successful response with the translated Python code. If any of these tests fail, check the corresponding service logs for more details. 
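+
+The streamed `text` fields can also be stitched together directly on the command line. A rough post-processing sketch (assuming `jq` is installed and the stream format matches the sample output above):
+
+```bash
+# Strip the "data: " prefix, drop the [DONE] sentinel, and join the streamed text fields
+curl -sN http://${host_ip}:7777/v1/codetrans \
+  -H "Content-Type: application/json" \
+  -d '{"language_from": "Golang","language_to": "Python","source_code": "package main\n\nimport \"fmt\"\nfunc main() {\n fmt.Println(\"Hello, World!\");\n}"}' \
+  | sed -n 's/^data: //p' | grep -v '^\[DONE\]$' | jq -rj '.choices[0].text'; echo
+```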
-## Check the docker container logs - -Following is an example of debugging using Docker logs: - -Check the log of the container using: - -`docker logs -t` - -Check the log using `docker logs 0eade4fe0637 -t`. +## Launch UI -``` -2024-06-05T01:30:30.695934928Z error: a value is required for '--model-id ' but none was supplied +### Basic UI -2024-06-05T01:30:30.697123534Z +To access the frontend user interface (UI), the primary method is through the Nginx reverse proxy service. Open the following URL in a web browser: http://${host_ip}:${NGINX_PORT}. This provides a stable and secure access point to the UI. -2024-06-05T01:30:30.697148330Z For more information, try '--help'. -``` -The log indicates the `MODEL_ID` is not set. +Alternatively, the UI can be accessed directly using its internal port. This method bypasses the Nginx proxy and can be used for testing or troubleshooting purposes. To access the UI directly, open the following URL in a web browser: http://${host_ip}:5173. By default, the UI runs on port 5173. A different host port can be used to access the frontend by modifying the `FRONTEND_SERVICE_PORT` environment variable. For reference, the port mapping in the `compose.yaml` file is shown below: -View the docker input parameters in `$WORKSPACE/GenAIExamples/CodeTrans/docker_compose/intel/hpu/gaudi/compose.yaml` ```yaml -tgi-service: - image: ghcr.io/huggingface/tgi-gaudi:2.0.6 - container_name: codetrans-tgi-service - ports: - - "8008:80" - volumes: - - "./data:/data" - environment: - no_proxy: ${no_proxy} - http_proxy: ${http_proxy} - https_proxy: ${https_proxy} - HABANA_VISIBLE_DEVICES: all - OMPI_MCA_btl_vader_single_copy_mechanism: none - HUGGING_FACE_HUB_TOKEN: ${HUGGINGFACEHUB_API_TOKEN} - ENABLE_HPU_GRAPH: true - LIMIT_HPU_GRAPH: true - USE_FLASH_ATTENTION: true - FLASH_ATTENTION_RECOMPUTE: true - healthcheck: - test: ["CMD-SHELL", "sleep 500 && exit 0"] - interval: 1s - timeout: 505s - retries: 1 - runtime: habana - cap_add: - - SYS_NICE - ipc: host - command: --model-id ${LLM_MODEL_ID} --max-input-length 1024 --max-total-tokens 2048 +codetrans-gaudi-ui-server: + image: ${REGISTRY:-opea}/codetrans-ui:${TAG:-latest} + container_name: codetrans-gaudi-ui-server + depends_on: + - codetrans-gaudi-backend-server + ports: + - "${FRONTEND_SERVICE_PORT:-5173}:5173" ``` -The input `MODEL_ID` is `${LLM_MODEL_ID}` -Check environment variable `LLM_MODEL_ID` is set correctly, and spelled correctly. +After making this change, restart the containers for the change to take effect. -Set the `LLM_MODEL_ID` then restart the containers. +### Stop the Services -You can also check overall logs with the following command, where the -`compose.yaml` is the MegaService docker-compose configuration file. +Navigate to the `docker compose` directory for this hardware platform. ```bash -docker compose -f $WORKSPACE/GenAIExamples/CodeTrans/docker_compose/intel/hpu/gaudi/compose.yaml logs +cd $WORKSPACE/GenAIExamples/CodeTrans/docker_compose/intel/hpu/gaudi ``` -## Launch UI - -### Basic UI - -To access the frontend user interface (UI), the primary method is through the Nginx reverse proxy service. Open the following URL in your browser: `http://${host_ip}:${NGINX_PORT}`. This provides a stable and secure access point to the UI. The value of `${NGINX_PORT}` has been defined in the earlier steps. -Alternatively, you can access the UI directly using its internal port. This method bypasses the Nginx proxy and can be used for testing or troubleshooting purposes. 
To access the UI directly, open the following URL in your browser: http://${host_ip}:5173. By default, the UI runs on port 5173. +To stop and remove all the containers, use the commands below: -If you need to change the port used to access the UI directly (not through Nginx), modify the ports section of the `compose.yaml` file: +::::{tab-set} +:::{tab-item} vllm +:sync: vllm -```yaml -codetrans-gaudi-ui-server: - image: ${REGISTRY:-opea}/codetrans-ui:${TAG:-latest} - container_name: codetrans-gaudi-ui-server - depends_on: - - codetrans-gaudi-backend-server - ports: - - "YOUR_HOST_PORT:5173" # Change YOUR_HOST_PORT to your desired port +```bash +docker compose -f compose.yaml down ``` -Remember to replace YOUR_HOST_PORT with your preferred host port number. After making this change, you will need to rebuild and restart your containers for the change to take effect. - - -### Stop the services - -Once you are done with the entire pipeline and wish to stop and remove all the containers, use the command below: +::: +:::{tab-item} TGI +:sync: TGI ```bash -docker compose -f compose.yaml down -``` \ No newline at end of file +docker compose -f compose_tgi.yaml down +``` +::: +:::: diff --git a/tutorial/CodeTrans/deploy/xeon.md b/tutorial/CodeTrans/deploy/xeon.md index c3f9deb9..a2a83770 100644 --- a/tutorial/CodeTrans/deploy/xeon.md +++ b/tutorial/CodeTrans/deploy/xeon.md @@ -1,269 +1,201 @@ -# Single node on-prem deployment with TGI on Xeon Scalable processors +# Single node on-prem deployment on Xeon Scalable processors -This deployment section covers the single-node on-prem deployment of the CodeTrans example with OPEA comps using the Text Generation service based on TGI. The solution demonstrates building a code translation service using the `mistralai/Mistral-7B-Instruct-v0.3` model deployed on Intel® Xeon® Scalable processors. To quickly learn about OPEA in just 5 minutes and set up the required hardware and software, please follow the instructions in the [Getting Started](../../../getting-started/README.md) section. +This section covers single-node on-prem deployment of the CodeTrans example. It will show how to deploy an end-to-end code translation service with the `mistralai/Mistral-7B-Instruct-v0.3` model running on Intel® Xeon® Scalable processors. To quickly learn about OPEA and set up the required hardware and software, follow the instructions in the [Getting Started](../../../getting-started/README.md) section. ## Overview -In this tutorial, we will walk through how to enable the following microservices from OPEA GenAIComps to deploy a single node Text Generation megaservice solution for code translation: +The CodeTrans use case uses a single LLM microservice for code translation with model serving done on vLLM or TGI. -1. LLM with TGI -2. Nginx Service - -The solution demonstrates using the Mistral-7B-Instruct-v0.3 model on Intel Xeon Scalable processors to translate code between different programming languages. We will go through setting up Docker containers to start the microservices and megaservice. Users can input code in one programming language and get it translated to another. The solution is deployed with a basic UI accessible through both direct port and Nginx. +This solution is designed to demonstrate the use of the `Mistral-7B-Instruct-v0.3` model on the Intel® Xeon® Scalable processors to translate code between different programming languages. 
The steps will involve setting up Docker containers, taking code in one programming language as input, and generating code in another programming language. The solution is deployed with a basic UI accessible through both a direct port and Nginx. ## Prerequisites -TThe first step is to clone the GenAIExamples and GenAIComps projects. GenAIComps are -fundamental necessary components used to build the examples you find in -GenAIExamples and deploy them as microservices. Set an environment -variable for the desired release version with the **number only** -(i.e. 1.0, 1.1, etc) and checkout using the tag with that version. +To run the UI on a web browser external to the host machine such as a laptop, the following port(s) need to be port forwarded when using SSH to log in to the host machine: +- 7777: CodeTrans megaservice port +This port is used for `BACKEND_SERVICE_ENDPOINT` defined in the `set_env.sh` for this example inside the `docker compose` folder. Specifically, for CodeTrans, append the following to the ssh command: ```bash -# Set workspace -export WORKSPACE= -cd $WORKSPACE - -# Set desired release version - number only -export RELEASE_VERSION= +-L 7777:localhost:7777 +``` -# GenAIComps -git clone https://github.com/opea-project/GenAIComps.git -cd GenAIComps -git checkout tags/v${RELEASE_VERSION} -cd .. +Set up a workspace and clone the [GenAIExamples](https://github.com/opea-project/GenAIExamples) GitHub repo. +```bash +export WORKSPACE= +cd $WORKSPACE +git clone https://github.com/opea-project/GenAIExamples.git # GenAIExamples +``` -# GenAIExamples -git clone https://github.com/opea-project/GenAIExamples.git +**Optional** It is recommended to use a stable release version by setting `RELEASE_VERSION` to a **number only** (i.e. 1.0, 1.1, etc) and checkout that version using the tag. Otherwise, by default, the main branch with the latest updates will be used. +```bash +export RELEASE_VERSION= # Set desired release version - number only cd GenAIExamples git checkout tags/v${RELEASE_VERSION} cd .. ``` -The examples utilize model weights from HuggingFace. -Set up your [HuggingFace](https://huggingface.co/) account and -apply for model access to `Mistral-7B-Instruct-v0.3` which is a gated model. To obtain access for using the model, visit the [model site](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) and click on `Agree and access repository`. +The example utilizes model weights from HuggingFace. Set up a [HuggingFace](https://huggingface.co/) account and apply for model access to `Mistral-7B-Instruct-v0.3` which is a gated model. To obtain access for using the model, visit the [model site](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) and click on `Agree and access repository`. -Next, generate [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). - -Setup the HuggingFace token +Next, generate a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). +Set the `HUGGINGFACEHUB_API_TOKEN` environment variable to the value of the Hugging Face token by executing the following command: ```bash export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" ``` -The example requires you to set the `host_ip` to deploy the microservices on the endpoint enabled with ports. Set the host_ip env variable. 
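+
+An optional sanity check for the token uses the public Hugging Face `whoami-v2` API (this verifies the token itself, not access to the gated model):
+
+```bash
+# A valid token returns the associated account details instead of an authentication error
+curl -s -H "Authorization: Bearer ${HUGGINGFACEHUB_API_TOKEN}" https://huggingface.co/api/whoami-v2
+```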
- +Set the `host_ip` environment variable to deploy the microservices on the endpoints enabled with ports: ```bash export host_ip=$(hostname -I | awk '{print $1}') ``` -Make sure to set Proxies if you are behind a firewall. +Set up a desired port for Nginx: +```bash +# Example: NGINX_PORT=80 +export NGINX_PORT=${your_nginx_port} +``` +For machines behind a firewall, set up the proxy environment variables: ```bash export no_proxy=${your_no_proxy},$host_ip export http_proxy=${your_http_proxy} export https_proxy=${your_http_proxy} ``` -## Prepare (Building / Pulling) Docker images - -This step involves either building or pulling four required Docker images. Each image serves a specific purpose in the CodeTrans architecture. - -::::::{tab-set} - -:::::{tab-item} Pull -:sync: Pull - -If you decide to pull the docker containers and not build them locally, you can proceed to the next step where all the necessary containers will be pulled in from Docker Hub. -::::: -:::::{tab-item} Build -:sync: Build +## Use Case Setup -Follow the steps below to build the docker images from within the `GenAIComps` folder. -**Note:** For RELEASE_VERSIONS older than 1.0, you will need to add a 'v' in front -of ${RELEASE_VERSION} to reference the correct image on Docker Hub. +CodeTrans will utilize the following GenAIComps services and associated tools. The tools and models listed in the table can be configured via environment variables in either the `set_env.sh` script or the `compose.yaml` file. -```bash -cd $WORKSPACE/GenAIComps -``` +| Use Case Components | Tools | Model | Service Type | +|---------------------|---------------|--------------------------------------|----------------------| +| LLM | vLLM or TGI | mistralai/Mistral-7B-Instruct-v0.3 | OPEA Microservice | +| UI | | NA | Gateway Service | +| Ingress | Nginx | NA | Gateway Service | -### Build LLM Image +Set the necessary environment variables to set up the use case. To swap out models, modify `set_env.sh` before running it. For example, the environment variable `LLM_MODEL_ID` can be changed to another model by specifying the HuggingFace model card ID. -First, build the Text Generation LLM service image: +To run the UI on a web browser on a laptop, modify `BACKEND_SERVICE_ENDPOINT` to use `localhost` or `127.0.0.1` instead of `host_ip` inside `set_env.sh` for the backend to properly receive data from the UI. +Run the `set_env.sh` script. ```bash -docker build -t opea/llm-textgen:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy \ ---build-arg http_proxy=$http_proxy -f comps/llms/src/text-generation/Dockerfile . +cd $WORKSPACE/GenAIExamples/CodeTrans/docker_compose +source ./set_env.sh ``` ->**Note**: `llm-textgen` uses Text Generation Inference (TGI) which is pulled automatically via the docker compose file in the next steps. - -### Build Nginx Image - -Build the Nginx service image that will handle routing: +## Deploy the Use Case +Navigate to the `docker compose` directory for this hardware platform. ```bash -docker build -t opea/nginx:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy \ ---build-arg http_proxy=$http_proxy -f comps/third_parties/nginx/src/Dockerfile . - +cd $WORKSPACE/GenAIExamples/CodeTrans/docker_compose/intel/cpu/xeon ``` -### Build MegaService Image - -The Megaservice is a pipeline that channels data through different microservices, each performing varied tasks. 
We define the different microservices and the flow of data between them in the `code_translation.py` file, in this example, CodeTrans MegaService formats the input code and language parameters into a prompt template, sends it to the LLM microservice, and returns the translated code. You can also add newer or remove some microservices and customize the megaservice to suit the needs. +Run `docker compose` with the provided YAML file to start all the services mentioned above as containers. The vLLM or TGI service can be used for CodeTrans. -Build the megaservice image for this use case. +::::{tab-set} +:::{tab-item} vllm +:sync: vllm ```bash -cd $WORKSPACE/GenAIExamples/CodeTrans +docker compose -f compose.yaml up -d ``` +::: +:::{tab-item} TGI +:sync: TGI ```bash -docker build -t opea/codetrans:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy \ ---build-arg http_proxy=$http_proxy -f Dockerfile . -``` - -### Build UI Image - -Build the UI service image: - -```bash -cd $WORKSPACE/GenAIExamples/CodeTrans/ui -docker build -t opea/codetrans-ui:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy \ ---build-arg http_proxy=$http_proxy -f ./docker/Dockerfile . +docker compose -f compose_tgi.yaml up -d ``` +::: +:::: -### Sanity Check +### Check Env Variables -Before proceeding, verify that you have all required Docker images by running `docker images`. You should see the following images: +After running `docker compose`, check for warning messages for environment variables that are **NOT** set. Address them if needed. -* opea/llm-textgen:${RELEASE_VERSION} -* opea/codetrans:${RELEASE_VERSION} -* opea/codetrans-ui:${RELEASE_VERSION} -* opea/nginx:${RELEASE_VERSION} + ubuntu@xeon-vm:~/GenAIExamples/CodeTrans/docker_compose/intel/cpu/xeon$ docker compose -f ./compose.yaml up -d -::::: -:::::: + WARN[0000] The "no_proxy" variable is not set. Defaulting to a blank string. + WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. -## Use Case Setup - -The use case will use the following combination of the GenAIComps with the tools. - -| Use Case Components | Tools | Model | Service Type | -|---------------------|---------------|--------------------------------------|----------------------| -| LLM | TGI | mistralai/Mistral-7B-Instruct-v0.3 | OPEA Microservice | -| UI | | NA | Gateway Service | -| Ingress | Nginx | NA | Gateway Service | - -Tools and models mentioned in the table are configurable either through the environment variable or `compose.yaml` - -Set the necessary environment variables to setup the use case by running the `set_env.sh` script. -Here is where the environment variable `LLM_MODEL_ID` is set, and you can change it to another model -by specifying the HuggingFace model card ID. - -**Note:** If you wish to run the UI on a web browser on your laptop, you will need to modify `BACKEND_SERVICE_IP` to use `localhost` or `127.0.0.1` instead of `host_ip` inside `set_env.sh` for the backend to properly receive data from the UI. Additionally, you will need to port-forward the port used for `BACKEND_SERVICE_IP`. Specifically, for CodeTrans, append the following to your ssh command: - -```bash --L 7777:localhost:7777 -``` +Check if all the containers launched via `docker compose` are running i.e. each container's `STATUS` is `Up` and in some cases `Healthy`. -Run the `set_env.sh` script. 
+Run this command to see this info: ```bash -cd $WORKSPACE/GenAIExamples/CodeTrans/docker_compose -source ./set_env.sh +docker ps -a ``` -Set up a desired port for Nginx: +Sample output: ```bash -# Example: NGINX_PORT=80 -export NGINX_PORT=${your_nginx_port} +CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES +ca0cfb3edce5 opea/nginx:latest "/docker-entrypoint.…" 8 minutes ago Up 6 minutes 0.0.0.0:80->80/tcp, [::]:80->80/tcp codetrans-xeon-nginx-server +d7ef9da3f7db opea/codetrans-ui:latest "docker-entrypoint.s…" 8 minutes ago Up 6 minutes 0.0.0.0:5173->5173/tcp, [::]:5173->5173/tcp codetrans-xeon-ui-server +2cfc12e1c8f1 opea/codetrans:latest "python code_transla…" 8 minutes ago Up 6 minutes 0.0.0.0:7777->7777/tcp, [::]:7777->7777/tcp codetrans-xeon-backend-server +c1db5a49003d opea/llm-textgen:latest "bash entrypoint.sh" 8 minutes ago Up 6 minutes 0.0.0.0:9000->9000/tcp, [::]:9000->9000/tcp codetrans-xeon-llm-server +450f74cb65a4 opea/vllm:latest "python3 -m vllm.ent…" 8 minutes ago Up 8 minutes (healthy) 0.0.0.0:8008->80/tcp, [::]:8008->80/tcp codetrans-xeon-vllm-service ``` -## Deploy the use case - -In this tutorial, we will be deploying via docker compose with the provided YAML file. The docker compose instructions should start all the above-mentioned services as containers. +Each docker container's log can also be checked using: ```bash -cd $WORKSPACE/GenAIExamples/CodeTrans/docker_compose/intel/cpu/xeon -docker compose up -d +docker logs ``` -### Validate microservice - -#### Check Env Variables - -Check the startup log by `docker compose -f ./compose.yaml logs`. -The warning messages print out the variables if they are **NOT** set. +## Validate Microservices -ubuntu@xeon-vm:~/GenAIExamples/CodeTrans/docker_compose/intel/cpu/xeon$ docker compose -f ./compose.yaml up -d +This section will guide through the various methods for interacting with the deployed microservices. -WARN[0000] The "no_proxy" variable is not set. Defaulting to a blank string. -WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. +### vLLM or TGI Service -#### Check the container status +During the initial startup, this service will take a few minutes to download the model files and complete the warm-up process. Once this is finished, the service will be ready for use. -Check if all the containers launched via docker compose have started. -For example, the CodeTrans example starts 5 docker containers (services), check these docker containers are all running, i.e., all the containers `STATUS` are `Up`. +Try the command below to check whether the LLM serving is ready. -To do a quick sanity check, try `docker ps -a` to see if all the containers are running. 
+::::{tab-set} +:::{tab-item} vllm +:sync: vllm ```bash -| CONTAINER ID | IMAGE | COMMAND | CREATED | STATUS | PORTS | NAMES | -|--------------|-------------------------------------------------------------------|---------------------------|----------------|------------------------------------|---------------------------------------------|---------------------------------| -| 0744c6693a64 | opea/nginx:${RELEASE_VERSION} | `/docker-entrypoint.…` | 20 minutes ago | Up 9 minutes | 0.0.0.0:80->80/tcp, :::80->80/tcp | codetrans-xeon-nginx-server | -| 1e9c8c900843 | opea/codetrans-ui:${RELEASE_VERSION} | `docker-entrypoint.s…` | 20 minutes ago | Up 9 minutes | 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp | codetrans-xeon-ui-server | -| 3ed57de43648 | opea/codetrans:${RELEASE_VERSION} | `python code_transla…` | 20 minutes ago | Up 9 minutes | 0.0.0.0:7777->7777/tcp, :::7777->7777/tcp | codetrans-xeon-backend-server | -| 29d0fe6382dd | opea/llm-textgen:${RELEASE_VERSION} | `bash entrypoint.sh` | 20 minutes ago | Up 9 minutes | 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp | llm-textgen-server | -| e1b37ad9e078 | ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu | `text-generation-lau…` | 20 minutes ago | Up 13 minutes (healthy) | 0.0.0.0:8008->80/tcp, [::]:8008->80/tcp | codetrans-tgi-service | - +# vLLM service +docker logs codetrans-xeon-vllm-service 2>&1 | grep complete +# If the service is ready, you will get the response like below. +INFO: Application startup complete. ``` +::: +:::{tab-item} TGI +:sync: TGI -## Interacting with CodeTrans deployment - -In this section, you will walk through the different ways to interact with the deployed microservices. - -### TGI Service - -In the first startup, this service will take more time to download the model files. After it's finished, the service will be ready. - -Try the command below to check whether the LLM serving is ready. ```bash -docker logs ${CONTAINER_ID} | grep Connected +# TGI service +docker logs codetrans-xeon-tgi-service | grep Connected +# If the service is ready, you will get the response like below. +2024-09-03T02:47:53.402023Z INFO text_generation_router::server: router/src/server.rs:2311: Connected ``` -If the service is ready, you will get a response like below. +::: +:::: + +Then try the `cURL` command to verify the vLLM or TGI service: ```bash -2024-09-03T02:47:53.402023Z INFO text_generation_router::server: router/src/server.rs:2311: Connected -``` -```bash -curl http://${host_ip}:8008/generate \ --X POST \ --d '{"inputs":" ### System: Please translate the following Golang codes into Python codes. ### Original codes: '\'''\'''\''Golang \npackage main\n\nimport \"fmt\"\nfunc main() {\n fmt.Println(\"Hello, World!\");\n '\'''\'''\'' ### Translated codes:","parameters":{"max_new_tokens":17, "do_sample": true}}' \ --H 'Content-Type: application/json' +curl http://${host_ip}:8008/v1/chat/completions \ + -X POST \ + -d '{"inputs":" ### System: Please translate the following Golang codes into Python codes. ### Original codes: '\'''\'''\''Golang \npackage main\n\nimport \"fmt\"\nfunc main() {\n fmt.Println(\"Hello, World!\");\n '\'''\'''\'' ### Translated codes:","parameters":{"max_new_tokens":17, "do_sample": true}}' \ + -H 'Content-Type: application/json' ``` -TGI service generates text for the input prompt. Here is the expected result from TGI: - +The vLLM or TGI service generates text for the input prompt. 
Here is the expected result: ```bash {"generated_text":"'''Python\nprint(\"Hello, World!\")"} ``` -**NOTE**: After launching TGI, it takes a few minutes for the TGI server to load the LLM model and warm up. -### Text Generation Microservice - -This service handles the core language model operations. You can validate it's working by sending a direct request to translate a simple "Hello World" program from Go to Python: +### LLM Microservice +This service handles the core language model operations. Send a direct request to translate a simple "Hello World" program from Go to Python: ```bash -curl http://${host_ip}:9000/v1/chat/completions \ +curl http://${host_ip}:9000/v1/chat/completions\ -X POST \ - -d '{ - "query": "### System: Please translate the following Golang codes into Python codes. ### Original codes: ```Golang\npackage main\n\nimport \"fmt\"\nfunc main() {\n fmt.Println(\"Hello, World!\");\n}\n``` ### Translated codes:", - "max_tokens": 17 - }' \ + -d '{"query":" ### System: Please translate the following Golang codes into Python codes. ### Original codes: '\'''\'''\''Golang \npackage main\n\nimport \"fmt\"\nfunc main() {\n fmt.Println(\"Hello, World!\");\n '\'''\'''\'' ### Translated codes:"}' \ -H 'Content-Type: application/json' ``` -The expected output is as shown below: + +Sample output: ```bash data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"\n"}],"created":1737123223,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"\n"}],"created":1737123223,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} @@ -285,16 +217,16 @@ data: {"id":"","choices":[{"finish_reason":"length","index":0,"logprobs":null,"t data: [DONE] ``` -### MegaService +### CodeTrans Megaservice The CodeTrans megaservice orchestrates the entire translation process. Test it with a simple code translation request: - ```bash -curl http://${host_ip}:7777/v1/codetrans \ --H "Content-Type: application/json" \ --d '{"language_from": "Golang","language_to": "Python","source_code": "package main\n\nimport \"fmt\"\nfunc main() {\n fmt.Println(\"Hello, World!\");\n}"}' +curl http://${host_ip}:7777/v1/codetrans \ + -H "Content-Type: application/json" \ + -d '{"language_from": "Golang","language_to": "Python","source_code": "package main\n\nimport \"fmt\"\nfunc main() {\n fmt.Println(\"Hello, World!\");\n}"}' ``` -When you send this request, you’ll receive a streaming response from the MegaService. 
It will appear line by line like so: + +Sample output: ```bash data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"\n"}],"created":1737121307,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"\n"}],"created":1737121307,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null} @@ -316,86 +248,31 @@ data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":" data: {"id":"","choices":[{"finish_reason":"eos_token","index":0,"logprobs":null,"text":""}],"created":1737121309,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":{"completion_tokens":18,"prompt_tokens":74,"total_tokens":92,"completion_tokens_details":null,"prompt_tokens_details":null}} data: [DONE] ``` -Within this output, each line contains JSON that includes a `text` field. Once you combine the `text` values in order, you’ll reconstruct the translated code. In this example, the final code is simply: + +The megaservice streams each segment of the response. Each line contains JSON that includes a `text` field. Combining the `text` values in order will reconstruct the translated code. In this example, the final code is simply: ```bash print("Hello, World!") ``` -This demonstrates how the MegaService streams each segment of the response, which you can then piece together to get the complete translation. ### Nginx Service -The Nginx service acts as a reverse proxy and load balancer for the application. You can verify it's properly routing requests by sending the same translation request through Nginx: - +The Nginx service acts as a reverse proxy and load balancer for the application. To verify it is properly routing requests, send the same translation request through Nginx: ```bash -curl http://${host_ip}:${NGINX_PORT}/v1/codetrans \ --H "Content-Type: application/json" \ --d '{"language_from": "Golang","language_to": "Python","source_code": "package main\n\nimport \"fmt\"\nfunc main() {\n fmt.Println(\"Hello, World!\");\n}"}' +curl http://${host_ip}:${NGINX_PORT}/v1/codetrans \ + -H "Content-Type: application/json" \ + -d '{"language_from": "Golang","language_to": "Python","source_code": "package main\n\nimport \"fmt\"\nfunc main() {\n fmt.Println(\"Hello, World!\");\n}"}' ``` -The expected output is the same as the MegaService output. +The expected output is the same as the megaservice output. Each of these endpoints should return a successful response with the translated Python code. If any of these tests fail, check the corresponding service logs for more details. -## Check the docker container logs - -Following is an example of debugging using Docker logs: - -Check the log of the container using: - -`docker logs -t` - -Check the log using `docker logs e1b37ad9e078 -t`. - -```bash -2024-06-05T01:30:30.695934928Z error: a value is required for '--model-id ' but none was supplied - -2024-06-05T01:30:30.697123534Z - -2024-06-05T01:30:30.697148330Z For more information, try '--help'. -``` -The log indicates the `MODEL_ID` is not set. 
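+
+If any of the checks above fail, a quick way to narrow down the cause is to inspect the logs of the specific service involved, using the container names from the `docker ps` output shown earlier (the names below assume the default vLLM deployment):
+
+```bash
+# Tail recent, timestamped logs from the backend and the model-serving containers
+docker logs -t --tail 100 codetrans-xeon-backend-server
+docker logs -t --tail 100 codetrans-xeon-vllm-service
+```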
- -View the docker input parameters in `$WORKSPACE/GenAIExamples/CodeTrans/docker_compose/intel/cpu/xeon/compose.yaml` -```yaml -tgi-service: - image: ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu - container_name: codetrans-tgi-service - ports: - - "8008:80" - volumes: - - "./data:/data" - shm_size: 1g - environment: - no_proxy: ${no_proxy} - http_proxy: ${http_proxy} - https_proxy: ${https_proxy} - HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN} - host_ip: ${host_ip} - healthcheck: - test: ["CMD-SHELL", "curl -f http://$host_ip:8008/health || exit 1"] - interval: 10s - timeout: 10s - retries: 100 - command: --model-id ${LLM_MODEL_ID} --cuda-graphs 0 -``` -The input `MODEL_ID` is `${LLM_MODEL_ID}` - -Check environment variable `LLM_MODEL_ID` is set correctly, and spelled correctly. - -Set the `LLM_MODEL_ID` then restart the containers. -You can also check overall logs with the following command, where the -`compose.yaml` is the MegaService docker-compose configuration file. -```bash -docker compose -f $WORKSPACE/GenAIExamples/CodeTrans/docker_compose/intel/cpu/xeon/compose.yaml logs -``` ## Launch UI ### Basic UI -To access the frontend user interface (UI), the primary method is through the Nginx reverse proxy service. Open the following URL in your browser: http://${host_ip}:${NGINX_PORT}. This provides a stable and secure access point to the UI. The value of ${NGINX_PORT} has been defined in the earlier steps. - -Alternatively, you can access the UI directly using its internal port. This method bypasses the Nginx proxy and can be used for testing or troubleshooting purposes. To access the UI directly, open the following URL in your browser: http://${host_ip}:5173. By default, the UI runs on port 5173. +To access the frontend user interface (UI), the primary method is through the Nginx reverse proxy service. Open the following URL in a web browser: http://${host_ip}:${NGINX_PORT}. This provides a stable and secure access point to the UI. -If you need to change the port used to access the UI directly (not through Nginx), modify the ports section of the `compose.yaml` file: +Alternatively, the UI can be accessed directly using its internal port. This method bypasses the Nginx proxy and can be used for testing or troubleshooting purposes. To access the UI directly, open the following URL in a web browser: http://${host_ip}:5173. By default, the UI runs on port 5173. A different host port can be used to access the frontend by modifying the `FRONTEND_SERVICE_PORT` environment variable. For reference, the port mapping in the `compose.yaml` file is shown below: ```yaml codetrans-xeon-ui-server: @@ -404,15 +281,33 @@ codetrans-xeon-ui-server: depends_on: - codetrans-xeon-backend-server ports: - - "YOUR_HOST_PORT:5173" # Change YOUR_HOST_PORT to your desired port + - "${FRONTEND_SERVICE_PORT:-5173}:5173" ``` -Remember to replace YOUR_HOST_PORT with your preferred host port number. After making this change, you will need to rebuild and restart your containers for the change to take effect. +After making this change, restart the containers for the change to take effect. -### Stop the services +### Stop the Services -Once you are done with the entire pipeline and wish to stop and remove all the containers, use the command below: +Navigate to the `docker compose` directory for this hardware platform. 
+```bash +cd $WORKSPACE/GenAIExamples/CodeTrans/docker_compose/intel/cpu/xeon +``` + +To stop and remove all the containers, use the commands below: + +::::{tab-set} +:::{tab-item} vllm +:sync: vllm ```bash docker compose -f compose.yaml down -``` \ No newline at end of file +``` +::: +:::{tab-item} TGI +:sync: TGI + +```bash +docker compose -f compose_tgi.yaml down +``` +::: +::::
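+
+To also remove any named volumes created by the compose file (a more thorough, optional cleanup; whether the model cache lives in a named volume or a host bind mount depends on the compose file in use), `docker compose down` accepts the `--volumes` flag:
+
+```bash
+# Hypothetical full cleanup, including named volumes created by this compose file
+docker compose -f compose.yaml down --volumes
+```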