From cbec63e05496539262ad732c13a8cd01e550f1cd Mon Sep 17 00:00:00 2001 From: alexsin368 Date: Mon, 14 Apr 2025 17:14:43 -0700 Subject: [PATCH 01/12] update CodeGen tutorial for release v1.3 Signed-off-by: alexsin368 --- tutorial/CodeGen/CodeGen_Guide.rst | 13 +- tutorial/CodeGen/deploy/gaudi.md | 375 ++++++++--------------------- tutorial/CodeGen/deploy/xeon.md | 369 ++++++++-------------------- 3 files changed, 208 insertions(+), 549 deletions(-) diff --git a/tutorial/CodeGen/CodeGen_Guide.rst b/tutorial/CodeGen/CodeGen_Guide.rst index c37c380a..1101de3b 100644 --- a/tutorial/CodeGen/CodeGen_Guide.rst +++ b/tutorial/CodeGen/CodeGen_Guide.rst @@ -3,18 +3,13 @@ CodeGen ##################### -.. note:: This guide is in its early development and is a work-in-progress with - placeholder content. - Overview ******** -The CodeGen example uses specialized AI models that went through training with datasets that -encompass repositories, documentation, programming code, and web data. With an understanding -of various programming languages, coding patterns, and software development concepts, the -CodeGen LLMs assist developers and programmers. The LLMs can be integrated into the developers' +The CodeGen example uses specialized AI models that went through training with datasets that encompass repositories, documentation, programming code, and web data. With an understanding +of various programming languages, coding patterns, and software development concepts, CodeGen LLMs assist developers and programmers. The LLMs can be integrated into the developers' Integrated Development Environments (IDEs) to have more contextual awareness to write more -refined and relevant code based on the suggestions. +refined and relevant code based on suggestions. Purpose ******* @@ -37,7 +32,7 @@ for serving deployment. It is presented as a Code Copilot application as shown i Deployment ********** -Here are some deployment options, depending on your hardware and environment: +Here are some deployment options, depending on the hardware and environment: .. toctree:: :maxdepth: 1 diff --git a/tutorial/CodeGen/deploy/gaudi.md b/tutorial/CodeGen/deploy/gaudi.md index f3be578f..4fb79c2a 100644 --- a/tutorial/CodeGen/deploy/gaudi.md +++ b/tutorial/CodeGen/deploy/gaudi.md @@ -1,59 +1,37 @@ -# Single node on-prem deployment with TGI on Gaudi AI Accelerator +# Single node on-prem deployment on Gaudi AI Accelerator -This deployment section covers single-node on-prem deployment of the CodeGen -example with OPEA comps to deploy using the TGI service. We will be showcasing how -to build an e2e CodeGen solution with the Qwen2.5-Coder-7B-Instruct, -deployed on Intel® Gaudi® AI Accelerators. To quickly learn about OPEA in just 5 minutes -and set up the required hardware and software, please follow the instructions in the +This deployment section covers single-node on-prem deployment of the CodeGen example. It will show how to build an end-to-end CodeGen solution with the Qwen2.5-Coder-32B-Instruct model deployed on Intel® Gaudi® AI Accelerators. To quickly learn about OPEA and set up the required hardware and software, follow the instructions in the [Getting Started](../../../getting-started/README.md) section. ## Overview -The CodeGen use case uses a single microservice called LLM. In this tutorial, we -will walk through the steps on how to enable it from OPEA GenAIComps to deploy on -a single node TGI megaservice solution. +The CodeGen use case uses a single microservice called LLM with model serving done with vLLM or TGI. 
-The solution is aimed to show how to use the Qwen2.5-Coder-7B-Instruct model on the Intel® -Gaudi® AI Accelerator. We will go through how to setup docker containers to start -the microservice and megaservice. The solution will then take text input as the -prompt and generate code accordingly. It is deployed with a UI with 2 modes to -choose from: +The solution is aimed to show how to use the Qwen2.5-Coder-32B-Instruct model on the Intel® Gaudi® AI Accelerators. Steps will include setting up docker containers, taking text input as the prompt, and generating code. There are multiple versions of the UI that can be deployed but only the Gradio-based one will be covered in this tutorial. -1. Svelte-Based UI -2. React-Based UI - -The React-based UI is optional, but this feature is supported in this example if you -are interested in using it. - -Below is the list of content we will be covering in this tutorial: +## Prerequisites -1. Prerequisites -2. Prepare (Building / Pulling) Docker images -3. Use case setup -4. Deploy the use case -5. Interacting with CodeGen deployment +To run the UI on a web browser external to the host machine such as a laptop, the following ports need to be port forwarded: +- 5173: UI port +- 6007: dataprep port +- 7778: CodeGen megaservice port + +Port numbers may change. Refer to the CodeGen example's `set_env.sh` and `compose.yaml` files for running Docker compose. -## Prerequisites +Port forwarding can be done by appending the -L input argument to the SSH command when logging in to the host machine from a laptop: +```bash +-L 5173:localhost:5173 -L 6007:localhost:6007 -L 7778:localhost:7778 +``` -The first step is to clone the GenAIExamples and GenAIComps projects. GenAIComps are -fundamental necessary components used to build the examples you find in -GenAIExamples and deploy them as microservices. Set an environment -variable for the desired release version with the **number only** -(i.e. 1.0, 1.1, etc) and checkout using the tag with that version. +Clone the [GenAIExamples](https://github.com/opea-project/GenAIExamples) GitHub repo. Set a workspace path and the desired release version with the **number only** (i.e. 1.0, 1.1, etc) and checkout that version using the tag. ```bash # Set workspace -export WORKSPACE= +export WORKSPACE= cd $WORKSPACE # Set desired release version - number only -export RELEASE_VERSION= - -# GenAIComps -git clone https://github.com/opea-project/GenAIComps.git -cd GenAIComps -git checkout tags/v${RELEASE_VERSION} -cd .. +export RELEASE_VERSION= # GenAIExamples git clone https://github.com/opea-project/GenAIExamples.git @@ -62,139 +40,30 @@ git checkout tags/v${RELEASE_VERSION} cd .. ``` -The examples utilize model weights from HuggingFace and Langchain. - -Setup your [HuggingFace](https://huggingface.co/) account and generate -[user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). - -Setup the HuggingFace token +Set up [HuggingFace](https://huggingface.co/) account and generate a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). Then set an environment variable with the HuggingFace token: ```bash export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" ``` -Additionally, if you plan to use the default model Qwen2.5-Coder-7B-Instruct, you will -need to [request access](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct) from HuggingFace. +`host_ip` is not required to be set manually. 
It will be set in the `set_env.sh` script later. -The example requires you to set the `host_ip` to deploy the microservices on -endpoint enabled with ports. Set the host_ip env variable -```bash -export host_ip=$(hostname -I | awk '{print $1}') -``` - -Make sure to setup Proxies if you are behind a firewall +For machines behind a firewall, set up the proxy environment variables: ```bash export no_proxy=${your_no_proxy},$host_ip export http_proxy=${your_http_proxy} export https_proxy=${your_http_proxy} ``` -## Prepare (Building / Pulling) Docker images - -This step will involve building/pulling relevant docker -images with step-by-step process along with sanity check in the end. For -CodeGen, the following docker images will be needed: LLM with TGI. -Additionally, you will need to build docker images for the -CodeGen megaservice, and UI (React UI is optional). In total, -there are **3 required docker images** and an optional docker image. - -### Build/Pull Microservice image - -::::::{tab-set} - -:::::{tab-item} Pull -:sync: Pull - -If you decide to pull the docker containers and not build them locally, -you can proceed to the next step where all the necessary containers will -be pulled in from Docker Hub. - -::::: -:::::{tab-item} Build -:sync: Build - -Follow the steps below to build the docker images from within the `GenAIComps` folder. -**Note:** For RELEASE_VERSIONS older than 1.0, you will need to add a 'v' in front -of ${RELEASE_VERSION} to reference the correct image on Docker Hub. - -```bash -cd $WORKSPACE/GenAIComps -``` - -#### Build LLM Image - -```bash -docker build --no-cache -t opea/llm-tgi:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/tgi/Dockerfile . -``` - -### Build Mega Service images - -The Megaservice is a pipeline that channels data through different -microservices, each performing varied tasks. The LLM microservice and -flow of data are defined in the `codegen.py` file. You can also add or -remove microservices and customize the megaservice to suit your needs. - -Build the megaservice image for this use case - -```bash -cd $WORKSPACE/GenAIExamples/CodeGen -``` - -```bash -docker build --no-cache -t opea/codegen:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile . -``` - -### Build the UI Image - -You can build 2 modes of UI - -*Svelte UI* - -```bash -cd $WORKSPACE/GenAIExamples/CodeGen/ui/ -docker build --no-cache -t opea/codegen-ui:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile . -``` - -*React UI (Optional)* -If you want a React-based frontend. - -```bash -cd $WORKSPACE/GenAIExamples/CodeGen/ui/ -docker build --no-cache -t opea/codegen-react-ui:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile.react . -``` - -### Sanity Check -Check if you have the following set of docker images by running the command `docker images` before moving on to the next step. -The tags are based on what you set the environment variable `RELEASE_VERSION` to. - -* `opea/llm-tgi:${RELEASE_VERSION}` -* `opea/codegen:${RELEASE_VERSION}` -* `opea/codegen-ui:${RELEASE_VERSION}` -* `opea/codegen-react-ui:${RELEASE_VERSION}` (optional) - -::::: -:::::: - ## Use Case Setup -The use case will use the following combination of GenAIComps and tools +CodeGen will use the following GenAIComps and corresponding tools. 
Tools and models mentioned in the table are configurable either through environment variables in the `set_env.sh` or `compose.yaml` file. |Use Case Components | Tools | Model | Service Type | |---------------- |--------------|-----------------------------|-------| -|LLM | TGI | Qwen/Qwen2.5-Coder-7B-Instruct | OPEA Microservice | +|LLM | vLLM, TGI | Qwen/Qwen2.5-Coder-32B-Instruct | OPEA Microservice | |UI | | NA | Gateway Service | -Tools and models mentioned in the table are configurable either through the -environment variables or `compose.yaml` file. - -Set the necessary environment variables to setup the use case by running the `set_env.sh` script. -Here is where the environment variable `LLM_MODEL_ID` is set, and you can change it to another model -by specifying the HuggingFace model card ID. - -**Note:** If you wish to run the UI on a web browser on your laptop, you will need to modify `BACKEND_SERVICE_ENDPOINT` to use `localhost` or `127.0.0.1` instead of `host_ip` inside `set_env.sh` for the backend to properly receive data from the UI. Additionally, you will need to port-forward the port used for `BACKEND_SERVICE_ENDPOINT`. Specifically, for CodeGen, append the following to your ssh command: - -```bash --L 7778:localhost:7778 -``` +Set the necessary environment variables to set up the use case. To swap out models, modify `set_env.sh` before running it. For example, the environment variable `LLM_MODEL_ID` can be changed to another model by specifying the HuggingFace model card ID. Run the `set_env.sh` script. ```bash @@ -204,22 +73,29 @@ source ./set_env.sh ## Deploy the Use Case -In this tutorial, we will be deploying via docker compose with the provided -YAML file. The docker compose instructions should be starting all the -above mentioned services as containers. +Run `docker compose` with the provided YAML file to start all the services mentioned above as containers. The vLLM or TGI service can be used for CodeGen. + +::::{tab-set} +:::{tab-item} vllm +:sync: vllm ```bash cd $WORKSPACE/GenAIExamples/CodeGen/docker_compose/intel/hpu/gaudi -docker compose up -d +docker compose --profile codegen-gaudi-vllm up -d ``` +::: +:::{tab-item} TGI +:sync: TGI +```bash +cd $WORKSPACE/GenAIExamples/CodeGen/docker_compose/intel/hpu/gaudi +docker compose --profile codegen-gaudi-tgi up -d +``` +::: +:::: -### Checks to Ensure the Services are Running -#### Check Startup and Env Variables -Check the startup log by running `docker compose logs` to ensure there are no errors. -The warning messages print out the variables if they are **NOT** set. - -Here are some sample messages if proxy environment variables are not set: +### Check Env Variables +After running `docker compose`, check for warning messages for environment variables that are **NOT** set. Address them if needed. WARN[0000] The "no_proxy" variable is not set. Defaulting to a blank string. WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string. @@ -234,39 +110,47 @@ Here are some sample messages if proxy environment variables are not set: WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string. -#### Check the Container Status -Check if all the containers launched via docker compose have started. +Check if all the containers launched via `docker compose` are running i.e. each container's `STATUS` is `Up` and `Healthy`. -The CodeGen example starts 4 docker containers. 
Check that these docker -containers are all running, i.e, all the containers `STATUS` are `Up`. -You can do this with the `docker ps -a` command. +Run this command to see this info: +```bash +docker ps -a +``` +Sample output: ```bash -CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES -bbd235074c3d opea/codegen-ui:${RELEASE_VERSION} "docker-entrypoint.s…" About a minute ago Up About a minute 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp codegen-gaudi-ui-server -8d3872ca66fa opea/codegen:${RELEASE_VERSION} "python codegen.py" About a minute ago Up About a minute 0.0.0.0:7778->7778/tcp, :::7778->7778/tcp codegen-gaudi-backend-server -b9fc39f51cdb opea/llm-tgi:${RELEASE_VERSION} "bash entrypoint.sh" About a minute ago Up About a minute 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp llm-tgi-gaudi-server -39994e007f15 ghcr.io/huggingface/tgi-gaudi:2.0.1 "text-generation-lau…" About a minute ago Up About a minute 0.0.0.0:8028->80/tcp, :::8028->80/tcp tgi-gaudi-server +CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES +c6fed95320ee opea/codegen-gradio-ui:latest "python codegen_ui_g…" About a minute ago Up About a minute 0.0.0.0:5173->5173/tcp, [::]:5173->5173/tcp codegen-gaudi-ui-server +092d76d64623 opea/embedding:latest "sh -c 'python $( [ …" About a minute ago Up About a minute 0.0.0.0:6000->6000/tcp, [::]:6000->6000/tcp tei-embedding-server +fdce54b2c46f opea/dataprep:latest "sh -c 'python $( [ …" About a minute ago Up About a minute 0.0.0.0:6007->5000/tcp, [::]:6007->5000/tcp dataprep-redis-server +b55224fdcf9d opea/codegen:latest "python codegen.py" About a minute ago Up About a minute 0.0.0.0:7778->7778/tcp, [::]:7778->7778/tcp codegen-gaudi-backend-server +c9846f8592fd opea/retriever:latest "python opea_retriev…" About a minute ago Up About a minute 0.0.0.0:7000->7000/tcp, [::]:7000->7000/tcp retriever-redis +0afb0b6a455b opea/llm-textgen:latest "bash entrypoint.sh" About a minute ago Up About a minute llm-textgen-server +4550094ef0d7 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" About a minute ago Up About a minute 0.0.0.0:6379->6379/tcp, [::]:6379->6379/tcp, 0.0.0.0:8001->8001/tcp, [::]:8001->8001/tcp redis-vector-db +fbda23354529 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "/bin/sh -c 'apt-get…" About a minute ago Up About a minute (healthy) 0.0.0.0:8090->80/tcp, [::]:8090->80/tcp tei-embedding-serving ``` -## Interacting with CodeGen for Deployment +Each docker container's log can also be checked using: -This section will walk you through the different ways to interact with -the microservices deployed. After a couple minutes, rerun `docker ps -a` -to ensure all the docker containers are still up and running. Then proceed -to validate each microservice and megaservice. +```bash +docker logs +``` + +## Validate Microservices -### TGI Service +This section will walk through the different ways to interact with the microservices deployed. + +### vLLM or TGI Service ```bash -curl http://${host_ip}:8028/generate \ - -X POST \ - -d '{"inputs":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","parameters":{"max_new_tokens":256, "do_sample": true}}' \ - -H 'Content-Type: application/json' -``` +curl http://${host_ip}:8028/v1/chat/completions \ + -X POST \ + -H 'Content-Type: application/json' \ + -d '{"model": "Qwen/Qwen2.5-Coder-32B-Instruct", "messages": [{"role": "user", "content": "Implement a high-level API for a TODO list application. 
The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}], "max_tokens":32}' -Here is the output: +``` +Here is sample output: ```bash {"generated_text":"\n\nIO iflow diagram:\n\n!\[IO flow diagram(s)\]\(TodoList.iflow.svg\)\n\n### TDD Kata walkthrough\n\n1. Start with a user story. We will add story tests later. In this case, we'll choose a story about adding a TODO:\n ```ruby\n as a user,\n i want to add a todo,\n so that i can get a todo list.\n\n conformance:\n - a new todo is added to the list\n - if the todo text is empty, raise an exception\n ```\n\n1. Write the first test:\n ```ruby\n feature Testing the addition of a todo to the list\n\n given a todo list empty list\n when a user adds a todo\n the todo should be added to the list\n\n inputs:\n when_values: [[\"A\"]]\n\n output validations:\n - todo_list contains { text:\"A\" }\n ```\n\n1. Write the first step implementation in any programming language you like. In this case, we will choose Ruby:\n ```ruby\n def add_"} ``` @@ -276,117 +160,62 @@ Here is the output: ```bash curl http://${host_ip}:9000/v1/chat/completions\ -X POST \ - -d '{"query":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","max_tokens":256,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \ - -H 'Content-Type: application/json' + -H 'Content-Type: application/json' \ + -d '{"query":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","max_tokens":256,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"stream":true}' ``` -The output is given one character at a time. It is too long to show -here but the last item will be +The output code is printed one character at a time. It is too long to show here but the last item will be ```bash data: [DONE] ``` -### MegaService +### Dataprep Microservice +The following is a template only. Replace the filename placeholders with desired files. + +```bash +curl http://${host_ip}:6007/v1/dataprep/ingest \ +-X POST \ +-H "Content-Type: multipart/form-data" \ +-F "files=@./file1.pdf" \ +-F "files=@./file2.txt" \ +-F "index_name=my_API_document" +``` + +### CodeGen Megaservice +Default: ```bash curl http://${host_ip}:7778/v1/codegen -H "Content-Type: application/json" -d '{ "messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception." }' ``` -The output is given one character at a time. It is too long to show -here but the last item will be +The output code is printed one character at a time. It is too long to show here but the last item will be ```bash data: [DONE] ``` +The CodeGen Megaservice can also be utilized with RAG and Agents activated: +```bash +curl http://${host_ip}:7778/v1/codegen \ + -H "Content-Type: application/json" \ + -d '{"agents_flag": "True", "index_name": "my_API_document", "messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. 
If the request is invalid, raise an exception."}' + ``` + ## Launch UI -### Svelte UI -To access the frontend, open the following URL in your browser: http://{host_ip}:5173. By default, the UI runs on port 5173 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below: +### Gradio UI +To access the frontend, open the following URL in a web browser: http://${host_ip}:5173. By default, the UI runs on port 5173 internally. A different host port can be used to access the frontend. Simply modify the port mapping in the `compose.yaml` file as shown below: ```yaml codegen-gaudi-ui-server: - image: ${REGISTRY:-opea}/codegen-ui:${TAG:-latest} + image: ${REGISTRY:-opea}/codegen-gradio-ui:${TAG:-latest} ... ports: - "5173:5173" ``` -### React-Based UI (Optional) -To access the React-based frontend, modify the UI service in the `compose.yaml` file. Replace `codegen-gaudi-ui-server` service with the codegen-gaudi-react-ui-server service as per the config below: -```yaml -codegen-gaudi-react-ui-server: - image: ${REGISTRY:-opea}/codegen-react-ui:${TAG:-latest} - container_name: codegen-gaudi-react-ui-server - environment: - - no_proxy=${no_proxy} - - https_proxy=${https_proxy} - - http_proxy=${http_proxy} - - APP_CODE_GEN_URL=${BACKEND_SERVICE_ENDPOINT} - depends_on: - - codegen-gaudi-backend-server - ports: - - "5174:80" - ipc: host - restart: always -``` -Once the services are up, open the following URL in your browser: http://{host_ip}:5174. By default, the UI runs on port 80 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below: -```yaml - codegen-gaudi-react-ui-server: - image: ${REGISTRY:-opea}/codegen-react-ui:${TAG:-latest} - ... - ports: - - "80:80" -``` - -## Check Docker Container Logs - -You can check the log of a container by running this command: - -```bash -docker logs -t -``` - -You can also check the overall logs with the following command, where the -`compose.yaml` is the megaservice docker-compose configuration file. - -Assumming you are still in this directory `$WORKSPACE/GenAIExamples/CodeGen/docker_compose/intel/hpu/gaudi`, -run the following command to check the logs: -```bash -docker compose -f compose.yaml logs -``` - -View the docker input parameters in `$WORKSPACE/GenAIExamples/CodeGen/docker_compose/intel/hpu/gaudi/compose.yaml` - -```yaml - tgi-service: - image: ghcr.io/huggingface/tgi-gaudi:2.0.1 - container_name: tgi-gaudi-server - ports: - - "8028:80" - volumes: - - "./data:/data" - environment: - no_proxy: ${no_proxy} - http_proxy: ${http_proxy} - https_proxy: ${https_proxy} - HABANA_VISIBLE_DEVICES: all - OMPI_MCA_btl_vader_single_copy_mechanism: none - HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN} - runtime: habana - cap_add: - - SYS_NICE - ipc: host - command: --model-id ${LLM_MODEL_ID} --max-input-length 1024 --max-total-tokens 2048 -``` - -The input `--model-id` is `${LLM_MODEL_ID}`. Ensure the environment variable `LLM_MODEL_ID` -is set and spelled correctly. Check spelling. Whenever this is changed, restart the containers to use -the newly selected model. 
- - -## Stop the services +## Stop the Services -Once you are done with the entire pipeline and wish to stop and remove all the containers, use the command below: +To stop and remove all the containers, use the command below: ```bash -docker compose down +docker compose -f compose.yaml down ``` diff --git a/tutorial/CodeGen/deploy/xeon.md b/tutorial/CodeGen/deploy/xeon.md index 4541d970..df1fa3f9 100644 --- a/tutorial/CodeGen/deploy/xeon.md +++ b/tutorial/CodeGen/deploy/xeon.md @@ -1,59 +1,37 @@ -# Single node on-prem deployment with TGI on Xeon +# Single node on-prem deployment on Xeon -This deployment section covers single-node on-prem deployment of the CodeGen -example with OPEA comps to deploy using the TGI service. We will be showcasing how -to build an e2e CodeGen solution with the Qwen2.5-Coder-7B-Instruct, -deployed on Intel® Xeon® Scalable processors. To quickly learn about OPEA in just 5 minutes -and set up the required hardware and software, please follow the instructions in the +This deployment section covers single-node on-prem deployment of the CodeGen example. It will show how to build an end-to-end CodeGen solution with the Qwen2.5-Coder-7B-Instruct model deployed on Intel® Xeon® Scalable processors. To quickly learn about OPEA and set up the required hardware and software, follow the instructions in the [Getting Started](../../../getting-started/README.md) section. ## Overview -The CodeGen use case uses a single microservice called LLM. In this tutorial, we -will walk through the steps on how on enable it from OPEA GenAIComps to deploy on -a single node TGI megaservice solution. +The CodeGen use case uses a single microservice called LLM with model serving done with vLLM or TGI. -The solution is aimed to show how to use the Qwen2.5-Coder-7B-Instruct model on the Intel® -Xeon® Scalable processors. We will go through how to setup docker containers to start -the microservice and megaservice. The solution will then take text input as the -prompt and generate code accordingly. It is deployed with a UI with 2 modes to -choose from: +The solution is aimed to show how to use the Qwen2.5-Coder-7B-Instruct model on the Intel® Xeon® Scalable processors. Steps will include setting up docker containers, taking text input as the prompt, and generating code. There are multiple versions of the UI that can be deployed but only the Gradio-based one will be covered in this tutorial. -1. Basic UI -2. React-Based UI - -The React-based UI is optional, but this feature is supported in this example if you -are interested in using it. - -Below is the list of content we will be covering in this tutorial: +## Prerequisites -1. Prerequisites -2. Prepare (Building / Pulling) Docker images -3. Use case setup -4. Deploy the use case -5. Interacting with CodeGen deployment +To run the UI on a web browser external to the host machine such as a laptop, the following ports need to be port forwarded: +- 5173: UI port +- 6007: dataprep port +- 7778: CodeGen megaservice port + +Port numbers may change. Refer to the CodeGen example's `set_env.sh` and `compose.yaml` files for running Docker compose. -## Prerequisites +Port forwarding can be done by appending the -L input argument to the SSH command when logging in to the host machine from a laptop: +```bash +-L 5173:localhost:5173 -L 6007:localhost:6007 -L 7778:localhost:7778 +``` -The first step is to clone the GenAIExamples and GenAIComps projects. 
GenAIComps are -fundamental necessary components used to build the examples you find in -GenAIExamples and deploy them as microservices. Set an environment -variable for the desired release version with the **number only** -(i.e. 1.0, 1.1, etc) and checkout using the tag with that version. +Clone the [GenAIExamples](https://github.com/opea-project/GenAIExamples) GitHub repo. Set a workspace path and the desired release version with the **number only** (i.e. 1.0, 1.1, etc) and checkout that version using the tag. ```bash # Set workspace -export WORKSPACE= +export WORKSPACE= cd $WORKSPACE # Set desired release version - number only -export RELEASE_VERSION= - -# GenAIComps -git clone https://github.com/opea-project/GenAIComps.git -cd GenAIComps -git checkout tags/v${RELEASE_VERSION} -cd .. +export RELEASE_VERSION= # GenAIExamples git clone https://github.com/opea-project/GenAIExamples.git @@ -62,135 +40,33 @@ git checkout tags/v${RELEASE_VERSION} cd .. ``` -The examples utilize model weights from HuggingFace and Langchain. - -Setup your [HuggingFace](https://huggingface.co/) account and generate -[user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). - -Setup the HuggingFace token +Set up [HuggingFace](https://huggingface.co/) account and generate a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). Then set an environment variable with the HuggingFace token: ```bash export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" ``` -The example requires you to set the `host_ip` to deploy the microservices on -endpoint enabled with ports. Set the host_ip env variable -```bash -export host_ip=$(hostname -I | awk '{print $1}') -``` +`host_ip` is not required to be set manually. It will be set in the `set_env.sh` script later. -Make sure to setup Proxies if you are behind a firewall +For machines behind a firewall, set up the proxy environment variables: ```bash export no_proxy=${your_no_proxy},$host_ip export http_proxy=${your_http_proxy} export https_proxy=${your_http_proxy} ``` -## Prepare (Building / Pulling) Docker images - -This step will involve building/pulling relevant docker -images with step-by-step process along with sanity check in the end. For -CodeGen, the following docker images will be needed: LLM with TGI. -Additionally, you will need to build docker images for the -CodeGen megaservice, and UI (React UI is optional). In total, -there are **3 required docker images** and an optional docker image. - -### Build/Pull Microservice image - -::::::{tab-set} - -:::::{tab-item} Pull -:sync: Pull - -If you decide to pull the docker containers and not build them locally, -you can proceed to the next step where all the necessary containers will -be pulled in from Docker Hub. - -::::: -:::::{tab-item} Build -:sync: Build - -Follow the steps below to build the docker images from within the `GenAIComps` folder. -**Note:** For RELEASE_VERSIONS older than 1.0, you will need to add a 'v' in front -of ${RELEASE_VERSION} to reference the correct image on Docker Hub. - -```bash -cd $WORKSPACE/GenAIComps -``` - -#### Build LLM Image - -```bash -docker build -t opea/llm-textgen:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/src/text-generation/Dockerfile . -``` - -### Build Mega Service images - -The Megaservice is a pipeline that channels data through different -microservices, each performing varied tasks. 
The LLM microservice and -flow of data are defined in the `codegen.py` file. You can also add or -remove microservices and customize the megaservice to suit your needs. - -Build the megaservice image for this use case - -```bash -cd $WORKSPACE/GenAIExamples/CodeGen -``` - -```bash -docker build -t opea/codegen:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile . -``` - -### Build the UI Image - -You can build 2 modes of UI - -*Basic UI* - -```bash -cd $WORKSPACE/GenAIExamples/CodeGen/ui/ -docker build -t opea/codegen-ui:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile . -``` - -*React UI (Optional)* -If you want a React-based frontend. - -```bash -cd $WORKSPACE/GenAIExamples/CodeGen/ui/ -docker build --no-cache -t opea/codegen-react-ui:${RELEASE_VERSION} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile.react . -``` - -### Sanity Check -Check if you have the following set of docker images by running the command `docker images` before moving on to the next step: - -* `opea/llm-tgi:${RELEASE_VERSION}` -* `opea/codegen:${RELEASE_VERSION}` -* `opea/codegen-ui:${RELEASE_VERSION}` -* `opea/codegen-react-ui:${RELEASE_VERSION}` (optional) - -::::: -:::::: - ## Use Case Setup -The use case will use the following combination of GenAIComps and tools +CodeGen will use the following GenAIComps and corresponding tools. Tools and models mentioned in the table are configurable either through environment variables in the `set_env.sh` or `compose.yaml` file. |Use Case Components | Tools | Model | Service Type | |---------------- |--------------|-----------------------------|-------| -|LLM | TGI | Qwen/Qwen2.5-Coder-7B-Instruct | OPEA Microservice | +|LLM | vLLM, TGI | Qwen/Qwen2.5-Coder-7B-Instruct | OPEA Microservice | |UI | | NA | Gateway Service | -Tools and models mentioned in the table are configurable either through the -environment variables or `compose.yaml` file. - -Set the necessary environment variables to setup the use case case by running the `set_env.sh` script. -Here is where the environment variable `LLM_MODEL_ID` is set, and you can change it to another model -by specifying the HuggingFace model card ID. -**Note:** If you wish to run the UI on a web browser on your laptop, you will need to modify `BACKEND_SERVICE_ENDPOINT` to use `localhost` or `127.0.0.1` instead of `host_ip` inside `set_env.sh` for the backend to properly receive data from the UI. Additionally, you will need to port-forward the port used for `BACKEND_SERVICE_ENDPOINT`. Specifically, for CodeGen, append the following to your ssh command: +Set the necessary environment variables to set up the use case. To swap out models, modify `set_env.sh` before running it. For example, the environment variable `LLM_MODEL_ID` can be changed to another model by specifying the HuggingFace model card ID. -```bash --L 7778:localhost:7778 -``` +>**Note**: On Xeon, it is recommended to use the 7B parameter model Qwen/Qwen2.5-Coder-7B-Instruct instead of the the 32B parameter model. Run the `set_env.sh` script. ```bash @@ -200,22 +76,29 @@ source ./set_env.sh ## Deploy the Use Case -In this tutorial, we will be deploying via docker compose with the provided -YAML file. The docker compose instructions should be starting all the -above mentioned services as containers. +Run `docker compose` with the provided YAML file to start all the services mentioned above as containers. 
The vLLM or TGI service can be used for CodeGen. + +::::{tab-set} +:::{tab-item} vllm +:sync: vllm ```bash cd $WORKSPACE/GenAIExamples/CodeGen/docker_compose/intel/cpu/xeon -docker compose up -d +docker compose --profile codegen-xeon-vllm up -d ``` +::: +:::{tab-item} TGI +:sync: TGI +```bash +cd $WORKSPACE/GenAIExamples/CodeGen/docker_compose/intel/cpu/xeon +docker compose --profile codegen-xeon-tgi up -d +``` +::: +:::: -### Checks to Ensure the Services are Running -#### Check Startup and Env Variables -Check the start up log by running `docker compose logs` to ensure there are no errors. -The warning messages print out the variables if they are **NOT** set. - -Here are some sample messages if proxy environment variables are not set: +### Check Env Variables +After running `docker compose`, check for warning messages for environment variables that are **NOT** set. Address them if needed. WARN[0000] The "no_proxy" variable is not set. Defaulting to a blank string. WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string. @@ -230,40 +113,47 @@ Here are some sample messages if proxy environment variables are not set: WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string. -#### Check the Container Status +Check if all the containers launched via `docker compose` are running i.e. each container's `STATUS` is `Up` and `Healthy`. + +Run this command to see this info: +```bash +docker ps -a +``` -Check if all the containers launched via docker compose has started. +Sample output: +```bash +CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES +c6fed95320ee opea/codegen-gradio-ui:latest "python codegen_ui_g…" About a minute ago Up About a minute 0.0.0.0:5173->5173/tcp, [::]:5173->5173/tcp codegen-xeon-ui-server +092d76d64623 opea/embedding:latest "sh -c 'python $( [ …" About a minute ago Up About a minute 0.0.0.0:6000->6000/tcp, [::]:6000->6000/tcp tei-embedding-server +fdce54b2c46f opea/dataprep:latest "sh -c 'python $( [ …" About a minute ago Up About a minute 0.0.0.0:6007->5000/tcp, [::]:6007->5000/tcp dataprep-redis-server +b55224fdcf9d opea/codegen:latest "python codegen.py" About a minute ago Up About a minute 0.0.0.0:7778->7778/tcp, [::]:7778->7778/tcp codegen-xeon-backend-server +c9846f8592fd opea/retriever:latest "python opea_retriev…" About a minute ago Up About a minute 0.0.0.0:7000->7000/tcp, [::]:7000->7000/tcp retriever-redis +0afb0b6a455b opea/llm-textgen:latest "bash entrypoint.sh" About a minute ago Up About a minute llm-textgen-server +4550094ef0d7 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" About a minute ago Up About a minute 0.0.0.0:6379->6379/tcp, [::]:6379->6379/tcp, 0.0.0.0:8001->8001/tcp, [::]:8001->8001/tcp redis-vector-db +fbda23354529 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "/bin/sh -c 'apt-get…" About a minute ago Up About a minute (healthy) 0.0.0.0:8090->80/tcp, [::]:8090->80/tcp tei-embedding-serving +``` -The CodeGen example starts 4 docker containers. Check that these docker -containers are all running, i.e, all the containers `STATUS` are `Up`. -You can do this with the `docker ps -a` command. 
+Each docker container's log can also be checked using: ```bash -CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES -bbd235074c3d opea/codegen-ui:${RELEASE_VERSION} "docker-entrypoint.s…" About a minute ago Up About a minute 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp codegen-xeon-ui-server -8d3872ca66fa opea/codegen:${RELEASE_VERSION} "python codegen.py" About a minute ago Up About a minute 0.0.0.0:7778->7778/tcp, :::7778->7778/tcp codegen-xeom-backend-server -b9fc39f51cdb opea/llm-tgi:${RELEASE_VERSION} "bash entrypoint.sh" About a minute ago Up About a minute 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp llm-tgi-xeon-server -39994e007f15 ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu "text-generation-lau…" About a minute ago Up About a minute 0.0.0.0:8028->80/tcp, :::8028->80/tcp tgi-server +docker logs ``` -## Interacting with CodeGen for Deployment +## Validate Microservices -This section will walk you through the different ways to interact with -the microservices deployed. After a couple minutes, rerun `docker ps -a` -to ensure all the docker containers are still up and running. Then proceed -to validate each microservice and megaservice. +This section will walk through the different ways to interact with the microservices deployed. -### TGI Service +### vLLM or TGI Service ```bash -curl http://${host_ip}:8028/generate \ - -X POST \ - -d '{"inputs":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","parameters":{"max_new_tokens":256, "do_sample": true}}' \ - -H 'Content-Type: application/json' -``` +curl http://${host_ip}:8028/v1/chat/completions \ + -X POST \ + -H 'Content-Type: application/json' \ + -d '{"model": "Qwen/Qwen2.5-Coder-7B-Instruct", "messages": [{"role": "user", "content": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}], "max_tokens":32}' -Here is the output: +``` +Here is sample output: ```bash {"generated_text":"Start with a user story. We will add story tests later. In this case, we'll choose a story about adding a TODO:\n ```ruby\n as a user,\n i want to add a todo,\n so that i can get a todo list.\n\n conformance:\n - a new todo is added to the list\n - if the todo text is empty, raise an exception\n ```\n\n1. Write the first test:\n ```ruby\n feature Testing the addition of a todo to the list\n\n given a todo list empty list\n when a user adds a todo\n the todo should be added to the list\n\n inputs:\n when_values: [[\"A\"]]\n\n output validations:\n - todo_list contains { text:\"A\" }\n ```\n\n1. Write the first step implementation in any programming language you like. In this case, we will choose Ruby:\n ```ruby\n def add_"} ``` @@ -273,117 +163,62 @@ Here is the output: ```bash curl http://${host_ip}:9000/v1/chat/completions\ -X POST \ - -d '{"query":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","max_tokens":256,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \ - -H 'Content-Type: application/json' + -H 'Content-Type: application/json' \ + -d '{"query":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. 
If the request is invalid, raise an exception.","max_tokens":256,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"stream":true}' ``` -The output is given one character at a time. It is too long to show -here but the last item will be +The output code is printed one character at a time. It is too long to show here but the last item will be ```bash data: [DONE] ``` -### MegaService +### Dataprep Microservice +The following is a template only. Replace the filename placeholders with desired files. +```bash +curl http://${host_ip}:6007/v1/dataprep/ingest \ +-X POST \ +-H "Content-Type: multipart/form-data" \ +-F "files=@./file1.pdf" \ +-F "files=@./file2.txt" \ +-F "index_name=my_API_document" +``` + +### CodeGen Megaservice + +Default: ```bash curl http://${host_ip}:7778/v1/codegen -H "Content-Type: application/json" -d '{ "messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception." }' ``` -The output is given one character at a time. It is too long to show -here but the last item will be +The output code is printed one character at a time. It is too long to show here but the last item will be ```bash data: [DONE] ``` +The CodeGen Megaservice can also be utilized with RAG and Agents activated: +```bash +curl http://${host_ip}:7778/v1/codegen \ + -H "Content-Type: application/json" \ + -d '{"agents_flag": "True", "index_name": "my_API_document", "messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}' + ``` + ## Launch UI -### Svelte UI -To access the frontend, open the following URL in your browser: http://{host_ip}:5173. By default, the UI runs on port 5173 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below: +### Gradio UI +To access the frontend, open the following URL in a web browser: http://${host_ip}:5173. By default, the UI runs on port 5173 internally. A different host port can be used to access the frontend. Simply modify the port mapping in the `compose.yaml` file as shown below: ```yaml codegen-xeon-ui-server: - image: ${REGISTRY:-opea}/codegen-ui:${TAG:-latest} + image: ${REGISTRY:-opea}/codegen-gradio-ui:${TAG:-latest} ... ports: - "5173:5173" ``` -### React-Based UI (Optional) -To access the React-based frontend, modify the UI service in the `compose.yaml` file. Replace `codegen-xeon-ui-server` service with the codegen-xeon-react-ui-server service as per the config below: -```yaml -codegen-xeon-react-ui-server: - image: ${REGISTRY:-opea}/codegen-react-ui:${TAG:-latest} - container_name: codegen-xeon-react-ui-server - environment: - - no_proxy=${no_proxy} - - https_proxy=${https_proxy} - - http_proxy=${http_proxy} - - APP_CODE_GEN_URL=${BACKEND_SERVICE_ENDPOINT} - depends_on: - - codegen-xeon-backend-server - ports: - - "5174:80" - ipc: host - restart: always -``` -Once the services are up, open the following URL in your browser: http://{host_ip}:5174. By default, the UI runs on port 80 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below: -```yaml - codegen-xeon-react-ui-server: - image: ${REGISTRY:-opea}/codegen-react-ui:${TAG:-latest} - ... 
- ports: - - "80:80" -``` - -## Check Docker Container Logs - -You can check the log of a container by running this command: - -```bash -docker logs -t -``` - -You can also check the overall logs with the following command, where the -`compose.yaml` is the megaservice docker-compose configuration file. - -Assumming you are still in this directory `$WORKSPACE/GenAIExamples/CodeGen/docker_compose/intel/cpu/xeon`, -run the following command to check the logs: -```bash -docker compose -f compose.yaml logs -``` - -View the docker input parameters in `$WORKSPACE/GenAIExamples/CodeGen/docker_compose/intel/cpu/xeon/compose.yaml` - -```yaml - tgi-service: - image: ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu - container_name: tgi-server - ports: - - "8028:80" - volumes: - - "./data:/data" - environment: - no_proxy: ${no_proxy} - http_proxy: ${http_proxy} - https_proxy: ${https_proxy} - HABANA_VISIBLE_DEVICES: all - OMPI_MCA_btl_vader_single_copy_mechanism: none - HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN} - runtime: habana - cap_add: - - SYS_NICE - ipc: host - command: --model-id ${LLM_MODEL_ID} --max-input-length 1024 --max-total-tokens 2048 -``` - -The input `--model-id` is `${LLM_MODEL_ID}`. Ensure the environment variable `LLM_MODEL_ID` -is set correctly. Check spelling. Whenever this is changed, restart the containers to use -the newly selected model. - - -## Stop the services +## Stop the Services -Once you are done with the entire pipeline and wish to stop and remove all the containers, use the command below: +To stop and remove all the containers, use the command below: ```bash -docker compose down +docker compose -f compose.yaml down ``` From 4877a9be45f5c7095b766dec2f1ca8a357af49dd Mon Sep 17 00:00:00 2001 From: alexsin368 Date: Tue, 15 Apr 2025 16:18:47 -0700 Subject: [PATCH 02/12] make using release version optional, update model info, fix stop services commands Signed-off-by: alexsin368 --- tutorial/CodeGen/deploy/gaudi.md | 38 +++++++++++++++++++++++--------- tutorial/CodeGen/deploy/xeon.md | 38 +++++++++++++++++++++++--------- 2 files changed, 54 insertions(+), 22 deletions(-) diff --git a/tutorial/CodeGen/deploy/gaudi.md b/tutorial/CodeGen/deploy/gaudi.md index 4fb79c2a..90447b54 100644 --- a/tutorial/CodeGen/deploy/gaudi.md +++ b/tutorial/CodeGen/deploy/gaudi.md @@ -23,24 +23,24 @@ Port forwarding can be done by appending the -L input argument to the SSH comman -L 5173:localhost:5173 -L 6007:localhost:6007 -L 7778:localhost:7778 ``` -Clone the [GenAIExamples](https://github.com/opea-project/GenAIExamples) GitHub repo. Set a workspace path and the desired release version with the **number only** (i.e. 1.0, 1.1, etc) and checkout that version using the tag. - +Set up a workspace and clone the [GenAIExamples](https://github.com/opea-project/GenAIExamples) GitHub repo. ```bash -# Set workspace export WORKSPACE= cd $WORKSPACE +git clone https://github.com/opea-project/GenAIExamples.git # GenAIExamples +``` -# Set desired release version - number only -export RELEASE_VERSION= - -# GenAIExamples -git clone https://github.com/opea-project/GenAIExamples.git +**Optional** It is recommended to use a stable release version by setting `RELEASE_VERSION` to a **number only** (i.e. 1.0, 1.1, etc) and checkout that version using the tag. Otherwise, by default, the main branch with the latest updates will be used. +```bash +export RELEASE_VERSION= # Set desired release version - number only cd GenAIExamples git checkout tags/v${RELEASE_VERSION} cd .. 
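+# Optional sanity check (illustrative, not part of the required steps): confirm the expected release tag is now checked out
+git describe --tags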
``` -Set up [HuggingFace](https://huggingface.co/) account and generate a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). Then set an environment variable with the HuggingFace token: +Set up a [HuggingFace](https://huggingface.co/) account and generate a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). The [Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) model does not need special access, but the token can be used with other models requiring access. + +Set an environment variable with the HuggingFace token: ```bash export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" ``` @@ -215,7 +215,23 @@ To access the frontend, open the following URL in a web browser: http://${host_i ## Stop the Services -To stop and remove all the containers, use the command below: +To stop and remove all the containers, use the commands below: + +::::{tab-set} +:::{tab-item} vllm +:sync: vllm + ```bash -docker compose -f compose.yaml down +cd $WORKSPACE/GenAIExamples/CodeGen/docker_compose/intel/cpu/xeon +docker compose --profile codegen-xeon-vllm down ``` +::: +:::{tab-item} TGI +:sync: TGI + +```bash +cd $WORKSPACE/GenAIExamples/CodeGen/docker_compose/intel/cpu/xeon +docker compose --profile codegen-xeon-tgi down +``` +::: +:::: diff --git a/tutorial/CodeGen/deploy/xeon.md b/tutorial/CodeGen/deploy/xeon.md index df1fa3f9..8bc260a5 100644 --- a/tutorial/CodeGen/deploy/xeon.md +++ b/tutorial/CodeGen/deploy/xeon.md @@ -23,24 +23,24 @@ Port forwarding can be done by appending the -L input argument to the SSH comman -L 5173:localhost:5173 -L 6007:localhost:6007 -L 7778:localhost:7778 ``` -Clone the [GenAIExamples](https://github.com/opea-project/GenAIExamples) GitHub repo. Set a workspace path and the desired release version with the **number only** (i.e. 1.0, 1.1, etc) and checkout that version using the tag. - +Set up a workspace and clone the [GenAIExamples](https://github.com/opea-project/GenAIExamples) GitHub repo. ```bash -# Set workspace export WORKSPACE= cd $WORKSPACE +git clone https://github.com/opea-project/GenAIExamples.git # GenAIExamples +``` -# Set desired release version - number only -export RELEASE_VERSION= - -# GenAIExamples -git clone https://github.com/opea-project/GenAIExamples.git +**Optional** It is recommended to use a stable release version by setting `RELEASE_VERSION` to a **number only** (i.e. 1.0, 1.1, etc) and checkout that version using the tag. Otherwise, by default, the main branch with the latest updates will be used. +```bash +export RELEASE_VERSION= # Set desired release version - number only cd GenAIExamples git checkout tags/v${RELEASE_VERSION} cd .. ``` -Set up [HuggingFace](https://huggingface.co/) account and generate a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). Then set an environment variable with the HuggingFace token: +Set up a [HuggingFace](https://huggingface.co/) account and generate a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). The [Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct) model does not need special access, but the token can be used with other models requiring access. 
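+
+Optionally, the token can be verified before continuing. This is an illustrative check, assuming the `huggingface_hub` Python package (which provides the `huggingface-cli` tool) is installed and that a recent version reads the `HF_TOKEN` environment variable:
+
+```bash
+# Prints the account name if the token is valid; fails with an authentication error otherwise
+HF_TOKEN="Your_Huggingface_API_Token" huggingface-cli whoami
+```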
+ +Set an environment variable with the HuggingFace token: ```bash export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" ``` @@ -218,7 +218,23 @@ To access the frontend, open the following URL in a web browser: http://${host_i ## Stop the Services -To stop and remove all the containers, use the command below: +To stop and remove all the containers, use the commands below: + +::::{tab-set} +:::{tab-item} vllm +:sync: vllm + ```bash -docker compose -f compose.yaml down +cd $WORKSPACE/GenAIExamples/CodeGen/docker_compose/intel/cpu/xeon +docker compose --profile codegen-xeon-vllm down +``` +::: +:::{tab-item} TGI +:sync: TGI + +```bash +cd $WORKSPACE/GenAIExamples/CodeGen/docker_compose/intel/cpu/xeon +docker compose --profile codegen-xeon-tgi down ``` +::: +:::: From 6f3822d24a09cb967dc9f536a319633293488509 Mon Sep 17 00:00:00 2001 From: alexsin368 Date: Tue, 15 Apr 2025 17:28:36 -0700 Subject: [PATCH 03/12] fix steps on port forwarding and getting UI to work Signed-off-by: alexsin368 --- tutorial/CodeGen/deploy/gaudi.md | 12 +++++------- tutorial/CodeGen/deploy/xeon.md | 13 +++++-------- 2 files changed, 10 insertions(+), 15 deletions(-) diff --git a/tutorial/CodeGen/deploy/gaudi.md b/tutorial/CodeGen/deploy/gaudi.md index 90447b54..4d7a7e4b 100644 --- a/tutorial/CodeGen/deploy/gaudi.md +++ b/tutorial/CodeGen/deploy/gaudi.md @@ -11,16 +11,12 @@ The solution is aimed to show how to use the Qwen2.5-Coder-32B-Instruct model on ## Prerequisites -To run the UI on a web browser external to the host machine such as a laptop, the following ports need to be port forwarded: -- 5173: UI port -- 6007: dataprep port +To run the UI on a web browser external to the host machine such as a laptop, the following port(s) need to be port forwarded when using SSH to log in to the host machine: - 7778: CodeGen megaservice port - -Port numbers may change. Refer to the CodeGen example's `set_env.sh` and `compose.yaml` files for running Docker compose. -Port forwarding can be done by appending the -L input argument to the SSH command when logging in to the host machine from a laptop: +This port is used for `BACKEND_SERVICE_IP` defined in the `set_env.sh` for this example inside the `docker compose` folder. Specifically, for CodeGen, append the following to the ssh command: ```bash --L 5173:localhost:5173 -L 6007:localhost:6007 -L 7778:localhost:7778 +-L 7778:localhost:7778 ``` Set up a workspace and clone the [GenAIExamples](https://github.com/opea-project/GenAIExamples) GitHub repo. @@ -65,6 +61,8 @@ CodeGen will use the following GenAIComps and corresponding tools. Tools and mod Set the necessary environment variables to set up the use case. To swap out models, modify `set_env.sh` before running it. For example, the environment variable `LLM_MODEL_ID` can be changed to another model by specifying the HuggingFace model card ID. +To run the UI on a web browser on a laptop, modify `BACKEND_SERVICE_IP` to use `localhost` or `127.0.0.1` instead of `host_ip` inside `set_env.sh` for the backend to properly receive data from the UI. + Run the `set_env.sh` script. 
```bash cd $WORKSPACE/GenAIExamples/CodeGen/docker_compose diff --git a/tutorial/CodeGen/deploy/xeon.md b/tutorial/CodeGen/deploy/xeon.md index 8bc260a5..20101934 100644 --- a/tutorial/CodeGen/deploy/xeon.md +++ b/tutorial/CodeGen/deploy/xeon.md @@ -11,16 +11,12 @@ The solution is aimed to show how to use the Qwen2.5-Coder-7B-Instruct model on ## Prerequisites -To run the UI on a web browser external to the host machine such as a laptop, the following ports need to be port forwarded: -- 5173: UI port -- 6007: dataprep port +To run the UI on a web browser external to the host machine such as a laptop, the following port(s) need to be port forwarded when using SSH to log in to the host machine: - 7778: CodeGen megaservice port - -Port numbers may change. Refer to the CodeGen example's `set_env.sh` and `compose.yaml` files for running Docker compose. -Port forwarding can be done by appending the -L input argument to the SSH command when logging in to the host machine from a laptop: +This port is used for `BACKEND_SERVICE_IP` defined in the `set_env.sh` for this example inside the `docker compose` folder. Specifically, for CodeGen, append the following to the ssh command: ```bash --L 5173:localhost:5173 -L 6007:localhost:6007 -L 7778:localhost:7778 +-L 7778:localhost:7778 ``` Set up a workspace and clone the [GenAIExamples](https://github.com/opea-project/GenAIExamples) GitHub repo. @@ -63,11 +59,12 @@ CodeGen will use the following GenAIComps and corresponding tools. Tools and mod |LLM | vLLM, TGI | Qwen/Qwen2.5-Coder-7B-Instruct | OPEA Microservice | |UI | | NA | Gateway Service | - Set the necessary environment variables to set up the use case. To swap out models, modify `set_env.sh` before running it. For example, the environment variable `LLM_MODEL_ID` can be changed to another model by specifying the HuggingFace model card ID. >**Note**: On Xeon, it is recommended to use the 7B parameter model Qwen/Qwen2.5-Coder-7B-Instruct instead of the the 32B parameter model. +To run the UI on a web browser on a laptop, modify `BACKEND_SERVICE_IP` to use `localhost` or `127.0.0.1` instead of `host_ip` inside `set_env.sh` for the backend to properly receive data from the UI. + Run the `set_env.sh` script. ```bash cd $WORKSPACE/GenAIExamples/CodeGen/docker_compose/ From d8ff152ff4f7364d3f84a5e83c7b046da84701ac Mon Sep 17 00:00:00 2001 From: alexsin368 Date: Wed, 16 Apr 2025 09:29:24 -0700 Subject: [PATCH 04/12] minor fixes Signed-off-by: alexsin368 --- tutorial/CodeGen/deploy/gaudi.md | 24 ++++++++++++++++-------- tutorial/CodeGen/deploy/xeon.md | 20 ++++++++++++++------ 2 files changed, 30 insertions(+), 14 deletions(-) diff --git a/tutorial/CodeGen/deploy/gaudi.md b/tutorial/CodeGen/deploy/gaudi.md index 4d7a7e4b..27100d3c 100644 --- a/tutorial/CodeGen/deploy/gaudi.md +++ b/tutorial/CodeGen/deploy/gaudi.md @@ -71,6 +71,11 @@ source ./set_env.sh ## Deploy the Use Case +Navigate to the `docker compose` directory for this hardware platform. +```bash +cd $WORKSPACE/GenAIExamples/CodeGen/docker_compose/intel/hpu/gaudi +``` + Run `docker compose` with the provided YAML file to start all the services mentioned above as containers. The vLLM or TGI service can be used for CodeGen. 
::::{tab-set} @@ -78,7 +83,6 @@ Run `docker compose` with the provided YAML file to start all the services menti :sync: vllm ```bash -cd $WORKSPACE/GenAIExamples/CodeGen/docker_compose/intel/hpu/gaudi docker compose --profile codegen-gaudi-vllm up -d ``` ::: @@ -86,7 +90,6 @@ docker compose --profile codegen-gaudi-vllm up -d :sync: TGI ```bash -cd $WORKSPACE/GenAIExamples/CodeGen/docker_compose/intel/hpu/gaudi docker compose --profile codegen-gaudi-tgi up -d ``` ::: @@ -202,17 +205,24 @@ curl http://${host_ip}:7778/v1/codegen \ ## Launch UI ### Gradio UI -To access the frontend, open the following URL in a web browser: http://${host_ip}:5173. By default, the UI runs on port 5173 internally. A different host port can be used to access the frontend. Simply modify the port mapping in the `compose.yaml` file as shown below: +To access the frontend, open the following URL in a web browser: http://${host_ip}:5173. By default, the UI runs on port 5173 internally. A different host port can be used to access the frontend by modifying the port mapping in the `compose.yaml` file as shown below: ```yaml codegen-gaudi-ui-server: image: ${REGISTRY:-opea}/codegen-gradio-ui:${TAG:-latest} ... ports: - - "5173:5173" + - "YOUR_HOST_PORT:5173" # Change YOUR_HOST_PORT to the desired port ``` +After making this change, rebuild and restart the containers for the change to take effect. + ## Stop the Services +Navigate to the `docker compose` directory for this hardware platform. +```bash +cd $WORKSPACE/GenAIExamples/CodeGen/docker_compose/intel/hpu/gaudi +``` + To stop and remove all the containers, use the commands below: ::::{tab-set} @@ -220,16 +230,14 @@ To stop and remove all the containers, use the commands below: :sync: vllm ```bash -cd $WORKSPACE/GenAIExamples/CodeGen/docker_compose/intel/cpu/xeon -docker compose --profile codegen-xeon-vllm down +docker compose --profile codegen-gaudi-vllm down ``` ::: :::{tab-item} TGI :sync: TGI ```bash -cd $WORKSPACE/GenAIExamples/CodeGen/docker_compose/intel/cpu/xeon -docker compose --profile codegen-xeon-tgi down +docker compose --profile codegen-gaudi-tgi down ``` ::: :::: diff --git a/tutorial/CodeGen/deploy/xeon.md b/tutorial/CodeGen/deploy/xeon.md index 20101934..1cf2b843 100644 --- a/tutorial/CodeGen/deploy/xeon.md +++ b/tutorial/CodeGen/deploy/xeon.md @@ -73,6 +73,11 @@ source ./set_env.sh ## Deploy the Use Case +Navigate to the `docker compose` directory for this hardware platform. +```bash +cd $WORKSPACE/GenAIExamples/CodeGen/docker_compose/intel/cpu/xeon +``` + Run `docker compose` with the provided YAML file to start all the services mentioned above as containers. The vLLM or TGI service can be used for CodeGen. ::::{tab-set} @@ -80,7 +85,6 @@ Run `docker compose` with the provided YAML file to start all the services menti :sync: vllm ```bash -cd $WORKSPACE/GenAIExamples/CodeGen/docker_compose/intel/cpu/xeon docker compose --profile codegen-xeon-vllm up -d ``` ::: @@ -88,7 +92,6 @@ docker compose --profile codegen-xeon-vllm up -d :sync: TGI ```bash -cd $WORKSPACE/GenAIExamples/CodeGen/docker_compose/intel/cpu/xeon docker compose --profile codegen-xeon-tgi up -d ``` ::: @@ -204,17 +207,24 @@ curl http://${host_ip}:7778/v1/codegen \ ## Launch UI ### Gradio UI -To access the frontend, open the following URL in a web browser: http://${host_ip}:5173. By default, the UI runs on port 5173 internally. A different host port can be used to access the frontend. 
Simply modify the port mapping in the `compose.yaml` file as shown below: +To access the frontend, open the following URL in a web browser: http://${host_ip}:5173. By default, the UI runs on port 5173 internally. A different host port can be used to access the frontend by modifying the port mapping in the `compose.yaml` file as shown below: ```yaml codegen-xeon-ui-server: image: ${REGISTRY:-opea}/codegen-gradio-ui:${TAG:-latest} ... ports: - - "5173:5173" + - "YOUR_HOST_PORT:5173" # Change YOUR_HOST_PORT to the desired port ``` +After making this change, rebuild and restart the containers for the change to take effect. + ## Stop the Services +Navigate to the `docker compose` directory for this hardware platform. +```bash +cd $WORKSPACE/GenAIExamples/CodeGen/docker_compose/intel/cpu/xeon +``` + To stop and remove all the containers, use the commands below: ::::{tab-set} @@ -222,7 +232,6 @@ To stop and remove all the containers, use the commands below: :sync: vllm ```bash -cd $WORKSPACE/GenAIExamples/CodeGen/docker_compose/intel/cpu/xeon docker compose --profile codegen-xeon-vllm down ``` ::: @@ -230,7 +239,6 @@ docker compose --profile codegen-xeon-vllm down :sync: TGI ```bash -cd $WORKSPACE/GenAIExamples/CodeGen/docker_compose/intel/cpu/xeon docker compose --profile codegen-xeon-tgi down ``` ::: From b0d091b6fe2b46c14939ab8875aa26a0796fd205 Mon Sep 17 00:00:00 2001 From: alexsin368 Date: Wed, 16 Apr 2025 13:26:47 -0700 Subject: [PATCH 05/12] fix typo Signed-off-by: alexsin368 --- tutorial/CodeGen/deploy/gaudi.md | 4 ++-- tutorial/CodeGen/deploy/xeon.md | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/tutorial/CodeGen/deploy/gaudi.md b/tutorial/CodeGen/deploy/gaudi.md index 27100d3c..6bfbe294 100644 --- a/tutorial/CodeGen/deploy/gaudi.md +++ b/tutorial/CodeGen/deploy/gaudi.md @@ -14,7 +14,7 @@ The solution is aimed to show how to use the Qwen2.5-Coder-32B-Instruct model on To run the UI on a web browser external to the host machine such as a laptop, the following port(s) need to be port forwarded when using SSH to log in to the host machine: - 7778: CodeGen megaservice port -This port is used for `BACKEND_SERVICE_IP` defined in the `set_env.sh` for this example inside the `docker compose` folder. Specifically, for CodeGen, append the following to the ssh command: +This port is used for `BACKEND_SERVICE_ENDPOINT` defined in the `set_env.sh` for this example inside the `docker compose` folder. Specifically, for CodeGen, append the following to the ssh command: ```bash -L 7778:localhost:7778 ``` @@ -61,7 +61,7 @@ CodeGen will use the following GenAIComps and corresponding tools. Tools and mod Set the necessary environment variables to set up the use case. To swap out models, modify `set_env.sh` before running it. For example, the environment variable `LLM_MODEL_ID` can be changed to another model by specifying the HuggingFace model card ID. -To run the UI on a web browser on a laptop, modify `BACKEND_SERVICE_IP` to use `localhost` or `127.0.0.1` instead of `host_ip` inside `set_env.sh` for the backend to properly receive data from the UI. +To run the UI on a web browser on a laptop, modify `BACKEND_SERVICE_ENDPOINT` to use `localhost` or `127.0.0.1` instead of `host_ip` inside `set_env.sh` for the backend to properly receive data from the UI. Run the `set_env.sh` script. 
```bash diff --git a/tutorial/CodeGen/deploy/xeon.md b/tutorial/CodeGen/deploy/xeon.md index 1cf2b843..f8e95632 100644 --- a/tutorial/CodeGen/deploy/xeon.md +++ b/tutorial/CodeGen/deploy/xeon.md @@ -14,7 +14,7 @@ The solution is aimed to show how to use the Qwen2.5-Coder-7B-Instruct model on To run the UI on a web browser external to the host machine such as a laptop, the following port(s) need to be port forwarded when using SSH to log in to the host machine: - 7778: CodeGen megaservice port -This port is used for `BACKEND_SERVICE_IP` defined in the `set_env.sh` for this example inside the `docker compose` folder. Specifically, for CodeGen, append the following to the ssh command: +This port is used for `BACKEND_SERVICE_ENDPOINT` defined in the `set_env.sh` for this example inside the `docker compose` folder. Specifically, for CodeGen, append the following to the ssh command: ```bash -L 7778:localhost:7778 ``` @@ -63,7 +63,7 @@ Set the necessary environment variables to set up the use case. To swap out mode >**Note**: On Xeon, it is recommended to use the 7B parameter model Qwen/Qwen2.5-Coder-7B-Instruct instead of the the 32B parameter model. -To run the UI on a web browser on a laptop, modify `BACKEND_SERVICE_IP` to use `localhost` or `127.0.0.1` instead of `host_ip` inside `set_env.sh` for the backend to properly receive data from the UI. +To run the UI on a web browser on a laptop, modify `BACKEND_SERVICE_ENDPOINT` to use `localhost` or `127.0.0.1` instead of `host_ip` inside `set_env.sh` for the backend to properly receive data from the UI. Run the `set_env.sh` script. ```bash From e9df2269369361f87f5e7c7e2773b4fd2616c950 Mon Sep 17 00:00:00 2001 From: alexsin368 Date: Thu, 17 Apr 2025 16:42:03 -0700 Subject: [PATCH 06/12] update wording Signed-off-by: alexsin368 --- tutorial/CodeGen/deploy/gaudi.md | 11 +++++------ tutorial/CodeGen/deploy/xeon.md | 13 ++++++------- 2 files changed, 11 insertions(+), 13 deletions(-) diff --git a/tutorial/CodeGen/deploy/gaudi.md b/tutorial/CodeGen/deploy/gaudi.md index 6bfbe294..195580e4 100644 --- a/tutorial/CodeGen/deploy/gaudi.md +++ b/tutorial/CodeGen/deploy/gaudi.md @@ -1,13 +1,12 @@ # Single node on-prem deployment on Gaudi AI Accelerator -This deployment section covers single-node on-prem deployment of the CodeGen example. It will show how to build an end-to-end CodeGen solution with the Qwen2.5-Coder-32B-Instruct model deployed on Intel® Gaudi® AI Accelerators. To quickly learn about OPEA and set up the required hardware and software, follow the instructions in the -[Getting Started](../../../getting-started/README.md) section. +This deployment section covers single-node on-prem deployment of the CodeGen example. It will show how to build an end-to-end CodeGen solution with the `Qwen2.5-Coder-32B-Instruct` model deployed on Intel® Gaudi® AI Accelerators. To quickly learn about OPEA and set up the required hardware and software, follow the instructions in the [Getting Started](../../../getting-started/README.md) section. ## Overview The CodeGen use case uses a single microservice called LLM with model serving done with vLLM or TGI. -The solution is aimed to show how to use the Qwen2.5-Coder-32B-Instruct model on the Intel® Gaudi® AI Accelerators. Steps will include setting up docker containers, taking text input as the prompt, and generating code. There are multiple versions of the UI that can be deployed but only the Gradio-based one will be covered in this tutorial. 
+This solution is designed to demonstrate the use the `Qwen2.5-Coder-7B-Instruct` model for code generation on Intel® Gaudi® AI Accelerators. The steps will involve setting up Docker containers, taking text input as the prompt, and generating code. Although multiple versions of the UI can be deployed, this tutorial will focus solely on the default version. ## Prerequisites @@ -36,7 +35,7 @@ cd .. Set up a [HuggingFace](https://huggingface.co/) account and generate a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). The [Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) model does not need special access, but the token can be used with other models requiring access. -Set an environment variable with the HuggingFace token: +Set the `HUGGINGFACEHUB_API_TOKEN` environment variable to the value of the Hugging Face token by executing the following command: ```bash export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" ``` @@ -52,7 +51,7 @@ export https_proxy=${your_http_proxy} ## Use Case Setup -CodeGen will use the following GenAIComps and corresponding tools. Tools and models mentioned in the table are configurable either through environment variables in the `set_env.sh` or `compose.yaml` file. +CodeGen will utilize the following GenAIComps services and associated tools. The tools and models listed in the table can be configured via environment variables in either the `set_env.sh` script or the `compose.yaml` file. |Use Case Components | Tools | Model | Service Type | |---------------- |--------------|-----------------------------|-------| @@ -139,7 +138,7 @@ docker logs ## Validate Microservices -This section will walk through the different ways to interact with the microservices deployed. +This section will guide through the various methods for interacting with the deployed microservices. ### vLLM or TGI Service diff --git a/tutorial/CodeGen/deploy/xeon.md b/tutorial/CodeGen/deploy/xeon.md index f8e95632..ab3ccb1f 100644 --- a/tutorial/CodeGen/deploy/xeon.md +++ b/tutorial/CodeGen/deploy/xeon.md @@ -1,13 +1,12 @@ # Single node on-prem deployment on Xeon -This deployment section covers single-node on-prem deployment of the CodeGen example. It will show how to build an end-to-end CodeGen solution with the Qwen2.5-Coder-7B-Instruct model deployed on Intel® Xeon® Scalable processors. To quickly learn about OPEA and set up the required hardware and software, follow the instructions in the -[Getting Started](../../../getting-started/README.md) section. +This deployment section covers single-node on-prem deployment of the CodeGen example. It will show how to build an end-to-end CodeGen solution with the `Qwen2.5-Coder-32B-Instruct` model deployed on Intel® Xeon® Scalable processors. To quickly learn about OPEA and set up the required hardware and software, follow the instructions in the [Getting Started](../../../getting-started/README.md) section. ## Overview The CodeGen use case uses a single microservice called LLM with model serving done with vLLM or TGI. -The solution is aimed to show how to use the Qwen2.5-Coder-7B-Instruct model on the Intel® Xeon® Scalable processors. Steps will include setting up docker containers, taking text input as the prompt, and generating code. There are multiple versions of the UI that can be deployed but only the Gradio-based one will be covered in this tutorial. 
+This solution is designed to demonstrate the use the `Qwen2.5-Coder-7B-Instruct` model for code generation on Intel® Xeon® Scalable processors. The steps will involve setting up Docker containers, taking text input as the prompt, and generating code. Although multiple versions of the UI can be deployed, this tutorial will focus solely on the default version. ## Prerequisites @@ -36,7 +35,7 @@ cd .. Set up a [HuggingFace](https://huggingface.co/) account and generate a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). The [Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct) model does not need special access, but the token can be used with other models requiring access. -Set an environment variable with the HuggingFace token: +Set the `HUGGINGFACEHUB_API_TOKEN` environment variable to the value of the Hugging Face token by executing the following command: ```bash export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" ``` @@ -52,7 +51,7 @@ export https_proxy=${your_http_proxy} ## Use Case Setup -CodeGen will use the following GenAIComps and corresponding tools. Tools and models mentioned in the table are configurable either through environment variables in the `set_env.sh` or `compose.yaml` file. +CodeGen will utilize the following GenAIComps services and associated tools. The tools and models listed in the table can be configured via environment variables in either the `set_env.sh` script or the `compose.yaml` file. |Use Case Components | Tools | Model | Service Type | |---------------- |--------------|-----------------------------|-------| @@ -61,7 +60,7 @@ CodeGen will use the following GenAIComps and corresponding tools. Tools and mod Set the necessary environment variables to set up the use case. To swap out models, modify `set_env.sh` before running it. For example, the environment variable `LLM_MODEL_ID` can be changed to another model by specifying the HuggingFace model card ID. ->**Note**: On Xeon, it is recommended to use the 7B parameter model Qwen/Qwen2.5-Coder-7B-Instruct instead of the the 32B parameter model. +>**Note**: On Xeon, it is recommended to use the 7B parameter model `Qwen/Qwen2.5-Coder-7B-Instruct` instead of the the 32B parameter model. To run the UI on a web browser on a laptop, modify `BACKEND_SERVICE_ENDPOINT` to use `localhost` or `127.0.0.1` instead of `host_ip` inside `set_env.sh` for the backend to properly receive data from the UI. @@ -141,7 +140,7 @@ docker logs ## Validate Microservices -This section will walk through the different ways to interact with the microservices deployed. +This section will guide through the various methods for interacting with the deployed microservices. ### vLLM or TGI Service From 4547691a89004032dabeecb171467db99283cb17 Mon Sep 17 00:00:00 2001 From: alexsin368 Date: Thu, 17 Apr 2025 17:21:47 -0700 Subject: [PATCH 07/12] fix typo Signed-off-by: alexsin368 --- tutorial/CodeGen/deploy/gaudi.md | 2 +- tutorial/CodeGen/deploy/xeon.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/tutorial/CodeGen/deploy/gaudi.md b/tutorial/CodeGen/deploy/gaudi.md index 195580e4..83f178c9 100644 --- a/tutorial/CodeGen/deploy/gaudi.md +++ b/tutorial/CodeGen/deploy/gaudi.md @@ -6,7 +6,7 @@ This deployment section covers single-node on-prem deployment of the CodeGen exa The CodeGen use case uses a single microservice called LLM with model serving done with vLLM or TGI. 
-This solution is designed to demonstrate the use the `Qwen2.5-Coder-7B-Instruct` model for code generation on Intel® Gaudi® AI Accelerators. The steps will involve setting up Docker containers, taking text input as the prompt, and generating code. Although multiple versions of the UI can be deployed, this tutorial will focus solely on the default version. +This solution is designed to demonstrate the use of the `Qwen2.5-Coder-7B-Instruct` model for code generation on Intel® Gaudi® AI Accelerators. The steps will involve setting up Docker containers, taking text input as the prompt, and generating code. Although multiple versions of the UI can be deployed, this tutorial will focus solely on the default version. ## Prerequisites diff --git a/tutorial/CodeGen/deploy/xeon.md b/tutorial/CodeGen/deploy/xeon.md index ab3ccb1f..d99a1d9c 100644 --- a/tutorial/CodeGen/deploy/xeon.md +++ b/tutorial/CodeGen/deploy/xeon.md @@ -6,7 +6,7 @@ This deployment section covers single-node on-prem deployment of the CodeGen exa The CodeGen use case uses a single microservice called LLM with model serving done with vLLM or TGI. -This solution is designed to demonstrate the use the `Qwen2.5-Coder-7B-Instruct` model for code generation on Intel® Xeon® Scalable processors. The steps will involve setting up Docker containers, taking text input as the prompt, and generating code. Although multiple versions of the UI can be deployed, this tutorial will focus solely on the default version. +This solution is designed to demonstrate the use of the `Qwen2.5-Coder-7B-Instruct` model for code generation on Intel® Xeon® Scalable processors. The steps will involve setting up Docker containers, taking text input as the prompt, and generating code. Although multiple versions of the UI can be deployed, this tutorial will focus solely on the default version. ## Prerequisites From bf5a3ffc7c3c2bea855d3b54d7fb572880740a09 Mon Sep 17 00:00:00 2001 From: alexsin368 Date: Thu, 17 Apr 2025 17:24:50 -0700 Subject: [PATCH 08/12] fix typo Signed-off-by: alexsin368 --- tutorial/CodeGen/deploy/gaudi.md | 2 +- tutorial/CodeGen/deploy/xeon.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/tutorial/CodeGen/deploy/gaudi.md b/tutorial/CodeGen/deploy/gaudi.md index 83f178c9..d55623d4 100644 --- a/tutorial/CodeGen/deploy/gaudi.md +++ b/tutorial/CodeGen/deploy/gaudi.md @@ -1,6 +1,6 @@ # Single node on-prem deployment on Gaudi AI Accelerator -This deployment section covers single-node on-prem deployment of the CodeGen example. It will show how to build an end-to-end CodeGen solution with the `Qwen2.5-Coder-32B-Instruct` model deployed on Intel® Gaudi® AI Accelerators. To quickly learn about OPEA and set up the required hardware and software, follow the instructions in the [Getting Started](../../../getting-started/README.md) section. +This section covers single-node on-prem deployment of the CodeGen example. It will show how to build an end-to-end CodeGen solution with the `Qwen2.5-Coder-32B-Instruct` model deployed on Intel® Gaudi® AI Accelerators. To quickly learn about OPEA and set up the required hardware and software, follow the instructions in the [Getting Started](../../../getting-started/README.md) section. 
## Overview diff --git a/tutorial/CodeGen/deploy/xeon.md b/tutorial/CodeGen/deploy/xeon.md index d99a1d9c..ea26e76a 100644 --- a/tutorial/CodeGen/deploy/xeon.md +++ b/tutorial/CodeGen/deploy/xeon.md @@ -1,6 +1,6 @@ # Single node on-prem deployment on Xeon -This deployment section covers single-node on-prem deployment of the CodeGen example. It will show how to build an end-to-end CodeGen solution with the `Qwen2.5-Coder-32B-Instruct` model deployed on Intel® Xeon® Scalable processors. To quickly learn about OPEA and set up the required hardware and software, follow the instructions in the [Getting Started](../../../getting-started/README.md) section. +This section covers single-node on-prem deployment of the CodeGen example. It will show how to build an end-to-end CodeGen solution with the `Qwen2.5-Coder-32B-Instruct` model deployed on Intel® Xeon® Scalable processors. To quickly learn about OPEA and set up the required hardware and software, follow the instructions in the [Getting Started](../../../getting-started/README.md) section. ## Overview From 7fa7e9a57fa8015ff591c20b6430c2c636c6adf8 Mon Sep 17 00:00:00 2001 From: alexsin368 Date: Thu, 17 Apr 2025 18:23:25 -0700 Subject: [PATCH 09/12] fix typo Signed-off-by: alexsin368 --- tutorial/CodeGen/deploy/xeon.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tutorial/CodeGen/deploy/xeon.md b/tutorial/CodeGen/deploy/xeon.md index ea26e76a..189d57c3 100644 --- a/tutorial/CodeGen/deploy/xeon.md +++ b/tutorial/CodeGen/deploy/xeon.md @@ -60,7 +60,7 @@ CodeGen will utilize the following GenAIComps services and associated tools. The Set the necessary environment variables to set up the use case. To swap out models, modify `set_env.sh` before running it. For example, the environment variable `LLM_MODEL_ID` can be changed to another model by specifying the HuggingFace model card ID. ->**Note**: On Xeon, it is recommended to use the 7B parameter model `Qwen/Qwen2.5-Coder-7B-Instruct` instead of the the 32B parameter model. +>**Note**: On Xeon, it is recommended to use the 7B parameter model `Qwen/Qwen2.5-Coder-7B-Instruct` instead of the 32B parameter model. To run the UI on a web browser on a laptop, modify `BACKEND_SERVICE_ENDPOINT` to use `localhost` or `127.0.0.1` instead of `host_ip` inside `set_env.sh` for the backend to properly receive data from the UI. From 5400acb6198e8ca1d5f90b7d29359d34dd3819e8 Mon Sep 17 00:00:00 2001 From: alexsin368 Date: Fri, 18 Apr 2025 10:49:18 -0700 Subject: [PATCH 10/12] minor fixes, 7B model on Xeon, 32GB on Gaudi Signed-off-by: alexsin368 --- tutorial/CodeGen/deploy/gaudi.md | 29 ++++++++++++++++------------- tutorial/CodeGen/deploy/xeon.md | 26 ++++++++++++++------------ 2 files changed, 30 insertions(+), 25 deletions(-) diff --git a/tutorial/CodeGen/deploy/gaudi.md b/tutorial/CodeGen/deploy/gaudi.md index d55623d4..e17bda33 100644 --- a/tutorial/CodeGen/deploy/gaudi.md +++ b/tutorial/CodeGen/deploy/gaudi.md @@ -1,16 +1,16 @@ # Single node on-prem deployment on Gaudi AI Accelerator -This section covers single-node on-prem deployment of the CodeGen example. It will show how to build an end-to-end CodeGen solution with the `Qwen2.5-Coder-32B-Instruct` model deployed on Intel® Gaudi® AI Accelerators. To quickly learn about OPEA and set up the required hardware and software, follow the instructions in the [Getting Started](../../../getting-started/README.md) section. +This section covers single-node on-prem deployment of the CodeGen example. 
It will show how to deploy an end-to-end CodeGen solution with the `Qwen2.5-Coder-32B-Instruct` model running on Intel® Gaudi® AI Accelerators. To quickly learn about OPEA and set up the required hardware and software, follow the instructions in the [Getting Started](../../../getting-started/README.md) section. ## Overview The CodeGen use case uses a single microservice called LLM with model serving done with vLLM or TGI. -This solution is designed to demonstrate the use of the `Qwen2.5-Coder-7B-Instruct` model for code generation on Intel® Gaudi® AI Accelerators. The steps will involve setting up Docker containers, taking text input as the prompt, and generating code. Although multiple versions of the UI can be deployed, this tutorial will focus solely on the default version. +This solution is designed to demonstrate the use of the `Qwen2.5-Coder-32B-Instruct` model for code generation on Intel® Gaudi® AI Accelerators. The steps will involve setting up Docker containers, taking text input as the prompt, and generating code. Although multiple versions of the UI can be deployed, this tutorial will focus solely on the default version. ## Prerequisites -To run the UI on a web browser external to the host machine such as a laptop, the following port(s) need to be port forwarded when using SSH to log in to the host machine: +To run the UI on a web browser external to the host machine such as a laptop, the following port(s) need to be forwarded when using SSH to log in to the host machine: - 7778: CodeGen megaservice port This port is used for `BACKEND_SERVICE_ENDPOINT` defined in the `set_env.sh` for this example inside the `docker compose` folder. Specifically, for CodeGen, append the following to the ssh command: @@ -110,7 +110,7 @@ After running `docker compose`, check for warning messages for environment varia WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string. -Check if all the containers launched via `docker compose` are running i.e. each container's `STATUS` is `Up` and `Healthy`. +Check if all the containers launched via `docker compose` are running i.e. each container's `STATUS` is `Up` and in some cases `Healthy`. 
Run this command to see this info: ```bash @@ -119,15 +119,18 @@ docker ps -a Sample output: ```bash -CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES -c6fed95320ee opea/codegen-gradio-ui:latest "python codegen_ui_g…" About a minute ago Up About a minute 0.0.0.0:5173->5173/tcp, [::]:5173->5173/tcp codegen-gaudi-ui-server -092d76d64623 opea/embedding:latest "sh -c 'python $( [ …" About a minute ago Up About a minute 0.0.0.0:6000->6000/tcp, [::]:6000->6000/tcp tei-embedding-server -fdce54b2c46f opea/dataprep:latest "sh -c 'python $( [ …" About a minute ago Up About a minute 0.0.0.0:6007->5000/tcp, [::]:6007->5000/tcp dataprep-redis-server -b55224fdcf9d opea/codegen:latest "python codegen.py" About a minute ago Up About a minute 0.0.0.0:7778->7778/tcp, [::]:7778->7778/tcp codegen-gaudi-backend-server -c9846f8592fd opea/retriever:latest "python opea_retriev…" About a minute ago Up About a minute 0.0.0.0:7000->7000/tcp, [::]:7000->7000/tcp retriever-redis -0afb0b6a455b opea/llm-textgen:latest "bash entrypoint.sh" About a minute ago Up About a minute llm-textgen-server -4550094ef0d7 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" About a minute ago Up About a minute 0.0.0.0:6379->6379/tcp, [::]:6379->6379/tcp, 0.0.0.0:8001->8001/tcp, [::]:8001->8001/tcp redis-vector-db -fbda23354529 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "/bin/sh -c 'apt-get…" About a minute ago Up About a minute (healthy) 0.0.0.0:8090->80/tcp, [::]:8090->80/tcp tei-embedding-serving +CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES +0040b340a392 opea/codegen-gradio-ui:latest "python codegen_ui_g…" 4 minutes ago Up 3 minutes 0.0.0.0:5173->5173/tcp, [::]:5173->5173/tcp codegen-gaudi-ui-server +3d2c7deacf5b opea/codegen:latest "python codegen.py" 4 minutes ago Up 3 minutes 0.0.0.0:7778->7778/tcp, [::]:7778->7778/tcp codegen-gaudi-backend-server +ad59907292fe opea/dataprep:latest "sh -c 'python $( [ …" 4 minutes ago Up 4 minutes (healthy) 0.0.0.0:6007->5000/tcp, [::]:6007->5000/tcp dataprep-redis-server +2cb4e0a6562e opea/retriever:latest "python opea_retriev…" 4 minutes ago Up 4 minutes 0.0.0.0:7000->7000/tcp, [::]:7000->7000/tcp retriever-redis +f787f774890b opea/llm-textgen:latest "bash entrypoint.sh" 4 minutes ago Up About a minute 0.0.0.0:9000->9000/tcp, [::]:9000->9000/tcp llm-codegen-vllm-server +5880b86091a5 opea/embedding:latest "sh -c 'python $( [ …" 4 minutes ago Up 4 minutes 0.0.0.0:6000->6000/tcp, [::]:6000->6000/tcp tei-embedding-server +cd16e3c72f17 opea/llm-textgen:latest "bash entrypoint.sh" 4 minutes ago Up 4 minutes llm-textgen-server +cd412bca7245 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 4 minutes ago Up 4 minutes 0.0.0.0:6379->6379/tcp, [::]:6379->6379/tcp, 0.0.0.0:8001->8001/tcp, [::]:8001->8001/tcp redis-vector-db +8d4e77afc067 opea/vllm:latest "python3 -m vllm.ent…" 4 minutes ago Up 4 minutes (healthy) 0.0.0.0:8028->80/tcp, [::]:8028->80/tcp vllm-server +f7c1cb49b96b ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "/bin/sh -c 'apt-get…" 4 minutes ago Up 4 minutes (healthy) 0.0.0.0:8090->80/tcp, [::]:8090->80/tcp tei-embedding-serving + ``` Each docker container's log can also be checked using: diff --git a/tutorial/CodeGen/deploy/xeon.md b/tutorial/CodeGen/deploy/xeon.md index 189d57c3..1844c5ff 100644 --- a/tutorial/CodeGen/deploy/xeon.md +++ b/tutorial/CodeGen/deploy/xeon.md @@ -1,6 +1,6 @@ # Single node on-prem deployment on Xeon -This section covers single-node on-prem deployment of the CodeGen example. 
It will show how to build an end-to-end CodeGen solution with the `Qwen2.5-Coder-32B-Instruct` model deployed on Intel® Xeon® Scalable processors. To quickly learn about OPEA and set up the required hardware and software, follow the instructions in the [Getting Started](../../../getting-started/README.md) section. +This section covers single-node on-prem deployment of the CodeGen example. It will show how to deploy an end-to-end CodeGen solution with the `Qwen2.5-Coder-7B-Instruct` model running on Intel® Xeon® Scalable processors. To quickly learn about OPEA and set up the required hardware and software, follow the instructions in the [Getting Started](../../../getting-started/README.md) section. ## Overview @@ -10,7 +10,7 @@ This solution is designed to demonstrate the use of the `Qwen2.5-Coder-7B-Instru ## Prerequisites -To run the UI on a web browser external to the host machine such as a laptop, the following port(s) need to be port forwarded when using SSH to log in to the host machine: +To run the UI on a web browser external to the host machine such as a laptop, the following port(s) need to be forwarded when using SSH to log in to the host machine: - 7778: CodeGen megaservice port This port is used for `BACKEND_SERVICE_ENDPOINT` defined in the `set_env.sh` for this example inside the `docker compose` folder. Specifically, for CodeGen, append the following to the ssh command: @@ -112,7 +112,7 @@ After running `docker compose`, check for warning messages for environment varia WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string. -Check if all the containers launched via `docker compose` are running i.e. each container's `STATUS` is `Up` and `Healthy`. +Check if all the containers launched via `docker compose` are running i.e. each container's `STATUS` is `Up` and in some cases `Healthy`. 
Run this command to see this info: ```bash @@ -121,15 +121,17 @@ docker ps -a Sample output: ```bash -CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES -c6fed95320ee opea/codegen-gradio-ui:latest "python codegen_ui_g…" About a minute ago Up About a minute 0.0.0.0:5173->5173/tcp, [::]:5173->5173/tcp codegen-xeon-ui-server -092d76d64623 opea/embedding:latest "sh -c 'python $( [ …" About a minute ago Up About a minute 0.0.0.0:6000->6000/tcp, [::]:6000->6000/tcp tei-embedding-server -fdce54b2c46f opea/dataprep:latest "sh -c 'python $( [ …" About a minute ago Up About a minute 0.0.0.0:6007->5000/tcp, [::]:6007->5000/tcp dataprep-redis-server -b55224fdcf9d opea/codegen:latest "python codegen.py" About a minute ago Up About a minute 0.0.0.0:7778->7778/tcp, [::]:7778->7778/tcp codegen-xeon-backend-server -c9846f8592fd opea/retriever:latest "python opea_retriev…" About a minute ago Up About a minute 0.0.0.0:7000->7000/tcp, [::]:7000->7000/tcp retriever-redis -0afb0b6a455b opea/llm-textgen:latest "bash entrypoint.sh" About a minute ago Up About a minute llm-textgen-server -4550094ef0d7 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" About a minute ago Up About a minute 0.0.0.0:6379->6379/tcp, [::]:6379->6379/tcp, 0.0.0.0:8001->8001/tcp, [::]:8001->8001/tcp redis-vector-db -fbda23354529 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "/bin/sh -c 'apt-get…" About a minute ago Up About a minute (healthy) 0.0.0.0:8090->80/tcp, [::]:8090->80/tcp tei-embedding-serving +CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES +0040b340a392 opea/codegen-gradio-ui:latest "python codegen_ui_g…" 4 minutes ago Up 3 minutes 0.0.0.0:5173->5173/tcp, [::]:5173->5173/tcp codegen-xeon-ui-server +3d2c7deacf5b opea/codegen:latest "python codegen.py" 4 minutes ago Up 3 minutes 0.0.0.0:7778->7778/tcp, [::]:7778->7778/tcp codegen-xeon-backend-server +ad59907292fe opea/dataprep:latest "sh -c 'python $( [ …" 4 minutes ago Up 4 minutes (healthy) 0.0.0.0:6007->5000/tcp, [::]:6007->5000/tcp dataprep-redis-server +2cb4e0a6562e opea/retriever:latest "python opea_retriev…" 4 minutes ago Up 4 minutes 0.0.0.0:7000->7000/tcp, [::]:7000->7000/tcp retriever-redis +f787f774890b opea/llm-textgen:latest "bash entrypoint.sh" 4 minutes ago Up About a minute 0.0.0.0:9000->9000/tcp, [::]:9000->9000/tcp llm-codegen-vllm-server +5880b86091a5 opea/embedding:latest "sh -c 'python $( [ …" 4 minutes ago Up 4 minutes 0.0.0.0:6000->6000/tcp, [::]:6000->6000/tcp tei-embedding-server +cd16e3c72f17 opea/llm-textgen:latest "bash entrypoint.sh" 4 minutes ago Up 4 minutes llm-textgen-server +cd412bca7245 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 4 minutes ago Up 4 minutes 0.0.0.0:6379->6379/tcp, [::]:6379->6379/tcp, 0.0.0.0:8001->8001/tcp, [::]:8001->8001/tcp redis-vector-db +8d4e77afc067 opea/vllm:latest "python3 -m vllm.ent…" 4 minutes ago Up 4 minutes (healthy) 0.0.0.0:8028->80/tcp, [::]:8028->80/tcp vllm-server +f7c1cb49b96b ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "/bin/sh -c 'apt-get…" 4 minutes ago Up 4 minutes (healthy) 0.0.0.0:8090->80/tcp, [::]:8090->80/tcp tei-embedding-serving ``` Each docker container's log can also be checked using: From 71a5b682d4ff2788855e994e032d2e7a2fd680cc Mon Sep 17 00:00:00 2001 From: alexsin368 Date: Fri, 18 Apr 2025 11:01:34 -0700 Subject: [PATCH 11/12] remove mention of rebuilding containers Signed-off-by: alexsin368 --- tutorial/CodeGen/deploy/gaudi.md | 2 +- tutorial/CodeGen/deploy/xeon.md | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/tutorial/CodeGen/deploy/gaudi.md 
b/tutorial/CodeGen/deploy/gaudi.md
index e17bda33..cfd75abf 100644
--- a/tutorial/CodeGen/deploy/gaudi.md
+++ b/tutorial/CodeGen/deploy/gaudi.md
@@ -216,7 +216,7 @@ To access the frontend, open the following URL in a web browser: http://${host_i
       - "YOUR_HOST_PORT:5173" # Change YOUR_HOST_PORT to the desired port
 ```

-After making this change, rebuild and restart the containers for the change to take effect.
+After making this change, restart the containers for the change to take effect.

 ## Stop the Services

diff --git a/tutorial/CodeGen/deploy/xeon.md b/tutorial/CodeGen/deploy/xeon.md
index 5929d335..f72f8459 100644
--- a/tutorial/CodeGen/deploy/xeon.md
+++ b/tutorial/CodeGen/deploy/xeon.md
@@ -10,7 +10,7 @@ This solution is designed to demonstrate the use of the `Qwen2.5-Coder-7B-Instru

 ## Prerequisites

-To run the UI on a web browser external to the host machine such as a laptop, the following port(s) need to be forwarded when using SSH to log in to the host machine:
+To run the UI on a web browser external to the host machine such as a laptop, the following port(s) need to be forwarded when using SSH to login to the host machine:
 - 7778: CodeGen megaservice port

 This port is used for `BACKEND_SERVICE_ENDPOINT` defined in the `set_env.sh` for this example inside the `docker compose` folder. Specifically, for CodeGen, append the following to the ssh command:
@@ -217,7 +217,7 @@ To access the frontend, open the following URL in a web browser: http://${host_i
       - "YOUR_HOST_PORT:5173" # Change YOUR_HOST_PORT to the desired port
 ```

-After making this change, rebuild and restart the containers for the change to take effect.
+After making this change, restart the containers for the change to take effect.

 ## Stop the Services

From 46a7a8f1c50000be1a92d12101eea7e672138844 Mon Sep 17 00:00:00 2001
From: alexsin368
Date: Fri, 18 Apr 2025 13:37:38 -0700
Subject: [PATCH 12/12] minor updates

Signed-off-by: alexsin368
---
 tutorial/CodeGen/deploy/gaudi.md | 4 ++--
 tutorial/CodeGen/deploy/xeon.md  | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/tutorial/CodeGen/deploy/gaudi.md b/tutorial/CodeGen/deploy/gaudi.md
index cfd75abf..1b2c928a 100644
--- a/tutorial/CodeGen/deploy/gaudi.md
+++ b/tutorial/CodeGen/deploy/gaudi.md
@@ -22,12 +22,12 @@ Set up a workspace and clone the [GenAIExamples](https://github.com/opea-project
 ```bash
 export WORKSPACE=
 cd $WORKSPACE
-git clone https://github.com/opea-project/GenAIExamples.git # GenAIExamples
+git clone https://github.com/opea-project/GenAIExamples.git
 ```

 **Optional** It is recommended to use a stable release version by setting `RELEASE_VERSION` to a **number only** (i.e. 1.0, 1.1, etc) and check out that version using the tag. Otherwise, by default, the main branch with the latest updates will be used.
 ```bash
-export RELEASE_VERSION= # Set desired release version - number only
+export RELEASE_VERSION= # Set desired release version - number only
 cd GenAIExamples
 git checkout tags/v${RELEASE_VERSION}
 cd ..
diff --git a/tutorial/CodeGen/deploy/xeon.md b/tutorial/CodeGen/deploy/xeon.md
index f72f8459..6a01122f 100644
--- a/tutorial/CodeGen/deploy/xeon.md
+++ b/tutorial/CodeGen/deploy/xeon.md
@@ -22,12 +22,12 @@ Set up a workspace and clone the [GenAIExamples](https://github.com/opea-project
 ```bash
 export WORKSPACE=
 cd $WORKSPACE
-git clone https://github.com/opea-project/GenAIExamples.git # GenAIExamples
+git clone https://github.com/opea-project/GenAIExamples.git
 ```

 **Optional** It is recommended to use a stable release version by setting `RELEASE_VERSION` to a **number only** (i.e. 1.0, 1.1, etc) and check out that version using the tag. Otherwise, by default, the main branch with the latest updates will be used.
 ```bash
-export RELEASE_VERSION= # Set desired release version - number only
+export RELEASE_VERSION= # Set desired release version - number only
 cd GenAIExamples
 git checkout tags/v${RELEASE_VERSION}
 cd ..