diff --git a/community/rfcs/24-10-20-OPEA-001-Haystack-Integration.md b/community/rfcs/24-10-20-OPEA-001-Haystack-Integration.md
new file mode 100644
index 00000000..18b0a829
--- /dev/null
+++ b/community/rfcs/24-10-20-OPEA-001-Haystack-Integration.md
@@ -0,0 +1,55 @@

# 24-10-20-OPEA-001-Haystack-Integration

## Author

[gadmarkovits](https://github.com/gadmarkovits)

## Status

Under Review

## Objective

Create a Haystack integration for OPEA that will enable the use of OPEA components within a Haystack pipeline.

## Motivation

Haystack is a production-ready open source AI framework that is used by many AI practitioners. It has over 70 integrations with various GenAI components such as document stores, model providers and evaluation frameworks from companies such as Amazon, Microsoft, Nvidia and more. Creating an integration for OPEA will allow Haystack customers to use OPEA components in their pipelines. This RFC presents a high-level overview of the Haystack integration.

## Design Proposal

The idea is to create thin wrappers for OPEA components that enable communicating with them using the existing REST API. The wrappers will match Haystack's API so that they can be used within Haystack pipelines. This will allow developers to seamlessly use OPEA components alongside other Haystack components.

The integration will be implemented as a Python package (similar to other Haystack integrations). The source code will be hosted in OPEA's GenAIComps repo under a new directory called Integrations. The package itself will be uploaded to [PyPI](https://pypi.org/) to allow for easy installation.

Following a discussion with Haystack's technical team, it was agreed that a ChatQnA example, using this OPEA integration, would be a good way to showcase its capabilities. To support this, several component wrappers need to be implemented in the first version of the integration (other wrappers will be added gradually):

1. OPEA Document Embedder

   This component will receive a Haystack Document and embed it using an OPEA embedding microservice.

2. OPEA Text Embedder

   This component will receive text input and embed it using an OPEA embedding microservice.

3. OPEA Generator

   This component will receive a text prompt and generate a response using an OPEA LLM microservice.

4. OPEA Retriever

   This component will receive an embedding and retrieve documents with similar embeddings using an OPEA retrieval microservice.

## Alternatives Considered

n/a

## Compatibility

n/a

## Miscs

Once implemented, the Haystack team will list the OPEA integration on their [integrations page](https://haystack.deepset.ai/integrations), which will allow for easier discovery. Haystack, in collaboration with Intel, will also publish a technical blog post showcasing a ChatQnA example using this integration (similar to this [NVIDIA NIM post](https://haystack.deepset.ai/blog/haystack-nvidia-nim-rag-guide)).

diff --git a/community/rfcs/25-01-10-OPEA-Benchmark.md b/community/rfcs/25-01-10-OPEA-Benchmark.md
new file mode 100644
index 00000000..320f4049
--- /dev/null
+++ b/community/rfcs/25-01-10-OPEA-Benchmark.md
@@ -0,0 +1,139 @@

# Purpose

This RFC describes the behavior of the unified benchmark script for GenAIExamples users.

In v1.1, the benchmark scripts are maintained per example, which leads to a lot of duplicated code and a poor user experience.

That is why we are motivated to improve this tooling and provide a unified entry point for performance benchmarking.

## Original benchmark script layout

```
GenAIExamples/
├── ChatQnA/
│   ├── benchmark/
│   │   ├── benchmark.sh      # each example has its own script
│   │   └── deploy.py
│   ├── kubernetes/
│   │   ├── charts.yaml
│   │   └── ...
│   ├── docker-compose/
│   │   └── compose.yaml
│   └── chatqna.py
└── ...
```

## Proposed benchmark script layout

```
GenAIExamples/
├── deploy_and_benchmark.py    # main entry of GenAIExamples
├── ChatQnA/
│   ├── chatqna.yaml           # default deploy and benchmark config for deploy_and_benchmark.py
│   ├── kubernetes/
│   │   ├── charts.yaml
│   │   └── ...
│   ├── docker-compose/
│   │   └── compose.yaml
│   └── chatqna.py
└── ...
```

# Design

The pseudo code of deploy_and_benchmark.py is listed below for reference.

```
# deploy_and_benchmark.py
# below is the pseudo code to demonstrate its behavior
#
# def main(yaml_file):
#     # extract all deployment combinations from the chatqna.yaml deploy section
#     deploy_traverse_list = extract_deploy_cfg(yaml_file)
#     # for example, deploy_traverse_list = [{'node': 2, 'device': gaudi, 'cards_per_node': 8, ...},
#     #                                      {'node': 4, 'device': gaudi, 'cards_per_node': 8, ...},
#     #                                      ...]
#
#     benchmark_traverse_list = extract_benchmark_cfg(yaml_file)
#     # for example, benchmark_traverse_list = [{'concurrency': 128, 'total_query_num': 4096, ...},
#     #                                         {'concurrency': 128, 'total_query_num': 4096, ...},
#     #                                         ...]
#     for deploy_cfg in deploy_traverse_list:
#         start_k8s_service(deploy_cfg)
#         for benchmark_cfg in benchmark_traverse_list:
#             if service_ready:
#                 ingest_dataset(benchmark_cfg.dataset)
#                 send_http_request(benchmark_cfg)  # will call stresscli.py in GenAIEval
```

Taking chatqna as an example, the configurable fields are listed below.

```
# chatqna.yaml
#
# usage:
#   1) deploy_and_benchmark.py --workload chatqna [overridden parameters]
#   2) or deploy_and_benchmark.py ./chatqna/benchmark/chatqna.yaml [overridden parameters]
#
# for example, deploy_and_benchmark.py ./chatqna/benchmark/chatqna.yaml --node=2
#
deploy:
  # hardware related config
  device: [xeon, gaudi, ...]      # AMD and other h/w could be extended here
  node: [1, 2, 4]
  cards_per_node: [4, 8]

  # components related config, by default for OOB; if overridden, then it is for the tuned version
  embedding:
    model_id: bge_large_v1.5
    instance_num: [2, 4, 8]
    cores_per_instance: 4
    memory_capacity: 20           # unit: G
  retrieval:
    instance_num: [2, 4, 8]
    cores_per_instance: 4
    memory_capacity: 20           # unit: G
  rerank:
    enable: True
    model_id: bge_rerank_v1.5
    instance_num: 1
    cards_per_instance: 1         # if cpu is specified, this field is ignored and cores_per_instance is checked instead
  llm:
    model_id: llama2-7b
    instance_num: 7
    cards_per_instance: 1         # if cpu is specified, this field is ignored and cores_per_instance is checked instead
  # serving related config, dynamic batching
  max_batch_size: [1, 2, 8, 16, 32]  # the query number to construct a single batch in serving
  max_latency: 20                 # time to wait before combining incoming requests into a batch, unit: milliseconds

benchmark:
  # http request behavior related fields
  concurrency: [1, 2, 4]
  total_query_num: [2048, 4096]
  duration: [5, 10]               # unit: minutes
  query_num_per_concurrency: [4, 8, 16]
  poisson: True
  poisson_arrival_rate: 1.0
  warmup_iterations: 10
  seed: 1024

  # dataset related fields
  dataset: [dummy_english, dummy_chinese, pub_med100, ...]  # predefined keywords for supported datasets
  user_query: [dummy_english_qlist, dummy_chinese_qlist, pub_med100_qlist, ...]
  query_token_size: 128           # if specified, a fixed query token size will be sent out
  data_ratio: [10%, 20%, ..., 100%]  # optional, ratio of the query dataset

  # advanced settings in each component which will impact perf.
  data_prep:                      # not targeted this time
    chunk_size: [1024]
    chunk_overlap: [1000]
  retriever:                      # not targeted this time
    algo: IVF
    fetch_k: 2
    k: 1
  rerank:
    top_n: 2
  llm:
    max_token_size: 1024          # specify the output token size
```

diff --git a/community/rfcs/25-03-14-GenAIExample-001-CodeTrans-with-Agents.md b/community/rfcs/25-03-14-GenAIExample-001-CodeTrans-with-Agents.md
new file mode 100644
index 00000000..dbabe1a7
--- /dev/null
+++ b/community/rfcs/25-03-14-GenAIExample-001-CodeTrans-with-Agents.md
@@ -0,0 +1,362 @@

# 25-03-14-GenAIExamples-001-CodeTrans-with-Agents

## Author(s)

[Han, Letong](https://github.com/letonghan)

## Objective

This RFC proposes the integration of two Agent mechanisms into the CodeTrans Example to enhance reliability, user experience, and code quality. The goal is to minimize the propagation of erroneous code and improve the feasibility of automated code translation.

- Pre-LLM Agent: Validates the correctness of the input code before it is processed by the LLM. If errors are detected, the agent attempts to automatically fix them to ensure the code is executable. If the correction is successful, the modified code proceeds to the LLM.
- Post-LLM Agent: Runs a lint check on the translated code and executes it after it has been generated by the LLM. If the execution fails, the agent captures the error and sends it back to the LLM for re-generation.

Moreover, this design introduces a user-configurable **three-step validation pipeline**, allowing users to enable or disable each stage independently via the frontend UI.

* Step 1: **Auto-fix** – Automatically fixes code with syntax errors in the agent.
* Step 2: **Lint Check** – Runs language-specific lint checks to catch style or semantic issues.
* Step 3: **Execution** – Securely runs code in a sandbox environment to validate it. (Planned to support only `Python` for now.)

The Auto-fix step happens in the agent service, while the lint check and execution steps are performed by external tools.

By introducing these agents, the system ensures that only valid code is passed to the LLM and that generated code is verified before reaching the user, thereby improving the overall efficiency and accuracy of the translation process.

## Motivation

The current CodeTrans flow has three major issues:

1. **User input may contain syntax or logic errors.** Passing faulty code directly to the LLM can result in incorrect or unusable translations.
2. **LLM-generated code isn't always correct.** Without an automated validation step, users have to manually review and debug the output.
3. **No feedback loop exists.** The LLM doesn't adapt based on execution results, leading to repeated errors.

By introducing Agent mechanisms, we can improve the process in three key ways:

1. **Reduce error propagation**: Ensure that only valid code reaches the LLM, minimizing incorrect translations.
2. **Enhance user experience**: Detect input issues early, providing clear feedback to avoid unnecessary debugging.
3. **Improve code quality**: Automatically verify LLM-generated code and trigger re-generation when needed, increasing overall reliability.

## Use-Cases

### Detecting Errors in Input Code Before Translation

Scenario:

A developer wants to convert a piece of Java code to Python but unknowingly provides code with syntax errors. If the faulty code is passed directly to the LLM, it might generate an incorrect or non-functional Python version.

How the CodeTrans Helps:

- User selects `Lint Check` in the web UI.
- Pre-LLM Agent runs the lint check on the provided Java code.
- If the code has style or semantic issues, the agent will attempt to automatically fix them.
- The developer can review and confirm the fixes or manually adjust the code before resubmitting.

### Validating Generated Code for Accuracy

Scenario:

A developer uses the CodeTrans example to translate Java code into Python. The LLM generates a Python version, but there's no guarantee that it runs correctly. Without validation, the developer would have to manually check for errors, which is time-consuming.

How the CodeTrans Helps:

- User selects both `Lint Check` and `Code Execution` in the web UI.
- Post-LLM Agent runs the lint check on the translated Python code.
  - The agent will automatically fix any style/semantic issues.
- Post-LLM Agent executes the translated Python code:
  - ✅ If the code runs successfully, the system returns the output to the user.
  - ❌ If the code fails, the agent captures the error details and sends them back to the LLM.
- The LLM then retries code generation, using the error context to produce a corrected version.

This automated validation ensures that developers receive functional translations without having to manually test and debug every output.

### Preventing Infinite Regeneration Loops

Scenario:

In some cases, the LLM may repeatedly generate faulty code, leading to an endless loop of failed executions and retries. Without a safeguard, this could waste computation resources and frustrate users.

How the CodeTrans Helps:

- Both Pre- and Post-LLM Agents track retry attempts.
- If the LLM fails to produce a correct version after a configurable number of attempts, the system stops further retries.
- Instead of another faulty translation, the user receives:
  - ❌ "Code generation failed after multiple attempts. Here are possible reasons and debugging suggestions."
- The system provides relevant error logs and hints, helping the developer troubleshoot the issue efficiently.

This prevents the LLM from getting stuck in an infinite loop and improves user control over the process.

These use cases demonstrate how integrating Agents into the CodeTrans example improves input validation, output verification, and error handling. By ensuring only valid code reaches the LLM and automatically validating generated code, the system reduces errors, minimizes manual debugging, and improves translation accuracy. Retry limits and debugging feedback prevent infinite loops, making the process more reliable, efficient, and user-friendly.
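
To make the retry-limit behavior above concrete, the following is a minimal Python sketch of one possible shape of the bounded regenerate/validate loop. It is only an illustration: `translate`, `lint`, and `execute` are hypothetical callables standing in for the LLM microservice and the external lint/execution tools, not actual OPEA APIs.

```python
from typing import Callable, Tuple


def translate_with_validation(
    source_code: str,
    translate: Callable[[str, str], str],        # (code, error_context) -> candidate translation
    lint: Callable[[str], str],                  # returns a lint report, empty string if clean
    execute: Callable[[str], Tuple[bool, str]],  # returns (success, execution log)
    max_attempts: int = 3,
) -> dict:
    """Bounded regeneration loop: stop after max_attempts failed validations."""
    error_context = ""
    for attempt in range(1, max_attempts + 1):
        candidate = translate(source_code, error_context)
        lint_report = lint(candidate)
        ok, run_log = execute(candidate)
        if ok:
            return {"status": "success", "code": candidate, "attempts": attempt}
        # Feed the lint report and execution error back to the LLM for the next attempt.
        error_context = f"{lint_report}\n{run_log}"
    return {
        "status": "failed",
        "message": "Code generation failed after multiple attempts. "
                   "Here are possible reasons and debugging suggestions.",
        "last_error": error_context,
        "attempts": max_attempts,
    }
```

Keeping the attempt counter in the agent rather than in the LLM keeps the safeguard independent of model behavior.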

## Design Proposal

### Architecture Diagram

```mermaid
graph LR
    %% Colors %%
    classDef blue fill:#ADD8E6,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
    classDef orange fill:#FBAA60,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
    classDef orchid fill:#C26DBC,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
    subgraph User Interface
        %% direction TD
        Frontend[(Frontend Server)]:::orange
        UIQ[User Input Query]:::orchid
        UIQ --> Frontend
    end

    Megaservice[(CodeTrans Megaservice)]:::orange

    subgraph CodeTrans Server
        Agent1([Pre-LLM Agent]):::blue
        LLM([LLM MicroService]):::blue
        Agent2([Post-LLM Agent]):::blue
        Agent1 -->|Refactor input Code| Agent1
        Agent1 -->|Verified Code| LLM
        LLM -->|Generated Code| Agent2
        Agent2 -->|Re-Generate Request| LLM
    end

    LintTool([Lint Tool]):::blue
    CodeExecutionTool([Sandbox Execution Tool]):::blue
    Output[Translated Code]:::orchid

    Frontend -->|Send Request| Megaservice
    Megaservice -->|Send Code| Agent1
    Agent1 -->|Lint check| LintTool
    Agent1 -->|Validated input code| CodeExecutionTool
    Agent2 -->|Lint check| LintTool
    Agent2 -->|Validate generated code| CodeExecutionTool
    Agent2 -->|Output validated code| Output
```

### Components and Functionality

#### User Interface

UI Server:

- Handles user input (code, source language, target language)
- Sends requests to the CodeTrans megaservice

UI Components:

- Lint Check Button: Select to run a lint check on the input/output code.
- Code Execution Button: Select to execute code for a functionality check. (Supports Python only)
- Input/output case: if the `Code Execution Button` is selected, the user will need to provide a set of input/output for this piece of code.
- Code Box – Displays the user-provided and the LLM-generated code.
- Code Translation Result – Shows the translated code, plus the lint check and execution results if available.

#### Backend Servers

CodeTrans Megaservice:

* Manages the scheduling of Agents, LLM, and user input/output.

Pre-LLM Agent:

- Validates code correctness, structures input/output, executes the code, and evaluates the result.
- Performs a static lint check.
  - If semantic errors are detected, the LLM will fix them according to the lint check report.
- Runs the user-provided code to check for syntax or logical errors.
- If errors are detected, the agent attempts to automatically fix them (within a configurable number of attempts).
  - If successfully corrected, the modified code proceeds to the LLM.
  - If the errors cannot be resolved, the agent returns an error message, prompting the user to review and manually fix the code before proceeding.

LLM Microservice:

- Uses a large language model (LLM) to translate the input code into the target language.

Post-LLM Agent:

- Checks the code statically, executes the LLM-generated code, and verifies its correctness.
  - If execution is successful, the translated code is returned to the user.
  - If execution fails, the error details are sent back to the LLM for regeneration (within a configurable number of attempts).

Lint Check Tool:

* Performs lint checks on a snippet of code, supporting different coding languages via tools such as `pylint`, `eslint`, `cpplint` and so on.
* Since linting is a static check, it does not require a separate execution environment, so it can be called and executed directly from a Python script.

Code Execution Tool:

- Provides a secure execution environment (e.g., Docker/Sandbox) to safely run code and prevent malicious execution risks.
+- For reasons of complexity of implementation, only `Python` execution tool will be supported for now. + +#### Lint Check Tool + +Here's a table of lint tools for different coding languages: + +| Coding Language | Lint Tool | Introduction | Reference | +| --------------- | ---------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------- | +| Python | Pylint | A tool that checks for errors in Python code, tries to enforce a coding standard and looks for bad code smells. | [link](https://www.pylint.org/) | +| JavaScript | ESLint | A tool for identifying and reporting on patterns found in ECMAScript/JavaScript code, with the goal of making code more consistent and avoiding bugs. | [link](https://eslint.org/docs/latest/use/getting-started) | +| Java | Checkstyle | A development tool to help programmers write Java code that adheres to a coding standard. | [link](https://checkstyle.sourceforge.io/index.html) | +| C++ | cpplint | A command-line tool to check C/C++ files for style issues according to[Google's C++ style guide](http://google.github.io/styleguide/cppguide.html) | [link](https://github.com/cpplint/cpplint) | +| Go | vet | Examines Go source code and reports suspicious constructs, such as Printf calls whose arguments do not align with the format string. | [link](https://pkg.go.dev/cmd/vet) | +| Bash/Shell | ShellCheck | Point out and clarify typical beginner's syntax issues that cause a shell to give cryptic error messages. | [link](https://github.com/koalaman/shellcheck/#readme) | + +To use these tools to do static checks for different languages, we could save the target code into a temporary file, and execute the lint check command in `bash`. + +This is an example script which support all of these languages. + +```bash +#!/bin/bash +# usage: ./lint_tool.sh + +LANGUAGE="$1" +SOURCE_FILE="$2" +REPORT_FILE="lint_report_${LANGUAGE}.txt" + +# prepare file paths for Java checkstyle +CHECKSTYLE_JAR="./checkstyle.jar" +CHECKSTYLE_CONFIG="./google_checks.xml" + +if [[ ! -f "$SOURCE_FILE" ]]; then + echo "Source file not found: $SOURCE_FILE" + exit 1 +fi + +case "$LANGUAGE" in + python) + echo "Running pylint..." + pylint "$SOURCE_FILE" > "$REPORT_FILE" 2>&1 + ;; + + javascript) + echo "Running eslint..." + eslint "$SOURCE_FILE" > "$REPORT_FILE" 2>&1 + ;; + + java) + echo "Running checkstyle..." + if [[ ! -f "$CHECKSTYLE_JAR" ]]; then + echo "Missing checkstyle.jar. Please download it first." + exit 1 + fi + java -jar "$CHECKSTYLE_JAR" -c "$CHECKSTYLE_CONFIG" "$SOURCE_FILE" > "$REPORT_FILE" 2>&1 + ;; + + go) + echo "Running go vet..." + go vet "$SOURCE_FILE" > "$REPORT_FILE" 2>&1 + ;; + + cpp) + echo "Running cpplint..." + cpplint "$SOURCE_FILE" > "$REPORT_FILE" 2>&1 + ;; + + *) + echo "Unsupported language: $LANGUAGE" + echo "Supported language: python, javascript, java, go, cpp" + exit 1 + ;; +esac + +echo "Lint check completed. Report saved to $REPORT_FILE" + +``` + +#### Code Execution Tool + +Currently we only design to support code execution tool for `Python`. + +* Prevent code injection + + * Use Python Abstract Syntax Tree (AST) to detect and block dangerous operations such as `import os`, `exec`, and `__import__`. 

    ```python
    import ast

    code = "import os"  # placeholder for the user code to be screened
    tree = ast.parse(code)
    # analyze each node in the ast tree and flag dangerous operations such as `import os`
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            # calls such as exec() or __import__() can be checked similarly via ast.Call nodes
            raise ValueError("Blocked import detected in user code")
    ```
* Install dependencies automatically

  * The code execution tool needs to support extracting and installing dependencies from the source code automatically.
  * Use AST here to extract the `import`/`from xxx import xxx` libraries.

    ```python
    import ast

    def extract_imports(code_str):
        tree = ast.parse(code_str)
        imports = set()
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                for alias in node.names:
                    imports.add(alias.name.split('.')[0])
            elif isinstance(node, ast.ImportFrom) and node.module:
                imports.add(node.module.split('.')[0])
        return list(imports)
    ```
* Sandbox execution

  * To ensure that the code runs in a fully isolated environment, the tool needs to use a container-based sandbox such as `Docker`.
  * In the context of resource constraints, resource limits (including memory, CPU, and process numbers) and security policies are needed.
  * Since the dependencies need to be installed first, network access is handled in two stages:
    1. preparation: install the dependencies into a mounted path
       ```bash
       # remove all the capabilities of the container except for needed ones
       docker run --rm --cap-drop ALL --cap-add ${what_is_needed} -v $(pwd)/code:/code \
         sandbox-python:3.10 \
         bash -c "pip install -r /code/requirements.txt -t /code/.deps"
       ```
    2. execution: run the code using the mounted dependencies
       ```bash
       docker run --rm --cap-drop ALL --cap-add ${what_is_needed} -v $(pwd):/code --network=none \
         sandbox-python:3.10 \
         python3 -I -E -S /code/user_code.py
       ```
  * After each execution, the mounted folder (with the installed dependencies) will be cleaned up.

## Expected Benefits

| Feature                      | Benefits                                               |
| ---------------------------- | ------------------------------------------------------ |
| Input Code Validation        | Catches errors early, preventing faulty translations.  |
| Output Code Validation       | Ensures reliable and accurate code conversion.         |
| Automated Debug Feedback     | Reduces trial-and-error, improving LLM accuracy.       |
| Lint Static Code Check       | Catches bugs early and enforces consistent code quality. |
| Secure Execution Environment | Protects the system from malicious code.               |
| Error Classification         | Identifies syntax and logic errors for better debugging. |

## Risks and Mitigations / User Workarounds

* Node / cluster takeover by execution of malicious code
  * Mitigation: automated vetting of the executed code + its strict sandboxing
* Code execution exhausting node resources
  * Mitigation: strict resource usage limits
* Application response taking too long due to dependency install / code execution
  * Mitigation: dependency caching + enforced execution timeouts + error response to user
  * Workaround: user disables linting / code execution
* Users can affect each others' results
  * Mitigation: (dependency) caching is per-user session
* Code execution failing translation due to limits / sandboxing / dependency being offline
  * Workaround: user disables code execution / linting

## Implementation Plan

### Phase 1: Develop Code Execution Tool, target v1.3

- Research on the Code Execution Tool.

### Phase 2: Core Feature Development, target v1.4

- Develop the Lint Check Tool bash script.
- Develop the Code Execution Tool in Agent to provide a secure execution environment.
- Implement the Pre-LLM Agent for input code validation.
+- Improve UI integration by providing a code execution interface and displaying execution results. + +### Phase 3: Agent Integration, target v1.4 + +- Integrate the LLM MicroService with Agent. +- Optimize the CodeTrans megaservice to automate the scheduling of Agents. +- Implement the Post-LLM Agent for output validation and LLM feedback handling. + +### Phase 4: Optimization & Expansion, target v1.4 + +- Set a maximum retry limit to prevent infinite LLM regeneration loops. +- Provide debugging suggestions to enhance user experience. diff --git a/getting-started/README.md b/getting-started/README.md index 0ffe814d..4776d3fe 100644 --- a/getting-started/README.md +++ b/getting-started/README.md @@ -1,9 +1,9 @@ # Getting Started with OPEA -In this document, we provide a tailored guide to deploying the ChatQnA application in OPEA GenAI Examples across multiple cloud platforms, including Amazon Web Services (AWS), Google Cloud Platform (GCP), IBM Cloud, Microsoft Azure and Oracle Cloud Infrastructure, enabling you to choose the best fit for your specific needs and requirements. For additional deployment targets, see the [ChatQnA](/tutorial/ChatQnA/ChatQnA_Guide.rst). +This is a guide to deploy the ChatQnA application from OPEA GenAIExamples across multiple cloud platforms, including Amazon Web Services (AWS), Google Cloud Platform (GCP), IBM Cloud, Microsoft Azure Oracle Cloud Infrastructure, and Intel® Tiber™ AI Cloud, enabling developers for specific needs and requirements. For additional deployment targets, see the [ChatQnA tutorial](/tutorial/ChatQnA/ChatQnA_Guide.rst). ## Understanding OPEA's Core Components -Before moving forward, it's important to familiarize yourself with two key elements of OPEA: GenAIComps and GenAIExamples. +Before moving forward, it's important to get familiar with two key elements of OPEA: GenAIComps and GenAIExamples. - GenAIComps is a collection of microservice components that form a service-based toolkit. This includes a variety of services such as llm (large language models), embedding, and reranking, among others. - GenAIExamples provides practical and deployable solutions to help users implement these services effectively. Examples include ChatQnA and DocSum, which leverage the microservices for specific applications. @@ -17,19 +17,19 @@ Before moving forward, it's important to familiarize yourself with two key eleme :::{tab-item} Amazon Web Services :sync: AWS -**Step 1: Create Your Virtual Server** +**Step 1: Create Virtual Server** 1. Open the [AWS console](https://console.aws.amazon.com/console/home) and search for **EC2** in the search bar. 2. Select **Launch instance** to start creating a virtual server. -3. Under **Name and tags**, name your virtual server in the **Name** field. +3. Under **Name and tags**, name the virtual server in the **Name** field. 4. Under **Quick Start**, choose Ubuntu (`ami-id : ami-04dd23e62ed049936`) as the base OS. -5. In **Instance type**, select an instance for your Intel processor. +5. In **Instance type**, select an instance for the desired Intel processor. - >**Note**: We recommend `m7i.4xlarge` or larger instance for an Intel® 4th Gen Xeon© Scalable Processor. For more information on virtual servers on AWS, visit the [AWS and Intel page](https://aws.amazon.com/intel/). + >**Note**: It is recommended to use the `m7i.4xlarge` or larger instance for an Intel® 4th Gen Xeon© Scalable Processor. For more information on virtual servers on AWS, visit the [AWS and Intel page](https://aws.amazon.com/intel/). 6. 
Create a new key pair for SSH access by naming it, or select an existing key pair from the dropdown list. @@ -40,11 +40,11 @@ Before moving forward, it's important to familiarize yourself with two key eleme 8. In **Storage**, set the size to 100 GiB. -9. Select **Launch instance** to launch your virtual server. A **Success** banner confirms the launch. +9. Select **Launch instance** to launch the virtual server. A **Success** banner confirms the launch. -**Step 2: Connect and Configure Your Virtual Server** +**Step 2: Connect and Configure the Virtual Server** -1. Select **Connect**, and connect using your preferred connection method. +1. Select **Connect**, and connect using the preferred connection method. 2. Search for **Security Groups** in the search bar and select the security group used when creating the instance. @@ -58,7 +58,7 @@ Before moving forward, it's important to familiarize yourself with two key eleme >**Note**: To learn more, see [editing inbound or outbound rules](https://docs.aws.amazon.com/finspace/latest/userguide/step5-config-inbound-rule.html) from AWS documentation. -5. Select **Save rules** to commit your changes. +5. Select **Save rules** to commit the changes. ::: :::{tab-item} Google Cloud Platform @@ -72,7 +72,7 @@ Before moving forward, it's important to familiarize yourself with two key eleme 4. Select an Instance type that is based on Intel hardware. -> **Note:**   We recommend selecting a `c4-standard-32` or larger instance with an Intel(R) 4th Gen Xeon(C) Scalable Processor, and the minimum supported c3 instance type is c3-standard-8 with 32GB memory. For more information, visit [virtual servers on GCP](https://cloud.google.com/intel). +> **Note:** It is recommended to select a `c4-standard-32` or larger instance with an Intel(R) 4th Gen Xeon(C) Scalable Processor, and the minimum supported c3 instance type is c3-standard-8 with 32GB memory. For more information, visit [virtual servers on GCP](https://cloud.google.com/intel). 5. Under Firewall settings select “Allow HTTP traffic” to access ChatQnA UI web portal. @@ -92,7 +92,7 @@ Before moving forward, it's important to familiarize yourself with two key eleme 4. Select a virtual server. -> **Note:** We recommend selecting a 3-series instance with an Intel(R) 4th Gen Xeon(C) Scalable Processor, such as `bx3d-16x80` or above. For more information on virtual servers on IBM cloud visit [Intel® solutions on IBM Cloud®](https://www.ibm.com/cloud/intel). +> **Note:** It is recommended to select a 3-series instance with an Intel(R) 4th Gen Xeon(C) Scalable Processor, such as `bx3d-16x80` or above. For more information on virtual servers on IBM cloud visit [Intel® solutions on IBM Cloud®](https://www.ibm.com/cloud/intel). 5. Add an SSH key to the instance, if necessary, create one first. @@ -114,7 +114,7 @@ Before moving forward, it's important to familiarize yourself with two key eleme 1. Navigate to [Microsoft Azure](portal.azure.com) – Select the "Skip" button on the bottom right to land on the service offerings page. Search for "Virtual Machines" in the search bar and select it. Click the "Create" button and select "Azure Virtual Machine". -2. Select an existing "Resource group" from the drop down or click "Create" for a new Resource group and give it a name. If you have issues refer to [cannot create resource groups](https://learn.microsoft.com/en-us/answers/questions/1520133/cannot-create-resource-groups). +2. 
Select an existing "Resource group" from the drop down or click "Create" for a new Resource group and give it a name. If there are issues refer to [cannot create resource groups](https://learn.microsoft.com/en-us/answers/questions/1520133/cannot-create-resource-groups). 3. Provide a name to the VM and select the base OS as `Ubuntu 24.04 LTS` @@ -122,9 +122,9 @@ Before moving forward, it's important to familiarize yourself with two key eleme 5. Select an Instance type that is based on Intel hardware. ->**Note**: We recommend selecting a `Standard_D16ds_v5` instance or larger with an Intel(R) 3rd/4th Gen Xeon(C) Scalable Processor. You can find this family of instances in the (US) West US Region. Visit for more information [virtual machines on Azure](https://azure.microsoft.com/en-us/partners/directory/intel-corporation). +>**Note**: It is recommended to select a `Standard_D16ds_v5` instance or larger with an Intel(R) 3rd/4th Gen Xeon(C) Scalable Processor. This family of instances can be found in the (US) West US Region. Visit for more information [virtual machines on Azure](https://azure.microsoft.com/en-us/partners/directory/intel-corporation). -6. Select Password as Authentication type and create username and password for your instance. +6. Select Password as Authentication type and create username and password for the instance. 7. Choose the Allow selected ports in Inbound port rule section and select HTTP. @@ -134,7 +134,7 @@ Before moving forward, it's important to familiarize yourself with two key eleme 10. Click Go to resource -> Connect -> Connect -> SSH using Azure CLI. Accept the terms and then select "Configure + connect" ->**Note**: If you have issues connecting to the instance with SSH, you could instead access the same via the Bastion host with your username and password. +>**Note**: If there are issues connecting to the instance with SSH, trying connecting via the Bastion host with the username and password. ::: :::{tab-item} Oracle Cloud Infrastructure @@ -170,38 +170,42 @@ Before moving forward, it's important to familiarize yourself with two key eleme :::{tab-item} Intel® Tiber™ AI Cloud :sync: ITAC -1. Sign up to create an account or log in to [Intel® Tiber™ AI Cloud](https://ai.cloud.intel.com/). Check if you have sufficient cloud credits and purchase or redeem a coupon if needed. Go to the "Compute" tab on the left and click on "Instances". In the center of the screen, click on the "Launch instance" button. +1. Sign up to create an account or log in to [Intel® Tiber™ AI Cloud](https://ai.cloud.intel.com/). Check if there are sufficient cloud credits and purchase or redeem a coupon if needed. Go to the "Compute" tab on the left and click on "Instances". In the center of the screen, click on the "Launch instance" button. -2. Select your instance configuration, instance type, and machine image which will be Ubuntu. +2. Select the instance configuration, instance type, and machine image which will be Ubuntu. ->**Note**: It is recommended to use the `VM-SPR-LRG` powered by 4th Generation Intel® Xeon® Scalable processors with 64GB of memory and 64GB of disk or more if you wish to use a CPU to run an 8B-parameter model. Click [here](https://console.cloud.intel.com/compute/reserve?backTo=catalog) to request the recommended VM instance. You can request a single VM to do a single node docker deploy or obtain a kubernetes cluster of one or more nodes. 
+>**Note**: It is recommended to use the `VM-SPR-LRG` powered by 4th Generation Intel® Xeon® Scalable processors with 64GB of memory and 64GB of disk or more to use a CPU to run an 8B-parameter model. Intel® Gaudi® AI Accelerators can also be used after requesting access. Click [here](https://console.cloud.intel.com/compute/reserve?backTo=catalog) to request the recommended VM instance. Users can request a single VM to do a single node docker deploy or obtain a kubernetes cluster of one or more nodes. -3. Fill out the rest of the form such as giving your instance a name and answering any additional quesitons. +3. Fill out the rest of the form such as giving the instance a name and answering any additional quesitons. -4. Add your public key for SSH. You can select a key you have previously uploaded or upload a key. The "Upload Key" button also provides instructions on how to create a new SSH key. +4. Add the public key for SSH. Select a previously uploaded key or upload a key. The "Upload Key" button also provides instructions on how to create a new SSH key. -5. Click "Launch instance" to start your machine. +5. Click "Launch instance" to start the machine. -6. Go back to the "Compute" tab and under "Instances", note down the private IP address of your new VM. +6. Go back to the "Compute" tab and under "Instances", note down the private IP address of the new VM. -7. If you wish to make the UI accessible to others, proceed to the next step to create a load balancer. Otherwise, skip to Step 10 which will explain how to connect to your VM with port forwarding. +7. If the UI needs to be accessible to others, proceed to the next step to create a load balancer. Otherwise, skip to Step 10 which will explain how to connect to the VM with port forwarding. -8. Create a load balancer. This can be found in Compute->Load Balancers. Click on "Launch Load Balancer". Ignore any messages about signing up for access and close any pop-up windows if any. Fill out the form with the following info: - - Name: **Name for your load balancer** - - Source IP: **The private IP address of your VM in Step 6** - - Listener Port: **80** - - Instance Port: **80** +8. Create a load balancer. This can be found in Compute->Load Balancers. Click on "Launch Load Balancer". Request for access if needed. Fill out the form with the following info: + - Name: **Name for load balancer** + - Source IP: **The private IP address of the VM in Step 6** + - Listener Port: **The NGINX port i.e. 80** + - Instance Port: **The NGINX port i.e. 80** - Monitor Type: **HTTP** - Mode: **Round Robin** - - Instances: **Select the name of the VM you created** + - Instances: **Select the name of the VM created** - >**Note**: The port used is 80 because this is the NGINX port for the GenAI Examples. +>**Note**: If the NGINX port changes for ChatQnA, set the Listener and Instance ports accordingly. - Click "Launch". +Click "Launch". -9. Go back to Compute->Load Balancers to see your new load balancer. Note down the virtual IP address. This is what you will use to access the UI of your GenAI Example on a web browser. +9. Go back to Compute->Load Balancers to see the new load balancer. Note down the virtual IP address. This is what will be used to access the UI of ChatQnA on a web browser. -10. 
Connect to your VM using ssh and port forward port 80 if needed (`ssh -i -J guest@ -L 80:localhost:80 ubuntu@ -J guest@ -L 80:localhost:80 ubuntu@ +export RELEASE_VERSION= git clone https://github.com/opea-project/GenAIExamples.git cd GenAIExamples git checkout tags/v${RELEASE_VERSION} @@ -226,28 +230,35 @@ git checkout tags/v${RELEASE_VERSION} Set the required environment variables: ```bash +# Use localhost export host_ip="localhost" -export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" + +# Generate a token from HuggingFace and set it here +export HUGGINGFACEHUB_API_TOKEN="Huggingface_API_Token" + +# Example: NGINX_PORT=80 +export NGINX_PORT="NGINX_Port" ``` -Set up proxies if you are behind a firewall: +Set up proxies if the machine is behind a firewall: ```bash -export no_proxy=${your_no_proxy},$host_ip -export http_proxy=${your_http_proxy} -export https_proxy=${your_http_proxy} +export http_proxy="HTTP_Proxy" +export https_proxy="HTTPs_Proxy" +# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1" +export no_proxy="No_Proxy",chatqna-xeon-ui-server,chatqna-xeon-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service,vllm-service ``` -Set up other specific use-case environment variables in `set_env.sh` before running it. For example, this is where you can change the model(s) to run with. +Set up other specific use-case environment variables in `set_env.sh` before running it. For example, this is where model(s) can be changed. ```bash cd ChatQnA/docker_compose/intel/cpu/xeon/ source set_env.sh ``` -Now we can start the services: +Start the services: ```bash docker compose -f compose.yaml up -d ``` ->**Note**: It takes a few minutes for the services to start. Check the logs for the services to ensure that ChatQnA is running before proceeding further. +>**Note**: It takes a few minutes for the services to start. Check the logs for the services to ensure that ChatQnA is running before proceeding further. If there is an error related to a port already in use, either 1) modify the `compose.yaml` to use another port or 2) stop the service using that port before retrying the `docker compose` command. For example to check the logs for the `vllm-service`: @@ -262,11 +273,11 @@ INFO: Application startup complete. INFO: Uvicorn running on http://0.0.0.0:80 (Press CTRL+C to quit) ``` -Run `docker ps -a` as an additional check to verify that all the services are running as shown. Notice the version of the docker images matches the RELEASE_VERSION you specified. +Run `docker ps -a` as an additional check to verify that all the services are running as shown. Notice the version of the docker images matches the RELEASE_VERSION specified. 
```bash -| CONTAINER ID | IMAGE | COMMAND | CREATED | STATUS | PORTS | NAMES | -|--------------|--------------------------------------------------------|------------------------|--------------|-------------|------------------------------------------------------------------------------------------|------------------------------| +| CONTAINER ID | IMAGE | COMMAND | CREATED | STATUS | PORTS | NAMES | +|--------------|--------------------------------------------------------|------------------------|------------|------------|------------------------------------------------------------------------------------------|------------------------------| | d992b34fda27 | opea/nginx:1.2 | "/docker-entrypoint.…" | 6 days ago | Up 6 days | 0.0.0.0:80->80/tcp, :::80->80/tcp | chatqna-xeon-nginx-server | | 2d297d595650 | opea/chatqna-ui:1.2 | "docker-entrypoint.s…" | 6 days ago | Up 6 days | 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp | chatqna-xeon-ui-server | | 0b9b2be1feef | opea/chatqna-without-rerank:1.2 | "python chatqna.py -…" | 6 days ago | Up 6 days | 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp | chatqna-xeon-backend-server | @@ -279,26 +290,23 @@ Run `docker ps -a` as an additional check to verify that all the services are ru ### Interact with ChatQnA -You can interact with ChatQnA via a browser interface: - -* To view the ChatQnA interface, open a browser and navigate to the UI by inserting your public facing IP address in the following: `http://{public_ip}:80’. +Interact with ChatQnA via a browser interface: ->**Note:** For users running on ITAC, open a browser to localhost:80 if you are using port forwarding OR the virtual IP address of your load balancer. +* To view the ChatQnA interface, open a browser and navigate to the UI by inserting the public facing IP address: `http://{public_ip}:80’. -We can go ahead and ask a sample question, say 'What is OPEA?'. +>**Note:** For users running on ITAC, open a browser to localhost:80 if using port forwarding OR the virtual IP address of the load balancer. -A snapshot of the interface looks as follows: +When asking a sample question such as 'What is OPEA?', a snapshot of the interface may look like: ![Chat Interface](assets/chat_ui_response.png) -Given that any information about OPEA was not in the training data for the model, we see the model hallucinating and coming up with a response. We can upload a document (PDF) with information and observe how the response changes. +Given that any information about OPEA was not in the training data for the model, the model hallucinates and comes up with an incorrect response. To address this, upload a document (PDF) with information and observe how the response changes. -> **Note:** this example leverages the OPEA document for its RAG based content. You can download the [OPEA document](assets/what_is_opea.pdf) and upload it using the UI. +> **Note:** this example leverages the OPEA document for its RAG based content. This [OPEA document](assets/what_is_opea.pdf) can be downloaded and uploaded using the UI. ![Chat Interface with RAG](assets/chat_ui_response_rag.png) -We observe that the response is relevant and is based on the PDF uploaded. See the [ChatQnA](/tutorial/ChatQnA/ChatQnA_Guide.rst) -to learn how you can customize the example with your own content. +Observe that the response is relevant and is based on the PDF uploaded. See the [ChatQnA](/tutorial/ChatQnA/ChatQnA_Guide.rst) to learn how to customize the example with other content. 
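
Besides the browser UI, the ChatQnA megaservice can also be queried directly over HTTP, which is useful as a quick smoke test before opening the UI. The command below is a minimal example that assumes the default backend port 8888 shown in the `docker ps` output above and the `/v1/chatqna` route used by the ChatQnA example; verify both against the README of the release in use.

```bash
# Query the ChatQnA backend service directly (assumes port 8888 and the /v1/chatqna route)
curl http://${host_ip}:8888/v1/chatqna \
    -H "Content-Type: application/json" \
    -d '{"messages": "What is OPEA?"}'
```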
## What’s Next @@ -307,11 +315,11 @@ to learn how you can customize the example with your own content. ### Get Involved -Have you ideas and skills to build out genAI components, microservices, and solutions? Would you like to be a part of this evolving technology in its early stages? Welcome! -* Register for our mailing list: +Calling all developers! If there is interest in building out GenAI components, microservices, and solutions to be a part of this evolving technology in its early stages: +* Register for the mailing list: * [Mailing List](https://lists.lfaidata.foundation/g/OPEA-announce) * [Technical Discussions](https://lists.lfaidata.foundation/g/OPEA-technical-discuss) -* Subscribe to the working group mailing lists that interest you +* Subscribe to the working group mailing lists * [End user](https://lists.lfaidata.foundation/g/OPEA-End-User) * [Evaluation](https://lists.lfaidata.foundation/g/OPEA-Evaluation) * [Community](https://lists.lfaidata.foundation/g/OPEA-Community) @@ -323,10 +331,9 @@ Have you ideas and skills to build out genAI components, microservices, and solu Current GenAI Examples - Simple chatbot that uses retrieval augmented generation (RAG) architecture. [ChatQnA](/tutorial/ChatQnA/ChatQnA_Guide.rst) - Code generation, from enabling non-programmers to generate code to improving productivity with code completion of complex applications. [CodeGen](https://opea-project.github.io/latest/GenAIExamples/CodeGen/README.html) -- Make your applications more flexible by porting to different languages. [CodeTrans](https://opea-project.github.io/latest/GenAIExamples/CodeTrans/README.html) +- Make applications more flexible by porting to different languages. [CodeTrans](https://opea-project.github.io/latest/GenAIExamples/CodeTrans/README.html) - Create summaries of news articles, research papers, technical documents, etc. to streamline content systems. [DocSum](https://opea-project.github.io/latest/GenAIExamples/DocSum/README.html) - Mimic human behavior by iteratively searching, selecting, and synthesizing information across large bodies of content. [SearchQnA](https://opea-project.github.io/latest/GenAIExamples/SearchQnA/README.html) -- Provide critical content to your customers by automatically generating Frequently Asked Questions (FAQ) resources. [FaqGen](https://opea-project.github.io/latest/GenAIExamples/FaqGen/README.html) -- Provide text descriptions from pictures, enabling your users to inquire directly about products, services, sites, etc. [VisualQnA](https://opea-project.github.io/latest/GenAIExamples/VisualQnA/README.html) +- Provide text descriptions from pictures, enable users to inquire directly about products, services, sites, etc. [VisualQnA](https://opea-project.github.io/latest/GenAIExamples/VisualQnA/README.html) - Reduce language barriers through customizable text translation systems. 
[Translation](https://opea-project.github.io/latest/GenAIExamples/Translation/README.html) diff --git a/guide/installation/install_docker.sh b/guide/installation/install_docker.sh index bfbe12e7..3a844277 100644 --- a/guide/installation/install_docker.sh +++ b/guide/installation/install_docker.sh @@ -28,8 +28,10 @@ sudo apt-get -y update # Install Docker packages sudo apt-get -y install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin -# add existing user +# add existing user to docker group and log in +sudo groupadd docker sudo usermod -aG docker $USER +newgrp docker # Optional: Verify that Docker is installed correctly sudo docker --version diff --git a/tutorial/OpenTelemetry/OpenTelemetry_OPEA_Guide.rst b/tutorial/OpenTelemetry/OpenTelemetry_OPEA_Guide.rst new file mode 100644 index 00000000..bf09ba12 --- /dev/null +++ b/tutorial/OpenTelemetry/OpenTelemetry_OPEA_Guide.rst @@ -0,0 +1,173 @@ +.. _OpenTelemetry_OPEA_Guide: + +OpenTelemetry on OPEA Guide +############################# + +Overview +******** +OpenTelemetry (also referred to as OTel) is an open source observability framework made up of a collection of tools, APIs, and SDKs. +OTel enables developers to instrument, generate, collect, and export telemetry data for analysis and to understand software performance and behavior. +The telemetry data can come in the form of traces, metrics, and logs. +OPEA integrates OpenTelemetry's metrics and tracing capabilities to enhance its telemetry support, providing users with valuable insights into system performance. + +How It Works +************ +OPEA Comps offers telemetry functionalities for metrics and tracing by integrating with tools such as Prometheus, Grafana, and Jaeger. Below is a brief introduction to the workflows of those tools: + +.. image:: assets/opea_telemetry.jpg + :width: 800 + :alt: Alternative text + +The majority of OPEA's micro and mega services are equipped to support OpenTelemetry metrics, which are exported in Prometheus format via the /metrics endpoint. +For further guidance, please refer to the section on `Telemetry Metrics `_. +Prometheus plays a crucial role in collecting metrics from OPEA service endpoints, while Grafana leverages Prometheus as a data source to visualize these metrics on pre-configured dashboards. + +OPEA also supports OpenTelemetry tracing, with several OPEA GenAIExamples instrumented to trace key functions such as microservice execution and LLM generations. +Additionally, HuggingFace's Text Embedding Inference and Text Generation Inference services are enabled for select OPEA GenAIExamples. +The Jaeger UI monitors trace events from OPEA microservices, TEI, and TGI. Once Jaeger endpoints are configured in OPEA microservices, TEI, and TGI, +trace data will automatically be reported and visualized in the Jaeger UI. + +Deployment +********** + +In the OpenTelemetry-enabled GenAIExamples, OpenTelemetry Metrics is activated by default, while OpenTelemetry Tracing is initially disabled. +Similarly, the Telemetry UI services, including Grafana, Prometheus, and Jaeger, are also disabled by default. +To enable OTel tracing along with Grafana, Prometheus, and Jaeger you need to include an additional telemetry Docker Compose YAML file. +For instance, adding compose.telemetry.yaml alongside compose.yaml will activate all telemetry features for the example. + + +.. 
code-block:: bash + + source ./set_env.sh + docker compose -f compose.yaml -f compose.telemetry.yaml up -d + + +Below are the GenAIExamples that include support for Grafana, Prometheus, and Jaeger services. + +.. toctree:: + :maxdepth: 1 + + ChatQnA + AgentQnA + +How to Monitor +**************** + +OpenTelemetry metrics and tracing currently can be visualized through one of three primary monitoring UI web pages. + +1. Prometheus ++++++++++++++++ + +The Prometheus UI provides insights into which services have active metrics endpoints. +By default, Prometheus operates on port 9090. +You can access the Prometheus UI web page using the following URL. + +.. code-block:: bash + + http://${host_ip}:9090/targets + +Services with accessible metrics endpoints will be marked as "up" in Prometheus. +If a service is marked as "down," Grafana Dashboards will be unable to display the associated metrics information. + +.. image:: assets/prometheus.png + :width: 800 + :alt: Alternative text + +2. Grafana ++++++++++++++++ + +The Grafana UI displays telemetry metrics through pre-defined dashboards, providing a clear visualization of data. +For OPEA examples, Grafana is configured by default to use Prometheus as its data source, eliminating the need for manual setup. +The Grafana UI web page can be accessed using the following URL. + +.. code-block:: bash + + http://${host_ip}:3000 + + +.. image:: assets/grafana_init.png + :width: 800 + :alt: Alternative text + + +To view the pre-defined dashboards, click on the "Dashboard" tab located on the left-hand side of the Grafana UI. +This will allow you to explore the available dashboards to visualize telemetry metrics. They also serve as examples that you can customize. + + +.. image:: assets/grafana_dashboard_init.png + :width: 800 + :alt: Alternative text + +Detailed explanations for understanding each dashboard are provided within the telemetry sections of the respective GenAIExamples. +These sections offer insights into how to interpret the data and utilize the dashboards effectively for monitoring and analysis. + +.. toctree:: + :maxdepth: 1 + + ChatQnA + AgentQnA + +3. Jaeger ++++++++++++++++ + +The Jaeger UI is instrumental in understanding function tracing for each request, providing visibility into the execution flow and timing of microservices. +OPEA traces the execution time for each microservice and monitors key functions within them. +By default, Jaeger operates on port 16686. +The Jaeger UI web page could be accessed using the following URL. + +.. code-block:: bash + + http://${host_ip}:16686 + +Traces will only appear in the Jaeger UI if the relevant functions have been executed. +Therefore, without running the example, the UI will not display any trace data. + +.. image:: assets/jaeger_ui_init.png + :width: 400 + :alt: Alternative text + +Once the example is run, refresh the Jaeger UI webpage, and the OPEA service should appear under the "Services" tab, +indicating that trace data is being captured and displayed. + +.. image:: assets/jaeger_ui_opea.png + :width: 400 + :alt: Alternative text + +Select "opea" as the service, then click the "Find Traces" button to view the trace data associated with the service's execution. + +.. image:: assets/jaeger_ui_opea_trace.png + :width: 400 + :alt: Alternative text + +All traces will be displayed on the UI. +The diagram in the upper right corner provides a visual representation of all requests along the timeline. 
Meanwhile, +the diagrams in the lower right corner illustrate all spans within each request, offering detailed insights into the execution flow and timing. + +.. image:: assets/jaeger_ui_opea_chatqna_1req.png + :width: 800 + :alt: Alternative text + +Detailed explanations for understanding each Jaeger diagrams are provided within the telemetry sections of the respective GenAIExamples. +These sections offer insights into how to interpret the data and utilize the dashboards effectively for monitoring and analysis. + +.. toctree:: + :maxdepth: 1 + + ChatQnA + AgentQnA + +Code Instrumentations for OPEA Tracing +**************************************** + +Enabling OPEA OpenTelemetry tracing for a function is straightforward. +First, import opea_telemetry, and then apply the Python decorator @opea_telemetry to the function you wish to trace. +Below is an example of how to trace your_func using OPEA tracing: + +.. code-block:: python + + from comps import opea_telemetry + + @opea_telemetry + async def your_func(): + pass + diff --git a/tutorial/OpenTelemetry/assets/Grafana_Node_Exporter.png b/tutorial/OpenTelemetry/assets/Grafana_Node_Exporter.png new file mode 100644 index 00000000..f7c055ef Binary files /dev/null and b/tutorial/OpenTelemetry/assets/Grafana_Node_Exporter.png differ diff --git a/tutorial/OpenTelemetry/assets/Grafana_chatqna_backend_server.png b/tutorial/OpenTelemetry/assets/Grafana_chatqna_backend_server.png new file mode 100644 index 00000000..4ffd9991 Binary files /dev/null and b/tutorial/OpenTelemetry/assets/Grafana_chatqna_backend_server.png differ diff --git a/tutorial/OpenTelemetry/assets/Grafana_chatqna_backend_server_1.png b/tutorial/OpenTelemetry/assets/Grafana_chatqna_backend_server_1.png new file mode 100644 index 00000000..e71a4efc Binary files /dev/null and b/tutorial/OpenTelemetry/assets/Grafana_chatqna_backend_server_1.png differ diff --git a/tutorial/OpenTelemetry/assets/Grafana_chatqna_dataprep.png b/tutorial/OpenTelemetry/assets/Grafana_chatqna_dataprep.png new file mode 100644 index 00000000..8c11ef2d Binary files /dev/null and b/tutorial/OpenTelemetry/assets/Grafana_chatqna_dataprep.png differ diff --git a/tutorial/OpenTelemetry/assets/Grafana_chatqna_retriever.png b/tutorial/OpenTelemetry/assets/Grafana_chatqna_retriever.png new file mode 100644 index 00000000..391a5bab Binary files /dev/null and b/tutorial/OpenTelemetry/assets/Grafana_chatqna_retriever.png differ diff --git a/tutorial/OpenTelemetry/assets/Grafana_vLLM.png b/tutorial/OpenTelemetry/assets/Grafana_vLLM.png new file mode 100644 index 00000000..770b7d65 Binary files /dev/null and b/tutorial/OpenTelemetry/assets/Grafana_vLLM.png differ diff --git a/tutorial/OpenTelemetry/assets/Grafana_vLLM_2.png b/tutorial/OpenTelemetry/assets/Grafana_vLLM_2.png new file mode 100644 index 00000000..6f7e8247 Binary files /dev/null and b/tutorial/OpenTelemetry/assets/Grafana_vLLM_2.png differ diff --git a/tutorial/OpenTelemetry/assets/Jaeger_agent_rag.png b/tutorial/OpenTelemetry/assets/Jaeger_agent_rag.png new file mode 100644 index 00000000..e1dc8473 Binary files /dev/null and b/tutorial/OpenTelemetry/assets/Jaeger_agent_rag.png differ diff --git a/tutorial/OpenTelemetry/assets/Jaeger_agent_sql.png b/tutorial/OpenTelemetry/assets/Jaeger_agent_sql.png new file mode 100644 index 00000000..9b3653a6 Binary files /dev/null and b/tutorial/OpenTelemetry/assets/Jaeger_agent_sql.png differ diff --git a/tutorial/OpenTelemetry/assets/agent_grafana_mega_list.png b/tutorial/OpenTelemetry/assets/agent_grafana_mega_list.png new 
file mode 100644 index 00000000..80484104 Binary files /dev/null and b/tutorial/OpenTelemetry/assets/agent_grafana_mega_list.png differ diff --git a/tutorial/OpenTelemetry/assets/agent_grafana_node.png b/tutorial/OpenTelemetry/assets/agent_grafana_node.png new file mode 100644 index 00000000..5d1924eb Binary files /dev/null and b/tutorial/OpenTelemetry/assets/agent_grafana_node.png differ diff --git a/tutorial/OpenTelemetry/assets/agent_grafana_react.png b/tutorial/OpenTelemetry/assets/agent_grafana_react.png new file mode 100644 index 00000000..1319f6a6 Binary files /dev/null and b/tutorial/OpenTelemetry/assets/agent_grafana_react.png differ diff --git a/tutorial/OpenTelemetry/assets/agent_grafana_sql.png b/tutorial/OpenTelemetry/assets/agent_grafana_sql.png new file mode 100644 index 00000000..ed6faa70 Binary files /dev/null and b/tutorial/OpenTelemetry/assets/agent_grafana_sql.png differ diff --git a/tutorial/OpenTelemetry/assets/agent_grafana_vllm.png b/tutorial/OpenTelemetry/assets/agent_grafana_vllm.png new file mode 100644 index 00000000..c912532e Binary files /dev/null and b/tutorial/OpenTelemetry/assets/agent_grafana_vllm.png differ diff --git a/tutorial/OpenTelemetry/assets/agent_grafana_vllm_2.png b/tutorial/OpenTelemetry/assets/agent_grafana_vllm_2.png new file mode 100644 index 00000000..9f0ba99e Binary files /dev/null and b/tutorial/OpenTelemetry/assets/agent_grafana_vllm_2.png differ diff --git a/tutorial/OpenTelemetry/assets/agent_jaeger_4traces.png b/tutorial/OpenTelemetry/assets/agent_jaeger_4traces.png new file mode 100644 index 00000000..1aab31d3 Binary files /dev/null and b/tutorial/OpenTelemetry/assets/agent_jaeger_4traces.png differ diff --git a/tutorial/OpenTelemetry/assets/agent_jaeger_4traces_web.png b/tutorial/OpenTelemetry/assets/agent_jaeger_4traces_web.png new file mode 100644 index 00000000..17f8b917 Binary files /dev/null and b/tutorial/OpenTelemetry/assets/agent_jaeger_4traces_web.png differ diff --git a/tutorial/OpenTelemetry/assets/agent_jaeger_init.png b/tutorial/OpenTelemetry/assets/agent_jaeger_init.png new file mode 100644 index 00000000..06d2b3d9 Binary files /dev/null and b/tutorial/OpenTelemetry/assets/agent_jaeger_init.png differ diff --git a/tutorial/OpenTelemetry/assets/agent_jaeger_react_2_spans.png b/tutorial/OpenTelemetry/assets/agent_jaeger_react_2_spans.png new file mode 100644 index 00000000..66626dfb Binary files /dev/null and b/tutorial/OpenTelemetry/assets/agent_jaeger_react_2_spans.png differ diff --git a/tutorial/OpenTelemetry/assets/agent_jaeger_react_init.png b/tutorial/OpenTelemetry/assets/agent_jaeger_react_init.png new file mode 100644 index 00000000..d7412d4b Binary files /dev/null and b/tutorial/OpenTelemetry/assets/agent_jaeger_react_init.png differ diff --git a/tutorial/OpenTelemetry/assets/agent_jaeger_react_spans.png b/tutorial/OpenTelemetry/assets/agent_jaeger_react_spans.png new file mode 100644 index 00000000..2dc0f768 Binary files /dev/null and b/tutorial/OpenTelemetry/assets/agent_jaeger_react_spans.png differ diff --git a/tutorial/OpenTelemetry/assets/agent_jaeger_react_spans_1_webq.png b/tutorial/OpenTelemetry/assets/agent_jaeger_react_spans_1_webq.png new file mode 100644 index 00000000..318cde2f Binary files /dev/null and b/tutorial/OpenTelemetry/assets/agent_jaeger_react_spans_1_webq.png differ diff --git a/tutorial/OpenTelemetry/assets/agent_jaeger_react_spans_2_webq.png b/tutorial/OpenTelemetry/assets/agent_jaeger_react_spans_2_webq.png new file mode 100644 index 00000000..b8d7eb9f Binary files /dev/null and 
b/tutorial/OpenTelemetry/assets/agent_jaeger_react_spans_2_webq.png differ diff --git a/tutorial/OpenTelemetry/assets/agent_jaeger_sql_2_spans.png b/tutorial/OpenTelemetry/assets/agent_jaeger_sql_2_spans.png new file mode 100644 index 00000000..f97c40c2 Binary files /dev/null and b/tutorial/OpenTelemetry/assets/agent_jaeger_sql_2_spans.png differ diff --git a/tutorial/OpenTelemetry/assets/agent_jaeger_sql_35_q2_spans.png b/tutorial/OpenTelemetry/assets/agent_jaeger_sql_35_q2_spans.png new file mode 100644 index 00000000..5e0890d6 Binary files /dev/null and b/tutorial/OpenTelemetry/assets/agent_jaeger_sql_35_q2_spans.png differ diff --git a/tutorial/OpenTelemetry/assets/agent_jaeger_sql_spans.png b/tutorial/OpenTelemetry/assets/agent_jaeger_sql_spans.png new file mode 100644 index 00000000..f71e0ede Binary files /dev/null and b/tutorial/OpenTelemetry/assets/agent_jaeger_sql_spans.png differ diff --git a/tutorial/OpenTelemetry/assets/agent_questions.png b/tutorial/OpenTelemetry/assets/agent_questions.png new file mode 100644 index 00000000..4d39403d Binary files /dev/null and b/tutorial/OpenTelemetry/assets/agent_questions.png differ diff --git a/tutorial/OpenTelemetry/assets/agent_questions_web.png b/tutorial/OpenTelemetry/assets/agent_questions_web.png new file mode 100644 index 00000000..17be3af6 Binary files /dev/null and b/tutorial/OpenTelemetry/assets/agent_questions_web.png differ diff --git a/tutorial/OpenTelemetry/assets/chatqna_16reqs.png b/tutorial/OpenTelemetry/assets/chatqna_16reqs.png new file mode 100644 index 00000000..e83b1d18 Binary files /dev/null and b/tutorial/OpenTelemetry/assets/chatqna_16reqs.png differ diff --git a/tutorial/OpenTelemetry/assets/grafana_dashboard_init.png b/tutorial/OpenTelemetry/assets/grafana_dashboard_init.png new file mode 100644 index 00000000..944794bb Binary files /dev/null and b/tutorial/OpenTelemetry/assets/grafana_dashboard_init.png differ diff --git a/tutorial/OpenTelemetry/assets/grafana_init.png b/tutorial/OpenTelemetry/assets/grafana_init.png new file mode 100644 index 00000000..7ecba846 Binary files /dev/null and b/tutorial/OpenTelemetry/assets/grafana_init.png differ diff --git a/tutorial/OpenTelemetry/assets/jaeger_agent_init.png b/tutorial/OpenTelemetry/assets/jaeger_agent_init.png new file mode 100644 index 00000000..14085ef6 Binary files /dev/null and b/tutorial/OpenTelemetry/assets/jaeger_agent_init.png differ diff --git a/tutorial/OpenTelemetry/assets/jaeger_ui_init.png b/tutorial/OpenTelemetry/assets/jaeger_ui_init.png new file mode 100644 index 00000000..8d1580bf Binary files /dev/null and b/tutorial/OpenTelemetry/assets/jaeger_ui_init.png differ diff --git a/tutorial/OpenTelemetry/assets/jaeger_ui_opea.png b/tutorial/OpenTelemetry/assets/jaeger_ui_opea.png new file mode 100644 index 00000000..956cf4a0 Binary files /dev/null and b/tutorial/OpenTelemetry/assets/jaeger_ui_opea.png differ diff --git a/tutorial/OpenTelemetry/assets/jaeger_ui_opea_chatqna_1req.png b/tutorial/OpenTelemetry/assets/jaeger_ui_opea_chatqna_1req.png new file mode 100644 index 00000000..0388e8e8 Binary files /dev/null and b/tutorial/OpenTelemetry/assets/jaeger_ui_opea_chatqna_1req.png differ diff --git a/tutorial/OpenTelemetry/assets/jaeger_ui_opea_chatqna_cpu_breakdown.png b/tutorial/OpenTelemetry/assets/jaeger_ui_opea_chatqna_cpu_breakdown.png new file mode 100644 index 00000000..2461fc19 Binary files /dev/null and b/tutorial/OpenTelemetry/assets/jaeger_ui_opea_chatqna_cpu_breakdown.png differ diff --git 
a/tutorial/OpenTelemetry/assets/jaeger_ui_opea_chatqna_req_breakdown.png b/tutorial/OpenTelemetry/assets/jaeger_ui_opea_chatqna_req_breakdown.png new file mode 100644 index 00000000..14d6a4fe Binary files /dev/null and b/tutorial/OpenTelemetry/assets/jaeger_ui_opea_chatqna_req_breakdown.png differ diff --git a/tutorial/OpenTelemetry/assets/jaeger_ui_opea_chatqna_req_breakdown_2.png b/tutorial/OpenTelemetry/assets/jaeger_ui_opea_chatqna_req_breakdown_2.png new file mode 100644 index 00000000..3bfad307 Binary files /dev/null and b/tutorial/OpenTelemetry/assets/jaeger_ui_opea_chatqna_req_breakdown_2.png differ diff --git a/tutorial/OpenTelemetry/assets/jaeger_ui_opea_chatqna_req_cpu.png b/tutorial/OpenTelemetry/assets/jaeger_ui_opea_chatqna_req_cpu.png new file mode 100644 index 00000000..17cfc6dc Binary files /dev/null and b/tutorial/OpenTelemetry/assets/jaeger_ui_opea_chatqna_req_cpu.png differ diff --git a/tutorial/OpenTelemetry/assets/jaeger_ui_opea_chatqna_req_gaudi.png b/tutorial/OpenTelemetry/assets/jaeger_ui_opea_chatqna_req_gaudi.png new file mode 100644 index 00000000..8286a2f7 Binary files /dev/null and b/tutorial/OpenTelemetry/assets/jaeger_ui_opea_chatqna_req_gaudi.png differ diff --git a/tutorial/OpenTelemetry/assets/jaeger_ui_opea_trace.png b/tutorial/OpenTelemetry/assets/jaeger_ui_opea_trace.png new file mode 100644 index 00000000..08d091df Binary files /dev/null and b/tutorial/OpenTelemetry/assets/jaeger_ui_opea_trace.png differ diff --git a/tutorial/OpenTelemetry/assets/opea_telemetry.jpg b/tutorial/OpenTelemetry/assets/opea_telemetry.jpg new file mode 100644 index 00000000..f4721266 Binary files /dev/null and b/tutorial/OpenTelemetry/assets/opea_telemetry.jpg differ diff --git a/tutorial/OpenTelemetry/assets/prometheus.png b/tutorial/OpenTelemetry/assets/prometheus.png new file mode 100644 index 00000000..d29cc0b6 Binary files /dev/null and b/tutorial/OpenTelemetry/assets/prometheus.png differ diff --git a/tutorial/OpenTelemetry/deploy/AgentQnA.md b/tutorial/OpenTelemetry/deploy/AgentQnA.md new file mode 100644 index 00000000..07ddfb08 --- /dev/null +++ b/tutorial/OpenTelemetry/deploy/AgentQnA.md @@ -0,0 +1,107 @@ +# OpenTelemetry on AgentQnA Application + +Each microservice in AgentQnA is instrumented with opea_telemetry, enabling Jaeger to provide a detailed time breakdown across microservices for each request. +Additionally, AgentQnA features a pre-defined Grafana dashboard for its agent services, such as the React Agent service, alongside a vLLM Grafana dashboard. +A dashboard for monitoring CPU statistics is also available, offering comprehensive insights into system performance and resource utilization. + +# Table of contents + +1. [Telemetry Tracing with Jaeger on Gaudi](#telemetry-tracing-with-jaeger-on-gaudi) +2. [Telemetry Metrics with Grafana on Gaudi](#telemetry-metrics-with-grafana-on-gaudi) + + +## Telemetry Tracing with Jaeger on Gaudi + +Initially, all agents in the example are set up. +In the scenario below, the React Agent, SQL Agent, and RAG Agent are utilized within the AgentQnA example. +![jaeger_init](../assets/agent_jaeger_init.png) + +By expanding the React Agent, the ReactAgentNodeLlama is identified as the core function implementing the ReactAgent. +![jaeger_react_init](../assets/agent_jaeger_react_init.png) + +Follow the steps in [AgentQnA validate services section](https://github.com/opea-project/GenAIExamples/tree/main/AgentQnA#validate-services) to test the AgentQnA application with some pre-defined questions (a hypothetical request sketch follows below).
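+You can also send a single question to the supervisor agent yourself so that its trace shows up in Jaeger. The snippet below is a minimal, hypothetical sketch: the host, port, endpoint path, and payload shape are assumptions about a typical AgentQnA deployment, so adjust them to match the validate-services instructions linked above.
+
+```python
+# Hypothetical sketch -- the endpoint, port, and payload format are assumptions;
+# adjust them to match your AgentQnA deployment before running.
+import requests
+
+url = "http://localhost:9090/v1/chat/completions"  # assumed supervisor (React Agent) endpoint
+payload = {"messages": "Which artist has released the most albums?"}  # placeholder question
+
+resp = requests.post(url, json=payload, timeout=300)  # agent runs can take a while
+resp.raise_for_status()
+print(resp.json())  # the agent's answer; the matching spans should now appear in the Jaeger UI
+```
+
+Each such request produces a new trace for every agent involved in answering it, which is what the Jaeger screenshots below illustrate.
+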
+![jaeger_q](../assets/agent_questions.png) + +Once the agents respond to the two questions, four traces will be displayed along the timeline. +Initially, the ReActAgentNodeLlama from the React Agent is invoked for each question, followed by a call to the AgentNodeLlama from the SQL Agent. +![jaeger_4traces](../assets/agent_jaeger_4traces.png) + +For the first question, the ReActAgentNodeLlama is invoked initially. +Expanding the function shows that it utilizes a language model (LLM) for reasoning to decide which actions to take. +Subsequently, it calls the 'search_sql_database' tool. After obtaining results from the 'search_sql_database', +the function employs the LLM again to reason whether additional actions are necessary or if it can conclude the process. +![jaeger_react](../assets/agent_jaeger_react_spans.png) + +In the AgentNodeLlama trace, the 'search_sql_database' tool retrieves data from the SQL database. +Within each AgentNodeLlama __call__ function, the language model (LLM) is then employed to reason about the next steps, +determining how to proceed based on the data obtained. +![jaeger_sql](../assets/agent_jaeger_sql_spans.png) + +For the second question, the ReActAgentNodeLlama is invoked first, following a similar process as in the first question. +The 'search_sql_database' tool is called to retrieve relevant data, and the language model (LLM) is used to reason through the subsequent steps. +![jaeger_react2](../assets/agent_jaeger_react_2_spans.png) + +Fewer reasoning steps are required to answer the second question compared to the first. +By tracing these functions, it becomes easier to understand the number of reasoning steps involved across the different questions. +![jaeger_sql](../assets/agent_jaeger_sql_2_spans.png) + +The OPEA Agent components allow for the integration of new tools into the React Agent when existing tools fail to provide answers. +We demonstrate how the React Agent utilizes different tools to obtain optimal answers by modifying a pre-defined question. +The modification leaves the React Agent unable to find answers in either the SQL or the RAG database. +Consequently, the React Agent must employ the newly added web search tool to address the question regarding the most streamed albums on Spotify in 2024. + +![jaeger_q_w](../assets/agent_questions_web.png) + +After the agents respond to the two questions, four traces are displayed along the timeline. +For the first question, the ReActAgentNodeLlama from the React Agent is invoked as an 'opea: llm_generate' trace, while the SQL Agent is not called. +In contrast, for the second question, the ReActAgentNodeLlama from the React Agent is called first, followed by two calls to the AgentNodeLlama from the SQL Agent. +The SQL Agent's traces contain more spans because it continues reasoning extensively, as no answer can be found in the SQL database. +![jaeger_4traces_w](../assets/agent_jaeger_4traces_web.png) + +For the first question, the llm_generate function from the React Agent is called initially. +Expanding the function shows that it utilizes a language model (LLM) for reasoning to determine the appropriate actions, opting to use the 'search_web_base' tool instead of 'search_sql_database'. +Since the answer is available only on the web and not in the SQL database, the React Agent retrieves results from 'search_web_base'. +It then employs the LLM to reason whether additional actions are necessary or if it can conclude the process.
+If the React Agent were to use other tools instead of 'search_web_base', additional reasoning steps would be required. +![jaeger_react_1_w](../assets/agent_jaeger_react_spans_1_webq.png) + +For the second question, the React Agent initially utilized the 'search_sql_database' tool instead of 'search_web_base'. +The SQL Agent spent approximately two minutes on reasoning, but it was unable to find an answer. +After the 'search_sql_database' tool failed to provide an answer, the React Agent switched to the 'search_web_base' tool, quickly locating the answer. +![jaeger_react_2_w](../assets/agent_jaeger_react_spans_2_webq.png) + +By examining the AgentNodeLlama trace from the SQL Agent, it is evident that numerous reasoning steps occurred due to the inability to find a suitable answer in the SQL database. +![jaeger_sql_w_2](../assets/agent_jaeger_sql_35_q2_spans.png) + +## Telemetry Metrics with Grafana on Gaudi + +The AgentQnA application offers several useful dashboards that provide valuable insights into its performance and operations. +These dashboards are designed to help monitor various aspects of the application, such as service execution times, resource utilization, and system health, +enabling users to effectively manage and optimize the application. + +### AgentQnA MicroServices Dashboard + +This dashboard provides metrics for services within the AgentQnA microservices. +Clicking the job_name field shows the supported service names, such as supervisor-react-agent, worker-rag-agent, and worker-sql-agent. +Select one of the supported services from the list. +![grafana_mega_list](../assets/agent_grafana_mega_list.png) + +The supervisor-react-agent service is highlighted with its average response time displayed across multiple runs of the React Agent. +Additionally, the dashboard presents CPU and memory usage statistics for the React Agent, +offering a comprehensive view of its performance and resource consumption. +![grafana_agent_react](../assets/agent_grafana_react.png) + +Similarly, the average response time for the worker-sql-agent will be displayed on its dashboard. +![grafana_agent_sql](../assets/agent_grafana_sql.png) + +### LLM Dashboard + +This dashboard presents metrics for the LLM service, including key performance indicators such as request latency, time per output token latency, +and time to first token latency, among others. +These metrics offer valuable insights into the efficiency and responsiveness of the LLM service, +helping to identify areas for optimization and ensuring smooth operation. + +![grafana_agent_vllm](../assets/agent_grafana_vllm.png) + +The dashboard also displays metrics for request prompt length and output length. +![grafana_agent_vllm_2](../assets/agent_grafana_vllm_2.png) diff --git a/tutorial/OpenTelemetry/deploy/ChatQnA.md b/tutorial/OpenTelemetry/deploy/ChatQnA.md new file mode 100644 index 00000000..35f0f7f8 --- /dev/null +++ b/tutorial/OpenTelemetry/deploy/ChatQnA.md @@ -0,0 +1,93 @@ +# OpenTelemetry on ChatQnA Application + +Each microservice in ChatQnA is instrumented with opea_telemetry, enabling Jaeger to provide a detailed time breakdown across microservices for each request. +Additionally, ChatQnA features a pre-defined Grafana dashboard for its megaservice, alongside a vLLM Grafana dashboard. +A dashboard for monitoring CPU statistics is also available, offering comprehensive insights into system performance and resource utilization. + +# Table of contents + +1.
[Telemetry Tracing with Jaeger on Gaudi](#telemetry-tracing-with-jaeger-on-gaudi) +2. [Telemetry Metrics with Grafana on Gaudi](#telemetry-metrics-with-grafana-on-gaudi) + + +## Telemetry Tracing with Jaeger on Gaudi + +After ChatQnA processes a question, two traces should appear along the timeline. +The trace for opea: ServiceOrchestrator.schedule runs on the CPU and includes seven spans, one of which represents the LLM service running on CPU. +For LLM functions executed on Gaudi, stream requests are displayed under opea: llm_generate_stream. +This trace contains two spans: one for the first token and another for all subsequent tokens. + +![chatqna_1req](../assets/jaeger_ui_opea_chatqna_1req.png) + +The first trace along the timeline is opea: ServiceOrchestrator.schedule, which runs on the CPU. +It provides insights into the orchestration and scheduling of services within the ChatQnA megaservice, highlighting the execution flow during the process. + +![chatqna_cpu_req](../assets/jaeger_ui_opea_chatqna_req_cpu.png) + +Clicking on the opea: ServiceOrchestrator.schedule trace will expand to reveal seven spans along the timeline. +The first span represents the main schedule function, which has minimal self-execution time, indicated in black. +The second span corresponds to the embedding microservice execution time, taking 33.72 ms as shown in the diagram. +Following the embedding is the retriever span, which took only 3.13 ms. +The last span captures the LLM functions on the CPU, with an execution time of 41.99 ms. +These spans provide a detailed breakdown of the execution flow and timing for each component within the service orchestration. + +![chatqna_cpu_breakdown](../assets/jaeger_ui_opea_chatqna_cpu_breakdown.png) + +The second trace following the schedule trace is opea: llm_generate_stream, which operates on Gaudi, as depicted in the diagram. +This trace provides insights into the execution of LLM functions on Gaudi, +highlighting the processing of stream requests and the associated spans for token generation. + +![chatqna_gaudi_req](../assets/jaeger_ui_opea_chatqna_req_gaudi.png) + +Clicking on the opea: llm_generate_stream trace will expand to reveal two spans along the timeline. +The first span represents the execution time for the first token, which took 15.12 ms in this run. +The second span captures the execution time for all subsequent tokens, taking 920 ms as shown in the diagram. + +![chatqna_gaudi_breakdown](../assets/jaeger_ui_opea_chatqna_req_breakdown_2.png) + +Overall, the traces on the CPU consist of seven spans and are represented as larger circles. +In contrast, the traces on Gaudi have two spans and are depicted as smaller circles. +The diagrams below illustrate a run with 16 user requests, resulting in a total of 32 traces. +In this scenario, the larger circles, representing CPU traces, took less time than the smaller circles, +indicating that the requests required more processing time on Gaudi compared to the CPU. + +![chatqna_gaudi_16reqs](../assets/chatqna_16reqs.png) + +## Telemetry Metrics with Grafana on Gaudi + +The ChatQnA application offers several useful dashboards that provide valuable insights into its performance and operations. +These dashboards are designed to help monitor various aspects of the application, such as service execution times, resource utilization, and system health, +enabling users to effectively manage and optimize the application.
+ +### ChatQnA MegaService Dashboard + +This dashboard provides metrics for services within the ChatQnA megaservice. +The chatqna-backend-server service, which functions as the megaservice, +is highlighted with its average response time displayed across multiple runs. +Additionally, the dashboard presents CPU and memory usage statistics for the megaservice, +offering a comprehensive view of its performance and resource consumption. + +![chatqna_1req](../assets/Grafana_chatqna_backend_server_1.png) + +The dashboard can also display metrics for the dataprep-redis-service and the retriever service. +These metrics provide insights into the performance and resource utilization of these services, +allowing for a more comprehensive understanding of the ChatQnA application's overall operation. + +![chatqna_1req](../assets/Grafana_chatqna_dataprep.png) + +![chatqna_1req](../assets/Grafana_chatqna_retriever.png) + +### LLM Dashboard + +This dashboard presents metrics for the LLM service, including key performance indicators such as request latency, time per output token latency, +and time to first token latency, among others. +These metrics offer valuable insights into the efficiency and responsiveness of the LLM service, +helping to identify areas for optimization and ensuring smooth operation. + +![chatqna_1req](../assets/Grafana_vLLM.png) + +The dashboard also displays metrics for request prompt length and output length. + +![chatqna_1req](../assets/Grafana_vLLM_2.png) diff --git a/tutorial/index.rst b/tutorial/index.rst index 203ede26..39d5fd8d 100644 --- a/tutorial/index.rst +++ b/tutorial/index.rst @@ -17,6 +17,13 @@ Provide following tutorials to cover common user cases: DocIndexRetriever/DocIndexRetriever_Guide VideoQnA/VideoQnA_Guide +Provide following tutorials to cover more advanced features like OPEA OpenTelemetry: + +.. toctree:: + :maxdepth: 1 + + OpenTelemetry/OpenTelemetry_OPEA_Guide + -----