
Commit e9e1e40

shubhadeepd, nv-nikkulkarni, and nv-pranjald authored
Upstream changes for v2.1.0 release (#22)
* Upstream changes for v2.1.0 release

---------

Signed-off-by: Shubhadeep Das <[email protected]>
Co-authored-by: Nikhil Kulkarni <[email protected]>
Co-authored-by: nv-pranjald <[email protected]>
1 parent c51ff5b · commit e9e1e40


67 files changed (+3446 additions, −1344 deletions)

CHANGELOG.md

Lines changed: 30 additions & 0 deletions
@@ -3,6 +3,36 @@ All notable changes to this project will be documented in this file.
 The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
 
 
+## [2.1.0] - 2025-05-13
+
+This release reduces the overall GPU requirement for deploying the blueprint. It also improves performance and stability for both Docker- and Helm-based deployments.
+
+### Added
+- Added non-blocking async support to the upload documents API (a usage sketch follows this diff).
+  - Added a new field `blocking: bool` to control this behaviour from the client side. The default is `true`.
+- Added a new API `/status` to monitor the state or completion status of uploaded documents.
+- The Helm chart is published on the NGC Public registry.
+- A Helm chart customization guide is now available for all optional features under [documentation](./README.md#available-customizations).
+- Fixed issues with very large file uploads.
+- Security enhancements and stability improvements.
+
+### Changed
+- Overall GPU requirement reduced to 2xH100/3xA100.
+- Changed the default LLM model to [llama-3_3-nemotron-super-49b-v1](https://build.nvidia.com/nvidia/llama-3_3-nemotron-super-49b-v1). This reduces the GPUs needed to deploy the LLM model to 1xH100/2xA100.
+- Changed the default GPU count for all other NIMs (ingestion and reranker NIMs) to 1xH100/1xA100.
+- Changed the default chunk size to 512 to reduce the LLM context size and, in turn, the RAG server response latency.
+- Exposed a config option to split PDFs post-chunking, controlled using the `APP_NVINGEST_ENABLEPDFSPLITTER` environment variable in ingestor-server. The default value is `True`.
+- Added batch-based ingestion, which can help manage the memory usage of `ingestor-server` more effectively. Controlled using the `ENABLE_NV_INGEST_BATCH_MODE` and `NV_INGEST_FILES_PER_BATCH` variables; the defaults are `True` and `100`, respectively.
+- Removed `extract_options` from the API level of `ingestor-server`.
+- Resolved an issue during bulk ingestion where the whole ingestion job failed if ingestion of a single file failed.
+
+### Known Issues
+- The `rag-playground` container needs to be rebuilt if the `APP_LLM_MODELNAME`, `APP_EMBEDDINGS_MODELNAME`, or `APP_RANKING_MODELNAME` environment variable values are changed (a rebuild sketch follows this diff).
+- When uploading multiple files at the same time, a timeout error may occur: `Error uploading documents: [Error: aborted] { code: 'ECONNRESET' }`. Developers are encouraged to use the APIs directly for bulk uploads instead of the sample rag-playground. The default upload timeout on the UI side is 1 hour.
+- If a file upload fails, error messages may not be shown in the rag-playground user interface. Developers are encouraged to check the `ingestor-server` logs for details.
+
+A detailed guide is available [here](./docs/migration_guide.md) to ease the developer experience when migrating from older versions.
+
 ## [2.0.0] - 2025-03-18
 
 This release adds support for multimodal documents using [Nvidia Ingest](https://github.com/NVIDIA/nv-ingest), including parsing of PDFs, Word, and PowerPoint documents. It also significantly improves accuracy and performance by refactoring the APIs and architecture, and adds a new developer-friendly UI.
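The new non-blocking upload flow added in 2.1.0 can be exercised roughly as follows. This is a minimal sketch: the changelog only names the `blocking` field and the `/status` endpoint, so the base URL, upload route, form field names, and the `task_id` query parameter are assumptions.

```bash
# Minimal sketch of the 2.1.0 async upload flow. Only `blocking` and
# `/status` are named in the changelog; the base URL, upload route, form
# fields, and `task_id` parameter are assumptions.
BASE_URL="http://localhost:8082/v1"   # hypothetical ingestor-server address

# Kick off a non-blocking upload; the request returns before ingestion finishes.
curl -s -X POST "$BASE_URL/documents" \
  -F "documents=@./example.pdf" \
  -F 'data={"blocking": false}'

# Poll the new /status endpoint until the uploaded document is processed.
curl -s "$BASE_URL/status?task_id=<task-id>"
```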
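And for the first known issue above, a rebuild step along these lines may be needed after changing model names; the compose file path is an assumption based on this repository's `deploy/compose` layout, not something shown in this diff.

```bash
# Sketch: rebuild rag-playground after changing model-name variables.
# The compose file path is an assumption.
export APP_LLM_MODELNAME="nvidia/llama-3.3-nemotron-super-49b-v1"
docker compose -f deploy/compose/docker-compose.yaml build rag-playground
docker compose -f deploy/compose/docker-compose.yaml up -d rag-playground
```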

CODE_OF_CONDUCT.md

Lines changed: 0 additions & 84 deletions
This file was deleted.

README.md

Lines changed: 27 additions & 18 deletions
@@ -13,11 +13,12 @@ Use the following documentation to learn about the NVIDIA RAG Blueprint.
 - [Deployment Options](#deployment-options)
 - [Driver versions](#driver-versions)
 - [Hardware Requirements](#hardware-requirements)
-  - [Minimum hardware requirements for self hosting all NVIDIA NIM microservices](#minimum-hardware-requirements-for-self-hosting-all-nvidia-nim-microservices)
+  - [Hardware requirements for self hosting all NVIDIA NIM microservices](#hardware-requirements-for-self-hosting-all-nvidia-nim-microservices)
 - [Next Steps](#next-steps)
 - [Available Customizations](#available-customizations)
 - [Inviting the community to contribute](#inviting-the-community-to-contribute)
 - [License](#license)
+- [Terms of Use](#terms-of-use)
 
 
 ## Overview
@@ -57,7 +58,7 @@ The following are the default components included in this blueprint:
 
 * NVIDIA NIM Microservices
   * Response Generation (Inference)
-    * [NIM of meta/llama-3.1-70b-instruct](https://build.nvidia.com/meta/llama-3_1-70b-instruct)
+    * [NIM of nvidia/llama-3.3-nemotron-super-49b-v1](https://build.nvidia.com/nvidia/llama-3_3-nemotron-super-49b-v1)
   * Retriever Models
     * [NIM of nvidia/llama-3_2-nv-embedqa-1b-v2](https://build.nvidia.com/nvidia/llama-3_2-nv-embedqa-1b-v2)
     * [NIM of nvidia/llama-3_2-nv-rerankqa-1b-v2](https://build.nvidia.com/nvidia/llama-3_2-nv-rerankqa-1b-v2)
@@ -78,14 +79,19 @@ The following are the default components included in this blueprint:
 * Milvus Vector Database - accelerated with NVIDIA cuVS
 * Ingestion - [Nvidia-Ingest](https://github.com/NVIDIA/nv-ingest/tree/main) is leveraged for ingestion of files. NVIDIA-Ingest is a scalable, performance-oriented document content and metadata extraction microservice. It supports parsing PDFs, Word, and PowerPoint documents, and uses specialized NVIDIA NIM microservices to find, contextualize, and extract text, tables, charts, and images for use in downstream generative applications.
 * File Types: File types supported by Nvidia-Ingest are supported by this blueprint. This includes `.pdf`, `.pptx`, and `.docx` files containing images. Image captioning support is turned off by default to improve latency, so questions about images in documents will yield poor accuracy. Files with the following extensions are supported:
-  - pdf
-  - docx
-  - pptx
-  - jpeg
-  - png
-  - svg
-  - tiff
-  - txt
+
+  - `bmp`
+  - `docx`
+  - `html` (treated as text)
+  - `jpeg`
+  - `json` (treated as text)
+  - `md` (treated as text)
+  - `pdf`
+  - `png`
+  - `pptx`
+  - `sh` (treated as text)
+  - `tiff`
+  - `txt`
 
 We provide Docker Compose scripts that deploy the microservices on a single node.
 When you are ready for a large-scale deployment,
@@ -146,8 +152,8 @@ Ubuntu 22.04 OS
 
 ### Hardware Requirements
 By default, this blueprint deploys the referenced NIM microservices locally. For this, you will require a minimum of:
-- 4xH100
-- 6xA100
+- 2xH100
+- 3xA100
 The blueprint can also be modified to use NIM microservices hosted by NVIDIA in the [NVIDIA API Catalog](https://build.nvidia.com/explore/discover).
 
 Following are the hardware requirements for each component.
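As a concrete illustration of the hosted option mentioned in that hunk, the following sketch switches the endpoint variables to NVIDIA-hosted NIMs. The endpoint variable names mirror the `deploy/compose/.env` file added later in this commit; the `NVIDIA_API_KEY` variable name is an assumption, not confirmed by this diff.

```bash
# Sketch: use NVIDIA-hosted NIMs from the API Catalog instead of local GPUs.
# Endpoint variable names come from deploy/compose/.env in this commit;
# the credential variable name is an assumption.
export NVIDIA_API_KEY="nvapi-..."    # hypothetical API key variable
export APP_LLM_SERVERURL=""          # empty => fall back to the API Catalog
export APP_EMBEDDINGS_SERVERURL=""
export APP_RANKING_SERVERURL=""
export EMBEDDING_NIM_ENDPOINT=https://integrate.api.nvidia.com/v1
```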
@@ -157,15 +163,14 @@ The overall hardware requirements depend on whether you
 [Deploy With Docker Compose](./docs/quickstart.md#deploy-with-docker-compose) or [Deploy With Helm Chart](./docs/quickstart.md#deploy-with-helm-chart).
 
 
-### Minimum hardware requirements for self hosting all NVIDIA NIM microservices
+### Hardware requirements for self hosting all NVIDIA NIM microservices
 
 **The NIM and hardware requirements only need to be met if you are self-hosting them with default settings of RAG.**
 See [Using self-hosted NVIDIA NIM microservices](./docs/quickstart.md#deploy-with-docker-compose).
 
 - **Pipeline operation**: 1x L40 GPU or similar recommended. It is needed for the Milvus vector store database, as GPU acceleration is enabled by default.
-- **LLM NIM**: [Meta Llama 3.1 70B Instruct Support Matrix](https://docs.nvidia.com/nim/large-language-models/latest/support-matrix.html#llama-3-1-70b-instruct)
+- **LLM NIM**: [Nvidia llama-3.3-nemotron-super-49b-v1](https://docs.nvidia.com/nim/large-language-models/latest/supported-models.html#id83)
   - For improved parallel performance, we recommend 8x or more H100s/A100s for LLM inference.
-  - The pipeline can share the GPU with the LLM NIM, but it is recommended to have a separate GPU for the LLM NIM for optimal performance.
 - **Embedding NIM**: [Llama-3.2-NV-EmbedQA-1B-v2 Support Matrix](https://docs.nvidia.com/nim/nemo-retriever/text-embedding/latest/support-matrix.html#llama-3-2-nv-embedqa-1b-v2)
   - The pipeline can share the GPU with the Embedding NIM, but it is recommended to have a separate GPU for the Embedding NIM for optimal performance.
 - **Reranking NIM**: [llama-3_2-nv-rerankqa-1b-v2 Support Matrix](https://docs.nvidia.com/nim/nemo-retriever/text-reranking/latest/support-matrix.html#llama-3-2-nv-rerankqa-1b-v2)
@@ -178,7 +183,7 @@ See [Using self-hosted NVIDIA NIM microservices](./docs/quickstart.md#deploy-wit
 ## Next Steps
 
 - Do the procedures in [Get Started](./docs/quickstart.md) to deploy this blueprint
-- See the [OpenAPI Specification](./docs/api_reference/openapi_schema.json)
+- See the [OpenAPI Specifications](./docs/api_reference)
 - Explore notebooks that demonstrate how to use the APIs [here](./notebooks/)
 - Explore [observability support](./docs/observability.md)
 - Explore [best practices for enhancing accuracy or latency](./docs/accuracy_perf.md)
@@ -211,6 +216,10 @@ To open a GitHub issue or pull request, see the [contributing guidelines](./CONT
 
 This NVIDIA AI Blueprint is licensed under the [Apache License, Version 2.0](./LICENSE). This project will download and install additional third-party open source software projects and containers. Review [the license terms of these open source projects](./LICENSE-3rd-party.txt) before use.
 
-The software and materials are governed by the NVIDIA Software License Agreement (found at https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-software-license-agreement/) and the Product-Specific Terms for NVIDIA AI Products (found at https://www.nvidia.com/en-us/agreements/enterprise-software/product-specific-terms-for-ai-products/), except that models are governed by the AI Foundation Models Community License Agreement (found at NVIDIA Agreements | Enterprise Software | NVIDIA Community Model License) and the NVIDIA dataset is governed by the NVIDIA Asset License Agreement found [here](./data/LICENSE.DATA).
+Use of the models in this blueprint is governed by the [NVIDIA AI Foundation Models Community License](https://docs.nvidia.com/ai-foundation-models-community-license.pdf).
+
+## Terms of Use
+This blueprint is governed by the [NVIDIA Agreements | Enterprise Software | NVIDIA Software License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-software-license-agreement/) and the [NVIDIA Agreements | Enterprise Software | Product Specific Terms for AI Product](https://www.nvidia.com/en-us/agreements/enterprise-software/product-specific-terms-for-ai-products/). The models are governed by the [NVIDIA Agreements | Enterprise Software | NVIDIA Community Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-community-models-license/), and the [NVIDIA RAG dataset](https://github.com/NVIDIA-AI-Blueprints/rag/tree/v2.0.0/data/multimodal) is governed by the [NVIDIA Asset License Agreement](https://github.com/NVIDIA-AI-Blueprints/rag/blob/main/data/LICENSE.DATA).
+
+The following models that are built with Llama are governed by the [Llama 3.2 Community License Agreement](https://www.llama.com/llama3_2/license/): llama-3.3-nemotron-super-49b-v1, nvidia/llama-3.2-nv-embedqa-1b-v2, and nvidia/llama-3.2-nv-rerankqa-1b-v2.
 
-For the meta/llama-3.1-70b-instruct model, the Llama 3.1 Community License Agreement applies; for the nvidia/llama-3.2-nv-embedqa-1b-v2 and nvidia/llama-3.2-nv-rerankqa-1b-v2 models, the Llama 3.2 Community License Agreement applies. Built with Llama.

deploy/compose/.env

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+# ==== Set User for local NIM deployment ====
+export USERID=$(id -u)
+
+# ==== Endpoints for using on-prem NIMs ====
+export APP_LLM_SERVERURL=nim-llm:8000
+export APP_EMBEDDINGS_SERVERURL=nemoretriever-embedding-ms:8000
+export EMBEDDING_NIM_ENDPOINT=http://nemoretriever-embedding-ms:8000/v1
+export APP_RANKING_SERVERURL=nemoretriever-ranking-ms:8000
+export PADDLE_GRPC_ENDPOINT=paddle:8001
+export PADDLE_INFER_PROTOCOL=grpc
+export YOLOX_GRPC_ENDPOINT=page-elements:8001
+export YOLOX_INFER_PROTOCOL=grpc
+export YOLOX_GRAPHIC_ELEMENTS_GRPC_ENDPOINT=graphic-elements:8001
+export YOLOX_GRAPHIC_ELEMENTS_INFER_PROTOCOL=grpc
+export YOLOX_TABLE_STRUCTURE_GRPC_ENDPOINT=table-structure:8001
+export YOLOX_TABLE_STRUCTURE_INFER_PROTOCOL=grpc
+
+# ==== Endpoints for using cloud NIMs ====
+# export APP_EMBEDDINGS_SERVERURL=""
+# export APP_LLM_SERVERURL=""
+# export APP_RANKING_SERVERURL=""
+# export EMBEDDING_NIM_ENDPOINT=https://integrate.api.nvidia.com/v1
+# export PADDLE_HTTP_ENDPOINT=https://ai.api.nvidia.com/v1/cv/baidu/paddleocr
+# export PADDLE_INFER_PROTOCOL=http
+# export YOLOX_HTTP_ENDPOINT=https://ai.api.nvidia.com/v1/cv/nvidia/nemoretriever-page-elements-v2
+# export YOLOX_INFER_PROTOCOL=http
+# export YOLOX_GRAPHIC_ELEMENTS_HTTP_ENDPOINT=https://ai.api.nvidia.com/v1/cv/nvidia/nemoretriever-graphic-elements-v1
+# export YOLOX_GRAPHIC_ELEMENTS_INFER_PROTOCOL=http
+# export YOLOX_TABLE_STRUCTURE_HTTP_ENDPOINT=https://ai.api.nvidia.com/v1/cv/nvidia/nemoretriever-table-structure-v1
+# export YOLOX_TABLE_STRUCTURE_INFER_PROTOCOL=http
+
+
+# Set GPU IDs for local deployment
+# ==== LLM ====
+export LLM_MS_GPU_ID=1
+
+# ==== Embeddings ====
+export EMBEDDING_MS_GPU_ID=0
+
+# ==== Reranker ====
+export RANKING_MS_GPU_ID=0
+
+# ==== Vector DB GPU ID ====
+export VECTORSTORE_GPU_DEVICE_ID=0
+
+# ==== Ingestion NIMs GPU ids ====
+export YOLOX_MS_GPU_ID=0
+export YOLOX_GRAPHICS_MS_GPU_ID=0
+export YOLOX_TABLE_MS_GPU_ID=0
+export PADDLE_MS_GPU_ID=0
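A minimal usage sketch for this file: the environment is loaded into the shell before starting the stack. The compose file itself is not part of this excerpt, so invoking plain `docker compose up` in that directory is an assumption about the repository layout.

```bash
# Sketch: load the on-prem defaults, then start the stack.
# Assumes a compose file lives alongside .env in deploy/compose.
cd deploy/compose
source .env
docker compose up -d
```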
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+export APP_NVINGEST_ENABLEPDFSPLITTER=False
+export APP_NVINGEST_CHUNKSIZE=1024
+export APP_NVINGEST_CHUNKOVERLAP=150
+export ENABLE_RERANKER=True
+export VECTOR_DB_TOPK=100
+export APP_RETRIEVER_TOPK=10
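These knobs trade accuracy against latency: `VECTOR_DB_TOPK` sets how many candidates the vector store returns, which the reranker (when `ENABLE_RERANKER=True`) narrows to the `APP_RETRIEVER_TOPK` passages sent to the LLM. The sketch below combines them with the batch-ingestion toggles named in the changelog; whether those toggles belong in this same file is an assumption, and the values shown are illustrative.

```bash
# Sketch: ingestion and retrieval overrides. ENABLE_NV_INGEST_BATCH_MODE and
# NV_INGEST_FILES_PER_BATCH are named in the changelog; placing them in this
# env file is an assumption.
export APP_NVINGEST_ENABLEPDFSPLITTER=True   # split PDFs post-chunking
export APP_NVINGEST_CHUNKSIZE=512            # the 2.1.0 default chunk size
export ENABLE_NV_INGEST_BATCH_MODE=True      # batch-based ingestion
export NV_INGEST_FILES_PER_BATCH=100         # files per ingestion batch
```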
