# CHANGELOG.md
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
## [2.1.0] - 2025-05-13
This release reduces the overall GPU requirement for deploying the blueprint. It also improves performance and stability for both Docker- and Helm-based deployments.
### Added
- Added non-blocking async support to the document upload API (see the sketch after this list)
  - Added a new field `blocking: bool` to control this behaviour from the client side. The default is `true`
  - Added a new API `/status` to monitor the state or completion status of uploaded documents
- The Helm chart is now published on the NGC public registry.
- A Helm chart customization guide is now available for all optional features under [documentation](./README.md#available-customizations).
- Issues with very large file uploads have been fixed.
- Security enhancements and stability improvements.
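
A minimal sketch of the non-blocking upload flow described above, from the command line. Only the `blocking` field and the `/status` endpoint are named in this changelog; the host, port, upload route, and response shape below are assumptions, not the documented API.

```sh
# Upload without waiting for ingestion to finish (blocking defaults to true).
# The host/port and /v1/documents route are assumed for illustration.
curl -s -X POST "http://localhost:8082/v1/documents" \
  -F "documents=@report.pdf" \
  -F "blocking=false"

# Poll the /status API to monitor the state or completion status of the docs.
curl -s "http://localhost:8082/v1/status"
```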
### Changed
- Overall GPU requirement reduced to 2xH100/3xA100.
- Changed the default LLM model to [llama-3_3-nemotron-super-49b-v1](https://build.nvidia.com/nvidia/llama-3_3-nemotron-super-49b-v1). This reduces the GPUs needed to deploy the LLM to 1xH100/2xA100.
- Changed the default GPU requirement for all other NIMs (ingestion and reranker NIMs) to 1xH100/1xA100.
- Changed the default chunk size to 512 to reduce the LLM context size and, in turn, the RAG server response latency.
- Exposed a config option to split PDFs post chunking, controlled by the `APP_NVINGEST_ENABLEPDFSPLITTER` environment variable in the ingestor-server. The default value is `True`.
- Added batch-based ingestion, which can help manage the memory usage of `ingestor-server` more effectively. Controlled by the `ENABLE_NV_INGEST_BATCH_MODE` and `NV_INGEST_FILES_PER_BATCH` variables; the defaults are `True` and `100`, respectively (see the sketch after this list).
- Removed `extract_options` from the API level of `ingestor-server`.
- Resolved an issue during bulk ingestion where the entire ingestion job failed if ingestion of a single file failed.
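
A sketch of how these ingestion options can be set for a Docker-based deployment. The variable names and defaults come from this changelog; the compose service name is an assumption.

```sh
# Tune ingestion behaviour of the ingestor-server via environment variables.
export APP_NVINGEST_ENABLEPDFSPLITTER=True   # split PDFs post chunking (default: True)
export ENABLE_NV_INGEST_BATCH_MODE=True      # batch-based ingestion (default: True)
export NV_INGEST_FILES_PER_BATCH=50          # smaller batches lower peak memory (default: 100)
docker compose up -d ingestor-server         # service name assumed
```

Lowering `NV_INGEST_FILES_PER_BATCH` should reduce peak memory usage on the `ingestor-server`, likely at some cost to ingestion throughput.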
### Known Issues
- The `rag-playground` container needs to be rebuilt if the `APP_LLM_MODELNAME`, `APP_EMBEDDINGS_MODELNAME`, or `APP_RANKING_MODELNAME` environment variable values are changed (see the sketch after this list).
- While uploading multiple files at the same time, a timeout error `Error uploading documents: [Error: aborted] { code: 'ECONNRESET' }` may occur. Developers are encouraged to use the APIs directly for bulk uploading instead of the sample rag-playground. The default upload timeout on the UI side is set to 1 hour.
- If a file upload fails, error messages may not be shown in the rag-playground user interface. Developers are encouraged to check the `ingestor-server` logs for details.
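
A workaround sketch for the first and last issues above: rebuild the playground image after a model change, and follow the ingestor logs when an upload fails. The compose service names are assumptions based on the container names mentioned here.

```sh
# Rebuild and restart the playground after changing a model name.
export APP_LLM_MODELNAME="nvidia/llama-3.3-nemotron-super-49b-v1"
docker compose build --no-cache rag-playground
docker compose up -d rag-playground

# Inspect ingestion failures that the UI does not surface.
docker compose logs -f ingestor-server
```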
A detailed guide is available [here](./docs/migration_guide.md) to ease the developer experience when migrating from older versions.
## [2.0.0] - 2025-03-18
This release adds support for multimodal documents using [Nvidia Ingest](https://github.com/NVIDIA/nv-ingest), including support for parsing PDFs, Word, and PowerPoint documents. It also significantly improves accuracy and performance by refactoring the APIs and architecture, and adds a new developer-friendly UI.
# README.md
Use the following documentation to learn about the NVIDIA RAG Blueprint.

- [Deployment Options](#deployment-options)
- [Driver versions](#driver-versions)
- [Hardware Requirements](#hardware-requirements)
- [Hardware requirements for self hosting all NVIDIA NIM microservices](#hardware-requirements-for-self-hosting-all-nvidia-nim-microservices)
- [Inviting the community to contribute](#inviting-the-community-to-contribute)
- [License](#license)
- [Terms of Use](#terms-of-use)
## Overview
The following are the default components included in this blueprint:
* NVIDIA NIM Microservices
  * Response Generation (Inference)
    * [NIM of nvidia/llama-3.3-nemotron-super-49b-v1](https://build.nvidia.com/nvidia/llama-3_3-nemotron-super-49b-v1)
  * Retriever Models
    * [NIM of nvidia/llama-3_2-nv-embedqa-1b-v2](https://build.nvidia.com/nvidia/llama-3_2-nv-embedqa-1b-v2)
    * [NIM of nvidia/llama-3_2-nv-rerankqa-1b-v2](https://build.nvidia.com/nvidia/llama-3_2-nv-rerankqa-1b-v2)
* Milvus Vector Database - accelerated with NVIDIA cuVS
* Ingestion - [Nvidia-Ingest](https://github.com/NVIDIA/nv-ingest/tree/main) is leveraged for ingestion of files. NVIDIA-Ingest is a scalable, performance-oriented document content and metadata extraction microservice. With support for parsing PDFs, Word, and PowerPoint documents, it uses specialized NVIDIA NIM microservices to find, contextualize, and extract text, tables, charts, and images for use in downstream generative applications.
* File Types: File types supported by Nvidia-Ingest are supported by this blueprint. This includes `.pdf`, `.pptx`, and `.docx` files containing images. Image captioning support is turned off by default to improve latency, so questions about images in documents will yield poor accuracy. Files with the following extensions are supported:
  - `bmp`
  - `docx`
  - `html` (treated as text)
  - `jpeg`
  - `json` (treated as text)
  - `md` (treated as text)
  - `pdf`
  - `png`
  - `pptx`
  - `sh` (treated as text)
  - `tiff`
  - `txt`
We provide Docker Compose scripts that deploy the microservices on a single node.
When you are ready for a large-scale deployment, you can use the provided Helm chart.
Ubuntu 22.04 OS
### Hardware Requirements
By default, this blueprint deploys the referenced NIM microservices locally. For this, you will require a minimum of:
149
-
-4xH100
150
-
-6xA100
155
+
-2xH100
156
+
-3xA100
151
157
The blueprint can also be modified to use NIM microservices hosted by NVIDIA in the [NVIDIA API Catalog](https://build.nvidia.com/explore/discover).
The following are the hardware requirements for each component.
The overall hardware requirements depend on whether you [Deploy With Docker Compose](./docs/quickstart.md#deploy-with-docker-compose) or [Deploy With Helm Chart](./docs/quickstart.md#deploy-with-helm-chart).
### Hardware requirements for self hosting all NVIDIA NIM microservices
**The NIM and hardware requirements only need to be met if you are self-hosting them with the default settings of RAG.**
See [Using self-hosted NVIDIA NIM microservices](./docs/quickstart.md#deploy-with-docker-compose).
- **Pipeline operation**: 1x L40 GPU or similar is recommended. It is needed for the Milvus vector store database, as GPU acceleration is enabled by default.
- **LLM NIM**: [Meta Llama 3.1 70B Instruct Support Matrix](https://docs.nvidia.com/nim/large-language-models/latest/support-matrix.html#llama-3-1-70b-instruct)
  - For improved parallel performance, we recommend 8x or more H100s/A100s for LLM inference.
  - The pipeline can share the GPU with the LLM NIM, but it is recommended to have a separate GPU for the LLM NIM for optimal performance.
- **Embedding NIM**: [Llama-3.2-NV-EmbedQA-1B-v2 Support Matrix](https://docs.nvidia.com/nim/nemo-retriever/text-embedding/latest/support-matrix.html#llama-3-2-nv-embedqa-1b-v2)
  - The pipeline can share the GPU with the Embedding NIM, but it is recommended to have a separate GPU for the Embedding NIM for optimal performance.
- **Reranking NIM**: [llama-3_2-nv-rerankqa-1b-v2 Support Matrix](https://docs.nvidia.com/nim/nemo-retriever/text-reranking/latest/support-matrix.html#llama-3-2-nv-rerankqa-1b-v2)
## Next Steps
- Do the procedures in [Get Started](./docs/quickstart.md) to deploy this blueprint
- See the [OpenAPI Specifications](./docs/api_reference)
- Explore notebooks that demonstrate how to use the APIs [here](./notebooks/)
- Explore [best practices for enhancing accuracy or latency](./docs/accuracy_perf.md)
To open a GitHub issue or pull request, see the contributing guidelines.
This NVIDIA AI Blueprint is licensed under the [Apache License, Version 2.0](./LICENSE). This project will download and install additional third-party open source software projects and containers. Review [the license terms of these open source projects](./LICENSE-3rd-party.txt) before use.
Use of the models in this blueprint is governed by the [NVIDIA AI Foundation Models Community License](https://docs.nvidia.com/ai-foundation-models-community-license.pdf).
## Terms of Use
This blueprint is governed by the [NVIDIA Agreements | Enterprise Software | NVIDIA Software License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-software-license-agreement/) and the [NVIDIA Agreements | Enterprise Software | Product Specific Terms for AI Product](https://www.nvidia.com/en-us/agreements/enterprise-software/product-specific-terms-for-ai-products/). The models are governed by the [NVIDIA Agreements | Enterprise Software | NVIDIA Community Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-community-models-license/), and the [NVIDIA RAG dataset](https://github.com/NVIDIA-AI-Blueprints/rag/tree/v2.0.0/data/multimodal) is governed by the [NVIDIA Asset License Agreement](https://github.com/NVIDIA-AI-Blueprints/rag/blob/main/data/LICENSE.DATA).
The following models that are built with Llama are governed by the [Llama 3.2 Community License Agreement](https://www.llama.com/llama3_2/license/): llama-3.3-nemotron-super-49b-v1, nvidia/llama-3.2-nv-embedqa-1b-v2, and nvidia/llama-3.2-nv-rerankqa-1b-v2.