Merged
29 changes: 29 additions & 0 deletions docs/rag_guide.md
@@ -282,6 +282,14 @@ providers:
content_field: chunk
embedding_dimension: 384
embedding_model: ${env.EMBEDDING_MODEL_DIR}
chunk_window_config:
chunk_parent_id_field: "parent_id"
chunk_content_field: "chunk_field"
chunk_index_field: "chunk_index"
chunk_token_count_field: "num_tokens"
parent_total_chunks_field: "total_chunks"
parent_total_tokens_field: "total_tokens"
chunk_filter_query: "is_chunk:true"
persistence:
namespace: portal-rag
backend: kv_default
@@ -294,6 +302,19 @@ registered_resources:
embedding_dimension: 384
```

Note: if the vector database (portal-rag) is not in the persistent data store within the vector_io provider
(e.g. after deleting the llama stack cache) you will need to register the vector database under registered resources:


```yaml
  vector_stores:
    - embedding_dimension: 384
      embedding_model: sentence-transformers/${env.EMBEDDING_MODEL_DIR}
      provider_id: solr-vector
      vector_store_id: portal-rag
```
Comment on lines +305 to +315
Contributor

⚠️ Potential issue | 🟡 Minor

embedding_model in the note snippet is inconsistent with the primary Solr example above and will mislead users.

Line 301 (the main Solr provider YAML block, six lines above) correctly shows embedding_model: granite-embedding-30m — the registered model's model_id. The note snippet at line 312 reverts to sentence-transformers/${env.EMBEDDING_MODEL_DIR}, which will not resolve correctly (see the corresponding run.yaml issue).

Additionally, the plain Note: prefix is inconsistent with the GFM alert style used throughout the rest of this document.

📝 Proposed fix
-Note: if the vector database (portal-rag) is not in the persistent data store within the vector_io provider
-(e.g. after deleting the llama stack cache) you will need to register the vector database under registered resources:
+> [!NOTE]
+> If the vector database (portal-rag) is not in the persistent data store (e.g. after clearing the llama-stack cache), register it under `registered_resources`:

 ```yaml
   vector_stores:
     - embedding_dimension: 384
-      embedding_model: sentence-transformers/${env.EMBEDDING_MODEL_DIR}
+      embedding_model: granite-embedding-30m
       provider_id: solr-vector
       vector_store_id: portal-rag
🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@docs/rag_guide.md` around lines 305 - 315, Update the inline YAML snippet
under the explanatory note to match the main Solr example by changing the
vector_stores entry embedding_model from the placeholder
sentence-transformers/${env.EMBEDDING_MODEL_DIR} to granite-embedding-30m
(ensure provider_id: solr-vector and vector_store_id: portal-rag remain), and
replace the plain "Note:" prefix with the repository's standard GFM alert style
used elsewhere in the doc so the warning format is consistent.
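The mismatch flagged in this comment could in principle be caught mechanically. The following is a minimal sketch, not part of the PR: it assumes the parsed `run.yaml` lists registered models under `registered_resources.models` with a `model_id` key (an assumed layout), and the helper name `find_unregistered_embedding_models` is hypothetical. It takes an already-parsed dict, so no YAML library is needed.

```python
def find_unregistered_embedding_models(config: dict) -> list[tuple[str, str]]:
    """Return (vector_store_id, embedding_model) pairs whose embedding_model
    does not match any registered model_id.

    Assumes models live under registered_resources.models[].model_id
    (an assumption about the config layout, not confirmed by the PR).
    """
    resources = config.get("registered_resources", {})
    registered_ids = {m["model_id"] for m in resources.get("models", [])}
    mismatches = []
    for store in resources.get("vector_stores", []):
        model = store.get("embedding_model")
        if model not in registered_ids:
            mismatches.append((store.get("vector_store_id"), model))
    return mismatches


# Hypothetical, already-parsed config fragment mirroring the note snippet.
# The ${env...} placeholder is left unexpanded on purpose.
config = {
    "registered_resources": {
        "models": [{"model_id": "granite-embedding-30m"}],
        "vector_stores": [
            {
                "vector_store_id": "portal-rag",
                "provider_id": "solr-vector",
                "embedding_dimension": 384,
                "embedding_model": "sentence-transformers/${env.EMBEDDING_MODEL_DIR}",
            }
        ],
    }
}

for store_id, model in find_unregistered_embedding_models(config):
    print(f"{store_id}: embedding_model {model!r} is not a registered model_id")
```

With the reviewer's suggested value (`granite-embedding-30m`) in place, the helper returns an empty list for this sample.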



**2. Configure Lightspeed Stack (`lightspeed-stack.yaml`):**

```yaml
@@ -324,6 +345,14 @@ Note: Solr does not currently work with RAG tools. You will need to specify "no_
- **Offline mode**: Uses `parent_id` with Mimir base URL
- **Online mode**: Uses `reference_url` from document metadata

**Query Filtering:**

To filter the Solr context edit the *chunk_filter_query* field in the
Solr **vector_io** provider in the `run.yaml`. Filters should follow the key:value format:
ex. `"product:*openshift*`"
Contributor

CodeRabbitAI is right: the ` and " are swapped there.

Contributor Author

Done.


Note: This static filter is a temporary work-around.
Comment on lines 348 to 354
Contributor

⚠️ Potential issue | 🟡 Minor

Three minor doc issues in the Query Filtering section.

  1. Line 352 (malformed inline code): `"product:*openshift*`" places the closing backtick before the trailing `"`, leaving the double-quote rendered as plain text outside the code span. Should be `"product:*openshift*"`.

  2. Line 354 (plain `Note:`): inconsistent with the GFM alert style (`> [!NOTE]`) used elsewhere in the document.

  3. Line 354 (spelling): work-around → workaround (standard one-word form).

📝 Proposed fix
-To filter the Solr context edit the *chunk_filter_query* field in the
-Solr **vector_io** provider in the `run.yaml`. Filters should follow the key:value format:
-ex. `"product:*openshift*`"
-
-Note: This static filter is a temporary work-around. 
+To filter the Solr context, edit the `chunk_filter_query` field in the Solr **vector_io** provider in `run.yaml`. Filters must follow Solr query syntax (`field:value`), for example: `"product:*openshift*"`
+
+> [!NOTE]
+> This static filter is a temporary workaround.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-**Query Filtering:**
-To filter the Solr context edit the *chunk_filter_query* field in the
-Solr **vector_io** provider in the `run.yaml`. Filters should follow the key:value format:
-ex. `"product:*openshift*`"
-Note: This static filter is a temporary work-around.
+**Query Filtering:**
+To filter the Solr context, edit the `chunk_filter_query` field in the Solr **vector_io** provider in `run.yaml`. Filters must follow Solr query syntax (`field:value`), for example: `"product:*openshift*"`
+> [!NOTE]
+> This static filter is a temporary workaround.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/rag_guide.md` around lines 348 - 354, Fix three doc issues in the "Query
Filtering" section: correct the malformed inline code by moving the closing
backtick to include the trailing double-quote so the example reads
`"product:*openshift*"`, change the plain "Note:" line to the GFM alert style
used elsewhere (e.g. > [!NOTE]) and update the spelling of "work-around" to
"workaround"; these edits are in the paragraph describing the chunk_filter_query
for the Solr vector_io provider in run.yaml.
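As an aside on the `field:value` format discussed above, a rough shape check for a `chunk_filter_query` value might look like the sketch below. It is an illustration under stated assumptions, not something the PR or Solr provides: the helper name `looks_like_field_value` and the regular expression are hypothetical, and the check only confirms "field name, colon, non-empty value". It does not parse Solr query syntax (ranges, boolean operators, quoting).

```python
import re

# Rough shape check only: a field name, a colon, then a non-empty value.
# This is NOT a Solr query parser; it is a hypothetical sanity check.
_FILTER_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_.]*:\S.*$")


def looks_like_field_value(filter_query: str) -> bool:
    """Return True if the string has the basic field:value shape."""
    return bool(_FILTER_RE.match(filter_query.strip()))


# Values taken from the diff and the docs example, plus malformed ones.
for candidate in ["is_chunk:true", "product:*openshift*", "openshift", "product:"]:
    print(f"{candidate!r}: {looks_like_field_value(candidate)}")
```

A check like this would only catch obviously malformed values before they reach Solr; whether a filter actually matches documents still depends on the schema and the query parser in use.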


**Prerequisites:**

- Solr must be running and accessible at the configured URL
15 changes: 14 additions & 1 deletion run.yaml
@@ -67,6 +67,14 @@ providers:
content_field: chunk
embedding_dimension: 384
embedding_model: ${env.EMBEDDING_MODEL_DIR}
chunk_window_config:
chunk_parent_id_field: "parent_id"
chunk_content_field: "chunk_field"
chunk_index_field: "chunk_index"
chunk_token_count_field: "num_tokens"
parent_total_chunks_field: "total_chunks"
parent_total_tokens_field: "total_tokens"
chunk_filter_query: "is_chunk:true"
persistence:
namespace: portal-rag
backend: kv_default
@@ -152,7 +160,11 @@ registered_resources:
- shield_id: llama-guard
provider_id: llama-guard
provider_shield_id: openai/gpt-4o-mini
-  vector_stores: []
+  vector_stores:
+    - embedding_dimension: 384
+      embedding_model: sentence-transformers/${env.EMBEDDING_MODEL_DIR}
+      provider_id: solr-vector
+      vector_store_id: portal-rag
datasets: []
scoring_fns: []
benchmarks: []
@@ -166,3 +178,4 @@ vector_stores:
model_id: nomic-ai/nomic-embed-text-v1.5
safety:
default_shield_id: llama-guard
