[LCORE-1331] Add Solr filter and update doc #1178
tisnik merged 2 commits into lightspeed-core:main
Signed-off-by: Anxhela Coba <acoba@redhat.com>
Walkthrough
Adds Solr-specific RAG configuration and documentation: a new `chunk_window_config` for parent/child chunk fields and a `chunk_filter_query`, plus a registered vector_store entry and guidance for registering the vector DB when `portal-rag` is absent.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/rag_guide.md`:
- Around line 305-315: Update the inline YAML snippet under the explanatory note
to match the main Solr example by changing the vector_stores entry
embedding_model from the placeholder
sentence-transformers/${env.EMBEDDING_MODEL_DIR} to granite-embedding-30m
(ensure provider_id: solr-vector and vector_store_id: portal-rag remain), and
replace the plain "Note:" prefix with the repository's standard GFM alert style
used elsewhere in the doc so the warning format is consistent.
- Around line 348-354: Fix three doc issues in the "Query Filtering" section:
correct the malformed inline code by moving the closing backtick to include the
trailing double-quote so the example reads `"product:*openshift*"`, change the
plain "Note:" line to the GFM alert style used elsewhere (e.g. > [!NOTE]) and
update the spelling of "work-around" to "workaround"; these edits are in the
paragraph describing the chunk_filter_query for the Solr vector_io provider in
run.yaml.
Note: if the vector database (portal-rag) is not in the persistent data store within the vector_io provider
(e.g. after deleting the llama stack cache) you will need to register the vector database under registered resources:

```yaml
vector_stores:
- embedding_dimension: 384
  embedding_model: sentence-transformers/${env.EMBEDDING_MODEL_DIR}
  provider_id: solr-vector
  vector_store_id: portal-rag
```
embedding_model in the note snippet is inconsistent with the primary Solr example above and will mislead users.
Line 301 (the main Solr provider YAML block, six lines above) correctly shows embedding_model: granite-embedding-30m — the registered model's model_id. The note snippet at line 312 reverts to sentence-transformers/${env.EMBEDDING_MODEL_DIR}, which will not resolve correctly (see the corresponding run.yaml issue).
Additionally, the plain Note: prefix is inconsistent with the GFM alert style used throughout the rest of this document.
📝 Proposed fix
```diff
-Note: if the vector database (portal-rag) is not in the persistent data store within the vector_io provider
-(e.g. after deleting the llama stack cache) you will need to register the vector database under registered resources:
+> [!NOTE]
+> If the vector database (portal-rag) is not in the persistent data store (e.g. after clearing the llama-stack cache), register it under `registered_resources`:
 ```yaml
 vector_stores:
 - embedding_dimension: 384
-  embedding_model: sentence-transformers/${env.EMBEDDING_MODEL_DIR}
+  embedding_model: granite-embedding-30m
   provider_id: solr-vector
   vector_store_id: portal-rag
```
**Query Filtering:**

To filter the Solr context edit the *chunk_filter_query* field in the
Solr **vector_io** provider in the `run.yaml`. Filters should follow the key:value format:
ex. `"product:*openshift*`"

Note: This static filter is a temporary work-around.
Three minor doc issues in the Query Filtering section.
- Line 352, malformed inline code: `` `"product:*openshift*`" `` places the closing backtick before the trailing `"`, leaving the double-quote rendered as plain text outside the code span. It should be `` `"product:*openshift*"` ``.
- Line 354, plain `Note:`: inconsistent with the GFM alert style (`> [!NOTE]`) used elsewhere in the document.
- Line 354, spelling: `work-around` → `workaround` (standard one-word form).
📝 Proposed fix
-To filter the Solr context edit the *chunk_filter_query* field in the
-Solr **vector_io** provider in the `run.yaml`. Filters should follow the key:value format:
-ex. `"product:*openshift*`"
-
-Note: This static filter is a temporary work-around.
+To filter the Solr context, edit the `chunk_filter_query` field in the Solr **vector_io** provider in `run.yaml`. Filters must follow Solr query syntax (`field:value`), for example: `"product:*openshift*"`
+
+> [!NOTE]
+> This static filter is a temporary workaround.
📝 Committable suggestion
**Query Filtering:**

To filter the Solr context, edit the `chunk_filter_query` field in the Solr **vector_io** provider in `run.yaml`. Filters must follow Solr query syntax (`field:value`), for example: `"product:*openshift*"`

> [!NOTE]
> This static filter is a temporary workaround.
docs/rag_guide.md
Outdated
To filter the Solr context edit the *chunk_filter_query* field in the
Solr **vector_io** provider in the `run.yaml`. Filters should follow the key:value format:
ex. `"product:*openshift*`"
CodeRabbitAI is right there: the closing ` and " are swapped.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@docs/rag_guide.md`:
- Around line 348-354: Update the Query Filtering section to clarify that the
filter is nested under chunk_window_config by referencing the qualified path
`chunk_window_config.chunk_filter_query` and include a short YAML example
showing the field usage; replace the italicized *chunk_filter_query* with inline
code formatting (`chunk_filter_query`) to match the rest of the doc; change the
plain "Note:" line to the GFM alert style `> [!NOTE]` and correct "work-around"
to "workaround".
- Around line 305-315: Replace the plain "Note:" prefix with the GitHub-flavored
alert block "> [!NOTE]" to match the document style, and in the re-registration
YAML snippet under vector_stores update the embedding_model value from the
placeholder sentence-transformers/${env.EMBEDDING_MODEL_DIR} to the actual model
name used in the Solr block: granite-embedding-30m; keep provider_id:
solr-vector and vector_store_id: portal-rag unchanged.
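The two duplicate comments above could be addressed together in `run.yaml` roughly as follows. This is a minimal sketch, not the merged configuration: the `providers` / `vector_io` / `config` nesting and the placement of `vector_stores` under `registered_resources` are assumptions based on the wording of the review, and other provider keys are left out. Only the `chunk_window_config.chunk_filter_query` path, the example filter value, and the four `vector_stores` fields come from the review itself.

```yaml
# Sketch only: surrounding keys and nesting are assumed, not taken from the PR diff.
providers:
  vector_io:
  - provider_id: solr-vector
    config:
      chunk_window_config:
        # Static Solr filter in field:value form (temporary workaround).
        chunk_filter_query: "product:*openshift*"

registered_resources:
  vector_stores:
  # Re-register the store if portal-rag is missing from the persistent data store.
  - embedding_dimension: 384
    embedding_model: granite-embedding-30m  # the registered model_id, not the env placeholder
    provider_id: solr-vector
    vector_store_id: portal-rag
```

Keeping `embedding_model` identical to the registered model's `model_id` is the point the review stresses; the placeholder `sentence-transformers/${env.EMBEDDING_MODEL_DIR}` is what it flags as not resolving correctly.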
Description
This PR adds a temporary workaround that provides OKP Solr filtering and updates the documentation to explain how to use it. It also adds a registered vector store for the embedding model to `run.yaml`, which is required if the user has not previously used the Solr vector_io provider.