[LCORE-1331] Add Solr filter and update doc #1178

Merged
tisnik merged 2 commits into lightspeed-core:main from Anxhela21:anx/solr-filter
Feb 19, 2026
Anxhela21 commented Feb 19, 2026

Description

This PR adds a temporary workaround that enables OKP Solr filtering and updates the documentation on how to use it. It also adds a registered vector store for the embedding model to `run.yaml`, which is required if the user has not previously used the Solr `vector_io` provider.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: (e.g., Claude, CodeRabbit, Ollama, etc., N/A if not used)
  • Generated by: (e.g., tool name and version; N/A if not used)

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • New Features

    • Added chunk window configuration options to manage parent/child chunk relationships and enable query-level filtering for vector stores.
  • Documentation

    • Updated guides with registration steps for vector stores, configuration examples, and a new Query Filtering subsection explaining filter usage and its temporary/workaround status.

Signed-off-by: Anxhela Coba <acoba@redhat.com>
coderabbitai bot commented Feb 19, 2026

Walkthrough

Adds Solr-specific RAG configuration and documentation: a new chunk_window_config for parent/child chunk fields and a chunk_filter_query, plus a registered vector_store entry and guidance for registering the vector DB when portal-rag is absent.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation (`docs/rag_guide.md`) | Adds Solr vector_io config docs: introduces chunk_window_config, explains chunk_filter_query usage and limitations, and adds guidance to register the vector DB when portal-rag is not present in persistent state. |
| Configuration / Schema (`run.yaml`) | Adds vector_io.config.chunk_window_config fields (parent/child chunk field names and chunk_filter_query) and a registered_resources.vector_stores entry with embedding_dimension, embedding_model, provider_id: solr-vector, and vector_store_id: portal-rag. Public config schema updated to include these new properties. |
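
As a rough illustration of the registration part of this change, the fragment below sketches how the `registered_resources.vector_stores` entry might sit alongside the Solr provider in `run.yaml`. It is a sketch under assumptions: the top-level `providers` layout is assumed, the provider's own settings are omitted because this page does not show them, and the `embedding_model` value is the one proposed in the review below, which may differ from the merged file.

```yaml
# run.yaml (fragment, illustrative only)
providers:
  vector_io:
    - provider_id: solr-vector   # expected to match provider_id in the entry below
      # provider_type and config omitted (not shown on this page)

registered_resources:
  vector_stores:
    - embedding_dimension: 384
      embedding_model: granite-embedding-30m  # assumption: value proposed in review
      provider_id: solr-vector
      vector_store_id: portal-rag
```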

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • tisnik
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately reflects the main changes: adding Solr filtering capability and updating documentation. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


Anxhela21 changed the title from "add filtering to solr and update doc" to "[LCORE-1331] Add Solr filter and update doc" on Feb 19, 2026

coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/rag_guide.md`:
- Around line 305-315: Update the inline YAML snippet under the explanatory note
to match the main Solr example by changing the vector_stores entry
embedding_model from the placeholder
sentence-transformers/${env.EMBEDDING_MODEL_DIR} to granite-embedding-30m
(ensure provider_id: solr-vector and vector_store_id: portal-rag remain), and
replace the plain "Note:" prefix with the repository's standard GFM alert style
used elsewhere in the doc so the warning format is consistent.
- Around line 348-354: Fix three doc issues in the "Query Filtering" section:
correct the malformed inline code by moving the closing backtick to include the
trailing double-quote so the example reads `"product:*openshift*"`, change the
plain "Note:" line to the GFM alert style used elsewhere (e.g. > [!NOTE]) and
update the spelling of "work-around" to "workaround"; these edits are in the
paragraph describing the chunk_filter_query for the Solr vector_io provider in
run.yaml.

Comment on lines +305 to +315
Note: if the vector database (portal-rag) is not in the persistent data store within the vector_io provider
(e.g. after deleting the llama stack cache) you will need to register the vector database under registered resources:


```yaml
  vector_stores:
    - embedding_dimension: 384
      embedding_model: sentence-transformers/${env.EMBEDDING_MODEL_DIR}
      provider_id: solr-vector
      vector_store_id: portal-rag
```

⚠️ Potential issue | 🟡 Minor

embedding_model in the note snippet is inconsistent with the primary Solr example above and will mislead users.

Line 301 (the main Solr provider YAML block, six lines above) correctly shows embedding_model: granite-embedding-30m — the registered model's model_id. The note snippet at line 312 reverts to sentence-transformers/${env.EMBEDDING_MODEL_DIR}, which will not resolve correctly (see the corresponding run.yaml issue).

Additionally, the plain Note: prefix is inconsistent with the GFM alert style used throughout the rest of this document.

📝 Proposed fix
-Note: if the vector database (portal-rag) is not in the persistent data store within the vector_io provider
-(e.g. after deleting the llama stack cache) you will need to register the vector database under registered resources:
+> [!NOTE]
+> If the vector database (portal-rag) is not in the persistent data store (e.g. after clearing the llama-stack cache), register it under `registered_resources`:

 ```yaml
   vector_stores:
     - embedding_dimension: 384
-      embedding_model: sentence-transformers/${env.EMBEDDING_MODEL_DIR}
+      embedding_model: granite-embedding-30m
       provider_id: solr-vector
       vector_store_id: portal-rag
 ```

Comment on lines 348 to 354
**Query Filtering:**

To filter the Solr context edit the *chunk_filter_query* field in the
Solr **vector_io** provider in the `run.yaml`. Filters should follow the key:value format:
ex. `"product:*openshift*`"

Note: This static filter is a temporary work-around.

⚠️ Potential issue | 🟡 Minor

Three minor doc issues in the Query Filtering section.

  1. Line 352 (malformed inline code): `"product:*openshift*`" places the closing backtick before the trailing ", leaving the double-quote rendered as plain text outside the code span. Should be `"product:*openshift*"`.

  2. Line 354 (plain Note: prefix): inconsistent with the GFM alert style (> [!NOTE]) used elsewhere in the document.

  3. Line 354 (spelling): work-around → workaround (standard one-word form).

📝 Proposed fix
-To filter the Solr context edit the *chunk_filter_query* field in the
-Solr **vector_io** provider in the `run.yaml`. Filters should follow the key:value format:
-ex. `"product:*openshift*`"
-
-Note: This static filter is a temporary work-around. 
+To filter the Solr context, edit the `chunk_filter_query` field in the Solr **vector_io** provider in `run.yaml`. Filters must follow Solr query syntax (`field:value`), for example: `"product:*openshift*"`
+
+> [!NOTE]
+> This static filter is a temporary workaround.


To filter the Solr context edit the *chunk_filter_query* field in the
Solr **vector_io** provider in the `run.yaml`. Filters should follow the key:value format:
ex. `"product:*openshift*`"

CodeRabbitAI is right there: the ` and " are swapped.


Done.

Signed-off-by: Anxhela Coba <acoba@redhat.com>
coderabbitai bot left a comment

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@docs/rag_guide.md`:
- Around line 348-354: Update the Query Filtering section to clarify that the
filter is nested under chunk_window_config by referencing the qualified path
`chunk_window_config.chunk_filter_query` and include a short YAML example
showing the field usage; replace the italicized *chunk_filter_query* with inline
code formatting (`chunk_filter_query`) to match the rest of the doc; change the
plain "Note:" line to the GFM alert style `> [!NOTE]` and correct "work-around"
to "workaround".
- Around line 305-315: Replace the plain "Note:" prefix with the GitHub-flavored
alert block "> [!NOTE]" to match the document style, and in the re-registration
YAML snippet under vector_stores update the embedding_model value from the
placeholder sentence-transformers/${env.EMBEDDING_MODEL_DIR} to the actual model
name used in the Solr block: granite-embedding-30m; keep provider_id:
solr-vector and vector_store_id: portal-rag unchanged.
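
The qualified path `chunk_window_config.chunk_filter_query` mentioned in the first item could be illustrated roughly as follows. This is only a sketch: it assumes the filter sits under the Solr provider's `config`, as the walkthrough describes, and it leaves out every other provider setting, including the parent/child chunk field names, which this page does not give.

```yaml
# run.yaml (fragment): config of the Solr vector_io provider, illustrative only
config:
  chunk_window_config:
    # Static Solr filter in field:value form; a temporary workaround per this PR.
    chunk_filter_query: "product:*openshift*"
```

Quoting the value is a cautious choice here, since `:` and `*` can carry special meaning in YAML depending on where they appear.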

Anxhela21 requested a review from tisnik on February 19, 2026, 14:31

tisnik left a comment

LGTM

tisnik merged commit 7a4d6d1 into lightspeed-core:main on Feb 19, 2026
20 of 22 checks passed