LCORE-1330: vector store used for FAISS tests contains conflicting shield metadata#1159
Conversation
WalkthroughAdds FAISS_VECTOR_STORE_ID propagation into workflows and compose files, switches RAG persistence to a new kv_rag backend across test configs, registers a FAISS vector_store using all-mpnet-base-v2, and wires placeholder substitution and environment reading for vector store IDs in E2E tests. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
tests/configuration/run.yaml (1)
97-103:⚠️ Potential issue | 🔴 CriticalMissing
kv_ragbackend definition instorage.backends.Line 101 references
backend: kv_rag, but thestorage.backendssection only defineskv_defaultandsql_default. This will fail at runtime when the FAISS vector_io provider attempts to resolve its persistence backend.Add a
kv_ragentry understorage.backends:Proposed fix
storage: backends: kv_default: type: kv_sqlite db_path: ${env.KV_STORE_PATH:=~/.llama/storage/kv_store.db} + kv_rag: + type: kv_sqlite + db_path: ${env.KV_RAG_PATH:=~/.llama/storage/rag/kv_store.db} sql_default: type: sql_sqlite db_path: ${env.SQL_STORE_PATH:=~/.llama/storage/sql_store.db}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/configuration/run.yaml` around lines 97 - 103, The tests/configuration references a persistence backend named "kv_rag" for the vector_io FAISS provider but storage.backends only defines "kv_default" and "sql_default"; add a new backend entry named "kv_rag" in the storage.backends section so the FAISS provider can resolve its persistence backend (ensure the new backend's type and config match other kv backends, e.g., similar to "kv_default", and update any relevant provider mapping for provider_id: faiss and provider_type: inline::faiss).tests/e2e-prow/rhoai/configs/run.yaml (1)
57-63:⚠️ Potential issue | 🔴 Critical
kv_ragbackend is referenced but never defined — runtime failure.
vector_iopersistence referencesbackend: kv_rag(line 61), butstorage.backends(lines 106–121) only defineskv_defaultandsql_default. Every other config file in this PR adds akv_ragblock understorage.backends. This will cause the service to fail to start in prow.Proposed fix — add kv_rag backend
storage: backends: kv_default: type: kv_sqlite db_path: ${env.KV_STORE_PATH:=~/.llama/storage/rag/kv_store.db} + kv_rag: + type: kv_sqlite + db_path: ${env.KV_RAG_PATH:=~/.llama/storage/rag/kv_store.db} sql_default: type: sql_sqlite db_path: ${env.SQL_STORE_PATH:=~/.llama/storage/sql_store.db}Also note: the
kv_defaultdb_pathhere still points to.../rag/kv_store.db, whereas the other configs in this PR updated it to.../kv_store.db(withoutrag/). This may re-introduce the very metadata conflict this PR is fixing.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e-prow/rhoai/configs/run.yaml` around lines 57 - 63, The config references a missing backend: add a storage.backends entry named kv_rag (matching other PR files) and configure it identically to the other KV backends so vector_io: - config: persistence: backend: kv_rag resolves at runtime; also update the existing kv_default db_path to remove the extra "rag/" segment so its db_path matches the other PR configs (e.g., .../kv_store.db) to avoid metadata conflicts; locate the storage.backends block and the vector_io section to make these updates (look for symbols "storage.backends", "kv_rag", "kv_default", and "vector_io").
🧹 Nitpick comments (1)
tests/e2e/utils/utils.py (1)
246-258: Docstring is stale — doesn't mention{VECTOR_STORE_ID}.The docstring (and the
Parameterssection) still only references{MODEL}and{PROVIDER}. Update it to also document{VECTOR_STORE_ID}andfaiss_vector_store_id.Proposed fix
def replace_placeholders(context: Context, text: str) -> str: - """Replace {MODEL} and {PROVIDER} placeholders with actual values from context. + """Replace {MODEL}, {PROVIDER}, and {VECTOR_STORE_ID} placeholders with actual values from context. Parameters: - context (Context): Behave context containing default_model and default_provider - text (str): String that may contain {MODEL} and {PROVIDER} placeholders + context (Context): Behave context containing default_model, default_provider, + and faiss_vector_store_id + text (str): String that may contain {MODEL}, {PROVIDER}, and {VECTOR_STORE_ID} placeholders Returns: String with placeholders replaced by actual values """🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/utils/utils.py` around lines 246 - 258, The docstring for replace_placeholders is stale and only documents {MODEL} and {PROVIDER}; update it to also describe the {VECTOR_STORE_ID} placeholder and the corresponding context attribute faiss_vector_store_id: modify the function docstring (for replace_placeholders in tests/e2e/utils/utils.py) to list all three placeholders, add {VECTOR_STORE_ID} to the short description and Parameters section (noting Context contains default_model, default_provider, and faiss_vector_store_id), and ensure the Returns section remains accurate.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/e2e-prow/rhoai/configs/run.yaml`:
- Around line 140-144: The embedding_model value in the vector_stores block is
incorrectly duplicated
("sentence-transformers/sentence-transformers/all-mpnet-base-v2"); update the
vector_stores -> embedding_model entry to the canonical model id
"sentence-transformers/all-mpnet-base-v2" to match the model registration and
default_embedding_model so the vector store can resolve the embedding model
correctly.
In `@tests/e2e/configs/run-azure.yaml`:
- Around line 140-144: The embedding_model value in the vector_stores block
currently has a duplicated prefix
("sentence-transformers/sentence-transformers/all-mpnet-base-v2"); update the
embedding_model field to the correct value
"sentence-transformers/all-mpnet-base-v2" (i.e., remove the extra
"sentence-transformers/") in this file and apply the same fix to the other
listed config files where the same duplicated string appears; locate the
vector_stores -> embedding_model entries to make the replacement.
In `@tests/e2e/configs/run-ci.yaml`:
- Around line 147-151: The embedding_model string in the vector_stores YAML
block is duplicated
("sentence-transformers/sentence-transformers/all-mpnet-base-v2") which will
fail to resolve; update the embedding_model value in the vector_stores entry
(look for the embedding_model key in that block) to the correct single-prefixed
form "sentence-transformers/all-mpnet-base-v2" to match the other references in
this file.
In `@tests/e2e/configs/run-rhaiis.yaml`:
- Around line 141-145: The embedding_model value in the vector_stores entry
erroneously repeats the organization prefix; update the embedding_model under
vector_stores (the entry with embedding_dimension: 768 and provider_id: faiss)
to use the correct model id "sentence-transformers/all-mpnet-base-v2" (remove
the duplicated "sentence-transformers/" prefix).
In `@tests/e2e/configs/run-rhelai.yaml`:
- Around line 141-145: The embedding_model value in the vector_stores block
contains a duplicated prefix
("sentence-transformers/sentence-transformers/all-mpnet-base-v2"); update the
embedding_model entry to the correct registered name
"sentence-transformers/all-mpnet-base-v2" so it matches other configs and the
model registry (update the embedding_model field in the vector_stores section).
In `@tests/e2e/configs/run-vertexai.yaml`:
- Around line 136-140: Embedding model path has a duplicated prefix; update the
embedding_model value in the vector_stores block (the embedding_model setting)
from "sentence-transformers/sentence-transformers/all-mpnet-base-v2" to the
correct single-prefixed form "sentence-transformers/all-mpnet-base-v2" so the
provider reads a valid model identifier.
In `@tests/e2e/configs/run-watsonx.yaml`:
- Around line 149-150: The embedding_model value in run-watsonx.yaml contains a
duplicated prefix ("sentence-transformers/sentence-transformers/..."); update
the embedding_model field to remove the extra segment so it reads
"sentence-transformers/all-mpnet-base-v2" (leave embedding_dimension as-is);
ensure the corrected key is applied to the embedding_model entry in this file to
match the other configs.
- Around line 148-153: The YAML contains a duplicate top-level key
"vector_stores" which causes the FAISS entry (embedding_model/all-mpnet-base-v2,
provider_id: faiss, vector_store_id: ${env.FAISS_VECTOR_STORE_ID}) to be
overwritten by the later "vector_stores: []"; remove the redundant
"vector_stores: []" entry so the FAISS vector store registration remains in
registered_resources.
In `@tests/e2e/features/environment.py`:
- Around line 86-87: context.faiss_vector_store_id is being set with os.getenv
which returns None if unset, causing replace_placeholders in utils.py (which
calls result.replace("{VECTOR_STORE_ID}", context.faiss_vector_store_id)) to
raise a TypeError; fix by initializing context.faiss_vector_store_id to an empty
string (or a safe default) when the environment variable is missing (e.g., use
os.getenv("FAISS_VECTOR_STORE_ID", "") or coalesce None -> ""), and ensure
replace_placeholders receives a str; update the assignment of
context.faiss_vector_store_id accordingly so all tests that call
replace_placeholders won’t crash.
---
Outside diff comments:
In `@tests/configuration/run.yaml`:
- Around line 97-103: The tests/configuration references a persistence backend
named "kv_rag" for the vector_io FAISS provider but storage.backends only
defines "kv_default" and "sql_default"; add a new backend entry named "kv_rag"
in the storage.backends section so the FAISS provider can resolve its
persistence backend (ensure the new backend's type and config match other kv
backends, e.g., similar to "kv_default", and update any relevant provider
mapping for provider_id: faiss and provider_type: inline::faiss).
In `@tests/e2e-prow/rhoai/configs/run.yaml`:
- Around line 57-63: The config references a missing backend: add a
storage.backends entry named kv_rag (matching other PR files) and configure it
identically to the other KV backends so vector_io: - config: persistence:
backend: kv_rag resolves at runtime; also update the existing kv_default db_path
to remove the extra "rag/" segment so its db_path matches the other PR configs
(e.g., .../kv_store.db) to avoid metadata conflicts; locate the storage.backends
block and the vector_io section to make these updates (look for symbols
"storage.backends", "kv_rag", "kv_default", and "vector_io").
---
Nitpick comments:
In `@tests/e2e/utils/utils.py`:
- Around line 246-258: The docstring for replace_placeholders is stale and only
documents {MODEL} and {PROVIDER}; update it to also describe the
{VECTOR_STORE_ID} placeholder and the corresponding context attribute
faiss_vector_store_id: modify the function docstring (for replace_placeholders
in tests/e2e/utils/utils.py) to list all three placeholders, add
{VECTOR_STORE_ID} to the short description and Parameters section (noting
Context contains default_model, default_provider, and faiss_vector_store_id),
and ensure the Returns section remains accurate.
| context.faiss_vector_store_id = os.getenv("FAISS_VECTOR_STORE_ID") | ||
|
|
There was a problem hiding this comment.
os.getenv returns None when unset, which will crash replace_placeholders.
replace_placeholders in utils.py calls result.replace("{VECTOR_STORE_ID}", context.faiss_vector_store_id). If FAISS_VECTOR_STORE_ID is not set, this passes None as the second argument to str.replace(), raising a TypeError — and this affects all tests that hit replace_placeholders, not just FAISS scenarios.
Proposed fix
- context.faiss_vector_store_id = os.getenv("FAISS_VECTOR_STORE_ID")
+ context.faiss_vector_store_id = os.getenv("FAISS_VECTOR_STORE_ID", "")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| context.faiss_vector_store_id = os.getenv("FAISS_VECTOR_STORE_ID") | |
| context.faiss_vector_store_id = os.getenv("FAISS_VECTOR_STORE_ID", "") |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/e2e/features/environment.py` around lines 86 - 87,
context.faiss_vector_store_id is being set with os.getenv which returns None if
unset, causing replace_placeholders in utils.py (which calls
result.replace("{VECTOR_STORE_ID}", context.faiss_vector_store_id)) to raise a
TypeError; fix by initializing context.faiss_vector_store_id to an empty string
(or a safe default) when the environment variable is missing (e.g., use
os.getenv("FAISS_VECTOR_STORE_ID", "") or coalesce None -> ""), and ensure
replace_placeholders receives a str; update the assignment of
context.faiss_vector_store_id accordingly so all tests that call
replace_placeholders won’t crash.
bd1d806 to
facf582
Compare
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
tests/configuration/run.yaml (1)
97-120:⚠️ Potential issue | 🔴 CriticalMissing
kv_ragbackend definition — will cause a runtime error.Line 101 sets
backend: kv_ragfor vector_io persistence, but thestorage.backendssection (lines 113–120) only defineskv_defaultandsql_default. Every other config file in this PR (e.g.,run-ci.yaml,run-rhelai.yaml,run-vertexai.yaml) adds akv_ragbackend. This omission will cause a backend resolution failure at startup.Proposed fix
storage: backends: kv_default: type: kv_sqlite db_path: ${env.KV_STORE_PATH:=~/.llama/storage/kv_store.db} + kv_rag: + type: kv_sqlite + db_path: ${env.KV_RAG_PATH:=~/.llama/storage/rag/kv_store.db} sql_default: type: sql_sqlite db_path: ${env.SQL_STORE_PATH:=~/.llama/storage/sql_store.db}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/configuration/run.yaml` around lines 97 - 120, The vector_io persistence references backend "kv_rag" but storage.backends only defines "kv_default" and "sql_default", so add a "kv_rag" backend entry under storage.backends to match other configs; create an entry named kv_rag with type kv_sqlite (or the same type used in run-ci/run-rhelai/run-vertexai) and set db_path to an appropriate KV_STORE_PATH (e.g., ${env.KV_STORE_PATH:=~/.llama/storage/kv_rag.db}) so the backend name kv_rag resolves at startup.tests/e2e-prow/rhoai/configs/run.yaml (1)
57-63:⚠️ Potential issue | 🔴 CriticalMissing
kv_ragbackend andkv_defaultstill uses the RAG-specific path.Line 61 references
backend: kv_rag, butstorage.backends(lines 106–112) does not definekv_rag— this will fail at runtime.Additionally,
kv_defaulton line 109 still points to~/.llama/storage/rag/kv_store.db(the RAG-specific path). In the other configs (e.g.,run-ci.yaml,run-rhelai.yaml),kv_defaultuses~/.llama/storage/kv_store.dbandkv_raguses~/.llama/storage/rag/kv_store.db. This defeats the purpose of the separation introduced in this PR.Proposed fix
storage: backends: kv_default: type: kv_sqlite - db_path: ${env.KV_STORE_PATH:=~/.llama/storage/rag/kv_store.db} + db_path: ${env.KV_STORE_PATH:=~/.llama/storage/kv_store.db} + kv_rag: + type: kv_sqlite + db_path: ${env.KV_RAG_PATH:=~/.llama/storage/rag/kv_store.db} sql_default: type: sql_sqlite db_path: ${env.SQL_STORE_PATH:=~/.llama/storage/sql_store.db}Also applies to: 105-112
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e-prow/rhoai/configs/run.yaml` around lines 57 - 63, The config references backend: kv_rag in the vector_io provider but storage.backends does not define kv_rag and kv_default still points to the RAG-specific path; update storage.backends to add a kv_rag entry (matching the rag path currently used) and change kv_default to use the generic kv_store path (e.g., ~/.llama/storage/kv_store.db) so kv_default and kv_rag are distinct, ensuring the provider_id/provider_type vector_io entries (provider_id: faiss, provider_type: inline::faiss) will find kv_rag at runtime.
🧹 Nitpick comments (2)
tests/e2e/utils/utils.py (1)
246-258: Stale docstring — now also replaces{VECTOR_STORE_ID}.The docstring on line 247 only mentions
{MODEL}and{PROVIDER}, but the function now also substitutes{VECTOR_STORE_ID}. Update the docstring to reflect all three placeholders.Proposed fix
def replace_placeholders(context: Context, text: str) -> str: - """Replace {MODEL} and {PROVIDER} placeholders with actual values from context. + """Replace {MODEL}, {PROVIDER}, and {VECTOR_STORE_ID} placeholders with actual values from context. Parameters: - context (Context): Behave context containing default_model and default_provider - text (str): String that may contain {MODEL} and {PROVIDER} placeholders + context (Context): Behave context containing default_model, default_provider, and faiss_vector_store_id + text (str): String that may contain {MODEL}, {PROVIDER}, and {VECTOR_STORE_ID} placeholders Returns: String with placeholders replaced by actual values """🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/utils/utils.py` around lines 246 - 258, Update the docstring for the function replace_placeholders to list all placeholders it replaces: {MODEL}, {PROVIDER}, and {VECTOR_STORE_ID}; mention that it reads values from context.default_model, context.default_provider and context.faiss_vector_store_id, and update the "Parameters" / "Returns" description accordingly so the docstring accurately reflects current behavior.tests/e2e-prow/rhoai/configs/run.yaml (1)
140-144: Hardcodedvector_store_id— consider using${env.FAISS_VECTOR_STORE_ID}for consistency.All other config files use
${env.FAISS_VECTOR_STORE_ID}for the vector store ID. This prow config hardcodesvs_8c94967b-81cc-4028-a294-9cfac6fd9ae2with a TODO. If this environment is expected to use a different, fixed store ID, that's fine — but please confirm this is intentional and track the TODO.Would you like me to open an issue to track resolving the TODO on line 144?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e-prow/rhoai/configs/run.yaml` around lines 140 - 144, Replace the hardcoded vector_store_id in the vector_stores entry (currently "vs_8c94967b-81cc-4028-a294-9cfac6fd9ae2" and marked TODO) with the environment variable reference ${env.FAISS_VECTOR_STORE_ID} to match other configs; update any documentation or CI that supplies FAISS_VECTOR_STORE_ID and remove the TODO, or if the hardcoded ID is intentional, add a clear comment explaining why and keep the TODO tracked in an issue.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@tests/configuration/run.yaml`:
- Around line 97-120: The vector_io persistence references backend "kv_rag" but
storage.backends only defines "kv_default" and "sql_default", so add a "kv_rag"
backend entry under storage.backends to match other configs; create an entry
named kv_rag with type kv_sqlite (or the same type used in
run-ci/run-rhelai/run-vertexai) and set db_path to an appropriate KV_STORE_PATH
(e.g., ${env.KV_STORE_PATH:=~/.llama/storage/kv_rag.db}) so the backend name
kv_rag resolves at startup.
In `@tests/e2e-prow/rhoai/configs/run.yaml`:
- Around line 57-63: The config references backend: kv_rag in the vector_io
provider but storage.backends does not define kv_rag and kv_default still points
to the RAG-specific path; update storage.backends to add a kv_rag entry
(matching the rag path currently used) and change kv_default to use the generic
kv_store path (e.g., ~/.llama/storage/kv_store.db) so kv_default and kv_rag are
distinct, ensuring the provider_id/provider_type vector_io entries (provider_id:
faiss, provider_type: inline::faiss) will find kv_rag at runtime.
---
Duplicate comments:
In `@tests/e2e/configs/run-watsonx.yaml`:
- Around line 154-159: The YAML defines the FAISS vector store under the key
vector_stores but then re-declares vector_stores: [] which overwrites and
discards the registration; remove the duplicate empty key so the FAISS block
(embedding_dimension, embedding_model, provider_id, vector_store_id) is the only
vector_stores entry and the FAISS config is preserved.
In `@tests/e2e/features/environment.py`:
- Around line 86-87: context.faiss_vector_store_id is set via os.getenv(...)
which can be None and will break utils.replace_placeholders when calling
result.replace("{VECTOR_STORE_ID}", context.faiss_vector_store_id); change the
assignment in tests/e2e/features/environment.py to provide a safe string default
(e.g. context.faiss_vector_store_id = os.getenv("FAISS_VECTOR_STORE_ID", "") or
str(os.getenv("FAISS_VECTOR_STORE_ID") or "")) so replace_placeholders never
receives None, or alternatively update replace_placeholders to skip/handle None
values before calling result.replace; reference context.faiss_vector_store_id
and the replace_placeholders function when making the change.
---
Nitpick comments:
In `@tests/e2e-prow/rhoai/configs/run.yaml`:
- Around line 140-144: Replace the hardcoded vector_store_id in the
vector_stores entry (currently "vs_8c94967b-81cc-4028-a294-9cfac6fd9ae2" and
marked TODO) with the environment variable reference
${env.FAISS_VECTOR_STORE_ID} to match other configs; update any documentation or
CI that supplies FAISS_VECTOR_STORE_ID and remove the TODO, or if the hardcoded
ID is intentional, add a clear comment explaining why and keep the TODO tracked
in an issue.
In `@tests/e2e/utils/utils.py`:
- Around line 246-258: Update the docstring for the function
replace_placeholders to list all placeholders it replaces: {MODEL}, {PROVIDER},
and {VECTOR_STORE_ID}; mention that it reads values from context.default_model,
context.default_provider and context.faiss_vector_store_id, and update the
"Parameters" / "Returns" description accordingly so the docstring accurately
reflects current behavior.
Description
Since the metadata registry and the RAG db where saved in the same kv_store, the configuration used to generate the vector_store left a registration of shields which conflicted with the tests in prow.
I have regenerated the vector store and updated configs, dividing kv_store for rag and metadata.
Type of change
Tools used to create PR
NA
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
Chores
Tests