25 changes: 25 additions & 0 deletions docs/docs/Components/components-vector-stores.md

@@ -418,6 +418,31 @@ For more information, see the [Chroma documentation](https://docs.trychroma.com/

</details>

## CrateDB

This component creates a CrateDB Vector Store with search capabilities.
For more information, see the documentation about the
[CrateDB LangChain adapter](https://cratedb.com/docs/guide/integrate/langchain/).

### Inputs

| Name | Type | Description |
|----------------------------------|---------------|------------------------------------------------------------------|
| collection_name | String | The name of the collection. Default: "langflow". |
| search_query | String | The query to search for in the vector store. |
| ingest_data | Data | The data to ingest into the vector store (list of Data objects). |
| embedding | Embeddings | The embedding function to use for the vector store. |
| server_url | String | SQLAlchemy URL to connect to CrateDB. |
| search_type | String | Type of search to perform: "Similarity" or "MMR". |
| number_of_results                | Integer       | Number of results to return from the search. Default: 4.         |

### Outputs

| Name | Type | Description |
|----------------|--------------------|-------------------------------|
| vector_store | CrateDBVectorStore | CrateDB vector store instance |
| search_results | List[Data] | Results of similarity search |
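
The `server_url` input expects a CrateDB SQLAlchemy connection string. A minimal sketch of its shape (the remote host and credentials below are illustrative, not real defaults):

```python
# CrateDB SQLAlchemy URLs use the "crate://" scheme. A bare "crate://"
# (the component's fallback when server_url is empty) targets a local
# CrateDB instance on the default HTTP port 4200.
local_url = "crate://"

# Hypothetical remote cluster with credentials, for illustration only.
remote_url = "crate://admin:secret@cratedb.example.net:4200"
```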

## Elasticsearch

This component creates an Elasticsearch Vector Store with search capabilities.
1 change: 1 addition & 0 deletions pyproject.toml

@@ -97,6 +97,7 @@ dependencies = [
"langchain-ollama==0.2.1",
"langchain-sambanova==0.1.0",
"langchain-community~=0.3.20",
"langchain-cratedb<0.2",
"sqlalchemy[aiosqlite]>=2.0.38,<3.0.0",
"atlassian-python-api==3.41.16",
"mem0ai==0.1.34",
2 changes: 2 additions & 0 deletions src/backend/base/langflow/components/vectorstores/__init__.py

@@ -5,6 +5,7 @@
from .chroma import ChromaVectorStoreComponent
from .clickhouse import ClickhouseVectorStoreComponent
from .couchbase import CouchbaseVectorStoreComponent
from .cratedb import CrateDBVectorStoreComponent
from .elasticsearch import ElasticsearchVectorStoreComponent
from .faiss import FaissVectorStoreComponent
from .graph_rag import GraphRAGComponent
@@ -31,6 +32,7 @@
"ChromaVectorStoreComponent",
"ClickhouseVectorStoreComponent",
"CouchbaseVectorStoreComponent",
"CrateDBVectorStoreComponent",
"ElasticsearchVectorStoreComponent",
"FaissVectorStoreComponent",
"GraphRAGComponent",
90 changes: 90 additions & 0 deletions src/backend/base/langflow/components/vectorstores/cratedb.py

@@ -0,0 +1,90 @@
import typing as t

from langchain_cratedb import CrateDBVectorStore

from langflow.base.vectorstores.model import LCVectorStoreComponent, check_cached_vector_store
from langflow.helpers import docs_to_data
from langflow.io import HandleInput, IntInput, SecretStrInput, StrInput
from langflow.schema import Data
Comment on lines +5 to +8
⚠️ Potential issue: incorrect helper import path

docs_to_data lives in langflow.helpers.data, not the package root.

```diff
-from langflow.helpers import docs_to_data
+from langflow.helpers.data import docs_to_data
```

This avoids an `ImportError` in environments where `langflow.helpers.__init__` doesn't re-export the symbol.




class CrateDBVectorStoreComponent(LCVectorStoreComponent):
    display_name = "CrateDBVector"
    description = "CrateDB Vector Store with search capabilities"
    name = "CrateDB"
    icon = "CrateDB"

    inputs = [
        SecretStrInput(name="server_url", display_name="CrateDB SQLAlchemy URL", required=True),
        StrInput(name="collection_name", display_name="Table", required=True),
        *LCVectorStoreComponent.inputs,
        HandleInput(name="embedding", display_name="Embedding", input_types=["Embeddings"], required=True),
        IntInput(
            name="number_of_results",
            display_name="Number of Results",
            info="Number of results to return.",
            value=4,
            advanced=True,
        ),
    ]
Comment on lines +17 to +29

🛠️ Refactor suggestion: missing `search_type` input

`LCVectorStoreComponent`'s base helpers expect `self.search_type`, and tests set it, yet the input isn't declared here.
Add a `StrInput` with choices (`"Similarity"`, `"MMR"`, `"similarity_score_threshold"`) to prevent an `AttributeError` at runtime.
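
In the diff style of the other suggestions, a hedged sketch of what that declaration could look like (kwarg names beyond `name`/`display_name` are assumptions; depending on the langflow version, a `DropdownInput` with `options` may be the more idiomatic widget for a fixed choice set):

```diff
         *LCVectorStoreComponent.inputs,
         HandleInput(name="embedding", display_name="Embedding", input_types=["Embeddings"], required=True),
+        StrInput(
+            name="search_type",
+            display_name="Search Type",
+            info='Type of search to perform: "Similarity", "MMR", or "similarity_score_threshold".',
+            value="Similarity",
+            advanced=True,
+        ),
```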



    @check_cached_vector_store
    def build_vector_store(self) -> CrateDBVectorStore:
        documents = []
        for _input in self.ingest_data or []:
            if isinstance(_input, Data):
                documents.append(_input.to_lc_document())
            else:
                documents.append(_input)

        connection_string = self.server_url or "crate://"

        if documents:
            store = CrateDBVectorStore.from_documents(
                embedding=self.embedding,
                documents=documents,
                collection_name=self.collection_name,
                connection=connection_string,
            )
        else:
            store = CrateDBVectorStore.from_existing_index(
                embedding=self.embedding,
                collection_name=self.collection_name,
                connection=connection_string,
            )

        return store

    def search_documents(self) -> list[Data]:
        vector_store = self.build_vector_store()

        if self.search_query and isinstance(self.search_query, str) and self.search_query.strip():
            docs = vector_store.similarity_search(
                query=self.search_query,
                k=self.number_of_results,
            )

            data = docs_to_data(docs)
            self.status = data
            return data
        return []
Comment on lines +58 to +70
🛠️ Refactor suggestion: duplicated search logic bypasses MMR & score-threshold paths

This override always calls `similarity_search`, ignoring `search_type`, and duplicates logic already present in the base class.
Consider deleting the method entirely and relying on `LCVectorStoreComponent.search_documents`, or delegate:

```diff
-    def search_documents(self) -> list[Data]:
-        vector_store = self.build_vector_store()
-        ...
-            docs = vector_store.similarity_search(
-                query=self.search_query,
-                k=self.number_of_results,
-            )
+    # Remove this override; the base implementation already handles
+    # caching and dispatches to vector_store.search with the chosen
+    # search_type.
```

This instantly enables MMR and score-threshold searches without extra code.
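
The point is easy to see with a stand-in store: the base class dispatches on `search_type`, while the override above pins everything to `similarity_search`. A self-contained sketch of that dispatch (`FakeStore` is a stand-in, not the real `CrateDBVectorStore`, and the real base-class logic may differ in detail; only the two method names are real LangChain vector-store methods):

```python
class FakeStore:
    """Stand-in for a LangChain vector store; only the method names are real."""

    def similarity_search(self, query, k=4):
        return [f"sim:{query}:{k}"]

    def max_marginal_relevance_search(self, query, k=4):
        return [f"mmr:{query}:{k}"]


def dispatch_search(store, query, search_type="Similarity", k=4):
    # Roughly what the inherited search_documents does: pick the store
    # method based on search_type instead of hard-coding one of them.
    if search_type.lower() == "mmr":
        return store.max_marginal_relevance_search(query, k=k)
    return store.similarity_search(query, k=k)
```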




def cratedb_collection_to_data(embedding_documents: list[t.Any]):
    """Converts a collection of CrateDB vectors into a list of data.

    Args:
        embedding_documents (list): A list of EmbeddingStore instances.

    Returns:
        list: A list of data, where each record represents a document in the collection.
    """
    data = []
    for doc in embedding_documents:
        data_dict = {
            "id": doc.id,
            "text": doc.document,
        }
        data_dict.update(doc.cmetadata)
        data.append(Data(**data_dict))
Comment on lines +84 to +89
⚠️ Potential issue: guard against `None` metadata

`doc.cmetadata` can be `None`, and `dict.update(None)` raises a `TypeError`.

```diff
-    data_dict.update(doc.cmetadata)
+    if doc.cmetadata:
+        data_dict.update(doc.cmetadata)
```

    return data
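
The guard is easy to verify in isolation. `FakeDoc` below stands in for the real `EmbeddingStore` row, which exposes `id`, `document`, and a possibly-`None` `cmetadata`:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeDoc:
    """Stand-in for an EmbeddingStore row; attribute names mirror the source."""
    id: str
    document: str
    cmetadata: Optional[dict]


def flatten(doc: FakeDoc) -> dict:
    # Mirrors cratedb_collection_to_data's per-record logic, with the guard.
    record = {"id": doc.id, "text": doc.document}
    if doc.cmetadata:  # skips None (and empty dicts, harmlessly)
        record.update(doc.cmetadata)
    return record
```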