How to refresh and index document based on metadata? #15127
Replies: 1 comment 18 replies
-
Hey @d3buggerdan! I'm a bot here to help you with any bugs, questions, or contributions. Let's tackle this issue together! To refresh and index a document based on metadata without needing to refresh the entire index, you can use the Here is the relevant code snippet: def refresh_ref_docs(
self, documents: Sequence[Document], **update_kwargs: Any
) -> List[bool]:
"""Refresh an index with documents that have changed."""
with self._callback_manager.as_trace("refresh"):
pipeline_id = self._get_pipeline_id()
upserted_documents = self._client.pipelines.upsert_batch_pipeline_documents(
pipeline_id=pipeline_id,
request=[
CloudDocumentCreate(
text=doc.text,
metadata=doc.metadata,
excluded_embed_metadata_keys=doc.excluded_embed_metadata_keys,
excluded_llm_metadata_keys=doc.excluded_llm_metadata_keys,
id=doc.id_,
)
for doc in documents
],
)
doc_ids = [doc.id for doc in upserted_documents]
self._wait_for_documents_ingestion(
doc_ids, verbose=True, raise_on_error=True
)
return [True] * len(doc_ids) This method checks if the document already exists in the document store and whether its content has changed. If the document is new or has changed, it updates the document in the index. This way, you can refresh and index documents based on their metadata without needing to refresh the entire index. The |
Beta Was this translation helpful? Give feedback.
-
So, lets say I have a user base of people with their own documents that get indexed with the metadata having a unique userid and docid for each document and embedding to filter through. Lets say one user has an updated document, meaning I only need to refresh the index related to the vectors related to the proper userid and docid in the metadata. Is there anyway to do this without needing to refresh the whole index because then it defeats the point. From my understanding the only way to refresh an index is to have your initial and new document and refresh the whole index, but this defeats the point if the same index contains documents from multiple users. Thanks.
Beta Was this translation helpful? Give feedback.
All reactions