-
@dkindlund we are working on the new Indexing API from LangChain (https://js.langchain.com/docs/modules/data_connection/indexing/). This will let you keep track of which embeddings have been upserted and avoid duplication. Exactly as you described, the split docs, with their content and metadata, will be hashed and compared against the new ones every time an upsert happens.
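To make the hash-and-compare idea concrete, here is a minimal sketch. It is not the LangChain Indexing API or any Flowise code: the `Doc` shape, `hashDoc`, and `SeenHashes` are hypothetical names made up for illustration, and the record of hashes lives only in memory, where a real setup would need a persistent store.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of a split document; not the actual LangChain or Flowise type.
interface Doc {
  pageContent: string;
  metadata: Record<string, string>;
}

// Hash content plus metadata. Metadata keys are sorted first so that
// key order alone never changes the resulting hash.
function hashDoc(doc: Doc): string {
  const sortedMeta = Object.keys(doc.metadata)
    .sort()
    .map((key) => [key, doc.metadata[key]]);
  return createHash("sha256")
    .update(JSON.stringify([doc.pageContent, sortedMeta]))
    .digest("hex");
}

// Hypothetical in-memory record of which document hashes were already upserted.
class SeenHashes {
  private seen = new Set<string>();

  // Returns only the docs whose hash has not been recorded yet, and records them.
  filterNew(docs: Doc[]): Doc[] {
    const fresh: Doc[] = [];
    for (const doc of docs) {
      const hash = hashDoc(doc);
      if (!this.seen.has(hash)) {
        this.seen.add(hash);
        fresh.push(doc);
      }
    }
    return fresh;
  }
}
```

With something like this in front of the vector store, only the documents returned by `filterNew` would need to be embedded and upserted on each run; unchanged documents are skipped, and an edited document gets a new hash and goes through again.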
-
It seems like the general DocumentLoader logic is quite "basic": regardless of which vector store is used, every time you run an upsert operation, the automation loads every matching document again. This presents some challenges:
It would be great if Flowise had some sort of general mechanism to keep track of:
@HenryHengZJ, I'm not sure if any of these ideas have already been discussed or are already on a future roadmap. If so, any pointers to existing conversations would be helpful, as I couldn't find any after searching existing issues and discussions.
While I could try to implement a mechanism specific to the Airtable DocumentLoader node, I think figuring out how to implement this sort of tracking mechanism at a more generic level would benefit every combination of DocumentLoader and Vector Store instead.
After thinking about this further, it feels like each DocumentLoader needs some sort of datastore-specific "callback" mechanism. The gist would be something like:
Note:
Unfortunately, I do not know enough about how the general Flowise DocumentLoader to Vector Store data path works to implement something generic. I also wonder if there are fundamental limitations within LangChain itself that would prevent us from implementing this capability within Flowise. In other words, I wonder if this mechanism would need, at least partially, some code improvement at the LangChain level in order to be implemented correctly.