Added Metadata filtering, scaffolding for all, fully implemented for Postgres (pgvector) and Neo4j #2187
…querying
- Implement custom metadata insertion as node properties during file upload.
- Add basic metadata filtering functionality to query API

NOTE: While the base.py file has been modified, the base implementation is incomplete and untested. Only Neo4j database has been properly implemented and tested.

WIP: Query API is temporarily mocked for debugging. Full implementation with complex AND/OR filtering capabilities is in development.

# Conflicts:
# lightrag/base.py
# lightrag/lightrag.py
# lightrag/operate.py
Added metadata filter dataclass for serializing and deserializing complex filters to and from a JSON dict; added node filtering based on metadata
Added functioning (needs testing) metadata filtering on chunks for query. Fully implemented only on Postgres with pgvector and Neo4j
Added metadata management for chunks in querying for all vector DBs; ONLY Postgres with pgvector has been fully implemented
…s for metadata filtering to optimize speed
Looking forward to this feature. Thank you so much for the effort.
@duynt88 Thank you! It's being discussed because of a potential technical issue with data reliability, but if you pull the fork it's already working. With a large document base the issue becomes less and less prominent: we've reached >80% successful unstructured data extraction (e.g. who's the executive manager for the xyz store, is the X9000 certification needed for this procedure, etc.) from a large codebase with the RAG only, without any guardrails for the specific datum extracted, save for the metadata filtering that restricts the chunk pool to the documents we know might contain the datum. Give it a spin!
…endpoints --and consequently pipeline
I looked over your changes and noticed that you have implemented a token tracing feature, which doesn't look like part of metadata filtering. While I am not a maintainer of this project, I recommend splitting these two features into two separate PRs.
@mkwl thank you for taking a look! This is because I forked to make a single feature change and worked on the main branch, which is the one I opened the PR from, and then had to add something else.
So now I ask you: since neither of my changes is or seems to be beneficial, should I close this PR? Or should I just branch the other features and leave it clean, so that if tomorrow the maintainers find a way to reuse my code they have quick access to it?
Issue #2555 discusses a similar idea, and I think the implementation would be quite alike. Perhaps you could move your current branch's implementation into a separate PR?
…fork

This merge brings in the metadata filtering capabilities from PR HKUDS#2187, which enables database-level filtering for PostgreSQL (pgvector) and Neo4j.

Key changes:
- Added MetadataFilter support in postgres_impl.py
- Added MetadataFilter support in neo4j_impl.py
- Updated query parameters to support metadata_filter
- Added token tracking functionality

Conflicts resolved by accepting GGrassia's version to preserve the metadata filtering implementation, which is the core feature we need.

Source: https://github.com/GGrassia/LightRAG
Original PR: HKUDS#2187
Description
Added custom metadata on chunks and nodes, with the option of filtering at query time to narrow the portion of the knowledge base that context is pulled from, improving both precision and speed.
Metadata are stored as a json string and indexe
The Metadata_filter class supports AND, NOT, and OR operators, nested Metadata_filter instances for chained or hierarchical filters, and [ ... ] arrays for multiple possible values of a single metadata key.
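To make the filter shape concrete for reviewers, here is a minimal sketch of how such a filter could be modelled, serialized to a JSON-friendly dict, and evaluated against one chunk's metadata. It is not the code in this PR: the class name `MetadataFilterSketch`, the `operator`/`operands` field names, and the choice that NOT means "none of the operands match" are all assumptions made for illustration.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class MetadataFilterSketch:
    """Hypothetical stand-in for the PR's filter class (names are assumptions)."""

    operator: str = "AND"  # one of "AND", "OR", "NOT"
    # Each operand is either a plain {key: value} condition dict
    # or another MetadataFilterSketch (nested filter).
    operands: list[Any] = field(default_factory=list)

    def to_dict(self) -> dict:
        """Serialize to a JSON-friendly dict, recursing into nested filters."""
        return {
            "operator": self.operator,
            "operands": [
                op.to_dict() if isinstance(op, MetadataFilterSketch) else op
                for op in self.operands
            ],
        }

    @classmethod
    def from_dict(cls, data: dict) -> MetadataFilterSketch:
        """Rebuild a filter; dicts carrying both 'operator' and 'operands'
        are treated as nested filters (a simplification of this sketch)."""
        operands = [
            cls.from_dict(op) if {"operator", "operands"} <= op.keys() else op
            for op in data.get("operands", [])
        ]
        return cls(operator=data.get("operator", "AND"), operands=operands)

    def matches(self, metadata: dict) -> bool:
        """Evaluate the filter in memory against one metadata dict."""
        results = [self._match_operand(op, metadata) for op in self.operands]
        if self.operator == "AND":
            return all(results)
        if self.operator == "OR":
            return any(results)
        if self.operator == "NOT":
            # Assumption: NOT is true when none of its operands match.
            return not any(results)
        raise ValueError(f"Unknown operator: {self.operator!r}")

    @staticmethod
    def _match_operand(op: Any, metadata: dict) -> bool:
        if isinstance(op, MetadataFilterSketch):
            return op.matches(metadata)
        for key, expected in op.items():
            actual = metadata.get(key)
            if isinstance(expected, list):
                # A list means "any of these values is acceptable".
                if actual not in expected:
                    return False
            elif actual != expected:
                return False
        return True


# Example: department is legal OR finance, AND status is NOT draft.
example_filter = MetadataFilterSketch(
    "AND",
    [
        {"department": ["legal", "finance"]},
        MetadataFilterSketch("NOT", [{"status": "draft"}]),
    ],
)
payload = json.dumps(example_filter.to_dict())
restored = MetadataFilterSketch.from_dict(json.loads(payload))
```

The sketch evaluates filters in Python only to show the semantics; the PR itself pushes filtering down to the database for Postgres (pgvector) and Neo4j, which this example does not attempt to reproduce.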
I will gladly cooperate on bug fixing and further development.
Related Issues
As requested and discussed in question/issue #1985
Changes Made
Checklist
Additional Notes