
Conversation


@GGrassia GGrassia commented Oct 9, 2025

Description

Added custom metadata on chunks and nodes, with the ability to filter at query time so the knowledge base can be narrowed to the relevant context, improving both precision and speed.
Metadata are stored as a JSON string and indexed.
The metadata_filter class supports AND, OR, and NOT operators, nested metadata_filter instances for chained or hierarchical filters, and `[ ... ]` arrays for matching multiple possible values of a single metadata key.
I will gladly help with bug fixing and further development.
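The filter shape described above could be sketched roughly as follows. This is a minimal stdlib sketch, not the PR's actual code: the real class is Pydantic-based, and the field names `operator`, `conditions`, and `filters` are assumptions for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class MetadataFilter:
    """Hypothetical shape: `operator` is one of "and" / "or" / "not",
    `conditions` maps a metadata key to a value or a list of accepted
    values, and `filters` holds nested MetadataFilter instances."""
    operator: str = "and"
    conditions: dict[str, Any] = field(default_factory=dict)
    filters: list["MetadataFilter"] = field(default_factory=list)

    def to_dict(self) -> dict:
        # Serialize recursively so the filter can be stored or sent as JSON.
        return {
            "operator": self.operator,
            "conditions": self.conditions,
            "filters": [f.to_dict() for f in self.filters],
        }


# A filter matching (department in {"hr", "legal"}) AND year == 2025,
# with a nested NOT filter excluding drafts.
flt = MetadataFilter(
    conditions={"department": ["hr", "legal"], "year": 2025},
    filters=[MetadataFilter(operator="not", conditions={"status": "draft"})],
)
as_json = json.dumps(flt.to_dict())
```

Nesting whole filter objects rather than flat condition lists is what makes chained or hierarchical filters expressible.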

Related Issues

As requested and discussed in issue #1985.

Changes Made

  • Added a Pydantic metadata_filter class
  • Added the metadata_filter class to all base query implementations for chunks
  • Added metadata management in chunk writing for Postgres
  • Added metadata as node properties for Neo4j
  • Added metadata filter building for postgres_impl and updated the chunk, entity, and relation queries to allow filtering
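For the Postgres side, the filter building mentioned above might look roughly like this: translating leaf conditions into a parameterized WHERE fragment over a JSONB `metadata` column. This is a sketch under assumptions; the column name, the `$n` placeholder style, and the function name are invented, not taken from the PR.

```python
from typing import Any


def build_where(conditions: dict[str, Any], params: list) -> str:
    """Translate leaf conditions into a WHERE fragment over a JSONB
    `metadata` column. Values go into `params` as placeholders; keys
    are assumed to come from trusted application code."""
    clauses = []
    for key, value in conditions.items():
        if isinstance(value, list):
            # A list means "any of these values" for one metadata key.
            ors = []
            for v in value:
                params.append(str(v))
                ors.append(f"metadata->>'{key}' = ${len(params)}")
            clauses.append("(" + " OR ".join(ors) + ")")
        else:
            params.append(str(value))
            clauses.append(f"metadata->>'{key}' = ${len(params)}")
    return " AND ".join(clauses)


params: list = []
sql = build_where({"department": ["hr", "legal"], "year": 2025}, params)
```

Keeping values out of the SQL string and in `params` is what lets the driver bind them safely; only keys, which the application controls, are interpolated.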

Checklist

  • Changes tested locally - (fully working in prod for our specific solution!)
  • Code reviewed
  • Documentation updated (if necessary)
  • Unit tests added (if applicable)


Giulio Grassia and others added 15 commits September 25, 2025 15:37
…querying

- Implement custom metadata insertion as node properties during file upload.
- Add basic metadata filtering functionality to query API

NOTE: While base.py has been modified, the base implementation is incomplete and untested. Only the Neo4j database has been properly implemented and tested.

WIP: Query API is temporarily mocked for debugging. Full implementation with complex AND/OR filtering capabilities is in development.

# Conflicts:
#	lightrag/base.py
#	lightrag/lightrag.py
#	lightrag/operate.py
Added a metadata filter dataclass for serializing and deserializing
complex filters to/from a JSON dict; added node filtering based on metadata
Added functioning (needs testing) metadata filtering on chunks for
queries. Fully implemented only on Postgres with pgvector and Neo4j
Added metadata management for chunks in querying for all vector DBs; ONLY
Postgres with pgvector has been fully implemented
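Since the Neo4j side stores metadata as node properties, filtering there can happen directly in the Cypher query. A minimal sketch of building such a query string is below; the `Chunk` label, the `md_` parameter prefix, and the function name are invented for illustration and are not the PR's actual code.

```python
def build_cypher_match(label: str, metadata: dict) -> str:
    """Build a MATCH clause filtering nodes by metadata properties.
    Values are bound via query parameters ($md_<key>) rather than
    inlined, so the driver handles escaping."""
    preds = " AND ".join(f"n.{k} = $md_{k}" for k in metadata)
    return f"MATCH (n:{label}) WHERE {preds} RETURN n"


q = build_cypher_match("Chunk", {"department": "hr", "year": 2025})
```

The accompanying parameter dict would then be passed to the Neo4j driver alongside the query (e.g. `{"md_department": "hr", "md_year": 2025}`).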

duynt88 commented Nov 10, 2025

Looking forward to this feature. Thank you so much for the effort.

@GGrassia (Author)

Looking forward to this feature. Thank you so much for the effort.

@duynt88 Thank you! It's being discussed because of a potential technical issue in data reliability, but if you pull the fork it's already working. With a large document base the issue becomes less and less prominent: we've reached >80% successful unstructured-data extraction (e.g. who's the executive manager for the xyz store, whether the X9000 certification is needed for a procedure, etc.) from a large corpus with the RAG alone, without any guardrails for the specific datum extracted, save for the metadata filtering that restricts the chunk pool to the documents we know might contain the datum. Give it a spin!
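The "restrict the chunk pool" idea described above can be illustrated with a toy in-memory filter; the chunk shape, field names, and example data are made up for the illustration and do not reflect LightRAG's internal structures.

```python
# Toy illustration: narrow the candidate chunks by metadata before any
# retrieval or ranking happens, so answers are drawn only from documents
# known to possibly contain the requested datum.
chunks = [
    {"text": "Store managers list ...", "metadata": {"doc_type": "org_chart"}},
    {"text": "X9000 certification steps ...", "metadata": {"doc_type": "procedure"}},
]


def restrict(chunks: list[dict], **wanted) -> list[dict]:
    """Keep only chunks whose metadata matches every requested key/value."""
    return [
        c for c in chunks
        if all(c["metadata"].get(k) == v for k, v in wanted.items())
    ]


pool = restrict(chunks, doc_type="procedure")
```

In the real system this narrowing happens at the database level (SQL/Cypher), which is where the speed benefit comes from; the Python version above only shows the selection logic.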


mkwl commented Jan 22, 2026

I looked over your changes and noticed that you have implemented a token tracking feature, which doesn't look like part of the metadata filtering work. While I am not a maintainer of this project, I recommend you split these two features into separate PRs.

@GGrassia (Author)

@mkwl thank you for taking a look! This happened because I forked to make a single feature change, worked on the main branch (which is the one I opened the PR from), and then had to add something else.
While none of this excuses my mixed PR, it happened because of two things:

  1. I exchanged ideas with some of the maintainers, and the implementation seems to be error-prone in some cases (especially with a small document corpus). While I haven't experienced problems firsthand, I understand their desire to keep the library as accurate as possible, so I don't think my PR is ever going to be merged. This led to:
  2. The changes I made are deployed in a production environment for a custom solution we built! So when I was asked to add token tracking, I did it quick and dirty for our specific use case and pipeline, without generalizing at all. Those changes were never meant to be merged, and that is an oversight on my part: since the architecture was not approved (but we already had the system in place), I started using my repo for my own work, and I even thought this PR had been closed at some point.

So now I ask: since neither of my changes is, or seems to be, beneficial here, should I close this PR? Or should I just branch off the other features and leave this one clean, so that if the maintainers find a way to reuse my code tomorrow, they have quick access to it?

@chikenGhost

Issue #2555 discusses a similar idea, and I think the implementation would be quite similar. Perhaps you could move your current branch's implementation into a separate PR?

MilindAPOl added a commit to VitalVector/VV_school_LightRAG-fork that referenced this pull request Jan 28, 2026
…fork

This merge brings in the metadata filtering capabilities from PR HKUDS#2187
which enables database-level filtering for PostgreSQL (pgvector) and Neo4j.

Key changes:
- Added MetadataFilter support in postgres_impl.py
- Added MetadataFilter support in neo4j_impl.py
- Updated query parameters to support metadata_filter
- Added token tracking functionality

Conflicts resolved by accepting GGrassia's version to preserve
metadata filtering implementation which is the core feature we need.

Source: https://github.com/GGrassia/LightRAG
Original PR: HKUDS#2187
