Background
LightRAG currently merges entities solely based on exact name matches (including captions). This results in multiple disconnected nodes for the same entity under different names, and may even create isolated subgraphs for identical entities, ultimately degrading query performance.
Automated Entity Merging for Variant Names
To address this, we propose an automated entity merging approach for differently named but identical entities:
- Vector Node Database Utilization:
  - Modify the node vector DB implementation to store an embedding of each entity name.
- Similarity Threshold Configuration:
  - Set a minimum cosine similarity threshold (e.g., 0.8) for candidate selection.
- Candidate Retrieval:
  - During merging, retrieve up to the 10 most similar nodes by cosine similarity, keeping only those above the threshold.
- LLM-Based Merge Validation:
  - Submit the current entity's name and description, along with the candidates' names and descriptions, to an LLM.
  - Task the LLM to:
    - Determine whether merging is justified.
    - If so, select the best candidate for merging and return the consolidated entity name and description.
- Iterative Merging With Depth Limitation (optional):
  - Repeat the merge validation process for the newly consolidated entity returned by the LLM, up to a configured maximum depth.
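The steps above can be sketched roughly as follows. This is a minimal illustration, not LightRAG's actual API: `retrieve_candidates`, `merge_entity`, `embed`, and `judge` are hypothetical names, a plain dict stands in for the node vector DB, and `judge` stands in for the LLM call. It also assumes the current entity has not yet been written to the store, and that `max_depth=3` is a reasonable default, which the proposal does not specify.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

Vector = List[float]
# (absorbed candidate name, consolidated name, consolidated description),
# or None when the judge decides not to merge.
Decision = Optional[Tuple[str, str, str]]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_candidates(
    query: Vector,
    name_vectors: Dict[str, Vector],  # stand-in for the node vector DB
    threshold: float = 0.8,
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Return up to top_k (name, score) pairs at or above the threshold, best first."""
    scored = [(n, cosine_similarity(query, v)) for n, v in name_vectors.items()]
    kept = [(n, s) for n, s in scored if s >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


def merge_entity(
    name: str,
    description: str,
    embed: Callable[[str], Vector],  # hypothetical embedding function
    name_vectors: Dict[str, Vector],
    descriptions: Dict[str, str],
    judge: Callable[[str, str, List[Tuple[str, str]]], Decision],  # stand-in for the LLM
    max_depth: int = 3,  # assumed depth limit
) -> Tuple[str, str]:
    """Merge the entity with similar stored entities, at most max_depth times."""
    for _ in range(max_depth):
        candidates = retrieve_candidates(embed(name), name_vectors)
        if not candidates:
            break
        decision = judge(
            name, description, [(n, descriptions[n]) for n, _ in candidates]
        )
        if decision is None:
            break
        # Continue with the consolidated entity and drop the absorbed one.
        absorbed, name, description = decision
        name_vectors.pop(absorbed, None)
        descriptions.pop(absorbed, None)
    # Store the final (possibly consolidated) entity.
    name_vectors[name] = embed(name)
    descriptions[name] = description
    return name, description
```

In a real integration, `judge` would prompt the LLM with the current entity and the candidate list and parse its answer into either `None` or the consolidated name and description, and any edges attached to the absorbed node would need to be re-pointed at the merged node, which this sketch leaves out.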