Is GraphRAG community detection parameterless? #683
GraphRAG does exactly the hierarchical summarization you describe, but the communities are generated from the extracted entity-relationship graph. This is done hierarchically using Leiden, and for any given entity you can trace the parentage through to the root (see create_final_nodes.parquet; for the community summaries, join with create_final_community_reports.parquet). However, this is all part of GraphRAG's indexing, and it does not operate on pre-defined content clusters that you may want to define. As for parameters, we use the hierarchical implementation from graspologic, and include an env var […]. I think you are also asking about automatic detection of parents based on content analysis, which GraphRAG is not designed to do.
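To make "trace the parentage through to the root" concrete, here is a minimal sketch of walking from an entity's most specific community up through its parents. It uses toy in-memory rows, not GraphRAG's actual tables: `ClusterRow`, `parent_map`, and `lineage` are hypothetical names, and the fields (`node`, `cluster`, `parent_cluster`, `level`) are assumed to resemble what a hierarchical Leiden run reports, with level 0 as the coarsest level and cluster ids unique across levels. Check the real column names in your parquet output before adapting it.

```python
from typing import NamedTuple, Optional


class ClusterRow(NamedTuple):
    """Hypothetical stand-in for one row of hierarchical clustering output."""
    node: str                      # entity name
    cluster: int                   # community id (assumed unique across levels)
    parent_cluster: Optional[int]  # None for root-level communities
    level: int                     # 0 = coarsest level (assumption)


def parent_map(rows):
    """Map each community id to its parent community id (None at the root)."""
    return {row.cluster: row.parent_cluster for row in rows}


def lineage(entity, rows):
    """Return the entity's community ids, most specific first, root last."""
    parents = parent_map(rows)
    own = [row for row in rows if row.node == entity]
    if not own:
        return []
    # Start from the deepest community the entity belongs to.
    current = max(own, key=lambda row: row.level).cluster
    chain = []
    while current is not None:
        chain.append(current)
        current = parents.get(current)
    return chain


# Toy data: communities 2 and 3 are children of root community 0.
rows = [
    ClusterRow("ALICE", 0, None, 0),
    ClusterRow("BOB", 0, None, 0),
    ClusterRow("CAROL", 1, None, 0),
    ClusterRow("ALICE", 2, 0, 1),
    ClusterRow("BOB", 3, 0, 1),
]

print(lineage("ALICE", rows))  # [2, 0]
```

With real data, each id in the returned chain could then be looked up in the community reports table to get the summary at that level.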
Suppose I have a corpus of documents whose content I want to cluster before summarising. There are an indeterminate number of parent clusters, and each parent may in turn have several tributary child clusters (content that is broadly similar but differs slightly in the details). I would like to identify both parent and (if they exist) child clusters, and generate LLM summaries for each.
Would this be an appropriate application for GraphRAG? I'm aware, for example, that GraphRAG can use community detection to identify clustered concepts. Is GraphRAG able to identify parent and child communities? If so, would I have any influence on the number of communities it detects or is that more of an automatic process?