Why do we keep the highest community level? #716

natoverse · 2024-07-25T19:42:57Z

natoverse
Jul 25, 2024
Maintainer

Copied from #600

Question text

(partial, see original issue for code snippets and screenshots)
I think we should keep the community id list, not the level.
Since the community id is required later when calculating the report weight, keeping the level does not seem to make sense.

Answer

The GraphRAG community hierarchies are generated with the Leiden algorithm, and communities at deeper levels of the hierarchy contain fewer, more closely connected entities. As the clusters become more tightly focused, more entities may be left out of every cluster at that level and therefore have no community assignment there. For example, an entity could have community 12 at the root level, community 324 at level 1, and no community at level 2 because it isn't close enough to any of the level 2 clusters.

The GraphRAG query methods include a community_level parameter that lets users specify which level of the hierarchy should be targeted for summarization. However, because not every entity has a community assignment at every level, we treat this value as the preferred maximum depth. If an entity does not have a community at that level, we move up one level at a time until we find an assignment.
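The fallback described above can be sketched in plain Python. This is only an illustration of the idea, not GraphRAG's implementation: the `assignments` mapping and the `resolve_community` helper are hypothetical names, and the sketch assumes the root level is numbered 0.

```python
from typing import Optional

# Hypothetical data for one entity: level -> community id.
# Mirrors the example above: community 12 at the root (assumed to be level 0),
# community 324 at level 1, and no assignment at level 2.
assignments = {0: 12, 1: 324}


def resolve_community(assignments: dict[int, int], community_level: int) -> Optional[int]:
    """Return the community at the deepest available level <= community_level."""
    # Start at the requested level and walk up toward the root.
    for level in range(community_level, -1, -1):
        if level in assignments:
            return assignments[level]
    # The entity has no community at the requested level or any level above it.
    return None


print(resolve_community(assignments, 2))  # no level 2 assignment, falls back to level 1
print(resolve_community(assignments, 0))  # exact match at the root level
```

Requesting level 2 for this entity yields its level 1 community, since that is the deepest assignment it has.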

In the code, this is achieved with two steps:

1. Filtering all entities to those at or below the max level:
   entity_df = _filter_under_community_level(entity_df, community_level)
2. Selecting the highest remaining community id for each entity:
   entity_df = entity_df.groupby(["name", "rank"]).agg({"community": "max"}).reset_index()

Step 2 works because the hierarchical clustering always assigns higher id numbers at deeper levels, so the maximum id among an entity's remaining rows belongs to its deepest community. The result is that each entity is resolved to its deepest available community, up to your requested depth.
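A small pandas sketch of the two steps, using made-up data. The `level` column name, the sample rows, and the `filter_under_community_level` stand-in are assumptions for illustration. The real `_filter_under_community_level` helper may be implemented differently.

```python
import pandas as pd

# Hypothetical entity table: one row per (entity, level) community assignment.
# Entity "A" has no community at level 2; entity "B" has one at every level.
entity_df = pd.DataFrame(
    {
        "name": ["A", "A", "B", "B", "B"],
        "rank": [5, 5, 3, 3, 3],
        "level": [0, 1, 0, 1, 2],
        "community": [12, 324, 7, 150, 980],
    }
)


def filter_under_community_level(df: pd.DataFrame, community_level: int) -> pd.DataFrame:
    # Stand-in for the helper named above; assumed to keep rows at or below the level.
    return df[df["level"] <= community_level]


community_level = 2

# Step 1: drop assignments deeper than the requested level.
filtered = filter_under_community_level(entity_df, community_level)

# Step 2: keep the highest remaining community id per entity. This relies on
# ids increasing with depth, so the max id is the deepest community.
result = filtered.groupby(["name", "rank"]).agg({"community": "max"}).reset_index()
print(result)
```

With `community_level = 2`, entity "A" resolves to its level 1 community (324) because it has nothing deeper, while "B" resolves to its level 2 community (980).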
