[Bug]: EMPTY community report in local search leads to no-use of community information #1391

Open
3 tasks done
LevickCG opened this issue Nov 10, 2024 · 1 comment
Labels
bug Something isn't working triage Default label assignment, indicates new issue needs reviewed by a maintainer

Comments


LevickCG commented Nov 10, 2024

Do you need to file an issue?

  • I have searched the existing issues and this bug is not already filed.
  • My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
  • I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.

Describe the bug

I’ve been exploring the codebase for GraphRAG and recently noticed that the community reports used for query augmentation in local search appear empty.

Below is a screenshot of intermediate debug output from pdb, taken at graphrag/query/structured_search/local_search/mixed_context.py:249, where you can see that the selected communities are empty:

Image

As a consequence, community_context_data is empty as well.

Image

In the final response, no community report information appears among the data sources.
Image

This leads to missing community structure information in local search, which seems to degrade GraphRAG's performance and creates a discrepancy between the code implementation and the paper.

Steps to reproduce

  1. Initialize the environment and create an index by following the official getting-started guide:

https://microsoft.github.io/graphrag/get_started/

You can use any raw text of your own as input.

  2. Create a debug Python file under your_path_to_graphrag/graphrag/graphrag/graphrag/cli/

cd ./graphrag/cli
touch debug_query.py

Add the code below to debug_query.py; it launches a local search for your query.

from query import run_local_search
from pathlib import Path

run_local_search(
    config_filepath=None,
    data_dir=Path("your_path_to_graphrag/graphrag/graphrag/ragtest/output"),  # change to your output path
    root_dir=Path("your_path_to_graphrag/graphrag/graphrag/ragtest"),  # change to your project root
    community_level=2,
    response_type="text",
    streaming=False,
    query="Any query here you want to ask",  # replace with your own query
)
  3. Add import pdb; pdb.set_trace() at graphrag/query/structured_search/local_search/mixed_context.py:254

  4. Run the code and print the debug info

python3 -m pdb debug_query.py

Execution stops at mixed_context.py:254. Print the selected communities there and you will see that the list is empty.

p selected_communities
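When the list comes back empty, a quick way to narrow it down is to compare the IDs being looked up against the keys of the collection they are looked up in. The helper below is a minimal sketch for that purpose; `summarize_id_overlap` and its argument names are hypothetical and not part of GraphRAG, and which attributes hold the two ID collections at the breakpoint is an assumption you would need to check in your own checkout.

```python
from collections import Counter


def summarize_id_overlap(lookup_ids, index_keys):
    """Report how many lookup IDs actually exist among the index keys.

    Hypothetical debugging helper: both arguments are plain iterables of
    hashable IDs (for example strings or ints).
    """
    lookup = list(lookup_ids)
    keys = list(index_keys)
    key_set = set(keys)
    matched = [i for i in lookup if i in key_set]
    return {
        "lookup_count": len(lookup),
        "index_count": len(keys),
        "matched_count": len(matched),
        # Differing types (e.g. int vs str) are a common cause of zero matches.
        "lookup_types": dict(Counter(type(i).__name__ for i in lookup)),
        "index_types": dict(Counter(type(k).__name__ for k in keys)),
        # A few raw values make differing ID formats visible at a glance.
        "lookup_sample": lookup[:3],
        "index_sample": keys[:3],
    }
```

A `matched_count` of zero with non-empty inputs suggests that the two sides use different ID formats or types, rather than that the index contains no reports.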

Expected Behavior

1. selected_communities should not be empty.

2. Accordingly, the community context should not be empty.

3. The local search response should list entities, relationships, and reports as data sources (reports are currently missing).

GraphRAG Config Used

# Default config from the getting-started guide
llm = "gpt-4o-mini"
embedding_model = "text-embedding-3-small"

Logs and screenshots

See images provided.

Additional Information

  • GraphRAG Version: v0.4.1
  • Operating System: macOS 15.0
  • Computer: MacBook Air m2, 2022
  • Python Version: 3.10.15
  • Related Issues: None
LevickCG added the bug and triage labels Nov 10, 2024
LevickCG changed the title from "[Bug]: EMPTY community report in local search" to "[Bug]: EMPTY community report in local search leads to no-use of community information" Nov 10, 2024
@LevickCG
Author

After further investigation, I identified the root cause of this issue: a mismatch between UUIDs and human-readable IDs in the search process.

Image
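To illustrate how this kind of mismatch can silently produce an empty result, here is a minimal, self-contained sketch. It is not GraphRAG's code: the `Report` class, its `id` and `short_id` fields, and `select_reports` are hypothetical, and which side holds the UUID and which the human-readable ID in the real code path is an assumption made only for the example.

```python
from dataclasses import dataclass


@dataclass
class Report:
    id: str        # UUID-style identifier (hypothetical field)
    short_id: str  # human-readable identifier (hypothetical field)
    title: str


def select_reports(entity_community_ids, reports_by_key):
    """Keep only the reports whose key appears in the entities' community IDs."""
    return [reports_by_key[c] for c in entity_community_ids if c in reports_by_key]


reports = [
    Report(id="3f2b6c1e-0000-4000-8000-000000000001", short_id="0", title="Community 0"),
    Report(id="3f2b6c1e-0000-4000-8000-000000000002", short_id="1", title="Community 1"),
]

# Assumed for illustration: entities refer to communities by human-readable ID.
entity_community_ids = ["0", "1"]

by_uuid = {r.id: r for r in reports}            # keyed by UUID: no key ever matches
by_short_id = {r.short_id: r for r in reports}  # keyed consistently with the entities

mismatched = select_reports(entity_community_ids, by_uuid)
consistent = select_reports(entity_community_ids, by_short_id)

print(len(mismatched))  # 0
print(len(consistent))  # 2
```

Because the membership test filters out unknown keys instead of raising, the mismatch produces no error, only an empty list, which matches the symptom described above.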

I plan to work on a fix and submit a pull request. I’d appreciate any feedback or guidance from the maintainers to ensure my approach aligns with the project’s design principles.
