GraphRAG - vector search #57

jexp · 2024-10-05T22:05:20Z

Thanks for adding GraphRAG to RAGBuilder.

I had some questions and suggestions, perhaps you want to chat some time.

QQ: in graphrag.full_retriever you fetch the vector store data but don't use it in the method or the returns, looks redundant?

        def full_retriever(question: str):
            graph_data = graph_retriever(question)
            vector_data = [el.page_content for el in vector_retriever.invoke(question)]
            final_data = f'''Graph data:
        {graph_data}
            '''
            return final_data

You don't use the built-in Neo4j vector search, only the fulltext index. With vector search you could enable in-graph vector and hybrid search (you can create vector indexes for chunks in the lexical graph, for entities in the domain graph, and for communities in the topical structures).
Right now the graph retriever only uses the direct neighbourhood of the nodes; the neighbourhood size could be a good hyperparameter to add.
E.g. we have a number of different retrievers in the llm-graph-builder, see: https://github.com/neo4j-labs/llm-graph-builder/blob/DEV/backend/src/shared/constants.py
I saw you copied some code from the neo4j-langchain integrations. Was there a reason (i.e. did you make modifications)? If so, it might be good to discuss contributing them back upstream.
There is the option to run clustering algorithms to generate cross-document topic summaries across the entity graphs (like in the MSFT GraphRAG paper), see https://neo4j.com/developer-blog/global-graphrag-neo4j-langchain/ (we've also implemented that in https://llm-graph-builder.neo4jlabs.com if you have a Graph Data Science enabled database).

We have documented more GraphRAG patterns here, in case you want to contribute your RAG patterns to the catalogue or provide some feedback:

aravind10x · 2024-10-08T04:55:08Z

Hi @jexp, thanks for your questions & thoughts!

@ashwinzyx - perhaps, you can take a look once you're back.

ashwinzyx · 2024-10-16T08:44:16Z

Hi @jexp, thanks for looking at our repo. Apologies for the delay. Just got back from vacation.

QQ: in graphrag.full_retriever you fetch the vector store data but don't use it in the method or the returns, looks redundant?

        def full_retriever(question: str):
            graph_data = graph_retriever(question)
            vector_data = [el.page_content for el in vector_retriever.invoke(question)]
            final_data = f'''Graph data:
        {graph_data}
            '''
            return final_data

[Ans] Yes. It looks like vector_data is not used for the Graph RAG, only for the Hybrid RAG. We will remove it.
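One way the cleanup could look is sketched below: the vector store is only queried when its output actually ends up in the returned text. This is a sketch under assumptions, not the RAGBuilder implementation. `make_full_retriever`, `vector_lookup`, the "Vector data:" label, and the `fake_*` stand-ins are hypothetical names introduced for illustration.

```python
from typing import Callable, List, Optional


def make_full_retriever(
    graph_retriever: Callable[[str], str],
    vector_lookup: Optional[Callable[[str], List[str]]] = None,
) -> Callable[[str], str]:
    """Build a retriever that skips the vector store unless its output is used.

    `vector_lookup` is a hypothetical hook: pass None for the Graph RAG
    template, or a function returning chunk texts for a hybrid template.
    """

    def full_retriever(question: str) -> str:
        sections = [f"Graph data:\n{graph_retriever(question)}"]
        if vector_lookup is not None:
            # Only pay for the vector query when the result is returned.
            chunks = vector_lookup(question)
            sections.append("Vector data:\n" + "\n".join(chunks))
        return "\n\n".join(sections)

    return full_retriever


# Stand-in retrievers so the sketch runs without a database (assumptions).
def fake_graph(question: str) -> str:
    return "Alice - WORKS_AT -> Acme"


def fake_vector(question: str) -> List[str]:
    return ["chunk one", "chunk two"]


graph_only = make_full_retriever(fake_graph)
hybrid = make_full_retriever(fake_graph, fake_vector)
```

In the real code, `vector_lookup` could wrap `vector_retriever.invoke(question)` and extract `page_content`, as the quoted snippet does.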

You don't use the built-in Neo4j vector search, only the fulltext index. With vector search you could enable in-graph vector and hybrid search (you can create vector indexes for chunks in the lexical graph, for entities in the domain graph, and for communities in the topical structures).

[Ans] We have been using Chroma for vector search in the templates.
I do see hybrid search options in the examples below.
https://python.langchain.com/docs/integrations/vectorstores/neo4jvector/
https://neo4j.com/labs/genai-ecosystem/langchain/

I believe the post below uses in-graph vector search. Am I right? Is there a full example you can share for in-graph vector search?
https://neo4j.com/developer-blog/global-graphrag-neo4j-langchain/
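As a rough illustration of what in-graph vector and hybrid search involve, the sketch below builds the two Cypher queries (using Neo4j's `db.index.vector.queryNodes` and `db.index.fulltext.queryNodes` procedures) and merges their results in Python. It is a sketch under assumptions, not the linked blog's code: the `text` property, the helper names, and the merge rule (scale fulltext scores by their maximum, then keep the best score per text) are choices made here for illustration.

```python
from typing import Any, Dict, List, Tuple

Hit = Tuple[str, float]  # (chunk text, score)


def vector_query(index: str, embedding: List[float], k: int = 5) -> Tuple[str, Dict[str, Any]]:
    """Cypher and parameters for a k-nearest-neighbour lookup on a vector index."""
    cypher = (
        "CALL db.index.vector.queryNodes($index, $k, $embedding) "
        "YIELD node, score "
        "RETURN node.text AS text, score"  # `text` property is an assumption
    )
    return cypher, {"index": index, "k": k, "embedding": embedding}


def fulltext_query(index: str, query: str, k: int = 5) -> Tuple[str, Dict[str, Any]]:
    """Cypher and parameters for a keyword lookup on a fulltext index."""
    cypher = (
        "CALL db.index.fulltext.queryNodes($index, $query) "
        "YIELD node, score "
        "RETURN node.text AS text, score LIMIT $k"
    )
    return cypher, {"index": index, "query": query, "k": k}


def merge_hybrid(vector_hits: List[Hit], fulltext_hits: List[Hit], k: int = 5) -> List[Hit]:
    """Combine both result lists, keeping the best score per text.

    Fulltext scores are unbounded, so they are divided by their maximum
    to bring them onto a 0..1 scale before comparison.
    """
    top = max((score for _, score in fulltext_hits), default=0.0)
    best: Dict[str, float] = {}
    for text, score in vector_hits:
        best[text] = max(best.get(text, 0.0), score)
    for text, score in fulltext_hits:
        scaled = score / top if top > 0 else 0.0
        best[text] = max(best.get(text, 0.0), scaled)
    return sorted(best.items(), key=lambda item: item[1], reverse=True)[:k]
```

Each `(cypher, params)` pair would be passed to a Neo4j session or driver call; the rows from both queries then go through `merge_hybrid`.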

Right now the graph retriever only uses the direct neighbourhood of the nodes; the neighbourhood size could be a good hyperparameter to add.

[Ans] For now we have added GraphRAG as a template. We will include these as individual components with a hyperparameter tuning option.
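A minimal sketch of how the neighbourhood size could be exposed as a tunable value is shown below. It assumes entities carry an `__Entity__` label and an `id` property; the function name, the `max_hops` parameter, and the upper bound of 5 are hypothetical choices for illustration. The hop count is validated and written into the pattern, while the entity id and result limit stay as query parameters.

```python
def neighbourhood_query(max_hops: int = 1) -> str:
    """Return Cypher that collects nodes within `max_hops` of an entity.

    max_hops=1 reproduces the direct-neighbourhood behaviour; larger
    values widen the retrieved context at the cost of more results.
    """
    # Reject anything that is not a small positive integer, since the
    # value is formatted into the query text.
    if isinstance(max_hops, bool) or not isinstance(max_hops, int):
        raise ValueError("max_hops must be an integer")
    if not 1 <= max_hops <= 5:
        raise ValueError("max_hops must be between 1 and 5")
    return (
        "MATCH (e:__Entity__ {id: $entity_id})"  # label and property are assumptions
        f"-[*1..{max_hops}]-(neighbour) "
        "RETURN DISTINCT neighbour.id AS id "
        "LIMIT $limit"
    )
```

A tuner could then treat `max_hops` like any other hyperparameter, for example by trying the values 1, 2 and 3 and comparing answer quality.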

E.g. we have a number of different retrievers in the llm-graph-builder, see: https://github.com/neo4j-labs/llm-graph-builder/blob/DEV/backend/src/shared/constants.py

[Ans] Thanks for the pointer. Will take a look

I saw you copied some code from the neo4j-langchain integrations. Was there a reason (i.e. did you make modifications)? If so, it might be good to discuss contributing them back upstream.

[Ans] No. We did not make any modifications.

There is the option to run clustering algorithms to generate cross-document topic summaries across the entity graphs (like in the MSFT GraphRAG paper), see https://neo4j.com/developer-blog/global-graphrag-neo4j-langchain/ (we've also implemented that in https://llm-graph-builder.neo4jlabs.com/ if you have a Graph Data Science enabled database).

[Ans] Thanks. Will take a look.

Thanks for all your feedback. It would be great to chat sometime. We want to improve the GraphRAG option in RAGBuilder and would love your contributions as well.

aravind10x · 2024-10-18T15:25:17Z

@jexp - can you please review @ashwinzyx's comments? Do you have any further thoughts or suggestions? Please feel free to suggest changes or raise a PR to make the Graph RAG part of RAGBuilder even better.

jexp · 2024-10-24T22:45:09Z

@aravind10x it would probably be good to have a chat with me and @tomasonjo at some point; it's harder to go through these in GH issues :)

aravind10x assigned ashwinzyx Oct 8, 2024
