
Neo4j GraphRAG with GNN+LLM

Knowledge graph retrieval to improve multi-hop Q&A performance, optimized with GNN + LLM models.

This repo contains experiments that combine knowledge graph retrieval with GNN+LLM models to improve RAG, currently leveraging Neo4j, G-Retriever, and the STaRK-Prime dataset for benchmarking.

This work was presented at:

  • Stanford Graph Learning Workshop 2024: https://snap.stanford.edu/graphlearning-workshop-2024/
  • NVIDIA Technical Blog: https://developer.nvidia.com/blog/boosting-qa-accuracy-with-graphrag-using-pyg-and-graph-databases/

Architecture Overview

(Architecture diagram)

  • RAG on large knowledge graphs that require multi-hop retrieval and reasoning, beyond node classification and link prediction.
  • General, extensible 2-part architecture: KG Retrieval & GNN+LLM (a conceptual sketch follows this list).
  • Efficient, stable inference time and output for real-world use cases.
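
At a high level, the two parts compose as in the sketch below. This is a minimal conceptual outline only: retrieve_subgraph, answer, and model are illustrative placeholders, not this repo's actual API (see train.py and the retrieval configs for the real implementation).

```python
# Conceptual sketch of the 2-part architecture. All names below are
# illustrative placeholders, NOT this repo's actual API.

def retrieve_subgraph(question: str):
    """Part 1 -- KG Retrieval: query Neo4j for a candidate subgraph
    around the question's entities, then prune it to a compact,
    relevant subgraph (the repo uses a PCST-based pruning step)."""
    raise NotImplementedError  # placeholder

def answer(question: str, model) -> str:
    """Part 2 -- GNN+LLM: a GNN encodes the retrieved subgraph, and
    that encoding conditions a fine-tuned LLM, as in G-Retriever."""
    subgraph = retrieve_subgraph(question)
    return model.generate(question, subgraph)
```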

Installation

The database & dataset

Install the Neo4j database (and relevant JDK) by following official instructions. You'll also need the Neo4j GenAI plugin and the Neo4j Graph Data Science library.

With the database installed and running, you can load the STaRK-Prime dataset by running the notebook at data-loading/stark_prime_neo4j_loading.ipynb. Alternatively, restore the database dump hosted on AWS S3 (bucket gds-public-dataset, path stark-prime-neo4j523), built for database version 5.23.
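
Once the data is loaded, a quick sanity check with the official neo4j Python driver can confirm the graph is reachable. The connection details below are assumptions for a default local setup; adjust them to your deployment.

```python
from neo4j import GraphDatabase

# Assumed local connection details -- adjust to your deployment.
URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "password")

with GraphDatabase.driver(URI, auth=AUTH) as driver:
    driver.verify_connectivity()
    records, _, _ = driver.execute_query("MATCH (n) RETURN count(n) AS nodes")
    print(f"Node count: {records[0]['nodes']}")
```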

Other requirements

Install all required libraries from requirements.txt. Additionally, make sure huggingface-cli authentication is set up for access to the relevant gated models (Llama 2, Llama 3).
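
Before starting a long training run, you can verify the Hugging Face token is configured with a quick check via huggingface_hub (a dependency of the transformers stack). This snippet is a suggested convenience, not part of the repo:

```python
from huggingface_hub import whoami

# Raises an error if no valid token is configured
# (run `huggingface-cli login` first).
user = whoami()
print(f"Authenticated as: {user['name']}")
```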

Reproduce results

  1. To train a model with the default configurations, run the command below (a sketch for scripting several such runs follows this list):

     python train.py --checkpointing --llama_version llama3.1-8b --retrieval_config_version 0 --algo_config_version 0 --g_retriever_config_version 0 --eval_batch_size 4

  2. To get results for the pipeline, run eval_pcst_ordering.ipynb using the intermediate dataset and the trained G-Retriever model.
  3. To exactly reproduce the results in the table below, use the stanford-workshop-2024 branch. The main branch contains newer incremental changes and improvements.
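
To sweep over several configuration versions, the same command can be driven programmatically. The flags below mirror the train.py invocation above, but which version values exist is an assumption; check the repo's config files.

```python
import subprocess

# Hypothetical sweep over retrieval config versions; flags mirror the
# train.py invocation shown above. Valid version values are an assumption.
for retrieval_version in (0, 1):
    subprocess.run(
        [
            "python", "train.py",
            "--checkpointing",
            "--llama_version", "llama3.1-8b",
            "--retrieval_config_version", str(retrieval_version),
            "--algo_config_version", "0",
            "--g_retriever_config_version", "0",
            "--eval_batch_size", "4",
        ],
        check=True,  # fail fast if a run errors out
    )
```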

(Benchmark results table)

Additional Neo4j GraphRAG Resources
