
Neo4j GraphRAG with GNN+LLM

Knowledge graph retrieval to improve multi-hop Q&A performance, optimized with GNN + LLM models.

This repo contains experiments that combine knowledge graph retrieval with GNN+LLM models to improve RAG, currently leveraging Neo4j, G-Retriever, and the STaRK-Prime dataset for benchmarking.

This work was presented at:

  • Stanford Graph Learning Workshop 2024: https://snap.stanford.edu/graphlearning-workshop-2024/
  • NVIDIA Technical Blog: https://developer.nvidia.com/blog/boosting-qa-accuracy-with-graphrag-using-pyg-and-graph-databases/

Architecture Overview

(Architecture diagram)

  • RAG on large knowledge graphs that require multi-hop retrieval and reasoning, beyond node classification and link prediction.
  • General, extensible 2-part architecture: KG Retrieval & GNN+LLM (a conceptual sketch follows this list).
  • Efficient, stable inference time and output for real-world use cases.
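
At a high level, the two parts compose as in the sketch below. This is a minimal conceptual outline only: retrieve_subgraph, answer, and model are illustrative placeholders, not this repo's actual API (see train.py and the retrieval configs for the real implementation).

```python
# Conceptual sketch of the 2-part architecture. All names below are
# illustrative placeholders, NOT this repo's actual API.

def retrieve_subgraph(question: str):
    """Part 1 -- KG Retrieval: query Neo4j for a candidate subgraph
    around the question's entities, then prune it to a compact,
    relevant subgraph (the repo uses a PCST-based pruning step)."""
    raise NotImplementedError  # placeholder

def answer(question: str, model) -> str:
    """Part 2 -- GNN+LLM: a GNN encodes the retrieved subgraph, and
    that encoding conditions a fine-tuned LLM, as in G-Retriever."""
    subgraph = retrieve_subgraph(question)
    return model.generate(question, subgraph)
```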

Installation

The database & dataset

Install the Neo4j database (and relevant JDK) by following official instructions. You'll also need the Neo4j GenAI plugin and the Neo4j Graph Data Science library.

With the database installed and running, you can load the STaRK-Prime dataset by running the notebook at data-loading/stark_prime_neo4j_loading.ipynb. Alternatively, restore the database dump hosted on AWS S3 (bucket gds-public-dataset, path stark-prime-neo4j523), built for database version 5.23.
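
Once the data is loaded, a quick sanity check with the official neo4j Python driver can confirm the graph is reachable. The connection details below are assumptions for a default local setup; adjust them to your deployment.

```python
from neo4j import GraphDatabase

# Assumed local connection details -- adjust to your deployment.
URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "password")

with GraphDatabase.driver(URI, auth=AUTH) as driver:
    driver.verify_connectivity()
    records, _, _ = driver.execute_query("MATCH (n) RETURN count(n) AS nodes")
    print(f"Node count: {records[0]['nodes']}")
```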

Other requirements

Install all required libraries from requirements.txt. Additionally, make sure huggingface-cli authentication is set up for access to the relevant gated models (Llama 2, Llama 3).
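
Before starting a long training run, you can verify the Hugging Face token is configured with a quick check via huggingface_hub (a dependency of the transformers stack). This snippet is a suggested convenience, not part of the repo:

```python
from huggingface_hub import whoami

# Raises an error if no valid token is configured
# (run `huggingface-cli login` first).
user = whoami()
print(f"Authenticated as: {user['name']}")
```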

Reproduce results

  1. To train a model with the default configurations, run the command below (a sketch for scripting several such runs follows this list):

     python train.py --checkpointing --llama_version llama3.1-8b --retrieval_config_version 0 --algo_config_version 0 --g_retriever_config_version 0 --eval_batch_size 4

  2. To get results for the pipeline, run eval_pcst_ordering.ipynb using the intermediate dataset and the trained G-Retriever model.
  3. To exactly reproduce the results in the table below, use the stanford-workshop-2024 branch. The main branch contains newer incremental changes and improvements.
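
To sweep over several configuration versions, the same command can be driven programmatically. The flags below mirror the train.py invocation above, but which version values exist is an assumption; check the repo's config files.

```python
import subprocess

# Hypothetical sweep over retrieval config versions; flags mirror the
# train.py invocation shown above. Valid version values are an assumption.
for retrieval_version in (0, 1):
    subprocess.run(
        [
            "python", "train.py",
            "--checkpointing",
            "--llama_version", "llama3.1-8b",
            "--retrieval_config_version", str(retrieval_version),
            "--algo_config_version", "0",
            "--g_retriever_config_version", "0",
            "--eval_batch_size", "4",
        ],
        check=True,  # fail fast if a run errors out
    )
```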

(Benchmark results table)

Additional Neo4j GraphRAG Resources
