Proto-RAG: Retrieval-Augmented Generation with Neo4j and OpenAI

Proto-RAG is a Retrieval-Augmented Generation (RAG) system that uses Neo4j as a knowledge graph and OpenAI's GPT-3.5 to answer questions from the data stored in that graph. LangChain connects the knowledge graph to the language model.

Features

Knowledge Graph Integration: Utilizes Neo4j to store and query data.
Natural Language Processing: Uses OpenAI's GPT-3.5 for generating responses.
Dynamic Cypher Query Generation: Converts natural language questions into Cypher queries to fetch relevant data from Neo4j.
Interactive Command-Line Interface: Allows users to ask questions and get responses interactively.
PDF Parsing: Extracts text from PDF files.
Text Chunking with Metadata: Splits text into manageable chunks and attaches metadata to each one.
JSON Saving: Saves parsed and chunked text to JSON files.
RAG Pipeline: Retrieve data from Neo4j and generate summaries using OpenAI.
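
The chunking and JSON-saving features can be illustrated with a minimal sketch. It does not reproduce the code in `text_chunker.py` or `json_saver.py`. The function names, the word-based splitting strategy, the overlap parameter, and the metadata fields are all assumptions chosen for illustration.

```python
import json
from pathlib import Path


def chunk_text(text, source, chunk_size=200, overlap=20):
    """Split text into overlapping word-based chunks tagged with metadata.

    Hypothetical helper: the real chunker may split on characters,
    sentences, or tokens instead of words.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for index, start in enumerate(range(0, len(words), step)):
        piece = words[start:start + chunk_size]
        chunks.append({
            "text": " ".join(piece),
            "metadata": {
                "source": source,
                "chunk_index": index,
                "start_word": start,
                "word_count": len(piece),
            },
        })
        # Stop once the window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


def save_chunks(chunks, path):
    """Write the chunk list to a UTF-8 JSON file (hypothetical helper)."""
    Path(path).write_text(
        json.dumps(chunks, indent=2, ensure_ascii=False), encoding="utf-8"
    )
```

Calling `chunk_text(extracted_text, "report.pdf")` and passing the result to `save_chunks(chunks, "report.json")` would produce one JSON file per document. The overlap keeps some shared context between neighbouring chunks, which is a common choice but not something the project is confirmed to do.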

Project Structure

proto-rag/
│
├── 📂 .github/workflows/
├── 📂 notebooks/
├── 📂 tests/
├── 📂 proto_rag/
│   ├── 📄 __init__.py
│   ├── 📂 utils/
│   │   ├── 📄 __init__.py
│   │   ├── 📄 pdf_parser.py
│   │   ├── 📄 text_chunker.py
│   │   ├── 📄 json_saver.py
│   │   ├── 📄 file_handler.py
│   │   ├── 📄 neo4j_handler.py
│   │   ├── 📄 openai_handler.py
│   │   └── 📄 rag_handler.py
│   └── 📄 main.py
├── 📄 .env                                 (UNTRACKED)
├── 📄 requirements.txt
├── 📂 venv/
├── 📄 Dockerfile (to be implemented)
└── 📄 .gitignore

Getting Started

Prerequisites

Python 3.8+
Neo4j Database
OpenAI API Key

The required Python packages are listed in requirements.txt.

Installation

Clone the repository:

```
git clone https://github.com/your-username/proto-rag.git
cd proto-rag
```

Create a virtual environment:

```
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
```

Install dependencies:
```
pip install -r requirements.txt
```

Set up environment variables: Create a .env file in the root directory with the following content:

```
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=your_neo4j_password
OPENAI_API_KEY=your_openai_api_key
```
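
A minimal sketch of how these four variables could be read and validated at startup. It assumes the values are already present in the process environment (for example, loaded from `.env` by a tool such as python-dotenv). The source does not say how the project loads them, and `load_settings` is a hypothetical name.

```python
import os

# The four variable names come from the .env example above.
REQUIRED_VARS = ("NEO4J_URI", "NEO4J_USER", "NEO4J_PASSWORD", "OPENAI_API_KEY")


def load_settings(env=None):
    """Return the required settings as a dict, failing early if any is unset.

    `env` defaults to os.environ; passing a plain dict makes the
    function easy to test without touching the real environment.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Failing at startup with the list of missing names tends to be easier to debug than a connection error raised later by the Neo4j or OpenAI client.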

Usage

Start Neo4j: Ensure your Neo4j database is running.
Run the Main Script:
```
python -m proto_rag.main
```
Interact with the System: You can now ask questions related to the data stored in your Neo4j knowledge graph. For example:
```
> What is CAD?
```
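
The flow behind a question like the one above (question, then Cypher query, then graph rows, then generated answer) can be sketched as below. This is not the project's implementation, which uses LangChain. The three callables are hypothetical stand-ins for the LLM and Neo4j calls, and the stub query, node label, and returned row are placeholders rather than real data.

```python
def answer_question(question, generate_cypher, run_query, summarize):
    """Run one retrieval-augmented round trip.

    generate_cypher(question) -> Cypher string   (LLM call in a real system)
    run_query(cypher)         -> list of dicts   (Neo4j call in a real system)
    summarize(question, rows) -> answer string   (LLM call in a real system)
    """
    cypher = generate_cypher(question)
    rows = run_query(cypher)
    if not rows:
        return "No matching data found in the knowledge graph."
    return summarize(question, rows)


# Stubs so the sketch runs without Neo4j or an API key.
def fake_generate_cypher(question):
    # Hypothetical label and property names.
    return "MATCH (t:Term {name: 'CAD'}) RETURN t.definition AS definition"


def fake_run_query(cypher):
    return [{"definition": "<definition stored in the graph>"}]


def fake_summarize(question, rows):
    return f"{question} -> {rows[0]['definition']}"
```

Passing the steps in as functions keeps the control flow testable offline. A real run would swap the stubs for calls backed by Neo4j and OpenAI.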

Utility functions to parse PDFs and populate your graph are also available.
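
As a rough illustration of populating the graph, the sketch below builds a parameterized Cypher statement for a batch of text chunks. The `Chunk` label, its properties, the id format, and the input dict shape are assumptions, not the schema used by `neo4j_handler.py`.

```python
def build_chunk_upsert(chunks):
    """Return (query, params) that upsert one Chunk node per input chunk.

    Each input item is assumed to look like:
        {"text": "...", "metadata": {"source": "a.pdf", "chunk_index": 0}}
    """
    # UNWIND iterates over the list parameter; MERGE avoids creating
    # duplicate nodes when the same document is loaded twice.
    query = (
        "UNWIND $chunks AS chunk "
        "MERGE (c:Chunk {id: chunk.id}) "
        "SET c.text = chunk.text, c.source = chunk.source"
    )
    params = {
        "chunks": [
            {
                "id": f"{c['metadata']['source']}#{c['metadata']['chunk_index']}",
                "text": c["text"],
                "source": c["metadata"]["source"],
            }
            for c in chunks
        ]
    }
    return query, params
```

With the official Neo4j Python driver, the pair could be passed to `session.run(query, params)`. Keeping the text in parameters rather than interpolating it into the query string avoids quoting problems and Cypher injection.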

Example

To add a new PDF for processing, add its path to the pdf_files list in main.py:

```
pdf_files = [
    '/path/to/your/pdf1.pdf',
    '/path/to/your/pdf2.pdf',
    # Add more PDFs here
]
```
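
Before processing, it can help to check that every listed path actually points to a PDF. The helper below is a hypothetical sketch, not part of `main.py`.

```python
from pathlib import Path


def split_pdf_paths(paths):
    """Separate usable PDF paths from missing or non-PDF ones."""
    found, missing = [], []
    for raw in paths:
        path = Path(raw)
        # Accept only existing files with a .pdf extension.
        if path.is_file() and path.suffix.lower() == ".pdf":
            found.append(path)
        else:
            missing.append(path)
    return found, missing
```

A call such as `found, missing = split_pdf_paths(pdf_files)` lets the script report bad entries up front instead of failing partway through parsing.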

Testing

Tests are written using pytest. To run the tests, execute:

```
pytest tests/
```
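
For orientation, a test file that pytest would collect might look like the sketch below. `normalize_whitespace` is a hypothetical stand-in defined inline so the example is self-contained. A real test would import a function from `proto_rag.utils` instead, and the file name `tests/test_text_utils.py` is also an assumption.

```python
# tests/test_text_utils.py (hypothetical file name)


def normalize_whitespace(text):
    """Collapse any run of whitespace into a single space (stand-in helper)."""
    return " ".join(text.split())


# By default, pytest collects functions whose names start with `test_`.
def test_collapses_runs_of_whitespace():
    assert normalize_whitespace("graph \n\n  data\tstore") == "graph data store"


def test_whitespace_only_input_becomes_empty():
    assert normalize_whitespace("   ") == ""
```

Plain `assert` statements are enough here, since pytest rewrites them to show the compared values when a test fails.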

Directory and File Responsibilities

proto_rag/main.py: The main entry point of the application; runs the entire pipeline.
proto_rag/utils: Contains utility modules for PDF parsing, text chunking, JSON saving, file handling, Neo4j handling, OpenAI integration, and RAG implementation.
proto_rag/utils/__init__.py: Initializes the utils package.
proto_rag/utils/rag_handler.py: Contains the logic for interacting with Neo4j and OpenAI.
.env: Environment variables configuration file.
requirements.txt: List of dependencies.
Dockerfile: To be implemented for containerization.

License

This project is licensed under the MIT License.

Future Enhancements

To-Do List

Testing:
- Add unit tests for all utility functions.
- Write integration tests to ensure modules work together correctly.
- Develop end-to-end tests to verify the entire workflow.
CI/CD:
- Set up continuous integration using GitHub Actions.
- Automate testing and deployment processes.
- Implement code quality checks (linting, formatting).
Dockerization:
- Create a Dockerfile for containerization.
- Build and test Docker images locally.
- Deploy Docker containers using a container orchestration tool (e.g., Kubernetes).
