Agentic knowledge retrieval, redefined: an AI agent system that combines traditional RAG (vector search) with knowledge graph capabilities to analyze and provide insights about big tech companies and their AI initiatives. The system uses PostgreSQL with pgvector for semantic search and Neo4j with Graphiti for temporal knowledge graphs. The goal: agentic RAG at its finest.
Built with:
- Pydantic AI for the AI Agent Framework
- Graphiti for the Knowledge Graph
- Postgres with PGVector for the Vector Database
- Neo4j for the Knowledge Graph Engine (Graphiti connects to this)
- FastAPI for the Agent API
- Claude Code for the AI Coding Assistant (see CLAUDE.md, PLANNING.md, and TASK.md)
This system includes three main components:
- Document Ingestion Pipeline: Processes markdown documents using semantic chunking and builds both vector embeddings and knowledge graph relationships
- AI Agent Interface: A conversational agent powered by Pydantic AI that can search across both vector database and knowledge graph
- Streaming API: FastAPI backend with real-time streaming responses and comprehensive search capabilities
Prerequisites:

- Python 3.11 or higher
- PostgreSQL database (such as Neon)
- Neo4j database (for knowledge graph)
- LLM Provider API key (OpenAI, Ollama, Gemini, etc.)
Set up a virtual environment and install dependencies:

```bash
# Create and activate virtual environment
python -m venv venv       # python3 on Linux
source venv/bin/activate  # On Linux/macOS
# or
venv\Scripts\activate     # On Windows

pip install -r requirements.txt
```

Execute the SQL in sql/schema.sql to create all necessary tables, indexes, and functions.
Be sure to change the embedding dimensions on lines 31, 67, and 100 of sql/schema.sql to match your embedding model. For reference, OpenAI's text-embedding-3-small produces 1536-dimensional embeddings and Ollama's nomic-embed-text produces 768.

Note that this script will drop all existing tables before recreating them!
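To make the dimension change concrete, the lines you are editing look roughly like this (a hypothetical excerpt; the table name and exact layout in sql/schema.sql may differ, so check the referenced line numbers in your copy):

```sql
-- Hypothetical excerpt: set the vector dimension to match your embedding model.
-- 1536 for OpenAI text-embedding-3-small, 768 for Ollama nomic-embed-text.
CREATE TABLE chunks (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    document_id UUID NOT NULL,
    content TEXT NOT NULL,
    embedding vector(1536)  -- change 1536 here and in the matching indexes/functions
);
```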
You have a couple of easy options for setting up Neo4j:

Option 1: Local-AI-Packaged

- Clone the repository:

```bash
git clone https://github.com/coleam00/local-ai-packaged
```

- Follow the installation instructions to set up Neo4j through the package
- Note the username and password you set in .env; the URI will be bolt://localhost:7687

Option 2: Neo4j Desktop

- Download and install Neo4j Desktop
- Create a new project and add a local DBMS
- Start the DBMS and set a password
- Note the connection details (URI, username, password)
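Before wiring those details into the app, you can sanity-check them with the official neo4j Python driver. A minimal sketch (the URI and credentials are placeholders for whatever you set above):

```python
from neo4j import GraphDatabase

# Placeholders: use the URI, username, and password from your Neo4j setup.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "your_password"))
driver.verify_connectivity()  # raises if the DBMS is unreachable or credentials are wrong
print("Neo4j connection OK")
driver.close()
```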
Create a .env file in the project root:
```bash
# Database Configuration (example Neon connection string)
DATABASE_URL=postgresql://username:password@your-neon-host.neon.tech/neondb

# Neo4j Configuration
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=your_password

# LLM Provider Configuration (choose one)
LLM_PROVIDER=openai
LLM_BASE_URL=https://api.openai.com/v1
LLM_API_KEY=sk-your-api-key
LLM_CHOICE=gpt-4.1-mini

# Embedding Configuration
EMBEDDING_PROVIDER=openai
EMBEDDING_BASE_URL=https://api.openai.com/v1
EMBEDDING_API_KEY=sk-your-api-key
EMBEDDING_MODEL=text-embedding-3-small

# Ingestion Configuration
INGESTION_LLM_CHOICE=gpt-4.1-nano  # Faster model for processing

# Application Configuration
APP_ENV=development
LOG_LEVEL=INFO
APP_PORT=8058
```

For other LLM providers:
```bash
# Ollama (Local)
LLM_PROVIDER=ollama
LLM_BASE_URL=http://localhost:11434/v1
LLM_API_KEY=ollama
LLM_CHOICE=qwen2.5:14b-instruct

# OpenRouter
LLM_PROVIDER=openrouter
LLM_BASE_URL=https://openrouter.ai/api/v1
LLM_API_KEY=your-openrouter-key
LLM_CHOICE=anthropic/claude-3-5-sonnet

# Gemini
LLM_PROVIDER=gemini
LLM_BASE_URL=https://generativelanguage.googleapis.com/v1beta
LLM_API_KEY=your-gemini-key
LLM_CHOICE=gemini-2.5-flash
```
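All of these providers expose an OpenAI-compatible API, which is why switching is just a matter of changing the base URL and key. A minimal sketch of how such configuration might be consumed (the real logic lives in agent/providers.py; the function name here is illustrative):

```python
import os

from openai import AsyncOpenAI

def get_llm_client() -> AsyncOpenAI:
    """Illustrative sketch: build an OpenAI-compatible client from environment variables."""
    return AsyncOpenAI(
        base_url=os.getenv("LLM_BASE_URL", "https://api.openai.com/v1"),
        api_key=os.getenv("LLM_API_KEY"),
    )

# The model name travels separately, e.g.:
# await client.chat.completions.create(model=os.getenv("LLM_CHOICE"), messages=...)
```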
Add your markdown documents to the documents/ folder:

```bash
mkdir -p documents
# Add your markdown files about tech companies, AI research, etc.
# Example: documents/google_ai_initiatives.md
#          documents/microsoft_openai_partnership.md
```

The system includes a built-in PDF to Markdown converter using Docling for advanced PDF understanding:
```bash
# Convert all PDFs from the pdf/ folder (default behavior)
python convert_pdf.py

# Convert a single PDF file
python convert_pdf.py document.pdf

# Convert a PDF from a URL (e.g., arXiv papers)
python convert_pdf.py https://arxiv.org/pdf/2408.09869.pdf

# Convert multiple specific PDFs
python convert_pdf.py file1.pdf file2.pdf file3.pdf

# Convert all PDFs from a custom folder
python convert_pdf.py --folder my_pdfs/

# Specify a custom output directory
python convert_pdf.py document.pdf -o custom_output_folder

# Verbose output for debugging
python convert_pdf.py -v
```

Quick Start: Simply place your PDF files in the pdf/ folder and run python convert_pdf.py to convert them all!
The converter will automatically:
- Download PDFs from URLs if needed
- Extract text, tables, and structure using Docling's advanced PDF understanding
- Convert to clean Markdown format
- Save files in the documents/ folder (or a specified output directory)
You can also use the programmatic API:
```python
from ingestion.pdf_converter import PDFConverter

converter = PDFConverter(output_dir="documents")
output_path = converter.convert_to_markdown("document.pdf")
print(f"Converted to: {output_path}")
```

Note: For a comprehensive example with extensive content, you can copy the provided big_tech_docs folder:
```bash
cp -r big_tech_docs/* documents/
```

This includes 21 detailed documents about major tech companies and their AI initiatives. Be aware that processing all of these files into the knowledge graph will take significant time (potentially 30+ minutes) due to the computational complexity of entity extraction and relationship building.
Important: You must run ingestion first to populate the databases before the agent can provide meaningful responses.
```bash
# Basic ingestion with semantic chunking
python -m ingestion.ingest

# Clean existing data and re-ingest everything
python -m ingestion.ingest --clean

# Custom settings for faster processing (no knowledge graph)
python -m ingestion.ingest --chunk-size 800 --no-semantic --verbose
```

The ingestion process will:
- Parse and semantically chunk your documents (see the sketch after this list)
- Generate embeddings for vector search
- Extract entities and relationships for the knowledge graph
- Store everything in PostgreSQL and Neo4j
NOTE that this can take a while, because knowledge graph construction is very computationally expensive!
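To make the chunking step concrete, here is a minimal sketch of the idea behind semantic chunking: start a new chunk wherever adjacent sentences stop being similar in embedding space. This is an illustration of the technique only, not the project's actual chunker.py (which uses LLM analysis):

```python
import numpy as np

def semantic_chunks(sentences: list[str], embeddings: np.ndarray, threshold: float = 0.75) -> list[str]:
    """Group consecutive sentences; open a new chunk when cosine similarity drops below threshold."""
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        a, b = embeddings[i - 1], embeddings[i]
        cosine = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
        if cosine < threshold:  # topic shift detected: close the current chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks
```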
Before running the API server, you can customize when the agent uses different tools by modifying the system prompt in agent/prompts.py. The system prompt controls:
- When to use vector search vs knowledge graph search
- How to combine results from different sources
- The agent's reasoning strategy for tool selection
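For instance, the tool-selection guidance might look something like this (a hypothetical excerpt written for illustration; the actual prompt in agent/prompts.py differs):

```python
# Hypothetical excerpt in the spirit of agent/prompts.py; the real wording differs.
SYSTEM_PROMPT = """You are an assistant that analyzes big tech companies and their AI initiatives.

Tool selection strategy:
- Use vector_search for topical questions ("What AI research is Google working on?").
- Use graph_search for relationship or timeline questions ("How are Microsoft and OpenAI connected?").
- Use hybrid_search when a question needs both semantic recall and entity relationships.
Always cite the sources behind your answer."""
```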
```bash
# Start the FastAPI server
python -m agent.api

# Server will be available at http://localhost:8058
```

The CLI provides an interactive way to chat with the agent and see which tools it uses for each query.
```bash
# Start the CLI in a separate terminal from the API (connects to the default API at http://localhost:8058)
python cli.py

# Connect to a different URL
python cli.py --url http://localhost:8058

# Connect to a specific port
python cli.py --port 8080
```

- Real-time streaming responses - See the agent's response as it's generated
- Tool usage visibility - Understand which tools the agent used:
  - vector_search - Semantic similarity search
  - graph_search - Knowledge graph queries
  - hybrid_search - Combined search approach
- Session management - Maintains conversation context
- Color-coded output - Easy to read responses and tool information
```
🤖 Agentic RAG with Knowledge Graph CLI
============================================================
Connected to: http://localhost:8058

You: What are Microsoft's AI initiatives?

🤖 Assistant:
Microsoft has several major AI initiatives including...

🛠 Tools Used:
  1. vector_search (query='Microsoft AI initiatives', limit=10)
  2. graph_search (query='Microsoft AI projects')

────────────────────────────────────────────────────────────

You: How is Microsoft connected to OpenAI?

🤖 Assistant:
Microsoft has a significant strategic partnership with OpenAI...

🛠 Tools Used:
  1. hybrid_search (query='Microsoft OpenAI partnership', limit=10)
  2. get_entity_relationships (entity='Microsoft')
```
Available CLI commands:

- help - Show available commands
- health - Check API connection status
- clear - Clear current session
- exit or quit - Exit the CLI
Check health:

```bash
curl http://localhost:8058/health
```

Chat (non-streaming):

```bash
curl -X POST "http://localhost:8058/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "What are Google'\''s main AI initiatives?"
  }'
```

Streaming chat:

```bash
curl -X POST "http://localhost:8058/chat/stream" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Compare Microsoft and Google'\''s AI strategies"
  }'
```
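The /chat/stream endpoint returns Server-Sent Events. A minimal sketch of consuming the stream from Python with httpx (the exact event payload shape is an assumption; inspect the stream or the API docs for the real format):

```python
import httpx

# Assumes the API is running locally; the SSE payload format shown is illustrative.
with httpx.stream(
    "POST",
    "http://localhost:8058/chat/stream",
    json={"message": "Compare Microsoft and Google's AI strategies"},
    timeout=None,
) as response:
    for line in response.iter_lines():
        if line.startswith("data: "):  # SSE frames arrive as "data: <payload>" lines
            print(line[len("data: "):])
```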
This system combines the best of both worlds:

Vector Database (PostgreSQL + pgvector):
- Semantic similarity search across document chunks
- Fast retrieval of contextually relevant information
- Excellent for finding documents about similar topics
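Under the hood, pgvector similarity search boils down to ordering by a distance operator. A hedged sketch with asyncpg (the table and column names are assumptions; the real schema lives in sql/schema.sql):

```python
import asyncpg

async def vector_search(pool: asyncpg.Pool, query_embedding: list[float], limit: int = 10):
    """Illustrative pgvector query; '<=>' is cosine distance, so smaller means more similar."""
    embedding_literal = "[" + ",".join(str(x) for x in query_embedding) + "]"
    return await pool.fetch(
        """
        SELECT content, embedding <=> $1::vector AS distance
        FROM chunks                      -- hypothetical table name
        ORDER BY embedding <=> $1::vector
        LIMIT $2
        """,
        embedding_literal,
        limit,
    )
```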
Knowledge Graph (Neo4j + Graphiti):
- Temporal relationships between entities (companies, people, technologies)
- Graph traversal for discovering connections
- Perfect for understanding partnerships, acquisitions, and evolution over time
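A rough sketch of what Graphiti usage looks like, based on graphiti-core's public API (treat the parameter details as approximate and check the Graphiti docs for your installed version):

```python
import asyncio
from datetime import datetime, timezone

from graphiti_core import Graphiti

async def main():
    graphiti = Graphiti("bolt://localhost:7687", "neo4j", "your_password")
    # Ingest an episode; Graphiti extracts entities/relationships and timestamps them.
    await graphiti.add_episode(
        name="microsoft_openai",
        episode_body="Microsoft deepened its partnership with OpenAI in 2023.",
        source_description="news snippet",
        reference_time=datetime.now(timezone.utc),
    )
    # Search returns temporally-aware facts from the graph.
    results = await graphiti.search("How is Microsoft connected to OpenAI?")
    for result in results:
        print(result.fact)
    await graphiti.close()

asyncio.run(main())
```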
Intelligent Agent:
- Automatically chooses the best search strategy
- Combines results from both databases
- Provides context-aware responses with source citations
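The agent layer is what ties the two stores together. A condensed sketch in the style of Pydantic AI (the tool bodies and names are simplified stand-ins for the real implementations in agent/agent.py):

```python
from pydantic_ai import Agent, RunContext

# Simplified stand-in for agent/agent.py; the real tools query Postgres and Neo4j.
agent = Agent(
    "openai:gpt-4.1-mini",
    system_prompt="Use vector_search for topics and graph_search for relationships.",
)

@agent.tool
async def vector_search(ctx: RunContext[None], query: str, limit: int = 10) -> str:
    """Semantic similarity search over document chunks (stubbed here)."""
    return f"top {limit} chunks for {query!r}"

@agent.tool
async def graph_search(ctx: RunContext[None], query: str) -> str:
    """Knowledge graph lookup (stubbed here)."""
    return f"graph facts for {query!r}"

result = agent.run_sync("How is Microsoft connected to OpenAI?")
print(result.output)  # .data in older pydantic-ai versions
```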
The system excels at queries that benefit from both semantic search and relationship understanding:

- Semantic Questions: "What AI research is Google working on?"
  - Uses vector search to find relevant document chunks about Google's AI research
- Relationship Questions: "How are Microsoft and OpenAI connected?"
  - Uses the knowledge graph to traverse relationships and partnerships
- Temporal Questions: "Show me the timeline of Meta's AI announcements"
  - Leverages Graphiti's temporal capabilities to track changes over time
- Complex Analysis: "Compare the AI strategies of FAANG companies"
  - Combines vector search for strategy documents with graph traversal for competitive analysis

Why this architecture works:

- Complementary Strengths: Vector search finds semantically similar content while knowledge graphs reveal hidden connections
- Temporal Intelligence: Graphiti tracks how facts change over time, perfect for the rapidly evolving AI landscape
- Flexible LLM Support: Switch between OpenAI, Ollama, OpenRouter, or Gemini based on your needs
- Production Ready: Comprehensive testing, error handling, and monitoring
Visit http://localhost:8058/docs for interactive API documentation once the server is running.
- Hybrid Search: Seamlessly combines vector similarity and graph traversal
- Temporal Knowledge: Tracks how information changes over time
- Streaming Responses: Real-time AI responses with Server-Sent Events
- PDF Conversion: Built-in PDF to Markdown converter using Docling for advanced PDF understanding
- Flexible Providers: Support for multiple LLM and embedding providers
- Semantic Chunking: Intelligent document splitting using LLM analysis
- Production Ready: Comprehensive testing, logging, and error handling
```
agentic-rag-knowledge-graph/
├── agent/                  # AI agent and API
│   ├── agent.py            # Main Pydantic AI agent
│   ├── api.py              # FastAPI application
│   ├── providers.py        # LLM provider abstraction
│   └── models.py           # Data models
├── ingestion/              # Document processing
│   ├── ingest.py           # Main ingestion pipeline
│   ├── chunker.py          # Semantic chunking
│   ├── embedder.py         # Embedding generation
│   └── pdf_converter.py    # PDF to Markdown conversion
├── convert_pdf.py          # CLI tool for PDF conversion
├── pdf_to_markdown.py      # Standalone PDF conversion script
├── cli.py                  # Interactive CLI for the agent
├── sql/                    # Database schema
├── documents/              # Your markdown files
└── tests/                  # Comprehensive test suite
```
```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=agent --cov=ingestion --cov-report=html

# Run specific test categories
pytest tests/agent/
pytest tests/ingestion/
```

Database Connection: Ensure your DATABASE_URL is correct and the database is accessible
```bash
# Test your connection
psql -d "$DATABASE_URL" -c "SELECT 1;"
```

Neo4j Connection: Verify your Neo4j instance is running and credentials are correct
```bash
# Check if Neo4j is accessible (adjust URL as needed)
curl -u neo4j:password http://localhost:7474/db/data/
```

No Results from Agent: Make sure you've run the ingestion pipeline first
```bash
python -m ingestion.ingest --verbose
```

LLM API Issues: Check your API key and provider configuration in .env
Built with β€οΈ using Pydantic AI, FastAPI, PostgreSQL, and Neo4j.