
**Zero-Config Setup:** Just run `docker-compose up --build`. No Python, no Ollama installation, no model downloads, no configuration files needed!
A fully local, end-to-end Retrieval-Augmented Generation (RAG) system built with:
- **FastAPI** – backend API server
- **Streamlit** – simple and interactive frontend UI
- **Qdrant** – high-performance vector database
- **Ollama** – run LLMs like Phi-3, Mistral, etc. locally
- **PDF Support** – upload documents and ask questions
- Upload PDFs and extract content
- Embed content using `sentence-transformers`
- Store and search with the Qdrant vector database
- Query documents using local LLMs via Ollama
- Smart chunking and clustering support via LangChain & scikit-learn
- 100% offline: your data never leaves your machine
- Fast retrieval and response generation
- Semantic search with relevance scoring
| Traditional RAG Setup | This Project |
|---|---|
| Install Python manually | Containerized |
| Download Ollama separately | Auto-installed |
| Pull models manually | Auto-downloaded |
| Configure embeddings | Pre-configured |
| Set up vector database | Ready to use |
| **Result:** Hours of setup | **Result:** One command |
Just run `docker-compose up --build` and you're done!
You only need two things:
- Docker (version 20.10+)
- Docker Compose (version 2.0+)

That's it! No Python, no Ollama, and no manual model downloads required.

Recommended system specs:
- At least 8 GB RAM (for optimal LLM performance)
- 10 GB+ free disk space (for models and containers)
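You can verify your installed versions meet these requirements before building (depending on how Compose is installed, the second command may be spelled `docker compose version` instead):

```bash
docker --version            # expect 20.10 or newer
docker-compose --version    # expect 2.x
```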
```bash
git clone https://github.com/noorjotk/local-rag-engine.git
cd local-rag-engine
docker-compose up --build
```
**Note:** The first build will automatically:
- Download and install all Python dependencies
- Pre-download the embedding model (`BAAI/bge-large-en-v1.5`)
- Pull and load the LLM model (`phi3:3.8b-mini-128k-instruct-q4_0`)
- Set up the Qdrant vector database

This may take 10-15 minutes (more or less, depending on your internet connection).
You'll see logs indicating the startup process:
```
rag-app | INFO: Started server process [7]
rag-app | INFO: Waiting for application startup.
rag-app | INFO:app:Starting up application...
rag-app | INFO:app:Loading embedding model...
rag-app | INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: BAAI/bge-large-en-v1.5
ollama  | time=2025-07-26T09:11:13.894Z level=INFO source=server.go:637 msg="llama runner started in 4.28 seconds"
ollama  | [GIN] 2025/07/26 - 09:11:14 | 200 | 5.105294711s | ::1 | POST "/api/chat"
ollama  | Model loaded into memory.
rag-app | INFO:app:Embedding model warmed up successfully
qdrant  | 2025-07-26T09:13:05.887308Z INFO actix_web::middleware::logger: 172.18.0.4 "GET /collections HTTP/1.1" 200 111
rag-app | INFO:httpx:HTTP Request: GET http://qdrant:6333/collections "HTTP/1.1 200 OK"
rag-app | INFO:app:Qdrant connected, found 2 collections
rag-app | INFO: Application startup complete.
rag-app | INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
```
✅ **Ready to use!** Once you see `INFO: Uvicorn running on http://0.0.0.0:8000`, the application is fully started and accessible at http://localhost:8501.
For later runs (models are already cached), a plain `docker-compose up` is enough:

```bash
docker-compose up
```

That's it! No configuration files needed; everything is automated.
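For day-to-day use, the usual Docker Compose commands apply (nothing project-specific):

```bash
docker-compose up -d      # start all services in the background
docker-compose logs -f    # follow the combined logs
docker-compose down       # stop and remove the containers
```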
Once running, access the application through:

| Service | URL | Description |
|---|---|---|
| Streamlit App | http://localhost:8501 | Main user interface |
| FastAPI Docs | http://localhost:8000/docs | API documentation |
| Qdrant UI | http://localhost:6333/dashboard | Vector database dashboard |
| Ollama API | http://localhost:11434 | LLM service endpoint |
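To quickly confirm each service is reachable on the default ports above, you can hit their standard HTTP endpoints (these are the stock FastAPI, Qdrant, and Ollama APIs, not project-specific routes):

```bash
curl http://localhost:8000/docs          # FastAPI: returns the Swagger UI page
curl http://localhost:6333/collections   # Qdrant: lists existing collections
curl http://localhost:11434/             # Ollama: replies "Ollama is running"
```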
1. Start the application using Docker Compose
2. Open the Streamlit UI at http://localhost:8501
3. Upload PDF documents through the file uploader
4. Wait for processing to finish
5. Ask questions about your documents
6. Get AI-powered answers
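If you'd rather script this workflow than use the UI, the same steps can go through the FastAPI backend. The route names below (`/upload`, `/ask`) are placeholders, not the project's confirmed endpoints; check http://localhost:8000/docs for the actual paths and request shapes:

```bash
# Hypothetical endpoint names - confirm against http://localhost:8000/docs first
curl -F "file=@mydocument.pdf" http://localhost:8000/upload
curl -X POST http://localhost:8000/ask \
     -H "Content-Type: application/json" \
     -d '{"question": "What is this document about?"}'
```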
| Component | Technology | Purpose |
|---|---|---|
| Backend | FastAPI | RESTful API server |
| Frontend | Streamlit | Interactive web interface |
| Vector DB | Qdrant | Similarity search & storage |
| LLM Runtime | Ollama | Local language model inference |
| Embeddings | sentence-transformers | Text vectorization |
| Document Processing | LangChain | Text chunking & QA chains |
| Clustering | scikit-learn | Document similarity grouping |
| Containerization | Docker + Compose | Deployment & orchestration |
The application comes pre-configured with optimized models:
- **LLM Model:** `phi3:3.8b-mini-128k-instruct-q4_0` (automatically downloaded)
- **Embedding Model:** `BAAI/bge-large-en-v1.5` (pre-downloaded during build)
- **Vector Database:** Qdrant with persistent storage
- **Chunk Settings:** Optimized for document processing
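Because Ollama runs in its own container, additional models can be pulled with the standard Ollama CLI if you want to experiment (a sketch only; the app itself is wired to the Phi-3 model above, so actually switching models would also require updating the app's configuration):

```bash
# Pull another model into the running Ollama container (example: Mistral)
docker-compose exec ollama ollama pull mistral
```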
```
┌───────────────────┐     ┌───────────────────┐     ┌───────────────────┐
│      RAG App      │     │      Ollama       │     │      Qdrant       │
│ FastAPI+Streamlit │────►│   (Phi-3 Model)   │     │  (Vector Store)   │
│  Port: 8000/8501  │     │    Port: 11434    │     │    Port: 6333     │
└───────────────────┘     └───────────────────┘     └───────────────────┘
```
All services are automatically configured and connected via Docker Compose.
**First-time build taking long:**
- This is normal! The build downloads ~2-3 GB of models
- Check progress with `docker-compose logs -f`
- Models are cached for subsequent runs
**Docker build fails:**

```bash
# Clean Docker cache and rebuild
docker system prune -a
docker-compose up --build
```
**Ollama model loading issues:**

```bash
# Check Ollama container logs
docker-compose logs ollama
# The entrypoint.sh script automatically handles model downloading
```
**Memory issues:**
- Ensure Docker has at least 8 GB RAM allocated
- The Phi-3 model is optimized (3.8B parameters, quantized)
- Monitor usage with `docker stats`
**Port conflicts:**
- Ports used: 8000 (FastAPI), 8501 (Streamlit), 6333 (Qdrant), 11434 (Ollama)
- Modify the ports in `docker-compose.yml` if needed
**Application not responding:**

```bash
# Check that all services are running
docker-compose ps
# Restart if needed
docker-compose restart
```
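If nothing else helps, a full reset rebuilds everything from a clean state. Note that `-v` also removes the volumes, including Qdrant's persistent storage, so previously indexed documents will be gone:

```bash
# Stop everything, remove containers and volumes, then rebuild
docker-compose down -v
docker-compose up --build
```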
- ✅ No OpenAI API key required
- ✅ No remote API calls
- ✅ Your files stay on your machine
- ✅ Ideal for privacy-conscious use cases
- ✅ Perfect for secure environments
- **Python Dependencies:** downloads from `requirements.txt` (~500 MB)
- **Embedding Model:** pre-downloads `BAAI/bge-large-en-v1.5` (~1.2 GB)
- **LLM Model:** pulls `phi3:3.8b-mini-128k-instruct-q4_0` (~2.2 GB)
- **Model Warm-up:** loads the model into memory for faster responses
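To verify what the build fetched, you can list the models inside the Ollama container and check overall Docker disk usage (standard commands):

```bash
docker-compose exec ollama ollama list   # should show phi3:3.8b-mini-128k-instruct-q4_0
docker system df                         # image and volume disk usage
```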
This project is licensed under the MIT License - see the LICENSE file for details.
Thanks to the amazing open-source projects that make this possible:
- Qdrant - Vector database
- Ollama - Local LLM runtime
- Sentence Transformers - Embedding models
- LangChain - LLM application framework
- FastAPI - Modern web framework
- Streamlit - App framework
- **Feature Requests:** Start a discussion
- **Contact:** [[email protected]]

Built with ❤️ by Noorjot Kaur

If this project helped you, please consider giving it a ⭐!