
**Zero-Config Setup:** Just run `docker-compose up --build`. No Python, no Ollama installation, no model downloads, no configuration files needed!
A fully local, end-to-end Retrieval-Augmented Generation (RAG) system built with:
- **FastAPI** – backend API server
- **Streamlit** – simple and interactive frontend UI
- **Qdrant** – high-performance vector database
- **Ollama** – run LLMs like Phi-3, Mistral, etc. locally
- **PDF Support** – upload documents and ask questions
- Upload PDFs and extract content
- Embed content using `sentence-transformers`
- Store and search with the Qdrant vector database
- Query documents using local LLMs via Ollama
- Smart chunking and clustering support via LangChain & scikit-learn
- 100% offline: your data never leaves your machine
- Fast retrieval and response generation
- Semantic search with relevance scoring
| Traditional RAG Setup | This Project |
|---|---|
| Install Python manually | Containerized |
| Download Ollama separately | Auto-installed |
| Pull models manually | Auto-downloaded |
| Configure embeddings | Pre-configured |
| Set up vector database | Ready to use |
| **Result:** Hours of setup | **Result:** One command |
Just run `docker-compose up --build` and you're done!
You only need two things:
- Docker (version 20.10+)
- Docker Compose (version 2.0+)

That's it! No Python, no Ollama, and no manual model downloads required.

Recommended system specs:
- At least 8 GB RAM (for optimal LLM performance)
- 10 GB+ free disk space (for models and containers)
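You can verify your installed versions meet these requirements before building (depending on how Compose is installed, the second command may be spelled `docker compose version` instead):

```bash
docker --version            # expect 20.10 or newer
docker-compose --version    # expect 2.x
```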
```bash
git clone https://github.com/noorjotk/local-rag-engine.git
cd local-rag-engine
docker-compose up --build
```
**Note:** The first build will automatically:
- Download and install all Python dependencies
- Pre-download the embedding model (`BAAI/bge-large-en-v1.5`)
- Pull and load the LLM model (`phi3:3.8b-mini-128k-instruct-q4_0`)
- Set up the Qdrant vector database

This may take 10-15 minutes (more or less, depending on your internet connection).
You'll see logs indicating the startup process:
```
rag-app | INFO: Started server process [7]
rag-app | INFO: Waiting for application startup.
rag-app | INFO:app:Starting up application...
rag-app | INFO:app:Loading embedding model...
rag-app | INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: BAAI/bge-large-en-v1.5
ollama  | time=2025-07-26T09:11:13.894Z level=INFO source=server.go:637 msg="llama runner started in 4.28 seconds"
ollama  | [GIN] 2025/07/26 - 09:11:14 | 200 | 5.105294711s | ::1 | POST "/api/chat"
ollama  | Model loaded into memory.
rag-app | INFO:app:Embedding model warmed up successfully
qdrant  | 2025-07-26T09:13:05.887308Z INFO actix_web::middleware::logger: 172.18.0.4 "GET /collections HTTP/1.1" 200 111
rag-app | INFO:httpx:HTTP Request: GET http://qdrant:6333/collections "HTTP/1.1 200 OK"
rag-app | INFO:app:Qdrant connected, found 2 collections
rag-app | INFO: Application startup complete.
rag-app | INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
```
✅ **Ready to use!** Once you see `INFO: Uvicorn running on http://0.0.0.0:8000`, the application is fully started and accessible at http://localhost:8501.
For later runs (models are already cached), a plain `docker-compose up` is enough:

```bash
docker-compose up
```

That's it! No configuration files needed; everything is automated.
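For day-to-day use, the usual Docker Compose commands apply (nothing project-specific):

```bash
docker-compose up -d      # start all services in the background
docker-compose logs -f    # follow the combined logs
docker-compose down       # stop and remove the containers
```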
Once running, access the application through:

| Service | URL | Description |
|---|---|---|
| Streamlit App | http://localhost:8501 | Main user interface |
| FastAPI Docs | http://localhost:8000/docs | API documentation |
| Qdrant UI | http://localhost:6333/dashboard | Vector database dashboard |
| Ollama API | http://localhost:11434 | LLM service endpoint |
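To quickly confirm each service is reachable on the default ports above, you can hit their standard HTTP endpoints (these are the stock FastAPI, Qdrant, and Ollama APIs, not project-specific routes):

```bash
curl http://localhost:8000/docs          # FastAPI: returns the Swagger UI page
curl http://localhost:6333/collections   # Qdrant: lists existing collections
curl http://localhost:11434/             # Ollama: replies "Ollama is running"
```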
1. Start the application using Docker Compose
2. Open the Streamlit UI at http://localhost:8501
3. Upload PDF documents through the file uploader
4. Wait for processing to finish
5. Ask questions about your documents
6. Get AI-powered answers
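If you'd rather script this workflow than use the UI, the same steps can go through the FastAPI backend. The route names below (`/upload`, `/ask`) are placeholders, not the project's confirmed endpoints; check http://localhost:8000/docs for the actual paths and request shapes:

```bash
# Hypothetical endpoint names - confirm against http://localhost:8000/docs first
curl -F "file=@mydocument.pdf" http://localhost:8000/upload
curl -X POST http://localhost:8000/ask \
     -H "Content-Type: application/json" \
     -d '{"question": "What is this document about?"}'
```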
| Component | Technology | Purpose |
|---|---|---|
| Backend | FastAPI | RESTful API server |
| Frontend | Streamlit | Interactive web interface |
| Vector DB | Qdrant | Similarity search & storage |
| LLM Runtime | Ollama | Local language model inference |
| Embeddings | sentence-transformers | Text vectorization |
| Document Processing | LangChain | Text chunking & QA chains |
| Clustering | scikit-learn | Document similarity grouping |
| Containerization | Docker + Compose | Deployment & orchestration |
The application comes pre-configured with optimized models:
- **LLM Model:** `phi3:3.8b-mini-128k-instruct-q4_0` (automatically downloaded)
- **Embedding Model:** `BAAI/bge-large-en-v1.5` (pre-downloaded during build)
- **Vector Database:** Qdrant with persistent storage
- **Chunk Settings:** Optimized for document processing
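Because Ollama runs in its own container, additional models can be pulled with the standard Ollama CLI if you want to experiment (a sketch only; the app itself is wired to the Phi-3 model above, so actually switching models would also require updating the app's configuration):

```bash
# Pull another model into the running Ollama container (example: Mistral)
docker-compose exec ollama ollama pull mistral
```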
```
┌───────────────────┐     ┌───────────────────┐     ┌───────────────────┐
│      RAG App      │     │      Ollama       │     │      Qdrant       │
│ FastAPI+Streamlit │────►│   (Phi-3 Model)   │     │  (Vector Store)   │
│  Port: 8000/8501  │     │    Port: 11434    │     │    Port: 6333     │
└───────────────────┘     └───────────────────┘     └───────────────────┘
```
All services are automatically configured and connected via Docker Compose.
**First-time build taking long:**
- This is normal! The build downloads ~2-3 GB of models
- Check progress with `docker-compose logs -f`
- Models are cached for subsequent runs
**Docker build fails:**

```bash
# Clean Docker cache and rebuild
docker system prune -a
docker-compose up --build
```
**Ollama model loading issues:**

```bash
# Check Ollama container logs
docker-compose logs ollama
# The entrypoint.sh script automatically handles model downloading
```
**Memory issues:**
- Ensure Docker has at least 8 GB RAM allocated
- The Phi-3 model is optimized (3.8B parameters, quantized)
- Monitor usage with `docker stats`
**Port conflicts:**
- Ports used: 8000 (FastAPI), 8501 (Streamlit), 6333 (Qdrant), 11434 (Ollama)
- Modify the ports in `docker-compose.yml` if needed
**Application not responding:**

```bash
# Check that all services are running
docker-compose ps
# Restart if needed
docker-compose restart
```
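If nothing else helps, a full reset rebuilds everything from a clean state. Note that `-v` also removes the volumes, including Qdrant's persistent storage, so previously indexed documents will be gone:

```bash
# Stop everything, remove containers and volumes, then rebuild
docker-compose down -v
docker-compose up --build
```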
- ✅ No OpenAI API key required
- ✅ No remote API calls
- ✅ Your files stay on your machine
- ✅ Ideal for privacy-conscious use cases
- ✅ Perfect for secure environments
- **Python Dependencies:** downloads from `requirements.txt` (~500 MB)
- **Embedding Model:** pre-downloads `BAAI/bge-large-en-v1.5` (~1.2 GB)
- **LLM Model:** pulls `phi3:3.8b-mini-128k-instruct-q4_0` (~2.2 GB)
- **Model Warm-up:** loads the model into memory for faster responses
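To verify what the build fetched, you can list the models inside the Ollama container and check overall Docker disk usage (standard commands):

```bash
docker-compose exec ollama ollama list   # should show phi3:3.8b-mini-128k-instruct-q4_0
docker system df                         # image and volume disk usage
```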
This project is licensed under the MIT License - see the LICENSE file for details.
Thanks to the amazing open-source projects that make this possible:
- Qdrant - Vector database
- Ollama - Local LLM runtime
- Sentence Transformers - Embedding models
- LangChain - LLM application framework
- FastAPI - Modern web framework
- Streamlit - App framework
- **Feature Requests:** Start a discussion
- **Contact:** [[email protected]]

Built with ❤️ by Noorjot Kaur

If this project helped you, please consider giving it a ⭐!